Intelligent Computing: Proceedings of the 2022 Computing Conference, Volume 3 (Lecture Notes in Networks and Systems, 508) 3031104668, 9783031104664

Table of contents:
Editor’s Preface
Contents
Wearable Internet of Things (IoT) Device Model Design Based on Low-Cost Healthcare Monitoring System for Current Covid-19 Disease
1 Introduction
2 State of the Art
2.1 IoT in the Context of Covid-19
2.2 WIoT and Covid-19
3 Design Requirements for Healthcare WIoT Devices
3.1 Disease Factors for WIoT Technology Design
3.2 Wearable Architecture
3.3 Oxygen Saturation Monitoring
3.4 Temperature Monitoring
3.5 WIoT Wireless Communications
4 Proposed WIoT Device Model
4.1 Hardware Layer
4.2 Software Layer
4.3 IoT Layer
5 Results of Prototype Implementation
6 Evaluation and Discussion of the Model
7 Conclusions and Future Work
References
A Matching Mechanism for Provision of Housing to the Marginalized
1 Introduction
1.1 Background
1.2 Literature Review
1.3 Our Contribution
2 Model
2.1 Problem Formulation
2.2 The Matching Mechanism
3 Analysis
3.1 Pareto Optimality
3.2 Strategy Proofness
4 Project RoomKey Revisited
4.1 The Over Supply, Low-Income Demand Picture
4.2 Project Roomkey Housing Assignment
4.3 Pareto Optimality
4.4 Strategy Proofness
4.5 Locality Expansion
5 Conclusion
5.1 Discussion
5.2 Future Work
References
Speed Harmonisation Strategy for Human-Driven and Autonomous Vehicles Co-existence
1 Introduction
2 Review of the State of the Art
3 Methodology
3.1 Vehicle Movement Schedule
3.2 Traffic Flow Model
4 Experiments
5 Result Discussion and Evaluation
6 Contributions to Knowledge
7 Future Research Direction
8 Conclusion
8.1 Summary
References
Road Intersection Coordination Scheme for Mixed Traffic (Human Driven and Driver-Less Vehicles): A Systematic Review
1 Introduction
1.1 Classification of Traffic Control Means
2 Review of Related Literature
2.1 Intelligent Transportation System
3 Transition from Human-Driven to Autonomous Vehicle Technology
4 Autonomous Intersection Management
4.1 Review Strategy
4.2 Inclusion and Exclusion Criteria
4.3 Data Extraction and Analysis Based on Traffic Control Parameters
5 Research Gap
6 Conclusion
References
Selection of Driving Mode in Autonomous Vehicles Based on Road Profile and Vehicle Speed
1 Introduction
2 Methodology
3 Results
4 Analysis and Discussion
4.1 Sprung Mass Acceleration Tracing
4.2 Tire Spring Length Variation
4.3 Suspension Spring Length Variation
5 Conclusions
References
Compass Errors in Mobile Augmented Reality Navigation Apps: Consequences and Implications
1 Introduction
2 Related Work
3 Experiment Design
3.1 Analysis of Errors: Reference Bearings
3.2 Analysis of Errors: Observed Bearings
4 Results
4.1 All Devices
4.2 Android Phones
4.3 iPhones
4.4 iPads
4.5 Different Apps on the Same Device
4.6 Uncalibrated Results
5 Discussion
5.1 The Scale of the Deviation Curves
5.2 iPhone Versus Android Phones
5.3 Calibration
5.4 Differences Between Apps on the Same Device
5.5 Other Observations
5.6 Implications for Reliability of mAR Compasses
6 Compass Errors and Navigation
6.1 The Function of Hand Bearing Compasses
6.2 Registration at Sea
6.3 Risk of Error Magnification
6.4 Augmented Reality Navigation Apps
6.5 Questions to Address
7 Conclusion
References
Virtual Reality as a Powerful Persuasive Technology to Change Attitude During the Spread of COVID-19
1 Introduction
2 Simulation
2.1 Simulated Cause and Effect Scenarios
2.2 Simulated Environments
2.3 Simulated Objects
3 History
4 VR Explication
5 Equipment Used in Virtual Reality
6 Conclusion
References
Predicting Traffic Indexes on Urban Roads Based on Public Transportation Vehicle Data in Experimental Environment
1 Introduction
2 Methodology
2.1 First Stage - Generating a Schedule
2.2 Second Stage - SUMO Simulation
2.3 Third Stage - Running the Algorithm on the Generated Data
2.4 Fourth Stage - Transformation of the Generated Data
2.5 Fifth Stage - Machine Learning Platform
3 Running the Prediction Models
3.1 Data Preparation
3.2 Single-Step Models
3.3 Multi-step Models
4 Conclusion
References
Corona Virus and Entropy of Shannon at the Cardiac Cycle: A Mathematical Model
1 Introduction
2 The Mathematical Proposal
2.1 Naive Model
2.2 Parameters Correlation
3 Implications
3.1 Derivation of Fundamental Equation
4 Interpretation of Eq. 34
4.1 Shannon Entropy from Covid-19 Infection
5 Conclusion
References
Efficiency of Local Binarization Methods in Segmentation of Selected Objects in Echocardiographic Images
1 Introduction
2 Data
3 Methods
3.1 Local Thresholding
3.2 Statistical Dominance Algorithm
3.3 Local Normalization
3.4 Dice Coefficient
3.5 Processing
4 Results
5 Conclusions
References
Automatic Speech-Based Smoking Status Identification
1 Introduction
2 Acoustic Features for Smoking Status Identification
2.1 MFCC and Fbank
2.2 Fundamental Frequency (F0)
2.3 Jitter
2.4 Shimmer
3 Our Method
4 Experiments
4.1 Datasets
4.2 Implementation Details
4.3 Performance Metrics
5 Results
6 Conclusion
References
Independent Component Analysis for Spectral Unmixing of Raman Microscopic Images of Single Human Cells
1 Introduction
2 Independent Component Analysis
3 Results and Discussion
3.1 Raman Data Acquisition and Pre-processing
3.2 Experimental Results
4 Conclusion
References
A Method of Hepatocytes Segmentation in Microscopic Images of Trypan Blue Stained Cellular Suspension
1 Introduction
1.1 Related Works
1.2 Contents of the Paper
2 Materials and Methods
2.1 Used Data
2.2 Proposed Algorithm
3 Results
3.1 Discussion
4 Conclusions
References
Using Technology to Create Personalised Environments for Dementia Care: Results of an Empathy Map Study
1 Premises
2 Objective
3 Method
4 SENSE-GARDEN Concept
4.1 Design Hierarchy of Needs for SENSE-GARDEN
4.2 SENSE-GARDEN Usability – the Ace up the Sleeve
5 Empathy Mapping for SENSE-GARDEN
5.1 Initial Empathy Map Quadrants
5.2 User Satisfaction with SENSE-GARDEN – the Empathy Map of the First Sessions
5.3 The Final Empathy Map
6 Conclusion
References
Learning Analytics for Knowledge Creation and Inventing in K-12: A Systematic Review
1 Introduction
2 Background
2.1 Constructivist Pedagogies
2.2 Collaborative Learning
2.3 Knowledge-Creating Pedagogies
2.4 Learning Analytics
3 Research Design
3.1 Search Strategy
3.2 Study Selection for Full Read
4 Descriptives: Final Set
4.1 Disciplinary Focus
5 Results
5.1 Educational Technology and Pedagogy
5.2 Three Classes of Learning Analytics
5.3 Learning Analytics: Approaches
6 Discussion
6.1 Edutech for Knowledge Creation and Inventing
6.2 Learning Analytics for Knowledge Creation and Inventing
6.3 Limitations
References
A Conceptual Framework for the Development of Argumentation Skills Using CSCL in a Graduate Students' Research Course
1 Introduction
2 Theoretical Framework
3 Proposed ASDF
3.1 Course Requirements
3.2 Pedagogical Approaches for ASD
3.3 Human Capacity: The Student as a Researcher
3.4 Infrastructural Requirements
3.5 ODeL Technology Infrastructure
3.6 Output
3.7 Evaluation of the Approach
4 The Scaffolded Learning Journey
5 Toulmin's Argumentation Model
6 The Focus Groups
6.1 Focus Group Discussion Findings
6.2 Online Questionnaire Findings
7 Revised ASDF
8 Conclusion
References
Single Access for the Use of Information Technology in University Education During the SARS-CoV-2 Pandemic
1 Introduction
2 Literature Review
2.1 Information Technology Context
2.2 Digital Maturity
2.3 Digital Maturity
2.4 Digital Disruption and Digital Platforms for Education
3 Methodology
4 Results
5 Conclusion
References
Camcorder as an e-Learning Tool
1 Introduction
2 Overview
3 Methodology for Conducting Remote Laboratory Work
3.1 Application of a Video Camera for Laboratory Work in the “Mechanics” Section
3.2 Application of a Video Camera for Laboratory Work in the “Thermal Radiation” Section
4 Discussion
5 Main Conclusions
References
Smartphone in Detecting Developmental Disability in Infancy: A Theoretical Approach to Shared Intentionality for Assessment Tool of Cognitive Decline and e-Learning
1 Introduction
1.1 Shared Intentionality in Cognition
1.2 Primary Data Entry Problem for Understanding Shared Intentionality
2 Approaches to the PDE Problem
2.1 Communication Theory About the PDE Problem
2.2 The PDE Problem in Cognitive Science
2.3 Genetics About the PDE Problem
3 Discussion
4 Future Work
References
Centralized Data Driven Decision Making System for Bangladeshi University Admission
1 Introduction
2 Literature Review
3 Proposed Solution
3.1 Research Method Used and Challenges Faced
3.2 Structure and Source of Data
3.3 Data Pre-processing
3.4 Analyzing Gathered Data
4 Result and Discussion
4.1 Insight on Decision Makers
4.2 Detailed Results
5 Future Work and Conclusion
References
Battle Card, Card Game for Teaching the History of the Incas Through Intelligent Techniques
1 Introduction
2 Related Work
3 Materials and Methods
3.1 Trello
3.2 Artificial Intelligence Algorithms
3.3 Scrum
3.4 Sprints
4 Results
4.1 Unit Tests
4.2 Test of Performance
4.3 Acceptance Tests
5 Conclusions
References
Barriers for Lecturers to Use Open Educational Resources and Participate in Inter-university Teaching Exchange Networks
1 Introduction
2 Open Educational Resources and Teaching Exchange Networks
3 Method
4 Results
4.1 Findings
4.2 Discussion
5 Conclusion
References
Lessons Learnt During the COVID-19 Pandemic: Preparing Tertiary Education in Japan for 4IR
1 Literature Review
1.1 Pre-pandemic Texts
1.2 Texts Written During and in Response to the Pandemic
2 Introduction
3 Methodology
4 Results and Findings
4.1 Demographics of Interviewees
4.2 Teaching Methods of Interviewees
4.3 Preparations Made for Curriculum Change
4.4 Qualitative Data from Open Questions
5 Conclusion
Appendix 1: Raw Data Collected From Questionnaire
References
Ambient Intelligence in Learning Management System (LMS)
1 Introduction
2 AmI System Structure
2.1 Tasks
2.2 Skills
3 Conclusions
References
eMudra: A Leftover Foreign Currency Exchange System Utilizing the Blockchain Technology
1 Introduction
2 Background Research
2.1 Current LFC Utilization Methods
2.2 Issues with Existing LFC Exchange Methods
2.3 LFC Market
2.4 New LFC Exchange Methods
2.5 Drawbacks of the New LFC Exchange Methods
2.6 Peer to Peer Exchange Systems
3 eMudra: The LFC Exchange System
3.1 Traveller’s Journey Use Case
3.2 Business Model
3.3 Blockchain Technology
3.4 Proposed System Architecture of e-Mudra: A P2P Cash LFC Exchange Application
3.5 e-Mudra: The Implementation of the Prototype
4 Conclusion
References
A Multi-layered Ontological Approach to Blockchain Technology
1 Introduction
2 Related Work
2.1 Semantic Blockchain
2.2 Layered Blockchain Ontology
3 Our Approach
3.1 ODLC and Methodology
3.2 Blockchain Application Ontology
3.3 Observations
4 Conclusion and Future Work
References
A Two-Way Atomic Exchange Protocol for Peer-to-Peer Data Trading
1 Introduction
2 Related Work
3 Preliminaries
3.1 Ethereum Smart Contract
3.2 Merkle Tree
3.3 zk-SNARK
4 Trading Protocol
4.1 Participants
4.2 Data Integrity
4.3 Trading Details
4.4 Potential Attack
4.5 Security Discussion
5 Implementation
5.1 Zero-Knowledge Proof
6 Trading Process and Smart Contracts
7 Conclusion
References
Blockchain Industry Implementation and Use Case Framework: A Survey
1 Introduction
2 Research Objective
3 Research Method
4 Brief Overview of Blockchain
5 Blockchain Innovation and Disruption
6 Use Case Frameworks
7 Industry’s Use Cases
8 Security Issues and Challenges
9 Conclusions and Recommendations
References
A Blockchain Approach for Exchanging Machine Learning Solutions Over Smart Contracts
1 Introduction
2 Existing Approaches and Corresponding Shortcomings
3 Proposed Solution
4 Functional Overview
4.1 Specifications of Smart Contracts
4.2 Evaluation and Submission Mechanism
4.3 Machine Learning for Market Regulation
4.4 Architectural Diagram
5 Validation
6 Conclusion
References
Untangling the Overlap Between Blockchain and DLTs
1 Introduction
2 DCEA Framework
3 Data Layer
3.1 Components and Properties
3.2 Data Layer: State of the Art
4 Consensus Layer
4.1 Components and Properties
4.2 Consensus Layer: State of the Art
5 Execution Layer
5.1 Components and Properties
5.2 Execution Layer: State of the Art
5.3 Environment Openness
6 Application Layer
6.1 Components and Properties
6.2 Application Layer: State of the Art
7 The Distinction Between Blockchain and Blockchain-Like Systems
8 Conclusion
References
A Secure Data Controller System Based on IPFS and Blockchain
1 Introduction
2 Related Work
3 IPFS
4 Materials and Methods
5 Conclusion
References
Key Exchange Protocol Based on the Matrix Power Function Defined Over IM16
1 Introduction
2 Preliminaries
3 Properties of MPF
4 Key Exchange Protocol
5 Resistance of the Proposed KEP Against Decisional Attack
6 Resistance of the Proposed KEP Against Computational Attack
7 Conclusions
References
Design and Analysis of Pre-formed ReRAM-Based PUF
1 Introduction
2 Background Information
2.1 Physical Unclonable Functions
2.2 Resistive Random Access Memory Technology
3 Experimental Setup
3.1 Die Structure
3.2 Device Layout
3.3 Electrical Equipment
3.4 Pre-formed ReRAM-Based PUF Design
4 PUF Analysis
4.1 Temperature Testing
4.2 Reliability
4.3 Stability
4.4 Randomness
5 Conduction Mechanisms
5.1 Poole-Frenkel (P-F) Emissions
5.2 Ohmic Conduction (O-C)
5.3 Space Charge Limited Conduction (SCLC)
5.4 Simulated Butler-Volmer Equation
6 Conclusion and Future Work
References
A Graph Theoretical Methodology for Network Intrusion Fingerprinting and Attack Attribution
1 Introduction
2 Literature Review
3 Methodology
3.1 Initial Steps
3.2 Integrating Information Theory into Graph Theory
3.3 Integrating Spectral Graph Theory
3.4 Experimental Results
3.5 Virus Experiment 2
3.6 Comparing the Virus Experiments
3.7 Virus Experiment Summary
4 Conclusion
References
Using Memory Forensics to Investigate Security and Privacy of Facebook
1 Introduction
2 Background
3 Scope and Methodology
3.1 Platform
3.2 Tools
3.3 Target Data
3.4 Scenario
3.5 Methodology
4 Experiment Results
4.1 First Set of Experiment
4.2 Second Set of Experiments
4.3 Third Set of Experiment
4.4 User Personal Information
4.5 Facebook Activity Information
5 Conclusions
6 Future Research
References
Hash Based Encryption Schemes Using Physically Unclonable Functions
1 Introduction
2 Preliminary
2.1 Cryptographic Hash Functions
2.2 Lamport Digital Signature
2.3 Winternitz
2.4 HORS Digital Signature
2.5 PUFs
2.6 Ternary Addressable Public Key Infrastructure and Keys with Variable Length
3 Method
3.1 Generic Encryption with Multiple Hashing
3.2 Generic Encryption Combining Multiple Hashing and Random Ordering
4 Results
5 Conclusion
References
Cyber-Safety Awareness: Assisting Schools in Implementation Guidelines
1 Introduction
2 Cyber-Safety Awareness Building Blocks
3 Methodology
4 Theoretical Framework
5 Data Gathering and Findings
6 Proposed Framework
7 Conclusion
References
Reducing Exposure to Hateful Speech Online
1 Introduction
2 Background
2.1 Defining Hate Speech
2.2 Technology to Manage Hate Speech
2.3 Machine Learning
2.4 Summary
3 Prototype Development
3.1 Text Detection
3.2 API
3.3 Text Processing
3.4 Notifying the User
4 Evaluation
4.1 Survey Items
4.2 Survey Results
5 Discussion
5.1 Design
5.2 Technical Implementation
6 Conclusion
References
Hybrid-AI Blockchain Supported Protection Framework for Smart Grids
1 Introduction
2 Background
2.1 State Estimation
2.2 Bad Data Detection
2.3 False Data Injection Attack
2.4 Distributed Denial of Service
3 Proposed Scheme
3.1 Data Integrity and Authentication
3.2 FDIA Mitigation
3.3 DDoS Attack Detection
4 Results and Analysis
4.1 Simulation Settings
4.2 Performance Analysis
4.3 Accuracy Analysis
5 Conclusion
References
Noise-Augmented Privacy-Preserving Empirical Risk Minimization with Dual-Purpose Regularizer and Privacy Budget Retrieval and Recycling
1 Introduction
1.1 Background
1.2 Related Work
1.3 Our Contributions
2 Preliminaries
3 Noise-Augmented Privacy-Preserving ERM
3.1 Noise Augmentation Scheme
3.2 Noise Augmented Privacy-Preserving ERM
3.3 Dual-Purpose Regularization and Mitigation of Over-Regularization (MOOR) Through Iterative Weighted l2
3.4 Computational Algorithm
3.5 NAPP-ERM for Variable Selection
3.6 Guarantees of DP in NAPP-ERM
3.7 Privacy Budget Retrieval
3.8 Summary on NAPP-ERM
4 Utility Analysis
4.1 Excess Risk Bound
4.2 Sample Complexity
5 Experiments
5.1 Mitigation of Over-Regularization (MOOR)
5.2 Private Variable Selection and Outcome Prediction
5.3 Privacy Budget Retrieval and Recycling
5.4 Experiment Result Summary
6 Discussion and Conclusion
References
Data Security Awareness and Proper Handling of ICT Equipment of Employees: An Assessment
1 Introduction
1.1 Research Framework
2 Methods
3 Results and Discussions
3.1 Profile of the Respondents
3.2 Facility Exposure of the Respondents to ICT Equipment
3.3 Technology Competency Level of Employees in Proper Handling of ICT Equipment
3.4 Respondents’ Level of Data Security Awareness
3.5 Relationship Between Respondents’ Competency Level on the Proper Handling of ICT Equipment and Their Profile Variables
3.6 Relationship Between Respondents’ Competency Level on the Proper Handling of ICT Equipment and Their Level of Awareness on Data Security
4 Conclusions and Recommendations
References
Developing a Webpage Phishing Attack Detection Tool
1 Introduction
2 Webpages Phishing Detection Techniques
2.1 Search Engine Based
2.2 Visual Similarity Based
2.3 Blacklist and Whitelist Based
2.4 Heuristics and Machine Learning Based
2.5 Proactive Phishing URL Detection Based
2.6 DNS Based
3 Related Work
4 The Proposed Approach
4.1 The First Phase
4.2 The Second Phase
5 Implementation and Testing
5.1 Dataset
5.2 Storing White-List Database
5.3 Experiment Results
5.4 Results Evaluation
6 Conclusion and Future Work
References
An Evaluation Model Supporting IT Outsourcing Decision for Organizations
1 Introduction
2 Research Aim and Methodology
3 Findings
4 Discussion
4.1 Market
4.2 IT Governance
4.3 Security
4.4 Benefits and Disadvantages
4.5 Outsourcing Methods
4.6 Determinants
5 Evaluation Model Supporting IT Outsourcing
6 Conclusions and Future Research
References
Immunizing Files Against Ransomware with Koalafied Immunity
1 Introduction
2 Koalafied Immunity
2.1 What if Ransomware Encrypts the Entire File?
3 Experiment Design
4 Results
5 Future Work
6 Conclusion
References
Measuring the Resolution Resiliency of Second-Level Domain Name
1 Introduction
2 DNS Ecosystem
3 DNS Resolution Failure
4 Related Work
4.1 Measure DNS Resiliency
4.2 Address DNS Resolution Failures
5 Resolution Resiliency Measurement
5.1 Domain Status Code Metrics
5.2 NS Diversity Metrics
5.3 TTL Configuration Metrics
5.4 Measure Resolution Resiliency
5.5 Experiment
6 Resolution Resiliency Improvement
7 Conclusion
References
An Advanced Algorithm for Email Classification by Using SMTP Code
1 Introduction
2 Research Background
2.1 Email Environment
2.2 Literature Review
3 Proposed Algorithm
3.1 Terms
3.2 Basic Concept and Procedure
3.3 Flow of Email Transmission Agent
3.4 Logic Flow
3.5 Each Step of Algorithm
4 Performance
4.1 Computing Performance
4.2 Classification Performance
5 Conclusion
5.1 Summary
5.2 Implications for Research and Practice
5.3 Limitation and Direction of Further Research
References
Protecting Privacy Using Low-Cost Data Diodes and Strong Cryptography
1 Introduction
2 Network Security Threats
3 Data Diodes
3.1 Low-Cost Data Diodes
3.2 Side-Channels
4 Encryption
4.1 One-Time-Pad
4.2 Post-Quantum Cryptography
4.3 Meta-Data
4.4 True Random Number Generators
4.5 Low-Cost TRNGs
5 Discussion
References
A Security System for National Network
1 Introduction
2 Research Background
3 Security System
3.1 Internal/External Interface for System Access Control
3.2 Alert Generation and Unauthorized Access Prevention Module
3.3 Real-Time Defense Function for Control System
4 Performance Evaluation
4.1 Unauthorized IP Blocking Rate
4.2 Unauthorized MAC Blocking Rate
4.3 Blocking Rate Based on Security Policy
4.4 Abnormal Traffic Detection
4.5 Real-Time Monitoring of Illegal Access
5 Conclusion
References
Secured Digital Oblivious Pseudorandom and Linear Regression Privacy for Connected Health Services
1 Introduction
2 Related Works
3 Methodology
3.1 Distinctive Nearest Neighbor Confidence Feature Selection Model
3.2 Digital Oblivious Pseudorandom Signature-based Authentication Model
3.3 Linear Regression Privacy Preservation Communication Model
4 Experiments and Results Section
5 Results Section
5.1 Case 1: Response Time Analysis
5.2 Case 2: Authentication Accuracy Analysis
5.3 Case 3: Security Analysis
6 Conclusion
References
Hardware Implementation for Analog Key Encapsulation Based on ReRAM PUF
1 Introduction
2 Background
3 Objective
4 Methodology
4.1 Protocol Steps
4.2 ReRAM Shield 2.0
4.3 Hardware
5 Implementation
5.1 Implementation Challenges
6 Results
7 Summary and Future Work
References
Questions of Trust in Norms of Zero Trust
1 Introduction
2 Norms and Emergence
3 Zero Trust and Norms
4 Trust
5 SolarWinds
6 Understanding Trust in Zero Trust
7 Conclusion, Limitations, Implications
References
Leveraging Zero Trust Security Strategy to Facilitate Compliance to Data Protection Regulations
1 Introduction
1.1 Problem Statement
1.2 Information Security Frameworks to Protect Privacy
1.3 Zero Trust as a Security Strategy
1.4 The Role of the CISO and the DPO
2 Proposing the ON2IT Zero Trust Framework for Privacy
3 Research Approach
4 Results for a Zero Trust Framework Applied to Privacy
4.1 Discussion on the GSS Session Results
5 Leveraging the Zero Trust Strategy to Facilitate Compliance to Data Protection Regulations
5.1 From Corporate Governance Policies to Zero Trust Measures
5.2 How Zero Trust Measures Enables Data Protection Laws
6 Discussion
7 Conclusion
References
Perspectives from 50+ Years’ Practical Zero Trust Experience and Learnings on Buyer Expectations and Industry Promises
1 Introduction
1.1 Introducing Zero Trust
1.2 Industry Promises
2 The Problem
2.1 Problem Statement
3 Expert Panel Interviews
4 Results
4.1 Interview Results
5 Important Takeaways
5.1 How to Design a Protect Surface
5.2 Zero Trust Documentation
6 Conclusions
References
Bit Error Rate Analysis of Pre-formed ReRAM-based PUF
1 Introduction
2 Background Information
2.1 Physically Unclonable Functions (PUFs)
2.2 Existing Key Generation Protocols Based on Ternary Implementation
2.3 Resistive RAM (ReRAM) Technology
2.4 Background on ReRAM-Based PUFs
3 ReRAM PUFs Operating at Low Power
3.1 Pre-Formed ReRAM PUF
3.2 Cryptographic Key Generation Protocol Using Pre-Formed ReRAM
3.3 Benchmarking Various PUF BERs
4 Methodology
4.1 Experimental Setup
4.2 Computing Error Rate of Each Cell of the ReRAM
4.3 Compute BER with Different Buffer Size
5 Experimental Results
6 Conclusions and Future Work
References
Correction to: Perspectives from 50+ Years’ Practical Zero Trust Experience and Learnings on Buyer Expectations and Industry Promises
Correction to: Chapter “Perspectives from 50+ Years’ Practical Zero Trust Experience and Learnings on Buyer Expectations and Industry Promises” in: K. Arai (Ed.): Intelligent Computing, LNNS 508, https://doi.org/10.1007/978-3-031-10467-1_53
Author Index

Lecture Notes in Networks and Systems 508

Kohei Arai   Editor

Intelligent Computing Proceedings of the 2022 Computing Conference, Volume 3

Lecture Notes in Networks and Systems Volume 508

Series Editor Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland Advisory Editors Fernando Gomide, Department of Computer Engineering and Automation—DCA, School of Electrical and Computer Engineering—FEEC, University of Campinas— UNICAMP, São Paulo, Brazil Okyay Kaynak, Department of Electrical and Electronic Engineering, Bogazici University, Istanbul, Türkiye Derong Liu, Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, USA Institute of Automation, Chinese Academy of Sciences, Beijing, China Witold Pedrycz, Department of Electrical and Computer Engineering, University of Alberta, Alberta, Canada Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland Marios M. Polycarpou, Department of Electrical and Computer Engineering, KIOS Research Center for Intelligent Systems and Networks, University of Cyprus, Nicosia, Cyprus Imre J. Rudas, Óbuda University, Budapest, Hungary Jun Wang, Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong

The series “Lecture Notes in Networks and Systems” publishes the latest developments in Networks and Systems—quickly, informally and with high quality. Original research reported in proceedings and post-proceedings represents the core of LNNS. Volumes published in LNNS embrace all aspects and subfields of, as well as new challenges in, Networks and Systems. The series contains proceedings and edited volumes in systems and networks, spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor Networks, Control Systems, Energy Systems, Automotive Systems, Biological Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems, Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems, Robotics, Social Systems, Economic Systems and other. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution and exposure which enable both a wide and rapid dissemination of research output. The series covers the theory, applications, and perspectives on the state of the art and future developments relevant to systems and networks, decision making, control, complex processes and related areas, as embedded in the fields of interdisciplinary and applied sciences, engineering, computer science, physics, economics, social, and life sciences, as well as the paradigms and methodologies behind them. Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science. For proposals from Asia please contact Aninda Bose ([email protected]).

More information about this series at https://link.springer.com/bookseries/15179

Kohei Arai Editor

Intelligent Computing Proceedings of the 2022 Computing Conference, Volume 3


Editor Kohei Arai Saga University Saga, Japan

ISSN 2367-3370 ISSN 2367-3389 (electronic) Lecture Notes in Networks and Systems ISBN 978-3-031-10466-4 ISBN 978-3-031-10467-1 (eBook) https://doi.org/10.1007/978-3-031-10467-1 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2022, corrected publication 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Editor’s Preface

This edition of the proceedings series, “Intelligent Computing: Proceedings of the 2022 Computing Conference”, contains papers presented at the Computing Conference 2022, held virtually on the 14th and 15th of July 2022. We are delighted to announce that the complete conference proceedings were successfully delivered through the will and co-operation of all its organizers, hosts, participants and other contributors. The conference has been held every year since 2013, with the aim of providing an ideal platform for researchers to exchange ideas, discuss research results and present practical and theoretical applications in areas such as technology trends, computing, artificial intelligence, machine vision, security, communication, ambient intelligence and e-learning. The proceedings of the 2022 conference have been divided into two volumes which cover a wide range of the abovementioned conference topics.

This year the Computing Conference received a total of 498 papers from around the globe, out of which only 179 papers were selected for publication in the proceedings of this edition. All the published papers passed a double-blind review process by an international panel of at least three expert referees, and decisions were taken based on research quality. We are very pleased to report that the quality of the submissions this year turned out to be very high. The conference offered single-track sessions covering research papers, posters and videos, followed by keynote talks by experts to stimulate significant contemplation and discussion. Moreover, all authors presented their research papers very professionally, and these were viewed by a large international audience online. We are confident that all the participants and interested readers will benefit scientifically from this book and that it will have a significant impact on the research community in the longer term.

Acknowledgment goes to the keynote speakers for sharing their knowledge and expertise with us. A big thanks to the session chairs and the members of the technical program committee for their detailed and constructive comments, which were valuable for the authors in continuing to improve their papers. We are also indebted to the organizing committee for their invaluable assistance in ensuring the conference was such a great success. We expect that the Computing Conference 2023 will be as stimulating as this most recent one.

Kohei Arai

Contents

Wearable Internet of Things (IoT) Device Model Design Based on Low-Cost Healthcare Monitoring System for Current Covid-19 Disease . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Ricardo Álvarez-González, Edgar R. González-Campos, Nicolás Quiroz-Hernández, and Alba M. Sánchez-Gálvez

A Matching Mechanism for Provision of Housing to the Marginalized . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
J. Ceasar Aguma

Speed Harmonisation Strategy for Human-Driven and Autonomous Vehicles Co-existence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Ekene Frank Ozioko, Julian Kunkel, and Fredric Stahl

Road Intersection Coordination Scheme for Mixed Traffic (Human Driven and Driver-Less Vehicles): A Systematic Review . . . . . . . . . . . . 67
Ekene F. Ozioko, Julian Kunkel, and Fredric Stahl

Selection of Driving Mode in Autonomous Vehicles Based on Road Profile and Vehicle Speed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
Mahmoud Zaki Iskandarani

Compass Errors in Mobile Augmented Reality Navigation Apps: Consequences and Implications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121 David S. Bowers Virtual Reality as a Powerful Persuasive Technology to Change Attitude During the Spread of COVID-19 . . . . . . . . . . . . . . . . . . . . . . . 143 Sara Alami and Mostafa Hanoune Predicting Traffic Indexes on Urban Roads Based on Public Transportation Vehicle Data in Experimental Environment . . . . . . . . . . 159 Georgi Yosifov and Milen Petrov


Corona Virus and Entropy of Shannon at the Cardiac Cycle: A Mathematical Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169 Huber Nieto-Chaupis Efficiency of Local Binarization Methods in Segmentation of Selected Objects in Echocardiographic Images . . . . . . . . . . . . . . . . . . . . . . . . . . 179 Joanna Sorysz and Danuta Sorysz Automatic Speech-Based Smoking Status Identification . . . . . . . . . . . . . 193 Zhizhong Ma, Satwinder Singh, Yuanhang Qiu, Feng Hou, Ruili Wang, Christopher Bullen, and Joanna Ting Wai Chu Independent Component Analysis for Spectral Unmixing of Raman Microscopic Images of Single Human Cells . . . . . . . . . . . . . . . . . . . . . . 204 M. Hamed Mozaffari and Li-Lin Tay A Method of Hepatocytes Segmentation in Microscopic Images of Trypan Blue Stained Cellular Suspension . . . . . . . . . . . . . . . . . . . . . 214 Kuba Chrobociński, Wojciech Witarski, and Katarzyna Piórkowska Using Technology to Create Personalised Environments for Dementia Care: Results of an Empathy Map Study . . . . . . . . . . . . . . . . . . . . . . . . 225 Ronny Broekx, J. Artur Serrano, Ileana Ciobanu, Alina Iliescu, Andreea Marin, and Mihai Berteanu Learning Analytics for Knowledge Creation and Inventing in K-12: A Systematic Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238 Mikko-Ville Apiola, Sofia Lipponen, Aino Seitamaa, Tiina Korhonen, and Kai Hakkarainen A Conceptual Framework for the Development of Argumentation Skills Using CSCL in a Graduate Students’ Research Course . . . . . . . . 258 R. van der Merwe, J. van Biljon, and C. Pilkington Single Access for the Use of Information Technology in University Education During the SARS-CoV-2 Pandemic . . . . . . . . . . . . . . . . . . . . 279 José L. Cendejas Valdez, Heberto Ferreira Medina, María E. Benítez Ramírez, Gustavo A. Vanegas Contreras, Miguel A. Acuña López, and Jesús L. Soto Sumuano Camcorder as an e-Learning Tool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294 Isaeva Oksana, Boronenko Yuri, and Boronenko Marina Smartphone in Detecting Developmental Disability in Infancy: A Theoretical Approach to Shared Intentionality for Assessment Tool of Cognitive Decline and e-Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . 305 Igor Val Danilov


Centralized Data Driven Decision Making System for Bangladeshi University Admission . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316 Fatema Tuj Johora, Aurpa Anindita, Noushin Islam, Mahmudul Islam, and Mahady Hasan Battle Card, Card Game for Teaching the History of the Incas Through Intelligent Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331 Javier Apaza Humpire, Maria Guerra Vidal, Miguel Tupayachi Moina, Milagros Vega Colque, and José Sulla-Torres Barriers for Lecturers to Use Open Educational Resources and Participate in Inter-university Teaching Exchange Networks . . . . . . . . . 347 Paul Greiff, Carla Reinken, and Uwe Hoppe Lessons Learnt During the COVID-19 Pandemic: Preparing Tertiary Education in Japan for 4IR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 359 Adam L. Miller Ambient Intelligence in Learning Management System (LMS) . . . . . . . 379 Ilan Daniels Rahimi eMudra: A Leftover Foreign Currency Exchange System Utilizing the Blockchain Technology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 388 Rituparna Bhattacharya, Martin White, and Natalia Beloff A Multi-layered Ontological Approach to Blockchain Technology . . . . . 409 Rituparna Bhattacharya, Martin White, and Natalia Beloff A Two-Way Atomic Exchange Protocol for Peer-to-Peer Data Trading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 429 Zan-Jun Wang, Ching-Chun Huang, Shih-Wei Liao, and Zih-shiuan Spin Yuan Blockchain Industry Implementation and Use Case Framework: A Survey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 448 Gabriela Ziegler A Blockchain Approach for Exchanging Machine Learning Solutions Over Smart Contracts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 470 Aditya Ajgaonkar, Anuj Raghani, Bhavya Sheth, Dyuwan Shukla, Dhiren Patel, and Sanket Shanbhag Untangling the Overlap Between Blockchain and DLTs . . . . . . . . . . . . . 483 Badr Bellaj, Aafaf Ouaddah, Emmanuel Bertin, Noel Crespi, and Abdellatif Mezrioui A Secure Data Controller System Based on IPFS and Blockchain . . . . . 506 Saad Alshihri and Sooyong Park


Key Exchange Protocol Based on the Matrix Power Function Defined Over IM16 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 511 Aleksejus Mihalkovich, Eligijus Sakalauskas, and Matas Levinskas Design and Analysis of Pre-formed ReRAM-Based PUF . . . . . . . . . . . . 532 Taylor Wilson, Bertrand Cambou, Brit Riggs, Ian Burke, Julie Heynssens, and Sung-Hyun Jo A Graph Theoretical Methodology for Network Intrusion Fingerprinting and Attack Attribution . . . . . . . . . . . . . . . . . . . . . . . . . . 550 Chuck Easttom Using Memory Forensics to Investigate Security and Privacy of Facebook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 581 Ahmad Ghafarian and Deniz Keskin Hash Based Encryption Schemes Using Physically Unclonable Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 602 Dina Ghanai Miandaob, Duane Booher, Bertrand Cambou, and Sareh Assiri Cyber-Safety Awareness: Assisting Schools in Implementation Guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 617 E. Kritzinger and G. Lautenbach Reducing Exposure to Hateful Speech Online . . . . . . . . . . . . . . . . . . . . 630 Jack Bowker and Jacques Ophoff Hybrid-AI Blockchain Supported Protection Framework for Smart Grids . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 646 S Sai Ganesh, S Surya Siddharthan, Balaji Rajaguru Rajakumar, S Neelavathy Pari, Jayashree Padmanabhan, and Vishnu Priya Noise-Augmented Privacy-Preserving Empirical Risk Minimization with Dual-Purpose Regularizer and Privacy Budget Retrieval and Recycling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 660 Yinan Li and Fang Liu Data Security Awareness and Proper Handling of ICT Equipment of Employees: An Assessment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 682 Dorothy M. Ayuyang Developing a Webpage Phishing Attack Detection Tool . . . . . . . . . . . . . 693 Abdulrahman Almutairi and Abdullah I. Alshoshan An Evaluation Model Supporting IT Outsourcing Decision for Organizations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 710 Alessandro Annarelli, Lavinia Foscolo Fonticoli, Fabio Nonino, and Giulia Palombi


Immunizing Files Against Ransomware with Koalafied Immunity . . . . . . 735 William Hutton Measuring the Resolution Resiliency of Second-Level Domain Name . . . 742 Lanlan Pan, Ruonan Qiu, Anyu Wang, Minghui Yang, Yong Chen, and Anlei Hu An Advanced Algorithm for Email Classification by Using SMTP Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 756 Woo Young Park, Sang Hyun Kim, Duy-Son Vu, Chang Han Song, Hee Soo Jung, and Hyeon Jo Protecting Privacy Using Low-Cost Data Diodes and Strong Cryptography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 776 André Frank Krause and Kai Essig A Security System for National Network . . . . . . . . . . . . . . . . . . . . . . . . 789 Woo Young Park, Sang Hyun Kim, Duy-Son Vu, Chang Han Song, Hee Soo Jung, and Hyeon Jo Secured Digital Oblivious Pseudorandom and Linear Regression Privacy for Connected Health Services . . . . . . . . . . . . . . . . . . . . . . . . . . 804 Renuka Mohanraj Hardware Implementation for Analog Key Encapsulation Based on ReRAM PUF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 825 Manuel Aguilar Rios, Sareh Assiri, and Bertrand Cambou Questions of Trust in Norms of Zero Trust . . . . . . . . . . . . . . . . . . . . . . 837 Allison Wylde Leveraging Zero Trust Security Strategy to Facilitate Compliance to Data Protection Regulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 847 Jean-Hugues Migeon and Yuri Bobbert Perspectives from 50+ Years’ Practical Zero Trust Experience and Learnings on Buyer Expectations and Industry Promises . . . . . . . . . . . 864 Yuri Bobbert, Jeroen Scheerder, and Tim Timmermans Bit Error Rate Analysis of Pre-formed ReRAM-based PUF . . . . . . . . . . 882 Saloni Jain, Taylor Wilson, Sareh Assiri, and Bertrand Cambou Correction to: Perspectives from 50+ Years’ Practical Zero Trust Experience and Learnings on Buyer Expectations and Industry Promises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yuri Bobbert, Jeroen Scheerder, and Tim Timmermans

C1

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 903

Wearable Internet of Things (IoT) Device Model Design Based on Low-Cost Healthcare Monitoring System for Current Covid-19 Disease

Ricardo Álvarez-González1, Edgar R. González-Campos1(B), Nicolás Quiroz-Hernández1, and Alba M. Sánchez-Gálvez2

1 Faculty of Electronic Sciences, Benemérita Universidad Autónoma de Puebla, Puebla, Mexico
{ricardo.alvarez,nicolas.quirozh}@correo.buap.mx, [email protected]
2 Faculty of Computer Science, Benemérita Universidad Autónoma de Puebla, Puebla, Mexico
[email protected]

Abstract. IoT has become an essential resource in health applications, mainly for the monitoring of chronic diseases through integration with wearable devices, which helps to analyze symptoms in a non-invasive way and makes it a highly valuable resource for patient health care. The novel Covid-19 disease emerged in China in December 2019 and became a problem of global concern the following year. The elderly and people with comorbidities are the most affected, often developing a critical health condition or even dying. Therefore, remote monitoring is necessary in patients with Covid-19 to avoid health complications caused by irregular conditions such as silent hypoxia. This paper proposes an efficient, low-cost, and rapidly assembled wearable IoT device model, focused on monitoring the health of Covid-19 patients, including oxygen saturation and body temperature measurements, with the aim of notifying patients and medical experts of health status during the disease. The implementation is based on Espressif’s ESP32 SoC (System on a Chip), using its connectivity resources for Wi-Fi communication toward an IoT platform in order to display physiological measurements on mobile devices and raise alerts in case of critical values. The results of the prototype implementation are compared to commercial medical devices to demonstrate the functionality and efficiency of the wearable IoT device model.

Keywords: IoT · Healthcare · Covid-19 · Wearable · WIoT healthcare

1 Introduction

Recently, the Internet of Things has improved various aspects of individual wellness monitoring due to the increased incidence of life-threatening chronic diseases such as diabetes mellitus (DM), arterial hypertension, and many others. In the same way, health care costs have increased, as have the saturation of hospitals and the demand for medical services. For these reasons, there is a need to develop a model to transform the healthcare area from hospital-centric to patient-centric in order to satisfy the healthcare demand for chronic diseases [1].

The interaction between wearable technology and IoT has become an inexorable relationship, causing the emergence of Wearable IoT (WIoT), a branch that has gained popularity in personalized health technology. Furthermore, IoT entered the healthcare realm only when wearable devices were developed, improving the diagnosis of health conditions by constantly monitoring physiological signs [1].

IoT is defined as all scenarios in which there is an interaction between objects and the Internet through computing capabilities. Objects can be sensors, actuators, or any entity that is not considered a computer. IoT operability lies in the generation and consumption of data from devices, minimizing human interaction as much as possible [2]. Even though IoT does not have a specific and universal definition, it is an emerging technology in constant growth, and experts predict that by 2025 there will be approximately one hundred billion devices connected through IoT, making it an economic, social and technological issue [2].

On the other hand, wearable technology includes particular devices with the ability to detect specific variables of the environment or of certain regions of the human body. These devices must be integrated into the human body to track physiological or biomechanical parameters [3,4]. Wearable devices are characterized by having wireless communications and, generally, they are designed to help people in daily activities or in health care; due to the popularity of smartphones, wearable technology has acquired relevant importance [4].

In December 2019 the first case of the novel Covid-19 was registered in Wuhan, Hubei province, China [3]. It was named by the World Health Organization (WHO) as an acronym of Coronavirus Disease 2019 [5]. Covid-19 is a respiratory illness caused by the severe acute respiratory syndrome coronavirus 2, named SARS-CoV-2 and belonging to the positive-sense RNA virus family known as Coronaviridae [3]. According to the WHO, viral infections, and especially those caused by different types of coronavirus, continue to emerge [3]. Currently, Covid-19 has several variants that affect people’s health individually; even after recovery, patients report sequelae, especially in the lungs [3].

WIoT devices have represented one of the most useful support resources for handling the current situation caused by the Covid-19 pandemic, where the design perspective has changed to address the actual needs of affected people [1,3]. Because health centers become saturated as the disease worsens, remote monitoring systems are necessary. An efficient, low-cost wearable IoT prototype with the capacity to sense human body temperature, blood oxygen saturation and heart rate is realized. Drawing on theoretical aspects as well as the main conditions caused by Covid-19, a design is presented with the ability to communicate sensor data to a cloud platform in order to present important information to the user and health experts, including an alert management system for critical data.

The organization of this paper begins with the focus of the problem in this Sect. 1, followed by Sect. 2, which reviews the state of the art of WIoT medical devices in the context of Covid-19. In Sect. 3, design requirements for the WIoT model, based on theoretical research on the main components, are presented. Section 4 contains the proposed WIoT model, divided into three main layers identified as hardware, software and IoT. Section 5 presents the results obtained from the prototype implementation. Finally, in Sect. 6 the results of the proposed model are discussed, followed by the conclusions and future work in Sect. 7.

2 State of the Art

This section presents the main WIoT devices in the health domain that can be applied to Covid-19 patients, as well as the main challenges in the design of WIoT technology for the current pandemic crisis.

2.1 IoT in the Context of Covid-19

In the recent past, IoT became a solution with smart systems for healthcare monitoring of patients with chronic diseases, taking into consideration low cost, improved Quality of Service (QoS) and an advanced user experience. In recent months, IoT has proven its capability to be a secure and efficient technology in the various approaches to dealing with the pandemic since Covid-19 emerged [6].

The main applications of IoT in the Covid-19 context are classified into three stages related to the progression of the illness: (1) early diagnosis, (2) quarantine and (3) after recovery. The idea of including IoT in the three different stages of the Covid-19 disease is justified by the need to obtain an accurate diagnosis quickly [6], allowing prudent actions to contain the disease and giving the opportunity to provide better medical treatments. With IoT intervention, it is possible to analyze large amounts of data from infected people and constantly monitor the most affected physiological signs through smartphones, wearables, robots, drones, and IoT buttons [6].

2.2 WIoT and Covid-19

Wearable IoT technology has the ability to collect data from sensor networks, process the information, and send it to cloud platforms. The four stages that govern the performance of WIoT devices in healthcare applications, guaranteeing the best efficiency and producing clinical data in an early stage of the Covid-19 disease, are [7]:

– Data gathering and remote monitoring.
– Data management.
– Analysis and control of data.
– Results-based monitoring from data treatment.

In WIoT technology it is possible to identify two main entities: the wearable device and the Host. The wearable device performs data collection; however, its battery and storage capacity are limited due to its dimensions and other factors. On the other hand, the Host cannot collect user data, but has enough storage, battery, and computing capacity to process a large number of data packets [7].

A case of a WIoT device used for the current pandemic crisis is Masimo [8], which is capable of detecting oxygen saturation and complies with Centers for Disease Control and Prevention (CDC) guidance. Vital Patch [9] and Shimmer [10] are other examples of WIoT devices used for Covid-19 patients due to their characteristics. But the most popular alternative is the Food and Drug Administration (FDA) approved Oxitone [11] device, which can monitor oxygen saturation, heart rate, stress levels, and sleep disturbances [12].

3 Design Requirements for Healthcare WIoT Devices

In order to design an efficient and functional wearable IoT device for the medical care of the Covid-19 patient, the system is separated into its main stages, considering the analysis of the theoretical background of the disease itself and of the WIoT architecture, both presented in the following sections. The design requirements are then studied to define the proposed solution and obtain the expected results. In this section, research results on Covid-19 symptoms, the WIoT architecture, pulse oximetry and body temperature sensors, and wireless communications are compiled and discussed.

3.1 Disease Factors for WIoT Technology Design

Covid-19 is characterized as a highly infectious disease which mainly affects older people (60 years and older) as well as people with chronic degenerative diseases, better known as comorbidities [13]. These affections mainly include systemic arterial hypertension, diabetes mellitus, cardiovascular disease, obesity, cancer [14] and acquired immune deficiency syndrome (AIDS) [15].

To design WIoT technology for the current pandemic it is important to consider the main symptoms associated with the disease. Fever is present in almost 98% of confirmed cases, cough in 75% and respiratory distress in 55% [16]. There are more symptoms related to the Covid-19 disease, but the ones mentioned above are the best known and are relatively easy to measure using non-invasive methods.

Silent hypoxia is a medical concern reported as a serious health problem due to the emergence of Covid-19 and described as a condition in which patients do not perceive respiratory distress or general malaise, but present with cyanosis and hypoxemia; hence, these patients are at exceptionally high risk [17,18]. Hypoxia is defined as respiratory insufficiency that hinders the exchange between oxygen and carbon dioxide in circulating blood [19]. The recommendation for diagnosing silent hypoxia is to use a pulse oximeter, monitor blood gas levels or apply walking tests. Patients with this highly risky condition require strict day-to-day monitoring [18].

On the other hand, another global concern derived from the Covid-19 disease is the overload of health centers, which also affects patients with non-Covid-19 diseases. Remote monitoring helps patients and healthcare systems by assigning medical resources to actual Covid-19 patients and avoiding cross infection with medical staff or via fomites [20].
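Day-to-day remote monitoring of this kind ultimately comes down to checking each new reading against clinical limits and alerting when they are crossed. The following is a minimal sketch of such a check in C++, assuming illustrative cut-offs (92% SpO2 and 38.0 °C for a warning, 90% and 39.0 °C for a critical alert); these numbers are placeholders chosen for the example, not thresholds specified in this paper, and a real deployment would take its limits from medical guidance.

#include <cstdio>

enum class AlertLevel { Normal, Warning, Critical };

// Classify one reading against illustrative (placeholder) thresholds.
AlertLevel classifyVitals(double spo2Percent, double tempCelsius) {
    if (spo2Percent < 90.0 || tempCelsius >= 39.0) {
        return AlertLevel::Critical;  // possible silent hypoxia or high fever
    }
    if (spo2Percent < 92.0 || tempCelsius >= 38.0) {
        return AlertLevel::Warning;   // notify the patient and medical staff
    }
    return AlertLevel::Normal;
}

int main() {
    // Example reading: mildly low saturation, normal temperature.
    AlertLevel level = classifyVitals(91.0, 37.2);
    std::printf("Alert level: %d\n", static_cast<int>(level));
    return 0;
}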

3.2 Wearable Architecture

In recent years wearable devices have become a very popular alternative for healthcare applications, cataloged as Wearable Biomedical Systems (WBS). Their architecture provides important performance for health monitoring, and to design them it is necessary to analyze the following five stages [21]:

– Hardware: includes the design and implementation of new sensors to record physiological signs; shape and dimension parameters are also analyzed.
– System architecture: implementation of sensors and additional circuits.
– Software: design layer in which efficient algorithms are implemented for data pre-processing and control of wireless communications.
– Materials: analysis and selection of materials, considering that these devices are worn for several hours.
– Ergonomics: comfort and security are enhanced at all stages, such as single-sensor use, device size and software.

3.3 Oxygen Saturation Monitoring

Regarding oxygen saturation in Covid-19 patients, it is known that respiratory damage can lead to several health issues such as multiple organ failure. Therefore, oxygen saturation must be carefully studied in infected people to assess the condition of the respiratory tract [20].

Pulse oximetry is a non-invasive method that allows the estimation of the oxygen saturation of arterial hemoglobin, and also provides heart rate and pulse width values [22]. The principle of operation is the absorption of light emitted by two (in rare cases three) diodes with different wavelengths through specific body tissue. When different wavelengths are applied, the light absorption of oxygenated and deoxygenated hemoglobin is not the same [23]. Pulse oximetry devices are cataloged into commercial and medical usage, and both use the photoplethysmography (PPG) method to obtain oxygen saturation measurements [23]. The main body regions for measurement are the fingers, earlobe, or toes [12], but in many cases fingers are preferred for the accuracy obtained.

The photoplethysmography method is generally instrumented in two different configurations, depending on the location of the photoreceptor and the emitter: transmittance and reflectance. In transmittance, the emitter is on the opposite side of the photoreceptor so that light can pass through tissue and bone. On the other hand, the reflectance configuration requires both devices side by side, with the light reflected from the tissue captured by the photoreceptor [24]. The reflectance method includes a number of advantages over the transmittance configuration, such as the ability to place the sensor on different body regions. Furthermore, enhanced performance is obtained, providing more accurate monitoring. However, there are some drawbacks related to movement artifacts or perturbations caused by blood pressure changes [24].
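As a rough illustration of how a PPG front end turns the red and infrared readings into a saturation estimate, the sketch below computes the classic “ratio of ratios” and maps it through a generic linear calibration curve. The coefficients 110 and 25 are a textbook first-order approximation, not values taken from this paper, and commercial sensors such as the MAX30100 rely on device-specific calibration tables instead.

#include <cstdio>

// Estimate SpO2 (%) from the pulsatile (AC) and baseline (DC) components
// of the red and infrared photodetector signals.
double estimateSpO2(double acRed, double dcRed, double acIr, double dcIr) {
    // Ratio of ratios: R = (AC_red / DC_red) / (AC_ir / DC_ir)
    double r = (acRed / dcRed) / (acIr / dcIr);
    // Generic empirical calibration line (illustrative only).
    double spo2 = 110.0 - 25.0 * r;
    if (spo2 > 100.0) spo2 = 100.0;  // clamp to a physiological range
    if (spo2 < 0.0)   spo2 = 0.0;
    return spo2;
}

int main() {
    // Example AC/DC amplitudes in arbitrary units.
    double spo2 = estimateSpO2(0.010, 2.00, 0.020, 2.10);
    std::printf("Estimated SpO2: %.1f %%\n", spo2);
    return 0;
}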

3.4 Temperature Monitoring

Human body temperature is a physiological sign of high interest in patients with Covid-19, because fever is the most recurrent symptom of this disease [25]. Human beings have the capacity to regulate body temperature regardless of external environmental conditions. The core temperature is essentially the temperature of the hypothalamus, which has the mission of regulating total body temperature [24]. Unfortunately, the hypothalamus temperature cannot be monitored with an external sensor, hence skin temperature is measured instead. Skin temperature is vulnerable to external environmental conditions, but it is the easiest way to monitor this physiological sign with a wearable device [24].

Contact temperature sensors located on specific body regions are commonly used to measure body temperature [23]. Some of the most widely used contact temperature sensors for wearable devices in the current Covid-19 outbreak are the LM35, SMT160-30, DS18B20 and MAX30205 devices [7]. These sensors belong to the integrated-circuit class of sensors, and are characterized by a P-N junction with a behavior that is linear with temperature [24].
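As an illustration of how an integrated-circuit contact sensor of this kind is typically read, the sketch below polls a MAX30205 over I2C from Arduino-style C++ on the ESP32. The 0x48 slave address (all address pins grounded), the 0x00 temperature register and the 0.00390625 °C-per-LSB scale follow the sensor’s datasheet, but the wiring and the choice of this particular sensor are assumptions made for the example rather than details confirmed by this section of the paper.

#include <Arduino.h>
#include <Wire.h>

const uint8_t MAX30205_ADDR = 0x48;  // 7-bit address, A0=A1=A2 to GND (assumed wiring)
const uint8_t TEMP_REG = 0x00;       // temperature register

// Read skin temperature in degrees Celsius; returns NAN on a bus error.
float readSkinTemperature() {
  Wire.beginTransmission(MAX30205_ADDR);
  Wire.write(TEMP_REG);
  if (Wire.endTransmission(false) != 0) {  // repeated start, keep the bus
    return NAN;
  }
  if (Wire.requestFrom((int)MAX30205_ADDR, 2) != 2) {
    return NAN;
  }
  uint8_t msb = Wire.read();
  uint8_t lsb = Wire.read();
  int16_t raw = (int16_t)((msb << 8) | lsb);  // 16-bit two's complement
  return raw * 0.00390625f;                   // 0.00390625 degrees C per LSB
}

void setup() {
  Serial.begin(115200);
  Wire.begin();  // default ESP32 I2C pins (GPIO21 SDA, GPIO22 SCL)
}

void loop() {
  float t = readSkinTemperature();
  if (!isnan(t)) {
    Serial.print("Skin temperature (C): ");
    Serial.println(t, 2);
  }
  delay(1000);
}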

3.5 WIoT Wireless Communications

Wearable IoT technology focuses its communication capacity on an approximate coverage of 10 m, allowing data communication from sensor networks with low energy consumption [23]. These devices must exchange data with other similar devices through protocols, standards and technologies [26]; hence, IoT architectures are proposed to perform the required tasks. The most popular IoT scheme is presented in Fig. 1, divided into three main stages: devices, wireless access point and web server [27]. In this IoT architecture, the devices can be sensor networks, wearable devices or any mobile device. The access point refers to the resource that allows devices and the Internet to communicate, and is generally implemented by gateways. Finally, data communication is linked to local or web servers to store, analyze or exchange data [27]. In addition, there are four types of data transmission, but for WIoT technology the Device-Gateway mode is mainly used, because gateways are high-throughput devices [27].



Fig. 1. IoT main communication architecture involving devices such as drones, wearables, sensor networks; gateway and local or web servers that receive the processed data from the device stage; Reproduced from [27]

Regarding wireless networks, WIoT technology widely uses the Wireless Body Area Network (WBAN) architecture, which allows communication between devices over just a few meters. For the healthcare environment, WBANs are described in three operational stages: (1) sensor networks embedded in or worn on the human body, with data storage and wireless data sending; (2) a control center that receives data and routes it to the gateway; and (3) a main network that allows data exchange with health centers [23,27]. Communication protocols are required to send WIoT data through the access point to cloud platforms; for IoT devices, MQTT (Message Queue Telemetry Transport) is a useful alternative. MQTT is a lightweight communication protocol with enhanced security and quality of service (QoS) features. The protocol is based on Machine-to-Machine (M2M) connectivity, transmitting data through the Pub/Sub model to brokers, which manage the data communication. This makes it a useful protocol for domotics and WIoT technology [27].
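To make the Pub/Sub flow concrete, the following minimal Arduino-style C++ sketch publishes one reading to a broker using the widely available WiFi and PubSubClient libraries. It is only an illustrative sketch: the network credentials, broker address, client ID and topic name are placeholders chosen here, not values given in this paper.

#include <Arduino.h>
#include <WiFi.h>          // ESP32 Wi-Fi support (Arduino framework)
#include <PubSubClient.h>  // lightweight MQTT client

// Placeholder credentials and endpoints (assumptions, not from the paper)
const char* WIFI_SSID = "my-network";
const char* WIFI_PASS = "my-password";
const char* BROKER    = "broker.example.com";
const int   PORT      = 1883;

WiFiClient wifiClient;
PubSubClient mqtt(wifiClient);

void setup() {
  WiFi.begin(WIFI_SSID, WIFI_PASS);             // join the access point (gateway side)
  while (WiFi.status() != WL_CONNECTED) delay(100);

  mqtt.setServer(BROKER, PORT);                 // point the client at the broker
  if (mqtt.connect("wiot-device-01")) {         // client ID chosen for illustration
    // Publish one SpO2 sample; any subscriber to this topic (e.g. a dashboard) receives it
    mqtt.publish("wban/patient01/spo2", "97");
  }
}

void loop() {
  mqtt.loop();  // keep the MQTT connection serviced
}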

4 Proposed WIoT Device Model

The proposed WIoT model to support Covid-19 patients is presented in this section. Based on the theoretical background from the previous design requirements section, the system architecture is shown in Fig. 2. The model consists of an embedded system that includes the temperature and oxygen saturation sensors, a core processing element based on a SoC, and a cloud platform communication stage. The different alternatives proposed for the WIoT prototype model are described in the following sections.



Fig. 2. Block diagram of the proposed WIoT health device model, which considers the embedded device communicating via Wi-Fi to an IoT platform to send health data

4.1 Hardware Layer

A SoC-based system is proposed to receive data from the sensors, process it and send it. Using a SoC device benefits technological integration for different applications, in this case wireless communications. The selected device is the ESP32, developed by Espressif Systems, a Chinese company. The ESP32 is a low-cost SoC with improved security and other features. It is a dual-core 32-bit microcontroller with a clock frequency of 160–240 MHz, it offers more GPIOs (General Purpose In/Out) than the ESP8266, and it can communicate via Wi-Fi, Bluetooth or Bluetooth Low Energy (BLE) [28]. Regarding oxygen saturation monitoring, there are some built-in alternatives from different manufacturers; however, the most affordable options are those from Maxim Integrated. This company offers three main pulse oximetry sensors with different characteristics depending on the requirements: the MAX30102, MAX30100 and MAX30105; the latter has better performance but is significantly more expensive and harder to buy. The selected pulse oximetry sensor is the MAX30100, due to its capabilities focused on wearable, fitness-assistance and medical monitoring devices. This module can perform oxygen saturation and heart rate monitoring through two LEDs, a photodetector, optimized optics and low-noise analog signal processing; data communication is performed over an I2C (Inter-Integrated Circuit) interface.



The module also provides motion artifact resilience, ambient light cancellation and fast data output capabilities [29]. For temperature monitoring, the MAX30205 sensor from Maxim Integrated was selected, because this module is listed as an ideal human body temperature sensor for medical and fitness applications. This sensor converts the temperature signal to digital form using a Sigma-Delta ADC (Analog-to-Digital Converter), and the resulting measurements comply with the ASTM E1112 standard. The MAX30205 operates in a temperature range from 0 °C to 50 °C, communicates through the I2C interface, and provides 16-bit temperature resolution [30]. Although wearable technology suggests an integrated implementation of the entire architecture, oxygen saturation requires monitoring in specific body regions. Because hypoxia is a critical condition that develops in many Covid-19 cases, specific rules for pulse oximetry must be satisfied [31] in order to present an optimized design. In accordance with these recommendations, the location of the wearable device is proposed in Fig. 3. The pulse oximetry reading is performed on the finger, while the temperature sensor and the rest of the system are placed on the patient's wrist.

Fig. 3. Proposed location for the WIoT device designed and based on the recommended regions for monitoring the physiological signs included in the model

Therefore, for the MAX30100 pulse oximeter sensor, a gripper was designed to isolate the device and the finger in order to improve the monitoring performance of the sensor; the pulse oximetry sensor is wired to the ESP32 to communicate its data.
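For reference, the sketch below reads a raw temperature word from the MAX30205 over I2C and converts it to degrees Celsius, where one LSB corresponds to 1/256 °C in the 16-bit format described above. The 7-bit I2C address (0x48) and the temperature register address (0x00) are assumptions typical of common breakout modules and should be verified against the specific module and datasheet actually used.

#include <Arduino.h>
#include <Wire.h>

const uint8_t MAX30205_ADDR = 0x48;  // assumed 7-bit address of the breakout module
const uint8_t TEMP_REG      = 0x00;  // assumed temperature register

// Read the 16-bit temperature register and convert to Celsius (1 LSB = 1/256 degC)
float readBodyTemperature() {
  Wire.beginTransmission(MAX30205_ADDR);
  Wire.write(TEMP_REG);                     // select the temperature register
  Wire.endTransmission(false);              // repeated start, keep the bus
  Wire.requestFrom((int)MAX30205_ADDR, 2);  // two data bytes, MSB first
  int msb = Wire.read();
  int lsb = Wire.read();
  int16_t raw = (int16_t)((msb << 8) | lsb);
  return raw * 0.00390625f;                 // 1/256 degC per LSB
}

void setup() {
  Serial.begin(115200);
  Wire.begin();                             // default ESP32 I2C pins
}

void loop() {
  Serial.println(readBodyTemperature());    // print the skin temperature once per second
  delay(1000);
}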


4.2 Software Layer

The software layer is as important as the hardware layer because it covers the configuration and execution-control instructions designed for the WIoT device. A flow chart showing the firmware layout for the ESP32 is presented in Fig. 4; it is mainly divided into three stages: (1) declaration of constants and functions, (2) configuration stage, and (3) main loop.

Fig. 4. Representative flowchart of the algorithm designed for its implementation in the ESP32 development board

Energy efficiency is an essential feature for wearable technology, as these systems need to operate for long periods of time. However, it is not necessary to run all scheduled tasks constantly; therefore, shutdown functions are required to save energy while measurements and Wi-Fi communication are not needed [32]. Both selected sensors have shutdown capabilities for when no action is required, and these can be enabled through software. The main loop begins by initializing the MAX30100 sensor and takes 50 samples, discarding those below 80% as caused by external factors [31]. Afterwards, the average is calculated and sent to a cloud platform, ending with the shutdown of the MAX30100. The same procedure is designed for the temperature sensor, but taking 20 samples and sending their average, followed by the MAX30205 shutdown.
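The C++ fragment below sketches this measure-average-publish cycle. The helper functions readSpO2Sample(), readTemperatureSample() and publishValue() are hypothetical stubs standing in for the actual sensor drivers and MQTT code; only the sample counts and the 80% discard rule come from the description above.

// Hypothetical stubs (placeholders, not a real driver API): replace with the
// MAX30100/MAX30205 reads and the MQTT publish used in the firmware.
float readSpO2Sample()        { return 97.0f; }
float readTemperatureSample() { return 36.6f; }
void  publishValue(const char* name, float value) { (void)name; (void)value; }

// Take 'count' samples, drop those below 'minValid', and return the average.
float averagedMeasurement(float (*readSample)(), int count, float minValid) {
  float sum = 0.0f;
  int kept = 0;
  for (int i = 0; i < count; i++) {
    float sample = readSample();
    if (sample >= minValid) {   // discard readings spoiled by external factors
      sum += sample;
      kept++;
    }
  }
  return (kept > 0) ? sum / kept : 0.0f;  // 0 signals that no valid sample survived
}

void measurementCycle() {
  // 50 SpO2 samples, discarding values below 80% as suggested by [31]
  publishValue("spo2", averagedMeasurement(readSpO2Sample, 50, 80.0f));
  // 20 temperature samples, no lower bound applied here
  publishValue("temperature", averagedMeasurement(readTemperatureSample, 20, 0.0f));
  // Sensor shutdown and ESP32 deep sleep would follow at this point
}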



The final step of the main loop is the activation of the ESP32 deep sleep mode, with the aim of reducing power consumption once the sensors are in shutdown mode. The ESP32 module has an architecture composed of seven modules [28]:
– ESP32 core and memory
– ULP (Ultra Low Power) coprocessor
– RTC (Real Time Clock) and RTC peripherals
– In/Out peripherals
– Radio module
– Bluetooth module
– Wi-Fi module

In active mode, all modules in the architecture are powered on, causing a current consumption between 240 mA and 790 mA. In deep sleep mode, only the RTC and ULP coprocessor modules remain active, reducing consumption to about 10 µA [33]. Oxygen saturation and temperature monitoring are proposed every hour; therefore, the ESP32 is activated for only about two seconds to configure the sensors and the wireless communication, receive the measurement data and send it to a cloud platform, which is discussed in the next section. This scheme makes it possible to achieve a reduced power consumption for the WIoT system.
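On the Arduino-ESP32 core, this hourly duty cycle is typically implemented with the timer wake-up source of deep sleep, as in the short sketch below. The one-hour interval mirrors the schedule described above; the sensor handling and data publishing that precede sleep are omitted and only indicated by comments.

#include <Arduino.h>
#include <esp_sleep.h>

const uint64_t WAKE_INTERVAL_US = 3600ULL * 1000000ULL;  // wake every hour (microseconds)

void setup() {
  // ... configure sensors, connect Wi-Fi, take averaged readings, publish them ...
  // ... put the MAX30100 and MAX30205 into shutdown mode ...

  esp_sleep_enable_timer_wakeup(WAKE_INTERVAL_US);  // RTC timer as the wake-up source
  esp_deep_sleep_start();                           // only RTC + ULP stay powered (about 10 uA)
}

void loop() {
  // never reached: after each wake-up the chip resets and runs setup() again
}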

4.3 IoT Layer

Once measurement data is obtained from the pulse oximetry and temperature sensors, it is necessary to define the communication technology and protocol for the WIoT device. Wi-Fi is the technology selected to connect the ESP32 with a gateway; since Wi-Fi is the most widely used wireless communication technology [27], Bluetooth and other technologies are not considered. Second, MQTT is proposed as the communication protocol to send the measurement data over the selected technology. MQTT is a full-duplex byte-transmission protocol whose main advantages include a low device processing load and energy efficiency, and it is compatible with other communication protocols such as ZigBee and UDP (User Datagram Protocol) [34]. The IoT platform receives the data from the designed WIoT device to store and display it. One of the particular features of these platforms is a dashboard designer to customize the stored data, providing various indicator and controller options, although storage capacity is limited by the membership subscription. To offer support and security, almost all cloud platforms have paid plans, but in some cases they offer a free subscription with limited features, which is beneficial for the design and test stages of an IoT project. Ubidots is an IoT platform described as a systems integration tool for self-built applications and services connected through multiple communication protocols, for example MQTT or HTTP.



The platform allows data to be collected, analyzed and displayed with the PC or smartphone dashboard application. Regarding security, Ubidots provides TLS encryption for the MQTT communication protocol to protect data when receiving or sending; therefore, this is the IoT platform selected for the presented WIoT model.
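As a rough illustration of what the publish step from the earlier MQTT sketch would look like against Ubidots, the fragment below follows the topic and JSON payload conventions described in Ubidots' public MQTT documentation. The broker hostname, topic scheme, device label and token handling noted in the comments are assumptions to be confirmed against the current Ubidots documentation, not details given in this paper.

#include <Arduino.h>
#include <PubSubClient.h>

// Assumed Ubidots MQTT conventions (verify against the current Ubidots docs):
//   broker:   industrial.api.ubidots.com (port 1883, or 8883 with TLS)
//   auth:     account token as the MQTT username, empty password
//   topic:    /v1.6/devices/<device-label>
//   payload:  JSON object with one field per dashboard variable
const char* UBIDOTS_TOPIC = "/v1.6/devices/wiot-covid-monitor";

void publishVitals(PubSubClient& mqtt, float temperature, float spo2, float heartRate) {
  char payload[96];
  snprintf(payload, sizeof(payload),
           "{\"temperature\": %.1f, \"spo2\": %.1f, \"heart_rate\": %.1f}",
           temperature, spo2, heartRate);
  mqtt.publish(UBIDOTS_TOPIC, payload);  // one message updates all three dashboard variables
}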

5 Results of Prototype Implementation

To implement the prototype model, a development board based on the ESP32-WROOM-32D module, the MAX30100 module and the MAX30205 module were used. The pulse oximetry sensor is mounted on the designed gripper and wired to the ESP32 board; the temperature sensor is attached to the wrist with microporous tape. The implementation of the prototype used for the tests is presented in Fig. 5.

Fig. 5. Implementation of the WIoT prototype for the testing process

The firmware was designed and implemented with the Microsoft Visual Studio Code IDE, supported by the integrated PlatformIO development tools. These tools allow code to be compiled and deployed to Espressif hardware. In the developed code, the Wi-Fi communication parameters are configured, as are those for linking to Ubidots.



Once the WIoT system was implemented, the results displayed on the Ubidots dashboard are presented in Fig. 6. The latest value and the historical data are shown on the dashboard for three physiological signs: body temperature (upper), oxygen saturation (middle) and heart rate (lower); the heart rate value is provided by the MAX30100 sensor. In addition, all historical data can be presented individually, with a download option.

Fig. 6. Dashboard application implemented in Ubidots for the presentation of blood oxygen saturation, heart rate and body temperature data sent from the proposed WIoT device

6 Evaluation and Discussion of the Model

After the implementation stage, the WIoT device was tested; the reference elements were a digital thermometer and a commercial pulse oximeter. The MAX30100 sensor was embedded in the gripper and tested, while the MAX30205 sensor was placed on the wrist. At the same time, oxygen saturation and body temperature were measured with the commercial devices. In Fig. 7, measurement data from the MAX30100 sensor and the commercial pulse oximeter are compared. The commercial pulse oximeter corresponds to model 0010-20-00547 from Takumi Shop Rex [35]. Both devices provide oxygen saturation and heart rate measurements.



Fig. 7. Comparison of Oxygen Saturation (SpO2) and Heart Rate (HR) results obtained from the MAX30100 sensor and the commercial pulse oximeter, where the measurement was taken every 10 min

Comparison of the SpO2 (blood oxygen saturation) measurement data shows that the MAX30100 sensor provides reliable results when monitoring is performed under optimal conditions: the hand should not move and should rest horizontally on a stable surface. In addition, the designed gripper reduces ambient light, which otherwise produces noise and incorrect data during measurements. Regarding the heart rate reading, the results obtained show slight variations. This may be due to over-tightening of the designed gripper: when additional force is exerted, the reading obtained may vary slightly with respect to the commercial device used. However, heart rate is not an essential sign during the course of Covid-19 disease, although it does provide added value to the designed system. On the other hand, Fig. 8 shows the temperature comparison data of the MAX30205 sensor and the OMRON digital thermometer [36]. During the testing process, the MAX30205 sensor takes less time to measure temperature than the digital thermometer: the commercial device takes around 5 min to provide a result, while the wrist-worn sensor takes less than one second. The results show that the MAX30205 sensor provides measurements with the expected similarity to the digital thermometer used, as Maxim Integrated presents this sensor as an excellent resource for medical devices. Both sensors have excellent characteristics and performance for a healthcare device: the temperature module provides 0.1 °C accuracy [26], and the pulse oximetry sensor is designed with special circuitry to improve performance in medical applications [25].



Fig. 8. Comparison of body temperature results obtained from the MAX30205 Sensor and OMRON’s digital thermometer, where the measurement was taken every 10 min

Regarding design costs, the selected sensors meet the affordability factor due to their low cost, with the MAX30100 purchased at USD 7.92 [37] and the MAX30205 at USD 13.80 [38]. Additionally, the ESP32 development board used in the prototype was purchased for USD 9.55 [39]. For comparison, the commercial pulse oximeter was purchased at USD 36.44 [35] and the digital thermometer at USD 14.36 [36]. The total costs in the two cases are USD 31.28 and USD 50.80, respectively. Thus, the cost-effectiveness of the Maxim Integrated sensors and the ESP32 development board is demonstrated. However, if the proposed model were refined into a final version, the manufacturing cost of PCBs, soldering and additional materials such as fabrics should be considered, but the result would still offer a competitive cost. The designed firmware is optimized not to overload the memory of the ESP32 and has additional features to detect loss of connection and failed communication with the IoT platform, storing data locally when that happens. As for the IoT stage, the MQTT protocol is fully adapted to the WIoT design, data sending is fast, and the protocol is characterized by low power consumption. For this model, only the MQTT protocol was used for data communication. Ubidots optimizes the display and sending of data, and the information can also be consulted from the official mobile application. Additionally, this IoT platform provides an event handler to automatically send text messages or emails when a customizable threshold value is detected. Power consumption is not constant during operation, as the ESP32 and the sensors have sleep modes that are activated by software. The power consumption of the ESP32 with Wi-Fi communication and all its internal modules active reaches approximately 240 mA, but in deep sleep mode the current decreases to 10 µA. The MAX30100 sensor requires 600 µA during operation, but in shutdown mode it uses only 0.7 µA.



The MAX30205 shows the same behavior, using 600 µA in active mode and 3.5 µA in shutdown mode. Therefore, the proposed WIoT device consumes approximately 241.2 mA while performing the monitoring, processing and sending of data over Wi-Fi. This process takes around 2 s, and then the entire system goes into sleep mode, in which the sensors enter shutdown and the ESP32 enters deep sleep, reducing the current consumption to 14.2 µA. As mentioned, the active mode is used only once per hour and the monitoring process takes only 2 s, so in one hour the whole system consumes 148.1921 µAh (microampere-hours). This energy consumption information is crucial for designing a supply system based on rechargeable batteries, for which the type of battery and the battery charge controller must be studied; however, the low consumption of the proposed device allows a battery to be selected without major restrictions. It is possible to use a lithium-ion polymer battery with a 60 mAh (milliampere-hour) capacity that can deliver 3 mA for 20 h. The developed system would consume 2.96 mAh in 20 h, for which the battery would provide the WIoT device with an autonomy of 20 h. By selecting a battery with a higher capacity, the autonomy is extended. Regarding the limitations of the implementation, the sensor modules benefit the prototype because no additional circuitry is required, but the size restrictions of wearable devices are not met. The recommended architecture and ancillary devices are included in the sensor modules to improve sensor performance; however, these additional elements are soldered onto a PCB (Printed Circuit Board), and their integration on a main PCB increases the size of the final product. The PCB layout for the final version of the proposed model must consider the additional circuits recommended by the manufacturers in the datasheets for the sensors and the ESP32 module.
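For readers who want to retrace the hourly figure, the duty-cycle average follows directly from the currents and times quoted above (2 s active at 241.2 mA, the remaining 3598 s at 14.2 µA):

\frac{241.2\,\text{mA} \times 2\,\text{s} + 0.0142\,\text{mA} \times 3598\,\text{s}}{3600\,\text{s}} \approx 0.1482\,\text{mAh} \approx 148.2\,\mu\text{Ah}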

7 Conclusions and Future Work

A WIoT healthcare model for Covid-19 was designed in accordance with the main design factors related to this technology; scalability, affordability, connectivity and energy efficiency were all considered in the design. Through the implementation of the prototype and the evaluation of the results, a reliable and efficient model is presented for patients with Covid-19 to monitor and attend to the main physiological signs affected by the disease. The model provides an autonomous system with the aim of preventing health complications through the communication of measurement data to the user and health experts, allowing the patient to be aware of their own health status. The presented design ensures low power consumption through the shutdown functions of the selected sensors and the activation of the ESP32 deep sleep mode. This is an affordable design with a focus on signal performance, power efficiency, data collection frequency and data packet size. The proposed system will be taken to a functional prototype in which the PCB design will be developed, considering an appropriate sensor location to ensure reliable measurements and improve response time.



The aim of developing this model is to provide an efficient resource for medical research purposes, in which data from various patients can be used for specialized analysis in medical treatments. The next step for the proposed model is the integration of the power supply stage, based on battery charge controllers. In addition, a final version of the prototype will undergo a rigorous testing process with multiple test subjects in order to improve the user experience and ergonomics.

References

1. Khan, S., Alam, M.: Wearable internet of things for personalized healthcare: study of trends and latent research. In: Patgiri, R., Biswas, A., Roy, P. (eds.) Health Informatics: A Computational Perspective in Healthcare. SCI, vol. 932, pp. 43–60. Springer, Singapore (2021). https://doi.org/10.1007/978-981-15-9735-0_3
2. Rose, K., Eldridge, S., Chapin, L.: La Internet de las Cosas - Una Breve Reseña. Technical report (2015)
3. Chamola, V., Hassija, V., Gupta, V., Guizani, M.: A comprehensive review of the COVID-19 pandemic and the role of IoT, Drones, AI, Blockchain, and 5G in managing its impact. IEEE Access 8, 90225–90265 (2020)
4. Godfrey, A., Hetherington, V., Shum, H., Bonato, P., Lovell, N.H., Stuart, S.: From A to Z: wearable technology explained. Maturitas 113, 40–47 (2018)
5. World Health Organization: Los nombres de la enfermedad por coronavirus (COVID-19) y del virus que la causa. WHO. https://www.who.int/es/emergencies/diseases/novel-coronavirus-2019/technical-guidance/naming-the-coronavirus-disease-(covid-2019)-and-the-virus-that-causes-it. Accessed 03 Mar 2021
6. Nasajpour, M., Pouriyeh, S., Parizi, R., Dorodchi, M., Valero, M., Arabnia, H.: Internet of Things for current COVID-19 and future pandemics: an exploratory study. J. Healthcare Inform. Res. 4, 325–364 (2020). https://doi.org/10.1007/s41666-020-00080-6
7. Krishnamurthi, R., Gopinathan, D., Kumar, A.: Wearable devices and COVID-19: state of the art, framework, and challenges. In: Al-Turjman, F., Devi, A., Nayyar, A. (eds.) Emerging Technologies for Battling Covid-19. SSDC, vol. 324, pp. 157–180. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-60039-6_8
8. Masimo Home: Solutions for COVID-19 Surge Capacity Monitoring (2021). https://www.masimo.com/. Accessed 28 Dec 2021
9. VitalConnect: Home Patient Monitoring (2021). https://vitalconnect.com/. Accessed 28 Dec 2021
10. Shimmer Wearable Sensor Technology - Wireless IMU - ECG - EMG - GSR: Shimmer Wearable Sensor Technology. https://shimmersensing.com/. Accessed 28 Dec 2021
11. Oxitone: Medical Follow-Up Made Effortless (2020). https://www.oxitone.com/. Accessed 28 Dec 2021
12. Hedayatipour, A., McFarlane, N.: Wearables for the next pandemic. Natl. Sci. Found. Grant 8(1816703), 184457–184473 (2020)
13. Díaz Castrillón, F.J., Toro Montoya, A.I.: SARS-CoV-2/COVID-19: el virus, la enfermedad y la pandemia. Medicina & Laboratorio 24(3), 183–205 (2020)



14. Instituto Nacional del Cáncer de los Institutos Nacionales de la Salud de EE. UU.: Coronavirus: información para las personas con cáncer. https://www.cancer.gov/espanol/cancer/coronavirus/coronavirus-informacion-personas-con-cancer. Accessed 12 Mar 2021
15. ONUSIDA: COVID-19 y VIH, ONU. http://onusidalac.org/1/index.php/internas/item/2555-covid-19. Accessed 20 Mar 2021
16. Jiang, F., Deng, L., Zhang, L., Cai, Y., Cheung, C.W., Xia, Z.: Review of the clinical characteristics of coronavirus disease 2019 (COVID-19). Soc. Gen. Internal Med. 2020, 1545–1549 (2020)
17. Pérez Padilla, J.R., Thirión Romero, I.I., Aguirre Pérez, T., Rodríguez Llamazares, S.: Qué tan silenciosa es la hipoxemia en COVID-19. NCT Neumología y Cirugía de Tórax 79(2) (2020)
18. Rahman, A., Tabassum, T., Araf, Y., Al Nahid, A., Ullah, M.A., Hosen, M.J.: Silent hypoxia in COVID-19: pathomechanism and possible management strategy. Mol. Biol. Rep. 48(4), 3863–3869 (2021). https://doi.org/10.1007/s11033-021-06358-1
19. Gutiérrez Muñoz, F.R.: Insuficiencia respiratoria aguda. Acta Med. Per. 27(4) (2010)
20. Dhadge, A., Tilekar, G.: Severity monitoring device for COVID-19 positive patients. In: 2020 3rd International Conference on Control and Robots (2020)
21. Andreoni, G., Barbieri, M., Colombo, B.: Developing Biomedical Devices: Design, Innovation and Protection. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-01207-0
22. Mejía Salas, H., Mejía Suárez, M.: Oximetría de pulso. Educación Médica Continua 2(149), 149–155 (2012)
23. Bonfiglio, A., De Rossi, D.: Wearable Monitoring Systems, pp. 3–4. Springer, New York (2011). https://doi.org/10.1007/978-1-4419-7384-9
24. Tamura, T., Chen, W.: Seamless Healthcare Monitoring: Advancements in Wearable, Attachable and Invisible Devices. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-69362-0
25. Channa, A., Popescu, N., Rehman Malik, N.: Managing COVID-19 global pandemic with high-tech consumer wearables: a comprehensive review. In: 2020 12th International Congress on Ultra Modern Telecommunications and Control Systems and Workshops, pp. 222–228 (2020)
26. Kumar, U.S., Kumar, S.N.: Internet of Things and Sensor Network for COVID-19. Springer, Singapore (2021). https://doi.org/10.1007/978-981-15-7654-6
27. Balas, V., Kumar, S.V., Kumar, R., Rahman, A.A.: A Handbook of Internet of Things in Biomedical and Cyber Physical System, vol. 165. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-23983-1
28. Electronics Hub: ESP32 vs ESP8266 - Which One To Choose? https://www.electronicshub.org/esp32-vs-esp8266/. Accessed 22 Sep 2021
29. Maxim Integrated: MAX30100 - Pulse Oximeter and Heart-Rate Sensor IC for Wearable Health. Datasheet 19-7065, Rev 0, 9/14 (2014). https://datasheets.maximintegrated.com/en/ds/MAX30100.pdf
30. Maxim Integrated: MAX30205 - Human Body Temperature Sensor. Datasheet 19-8505, Rev 0, 3/16 (2016). https://datasheets.maximintegrated.com/en/ds/MAX30205.pdf
31. World Health Organization: Using the Pulse Oximeter. Tutorial 2 - Advanced (2011). https://www.who.int/patientsafety/safesurgery/pulse_oximetry/who_ps_pulse_oxymetry_tutorial2_advanced_en.pdf



32. Kumar, S.A., Thangavelu, A., Meenakshi, S.V.: Cognitive Computing for Big Data Systems Over IoT: Frameworks, Tools and Applications, vol. 14. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-70688-7
33. Espressif Systems: ESP32-WROOM-32D & ESP32-WROOM-32U. Datasheet Version 2.2 (2021). https://www.espressif.com/sites/default/files/documentation/esp32-wroom-32d_esp32-wroom-32u_datasheet_en.pdf
34. La Marra, A., Martinelli, F., Mori, P., Rizos, A., Saracino, A.: Introducing usage control in MQTT. In: Katsikas, S.K., et al. (eds.) CyberICPS/SECPRE 2017. LNCS, vol. 10683, pp. 35–43. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-72817-9_3
35. Takumi Shop Rex: REX Oxímetro de Pulso de Dedo 4 en 1, Monitor de saturación de oxígeno en Sangre, medidor de frecuencia cardíaca, frecuencia respiratoria e índice de perfusión sanguínea (Negro). Amazon.com.mx: Salud y Cuidado Personal (2021). https://www.amazon.com.mx/gp/product/B08QRPXGDQ. Accessed 06 Feb 2021
36. Termómetro digital Oral Y Axilar Flexible Omron Mc-343f (2021). https://articulo.mercadolibre.com.mx/MLM-785774798-termometro-digital-oral-y-axilar-flexible-omron-mc-343f-_JM. Accessed 06 Feb 2021
37. Módulo Sensor De Ritmo Cardíaco Oxigeno Max30102 (2021). https://articulo.mercadolibre.com.mx/MLM-853642316-modulo-sensor-de-ritmo-cardiaco-oxigeno-max30102-_JM. Accessed 15 Sep 2021
38. MCU-30205 MAX30205MTA Módulo de sensor de temperatura del cuerpo humano de alta precisión de 16 bits 2.7V-3.3V para Arduino. Amazon.com.mx: Herramientas y Mejoras del Hogar (2021). https://www.amazon.com.mx/gp/product/B09829SFTN. Accessed 13 Sep 2021
39. Tarjeta De Desarrollo Esp32 Esp-32d Módulo Wifi + Bluetooth (2021). https://articulo.mercadolibre.com.mx/MLM-1319655921-tarjeta-de-desarrollo-esp32-esp-32d-modulo-wifi-bluetooth-_JM. Accessed 03 Jul 2021

A Matching Mechanism for Provision of Housing to the Marginalized

J. Ceasar Aguma
University of California Irvine, Irvine, CA, USA
[email protected]
https://jaguma.wixsite.com/j-ceasar-aguma

Abstract. During this pandemic, there have been unprecedented community and local government efforts to slow down the spread of the coronavirus and also to protect our local economies. One such effort is California's project Roomkey, which provided emergency housing to over 2,000 vulnerable persons but fell short of the set goal of 15,000. It is projected that the homelessness problem will only get worse after the pandemic. With that in mind, we borrow from efforts like project Roomkey and suggest a solution that looks to improve upon them to efficiently assign housing to the unhoused in our communities. The pandemic, together with project Roomkey, shed light on an underlying supply-demand mismatch that presents an opportunity for a matching mechanism solution for assigning housing options to the unhoused in a way that maximizes social welfare and minimizes susceptibility to strategic manipulation. Additionally, we argue that this automated solution would cut down on the amount of funding and personnel required for the assignment of housing to unhoused persons. Our solution is not intended to replace current approaches to homeless housing assignment but rather to improve upon them. We cannot postpone a proper solution to homelessness any longer; the need for an efficient solution is most dire.

Keywords: Matching markets · Pareto optimality · Homelessness · Project Roomkey


1 Introduction

In this global pandemic, humanity as a collective has been awakened to what is most important to our unified survival. Now more than ever, we understand the significance of a permanent shelter to call home. However, while many of us could stay indoors and protect ourselves and our communities from the spread of the virus, those unhoused among us were, and still are, left vulnerable. The United States Department of Housing and Urban Development reported the homeless population to be over 500,000 across the nation [5]. Of these 500,000, Culhane et al. estimate the modal age to be between 50 and 55 in several cities [4].



This happens to be the group most vulnerable to COVID-19, as reported by the Centers for Disease Control and Prevention (CDC). Further emphasizing this vulnerability, a 2019 study found that 84% of the unhoused population self-reported pre-existing physical health conditions [10]. California and New York, states that have been gravely affected by COVID-19, also have the largest unhoused populations. And this summary does not even tell the global story, which paints an even bleaker picture. Los Angeles and many other cities scrambled to provide temporary housing for the unhoused during the pandemic through tent cities and vacant hotel rooms [8]. However, most of these were either poorly assigned, as in the case of disabled persons, or left vacant because of the lack of an efficient allocation procedure. This metropolitan effort also leaves a few questions unanswered; for example, what happens after this pandemic? How many other people will be left unhoused? How many additional housing options will become available for low-income persons? To answer some of these questions and meet the need for a better housing assignment procedure, we propose a matching mechanism to improve the allocation of available housing to unhoused marginalized groups such as veterans and low-income families. In the background, we review LA county's project Roomkey initiative and set the stage for a matching mechanism that could improve this initiative.

1.1 Background

Case Study: Project Roomkey. According to the LA county COVID-19 website, project Roomkey is "a collaborative effort by the State, County, and the Los Angeles Homeless Services Authority (LAHSA) to secure hotel and motel rooms for vulnerable people experiencing homelessness. It provides a way for people who don't have a home to stay inside to prevent the spread of COVID-19" [12]. Eligible persons, where eligibility is determined on the basis of vulnerability to COVID-19 and a reference from a local homeless shelter or law enforcement office, are assigned temporary housing in the form of hotel and motel rooms. The matching of eligible persons has been done by local homeless shelters, which match individuals to available hotel or motel rooms in their locality. Whether this is automated is unclear, but given the program's failures, one would assume that the matching was not done by a central clearing house but rather arbitrarily, without full knowledge of preferences and optimal matches. Furthermore, the program was not clear on how individuals would be moved to permanent or transitional housing when it closes. To quote the website, "while participants are staying at these hotels, on-site service providers are working with each client individually to develop an exit plan, with the goal of moving them to a situation that permanently resolves their homelessness. In cases where this isn't feasible, LAHSA will use existing shelter capacity to move people into an interim housing environment or explore other options" [12]. The key part is "on-site service providers are working with each client individually," which implies that the matches are not automated and were made depending on whatever information was locally available to the on-site service provider.



The LA Times has highlighted some failures in the project; for example, the project was slammed for discriminating against the elderly and disabled because "the agency deliberately excluded those who cannot handle their own basic activities, such as going to the toilet or getting out of bed" [20]. The project's leadership cited a lack of personnel and funding as the reason it did not succeed [19]. So clearly, a cheap and automated option for matching individuals to housing options is required. The project is now coming to an end after housing about 30% of the projected total. While the program has been reported as a failure, it allows one to imagine a real solution to homelessness in LA county and, in fact, in any metropolis. What the program showed is that there is room for a central matching mechanism that can help move persons from the streets into shelters and from shelters into permanent housing. What we will show below is that this mechanism can be designed to be Pareto-optimal (assigning every person their best possible option at the time of assignment) and strategy-proof (persons cannot do better by cheating in this mechanism). Given a lack of funding and personnel, we felt that an automated matching mechanism that is theoretically optimal would be a great solution. Further analysis of project Roomkey reveals a rich structure that reinforces the need for a matching mechanism. We will give a detailed look at this structure in a later section, and only a summary here. Because of state- and federally-mandated lockdowns, hotels and motels found themselves with large volumes of vacant rooms, an oversupply of sorts. In the same communities as the oversupplied hotel and motel rooms are the many unhoused folks who, due to different circumstances, cannot afford to access and pay for the vacant rooms but do demand shelter, all the more so in a pandemic with federal and state-mandated lockdowns. What we see here is an oversupply of a commodity or service and an abundance of demand, but the two sides are inaccessible to each other without the help of a third party like the local, state, or federal government. This third party is what we consider the matching mechanism designer, something that, we will argue, should have done the matching better when assigning unhoused folks to the vacant rooms. This text, therefore, intervenes at this point to further highlight the structure of oversupply facing handicapped demand, and to call for a simple but sophisticated matching mechanism that can navigate the locality constraints that arose in the allocation of vacant rooms to unhoused persons.

1.2 Literature Review

This paper contributes to a well-established body of work on homelessness, matching markets applied towards social good and matching mechanisms specifically for housing assignment. Below we review a few key papers on the abovementioned research topics. While we highlight the need for a matching mechanism to mitigate homelessness in cities around the globe, there is a long history of scholars deploying matching or mechanism design towards efficient housing solutions. We will summarize some relevant and notable works here.



Theoretical scholars have been studying housing matching markets as far back as 1974, when Shapley and Scarf put forth economic mechanism theory for the housing market with existing tenants and introduced the Gale Top Trading Cycles algorithm [16]. In 1979, Hylland and Zeckhauser set the foundation for a house allocation problem with new applicants, defining a housing market core [9]. Abdulkadiroglu and Sonmez extended the work to a model with new and existing tenants [1]. We direct the reader to [2] for a more comprehensive review of matching markets theory. O'Flaherty goes beyond economic game theory to provide a full economic "theory of the housing market that includes homelessness and relates it to measurable phenomena" [13]. He later extends the work to answering "when and how operators of shelters should place homeless families in subsidized housing" [14] and also updates the economics of "homelessness under a dynamic stochastic framework in continuous time" [15]. Sharam gives a comprehensive breakdown of how matching markets have been applied towards the provision of new subsidized multifamily housing for low-income families in Australia [18]. Sharam also illustrates ways in which the use of digital platforms for matching could help improve the optimality of matching in housing assistance [17]. To the best of our knowledge, [18] and [17] are the only texts that explore the use of a matching mechanism for the provision of low-income housing. Sharam, however, does not extend the work to marginalized groups and only considers the Australian effort, whereas we look to create a mechanism aimed not only at low-income multifamily households but at all unhoused persons. Because of the presently unstable labor and housing markets, Hanratty's work on the impact of local economic conditions on homelessness, using Housing and Urban Development (HUD) data from 2007–2014 [7], is also very relevant to our research. Mansur et al., which examines policies to reduce homelessness, will be useful for the future work in this research concerned with policy recommendations [11].

1.3 Our Contribution

Building from all this past work and the unique structure of project Roomkey, we provide a matching mechanism for better interim and/or permanent housing assignments for unhoused individuals. This mechanism is derived from those explored in [9,16] and [1]. We show that this mechanism is also Pareto-optimal and strategy-proof. We also present a clear picture of the unique structure underlying project Roomkey and invite scholars to pay attention to other areas where this structure presents itself, for example, the food industry.

2 Model

2.1 Problem Formulation

Consider a metropolis with n agents: persons without permanent housing, ranging from multifamily households to single individuals, entered in a shelter or veterans affairs database and looking to transition to better housing options.



Let us further assume a collection of m available housing options in many forms: low-rent apartments, vacant motels or hotels, tent cities, or group homes (many housing options of this kind have been acquired or created by local governments during the COVID-19 lockdown). A person i has a preference list π_i over the housing options, derived from their individual preferences on size, location, cost, accessibility, and many other factors. We assume that there is no preference list over the persons, as that could open the model up to circumstantial bias. However, we assume that there exists a priority ranking R on the agents, based on factors like family size, health risk (such as the current COVID-19 risk for elders), and time spent waiting for a housing assignment. How R is determined is left up to the decision maker, such as the Veterans Affairs office or city officials. (In fact, most shelter or housing authorities almost always have such priority lists already; for example, for COVID-19 emergency housing, persons at the highest COVID-19 risk were given top priority.)

Research Goal. The goal then is to design a mechanism that matches the n persons to the m housing options according to the preference lists and priority ranking, while considering that n = m, n > m, or n < m. Because we intend to implement this model with local policy makers, we have the additional goals of minimizing the cost of implementation and the personnel required. With the model defined and the research goal specified, the next section details how this goal is attained using a simple matching algorithm.

2.2 The Matching Mechanism

We have established that there are three cases expected in this matching problem. We will give an algorithm for each of them here, starting with the most straightforward case, where m = n, and then explain how simple modifications can help tackle the other two cases.

Algorithm 1: A Matching Algorithm for Assigning Housing to the Marginalized when m = n
  Organize agents in a priority queue in descending order (ties are broken randomly)
  for each agent in the queue do
    1. Assign them the best housing option currently available according to their preference list
    2. Terminate when the queue is empty
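A compact implementation of this procedure might look like the following C++ sketch. The data layout (preference lists as vectors of option indices, agents already sorted by priority) is an illustrative choice of this sketch rather than something specified in the paper, and the same routine also covers m < n and m > n because agents with nothing left on their list simply receive -1.

#include <iostream>
#include <vector>

// Assign each agent (already ordered by priority) their best still-available option.
// prefs[a] lists option indices in decreasing order of preference for agent a.
// Returns one option index per agent, or -1 if nothing on their list is left.
std::vector<int> serialDictatorship(const std::vector<std::vector<int>>& prefs,
                                    int numOptions) {
    std::vector<bool> taken(numOptions, false);
    std::vector<int> assignment(prefs.size(), -1);
    for (std::size_t a = 0; a < prefs.size(); ++a) {      // walk the priority queue
        for (int option : prefs[a]) {
            if (option >= 0 && option < numOptions && !taken[option]) {
                assignment[a] = option;                    // best available choice
                taken[option] = true;
                break;
            }
        }
    }
    return assignment;
}

int main() {
    // First example of Sect. 3.1: options a=0, b=1, c=2; priority order i, j, k
    std::vector<std::vector<int>> prefs = {{0, 1, 2},   // i : a > b > c
                                           {1, 2, 0},   // j : b > c > a
                                           {2, 0, 1}};  // k : c > a > b
    for (int x : serialDictatorship(prefs, 3)) std::cout << x << ' ';  // prints 0 1 2
    std::cout << '\n';
    return 0;
}

Running this on the first example of Sect. 3.1 reproduces the assignment x_i = a, x_j = b, x_k = c discussed in the analysis.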

Quick inspection reveals that, in the matching markets literature, this algorithm is in fact serial dictatorship with a fixed priority queue in place of a random order on agents.



Like random serial dictatorship, this algorithm is Pareto-optimal and strategy-proof. For completeness, proofs of both properties are given here.

3 Analysis

For the case of n > m, the last n − m agents in the priority queue simply maintain their current housing options; so, in a sense, we tackle this case the same way we handle the m = n case. This is also true for m > n, where the m − n least preferred housing options are simply left unassigned. A definition and proof of Pareto optimality follow.

3.1 Pareto Optimality

As a precursor to the proof, we provide a definition of Pareto optimality. Given n agents and n resources, an assignment X = (x_1, x_2, ..., x_{n-1}, x_n) is Pareto optimal if it is not Pareto dominated by any other assignment X' = (x'_1, x'_2, ..., x'_{n-1}, x'_n). Assignment X' Pareto dominates X if, for each agent i, x'_i ⪰_i x_i, with at least one agent j for whom x'_j ≻_j x_j.

Theorem 1. The simple Algorithm 1 above is Pareto optimal in all three cases: m < n, m = n, and m > n.

Proof. Let us assume that X is not Pareto-optimal. This means X is dominated by another matching assignment X' in which at least one agent j must have a better and different allocation a. But we know that X assigns every agent their best available option at the time of assignment. So if j indeed has a better assignment in X', this would mean that an agent i (who got the assignment a in X) earlier in the priority queue also has a different assignment in X'. Observe that agent i's assignment is either worse in X' or must be an assignment that was awarded to another agent even earlier in the priority queue in X. One can follow this cycle until at least one agent gets a worse assignment in X'. This yields a contradiction, because now either j does better while another agent does worse in X', or j themselves gets an option that is not their best available one, in which case they do worse in X'. Therefore, it is impossible that X' dominates X. To illustrate this better, we provide a few examples below.



Example 1: m = n. Given an agent set {i, j, k} and housing options {a, b, c}, with priority queue i − j − k and preference lists:

i : a ≻ b ≻ c
j : b ≻ c ≻ a
k : c ≻ a ≻ b

Our algorithm would assign housing options as follows: x_i = a, x_j = b, x_k = c. All three agents get their best options, so any other algorithm must either produce the same assignment or leave at least one agent worse off. If we altered the preference lists to:

i : a ≻ b ≻ c
j : a ≻ c ≻ b
k : c ≻ a ≻ b

Our algorithm would assign housing options as follows: x_i = a, x_j = c, x_k = b. Observe that in this case j and k do not get their best possible assignments, but they do get the best available assignments. If another algorithm gave j housing option a, then i must get a different assignment and hence be worse off. We ask the reader to try out different permutations of the preference lists and check that, in each case, no other assignment would dominate the one produced by Algorithm 1.

Example 2: m < n. For the case where m < n, the same algorithm is employed, but this time the n − m remaining agents in the priority queue simply maintain their existing housing options or, more harshly put, do not get a new housing option. Consider an agent set {i, j, k, l} and housing options {a, b, c}, with priority queue i − j − k − l and preference lists:

i : a ≻ b ≻ c
j : b ≻ c ≻ a
k : c ≻ a ≻ b
l : a ≻ c ≻ b

Our algorithm would assign housing options as follows: x_i = a, x_j = b, x_k = c, x_l = None. Any algorithm that gives l any of the options a, b, c would leave another agent worse off, unless the number of housing options increased.
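To complement these examples, the short helper below checks the Pareto-dominance condition directly from the definition in Sect. 3.1: it returns true only if every agent weakly prefers their option in Y to their option in X and at least one agent strictly prefers it. The rank encoding (a lower rank value means more preferred, and -1 in an assignment means the agent kept their current housing, treated here as the least preferred outcome) is an illustrative choice of this sketch.

#include <vector>

// rank[a][o] is agent a's rank of option o (0 = most preferred).
// Y Pareto dominates X if no agent is worse off and at least one is strictly better off.
bool paretoDominates(const std::vector<int>& Y, const std::vector<int>& X,
                     const std::vector<std::vector<int>>& rank) {
    bool someoneStrictlyBetter = false;
    for (std::size_t a = 0; a < rank.size(); ++a) {
        int worst = static_cast<int>(rank[a].size());   // rank assigned to "no new option"
        int rY = (Y[a] >= 0) ? rank[a][Y[a]] : worst;
        int rX = (X[a] >= 0) ? rank[a][X[a]] : worst;
        if (rY > rX) return false;                      // agent a is worse off under Y
        if (rY < rX) someoneStrictlyBetter = true;
    }
    return someoneStrictlyBetter;
}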

3.2 Strategy Proofness

A matching mechanism is strategy-proof if truth telling is a utility-maximizing strategy, that is, the only way an agent can be guaranteed to get their best possible assignment is if they report true information. Theorem 2. Algorithm 1 is strategy-proof because the only way an agent gets their best option is by picking it in their turn.



Proof. If we assume that the priority queue is out of the agents' control and is decided by a third party, such as a policy-making entity or shelter management, according to some standard criteria like the period of time spent waiting for a housing option, then it is easy to see that agents cannot cheat by misreporting their preferences. The only way an agent gets their best option is if this option is correctly placed in their preference order and is available when their assignment turn comes. A quick example: let us assume we have three agents i, j, k with that exact order in the priority queue, i.e., i − j − k. With housing options a, b, c, let us also assume their true preferences are as follows:

i : c ≻ a ≻ b
j : c ≻ b ≻ a
k : b ≻ a ≻ c

The algorithm would assign housing options as follows: x_i = c, x_j = b, x_k = a (the reader can check for Pareto optimality). But if agent j misreports their preferences as j : a ≻ b ≻ c, they would get a even though b, their best available option under their true preferences, was still unassigned. In fact, if j alters their preference list in any way, they would most likely miss out on their best available option. We can therefore say that Algorithm 1 is strategy-proof.

4 Project RoomKey Revisited

In this section, we will investigate how the algorithm proposed by this text compares to the current algorithm employed by LAHSA for assigning housing under project Roomkey. Of course, LAHSA does not officially call their procedure an algorithm or have a clear outline of steps taken in assigning housing options. We had to read through their program policies and procedures [3] and decipher some outline of the implicit algorithm they use for the assignments. Below, we will present preliminaries to that algorithm, including the nontrivial structure that allowed for the proposal of project Roomkey, the project's priority criteria, the algorithm itself, and an analysis of it.

4.1 The Oversupply, Low-Income Demand Picture

As a precursor to further evaluation of project Roomkey, we would like to highlight the unique structure that rendered project Roomkey necessary. This is simply an introduction of the structure; a more elaborate treatment will be given in future work. The structure is nevertheless crucial to understanding the matching problem that made project Roomkey necessary. Figure 1 illustrates the two sides that create the matching scenario. We have a producer with an oversupply of a commodity or service, and because of the oversupply, the commodity or service is of nearly zero value to them. The pandemic created this situation for hotel and motel owners, who suddenly had an abundance of rooms that, in many places around the world, were left unused.



Fig. 1. The Unique ‘unmatched’ Structure

Adjacent to this is the demand that, by different circumstances, is rendered unable to access the oversupplied commodity or service: circumstances like low to zero income with which to purchase a commodity that the producer would rather waste than offer cheaply or as charity. In many cities around the world, unhoused persons demand these rooms but cannot access them because there is rarely a third party (or a producer) willing to incur the cost of redistribution. This is the scenario that created the vacuum for project Roomkey to fill. With California state and local governments stepping in to incur the cost of redistribution, this matching of oversupplied commodities and services to handicapped demand happens. The one step left to reconcile, then, is how to efficiently match the vacant rooms to the unhoused folks. Below, we will compare project Roomkey's matching procedure to the one proposed by this text, in the context of Pareto optimality and strategy-proofness.

4.2 Project Roomkey Housing Assignment

From [3], given n unhoused persons and m housing options distributed among different homeless service providers, the algorithm for assignment is as follows:



Algorithm 2: The Assignment Procedure Employed by LAHSA under Project Roomkey
  Organize agents in some priority list
  for each eligible agent on the list do
    – Assign the agent to a local homeless service provider
    – The local homeless service provider assigns the agent a housing option according to their needs

Eligibility and priority for the assignment of interim housing under project Roomkey are determined by a "high-risk profile for COVID-19" [3]. According to LAHSA, high risk is defined or determined by age, chronic health conditions, COVID-19 asymptomatic condition, and currently staying in a congregate facility. A priority list is generated from the above criteria [3]. The immediate red flag in this algorithm is that preferences and assignments are restricted by locality, since the m housing options are distributed among local homeless service providers. Better options according to one's preferences could exist through another local service provider, but they would never be available to this individual. We will do a deeper analysis of the above algorithm (checking for Pareto optimality and strategy-proofness) next.

4.3 Pareto Optimality

In the proof that Algorithm 1 is Pareto optimal, we showed two examples where the algorithm always finds an assignment that cannot be improved without making some agent worse off; we then dared the reader to find an example that proves otherwise. Here we will show an example in which Algorithm 1 dominates LAHSA's Algorithm 2. This is sufficient to prove that Algorithm 2 is not Pareto optimal.

Example: Two Homeless Service Providers. We will assume that LAHSA has two local homeless service providers in different localities under project Roomkey. Homeless service provider P has housing options a, b, c available, while homeless service provider Q has housing options x, z. We additionally assume an eligible person i seeking a housing option with the following preference list generated from their needs:

i : z ≻ b ≻ c ≻ x

Under Algorithm 2, we have two possible outcomes, depending on whether LAHSA sends i to P or Q.



– If P, then i will most likely be assigned housing option b.
– And if Q, then i probably gets their most preferred option z.

Observe that, because of locality, i could be assigned an option that does not best fit their needs. This means that there exist cases where Algorithm 2 can be dominated by another algorithm that can guarantee a better housing option to i. One such algorithm is Algorithm 1, where we would have all the available housing options a, b, c, x, z in one database; we would then assign i their most preferred housing option z. Algorithm 1 clearly Pareto dominates Algorithm 2 in this example. As a counter-example, one could ask: what if there was another person j in the locality of Q who also preferred z? Assigning z to i would surely leave j worse off. Let us set up this example and see why it is not a valid counter-example.

Example: Two Homeless Service Providers and Two Eligible Persons. We will assume that LAHSA has two homeless service providers in different localities under project Roomkey. Homeless service provider P has housing options a, b, c available, while homeless service provider Q has housing options x, z. We additionally assume two eligible persons i and j seeking housing with the following preference lists generated from their needs:

i : z ≻ b ≻ c ≻ x
j : z ≻ a ≻ c ≻ b

Under Algorithm 2, we assume that LAHSA sends each person to the homeless service provider of their respective locality, that is, i to P and j to Q; j would then be assigned their most preferred option z, but i would get b. However, if i has higher priority, then we see that Algorithm 2 would never properly honor that priority, while Algorithm 1 would rightfully assign z to i and a to j, which are their best possible outcomes given the descending priority ordering i − j.

4.4 Strategy Proofness

The second example in Sect. 4.3 demonstrates that it would be possible for someone to get a better housing option by simply misreporting their locality to LAHSA. We present that example again here, with a few changes, as a proof that Algorithm 2 is not strategy-proof.

Example: Two Homeless Service Providers and Two Eligible Persons Revisited. We will assume that LAHSA has two homeless service providers in different localities under project Roomkey. Homeless service provider P has housing options a, b, c available, while homeless service provider Q has housing options x, z. We additionally assume two eligible persons i and j in the localities of P and Q respectively, seeking housing with the following preference lists generated from their needs:



i : a ≻ z ≻ c ≻ x
j : a ≻ x ≻ c ≻ b

Under Algorithm 2, if j misreported their locality, they would have a shot at getting their most preferred option a, but if the priority queue i − j is followed, they would end up with x. Observe that under Algorithm 1 it would not matter whether j misreports or not: they would get option x either way, while i will always get option a (which are their rightful assignments according to the priority queue). This simple example shows that, indeed, persons can attempt to improve their chances by misreporting their preferences and locality under Algorithm 2, which could leave them worse off, whereas Algorithm 1 protects against such incidents.

4.5 Locality Expansion

Locality is a key component in the comparison of our matching scheme to that employed by project Roomkey. And by locality, we mean the area considered when assigning housing options to new persons. If U(p) is the expected utility for an individual k of an allocation from the m housing options, with utility u(x_i) and probability p_i for each housing option, we get the following definition:

U(p) = \sum_{i=1}^{m} u(x_i) p_i

It is easy to see that the expected utility of any individual is non-decreasing as the size of the locality, that is, the number of housing providers and hence of available options, increases, as long as all the utilities are non-negative.
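One informal way to make this step concrete for the deterministic assignment produced by Algorithm 1 (a sketch of the reasoning, not an argument spelled out in the paper): pooling providers only enlarges the set A_k of options still available when agent k's turn comes, so the option chosen from the larger pooled set is weakly preferred,

\max_{x \in A_k(L)} u(x) \;\le\; \max_{x \in A_k(L \cup L')} u(x),

and hence the expected utility cannot decrease as the locality grows.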

5 Conclusion

5.1 Discussion

For several decades now, matching mechanisms have been deployed, sometimes invisibly, to solve economic and social problems with unprecedented efficiency. Most notably, the kidney transplant matching algorithm was key to saving 39,000 lives in 2019 alone [6]. Like past matching mechanism solutions to social problems, the proposed mechanism promises to solve an age-old, complex problem more efficiently. With a Pareto optimal algorithm, we are confident that this solution would be fair and consequently improve social welfare without being susceptible to unfair strategies from those trying to cheat their way in. Automating the assignment of housing options, which is currently done one-on-one by local service providers, should help reduce the amount of personnel required by undertakings like LA county's project Roomkey, which cited a lack of sufficient personnel as a hindrance to its success.


sufficient personnel as a hindrance to its success. Besides personnel, we speculate that automation would also render other parts of the current system obsolete, therefore resulting in a reduction of cost. This, too, would tackle another hindrance cited by LA County, that is, a lack of sufficient funding. Both these advantages come in addition to faster and more fitting assignments for all categories of unhoused persons, including those left out by current assignment procedures (like the disabled under Project Roomkey). Of course, the matching algorithm alone cannot fix homelessness and has to be supplemented by already-existing programs for job placement, drug addiction rehabilitation, domestic violence prevention and recovery, health care provision, among others. We do not propose this mechanism as an overhauling solution but rather as a more efficient piece to be plugged into the vast effort to end homelessness.

5.2 Future Work

We hope to obtain the support of many policy-makers and homeless service providers from different cities, support in the form of homelessness data (for example, data from the recent Project Roomkey effort) and a clear outline of the current housing assignment procedure. This would allow for a numerical investigation of the effectiveness of this matching mechanism on real-world data. We also intend to go beyond research and work with the same city policy makers and homeless service providers on implementing this algorithm in the field. In particular, we are seeking a collaboration with LA County to make this matching mechanism a part of future Project Roomkey and Homekey efforts. The unique structure that we presented in Sect. 5 will also be a subject of future work as we look to understand its mathematical implications and where else it appears in modern socio-economic systems. One quick example, as mentioned earlier, is the food and donations industry. Acknowledgment. Dr. Victoria Basolo (UCI), Dr. Amelia Regan (UCI), Samantha A. Carter (UCI), and Sophia G. Bardetti (Middlebury College), thank you all for your wise insight.

References

1. Abdulkadiroğlu, A., Sönmez, T.: Random serial dictatorship and the core from random endowments in house allocation problems. Econometrica 66(3), 689–701 (1998)
2. Abdulkadiroğlu, A., Sönmez, T.: Matching markets: theory and practice. Adv. Econ. Econometr. 1, 3–47 (2013)
3. Los Angeles Homeless Services Authority: Project Roomkey interim housing program policies and procedures (2021)
4. Kuhn, R., Culhane, D.P., Treglia, D., Steif, K., Byrne, T.: Estimated emergency and observational/quarantine capacity need for the US homeless population related to COVID-19 exposure by county; projected hospitalizations, intensive care units and mortality, 3 April 2020


5. United States Department of Housing and Urban Development: Homelessness Data Exchange (HDX), versions 1.0 and 2.0: point-in-time count and housing inventory count (2019)
6. United Network for Organ Sharing: Transplant trends (2020)
7. Hanratty, M.: Do local economic conditions affect homelessness? Impact of area housing market factors, unemployment, and poverty on community homeless rates. Hous. Policy Debate 27(4), 640–655 (2017)
8. Holland, G.: L.A. is getting a government-run tent city. All it took was 40 years and a pandemic. Los Angeles Times, 14 April 2020
9. Hylland, A., Zeckhauser, R.: The efficient allocation of individuals to positions. J. Polit. Econ. 87(2), 293–314 (1979)
10. Hess, N., Rountree, J., Lyke, A.: Health conditions among unsheltered adults in the U.S., 6 October 2019
11. Mansur, E.T., Quigley, J.M., Raphael, S., Smolensky, E.: Examining policies to reduce homelessness using a general equilibrium model of the housing market. J. Urban Econ. 52(2), 316–340 (2002)
12. County of Los Angeles: COVID-19: Project Roomkey (2020)
13. O'Flaherty, B.: An economic theory of homelessness and housing. J. Hous. Econ. 4(1), 13–49 (1995)
14. O'Flaherty, B.: When should homeless families get subsidized apartments? A theoretical inquiry. J. Hous. Econ. 18(2), 69–80 (2009)
15. O'Flaherty, B.: Individual homelessness: entries, exits, and policy. J. Hous. Econ. 21(2), 77–100 (2012)
16. Shapley, L., Scarf, H.: On cores and indivisibility. J. Math. Econ. 1(1), 23–37 (1974)
17. Sharam, A., Byford, M., McNelis, S., Karabay, B., Burke, T.: Matching markets in housing and housing assistance, November 2018
18. Sharam, A.G.: Disruption and the matching market for new multifamily housing in Melbourne, Australia. J. Gen. Manag. 44(3), 160–169 (2019)
19. Smith, D., Oreskes, B.: Checkout time for Project Roomkey; program to house homeless people in hotels is ending after falling short of goal. Los Angeles Times, 23 September 2020
20. Smith, D., Oreskes, B.: City & State; critics slam hotel program for homeless; Project Roomkey discriminates against elderly and disabled, their advocates say. Los Angeles Times, 16 August 2020

Speed Harmonisation Strategy for Human-Driven and Autonomous Vehicles Co-existence

Ekene Frank Ozioko1(B), Julian Kunkel2, and Fredric Stahl3

1 Department of Computer Science, University of Reading, Reading, UK
[email protected]
2 Department of Computer Science, University of Göttingen, Göttingen, Germany
[email protected]
3 German Research Center for Artificial Intelligence GmbH (DFKI), Laboratory Niedersachsen, Marine Perception, 26129 Oldenburg, Germany
[email protected]

Abstract. The emergence of autonomous vehicles, with their potential to improve traffic system efficiency and user comfort, has made the co-existence of human-driven and autonomous vehicles inevitable in the near future. This co-existence of different vehicle types motivates vehicle speed harmonisation to enhance traffic flow efficiency and prevent vehicle collision risk on the road. To a large extent, speed control and supervision of mixed-traffic behaviours will go a long way towards ameliorating the concerns envisaged in the autonomous vehicle integration process. A model-predictive-control-based autonomous vehicle speed adjustment technique with safe distance is developed to optimise the flow of mixed vehicles based on estimated driving behaviour. The main contribution of this work is applying autonomous vehicle speed adjustment to the existing car-following model in mixed traffic. A mixed-traffic simulator is developed to test the proposed method in a car-following model on a merging road and to quantify the benefit of the proposed speed control strategy. The proposed simulation model is validated, and experiments are conducted with varying traffic intersection control strategies and vehicle type proportions. The obtained results demonstrate that the speed adjustment strategy has about an 18.2% performance margin.

Keywords: Reservation Node (RN) · Traffic Light (TL) · Car-following model · Speed harmonisation · Mix-traffic · Vehicle cooperation level · Intersection capacity utilisation

Paper Structure
This work is structured to provide a summary of the contributions made by each section of the paper. This paragraph provides readers with a guide to understanding the organisation of and relationships between the


work sections. This paper is organised as follows: Sect. 1 covers the general background introduction to mixed traffic and state-of-the-art traffic management strategies, emphasising the research problem and goals. Within the same section, we present the motivation for the research, the research questions, the research aims and objectives, and the contribution to knowledge. Section 2 presents an overview of the state of the art in traffic management strategies and control architecture. These reviews mainly involve traffic management schemes for human-driven and autonomous vehicles (mixed-traffic environments). The proposed mixed-traffic solution, covering the research framework, research design methods, and the strategy for speed harmonisation in mixed traffic, is presented in Sect. 3. The detailed experiments conducted, the results obtained with validation of the data, and discussions are presented in Sect. 4. Section 8 presents an all-embracing conclusion of the work and some encountered challenges, with suggestions for future research on mixed-traffic integration.

1 Introduction

The control and optimisation of mixed-traffic flow at road intersections are crucial as a baseline for the autonomous vehicle integration process. With the emergence of autonomous cars, mixed traffic problems have attracted researchers to develop many related technologies to find solutions for the autonomous vehicle integration process. Recently, autonomous vehicles have been looked upon as an alternative way to solve road traffic problems. Autonomous vehicles have the potential to share movement parameter information with each other or with a central controller in real time. This information-sharing feature makes it possible to predict their velocities when managing traffic at the intersection, while human-driven vehicles use traffic signals with the associated stochastic driver behaviour. Human drivers' behaviour is unpredictable and associated with a delay in making driving decisions, whereas autonomous vehicles are in sync with intelligent transportation systems (Fig. 1), in which cars sense the environment via sensors and take the best decision in real time to avoid collisions or accidents. According to [2], the capacity of the road can be increased by increasing the cooperation level between vehicles when their behaviours are homogeneous. This makes the study of mixed traffic more complex, considering the underlying difference in the behaviour of the two categories of vehicles. Moreover, the simulation results from the survey by [22] show that, when mixing automated vehicles (AVs) and human-driven (manually controlled) vehicles, road capacity can be increased by 2.5 times when the percentage of automated vehicles is more than 70%. Also, the works of [9,20,50] show that vehicles forming a platoon can improve the stability and efficiency of the traffic flow. Developing a mixed-traffic flow model is the first step towards shaping a more sophisticated traffic management strategy to guide the transition period seamlessly. The proposed mixed-traffic management strategy aims to use the inter-vehicle distance to judge a vehicle's position and predict its movement. This strategy follows the safe distance model to keep each car at a safe dis-


tance away from each other (the safe distance is dependent on car type). Besides this, the vehicles also check for nodes (where road-vehicle communication comes into play) on the roads and how far away each car is from its reference node position. In accessing the intersection, our method uses a first-in-first-out policy: the right of way is assigned based on the vehicle type and the car nearest to a merging node. An analysis of vehicle evolution and behavioural patterns indicates that vehicles driven by a human are more aggressive in behaviour and have an associated delay in responding to car-following or merging situations. The model in Fig. 2 is defined using a T-junction with a merging and a priority road to simulate the mixed-traffic flow meticulously and realistically, and to review the impact of our strategy against the alternatives. Human-driven vehicles are assumed to include radical drivers who usually exhibit aggressive behaviours when they are in contact with AVs. This design therefore assigns HVs the priority to access the intersection and clears them ahead of the AVs by forcing the AVs to stop and give them the right of way rather than waiting for or following them from behind. Inter-vehicle distance is usually considered at junctions where a minor street (non-priority road) intersects a major highway (priority road). If a priority road vehicle has just arrived at the intersection, it may clear the intersection while rolling; otherwise, it starts the movement from rest, depending on the car type. Human drivers intending to perform merging manoeuvres are presented with a space between vehicles in a conflicting traffic movement. The pattern of arrivals of the major street vehicles creates varying time gaps of different values, particularly when a vehicle mix is involved. According to [13], the gap between the rear bumper of the first vehicle and the front bumper of the following vehicle, usually measured in seconds, is called the inter-vehicle space. This space is the time interval between the arrival of a vehicle at the stop line of the non-priority road and the arrival of the first vehicle on the priority road. The earlier study by [3] shows that modelled delays for homogeneous traffic exhibit a linear relationship for the same type of vehicle. This may be caused by the reduction in the number of available inter-vehicle spaces because of uniformity in vehicle behaviours; there is a significant increase in the occupation time of low-priority movements. However, such linear models are not suitable for mixed-traffic co-existence and non-uniform car behaviour, which can lead to traffic collisions. According to [46], intersection capacity is generally analysed either by the regression method or the gap-acceptance method. The gap-acceptance method is the most widely used method in many countries' intersection capacity manuals. However, earlier studies reported that the gap-acceptance method has a few drawbacks: it cannot be applied to traffic streams that do not comply with a uniform behavioural pattern, and it fails when a mix of aggressive and gentle car behaviours co-exist.


Fig. 1. A 4-way intelligent road intersection with double lanes

Contribution to Knowledge: This work builds on existing approaches by extending 1-dimensional homogeneous traffic control strategies to the 2-dimensional, complex traffic behaviour of the mixed-traffic environment. The work combines traffic lights and vehicle-to-vehicle/road-infrastructure communications for controlling human-driven and autonomous vehicles, respectively, at a road intersection. The main areas of contribution are as follows:

– Investigating the complicated behaviour involved in a mix of AVs and HVs to fast-track the autonomous vehicle integration process.
– Conducting extensive simulated experiments to generate data as a basis for the AV and HV integration process.
– Development of a 2-D traffic model based on the concept of car-following models to comprehensively simulate both lateral and longitudinal mixed-traffic behaviour.
– Modelling driving behaviour with vehicle-type contingency and human psychological driving characteristics.
– A method of adjusting the distance headway that improves the performance of human-driven vehicles, making the HVs yield much smoother trajectories.
– A speed harmonisation algorithm for a traffic-mix setting.
– A centralised traffic control method that controls both AVs and HVs using one control unit with different proportions of vehicle types, serving as a pattern for autonomous vehicle integration.

2 Review of the State of the Art

Given traffic flow parameters, the behavioural pattern of vehicles can be evaluated well enough to suggest that the co-existence of human-driven and autonomous vehicles is possible. The scenario of mixing different vehicle-type behaviours in a single traffic flow model brings a mountain of complex variables into consideration; the co-existence of these traffic streams rests on the elements of human and machine co-existence. In a mixed traffic flow at a road intersection, each vehicle type is expected to behave in line with its default design, maintain its behavioural deviation, and satisfy the essential objective of traffic, which is to safely reach its target destination in the shortest possible time. Mixed-traffic flow management creates room for vehicles to co-exist by negotiating with other vehicle types and traffic participants that have a different behavioural pattern, based on agreed rules, to avoid a collision. The platooning model of traffic management is used to optimise the traffic flow. Besides this, the process of varying the safe distance between autonomous and human-driven vehicles is also deployed to enhance the efficiency of the traffic. [53] proposed a real-time cooperative eco-driving scheme for mixed AV and HV traffic using platooning. According to [53], the lead vehicle receives signal timing and phase information through V2I communication, while the preceding vehicle in the reference platoon communicates via V2V. Generally, mixed traffic comprises road users that include vehicles, pedestrians, and cyclists. The vehicles involved in [53] were termed "homogeneous" traffic, yet they had a wide variation in their static and dynamic characteristics. The vehicle types for the purpose of this research are human-driven vehicles (HVs) and autonomous vehicles (AVs). The vehicles share the same right-of-way, resulting in a jumbled traffic flow. The main distinguishing characteristics of this vehicle mix are the driving behaviour and the means of communication among vehicles and road infrastructure. These driving characteristics result in a wide variation in vehicle behaviours, which makes mixed-traffic management more complicated. The emergence of autonomous vehicles has witnessed a growing demand for mixed-traffic research to support the autonomous vehicle integration process. The design concepts for managing human-driven and autonomous vehicles arise from the difficulties and problem areas of human-machine interaction and the theoretical context of mental modelling. In contrast, the traffic-mix model seeks to use existing homogeneous traffic management techniques to manage a heterogeneous traffic system. The approach first makes use of the traffic flow models presented in Fig. 3, using the relative distance in a car-following model, and compares it with the alternative strategy. The core problem in mixed-traffic modelling is modelling the driver's behaviour. A driving behaviour model predicts the driver's intent, the vehicle and driver state, and environmental influences, to enhance efficiency in the driving experience [1]. [52] defines a driving behaviour as aggressive "if it is deliberate, likely to increase the risk of collision and is motivated by impatience, annoyance, hostility and an attempt to save time." The failure to successfully model drivers' behaviour is a critical difficulty in modelling microscopic traffic flows.


Most current driver behaviour models rely on estimates. Modelling drivers' behaviour predicts human drivers' psychological behaviour, ranging from driver state, driver intention, and vehicle to environmental influence, to enhance traffic safety and societal well-being. It involves the design and analysis of drivers' psychological and behavioural characteristics to predict their capabilities in traffic and, in doing so, to increase traffic throughput. This provides an informed understanding of traffic and has the prospect of improving driving behaviour, supporting safer and more efficient driving. A driver behaviour model is capable of generating a classification that characterises the different profile levels of driver aggressiveness. According to [15], drivers' behaviour impacts traffic security, safety and efficiency; a better understanding can potentially improve driver behaviour. Attaining a driving task is a mobility goal pursued while avoiding obstacles and collisions on the roadway. Aside from the mobility target, there are several secondary goals, one of which has sparked a long-running debate about drivers' psychological behaviour when driving to their destination. For a vehicle to get to its destination, much decision-making based on feedback is required. [18] considered driving behaviour with regard to the difficulty of the driving task and the risk of collision, and classified driving risk into three components: quantitative risk, subjective risk assessment, and risk perception. The most important aspects of the driving role were avoiding potential adverse effects of risky driving and maintaining a high level of safety. Also, the works of [31,43] proposed that drivers maintain safety margins, changing their speed to cope more efficiently with any danger or possible difficulty along the lane. The advent of the autonomous vehicle has moved the role of human drivers from active control operation to a passive supervising role [7]. A closer look at modern road vehicles shows a high level of automation of most vehicle functions, like adaptive cruise control, obstacle sensing, and automated braking systems [40,48]. [53] proposed a receding horizon model predictive control (MPC) with dynamic platoon splitting and integration rules for AVs and HVs, which mostly smooths the trajectory and prevents shock waves but does not concurrently optimise the trajectory and the signal timing of the road intersection. Currently, there is a large diversity of research on mixed traffic in general and on the co-existence of human-driven and autonomous vehicles. However, most of these applications are directed towards different types of human-driven vehicles (car, bus, truck), motorcycles, bicycles, and pedestrians, and expose very few design details. Generally, the state of the art in traffic management has been implemented with event-driven traffic control systems. However, there are drawbacks concerning throughput and safety when these methods are implemented in a mixed scenario. Several traffic management studies [6,11,16,30,32] investigated the impact of integrating AVs onto existing roads to co-exist with HVs: how will the mix work with respect to traffic efficiency? The researchers looked critically at a highway road system using the following three traffic parameters: traffic flow characteristics (vehicle and driver behaviour), road intersections, and merging entry and exit at intersections.


This work appears attractive, but it was restricted to the microscopic level. Meanwhile, Tesla, Inc., based in Palo Alto, California, has developed electric cars with high-tech features such as autonomous driving and has been driving the growing impact of autonomous vehicle integration. In a mixed-traffic system, microscopic models are used to model each vehicle as a kind of particle. The interactions among cars are modelled in simulations, with each component of the proposed framework verified. Each car type model, together with the car and road interaction protocol system, is implemented in the proposed mixed-traffic framework. This framework was verified through simulations involving 3-way and 4-way intersection environments with a fully detailed assessment of the impact of each vehicle type. The critical challenge in agent-based traffic simulation is re-creating practical traffic flow at both the macro and micro levels. By viewing traffic flows as emergent phenomena, [5] proposed a multi-agent-based traffic simulator. According to [17], car agents' behaviours are often implemented by applying car-following theories using a continuous one-dimensional road model. [21] proposed a multilevel agent model involving micro-meso, micro-macro, and meso-macro simulation frameworks to address large-scale mixed road traffic using an organisational modelling approach. The multiple-leader car-following model involves a heterogeneous mixture of vehicle types that lack lane discipline. According to [8,33,35], these traffic conditions lead to complex driving manoeuvres that combine vehicle motion in the lateral and longitudinal directions and thus need to address multiple-leader following. [33] sought to simplify mixed-traffic modelling by developing a technique based on the concept of virtual lane shifts, centred on identifying major lateral changes as a signal of a lane-changing situation. Vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communications are possible in a connected vehicle system [24]. CACC systems can safely drive vehicles with very short headways by forming platoons to increase road traffic flow capacity using V2V communication [37,39,47]. CAVs' advanced technologies open up a world of possibilities for developing novel traffic flow management approaches, such as cooperative adaptive cruise control (CACC), speed harmonisation, and signal control, to name a few. With much room for improvement in terms of traffic safety, quality, and environmental sustainability, the intersection coordination scheme has attracted broad research interest [9,14,20,26,27,44,51,54]. For several years, the idea of following a vehicle with a short gap in CACC has been generalised to provide a new intersection control model, in which nearly conflicting vehicles approaching from different directions cross the intersection with marginal gaps without using a traffic signal. This would enable automated vehicles to reach their maximum potential to reduce traffic congestion, reduce travel time, and increase intersection capacity. Omae et al. [37] suggested a virtual platooning system for automated vehicle control at an intersection that allows vehicles to pass through without pausing. Vehicles in both lanes are deemed to be in a virtual lane, and their intersection interference is taken into account; they are separately managed so that each can safely follow the previous vehicle in the platoon. The system, which was tested


using four electric vehicles fitted with automated driving and V2V communication technologies at a one-way intersection, resulted in a significant reduction in traffic congestion. The current literature confirms that a typical constraint of the car-following model is its rigidity with respect to the longitudinal vehicle dynamics of safe distance, average speed, and acceleration/deceleration rate. Most existing traffic models are only suitable for describing a homogeneous traffic environment with firm lane discipline. As a result, an in-depth analysis of vehicle lateral and longitudinal movements is needed to assess driver behaviour in a heterogeneous traffic flow system. Currently, no widely used traffic theory can exhaustively simulate a 2-dimensional mixed-traffic flow involving a lateral and longitudinal behavioural model, because of the intricate human driving behavioural patterns involved.

3 Methodology

Following the above detailed review of the existing research on the impact of autonomous vehicles on traffic, and considering the current gaps in collision avoidance methods in car-following models, our reasons for choosing this method are listed below:

– Combining intelligent vehicle communications (IVC), road-vehicle communications (RVC), and the safe-distance model produces a robust solution to the shortcomings of the safe-distance model as well as to collision avoidance in traffic flow management.
– Autonomous vehicles can be made to communicate with other objects in their environment as well as to observe vital information about other objects, especially how far away a car is from a reference position.
– Human-operated vehicles can also tap into this approach by engaging the driver's sense of sight and looking out for traffic control mechanisms, though at a less precise level than autonomous cars.
– The efficiency of traffic management increases when cars form a platoon.

From the above motivations, we propose a speed harmonisation strategy for a mix of autonomous vehicles (AVs) and human-driven vehicles (HVs) at a merging single-lane road with a priority lane. The road model in Fig. 2 outlines a single-lane merging road system with its physical properties. Mixed traffic involves cars with different behavioural patterns on the same road system; how can this co-existence work without heavily impacting traffic flow efficiency? In this situation, a merging T-intersection at an angle of 45° is considered for cars sharing space, to test the hypothesis. This is a scenario where cars come from separate roads and merge onto a priority road at a common node; they cannot always tell from their distance to each other whether they are likely to collide, and they may eventually crash when heading towards the same destination. Naturally, the major problem will arise from human-controlled


Fig. 2. Merging road model

vehicles, because they do not possess the features of being self-aware of their environment; their behaviour is stochastic and more prone to prediction errors. The proposed model considers the combination of two traffic management strategies: a centralised and a decentralised approach. In the centralised strategy, drivers and vehicles communicate with a central controller and the traffic signal to assign right-of-way access priority to the intersection. In the decentralised strategy, drivers and vehicles communicate and negotiate for right-of-way access priority. Several research works have investigated the impact of autonomous cars on traffic flow at intersections [12]. [23] proposes the optimisation of traffic intersections using connected and autonomous vehicles. Also, [10] considered the impact of autonomous cars on traffic with two vehicle types distinguished by their maximum velocities, slow (Vs) and fast (Vf), together with the respective fractions of slow and fast vehicles. The mixed behaviour results in a very complex traffic situation that significantly impacts the capacity and performance of traffic intersections. Vehicles with one behavioural pattern are treated with a unified protocol, since each vehicle behaves differently and observes a simple rule at the conflict zones of the intersection. Investigation shows that car accidents at merging roads are one of the most critical aspects of traffic studies because they involve human life. Within the framework of our traffic model, the co-existence of mixed behaviour, safe distance, and collision avoidance have been widely studied in this work. Most models developed to study traffic at the intersection


with mixed vehicles have been designed to avoid collisions between autonomous vehicles and human-driven vehicles. In this work, besides studying mixed traffic behaviour, we investigated the impact of autonomous cars on merging roads in a mixed environment. We propose a new traffic control model for a mixed environment at the merging road, using a safe distance model to minimise delay and reduce the probability of accidents. Vehicle occupation time is the key parameter for the method; it is defined as the time a vehicle or vehicle group takes to cross the intersection area. The occupation time for each mixed vehicle ratio at these intersections was studied and compared, and the volume and occupation time data for each vehicle ratio were extracted from simulation. The developed mathematical model relates the mixed behaviour of the two vehicle types to the occupation time and the traffic flow in a merging single-lane road system. A proposal is made for a mixed-traffic model at a merging T-junction with a priority road section, in which a mix of human-driven and driver-less cars access the intersection simultaneously. Figure 2 shows the proposed model of a one-way merging T-junction with a priority road section and the dimensions of the nodes considered. The mixed vehicles on the two roads, with start nodes 7 and 11, have a common destination or target at node 9. The vehicles from the two road segments merge at node 8 and join the priority road. The fundamental task of the intersection is to direct vehicle trajectories from start to target. The driving behavioural difference and the dynamic headway are taken into account. Hence, the driving behaviour is classified into two typical types: the aggressive driving style for humans and the gentle driving style for autonomous vehicles. The equations obtained were used to estimate the critical safe and inter-vehicle distances for aggressive driving (HV). The queuing distance is calculated using an existing clearing-behaviour approach according to [13]. It is also shown that the estimation of the safe distance is more realistic if the reaction time and the aggressive behaviour of drivers are considered simultaneously. Correspondingly, the safe distance, car-following, and platooning rules are used to optimise the traffic model. The proposed mixed-traffic model considers the position and dynamic headway of the leading and preceding vehicles; in this case, both vehicles are on the current lane or will share a common node upon merging, thereby using the safe distance and car-following model. Based on the preceding, the central controller determines the traffic flow schedule under the following protocols (a minimal code sketch of these rules follows the list):

– The car movement priority is assigned to the road between nodes 7, 8, and 9 (Fig. 2).
– If the merging point (node 8 in Fig. 2) of the road system experiences vehicle arrivals at the same time, then the priority to cross the merging point is given to the road with a human-driven vehicle in front.
– If both roads have the same vehicle type at the front, the priority road takes precedence.
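The following is a minimal sketch (not the authors' code) of the controller's merging-priority rules under these protocols; the data model, names, and the arrival-time tie-break threshold are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Vehicle:
    vtype: str          # "HV" or "AV"
    road: str           # "priority" (7-8-9) or "merging" (11-...-8)
    eta_node8: float    # estimated arrival time at merging node 8 (s)

def right_of_way(a: Vehicle, b: Vehicle) -> Vehicle:
    """Decide which of two conflicting vehicles crosses node 8 first."""
    if abs(a.eta_node8 - b.eta_node8) > 1e-6:
        return a if a.eta_node8 < b.eta_node8 else b   # earlier arrival goes first
    if a.vtype != b.vtype:
        return a if a.vtype == "HV" else b             # HV in front gets priority
    return a if a.road == "priority" else b            # same type: priority road wins

print(right_of_way(Vehicle("AV", "priority", 12.0), Vehicle("HV", "merging", 12.0)).road)
# -> "merging": at a simultaneous arrival the HV is given the right of way
```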


Design of the Road Model
The length of the road in Fig. 2 is determined by adding all route lengths present in the model. The lengths of the routes are calculated as follows. We assume that if a route is horizontally straight, like 1−2−4−5, then the route length is the difference of the x coordinates of its endpoints:

5(x) − 1(x)    (1)

The same applies if the route is vertically straight. Therefore, the length of the road is calculated as:

l_road = (1−2−4−5) + (11−12−14−15) + (7−6−8−9) + (17−16−18−19) + (12−8) + (2−18) + (16−4) + (6−14) + (12−16) + (2−6) + (8−4)
l_road = 600 + 600 + 600 + 600 + 49.5 + 106.1 + 49.5 + 106.1 + 106.1 + 49.5 + 106.1

Therefore: l_road ≈ 2972.9 m, l_car = 4.5 m (average), v = 10 m/s, and

n_cars = l_road / (S + l_car)    (2)

where the safe distance S is 5 m for AVs, 7 m for HVs during platooning, and 3 m after merging. The after-merging time is when the cars are on the straight road and maintain a steady velocity.

Vehicle Model
There are two types of vehicles being considered:

1. Autonomous vehicles (AVs) with intelligent transportation system (ITS) features. These ITS features come from the implementation of technologies like sensors and the Internet of Things.
2. Human-driven vehicles (HVs) with no intelligent transportation system, which instead rely on a human driver who engages their senses of sight and hearing to watch for signals from traffic signalling devices.

Interaction Between HV and AV: The AV is modelled with a gentle driving style in which the car is responsible for avoiding all obstacles and moderates its movement through the environment. The AV driving system has the following features as implemented:


– High precision in obstacle avoidance and manoeuvring, thereby controlling the velocity and acceleration seamlessly.
– The AV has a safe distance of 3 s.

The HV driving was implemented as a conventional driving system with the features listed below:

– The human driver's response to a stimulus is about 6 s, whereas the AV responds in near real time.
– The HV has a safe distance of 5 s.
– The braking distance for the HV is higher than that of the AV, but both are subject to the car's current speed.

Because of these distinct differences between AVs and HVs, there will be challenges in implementing a safe distance model that produces collision-free traffic flow, due to the different vehicle behaviours. Human drivers are unpredictable with stochastic behaviour, less precise, and more prone to mistakes. [49] observed that human drivers respond to unforeseen events in about 6 s, while autonomously driven vehicles respond close to real time. The distance to be kept between autonomous vehicles will therefore be:

s_r = v · t_r    (3)

while that for human-driven vehicles will be:

s_r = v · (t_r + 6)    (4)

where 6 s is the reaction time for human drivers [38]. The developed model combines the microscopic and macroscopic levels of vehicle modelling to address the longitudinal and lateral mixed-vehicle behaviour. [33] observed that the roadway and traffic impact the driving behaviour features, while the 2-dimensional behaviour of heterogeneous vehicles impacts the intersection capacity. This condition makes the driving behaviour control the vehicle's longitudinal and lateral manoeuvres at the merging points. This bi-directional behavioural feature is sophisticated when compared with the car-following model for homogeneous traffic behaviour, giving rise to abreast driving, careful guiding, filtering, tailgating, and co-existence. Therefore, a rigorous investigation of the traffic parameters at the microscopic level is needed to assess the traffic behaviour and to model an all-inclusive numerical prototype. The mixed-traffic simulation strategies are subdivided into two controlling routines or approaches:

1. Longitudinal control for the car-following model: One of the fundamental features of the car-following model is that vehicles observe an average spacing, S (m), at which one vehicle follows another at a given speed, V (mi/h). This parameter is of interest in assessing the throughput of the car-following


model. The average speed-spacing relation in Eq. (5), proposed by [41], deals with the longitudinal features of the road and relates to the estimation of the single-lane road capacity C (veh/h) in the form:

C = 100 · V / S    (5)

where the constant 100 represents the default optimal capacity of the intersection. The average spacing relation can be represented as:

S = α + βV + γV²    (6)

where
α = the vehicle length L,
β = the reaction time T,
γ = the reciprocal of the average maximum deceleration of a following vehicle, providing enough space for safety.

2. Lateral control of the vehicle impacts macroscopic and microscopic behaviours in a car-following model. The lateral control causes lateral interference in a car-following model designed to exert its management only on the longitudinal pattern [36]. The essence of the lateral behaviour in this AVHV control model is to address the driver behaviour characteristics in a mixed vehicle environment. The AVHV control introduces a coupling model between lateral and longitudinal vehicle dynamics through the control of the velocity v_x and the front wheel steering angle λ_i derived from the steering wheel angle β_v. The relationship between the vehicle velocity v, the longitudinal velocity component v_x, and the vehicle's side slip angle θ is represented in Eq. (7):

v_x = v · cos θ    (7)

In addition, the relationship between the steering angle λ_i of the vehicle's front wheel, the angle of the steering wheel β_v, and the steering ratio i_u is represented in Eq. (8):

λ_i = β_v / i_u    (8)
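A minimal numerical sketch of Eqs. (7) and (8); the speed, slip angle, and steering ratio values are illustrative and not taken from the paper:

```python
import math

def longitudinal_velocity(v, side_slip_angle_rad):
    """Eq. (7): project the vehicle speed onto the longitudinal axis."""
    return v * math.cos(side_slip_angle_rad)

def front_wheel_angle(steering_wheel_angle_rad, steering_ratio):
    """Eq. (8): front wheel angle from steering wheel angle and steering ratio."""
    return steering_wheel_angle_rad / steering_ratio

vx = longitudinal_velocity(10.0, math.radians(5))    # ~9.96 m/s
lam = front_wheel_angle(math.radians(90), 15.0)      # ~0.105 rad (~6 degrees)
```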

A mix of these two approaches is vital for modelling mixed-traffic flow simulations at road intersections, to effectively manage longitudinal and lateral driving behaviour. The longitudinal car-following model uses the optimal velocity function to relax the gap between vehicles towards its equilibrium value. There remain high acceleration and deceleration problems after a vehicle cuts in front, but the Intelligent Driver Model addresses this problem. The lateral model uses the technique of maintaining the safe-distance braking process to decide the possibility, necessity, and desirability of lateral control of vehicles. The lateral model follows a simplified decision-making process using acceleration, according to [25].


Algorithm 1: Car Behaviour Algorithm – Collision-free method


Data: Default gentle behaviour of AVs; aggressiveness in human drivers' psychology (quantified by random values)
Result: AV and HV behaviour
for every HV do
    Assign aggressiveness with the following attributes:
        randomised reaction time;
        randomised safe distance (in time);
    if the vehicle is an AV then
        Maintain the constant reaction time;
        Maintain the constant safe distance (in time);
    end
    if an AV and an HV with the same expected arrival time (EAT) come into conflict over an available road space (e.g. Reservation Node (RN), Traffic Light (TL) or Cross Collision Point (CCP)) then
        // apply priority considerations
        Assign priority to the HV to move;
        Decelerate the AV;
        Then move the next car (the AV);
        if the two vehicles have different expected arrival times (EAT) then
            Move the vehicle with the shortest EAT first;
        end
        At the intersection:
            the AV is guided by vehicle-to-vehicle and vehicle-to-infrastructure communication;
            the HV is guided by the traffic light control;
            the control unit (CU) synchronises the two control methods;
    end
    if an emergency situation occurs then
        The AV drives defensively by applying deceleration/acceleration as necessary;
    end
end
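A minimal Python sketch of the behaviour-assignment step of Algorithm 1; the parameter values follow the text (0.3 s AV reaction time, 0.3–1.7 s randomised HV reaction time, 3 s/5 s headways), while the exact randomised headway range for HVs is an illustrative assumption:

```python
import random
from dataclasses import dataclass

@dataclass
class Behaviour:
    reaction_time: float   # seconds
    safe_distance: float   # seconds (time headway)

def assign_behaviour(vtype: str) -> Behaviour:
    """AVs keep constant parameters; HVs get randomised 'aggressive' parameters."""
    if vtype == "AV":
        return Behaviour(reaction_time=0.3, safe_distance=3.0)
    # HV: randomised reaction time in the 0.3-1.7 s range stated in the text,
    # and an illustrative headway range around the 5 s value used for HVs.
    return Behaviour(reaction_time=random.uniform(0.3, 1.7),
                     safe_distance=random.uniform(4.0, 6.0))

print(assign_behaviour("AV"), assign_behaviour("HV"))
```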

3.1 Vehicle Movement Schedule

The car's movement parameters are primarily controlled by the longitudinal and lateral forces, for acceleration/deceleration and for turning, respectively. The prototype simulator is developed in a virtual environment using a physics engine to model the traffic system. The vehicle movement schedule uses the fundamental laws of physics to move a car from point A to point B in a straight or curved direction. Vehicle movement involves two schedules:

– Straight movement schedule
– Curved movement schedule

For the car to move, parameter values are calculated based on Newton's second law of motion. The drag force F_drag and the rolling resistance force F_rr resist


the traction force F_traction while driving horizontally. When cruising at constant speed, F_drag, F_rr and F_traction are in equilibrium, which makes the longitudinal force F_long zero. To simulate car movement at the curve (Fig. 3), one needs some geometry and kinematics and needs to consider forces and mass.
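A minimal sketch of this longitudinal force balance; the drag and rolling-resistance coefficients are illustrative assumptions, not values from the paper:

```python
def longitudinal_acceleration(v, f_traction, mass, c_drag=0.43, c_rr=12.8):
    """Newton's second law for straight-line movement:
    F_long = F_traction - F_drag - F_rr, with F_drag ~ c_drag*v^2 and F_rr ~ c_rr*v."""
    f_drag = c_drag * v * v
    f_rr = c_rr * v
    f_long = f_traction - f_drag - f_rr
    return f_long / mass

# Using the simulation parameters listed later in Sect. 3.1 (F_m = 2200 N, m = 1200 kg):
a = longitudinal_acceleration(v=10.0, f_traction=2200.0, mass=1200.0)   # ~1.7 m/s^2
```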

Fig. 3. Model of curved vehicle movement

The curved movement describes how vehicles move in relation to their coordinate position. Without the curve movement model, this experiment would fail because of the vehicles' need to maintain lane track.

Accessing the Bend: The angle of the curve is calculated as:

θ = 360 · v / l_circle(arc)    (9)

θ_actual = time · θ    (10)

v_max(curve) = x · v · r    (11)

The curve's angle α is the angle between two intersecting planes. The curved angle is a measure of the angle between two intersecting straight lines and the lines perpendicular to the intersection in the respective lanes. This angle can be calculated as:

α = (θ_actual / 180) · π    (12)


The distance s in a curve can be calculated as:

s_curve = (θ_actual − (θ_end · l)) / v    (13)
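A minimal sketch reproducing the curved-movement quantities of Eqs. (9), (10), (12) and (13) as reconstructed above; all input values are illustrative:

```python
import math

def curve_step(v, arc_length, elapsed_time, theta_end=0.0, l=1.0):
    """Illustrative computation of the curved-movement quantities."""
    theta = 360.0 * v / arc_length                 # Eq. (9): angle per unit time (deg)
    theta_actual = elapsed_time * theta            # Eq. (10): angle swept so far (deg)
    alpha = (theta_actual / 180.0) * math.pi       # Eq. (12): same angle in radians
    s_curve = (theta_actual - theta_end * l) / v   # Eq. (13) as reconstructed above
    return theta_actual, alpha, s_curve

print(curve_step(v=10.0, arc_length=106.1, elapsed_time=1.0))
```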

Car-Following Model with Safe Distance
The car-following model maintains the behaviour pattern of the leading vehicle. The characteristic pattern of the model, analysed in Fig. 4, shows how a human reacts in a traffic situation, represented by the driver's longitudinal behaviour when following a leading vehicle and maintaining a safe gap between vehicle groups.

Fig. 4. Car following model with safe distance

In a car-following model, the driving behaviour does not depend entirely on the leader; it depends on the optimal velocity of the vehicle immediately in front. This model does not consider lane-changing and overtaking scenarios, as those involve lateral behaviour. The car-following behaviour can be described in detail using the three points below:

– The leading vehicle can accelerate to its desired speed because no vehicle can influence its speed.
– The leading vehicle's speed primarily determines the following vehicle's state, because drivers try to maintain a reasonable interval of space or time.
– The braking process involves the use of varying degrees of braking force to avoid collision.

Conditions on Which the Safe Distance Depends:

1. The braking manoeuvre is always executed with constant deceleration b. There is no distinction between comfortable and maximum deceleration.
2. There is a constant reaction time t_r of 0.3 s for AVs and a randomised reaction time of 0.3 to 1.7 s for HVs.
3. For safety reasons, all vehicles must maintain a constant gap.

We propose a new mathematical model with aggressiveness factors and adjustable inter-vehicle distance to describe the mixed vehicle moving behaviour, in which


the vehicle platoon is used to balance the traffic flow. This model deals with the concept that a driver recognises and follows a lead vehicle at a lower speed. According to [4,34,55], the ability to observe and estimate a vehicle's response to its predecessor's behaviour in a traffic stream is essential in evaluating what impact changes to the driving conditions will have on traffic flow. The car-following-the-leader concept depends on the two assumptions below:

– The collision avoidance approach demands that a driver maintain a safe distance from other vehicles on the road.
– The vehicle speed is directly proportional to the spacing between the vehicles.

Let δs_{n+1}^t represent the distance available to the (n+1)th vehicle at time t, δx_safe represent the safe distance, and v_n^t, v_{n+1}^t represent the velocities of the nth and (n+1)th vehicles. The gap required for safety is then given by

δs_{n+1}^t = δx_safe + T · v_{n+1}^t    (14)

where T is the sensitivity coefficient. Equation (14) above can be expressed as:

x_n^t − x_{n+1}^t = δx_safe + T · v_{n+1}^t    (15)

Differentiating the above equation with respect to time t:

v_n^t − v_{n+1}^t = T · a_{n+1}^t    (16)

a_{n+1}^t = (1/T) · [v_n^t − v_{n+1}^t]    (17)

From the model prototype, random values in the range 0.3 to 1.7 were chosen for the human drivers' reaction time, based on the UK transport authority [45]. With the sensitivity coefficient term resulting from successive generations of models, we have:

a_{n+1}^t = [α_{l,se} · (v_n^t)^m / (x_n^t − x_{n+1}^t)^l] · [v_n^t − v_{n+1}^t]    (18)

where
l = headway exponent,
se = speed exponent,
α = sensitivity coefficient.

Figure 5 gives a background description of the vehicle safe distance as suggested by the UK Highway Code. The baseline of the method indicates that a human-driven vehicle moving at 30 mph will take approximately 23 m for the braking and stopping process. This is not the case for the autonomous vehicle, which has


Fig. 5. Safe distance description for HV

about 0.1 s of thinking time. The stopping distance is composed of the thinking distance (the distance covered during the time it takes the driver to decide and activate the brakes) and the braking distance covered from the moment the brake affects the car's speed by initiating the deceleration process. Also involved within the braking distance is the stopping time (the time/distance it takes the car to come to a stop). According to [29], many researchers in the field of driving behaviour have devoted themselves to modelling driving behaviour, analysing conflict mechanisms, and improving traffic safety. All values are based on the SI units of metres, seconds, and kilograms. Consideration is based on distinguishing between a conservative and an optimistic driving style to help in the prediction of the car's motion. In conservative driving, a car must decelerate to a complete stop when the car in front stops suddenly or entirely, as in a crash-like scenario; this is the worst-case scenario. In this case, the distance gap to the leading vehicle should not become smaller than a minimum gap of 30 m [28]. In the optimistic driving style, it is assumed that the car in front brakes as well, and the safe distance takes care of the situation. During the reaction time, the vehicle moves by:

s_r = v · t_r    (19)

However, based on the above assumptions, the safe distance between vehicles is set constant for the AVs and varies for the HVs. The safe distance values are measured in seconds, effectively describing the distance relative to the current car speed. Condition 1 implies that the braking distance the leading vehicle needs to come to a complete stop is given by

s = v_1² / (2 · a)    (20)

From condition 2 it follows that, to come to a complete stop, the driver of the considered vehicle needs not only the braking distance v²/(2b), but also an additional


reaction distance v·δt travelled during the reaction time (the time needed to decode and execute the braking instruction). Consequently, the stopping distance is given by

δx = v·δt + v² / (2·b)    (21)

Finally, condition 3 is satisfied if the gap s exceeds the required minimum final value of 0 when the stopping distances are taken into account:

δx = v·δt + v²/(2b) − v_1²/(2·b)    (22)

The speed v for which the equality holds (the highest possible speed) defines the "safe speed":

v_safe(s, v_1) = b·δt + √(b²·δt² + 2·(s − s_0))    (23)

What happens in a situation where the car in front applies an automatic brake? The following car needs time (the reaction time) to apply its own automatic brake so as to avoid a collision within the available space and stop. If v = 40 m/s on the motorway, then a 20 m distance to the start of braking is ideal, using the 2 s rule proposed by [42].

Condition for the Minimum Distance y [m] from the Lead Vehicle
If the distance between the lead vehicle and the next vehicle is greater than the calculated value of y, the merging AV decides to enter the intersection. For an AV,

y = v · t    (24)

where
– t [s] = transit time of the T-junction
– v [km/h] = velocity of the oncoming vehicle

y can be related to the intersection capacity estimate by

c = v · y    (25)

y = l_car + t_reaction · v + a · v² · t    (26)

where
– l_car = vehicle length
– t = reaction time
– a = deceleration rate
– v = speed

Going by the above analysis equations, the inter-vehicle distance for the different car categories can be derived as:


For an HV:

y = v · (t + 1.8)    (27)

where the constant 1.8 s is the inter-vehicle transit time for an HV. Furthermore, to account for human anxiety caused by AVs, a stopping distance d is added for safety, giving

y = v · t · d    (28)

where d is the safe distance. The stopping, braking, and reaction terms are enumerated for clarity:

s_s = v_0 · t_l + v_0² / (2 · a_F)    (29)
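A minimal sketch comparing the minimum merging gap y for an AV (Eq. 24) and an HV (Eq. 27); the transit time used here is an illustrative value:

```python
def min_gap_av(v, transit_time):
    """Eq. (24): minimum lead-vehicle gap an AV needs before merging."""
    return v * transit_time

def min_gap_hv(v, transit_time, hv_extra=1.8):
    """Eq. (27): HVs add the 1.8 s inter-vehicle transit time."""
    return v * (transit_time + hv_extra)

v = 10.0          # m/s, the simulation speed used in this section
t_transit = 2.0   # s, illustrative T-junction transit time
print(min_gap_av(v, t_transit), min_gap_hv(v, t_transit))   # 20.0 m vs 38.0 m
```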

Model Validation Process
Figures 6 and 7 represent two different speed-graph scenarios: a two-car straight movement model without braking and a two-car straight movement model with braking, respectively. This model validation is a confirmation that the developed model is predictive under the conditions of its intended use. In Fig. 7, using the first-in-first-out approach, the first human-driven vehicle is followed by the first autonomous vehicle, and so on; whichever arrives first has the right of way. Also note the similarities in the plots, where aggressive cars 1 and 2 approaching a curve have a velocity pattern similar to gentle cars 1 and 2, which slow down to keep a safe distance.

Simulation Parameter Values
For realistic traffic system behaviour and better control of the experiment parameters, given the dimensions of the road stated in Fig. 2, the following parameter values were used (collected into a configuration sketch after this list):

– V_max = 10 m/s (maximum velocity)
– A_max = 9.9 m/s² (maximum acceleration)
– D_max = −9.9 m/s² (maximum deceleration)
– M_car = 1200 kg (mass of car)
– F_m = 2200 N (moving force)
– F_b = 1200 N (braking force)
– C = 100 cars (intersection capacity)
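A configuration sketch of these values; the class and field names are illustrative and not taken from the authors' code:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SimulationConfig:
    v_max: float = 10.0      # m/s, maximum velocity
    a_max: float = 9.9       # m/s^2, maximum acceleration
    d_max: float = -9.9      # m/s^2, maximum deceleration
    m_car: float = 1200.0    # kg, mass of car
    f_move: float = 2200.0   # N, moving force
    f_brake: float = 1200.0  # N, braking force
    capacity: int = 100      # cars, intersection capacity

config = SimulationConfig()
```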

3.2 Traffic Flow Model

The traffic state is

q = k · v    (30)

(where q = volume, v = speed and k = density)

v_k = v_f − (v_f / k_max) · k = v_f · (1 − k / k_max)    (31)


Fig. 6. Two cars straight movement model

Fig. 7. Two cars straight movement model with braking

where v_f = free-flow speed and k_max = maximum traffic density. From Eqs. (30) and (31), we have:

q_k = v_f · (k − k² / k_max)    (32)
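A minimal sketch of this flow-density relation (a Greenshields-type model); the free-flow speed and jam density values are illustrative:

```python
def flow(k, v_f=10.0, k_max=0.2):
    """Eq. (32): traffic flow q(k) = v_f * (k - k^2 / k_max)."""
    return v_f * (k - k * k / k_max)

# Flow is maximal at half the jam density, k = k_max / 2.
densities = [0.0, 0.05, 0.1, 0.15, 0.2]
print([round(flow(k), 3) for k in densities])   # peaks at k = 0.1 veh/m
```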

Traffic Flow Procedure

– Autonomous and human-driven vehicles are filled in; say HVs are on road A and AVs on road B, for simplicity. Road A is a straight road, and the HVs proceed without making any turns or bends.
– Road B merges with (joins) road A midway, at node 8, after a curve and at an intersection.
– The AVs, on approaching the curve, slow down considerably and, depending on how close they are to intersection node 8, get a sense of how far the other car (the HV) might be from the nearest RVC server or node.
– More importantly, the RVC server judges how far away both vehicles are from each other.
– The RVC then uses this information to grant a Reservation Node (RN) to a vehicle. It signals the AV to decelerate, keep moving or halt, and displays a traffic signal to the human driver in the HV that prompts them to move, slow down or halt.
– As a result, other cars behind the car that slows down (while communicating with an RVC node, due to traffic, or while arriving at an intersection) will also slow down to obey the safe-distance model by judging how far they are from the car ahead of them (which is where inter-vehicle communication applies).
– At this point, two vehicles from different roads obey the merging algorithm rule before fusing together and forming a platoon.


Vehicle Movement Algorithm
Looking ahead at how the cars decide on their movement to the target: each vehicle has a defined route obtained by identifying all the node-ids along its trajectory or path between the start node and the destination node, and then analysing each of the nodes within each identified route according to a metric function value calculated for each route. The metric function may include parameters associated with each of the road nodes in the system, including a node-to-node distance parameter, traffic movement rules, crossing time, and the straight and curved movement models.

Algorithm 2: Car Movement Algorithm


function StartToDestinationNodeMovement:
    Assign the vehicle type upon entering the intersection zone;
    while car movement is true do
        car_speed = magnitude of the car velocity;
        car_velocity.x = speed · cos θ;
        car_velocity.y = speed · sin θ;
    end
    while car movement is false do
        Decelerate by initialising the acceleration to zero and stop:
        car_acceleration.x = 0.0;
        car_acceleration.y = 0.0;
    end
    for each next node that is a road node do
        if the node is a valid RoadNode object then
            check edges and append connected nodes to the destination list;
            append this node to the destination lists of the connected nodes;
        end
        Decelerate the car by multiplying the acceleration by 0; stop;
    end
end
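A minimal Python sketch of the velocity update at the core of Algorithm 2; the function name and structure are illustrative, not the authors' implementation:

```python
import math

def update_velocity(speed, heading_rad, moving=True):
    """Resolve the car's speed into x/y components while it moves (Algorithm 2 core);
    when movement stops, both components are zeroed."""
    if not moving:
        return 0.0, 0.0
    return speed * math.cos(heading_rad), speed * math.sin(heading_rad)

vx, vy = update_velocity(10.0, math.radians(45))   # ~7.07, ~7.07 m/s
```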

During platooning, the safe distance is maintained at 5 m for AVs and 7 m for HVs, respectively.

For AVs:
n_cars = 2972.9 / (5 + 4.5) ≈ 312.93
Therefore, n_cars = 312 cars for AVs.


For HVs:
n_cars = 2972.9 / (7 + 4.5) ≈ 258.51
Therefore, n_cars = 258 for HVs.

Based on the above calculations, the road capacities for the different categories of cars are as follows:

– capacity of the road for AVs = 396 cars
– capacity of the road for HVs = 312 cars

The vehicle describes a perfectly circular curved path when the front wheels turn at an angle θ while the car maintains a constant speed. For optimal performance, the car speed is kept constant while the physics of turning is simulated at low and high speed. Car wheels can sometimes have a velocity that is not aligned with the wheel orientation; this is because, at high speed, the wheel can be heading in one direction while the car body is still moving in another. This means there is a velocity component at a right angle to the wheel, which generates friction.

Capacity is the maximum traffic volume:

q = k · v    (33)

Density:

k = 1 / (v·T_h + L)    (34)

where T_h = time gap (temporal distance) and L = length of vehicle.

For HVs:

C_h = q_max = v / (v·T_h + L)    (35)

For AVs:

v (36) vTa + L When HV and AV are combined togethe, one will be able o generate the expected impact of AV on HV when implemented on a graph with varying parameters. Ca vTh + L (37) = Ch vTa + L For traffic mix, n represent Av capacity cm is now dependent on n n represent the ratio of AV integrated into the road. v cm = (38) nvT a + (1 − n)vTh + Lpkw Ca =

Considering an additional distance by AV to a vehicle steered by HV to avoid harassment of drivers cm =

1 n2 vT aa + n(1 − n)vTah + (1 − n)vThx + L

(39)
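To make the capacity relations above concrete, the short sketch below evaluates Eqs. (35)–(38) numerically. The speed, time-gap, and unit choices (v in m/s, time gaps in s, L in m) are assumptions introduced here for illustration; only the 4.5 m vehicle length echoes the platoon example above.

```python
def capacity(v: float, T: float, L: float) -> float:
    """Single-class capacity q_max = v / (v*T + L), cf. Eqs. (35)-(36)."""
    return v / (v * T + L)


def mixed_capacity(v: float, n: float, T_a: float, T_h: float, L: float) -> float:
    """Mixed capacity c_m = v / (n*v*T_a + (1 - n)*v*T_h + L), cf. Eq. (38)."""
    return v / (n * v * T_a + (1 - n) * v * T_h + L)


# Assumed illustrative values: v in m/s, time gaps in s, vehicle length in m.
v, L = 13.9, 4.5        # roughly 50 km/h; 4.5 m vehicle length as in the platoon example
T_a, T_h = 0.9, 1.8     # assumed AV and HV time gaps

print(f"AV capacity : {capacity(v, T_a, L) * 3600:.0f} veh/h")
print(f"HV capacity : {capacity(v, T_h, L) * 3600:.0f} veh/h")
print(f"C_a / C_h   : {(v * T_h + L) / (v * T_a + L):.2f}")        # Eq. (37)
for n in (0.0, 0.5, 1.0):                                          # AV penetration ratio
    print(f"n = {n:.1f} -> mixed capacity {mixed_capacity(v, n, T_a, T_h, L) * 3600:.0f} veh/h")
```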


Road Traffic Capacity Estimation Approach
1. Shortening of the headway between AVs.
2. Speed of the vehicle group: the higher the speed at a constant density, the higher the traffic volume.

4 Experiments

Collision Avoidance with Safe Distance (CAwSD) Control Method
The collision-avoidance technique describes how the interaction between traffic and the road system is represented as a chain of conflict points, as proposed by Gipps [19]. No phase assignment or cycle time is required, in contrast to the traffic light control method. At each time step, traffic arriving at the intersection checks whether another vehicle shares the collision points along its trajectory. The arriving vehicles' position, speed, and time parameters are used to calculate which vehicle would be given way to the collision point in a real traffic situation. On arrival at the intersection, conflicting vehicles cannot enter the intersection simultaneously when they share the same collision point, but they can move concurrently in the intersection provided they do not share the same collision point at the same time. This method takes an analytical approach by calculating the probability of vehicles arriving at a conflict point simultaneously and the subsequent delay. When vehicles share the same collision point from different routes, they might eventually collide. Naturally, the major problem arises from human-controlled vehicles, as their behaviour is stochastic and more prone to prediction errors. Consideration is based on two types of vehicles that vary in their maximum velocities, slow (Vs) and fast (Vf), denoting the slow and fast vehicles, respectively.

Inter-Vehicle Space Adjustment with Reservation Node Technique
This proposed technique is a reservation-based algorithm that schedules vehicles' entrance into the intersection space by reserving a collision cell for one particular vehicle at every instant. Use of an intersection collision point is based on a request, and reservations are made according to a predefined protocol before vehicles can pass. The schedule is formulated by calculating each vehicle's relative speed to the reservation cell and assigning a vehicle sequence. A car's distance to the cars ahead of it is calculated, and the minimum distance to the reservation node is found. The environment's central collision-avoidance system then signals the car to brake or decelerate, and otherwise to keep going. The safe vehicle distance, reaction time, and relative-distance models are proposed to minimise the delay and reduce the probability of accidents at cross collision points. This traffic management strategy is a decentralised strategy in which drivers and vehicles communicate and negotiate for access to the cross collision point based on their relative distance and access priority to the intersection.
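The text does not give pseudocode for the reservation step, so the following is only a minimal sketch of the idea: vehicles request the conflict cell, the controller grants it in order of estimated arrival time, and a later arrival that would violate a clearance gap is told to decelerate. The `Request` structure and the 2-second clearance value are assumptions made for illustration.

```python
from dataclasses import dataclass

CLEARANCE = 2.0  # assumed minimum time (s) between two grants of the same conflict cell


@dataclass
class Request:
    vehicle_id: str
    distance: float   # distance to the conflict point (m)
    speed: float      # current speed (m/s)

    @property
    def eta(self) -> float:
        """Estimated time of arrival at the conflict point."""
        return self.distance / max(self.speed, 0.1)


def schedule(requests):
    """Grant the conflict cell in ETA order; later vehicles are delayed if needed."""
    granted, next_free = [], 0.0
    for req in sorted(requests, key=lambda r: r.eta):
        slot = max(req.eta, next_free)
        action = "proceed" if slot == req.eta else "decelerate"
        granted.append((req.vehicle_id, round(slot, 1), action))
        next_free = slot + CLEARANCE
    return granted


reqs = [Request("HV-1", 60, 12), Request("AV-1", 45, 15), Request("AV-2", 50, 15)]
print(schedule(reqs))
# [('AV-1', 3.0, 'proceed'), ('AV-2', 5.0, 'decelerate'), ('HV-1', 7.0, 'decelerate')]
```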

5 Result Discussion and Evaluation

The experiments test the hypothesis that vehicles move more efficiently when the road intersection cells are reserved. With the adjustment of the AVs' inter-vehicle distance, the performance of HVs increases, and the vehicles' occupation time increases with an increase in the ratio of human-driven vehicles. An analysis of variance of the time measurements from the different ratio simulation tests is conducted in Fig. 10, which gives statistics for the variation in time occupancy with the vehicle mix ratio. This variation is due to the difference in the behavioural characteristics of human-driven and driver-less cars.

Fig. 8. 50% capacity

Fig. 9. 100% capacity


Stability: In the context of this research, traffic flow stability, as represented in Fig. 12, is analysed using the number of vehicles braking in response to the traffic volume for the different control methods under the same conditions. The traffic flow efficiency at road intersections depends partly on traffic flow stability, which is analysed by the number of braking events associated with a control method. Traffic stability can be assessed from the uniformity of the flow speed: it is a state in which all cars move with an identical safe distance and optimal velocity. Speed fluctuations impact the flow stability of vehicles in motion. It is observed that the different traffic control methods are associated with varying levels of stability. The vehicle safe-distance process involves deceleration and acceleration, which causes a perturbation in the stability of the overall flow.

Travel Time Delay: Figure 11 represents the travel time delay associated with the different traffic control strategies. It is evident from Fig. 11 that the RN traffic control strategy experiences the shortest queues of cars. Depending on the intersection-specific conditions, delay analyses for transportation system plans under the transportation planning rule (TPR) may be required for operational research. Traffic congestion is characterised by stop-and-go traffic, slower speeds, longer travel times, and increased vehicular queuing. These characteristics can be quantified by the number of vehicles waiting for access permission around the intersection. It is the cumulative effect of these delays that makes up the overall travel time.

Discussions: The proposed methodology for analysing the impact of mixing AVs and HVs will help determine the integration pattern of autonomous vehicles for the mixed-vehicle transition period. In addition, traffic engineers can use the models developed in this study to estimate the capacity of a road intersection in a mixed-traffic environment. This investigation discovered that autonomous vehicles are

Fig. 10. Vehicle occupancy matrix


Fig. 11. Travel time delay

much safer, more time-efficient, and help decongest roads. Figures 8 and 9 represent the simulation results of the intersection performance under 50% and 100% vehicle capacity, respectively. It is evident from Fig. 9 that intersection efficiency increases with an increase in the ratio of autonomous vehicles. This is because AVs combine and interpret their surroundings' sensory data to identify appropriate navigation paths, obstacles, and relevant signage. The intersection efficiency is measured using traffic performance metrics relating to throughput and delay. The performance of the different traffic control strategies is analysed using different parameter values in simulation to see the effect of those values on the system's throughput. The vehicle mix ratio was increased in every simulation to establish the impact of the ratio variation and to guide the integration pattern. The performance of the different ratio cases is analysed and compared under the three traffic control methods. This trend allows the HVs to benefit in efficiency from the AVs in a co-existence scenario.


Fig. 12. The number of braking events that occurred

6 Contributions to Knowledge

In the course of this work, some new knowledge has been created on the basis of previously available knowledge. The contributions include:

– A guide to the mixed-traffic integration pattern
– An effective description of 2-D mixed-traffic behaviour
– Increased HV performance when the AV inter-vehicle distance is adjusted
– A speed harmonisation method for mixed traffic
– A mixed driving behaviour model

7 Future Research Direction

Future research work could improve the mixed traffic management scheme in the following four main categories:

Driver Behaviour Models
– Incorporate the drivers' decision to accept or reject an RN offer
– Investigate the factors that influence the driver's behaviour

Vehicle Models
– Model varying vehicle lengths to reflect the real city traffic situation

Road Intersection Model
– Extend the strategy to a multi-lane, multi-intersection road network
– Investigate the cooperation level between AVs and HVs


Traffic Flow Model
– Investigate the effect of the safe distance and reaction time distributions
– Apply machine learning to control traffic and provide real-life physics
– Investigate non-compliance to an emergency

8 Conclusion

The novelty of the mixed-traffic speed adjustment strategy is that it harmonises the speeds of AVs and HVs, thereby increasing flow efficiency. Secondly, by addressing a 2-dimensional traffic flow problem in heterogeneous traffic, an existing 1-dimensional car-following model is extended to compensate for unexpected changes in human-driven vehicles. The algorithm controls the mixed-traffic variable-speed bottleneck to smooth the traffic flow effectively. Using the acceptance safe-distance model, the proposed model interpolates human-driven and autonomous vehicles' behaviour with inter-vehicle distance adjustment. The strategy has been implemented on the developed model and calibrated with realistic parameters, vehicle distributions, and vehicle ratio mixes. The concept of the cell reservation method appears to be efficient, as it centrally synchronises both AV and HV parameters simultaneously. The real-time traffic parameter sharing of AVs makes it possible to predict vehicle velocities when managing traffic.

This work provides scientific support for the integration plan of autonomous vehicles and a mixed traffic control system. It will improve mixed-traffic efficiency, mitigate traffic congestion at road intersections, and provide technical support for future research in traffic control systems. A mix of human-driven and automated vehicles is gradually becoming the norm around the world. The large-scale advancement and application of new technologies in vehicle and traffic management will greatly promote urban traffic control systems and support a full-scale intelligent transportation system.

The developed AV-HV model appears able to mimic the behaviour of mixed traffic reasonably, with parameters consistent between the behaviour of the mixed traffic and the simulated flow. The combined behaviour of the traffic is mainly controlled by the distribution of the harmonised speed, the safe-distance distribution, and the number of braking events, whereas the reaction time distribution and the vehicle length control the individual vehicle behaviour. The experimental results show a well-harmonised vehicle group speed at every instant of time. The cell reservation method has been used to investigate the effect of driver-less cars on human-driven cars at a merging road intersection using inter-vehicle distance. The vehicle occupation time was observed at a merging road, and mathematical relations for the occupation time of the different vehicle types were developed. From our findings, a vehicle ratio occupancy pattern was developed to serve as a valuable tool for evaluating the integration process of autonomous cars on the road.

The key conclusions arising from this study are:
1. Traffic flow efficiency increases when road intersection cells are reserved.
2. The integration of autonomous cars on the road will positively impact the efficiency of human-driven cars.
3. The vehicle occupancy time depends on the traffic mix ratio.

8.1 Summary

Related traffic technologies have been developed to support the autonomous vehicle integration process, which is essential for effectively utilising the benefits of autonomous vehicles. A mathematical model describes the mixed behaviour of the two vehicle types in terms of occupation time and traffic flow at a merging T-junction. It has been observed that the vehicle occupation time in a mixed traffic flow increases with a higher ratio of autonomous vehicles. Also, the throughput increases when the inter-vehicle distance is adjusted. The proposed methodology will be helpful in determining the integration pattern of autonomous vehicles for the mixed-vehicle transition period. The models developed in this research can also be used by traffic engineers to estimate the capacity of a merging road intersection in a mixed traffic environment. The investigation found that autonomous cars are much safer, more time-efficient, and help decongest roads.

The work done so far represents steps towards a safe and efficient mixed traffic management scheme for a mixed traffic integration environment. This is an important goal: reliance on autonomous cars is ever increasing, the objectives of this project have been identified, and the co-existence of autonomous cars with human-driven cars is inevitable. Towards this end, a promising method of managing the traffic mix is realisable. The experimental results promise a traffic schedule that will sustain the state of the art in mixed traffic environment management. The results obtained are based on an intersection capacity of 100 cars with a varying mix ratio of autonomous and human-driven cars. From the result in Fig. 10, it is observed that an increase in the ratio of autonomous cars leads to a decrease in the simulation time, which supports the research hypothesis. Hence, we conclude that the intersection efficiency increases with the ratio of autonomous cars to human-driven cars, showing that autonomous cars improve traffic efficiency.

We have examined the potential impact of integrating autonomous cars to co-exist with human-driven cars on the road. The assessment was carried out under parameters that align with the realistic operating environment of a city traffic flow system. Modern traffic lights use real-time, event-driven control models but are designed to model a homogeneous traffic system. The AV-HV control model, in contrast, supports a traffic schedule with a traffic signal light to control HVs and wireless communications to control AVs. This control method involves the dynamic representation of a mixed-traffic system at road intersections to help plan, design, and operate traffic systems as they evolve through time. The research direction taken was the utilisation of reservation cells to improve traffic flow performance. By reserving any of the 12 intersection reservation cells for a vehicle at every instant, the traffic flow throughput increases beyond that achieved with the traffic light or collision avoidance methods. The obtained results demonstrate that the cell reservation strategy has a performance margin of about 18.2%.

Acknowledgments. This research is part of the outcome of my PhD research, which was funded by the Nigerian Tertiary Education Trust Fund.


References
1. AbuAli, N., Abou-zeid, H.: Driver behavior modeling: developments and future directions. Int. J. Veh. Technol. 2016 (2016)
2. Arnaout, G.M., Arnaout, J.-P.: Exploring the effects of cooperative adaptive cruise control on highway traffic flow using microscopic traffic simulation. Transp. Plan. Technol. 37(2), 186–199 (2014)
3. Asaithambi, G., Anuroop, C.: Analysis of occupation time of vehicles at urban unsignalized intersections in non-lane-based mixed traffic conditions. J. Modern Transp. 24(4), 304–313 (2016). https://doi.org/10.1007/s40534-016-0113-7
4. Azlan, N.N.N., Rohani, M.M.: Overview of application of traffic simulation model. In: MATEC Web of Conferences, vol. 150, p. 03006. EDP Sciences (2018)
5. Benhamza, K., Ellagoune, S., Seridi, H., Akdag, H.: Agent-based modeling for traffic simulation. In: Proceedings of the International Symposium on Modeling and Implementation of Complex Systems MISC2010, Constantine, Algeria, pp. 30–31 (2010)
6. Bento, L.C., Parafita, R., Nunes, U.: Intelligent traffic management at intersections supported by V2V and V2I communications. In: 2012 15th International IEEE Conference on Intelligent Transportation Systems, pp. 1495–1502. IEEE (2012)
7. Biondi, F., Alvarez, I., Jeong, K.-A.: Human-vehicle cooperation in automated driving: a multidisciplinary review and appraisal. Int. J. Hum.-Comput. Interact. 35(11), 932–946 (2019)
8. Budhkar, A.K., Maurya, A.K.: Multiple-leader vehicle-following behavior in heterogeneous weak lane discipline traffic. Transp. Dev. Econ. 3(2), 20 (2017)
9. Chan, E., Gilhead, P., Jelinek, P., Krejci, P., Robinson, T.: Cooperative control of SARTRE automated platoon vehicles. In: 19th ITS World Congress, ERTICO-ITS Europe, European Commission, ITS America, ITS Asia-Pacific (2012)
10. Chen, D., Srivastava, A., Ahn, S., Li, T.: Traffic dynamics under speed disturbance in mixed traffic with automated and non-automated vehicles. Transp. Res. Procedia 38, 709–729 (2019)
11. Dróździel, P., Tarkowski, S., Rybicka, I., Wrona, R.: Drivers' reaction time research in the conditions in the real traffic. Open Eng. 10(1), 35–47 (2020)
12. Duarte, F., Ratti, C.: The impact of autonomous vehicles on cities: a review. J. Urban Technol. 25(4), 3–18 (2018)
13. Dutta, M., Ahmed, M.A.: Gap acceptance behavior of drivers at uncontrolled T-intersections under mixed traffic conditions. J. Modern Transp. 26(2), 119–132 (2018)
14. Feng, Y., Head, K.L., Khoshmagham, S., Zamanipour, M.: A real-time adaptive signal control in a connected vehicle environment. Transp. Res. Part C: Emerg. Technol. 55, 460–473 (2015)
15. Ferreira, J., et al.: Driver behavior profiling: an investigation with different smartphone sensors and machine learning. PLoS ONE 12(4), e0174959 (2017)
16. Friedrich, B.: The effect of autonomous vehicles on traffic. In: Maurer, M., Gerdes, J.C., Lenz, B., Winner, H. (eds.) Autonomous Driving, pp. 317–334. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-48847-8_16
17. Fujii, H., Uchida, H., Yoshimura, S.: Agent-based simulation framework for mixed traffic of cars, pedestrians and trams. Transp. Res. Part C: Emerg. Technol. 85, 234–248 (2017)
18. Fuller, R.: Towards a general theory of driver behaviour. Accid. Anal. Prevent. 37(3), 461–472 (2005)


19. Gipps, P.G.: A behavioural car-following model for computer simulation. Transp. Res. Part B: Methodol. 15(2), 105–111 (1981)
20. Gong, S., Du, L.: Cooperative platoon control for a mixed traffic flow including human drive vehicles and connected and autonomous vehicles. Transp. Res. Part B: Methodol. 116, 25–61 (2018)
21. Haman, I.T., Kamla, V.C., Galland, S., Kamgang, J.C.: Towards an multilevel agent-based model for traffic simulation. Procedia Comput. Sci. 109, 887–892 (2017)
22. Huang, S., Ren, W.: Autonomous intelligent vehicle and its performance in automated traffic systems. Int. J. Control 72(18), 1665–1688 (1999)
23. Hussain, R., Zeadally, S.: Autonomous cars: research results, issues, and future challenges. IEEE Commun. Surv. Tutor. 21(2), 1275–1313 (2018)
24. Kamal, M.A.S., Imura, J., Ohata, A., Hayakawa, T., Aihara, K.: Coordination of automated vehicles at a traffic-lightless intersection. In: 2013 16th International IEEE Conference on Intelligent Transportation Systems (ITSC), pp. 922–927. IEEE (2013)
25. Kesting, A., Treiber, M., Helbing, D.: General lane-changing model MOBIL for car-following models. Transp. Res. Rec. 1999(1), 86–94 (2007)
26. Khondaker, B., Kattan, L.: Variable speed limit: a microscopic analysis in a connected vehicle environment. Transp. Res. Part C: Emerg. Technol. 58, 146–159 (2015)
27. Knorn, S., Donaire, A., Agüero, J.C., Middleton, R.H.: Passivity-based control for multi-vehicle systems subject to string constraints. Automatica 50(12), 3224–3230 (2014)
28. Lertworawanich, P.: Safe-following distances based on the car-following model. In: PIARC International Seminar on Intelligent Transport System (ITS) in Road Network Operations (2006)
29. Li, H., Li, S., Li, H., Qin, L., Li, S., Zhang, Z.: Modeling left-turn driving behavior at signalized intersections with mixed traffic conditions. Math. Probl. Eng. 2016 (2016)
30. Liao, R.: Smart mobility: challenges and trends. In: Toward Sustainable and Economic Smart Mobility: Shaping the Future of Smart Cities, p. 1 (2020)
31. Liu, Y., Ozguner, U.: Human driver model and driver decision making for intersection driving. In: 2007 IEEE Intelligent Vehicles Symposium, pp. 642–647. IEEE (2007)
32. Maitre, M.L., Prorok, A.: Effects of controller heterogeneity on autonomous vehicle traffic. arXiv preprint arXiv:2005.04995 (2020)
33. Matcha, B.N., Namasivayam, S.N., Fouladi, M.H., Ng, K.C., Sivanesan, S., Eh Noum, S.Y.: Simulation strategies for mixed traffic conditions: a review of car-following models and simulation frameworks. J. Eng. 2020 (2020)
34. Mathew, T.: Lecture notes in transportation systems engineering. Indian Institute of Technology (Bombay) (2009). http://www.civil.iitb.ac.in/tvm/1100LnTse/124lntse/plain/plain.html
35. Mathew, T.V., Munigety, C.R., Bajpai, A.: Strip-based approach for the simulation of mixed traffic conditions. J. Comput. Civ. Eng. 29(5), 04014069 (2015)
36. Miloradović, D., Glišović, J., Stojanović, N., Grujić, I.: Simulation of vehicle's lateral dynamics using nonlinear model with real inputs. In: IOP Conference Series: Materials Science and Engineering, vol. 659, p. 012060. IOP Publishing (2019)
37. Omae, M., Ogitsu, T., Honma, N., Usami, K.: Automatic driving control for passing through intersection without stopping. Int. J. Intell. Transp. Syst. Res. 8(3), 201–210 (2010)


38. Pawar, N.M., Velaga, N.R.: Modelling the influence of time pressure on reaction time of drivers. Transp. Res. Part F: Traffic Psychol. Behav. 72, 1–22 (2020)
39. Ploeg, J., Serrarens, A.F.A., Heijenk, G.J.: Connect & Drive: design and evaluation of cooperative adaptive cruise control for congestion reduction. J. Modern Transp. 19(3), 207–213 (2011)
40. Vivan, G.P., Goberville, N., Asher, Z., Brown, N., et al.: No cost autonomous vehicle advancements in CARLA through ROS. SAE Technical Paper, p. 01–0106 (2021)
41. Rothery, R.W.: Car following models. In: Traffic Flow Theory (1992)
42. Saifuzzaman, M., Zheng, Z., Haque, M.M., Washington, S.: Revisiting the task-capability interface model for incorporating human factors into car-following models. Transp. Res. Part B: Methodol. 82, 1–19 (2015)
43. Salvucci, D.D.: Modeling driver behavior in a cognitive architecture. Hum. Factors 48(2), 362–380 (2006)
44. Swaroop, D., Hedrick, J.K., Chien, C.C., Ioannou, P.: A comparision of spacing and headway control laws for automatically controlled vehicles. Veh. Syst. Dyn. 23(1), 597–625 (1994)
45. UK transport authority: Driving Test Success, May 2020. https://www.drivingtestsuccess.com/blog/safe-separationdistance
46. Tripathy, S., Asaithambi, K., Jayaram, P., Medhamurthy, R.: Analysis of 17β-estradiol (E2) role in the regulation of corpus luteum function in pregnant rats: involvement of IGFBP5 in the E2-mediated actions. Reprod. Biol. Endocrinol. 14(1), 19 (2016)
47. Van Arem, B., Van Driel, C.J.G., Visser, R.: The impact of cooperative adaptive cruise control on traffic-flow characteristics. IEEE Trans. Intell. Transp. Syst. 7(4), 429–436 (2006)
48. Van Brummelen, J., O'Brien, M., Gruyer, D., Najjaran, H.: Autonomous vehicle perception: the technology of today and tomorrow. Transp. Res. Part C: Emerg. Technol. 89, 384–406 (2018)
49. van Wees, K., Brookhuis, K.: Product liability for ADAS; legal and human factors perspectives. Eur. J. Transp. Infrastruct. Res. 5(4) (2020)
50. Vial, J.J.B., Devanny, W.E., Eppstein, D., Goodrich, M.T.: Scheduling autonomous vehicle platoons through an unregulated intersection. arXiv preprint arXiv:1609.04512 (2016)
51. Wang, M., Daamen, W., Hoogendoorn, S.P., van Arem, B.: Rolling horizon control framework for driver assistance systems, Part II: cooperative sensing and cooperative control. Transp. Res. Part C: Emerg. Technol. 40, 290–311 (2014)
52. Young, M.S., Birrell, S.A., Stanton, N.A.: Safe driving in a green world: a review of driver performance benchmarks and technologies to support 'smart' driving. Appl. Ergon. 42(4), 533–539 (2011)
53. Zhao, W., Ngoduy, D., Shepherd, S., Liu, R., Papageorgiou, M.: A platoon based cooperative eco-driving model for mixed automated and human-driven vehicles at a signalised intersection. Transp. Res. Part C: Emerg. Technol. 95, 802–821 (2018)
54. Zhou, Y., Ahn, S., Chitturi, M., Noyce, D.A.: Rolling horizon stochastic optimal control strategy for ACC and CACC under uncertainty. Transp. Res. Part C: Emerg. Technol. 83, 61–76 (2017)
55. Zhu, W.-X., Zhang, H.M.: Analysis of mixed traffic flow with human-driving and autonomous cars based on car-following model. Phys. A 496, 274–285 (2018)

Road Intersection Coordination Scheme for Mixed Traffic (Human Driven and Driver-Less Vehicles): A Systematic Review

Ekene F. Ozioko1(B), Julian Kunkel2, and Fredric Stahl1,2

1 Computer Science Department, University of Reading, Reading, UK
[email protected], [email protected]
2 Computer Science Department, University of Göttingen, Göttingen, Germany
[email protected]

Abstract. Autonomous vehicles (AV) are emerging with enormous potential to solve many challenging road traffic problems. The emergence of AVs leads to a paradigm shift in the road traffic system, making the penetration of autonomous vehicles fast and their co-existence with human-driven cars inevitable. The migration from traditional driving to an intelligent driving system with the gradual deployment of AVs needs supporting technology to address mixed traffic system problems: mixed driving behaviour in a car-following model, variation in control means across vehicle types, the impact of the proportion of AVs in mixed traffic, and many more. The migration to fully AV traffic will solve many problems: the desire to reclaim travel and commuting time, driving comfort, and accident reduction. Motivated by the above facts, this paper presents an extensive review of road intersection traffic management techniques with a classification matrix of the different traffic management strategies and technologies that could effectively describe a mix of human-driven and autonomous vehicles. It explores the existing traffic control strategies and analyses their compatibility in a mixed traffic environment. It then reviews their drawbacks and builds on them for the proposed robust mixed traffic management scheme. Though many traffic control strategies are already in existence, the analysis presented in this paper gives readers new insights into the application of the cell reservation strategy in a mixed traffic environment. The cell assignment and reservation method is akin to the operations systems associated with air traffic control used to coordinate aircraft landing. The proposed method identifies the cross collision points (CCP) in a 4-way road intersection and develops an optimisation strategy to assign vehicles to the CCP sequentially and efficiently. The traffic flow efficiency uses a hybrid Gipps car-following model to describe the 2-dimensional traffic behaviour involved in a mixed traffic system. Although many traffic control strategies exist, the car-following model has shown to be very effective for optimal traffic flow performance. The main challenge with the car-following model is that it only controls traffic in the longitudinal pattern, which is not suitable for describing mixed traffic behaviour.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
K. Arai (Ed.): SAI 2022, LNNS 508, pp. 67–94, 2022. https://doi.org/10.1007/978-3-031-10467-1_4

Keywords: Mixed traffic · Driving behaviour · Vehicle communication control parameters · Machine learning · Cell reservation

1 Introduction

Overview: This paper presents a systematic review of road traffic flow control strategies based on traffic theories. It also looks at the fundamental impact of driving behaviour on traffic flow parameters, with emphasis on mixed-traffic management at road intersections. The theoretical introduction to traffic management, traffic rules, and regulation is presented in Sect. 1, which also introduces the relevant traffic terms and concepts. Section 2 reviews the state of the art in traffic and mixed-traffic intersection management, covering the different types and means used in managing road traffic at intersections. Section 2.1 introduces intelligent transportation systems, covering the history of intelligent transportation and autonomous intersections. The transition from human-driven to autonomous vehicle technology is covered in Sect. 3, with details of the stages of the vehicle autonomy process. The classification matrix of the related works is presented in Table 1, covering means of vehicle communication and mixed-traffic management approaches. Table 2 covers some key intersection performance indicators, such as efficiency, fairness in traffic scheduling, safety, and scalability, for each of the traffic control approaches. A summary of the pros and cons of the approaches is also captured in this section. The research gap is discussed in Sect. 5 with a justification for the proposed strategy, while the summary is covered in Sect. 6.

The proposed integration of autonomous and human-driven vehicles is associated with many challenges, ranging from how AVs and HVs will co-exist harmoniously in an enhanced Gipps car-following model, to the implementation of road technologies to support the co-existence, to addressing the control communication barriers between the vehicle types, social acceptability, and many more. The efficient use of the existing road infrastructure for novel intersection control management is the feasible solution for cities where road redesign, expansion, and additional construction are deemed challenging. Generally, innovative traffic management aims to improve the traffic flow system by integrating modern technology and management strategies to develop a robust traffic management scheme that prevents traffic collisions/accidents and creates a seamless traffic flow. The increase in population and number of vehicles without a corresponding increase in road infrastructure leads to the worsening traffic control status of most cities. Road traffic management involves using predefined rules to organise, predict, arrange, guide, and manage road users, both stopped and moving traffic. Road traffic includes pedestrians, bicycles, and vehicles of all types. By default, the traffic management system is guided by protocols mostly executed by traffic signal lights. The conventional traffic control system uses lights, signals, pedestrian crossings, and signalling


equipment located at the intersection zone to control traffic flow. The traditional traffic management system uses time-based scheduling at road intersections. The innovative traffic scheduling system improves on the idle time associated with time-based management by involving a set of applications, management or command-control, and signalling systems to improve a road intersection's overall traffic performance and safety. Traffic management applications gather complex real-time traffic information (vehicle type, vehicle speeds, in-road and roadside sensors), analyse it, and use it to provide safe and efficient traffic control services for all vehicles using the road facility in real time. Meanwhile, Tesla, Inc., based in Palo Alto, California, has developed electric cars with high-tech autonomous features and has been changing the growing impact of autonomous car integration.

Besides the co-existence of mixed traffic on the roads, traffic risks are predominantly high at road intersections because of the multiple roads and traffic participants that converge from different routes and diverge after crossing the intersection. Conventional vehicles observe road traffic rules to protect drivers' safety and everyone using the road system. At intersections, human drivers are guided by traffic light systems, while driver-less vehicles come with new technology for accessing the road facilities involving vehicle-vehicle and vehicle-infrastructure communication. The mixed-vehicle integration process has to be built on the existing road and traffic control infrastructure meant for human-driven vehicles. These technological infrastructures involve vehicles, road systems, and efficient traffic control strategies such as the car-following model, cruise control, and lane-keeping assist systems, which have been used in the control of human-driven vehicles. Areas of innovation will be based on a hybrid strategy that addresses a mixed traffic scenario while considering the independent behaviour of each vehicle type.

However, for the transition period to the intelligent transportation system, traffic flows which involve the co-existence of automated and manually operated vehicles are unavoidable. In this AV and HV co-existence situation, it will be difficult for the driver of a human-driven vehicle to predict the movements of an autonomous vehicle and vice versa, mainly because they use different communication parameters. This systematic review therefore analyses and classifies the state of the art in mixed traffic management, emphasising the co-existence of human-driven and autonomous vehicles, with a proposal for a hybrid strategy for controlling the mixed vehicles. Based on the review of the current research gap in the mixed traffic system, a proposal is made for an alternative strategy for managing hybrid vehicles at the intersection and supporting the mixed-vehicle integration process. Consideration is based on simulating an efficient and safe traffic management scheme at road intersections to address a combination of autonomous and human-driven vehicles. Autonomous vehicles are defined by several different levels, depending on their capabilities, for example, levels of human control. Inter-vehicle communication (IVC) and road-vehicle communication (RVC) are technologies that help drivers perceive the surrounding traffic situation and guide the safety navigation process. Additionally, the collision points at the road intersection are identified and used to assign vehicles


crossing the intersection sequentially. Also, a safe-distance model helps drivers maintain a safe distance from the cars ahead by automatically adjusting the vehicle's speed. A cooperative ITS combines the mixed driving behaviour functions and enables collaboration between the different vehicle types and their technologies, but it currently only works among different autonomous vehicles. The introduction of new technology is not usually automatic; new technologies gradually replace the current human-driven vehicle technology. There is an obvious need to integrate driver-less vehicle movement parameters with human-driven vehicles to midwife the smooth transition to a fully intelligent transportation system. This mixed-vehicle integration is necessary because the conventional vehicles currently occupying the road cannot simply be phased out soon, even considering the enormous advantages of autonomous vehicles.

Traffic conditions are usually evaluated from an assessment of traffic characteristics, utilising several methods that are typically cleaved into data-driven, model-based, or both. In summary, the distinct traffic flow modelling methodologies include microscopic, macroscopic, and mesoscopic flow models. Microscopic traffic models give a detailed account of an individual vehicle's motion. For macroscopic traffic models, traffic group conditions are represented using aggregated behaviour, generally concerning mean speed and mean density over a specified period or an observation distance. Mesoscopic models employ both microscopic and macroscopic approaches by utilising varying levels of detail to model traffic behaviour: some road locations are modelled with aggregated measurements as in macroscopic models, and the remaining locations are modelled down to the details of individual vehicles as in microscopic models. In most cases, modelling traffic at the macroscopic level is adequate to generate a sustainable mixed-traffic model, because it offers alternatives for most experimental purposes such as traffic control/management, road intersection cell reservation, and road infrastructure model alternatives.

1.1 Classification of Traffic Control Means

Based on the current state of the art in vehicle technology and road traffic management for human-driven and driver-less vehicles, there are two main approaches to controlling traffic flow within an intersection:

– Traffic signal light: HVs use traffic lights in their control process. This consists of the installation of signal lights that control traffic streams using different light indicators. The technology controls traffic statically and dynamically. Its primary aim is to prevent the simultaneous movement of two or more incompatible traffic streams by assigning and cancelling the right-of-way for a particular traffic stream. The right-of-way assignment is performed by showing different signal indicators to a stream of traffic, which is done by convention:
  • Green light = allow passage of cars
  • Amber = get ready to move or to stop
  • Red = passage forbidden

Road Intersection Coordination Scheme for Mixed Traffic

71

The duration of the amber, red, and red–amber intervals in some countries is determined by traffic regulations; it is most frequently specified as 3 s for the amber and 2 s for the red–amber indication (a fixed-time sketch is given at the end of this subsection).

– V2V and V2I communication: This is for AVs and involves a traffic intersection control scheme without lights. In this case, an autonomous or semi-autonomous vehicle accesses an intersection using vehicle-to-vehicle (V2V) or vehicle-to-infrastructure (V2I) communication to control vehicles smoothly.

Investigation response to the research questions: the following findings were made from the primary studies concerning the research questions.

Question 1: How do human drivers' and autonomous vehicles' behavioural parameters co-exist? The studies revealed the different components of human driving and autonomous driving systems. While [19,35,36,39] were interested in safety, [16,33] worked primarily on describing mixed traffic behaviour.

Question 2: How can the human driver's behaviour be predicted? The studies show the component features considered in predicting the human driver's attitude on the road; [19,35,36,39] are in agreement about the factors to be considered in predicting behaviour in mixed traffic.

Question 3: What is the traffic flow performance when the cross-collision-avoidance traffic control method is applied? The different mechanisms adopted for this were treated by [30,31,39], which described these as complex, heterogeneous backgrounds.
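As an illustration of the fixed-time signal convention above, the sketch below cycles through the phases of a single traffic stream. Only the 3 s amber and 2 s red–amber values come from the text; the 30 s green and red durations are assumptions chosen for the example.

```python
import itertools

# Phase plan for one traffic stream. Amber (3 s) and red-amber (2 s) follow the
# convention quoted in the text; the green and red durations are assumed values.
PHASE_PLAN = [("green", 30), ("amber", 3), ("red", 30), ("red-amber", 2)]


def fixed_time_signal():
    """Yield (phase, duration) pairs forever, independent of traffic demand."""
    yield from itertools.cycle(PHASE_PLAN)


signal = fixed_time_signal()
print([next(signal) for _ in range(5)])
# [('green', 30), ('amber', 3), ('red', 30), ('red-amber', 2), ('green', 30)]
```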

2 Review of Related Literature

The earliest global traffic signal control system was established outside the Houses of Parliament in Britain on 10 December 1868 [58]. The system was operated manually with semaphores, controlling traffic by alternating the right of way between traffic streams at a fixed time interval. The few existing traffic flow models [21,53] used to model mixed traffic involve different types of human-driven vehicles, pedestrians, and cyclists, whose behavioural patterns are heterogeneous, but they were implemented based on the concepts of homogeneous traffic model strategies. An Intelligent Traffic System (ITS) application aims to provide innovative services by combining traffic control strategies with communication technologies for a seamless and optimal traffic flow. Figure 1 represents the features of a fully intelligent transportation system involving all traffic participants (vehicles, cyclists, pedestrians, and animals such as dogs), with seamless communication between all the participants. The ITS features provide a communication platform for all road users, ranging from communication among traffic and communication between traffic and road infrastructure to traveller information and, most importantly, improved traffic safety. The primary measurement parameters for an efficient road traffic management system are:

– Good driving experience,
– Reduction in commuting/travel time,
– Congestion reduction,
– Traffic efficiency improvement,
– Fuel consumption reduction,
– Accident reduction, and
– Pollution reduction.

An early approach to automation in vehicles started with the Automated Highway System (AHS) [6,33,50,80]. This review focuses on the impact of autonomous vehicle integration on road intersection capacity utilisation and flow efficiency in a mixed traffic scenario of AVs and HVs. The advent of automated vehicles led to the birth of vehicle-to-vehicle and vehicle-to-infrastructure communication, which inadvertently led to road intersection control management without traffic lights, yet with smooth and efficient traffic flows and reasonable safety measures. Automated vehicles (AVs) have shown the capacity to improve the safety and efficiency of traffic flow through their environmental awareness, reducing and mitigating traffic accidents in real time with a seamless flow of traffic and suitable safety measures [1,47,67,68]. However, according to [7], the road's capacity can be increased by increasing the cooperation level between vehicles when their behaviours are homogeneous, and this feature could be extended to a heterogeneous traffic system by improving the cooperation level between AVs and HVs. Improving the cooperation level between AVs and HVs makes this study of the traffic mix more complex, considering the underlying difference in the behaviour of the two car categories. Moreover, the simulation results from the study in [34] show that, when automated (AVs) and human-driven (manually controlled) vehicles are mixed, the road capacity starts decreasing compared with homogeneous traffic. [34] states that the road capacity of mixed traffic could increase by 2.5 times when the percentage of automated vehicles is more than 70%.

2.1 Intelligent Transportation System

An intelligent transportation system is an innovative traffic control management application that guarantees an efficient traffic flow system with better informed, safer, more coordinated, and more creative use of traffic information and infrastructure. It is an economically optimised solution to general traffic problems. ITS employs traffic and road infrastructure technologies to reduce congestion by monitoring traffic flow performance using sensors, cameras, or mobile phone data analysis and rerouting traffic through navigational devices as the need arises. The advent of Intelligent Transportation Systems (ITS) in recent decades has resulted in a dramatic change in traffic management. ITS has changed the approach to traffic planning, monitoring, management/control, and throughput enhancement. Intelligent transport systems are compatible with modern vehicles as they use state-of-the-art communication devices (electronics, navigation) and data analysis technologies to enhance the throughput of the existing


road traffic system. This review aims to investigate the measures to be taken in integrating AVs and HVs using ITS to benefit HV traffic behaviour in the following aspects: safety, throughput, comfort, fuel reduction, and the reduction of other unfavourable environmental effects.

Fig. 1. Intelligent transportation involving traffic, pedestrians, and animals

Models with intelligent transportation features like Advanced Traveller Information Systems (ATIS) [49] and Advanced Traffic Management Systems (ATMS) [71] provide travellers with real-time information for travel decision-making purposes, such as the shortest path to a destination and traffic congestion measures, respectively. The ITS, which is classified into three categories (mobility, safety, and environmental), will drive the integration process of AVs and HVs. Other ITS model applications include:

– smart city traffic systems
– vehicle navigation equipment (satnav)
– vehicle cruise control systems, and
– platooning.

The popularity of autonomous vehicles is increasing, and it is highly expected that the coexistence of AVs and HVs will persist as a part of the intelligent transportation system (ITS) for many decades. The traffic coexistence of AVs and HVs


will benefit HVs in a mixed-traffic environment by enhancing the performance parameters of AVs (such as shortening the AV inter-vehicle distance), thereby improving HV throughput and safety in a mixed environment. Traffic congestion in most cities has been growing rapidly, with universal mobility pressure and safety issues. Although the volume of traffic is on the rise, constructing new roads is constrained by meagre public funds and deep environmental concerns. Robust traffic control management and traveller services are essential to enhance the efficiency of the existing road infrastructure and improve the quality of service in a mixed-vehicle environment without constructing additional road capacity separately for AVs and HVs. An Intelligent Transportation System (ITS) requires a specific traffic environment and behaviour for productive traffic observation and management.

Autonomous Vehicle (AV)
An autonomous vehicle senses and observes its surrounding environment and takes informed decisions based on its aim, target/destination, and the surrounding environment for its safety [44]. A fully autonomous vehicle does not need direct human intervention to achieve its movement objectives when in motion. Autonomous vehicles are intelligence-based vehicles [51,63] that control themselves with electronic devices: ultrasonic sensors, radars, and video cameras. Some of the advantages of autonomous vehicles are a significant increase in road safety, which reduces traffic deaths, harmful emissions, and travel time, and improved fuel economy. Besides, autonomous vehicles eliminate stop-and-go traffic waves and increase lane capacity. The communication features of the autonomous vehicle create a potential platform for the application of seamless and highly safe traffic management approaches.

The actual reality of autonomous vehicles is yet to appear after years of confidence from the information technology and car technology industries. In 2015, BMW launched a self-driving prototype car on the autobahn [43], with the promise that by 2020 entirely unaided self-driving vehicles would be in everyday use; unfortunately, by the last month of 2020 this dream had not become reality because of the challenges associated with the AV and HV integration process. In 2019, Musk claimed that a global fleet of one million self-driving Teslas would be in place by 2020 [39]. These Tesla robotaxi-like cars would earn their owners money while they sleep or are on holiday. This projection has not been realised as of today because of the challenges involved in the autonomous vehicle integration process for a seamless AV co-existence with conventional vehicles (HVs). Besides, Waymo asserted in 2018 [9] that its fleet of 20,000 Jaguar I-Pace electric cars would soon offer up to one million autonomous trips per day. However, it is not very likely that December 2021 is feasible for the realisation of full fleets of autonomous vehicles that can take us from home to the shops or the workplace and extend self-driving to cover everyday activities. The full emergence of fully autonomous vehicles on the road is hindered by the following significant challenges: co-existence with the human-driven vehicle and effective sensors for seeing the environment around them and


detecting objects such as pedestrians, other vehicles, and road signs. The problems surrounding the current human-driven vehicle road system and its co-existence with autonomous vehicles could be addressed with machine learning applications for safe behaviour. Beyond the above challenges, there remain the challenges associated with autonomous vehicles' regulation and social acceptability.

Human-Driven Vehicle (HV)
A human-driven vehicle has a human being at the wheel who controls the vehicle's full kinetic operations based on human perception. Human drivers' behaviour is unpredictable and associated with delays in making driving decisions. Autonomous vehicles' behaviour, in contrast, is in sync with intelligent driving systems, where vehicles sense the environment and create driving decisions in real time based on the current traffic environment status, which serves as input to the system. The traffic flow theories relate to different traffic modelling approaches, namely microscopic, macroscopic, and mesoscopic traffic flow models. This is followed by a categorisation matrix of the different traffic management schemes with their pros and cons, which can be seen in Tables 2 and 3. Traffic management parameters involve a set of applications and management tools to enhance the efficiency of road transportation systems, all-inclusive traffic control, effectiveness, and security. An efficient traffic light signal system should regularly maximise the available intersection space by adjusting traffic control and coordinating traffic parameters based on the vehicles present. This consists of the installation of signal lights that control traffic streams using different light indicators, whose primary aim is to prevent the simultaneous movement of two or more incompatible traffic schedules by assigning and cancelling the right-of-way to a set of traffic schedules [36,60,62,76,79].

Most research studies conducted for mixed traffic systems [11] assume an environment of different vehicle sizes and a mix of vehicles and human beings, while very few researchers have investigated a combination of automatically and manually operated vehicles. The contribution of [31], whose work concentrated on fully mixed environments of human-driven and autonomous vehicles but focused on straight roads, investigated and quantified driver behaviour changes due to the spread of autonomous vehicles. This review paper focuses on a hybrid intersection that combines different traffic control strategies and clarifies how to cope with safety and efficient traffic flow in a mixed environment. Wakui et al. [81] proposed a reduction in the time it takes for a vehicle to pass through an intersection using IVC and RVC technology through a collision avoidance model. However, their experiments assumed that the vehicles passing through the intersection were only autonomous vehicles implementing ITS functions, within a mixed environment of human beings crossing the road. Guni Sharon et al. [72] also proposed an intersection entry scheme using a traffic signal that has sensing technology to detect non-autonomous vehicles, as well as technology that communicates with autonomous vehicles.

3 Transition from Human-Driven to Autonomous Vehicle Technology

The rate of autonomous vehicle emergence appears to be increasing, with a glaring impact on human-driven vehicle performance. Modern vehicles are designed with some autonomous attributes, such as adaptive cruise and electronic stability control systems. Vehicle autonomy is a stage-wise process with a baseline in the human-driving system and subsequent enhancements to address the human-driving system's challenges. According to [16,35], the vehicle automation process has been divided into levels based on the degree of human assistance. The vehicle autonomy levels represent a gradual automation enhancement from the human-driving system to a fully autonomous driving system. There are six stages of vehicle autonomy:

Level 0 – No Automation. This stage is the traditional driving system where a human being is responsible for absolute vehicle control. At this level, there is 100% human control of the vehicle. Human drivers handle the vehicle's motion (acceleration and deceleration), steering control, and the response of safety intervention systems.

Level 1 – Driver Assistance. Here, the human driver is assisted with the task of controlling the vehicle's speed via cruise control and its position through lane guidance. The human driver must be active, observe roads and vehicles at all times, and take control when the need arises. The human driver is responsible for controlling the vehicle's steering wheel and the brake/throttle pedals. At this level of automation, the steering and pedal control of the vehicle is done by a human being. For example, the vehicle adaptive cruise control and parking assistant systems belong to this level of automation.

Level 2 – Partial Self-Driving. At this automation level, the computer is designed to control the vehicle's speed and lane position in some defined or secluded environments. The driver may disengage from the steering control and pedals at this level but is expected to observe navigation and to assist in the vehicle control if the need arises. The control of the vehicle at this level is fully automated in a particular environment. This level of automation provides the driver with options to intervene automatically in controlling both the pedals and the steering wheel at the same time if necessary.

Level 3 – Limited Self-Driving. This level is the beginning of the transition from complete driver engagement to fully independent control of vehicles in some secluded environments. It involves comprehensively monitoring the vehicle's motion along the road and then triggering for the driver's assistance as the need arises. When a vehicle is in self-control mode, the driver does not need to monitor road and traffic navigation but must be ready to take control when required. This stage is associated with the risk of safety liability for incidents. At this critical automation level, the vehicle has a specific model to take driving charge


in certain conditions, but the driver must take control back when the system requests it. The driver's attention is highly needed, as the vehicle can on its own make lane changes and event-response decisions and uses the human driver as a backup in a high-risk environment.

Level 4 – Full Self-Driving Under Certain Conditions. This level involves complete vehicle control with or without a human driver in certain situations or environments. An example of such a condition is urban ride-sharing. The driver's role, if present, is to provide the destination of the vehicle. This level is safer than Level 3, as the vehicle has complete control of itself in a suitable or isolated/controlled environment without any request for the driver's intervention. The vehicle takes care of its own safety challenges at this level.

Level 5 – Full Self-Driving Under All Conditions. This is the destination of vehicle automation, where vehicles operate absolutely on their own. At this level, human intervention is not needed, as the vehicle drives itself. This is a full automation stage without any human intervention. This level of full vehicle autonomy goes with state-of-the-art environment control protocols, advanced detection devices, and vision response, and uses real-time obstacle position measurements for guidance and safety purposes.
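For readers who prefer a compact reference, the six levels described above can be summarised as a simple lookup structure; this sketch is an illustration added here, not part of the original text.

```python
from enum import IntEnum


class AutomationLevel(IntEnum):
    """Automation levels as summarised in the text above (illustrative names)."""
    NO_AUTOMATION     = 0  # human controls everything
    DRIVER_ASSISTANCE = 1  # cruise control / lane guidance, human in charge
    PARTIAL           = 2  # speed and lane position automated in defined settings
    LIMITED           = 3  # self-driving with the human as requested fallback
    FULL_CONDITIONAL  = 4  # full control in certain environments only
    FULL              = 5  # full control under all conditions


print(AutomationLevel.LIMITED.name, AutomationLevel.LIMITED.value)  # LIMITED 3
```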

4 Autonomous Intersection Management

The emergence of autonomous driving systems led to the advent of autonomous road intersection management systems. In most cases, an autonomous road intersection does not make use of traffic light control because it is assumed that all the AVs make use of sensors. For an intersection to be autonomous, it must be equipped with sensors, roadside communication units, and other intelligent transportation system devices. In the proposed intersection control scheme, conventional vehicles use a traffic light signal system, while autonomous vehicles access road facilities via wireless communication platforms: vehicle-vehicle and vehicle-infrastructure communication. Human-driven vehicles involve only driver-to-road-infrastructure communication (one-way communication), while autonomous vehicles are equipped with intelligent navigational collision-avoidance features and a two-way communication system. The deployment of new technologies is usually a gradual process, with high-risk factors being considered. The latest technology will gradually replace the current technology; integrating autonomous vehicle movement parameters with human-driven vehicles to midwife the smooth transition to a fully automated or intelligent city is necessary for society to enjoy the full benefit of AVs. [12,32,61] suggest that autonomous vehicles have a very high prospect of increasing traffic efficiency by reducing traffic congestion through improved cooperation among vehicles. Also, AVs can enhance the efficiency of intersection capacity, enhance the safety margins in a car-following/platoon model, and


Research in autonomous vehicles and their integration has attracted researchers' attention for some time because of the increasing population, existing traffic congestion challenges, urbanisation, and the enormous advantages of autonomous vehicles.

Classification of Means of Traffic Control

Consideration is based on the two principal means of traffic control at an intersection: communication and flow management. Based on state-of-the-art in-vehicle technology and road traffic management strategies as they apply to human-driven and autonomous vehicles, the means of traffic control communication are listed below:

– Traffic Light Control: Most nations adopt two types of traffic light control processes, the fixed-time and the dynamic/event-driven control modes, which differ in control flexibility and coordination capability.
  • Fixed-time scheduling systems: the controller is configured to switch between the different road segments, cycling through the three lights sequentially after a given period.
  • Event-driven (dynamic) scheduling systems: these are more appropriate for dense traffic and schedule the lights based on queue length, vehicle arrival sequence, or the traffic density from each trajectory.
– Connected Vehicle Control: This traffic control process involves vehicles that communicate with each other (V2V) or with roadside units (V2I). The connected-vehicle control process schedules traffic at road intersections without traffic lights. These control measures allow the traffic agent to access vehicle location and trajectory information, which is used in analysing and managing services such as collision prevention, traffic lane maintenance, and other traffic control measures; this line of work is based on Gipps' collision-avoidance model. In this case, autonomous or semi-autonomous vehicles access intersections using vehicle-to-vehicle (V2V) or vehicle-to-infrastructure (V2I) communication [8,22,37,57,78].

Efficient urban traffic management and control frameworks are usually designed following centralised or decentralised approaches.

– Centralised Approach. This approach has in common at least one scheduling component shared by all the road segments. It can also drive the traffic lights and at the same time incorporate vehicle-to-infrastructure (V2I) or vehicle-to-vehicle communication. In some instances, an intersection agent (IA), upon receipt of requests from vehicles to cross the intersection, schedules them and determines the best crossing sequence, as proposed by [17]. In this traffic control strategy, as reflected in Table 1, at least one factor of the traffic scheduling characteristics is centrally decided for all vehicles in the scheme through a coordination unit; when a significant decision is made centrally for at least one of the factors, the approach is called centralised [5,48,56,70].


– Decentralised Approach. Instead of using traffic lights or a central manager, the decentralised solution relies on vehicle-to-vehicle (V2V) coordination, enabling vehicles to cross an intersection without a central unit anticipating their potential trajectories. In this category (Table 2), all vehicles are handled as autonomous agents but use interaction (vehicle-to-vehicle and vehicle-to-infrastructure) to maximise their communication and control efficiency. In this case, the individual agents (vehicles) receive information from other vehicles and/or roadside infrastructure to enhance performance criteria such as safety, efficiency, and travel time before accessing the intersection [15,18,20,24,42,83]. The control methods in this category, their underpinning technologies, and the evaluation of their performance matrix are shown in Table 2.

Classification Matrix for the Different Traffic Control Measures

In general, homogeneous and heterogeneous traffic control strategies were reviewed based on their compatibility with a mixed traffic environment. The classification assessment aims to identify the traffic control features that could benefit the co-existence of AVs and HVs. Each column header of the classification matrix describes a performance aspect of the various methods and identifies which characteristics have to be balanced. Tables 1 and 2 present a detailed picture of the components to consider in developing a robust hybrid-based system that is safe, high-performing, low-cost, scalable, and adaptable. The classification categories are based on the following criteria:

– Method: This involves the underpinning features of the traffic management strategy, consisting of the systematic planning, design, control, implementation, observation, measurement, formulation, testing, and modification of the traffic management system to solve a complex traffic problem. Most traffic control methods involve direct communication between traffic and road infrastructure, such as signs, signals, and pavement markings. The primary objective of any traffic control system is to guarantee safety and optimised traffic flow; the method is the control strategy used to orchestrate the flow, for example deciding which car may drive and which must wait.
– Vehicle Type: Vehicle type refers to the category of vehicle driving system, human-driven or autonomous. This component distinguishes the two vehicle categories, autonomous vehicles (AV) and human-driven vehicles (HV). The vehicle classification is based on communication capability with the intersection control unit, while all vehicles are assumed to have the same physical dimensions.
– Performance Index (PI): This is a measure of traffic flow efficiency, where + and ++ mean good and best performance, respectively. Every intersection control model has a Performance Index (PI) that indicates the overall efficiency of the vehicle control method. The traffic control efficiency is measured based on the delay associated with traffic flow. Performance measurement and monitoring significantly impact the design, implementation, and management of traffic control models and, to a large extent, contribute to the identification, comparison, and assessment of alternative traffic management strategies.


– Means of Communication: These are the channels that vehicles and roadside devices use to send signals or messages to each other at the road intersection. Traffic light signals and vehicle-to-vehicle (V2V) or vehicle-to-infrastructure (V2I) links are the means of vehicle control communication; for AVs, data are transmitted wirelessly between vehicles (V2V) and between vehicles and infrastructure (V2I), while the traffic light signal serves the human-driven vehicles. Road intersection vehicular communication systems are networks in which vehicles and roadside units communicate for free and safe traffic flow. The communicating devices (vehicles/drivers and roadside devices) provide each other with traffic information such as expected arrival time, speed, position, and direction, which is effective for collision avoidance and congestion management.
– Fairness: In the intersection management context, fairness is the impartial and just treatment of traffic without favouritism or discrimination. The fairness metric used here is waiting time, which is used in traffic network engineering to determine whether traffic participants are treated fairly while considering traffic efficiency. Fairness towards traffic requests at the intersection is based on a classification algorithm using first-arrival order and queue lengths. This feature takes care of the waiting time among vehicles: the "FIFO" principle is obeyed at the intersection unless there is a priority request from an emergency vehicle (a minimal sketch of such a policy is given after this list).
– Safety: The road traffic safety metric refers to the approaches and strategies applied to prevent traffic collisions or road accidents at the intersection. Every traffic management solution usually defines the potential collision areas before making optimal decisions about which countermeasures to use, and when, to fix intersection safety issues. This criterion deals with the efficiency of the control system in preventing vehicle collisions or accidents. Although no system is ideal once human error is considered, health and safety issues are paramount in traffic management methods.
– Scalability: "Scaling a road intersection" means increasing the number of roads that join at an intersection, or increasing the size of a road-segment network or the number of intersections that make up the network. Scalability estimates a system's potential to vary the road infrastructure, such as road size and the number of lanes, and it is related to both efficiency and cost in response to changes in application and processing demands. For a new system to stand the test of time, it must have the capability to be expanded to address more complex traffic control challenges and scenarios with different road network types and sizes.


This scalability component addresses the following questions: is the new system robust enough to be applied to other traffic intersection management problems, and what is the risk factor involved in applying it to more complex intersections?
– Cost: The cost component can be quantified with a variable. In analysing the design and deployment cost of the different traffic control methods, the comparison can be based on any of the following cost metrics:
  • Initial project capital cost: the cost of implementation and deployment, which covers the total cost of preliminary design and analysis of the method, right-of-way, utilities, and construction.
  • Operation and maintenance cost: the ongoing cost associated with the intersection throughout its design life. According to [69], the average annual cost of lighting an intersection, including maintenance and power supply, is about $750 in EU nations.
  • Delay cost: according to the Texas Transportation Institute's 2012 Urban Mobility Report [69], the cost of an hour of vehicle delay at a road intersection is $16.79. This report quantifies the amount of congestion in cities across the US and provides many cost-related impacts of congestion.
  • Safety cost: the computation of the expected number of collisions that may be associated with each control method. This component looks at how safe the strategy is and the cost of the risk factors associated with it.
– Complexity: The design and implementation of a road intersection range from a simple junction of two roads to the convergence of several high-volume multi-lane road networks. The management effort of an intersection is directly proportional to its complexity: the more complex an intersection is, the more expensive it is to maintain or manage. Complexity also describes the time needed to execute a traffic scheduling algorithm; it therefore deals with how difficult the traffic control is to implement in real time and how errors are resolved.

Tables 1 and 2 show the classification matrix used to quantify the quality of each traffic control feature with respect to traffic management strategy and efficiency. The signs 0, +, −, ++, and −− denote, respectively, no impact, positive impact, adverse impact, significant positive impact, and major negative impact. A detailed pros-and-cons matrix of each reviewed traffic management method is analysed in Tables 1 and 2.
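To make the fairness criterion concrete, the sketch below implements the FIFO-with-emergency-priority rule described above and computes mean waiting time as the fairness metric; the function name, message format, and crossing time are illustrative assumptions rather than part of any reviewed scheme.

```python
import heapq

def schedule_crossings(requests, service_time=2.0):
    """requests: list of (arrival_time, vehicle_id, is_emergency).
    Grants the intersection in FIFO order, pre-empted only by emergency vehicles,
    and returns (crossing order, mean waiting time) as a simple fairness metric."""
    # Heap key: emergencies first (0 before 1), then first arrival.
    heap = [((0 if emergency else 1, arrival), arrival, vid)
            for arrival, vid, emergency in requests]
    heapq.heapify(heap)
    order, waits, clock = [], [], 0.0
    while heap:
        _, arrival, vid = heapq.heappop(heap)
        start = max(clock, arrival)          # cannot serve a vehicle before it arrives
        waits.append(start - arrival)        # waiting time experienced at the stop line
        order.append(vid)
        clock = start + service_time         # intersection occupied while the vehicle crosses
    return order, sum(waits) / len(waits)

# Example: an ambulance arriving later still crosses ahead of the queued vehicles.
if __name__ == "__main__":
    demo = [(0.0, "HV-1", False), (1.0, "AV-2", False),
            (1.5, "AMBULANCE", True), (2.0, "AV-3", False)]
    print(schedule_crossings(demo))
```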


Table 1. Categorisation based on centralised intersection control (columns: Method, Vehicle type, Communication, Performance, Fairness, Safety, Scalability, Cost, Complexity). Methods compared include the cooperative eco-driving model, fuzzy-based control, automatic AV merge control, vehicle platooning, cooperative AV adaptive cruise control, game-theory-based intersection control, a genetic algorithm, the optimisation (CVIC) approach, model predictive control (MPC), multi-agent control, safe velocity and acceleration, and buffer-assignment-based coordination; vehicle types (AV, HV, or both) and communication means (traffic signal, V2V, V2I) are listed per method, and each criterion is rated on the 0/+/−/++/−− scale.

Table 2. Categorisation based on decentralised intersection control (columns: Method, Vehicle type, Communication, Performance, Fairness, Safety, Scalability, Cost, Complexity). Methods compared include job scheduling, optimisation in a connected-vehicle environment, marginal-gap intersection crossing, merge control using virtual vehicles to map lanes, autonomous agent-based scheduling, virtual platooning, space-time cell reservation, and our approach (a space-time cell scheme for mixed HV and AV traffic using signal, V2I and V2V communication); vehicle types and communication means are listed per method, and each criterion is rated on the same 0/+/−/++/−− scale.


The State of the Art in Mixed Traffic Management

Currently, there is a large diversity of research on mixed traffic in general, mostly directed towards different traffic participants: human-driven vehicles (cars, buses, trucks), motorcycles, bicycles, and pedestrians. This work exposes very few design details for a mix of HVs and AVs. To fully embrace the emergence of AVs alongside HVs, research into the co-existence of human-driven and autonomous vehicles is necessary to provide the enabling environment needed for the integration process. Generally, state-of-the-art conventional human-driven traffic management applications have been implemented using event-driven traffic light control. However, these methods show drawbacks in the level of traffic cooperation, throughput, and safety when implemented in a mixed scenario, because of the complexities involved in combined AV and HV behaviour. The emergence of autonomous vehicles has moved the human driver's role from active control to a passive supervisory role: a closer look at modern road vehicles reveals a high level of automation of most vehicle functions, such as adaptive cruise control, obstacle manoeuvres, and automated braking systems, to mention but a few. [84] proposed a receding-horizon model predictive control (MPC) with dynamic platoon splitting and integration rules for AVs and HVs, which largely eases the trajectory problem and prevents shock waves but does not concurrently optimise the trajectory and the signal timing of the road intersection. Several traffic management studies [10,19,25,46,52] investigated the impact of integrating AVs on existing roads to co-exist with HVs, but the performance efficiency of most of the models is below average because of the cooperation levels between the vehicle types. The heterogeneity of driving behaviour and of the vehicle communication parameters naturally degrades the level of cooperation between the two vehicle types, thereby drastically reducing traffic flow efficiency. How will a mix of vehicles with different behaviours work in a mixed traffic system while maintaining the full characteristics of each vehicle at a reasonable traffic efficiency? The current research on mixed traffic [10,19,25,46,52] looked critically at highway road systems using the following three main traffic flow components:

– Vehicle characteristics
– Driving behaviour
– Road system characteristics

This work is interesting, but it is restricted to the microscopic level, which will give different results at the macroscopic level when many vehicles are involved. In a mixed-traffic system, microscopic models treat each vehicle as a kind of particle. The interactions among cars are modelled through simulation, with each component of the proposed framework verified; each car type is modelled together with the car-road interaction protocol implemented in the proposed mixed-traffic framework, and the framework is verified through simulations of 3-way and 4-way intersection environments with a detailed assessment of the impact of each vehicle type. The critical challenge in agent-based traffic simulation is recreating realistic traffic flow at both the macro and micro levels.


Seeing traffic flows as emergent phenomena, [59] proposed a multi-agent-based traffic simulator, because drivers' behaviour is a crucial factor that gives rise to traffic congestion. According to [26], car agents' behaviours are often implemented by applying car-following theories using a continuous one-dimensional road model. [30] proposed a multilevel agent-based model composed of micro-meso, micro-macro, and meso-macro simulation frameworks to address large-scale mixed road traffic using an organisational modelling approach. The multiple-leader car-following model involves a heterogeneous mixture of vehicle types that lack lane discipline; according to [13,54,55], these traffic conditions lead to complex driving manoeuvres that combine lateral and longitudinal vehicle motion and therefore require multiple-leader following to be addressed. [28,55] sought to simplify mixed traffic modelling by developing a control technique based on the concept of virtual lane shifts, centred on identifying significant lateral changes as a signal of a lane-changing situation. Because vehicle behaviour is not homogeneous, the following driver's behaviour is not necessarily influenced by a single leader but depends mainly on the type of the vehicle in front. Connected Vehicle Communication: Vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communications are both possible in a connected vehicle system [38]. Cooperative Adaptive Cruise Control (CACC) systems can safely drive vehicles with very short headways by forming platoons to increase road traffic flow capacity [60,65,79]. The advanced technologies of connected and automated vehicles open up a world of possibilities for novel traffic flow management approaches, such as CACC, speed harmonisation, and signal control, to name a few. With much room for improvement in traffic safety, quality, and environmental sustainability, the intersection coordination scheme has attracted broad research interest [14,23,27,41,42,73,82,85]. For several years, the idea of following a vehicle with a short gap in CACC has been generalised to provide a new and efficient intersection control model, in which nearly conflicting vehicles approaching from different directions cross the intersection with marginal gaps without using a traffic signal. Optimising the level of cooperation between vehicles will enable automated vehicles to reach their maximum potential to reduce traffic congestion, reduce travel time, and increase intersection capacity. Omae et al. [60] suggested a virtual platooning system for automated vehicle control at an intersection that allows vehicles to pass through without pausing, but this approach is not feasible in a mixed environment because of the presence of HVs. Vehicles in both lanes are deemed to be in a virtual-lane situation, and their intersection interference is taken into account; they are managed separately so that they can safely follow the leading vehicle of the platoon. The system, tested using four electric vehicles fitted with automated driving and V2V communication technologies at a one-way intersection, resulted in a significant reduction in traffic congestion.
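As a concrete illustration of the short-headway platooning idea behind CACC discussed above, the sketch below implements a generic constant-time-gap spacing controller; the gains and parameter values are illustrative assumptions, not those of any cited system.

```python
def cacc_acceleration(gap, ego_speed, lead_speed, lead_accel,
                      time_gap=0.6, standstill=2.0, kp=0.45, kd=0.25, kff=1.0):
    """Generic constant-time-gap CACC law (a sketch, not a cited controller).

    gap        : bumper-to-bumper distance to the preceding vehicle [m]
    ego_speed  : speed of the following vehicle [m/s]
    lead_speed : speed of the preceding vehicle [m/s]
    lead_accel : acceleration of the preceding vehicle received over V2V [m/s^2]
    """
    desired_gap = standstill + time_gap * ego_speed   # spacing policy (short headway)
    spacing_error = gap - desired_gap
    speed_error = lead_speed - ego_speed
    # Feedback on spacing and relative speed, plus a V2V feed-forward term
    # that lets the platoon react to the leader's braking almost immediately.
    return kp * spacing_error + kd * speed_error + kff * lead_accel

if __name__ == "__main__":
    # Leader brakes at -2 m/s^2; the follower starts slowing before the gap shrinks much.
    print(round(cacc_acceleration(gap=14.0, ego_speed=20.0,
                                  lead_speed=20.0, lead_accel=-2.0), 2))
```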

4.1 Review Strategy

This systematic review of road intersection coordination schemes for hybrid traffic (human-driven and autonomous vehicles) is based on the guidelines below, with the data sources reflected in Table 3.

Table 3. List of data sources

Source type       Name of database
Online databases  IEEEXplore, Springer, ACM, ArXiv, DOAJ, PUBMED, DfT
Search engines    Google Scholar, CiteSeerX

4.2 Inclusion and Exclusion Criteria

The review strategy employed inclusion and exclusion criteria for selecting the primary study materials, as reflected in Table 4, which contains the publication source, the category, and the number of materials utilised. The exclusion criteria include:

– Materials with inadequate reference information
– Articles concerned mainly with human-driven traffic scheduling schemes
– Studies that review homogeneous traffic management without addressing mixed traffic scenarios
– Conference papers that have also been published in a journal

The inclusion criteria include the following:

– Articles on general mixed traffic management
– Articles that discuss traffic intersection scheduling schemes
– Journals ranked by the Scientific Journal Ranking (SJR)
– Conferences ranked by the Computing Research Education (CORE) ranking
– Papers presented in the English language

By applying the inclusion and exclusion criteria, 44 studies were selected. Table 4 shows the distribution of the selected materials by publication source.

4.3 Data Extraction and Analysis Based on Traffic Control Parameters

Table 5 gives a snapshot of the types of problem addressed by the primary studies used in this review. From this, the important information can be extracted from the selected studies. The following strategies are adopted:

Table 4. Selected primary studies

Sources         Journal  Conf.  Paper  Selected
IEEEXplore      3        62     62     32
Springer        18       6      24     7
CiteSeerX       1        1      2      1
Google Scholar  2        3      2      2
DfT             2        4      3      2
Total           26       75     93     44

Table 5. Problem solved in primary study

Type of problem     Study
Communication       [2, 6, 7, 8, 9, 12]
Inter-vehicle time  [11, 15, 20, 15, 16, 22]
Entry distance      [1, 34, 26, 28, 29]

– Answer the individual research questions;
– Search for additional information within the study;
– Identify research gaps and provide recommendations for further studies.

The classification matrices in Tables 1 and 2 summarise the traffic control strategies, their performance, and the findings from the primary studies. Each paper is reviewed and analysed to identify the exact problem solved and the strategy used to solve it, including the component of interest and the characteristics of the parameters required. The review also seeks to determine whether the studies considered the impact of the traffic flow control strategies on traffic efficiency, safety, and travel-time constraints. While [3, 4, 7, 9, 1] solved the intersection optimisation and efficiency problem, [7, 10, 11, 40] presented techniques for analysing intersection safety.

5 Research Gap

Based on the current literature [66], there appears to be a wide range of adoptable microscopic simulation models for lane-following homogeneous traffic. Gipps' car-following model's unidirectional (longitudinal) interaction is not suitable for two-dimensional mixed traffic modelling, and the existing mixed-traffic models are unable to describe lateral vehicle interactions using car-following theory because of driver behaviour. The number of vehicle types present and the relationship between the lateral and longitudinal characteristics and vehicle speed play a significant role in managing heterogeneous traffic behaviour in a mix of AVs and HVs. The current literature confirms that there are typical constraints in the car-following model [29], namely its rigidity to the longitudinal vehicle dynamics parameters: safe distance, maximum speed, and acceleration/deceleration rate.


Besides, the recognition and integration of the traffic parameters that control the complex two-dimensional behavioural models of traffic participants are critical tasks for a new research direction. Currently, there is minimal real-time data from studies of mixed AV and HV traffic; the little that is available consists mainly of assumptions based on traffic flow theories and simulation. Most existing traffic models are only suitable for describing a homogeneous traffic environment with disciplined lane behaviour. To solve the mixed-traffic problem effectively, a model should simultaneously describe the lateral and longitudinal behaviours of both AV and HV types at the microscopic level. As a result, an in-depth analysis of lateral and longitudinal vehicle movements is needed to assess driver behaviour in this heterogeneous traffic flow system. No widely used traffic theory can currently simulate a two-dimensional mixed traffic flow exhaustively, involving both lateral and longitudinal behavioural models, because of the intricate human driving behaviour involved; only a robust two-dimensional traffic flow model that describes the characteristics of vehicles with complex behaviour can successfully simulate a mix of AVs and HVs. A few studies have attempted to develop an integrated and robust driving behaviour model, but the efficiency of the proposed models was below average. The behaviours addressed include:

– Multiple-leader car following and road tides (the rise and fall of the road surface);
– Tailgating (driving dangerously close to a leading vehicle, making it impossible to avoid a crash if the leading driver brakes suddenly);
– Filtering (moving past queues of stationary or slow-moving traffic);
– Swerving in a dull mixed-traffic setting (using the operational data received to identify potentially high-risk or unsafe driving behaviour by the first vehicle).

Most existing mixed-traffic models employ the basic principles of homogeneous traffic model development, which deviate from the heterogeneous nature of a mix of AVs and HVs. [75] proposed a Generate-Spatio-Temporal-Data (GSTD) algorithm for generating two-dimensional points moving over time, represented as a line in three-dimensional space or as rectangular data following an extended set of distributions. The work of [75] was extended by [64,77] with the introduction of new parameters to create more realistic object movements and to permit the creation of trajectories of objects moving in an obstructed environment. However, these works did not consider a road intersection as the basis of their simulations, while other researchers, such as [45], considered their model as a network but not in a mixed traffic environment. The traffic intersection is the part of the road segment that experiences the highest congestion and risk levels. The regression approach or the gap-acceptance method is often used to analyse intersection performance, but previous research [2,3,74] has found that the gap-acceptance approach has a few disadvantages, such as its inability to be used on traffic streams that do not follow a consistent pattern of car behaviour.


The gap-acceptance theory fails when a mix of aggressive and gentle driving behaviour co-exists. The basic car-following model was designed for homogeneous traffic conditions, and its parameters cannot effectively address mixed-traffic conditions. In a heterogeneous traffic behaviour scenario, the current research direction in mixed traffic is to apply the techniques of the homogeneous car-following model to heterogeneous mixed-traffic models. A proposal is made here to combine the intelligent driver model (IDM) [40], formulated for a single-lane road, with the Gazis-Herman-Rothery (GHR) heterogeneous traffic behaviour model [4] to model a complex two-dimensional mixed traffic of AVs and HVs. This will go a long way towards addressing the research gap in the evaluation of two-dimensional traffic using both a linear and an IDM traffic flow model. A modified Gipps model has also been used on a single-lane route to provide vehicle-type-based parameters for various combinations of cars, trucks, and buses. Simulation of Mixed Traffic Mobility (SiMTraM) is a standard car-following simulator that could be modified to create a new approach to modelling heterogeneous traffic flow involving AVs and HVs. Similarly, Simulation of Urban Mobility (SUMO) is an open-source, multi-modal traffic simulation package, compact, microscopic, and continuous, which is used to simulate mixed traffic involving vehicles, public transport, and pedestrians, and is designed to manage massive traffic networks. However, SiMTraM and SUMO have the downside that they cannot capture vehicle behaviour across all traffic environments. In addition to modelling a mixed-traffic flow scenario, assessing the effect of each traffic participant on individual vehicle behaviour is needed for an effective description of the traffic flow; such a model can account for the dynamic interactions between individual vehicles and road structures, and it requires calibration and validation using real-time data. Because drivers tend to maintain a safe gap with other vehicles to avoid a collision, the safe-distance modelling approach is reliable for simulating the longitudinal movements of different vehicles in a mixed traffic stream. Furthermore, compared to fuzzy logic models, cellular automata models appear more appropriate for modelling lateral interactions or lane-changing behaviour of vehicles that evolve through discrete time steps. Incorporating vehicle-type-dependent behaviour into car-following models for mixed traffic conditions, so as to recognise driving behaviour precisely, is the right direction for optimising the co-existence of AVs and HVs. In the basic car-following model, a traffic collision is imminent when the leading vehicle's operation is uncertain, resulting in a decrease in the relative spacing within the vehicle group and thereby jeopardising the safe following distance. Azevedo made an essential contribution to developing a safe-distance model that successfully estimated actual vehicle behaviour in various traffic conditions; however, the model's accuracy in estimating the safe distance remains unclear, and the safe distance is a critical aspect of the traffic model that needs to be captured in the cellular automata.
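Since the proposal above combines the intelligent driver model (IDM) [40] with a heterogeneous behaviour model, a minimal sketch of the standard IDM acceleration rule is given below to make the longitudinal part concrete; the parameter values are illustrative assumptions, not calibrated ones.

```python
import math

def idm_acceleration(speed, gap, delta_v,
                     v0=15.0, T=1.5, a_max=1.0, b=2.0, s0=2.0, delta=4):
    """Intelligent Driver Model longitudinal acceleration.

    speed   : speed of the following vehicle [m/s]
    gap     : net distance to the leader [m]
    delta_v : approaching rate, speed - leader_speed [m/s]
    v0, T, a_max, b, s0 : desired speed, time headway, maximum acceleration,
                          comfortable deceleration, minimum spacing (assumed values)
    """
    # Desired dynamic gap: standstill spacing + headway term + braking interaction term.
    s_star = s0 + max(0.0, speed * T + speed * delta_v / (2 * math.sqrt(a_max * b)))
    # Free-road term minus interaction term.
    return a_max * (1 - (speed / v0) ** delta - (s_star / gap) ** 2)

if __name__ == "__main__":
    # A vehicle at 12 m/s closing on a slower leader 20 m ahead decelerates.
    print(round(idm_acceleration(speed=12.0, gap=20.0, delta_v=3.0), 3))
```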

6 Conclusion

This section has presented the background literature on mixed-traffic management, with the fundamentals of traffic control systems and details of the impact of traffic control parameters. The related literature on conventional traffic management, intelligent transportation, and mixed-traffic management systems has been covered in detail, and the state-of-the-art methods for managing heterogeneous traffic systems have been investigated, together with suggested solutions to mixed traffic problems drawn from state-of-the-art mixed traffic management strategies. From the above review of traffic modelling, mesoscopic models are often hard to discretise and do not represent traffic accurately, and they are therefore not often used in traffic simulation and modelling. The microscopic and macroscopic modelling approaches are more commonly used because they can describe the full details of an individual vehicle and of a group of vehicles, respectively. From the fundamental diagrams of traffic flow, the proposed AV-HV mixed-traffic control could be realised by combining both microscopic and macroscopic traffic model parameters; however, microscopic models and simulation tools can forecast traffic in a more detailed way, and the microscopic model is therefore proposed to predict the behaviour of individual vehicles in mixed-traffic settings effectively. The review of state-of-the-art mixed-traffic modelling capabilities indicated that no single traffic model can effectively address a mix of AVs and HVs. Existing analytical models, such as the car-following models, have demonstrated greater flexibility with less computational workload than rule-based cellular automata models, which use complex rules to simulate vehicle dynamics. To build an efficient two-dimensional behavioural traffic flow model that can accurately describe a mixed AV and HV environment, more than one traffic flow model with a variety of performance parameters needs to be incorporated. The proposed model integrates existing traffic simulation models with the modifications required to meet the functionality involved in a mixed-traffic setting. The proposed single-lane-based model considers the left and right lanes as agents for joining the vehicle platoon or the new lane; these behavioural features of the model are what will make vehicle co-existence possible. Based on the preceding, three traffic models (reservation nodes, car-following, and collision avoidance by Gipps) were identified for integration and enhancement to support a mixed AV and HV flow model at a road intersection. There has been much improvement in mixed-traffic management strategies over the years, but the state of the art has not addressed the challenge of mixing traffic using the two-dimensional gap-acceptance method in a car-following model. From the review findings, the currently available mixed-traffic models cannot be directly used to simulate a traffic mix involving AVs and HVs without modifying the identified essential lateral-motion traffic parameters holistically for each vehicle type in the model. The proposed transition to fully autonomous driving has generated various expectations, ranging from an increase in driving comfort, a decrease in delay, a reduction in traffic incidents, and an increase in road comfort to a decrease in carbon emissions, fuel consumption, and driver shortage.


Within this envisaged transition and integration period of AV and HV co-existence, there is a need for robust technology to be put in place to drive and support the transition process seamlessly. Although this study showed positive results using logical reasoning, implementing this traffic management system depends on the existing infrastructure, and the technology is potentially cost-ineffective.

References 1. Abduljabbar, R., Dia, H., Liyanage, S., Bagloee, S.: Applications of artificial intelligence in transport: an overview. Sustainability 11(1), 189 (2019) 2. Akcelik, R.: Gap-acceptance modelling by traffic signal analogy. Traff. Eng.+ Control 35(9), 498–501 (1994) 3. Ak¸celik, R.: A review of gap-acceptance capacity models. In: The 29th Conference of Australian Institutes of Transport Research (CAITR 2007), pp. 5–7. University of South Australia, Adelaide (2007) 4. Al-Jameel, H.A.Z., et al.: Examining and improving the limitations of GazisHerman-Rothery car following model (2009) 5. Anderson, J.M., Nidhi, K., Stanley, K.D., Sorensen, P., Samaras, C., Oluwatola, O.A.: Autonomous Vehicle Technology: A Guide for Policymakers. Rand Corporation, Santa Monica (2014) 6. Annell, S., Gratner, A., Svensson, L.: Probabilistic collision estimation system for autonomous vehicles. In: 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), pp. 473–478. IEEE (2016) 7. Arnaout, G.M., Arnaout, J.-P.: Exploring the effects of cooperative adaptive cruise control on highway traffic flow using microscopic traffic simulation. Transp. Plan. Technol. 37(2), 186–199 (2014) 8. Au, T.-C., Zhang, S., Stone, P.: Autonomous intersection management for semiautonomous vehicles. In: Handbook of Transportation, pp. 88–104 (2015) 9. Bazilinskyy, P., Kyriakidis, M., Dodou, D., de Winter, J.: When will most cars be able to drive fully automatically? Projections of 18,970 survey respondents. Transp. Res. F Traff. Psychol. Behav. 64, 184–195 (2019) 10. Bento, L.C., Parafita, R., Nunes, U.: Intelligent traffic management at intersections supported by v2v and v2i communications. In: 2012 15th International IEEE Conference on Intelligent Transportation Systems, pp. 1495–1502. IEEE (2012) 11. Bingfeng, S.I., Zhong, M, Gao, Z.: Link resistance function of urban mixed traffic network. J. Transp. Syst. Eng. Inf. Technol. 8(1), 68–73 (2008) 12. Booth, L., Norman, R., Pettigrew, S.: The potential implications of autonomous vehicles for active transport. J. Transp. Health 15, 100623 (2019) 13. Budhkar, A.K., Maurya, A.K.: Multiple-leader vehicle-following behavior in heterogeneous weak lane discipline traffic. Transp. Dev. Econ. 3(2), 20 (2017) 14. Chan, E., Gilhead, P., Jelinek, P., Krejci, P., Robinson, T.: Cooperative control of sartre automated platoon vehicles. In: 19th ITS World Congress ERTICO-ITS Europe European Commission ITS America ITS Asia-Pacific (2012) 15. Claes, R., Holvoet, T., Weyns, D.: A decentralized approach for anticipatory vehicle routing using delegate multiagent systems. IEEE Trans. Intell. Transp. Syst. 12(2), 364–373 (2011) 16. Domeyer, J.E., Lee, J.D., Toyoda, H.: Vehicle automation–other road user communication and coordination: theory and mechanisms. IEEE Access 8, 19860–19872 (2020)


17. Dresner, K., Stone, P.: Traffic intersections of the future. In: Proceedings of the National Conference on Artificial Intelligence, vol. 21, p. 1593. AAAI Press; MIT Press, Menlo Park, Cambridge, London (1999, 2006) 18. Dresner, K., Stone, P.: A multiagent approach to autonomous intersection management. J. Artif. Intell. Res. 31, 591–656 (2008) 19. Dro´zdziel, P., Tarkowski, S., Rybicka, I., Wrona, R.: Drivers’ reaction time research in the conditions in the real traffic. Open Eng. 10(1), 35–47 (2020) 20. Eguchi, J., Koike, H.: Discrimination of an approaching vehicle at an intersection using a monocular camera. In: 2007 Intelligent Vehicles Symposium, pp. 618–623. IEEE (2007) 21. Emami, P., Pourmehrab, M., Martin-Gasulla, M., Ranka, S., Elefteriadou, L.: A comparison of intelligent signalized intersection controllers under mixed traffic. In: 2018 21st International Conference on Intelligent Transportation Systems (ITSC), pp. 341–348. IEEE (2018) 22. Fajardo, D., Au, T.-C., Waller, S.T., Stone, P., Yang, D.: Automated intersection control: performance of future innovation versus current traffic signal control. Transp. Res. Rec. 2259, 223–232 (2011) 23. Feng, Y., Head, K.L., Khoshmagham, S., Zamanipour, M.: A real-time adaptive signal control in a connected vehicle environment. Transp. Res. Part C Emerg. Technol. 55, 460–473 (2015) 24. Ferreira, M.C.P., Tonguz, O., Fernandes, R.J., DaConceicao, H.M.F., Viriyasitavat, W.: Methods and systems for coordinating vehicular traffic using in-vehicle virtual traffic control signals enabled by vehicle-to-vehicle communications. US Patent 8,972,159, 3 March 2015 25. Friedrich, B.: The effect of autonomous vehicles on traffic. In: Maurer, M., Gerdes, J.C., Lenz, B., Winner, H. (eds.) Autonomous Driving, pp. 317–334. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-48847-8 16 26. Fujii, H., Uchida, H., Yoshimura, S.: Agent-based simulation framework for mixed traffic of cars, pedestrians and trams. Transp. Res. Part C Emerg. Technol. 85, 234–248 (2017) 27. Gong, S., Lili, D.: Cooperative platoon control for a mixed traffic flow including human drive vehicles and connected and autonomous vehicles. Transp. Res. Part B Methodol. 116, 25–61 (2018) 28. Guo, J., Cheng, S., Liu, Y.: Merging and diverging impact on mixed traffic of regular and autonomous vehicles. IEEE Trans. Intell. Transp. Syst. 22(3), 1639– 1649 (2020) 29. Guo, L., Ge, P., Sun, D., Qiao, Y.: Adaptive cruise control based on model predictive control with constraints softening. Appl. Sci. 10(5), 1635 (2020) 30. Haman, I.T., Kamla, V.C., Galland, S., Kamgang, J.C.: Towards an multilevel agent-based model for traffic simulation. Procedia Comput. Sci. 109, 887–892 (2017) 31. Hara, T., Kiyohara, R.: Vehicle approaching model for t-junction during transition to autonomous vehicles. In: 2018 International Conference on Information Networking (ICOIN), pp. 304–309. IEEE (2018) 32. Heilig, M., Hilgert, T., Mallig, N., Kagerbauer, M., Vortisch, P.: Potentials of autonomous vehicles in a changing private transportation system-a case study in the Stuttgart region. Transp. Res. Procedia 26, 13–21 (2017) 33. Horowitz, R., Varaiya, P.: Control design of an automated highway system. Proc. IEEE 88(7), 913–925 (2000) 34. Huang, S., Ren, W.: Autonomous intelligent vehicle and its performance in automated traffic systems. Int. J. Control 72(18), 1665–1688 (1999)


35. Jeong, E., Cheol, O., Lee, S.: Is vehicle automation enough to prevent crashes? role of traffic operations in automated driving environments for traffic safety. Acc. Anal. Prevent. 104, 115–124 (2017) 36. Kamal, M.A.S., Imura, J., Ohata, A., Hayakawa, T., Aihara, K.: Control of traffic signals in a model predictive control framework. In: IFAC Proceedings Volumes, vol. 45, no. 24, pp. 221–226 (2012) 37. Kamal, M.A.S., Imura, J., Hayakawa, T., Ohata, A., Aihara, K.: A vehicleintersection coordination scheme for smooth flows of traffic without using traffic lights. IEEE Trans. Intell. Transp. Syst. 16(3), 1136–1147 (2015) 38. Kamal, M.A.S., Imura, J., Ohata, A., Hayakawa, T., Aihara, K.: Coordination of automated vehicles at a traffic-lightless intersection. In: 2013 16th International IEEE Conference on Intelligent Transportation Systems-(ITSC), pp. 922– 927. IEEE (2013) 39. Kanthack, C.A.: Autonomous vehicles and driving under the influence: examining the ambiguity surrounding modern laws applied to future technology. Creighton L. Rev. 53, 397 (2019) 40. Kesting, A., Treiber, M., Helbing, D.: Enhanced intelligent driver model to access the impact of driving strategies on traffic capacity. Philos. Trans. Roy. Soc. A Math. Phys. Eng. Sci. 368(1928), 4585–4605 (2010) 41. Khondaker, B., Kattan, L.: Variable speed limit: a microscopic analysis in a connected vehicle environment. Transp. Res. Part C Emerg. Technol. 58, 146–159 (2015) 42. Knorn, S., Donaire, A., Ag¨ uero, J.C., Middleton, R.H.: Passivity-based control for multi-vehicle systems subject to string constraints. Automatica 50(12), 3224–3230 (2014) 43. Lari, A., Douma, F., Onyiah, I.: Self-driving vehicles and policy implications: Current status of autonomous vehicle development and Minnesota policy implications. Minn. JL Sci. Technol. 16, 735 (2015) 44. Le Vine, S., Zolfaghari, A., Polak, J.: Autonomous cars: the tension between occupant experience and intersection capacity. Transp. Res. Part C Emerg. Technol. 52, 1–14 (2015) 45. Li, N.: Large-scale realistic macro-simulation of vehicle movement on road networks (2013) 46. Liao, R.: Smart mobility: challenges and trends. In: Toward Sustainable and Economic Smart Mobility: Shaping the Future of Smart Cities, p. 1 (2020) 47. Lin, S.-H., Ho, T.-Y.: Autonomous vehicle routing in multiple intersections. In: Proceedings of the 24th Asia and South Pacific Design Automation Conference, pp. 585–590. ACM (2019) 48. Litman, T.: Autonomous Vehicle Implementation Predictions. Victoria Transport Policy Institute, Victoria, Canada (2017) 49. Liu, Y.-C.: Comparative study of the effects of auditory, visual and multimodality displays on drivers’ performance in advanced traveller information systems. Ergonomics 44(4), 425–442 (2001) 50. Lu, X.-Y., Tan, H.-S., Shladover, S.E., Hedrick, J.K.: Automated vehicle merging maneuver implementation for ahs. Veh. Syst. Dyn. 41(2), 85–107 (2004) 51. Luettel, T., Himmelsbach, M., Wuensche, H.-J.: Autonomous ground vehiclesconcepts and a path to the future. Proc. IEEE 100(Special Centennial Issue), 1831–1839 (2012) 52. Le Maitre, M., Prorok, A.: Effects of controller heterogeneity on autonomous vehicle traffic. arXiv preprint arXiv:2005.04995 (2020)


53. Marisamynathan, S., Vedagiri, P.: Modeling pedestrian delay at signalized intersection crosswalks under mixed traffic condition. Procedia Soc. Behav. Sci. 104, 708–717 (2013) 54. Matcha, B.N., Namasivayam, S.N., Fouladi, M.H., Ng, K.C., Sivanesan, S., Noum, S.Y.E.: Simulation strategies for mixed traffic conditions: areview of car-following models and simulation frameworks. J. Eng. 2020 (2020) 55. Mathew, T.V., Munigety, C.R., Bajpai, A.: Strip-based approach for the simulation of mixed traffic conditions. J. Comput. Civil Eng. 29(5), 04014069 (2015) 56. Milan´es, V., P´erez, J., Onieva, E., Gonz´ alez, C.: Controller for urban intersections based on wireless communications and fuzzy logic. IEEE Trans. Intell. Transp. Syst. 11(1), 243–248 (2010) 57. Montanaro, U., Dixit, S., Fallah, S., Dianati, M., Stevens, A., Oxtoby, D., Mouzakitis, Al.: Towards connected autonomous driving: review of use-cases. Veh. Syst. Dyn. 1–36 (2018) 58. Mueller, E.A.: Aspects of the history of traffic signals. IEEE Trans. Veh. Technol. 19(1), 6–17 (1970) 59. Naiem, A., Reda, M., El-Beltagy, M., El-Khodary, I.: An agent based approach for modeling traffic flow. In: 2010 The 7th International Conference on Informatics and Systems (INFOS), pp. 1–6. IEEE (2010) 60. Omae, M., Ogitsu, T., Honma, N., Usami, K.: Automatic driving control for passing through intersection without stopping. Int. J. Intell. Transp. Syst. Res. 8(3), 201– 210 (2010) 61. Pakusch, C., Stevens, G., Bossauer, P.: Shared autonomous vehicles: potentials for a sustainable mobility and risks of unintended effects. In: ICT4S, pp. 258–269 (2018) 62. Papageorgiou, M., Diakaki, C., Dinopoulou, V., Kotsialos, A., Wang, Y.: Review of road traffic control strategies. Proc. IEEE 91(12), 2043–2067 (2003) 63. Payre, W., Cestac, J., Delhomme, P.: Intention to use a fully automated car: Attitudes and a priori acceptability. Transp. Res. F Traff. Psychol. Behav. 27, 252–263 (2014) 64. Pfoser, D., Theodoridis, Y.: Generating semantics-based trajectories of moving objects. Comput. Environ. Urban Syst. 27(3), 243–263 (2003) 65. Ploeg, J., Serrarens, A.F.A., Heijenk, G.J.: Connect drive: design and evaluation of cooperative adaptive cruise control for congestion reduction. J. Mod. Transp. 19(3), 207–213 (2011) 66. Raju, N., Arkatkar, S., Easa, S., Joshi, G.: Customizing the following behavior models to mimic the weak lane based mixed traffic conditions. Transportmetrica B Transp. Dyn. 1–28 (2021) 67. Rios-Torres, J., Malikopoulos, A.A.: A survey on the coordination of connected and automated vehicles at intersections and merging at highway on-ramps. IEEE Trans. Intell. Transp. Syst. 18(5), 1066–1077 (2017) 68. Schrank, D., Eisele, B., Lomax, T.: Tti’s 2012 Urban Mobility Report. Texas A&M Transportation Institute. The Texas A&M University System, vol. 4 (2012) 69. Schrank, D., Lomax, T., Eisele, B.: 2012 urban mobility report. Texas Transportation Institute (2012). http://mobility.tamu.edu/ums/report 70. Seshia, S.A., Sadigh, D., Sastry, S.S.: Towards verified artificial intelligence. arXiv preprint arXiv:1606.08514 (2016) 71. Shahgholian, M., Gharavian, D.: Advanced traffic management systems: an overview and a development strategy. arXiv preprint arXiv:1810.02530 (2018)


72. Sharon, G., Stone, P.: A protocol for mixed autonomous and human-operated vehicles at intersections. In: Sukthankar, G., Rodriguez-Aguilar, J.A. (eds.) AAMAS 2017. LNCS (LNAI), vol. 10642, pp. 151–167. Springer, Cham (2017). https://doi. org/10.1007/978-3-319-71682-4 10 73. Swaroop, D.V.A.H.G., Hedrick, J.K., Chien, C.C., Ioannou, P.: A comparison of spacing and headway control laws for automatically controlled vehicles 1. Veh. Syst. Dyn. 23(1), 597–625 (1994) 74. Teply, S., Abou-Henaidy, M.I., Hunt, J.D.: Gap acceptance behaviour-aggregate and logit perspectives: part 1. Traff. Eng. Control (1997) 75. Theodoridis, Y., Silva, J.R.O., Nascimento, M.A.: On the generation of spatiotemporal datasets. In: G¨ uting, R.H., Papadias, D., Lochovsky, F. (eds.) SSD 1999. LNCS, vol. 1651, pp. 147–164. Springer, Heidelberg (1999). https://doi.org/10. 1007/3-540-48482-5 11 76. Treiber, M., Hennecke, A., Helbing, D.: Congested traffic states in empirical observations and microscopic simulations. Phys. Rev. E 62(2), 1805 (2000) 77. Tzouramanis, T., Vassilakopoulos, M., Manolopoulos, Y.: On the generation of time-evolving regional data. GeoInformatica 6(3), 207–231 (2002) 78. Uno, A., Sakaguchi, T., Tsugawa, S.: A merging control algorithm based on intervehicle communication. In: 1999 IEEE/IEEJ/JSAI International Conference on Intelligent Transportation Systems, Proceedings, pp. 783–787. IEEE (1999) 79. Van Arem, B., Van Driel, C.J.G., Visser, R.: The impact of cooperative adaptive cruise control on traffic-flow characteristics. IEEE Trans. Intell. Transp. Syst. 7(4), 429–436 (2006) 80. Vial, J.J.B., Devanny, W.E., Eppstein, D., Goodrich, M.T.: Scheduling autonomous vehicle platoons through an unregulated intersection. arXiv preprint arXiv:1609.04512 (2016) 81. Wakui, N., Takayama, R., Mimura, T., Kamiyama, N., Maruyama, K., Sumino, Y.: Drinking status of heavy drinkers detected by arrival time parametric imaging using sonazoid-enhanced ultrasonography: study of two cases. Case Rep. Gastroenterol. 5(1), 100–109 (2011) 82. Wang, M., Daamen, W., Hoogendoorn, S.P., van Arem, B.: Rolling horizon control framework for driver assistance systems. part ii: cooperative sensing and cooperative control. Transp. Res. Part C Emerg. Technol. 40, 290–311 (2014) 83. Zeng, Q., Wu, C., Peng, L., Li, H.: Novel vehicle crash risk detection based on vehicular sensory system. In: Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), pp. 622–626. IEEE (2015) 84. Zhao, W., Ngoduy, D., Shepherd, S., Liu, R., Papageorgiou, M.: A platoon based cooperative eco-driving model for mixed automated and human-driven vehicles at a signalised intersection. Transp. Res. Part C Emerg. Technol. 95, 802–821 (2018) 85. Zhou, Y., Ahn, S., Chitturi, M., Noyce, D.A.: Rolling horizon stochastic optimal control strategy for ACC and CACC under uncertainty. Transp. Res. Part C Emerg. Technol. 83, 61–76 (2017)

Selection of Driving Mode in Autonomous Vehicles Based on Road Profile and Vehicle Speed

Mahmoud Zaki Iskandarani
Al-Ahliyya Amman University, Amman 19328, Jordan
[email protected]

Abstract. In this work, a switching mechanism is proposed using variation in Tire Spring Length (Ls (Tire)), Suspension Spring Length (Ls (Suspension)) together with changes in Sprung Mass Acceleration (SMA), all correlated as a function of speed and road profile to enable optimum driving mode selection and setting of Suspension Stiffness to Damping Ratio (r). To achieve the stated objective of this work, MATLAB simulation is performed using three main r ratios (1, 25, 0.25) and two main road elevation profiles (1, 6). It is shown through this work that each mode of driving has unique signals resulting from Sprung Mass Acceleration (SMA), Sprung Mass Position (SMP), and variation in both Tire Spring Length and Suspension Spring Length. All these signals are used to enable optimization and automatic selection of driving mode and strategy especially in autonomous vehicles. Optimization of driving mode is critical particularly when trajectory information is exchanged between vehicles using Basic Safety Messages (BSMs) and between vehicles and Road Side Units (RSUs). Keywords: Intelligent transportation systems · Vehicle ride analysis · Vehicle dynamics · Comfort ride · Autonomous vehicles · Vehicle handling

1 Introduction

Autonomous vehicles undergo intensive testing before being released for public use. To achieve a high level of independence from drivers and move away from Human Machine Interaction (HMI), intelligent control needs to be applied using variables considered important to humans in general and drivers in particular. Road condition is regarded as a critical factor in safe and comfortable driving; it affects decisions regarding speed, acceleration, distance from other vehicles, and maneuvers and trajectories. These variables also affect other vehicles indirectly through vehicular connectivity and the type of data contained in Basic Safety Messages (BSMs) exchanged through routing protocols with other vehicles and with Road Side Units (RSUs). Several approaches to the longitudinal and lateral optimization of vehicle trajectories have been researched in terms of dynamic control of the vehicle path, with emphasis on road curvature, road roughness, and travelling time. Predictive control models have been developed for autonomous vehicle driving; some of these models separate the spatial path from the velocity pattern.


Safe handling and comfortable driving are critical for a successful vehicle design and depend on parameters affecting the vehicle body and suspension, covering the sprung and un-sprung mass of the vehicle, all tied to stiffness and shock absorption. Vehicle motion is subjected to opposing forces resulting from the road profile, which affect the overall body dynamics and handling. The role of the suspension system is to help provide motion stability under different external forces by affecting torque and inertia, in order to guarantee reliable and safe driving [1–5]. Since vehicle suspension systems are designed to provide both good handling and comfort, and since the two control strategies have opposing requirements, intensive work on autonomous vehicles covers their suspension control on different trajectories as a function of automation level, relating the dynamics of driving (comfort, handling) to the strategy of safety and accident prevention. Adaptive Cruise Control (ACC), as used by autonomous vehicles, is correlated with the longitudinal motion of the vehicle and affects suspension dynamics, which is in turn affected by the road profile, subsequently affecting the vehicle's driving mode selection [6–12]. The mobility aspect of autonomous vehicles is important, as it helps drivers to drive reliably and safely without having to carry out manual tasks that might lead to accidents under certain conditions due to errors of judgment, lack of experience, or lack of concentration as a result of stress. To achieve reliability, safety, comfort, and good handling, an integrated, comprehensive strategy that selects the optimum mode of driving correlated with road profile and speed is required [13–15]. This paper proposes an algorithm, based on an analysis of vehicular modes and their characteristics, that enables driving mode selection, especially in autonomous vehicles, as a function of road elevation profile, vehicle speed, suspension spring length, and tire spring length variations, in order to achieve either optimum handling or optimum comfort, and it establishes criteria for a selection that offers both good comfort and good handling as a function of vehicular dynamics.

2 Methodology

Optimization of vehicle stability through automation can be achieved with intelligent sensor-based systems. Such systems provide critical, correlated information from integrated sensors regarding vehicle status, road conditions, and the environment, and the processed data enables short reaction times for each controlled component within the vehicular structure [16–19]. The integrated approach to vehicular control follows a multi-layer architecture comprising sensing, instruction, and application layers, controlling both longitudinal and lateral vehicle movement [20, 21]. The proposed industrial architecture employs optimization through simulation and the application of intelligent algorithms to enable smart and safe driving [22, 23], together with the new adoption of active and semi-active suspension systems, which enables intelligent adaptive control of vehicular movements through advanced sensors and actuators and enables automatic control and driving mode selection in autonomous vehicles as a function of road profile, speed, acceleration, road roughness, and environmental conditions [24–27]. To achieve the objectives of this research work, MATLAB simulation is carried out considering:


1. Different suspension stiffness-to-damping ratios (r)
2. Different road elevations (E)
3. Different vehicle speeds (v)

The objective is to enable an automatic selection algorithm that selects the optimum driving mode based on:

I. Variation in the Sprung Mass Position (SMP)
II. Variation in the Suspension Spring Length, Ls (Suspension)
III. Variation in the Tire Spring Length, Ls (Tire)
IV. Variation in Sprung Mass Acceleration (SMA)

A switching mechanism per road profile is suggested based on the simulated characteristics at the three considered road profile levels and the different suspension stiffness-to-damping coefficients, as shown in Fig. 1.

[Flowchart: Start → obtain data from sensors (road profile (t), vehicle speed (t), suspension spring length (t), tire spring length (t)) → apply stiffness-to-damping ratio → trace sprung mass acceleration]

Fig. 1. Driving mode switching mechanism
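The flowchart in Fig. 1 can be read as a simple decision loop: sample the sensors, estimate how rough the road currently is and how hard the sprung mass is being accelerated, then pick the stiffness-to-damping ratio r for the next control interval. The sketch below is one possible, hypothetical rendering of that loop in Python; the threshold values and the function names (read_sensors, set_suspension_ratio) are illustrative assumptions, not part of the paper.

```python
# Hedged sketch of the driving-mode switching loop suggested by Fig. 1.
# Thresholds and sensor/actuator functions are illustrative assumptions.

COMFORT, HANDLING, DEFAULT = 25.0, 0.25, 1.0   # stiffness-to-damping ratios (r)

def select_mode(road_elevation_change, speed_kmh, sma, sma_limit=5.0):
    """Choose r from the road profile change and the traced sprung mass acceleration."""
    rough_road = abs(road_elevation_change) > 0.002   # assumed cut-off near the E = 6 key
    if rough_road or sma > sma_limit:
        return HANDLING          # prioritise road holding on rough / high-SMA stretches
    if speed_kmh >= 40 and sma <= sma_limit:
        return COMFORT           # smooth road: soften the ride
    return DEFAULT               # otherwise leave the suspension un-optimized

def control_step(read_sensors, set_suspension_ratio):
    """One pass of the Fig. 1 loop: sense -> select r -> apply -> keep traces."""
    road, speed, susp_len, tire_len, sma = read_sensors()
    r = select_mode(road, speed, sma)
    set_suspension_ratio(r)
    return r, (susp_len, tire_len, sma)   # traces retained for the next decision
```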

3 Results

Tables 1, 2 and 3 show the simulated results (sensor data) covering the variation in Sprung Mass Position (SMP) as a function of three different vehicle speeds and two road profiles. Associated with Tables 1, 2 and 3 are Figs. 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18 and 19, which present the changes in Sprung Mass Acceleration (SMA) as a function of vehicle speed and road profile. These data will be used in the switching algorithm, in addition to the variation in spring length for both the vehicle suspension and the tires.

Table 1. Change in sprung mass position as a function of E, r (v = 40 km/h). The r columns give the change in sprung mass position (cm) for each road profile change (cm).

| Road Profile Change (E = 1) | r = 1 | r = 25 | r = 0.25 | Road Profile Change (E = 6) | r = 1 | r = 25 | r = 0.25 |
| 0.0004 | 0.0004 | 0.0004 | 0.0004 | 0.0023 | 0.0023 | 0.0023 | 0.0023 |
| 0.0065 | 0.0003 | 0.0001 | 0.0007 | 0.0389 | 0.0020 | 0.0008 | 0.0044 |
| 0.0005 | 0.0014 | 0.0002 | 0.0034 | 0.0033 | 0.0082 | 0.0010 | 0.0201 |
| 0.0013 | 0.0009 | 0.0003 | 0.0022 | 0.0075 | 0.0051 | 0.0018 | 0.0129 |
| −0.0002 | 0.0012 | 0.0002 | 0.0015 | −0.0014 | 0.0073 | 0.0014 | 0.0088 |
| −0.0049 | 0.0008 | 0.0004 | 0.0006 | −0.0292 | 0.0051 | 0.0026 | 0.0035 |
| −0.0018 | −0.0002 | 0.0004 | −0.0025 | −0.0108 | −0.0012 | 0.0023 | −0.0151 |
| −0.0070 | −0.0001 | 0.0004 | −0.0026 | −0.0419 | −0.0009 | 0.0022 | −0.0156 |
| 0.0055 | −0.0014 | 0.0004 | −0.0041 | 0.0332 | −0.0085 | 0.0025 | −0.0244 |
| 0.0134 | 0.0006 | 0.0004 | 0.0008 | 0.0805 | 0.0037 | 0.0024 | 0.0051 |
| 0.0102 | 0.0022 | 0.0004 | 0.0079 | 0.0610 | 0.0132 | 0.0024 | 0.0475 |
| 0.0003 | 0.0028 | 0.0006 | 0.0082 | 0.0020 | 0.0170 | 0.0039 | 0.0491 |
| 0.0038 | 0.0024 | 0.0008 | 0.0040 | 0.0230 | 0.0142 | 0.0046 | 0.0242 |
| 0.0059 | 0.0032 | 0.0010 | 0.0041 | 0.0351 | 0.0193 | 0.0058 | 0.0248 |
| 0.0099 | 0.0039 | 0.0011 | 0.0062 | 0.0597 | 0.0232 | 0.0068 | 0.0373 |
| 0.0031 | 0.0050 | 0.0015 | 0.0081 | 0.0187 | 0.0301 | 0.0090 | 0.0487 |
| −0.0003 | 0.0044 | 0.0018 | 0.0054 | −0.0018 | 0.0262 | 0.0109 | 0.0322 |
| −0.0059 | 0.0037 | 0.0021 | 0.0013 | −0.0355 | 0.0219 | 0.0125 | 0.0078 |
| −0.0030 | 0.0022 | 0.0022 | −0.0025 | −0.0180 | 0.0130 | 0.0134 | −0.0151 |
| 0.0037 | 0.0019 | 0.0025 | −0.0028 | 0.0220 | 0.0113 | 0.0152 | −0.0170 |
| 0.0065 | 0.0024 | 0.0026 | 0.0008 | 0.0388 | 0.0146 | 0.0157 | 0.0051 |

Figures 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 and 31 present the spring variations for both tires and suspension as a function of vehicle speed and road profile. Tables 1, 2 and 3 provide an entrance key for the algorithm, which reads the road profile and associates with it the change in sprung mass position. For E = 1 the key is 0.0004 and for E = 6 the key is 0.0023. Thus, correlation between the suspension stiffness-to-damping ratio (r), speed, and road profile in connection with position can be achieved, together with correlation of the Sprung Mass Acceleration with the Sprung Mass Position, as shown in Figs. 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18 and 19.
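As a minimal illustration of how the tabulated keys could gate the algorithm, the hypothetical helper below maps the first road-profile reading onto the matching elevation branch (E = 1 or E = 6) before the r-dependent characteristics are consulted; the tolerance value is an assumption.

```python
# Hypothetical entrance-key lookup for the tabulated road profiles.
TABLE_KEYS = {0.0004: "E = 1", 0.0023: "E = 6"}   # keys quoted from Tables 1-3

def elevation_branch(first_road_reading, tol=0.0005):
    """Return the elevation branch whose key is closest to the first sensor sample."""
    key = min(TABLE_KEYS, key=lambda k: abs(k - first_road_reading))
    if abs(key - first_road_reading) > tol:
        raise ValueError("road profile reading does not match a tabulated key")
    return TABLE_KEYS[key]

print(elevation_branch(0.0023))   # -> "E = 6"
```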


Table 2. Change in sprung mass position as a function of E, r (v = 70 km/h). The r columns give the change in sprung mass position (cm) for each road profile change (cm).

| Road Profile Change (E = 1) | r = 1 | r = 25 | r = 0.25 | Road Profile Change (E = 6) | r = 1 | r = 25 | r = 0.25 |
| 0.0004 | 0.0004 | 0.0004 | 0.0004 | 0.0023 | 0.0023 | 0.0023 | 0.0023 |
| 0.0024 | 0.0003 | −0.0000 | 0.0010 | 0.0233 | 0.0020 | −0.0003 | 0.0061 |
| 0.0039 | 0.0008 | 0.0001 | 0.0021 | 0.0233 | 0.0050 | 0.0009 | 0.0126 |
| −0.0031 | 0.0006 | 0.0002 | 0.0011 | −0.0187 | 0.0038 | 0.0013 | 0.0066 |
| −0.0070 | −0.0002 | 0.0001 | −0.0017 | −0.0419 | −0.0013 | 0.0007 | −0.0099 |
| 0.0121 | −0.0004 | 0.0003 | −0.0023 | 0.0723 | −0.0022 | 0.0017 | −0.0140 |
| 0.0027 | 0.0019 | 0.0001 | 0.0046 | 0.0162 | 0.0116 | 0.0007 | 0.0274 |
| 0.0055 | 0.0016 | 0.0005 | 0.0054 | 0.0329 | 0.0097 | 0.0030 | 0.0322 |
| 0.0100 | 0.0026 | 0.0006 | 0.0046 | 0.0597 | 0.0158 | 0.0034 | 0.0277 |
| −0.0005 | 0.0037 | 0.0008 | 0.0068 | −0.0032 | 0.0220 | 0.0050 | 0.0410 |
| −0.0053 | 0.0023 | 0.0010 | 0.0027 | −0.0320 | 0.0141 | 0.0062 | 0.0164 |
| 0.0061 | 0.0019 | 0.0013 | −0.0014 | 0.0368 | 0.0112 | 0.0076 | −0.0082 |
| 0.0069 | 0.0026 | 0.0012 | 0.0018 | 0.0411 | 0.0158 | 0.0074 | 0.0110 |
| 0.0096 | 0.0033 | 0.0016 | 0.0050 | 0.0578 | 0.0200 | 0.0098 | 0.0303 |
| 0.0020 | 0.0042 | 0.0017 | 0.0070 | 0.0117 | 0.0254 | 0.0101 | 0.0419 |
| 0.0009 | 0.0038 | 0.0020 | 0.0052 | 0.0056 | 0.0231 | 0.0120 | 0.0314 |
| 0.0025 | 0.0033 | 0.0022 | 0.0020 | 0.0149 | 0.0200 | 0.0133 | 0.0123 |
| −0.0011 | 0.0031 | 0.0025 | 0.0007 | −0.0066 | 0.0185 | 0.0152 | 0.0042 |
| −0.0000 | 0.0022 | 0.0027 | −0.0008 | −0.0002 | 0.0130 | 0.0163 | −0.0047 |
| 0.0000 | 0.0021 | 0.0029 | −0.0008 | 0.0001 | 0.0124 | 0.0172 | −0.0048 |
| 0.0026 | 0.0018 | 0.0029 | 0.0153 | 0.0105 | 0.0175 | 0.0002 | 0.0012 |

Table 3. Change in sprung mass position as a function of E, r (v = 100 km/h). The r columns give the change in sprung mass position (cm) for each road profile change (cm).

| Road Profile Change (E = 1) | r = 1 | r = 25 | r = 0.25 | Road Profile Change (E = 6) | r = 1 | r = 25 | r = 0.25 |
| 0.0004 | 0.0004 | 0.0004 | 0.0004 | 0.0023 | 0.0023 | 0.0023 | 0.0023 |
| −0.0007 | 0.0005 | 0.0001 | 0.0012 | −0.0041 | 0.0028 | 0.0004 | 0.0070 |
| −0.0049 | 0.0006 | 0.0001 | 0.0016 | −0.0292 | 0.0035 | 0.0006 | 0.0094 |
| −0.0042 | −0.0003 | 0.0001 | −0.0011 | −0.0251 | −0.0017 | 0.0005 | −0.0067 |
| 0.0102 | 0.0007 | 0.0004 | 0.0003 | 0.0611 | 0.0040 | 0.0021 | 0.0019 |
| 0.0069 | 0.0021 | 0.0005 | 0.0050 | 0.0412 | 0.0128 | 0.0033 | 0.0298 |
| 0.0031 | 0.0018 | 0.0002 | 0.0050 | 0.0188 | 0.0106 | 0.0012 | 0.0300 |
| −0.0053 | 0.0027 | 0.0007 | 0.0044 | −0.0321 | 0.0161 | 0.0039 | 0.0261 |
| 0.0065 | 0.0009 | 0.0006 | −0.0001 | 0.0387 | 0.0056 | 0.0039 | −0.0004 |
| 0.0083 | 0.0033 | 0.0010 | 0.0033 | 0.0499 | 0.0199 | 0.0062 | 0.0199 |
| 0.0002 | 0.0030 | 0.0009 | 0.0062 | 0.0012 | 0.0182 | 0.0053 | 0.0372 |
| 0.0023 | 0.0035 | 0.0015 | 0.0044 | 0.0139 | 0.0209 | 0.0091 | 0.0264 |
| −0.0000 | 0.0030 | 0.0018 | 0.0018 | −0.0000 | 0.0179 | 0.0111 | 0.0109 |
| −0.0006 | 0.0021 | 0.0017 | −0.0004 | −0.0035 | 0.0129 | 0.0104 | −0.0025 |
| 0.0026 | 0.0022 | 0.0020 | 0.0001 | 0.0153 | 0.0131 | 0.0119 | 0.0007 |
| 0.0068 | 0.0025 | 0.0021 | 0.0020 | 0.0410 | 0.0147 | 0.0124 | 0.0118 |
| 0.0044 | 0.0033 | 0.0020 | 0.0052 | 0.0261 | 0.0199 | 0.0120 | 0.0314 |
| 0.0010 | 0.0035 | 0.0025 | 0.0056 | 0.0058 | 0.0212 | 0.0148 | 0.0334 |
| 0.0042 | 0.0037 | 0.0030 | 0.0039 | 0.0249 | 0.0221 | 0.0180 | 0.0232 |
| −0.0014 | 0.0036 | 0.0032 | 0.0028 | −0.0085 | 0.0216 | 0.0192 | 0.0167 |
| −0.0077 | 0.0022 | 0.0030 | 0.0002 | −0.0460 | 0.0134 | 0.0179 | 0.0010 |


4 Analysis and Discussion

4.1 Sprung Mass Acceleration Tracing

Figures 2, 3 and 4 show the SMA as a function of time and vehicle speed. The plots clearly show an increase in the SMA as vehicle speed increases for a fixed suspension stiffness-to-damping ratio (r = 1), fixed simulation duration, and road elevation profile (E = 1), while Figs. 5, 6 and 7 show a marked increase in SMA as a function of vehicle speed for (r = 1) and (E = 6), and also show a notable increase in SMA compared to the simulated values with (E = 1) for the same r, vehicle speed and simulation duration. Comparing the plots where (E = 1) and (E = 6) for the same r, speed, and simulation duration, the maximum available swing for SMA variation over time for E = 1 is {15, 30, 50} compared to {80, 200, 300} for E = 6. The response characteristic for E = 1 relating SMA to vehicle speed follows a power law, as expressed in Eq. (1), with Eq. (2) expressing the response at E = 6.

SMA(E=1, r=1) = 0.12 ∗ v^1.31

(1)

Fig. 2. Effect of vehicle speed and road profile on SMA for r = 1, v = 40 km/h, and E = 1

Fig. 3. Effect of vehicle speed and road profile on SMA for r = 1, v = 70 km/h, and E = 1


Fig. 4. Effect of vehicle speed and road profile on SMA for r = 1, v = 100 km/h, and E = 1

Fig. 5. Effect of vehicle speed and road profile on SMA for r = 1, v = 40 km/h, and E = 6

Fig. 6. Effect of vehicle speed and road profile on SMA for r = 1, v = 70 km/h, and E = 6

SMA(E=6,r=1) = 238 ∗ Ln(v) − 801

(2)


Fig. 7. Effect of vehicle speed and road profile on SMA for r = 1, v = 100 km/h, and E = 6

Figures 8, 9 and 10 show the SMA as a function of time and vehicle speed. The plots clearly show an increase in the SMA as vehicle speed increases for a fixed suspension stiffness-to-damping ratio (r = 25), fixed simulation duration, and road elevation profile (E = 1), while Figs. 11, 12 and 13 show a larger increase in SMA as a function of vehicle speed for (r = 25) and (E = 6), and also show a notable increase in SMA compared to the simulated values with (E = 1) for the same r, vehicle speed and simulation duration. However, the increase in SMA for r = 25 is much smaller than for the case where r = 1. This is because r = 25 represents a tuned suspension stiffness-to-damping ratio that favours comfort, giving smoother SMA response curves and lower SMA values per speed and time duration. Comparing the plots where (E = 1) and (E = 6) for the same r, speed, and simulation duration, the maximum available swing for SMA variation over time for E = 1 is {1.5, 4, 5} compared to {8, 20, 30} for E = 6. This shows an increase at the higher road elevation profile, but much lower than when r = 1. The response characteristic for E = 1 relating SMA to vehicle speed follows a logarithmic law, as expressed in Eq. (3), with Eq. (4) expressing the response at E = 6.

SMA(E=1, r=25) = 3.9 ∗ Ln(v) − 12.7

(3)

SMA(E=6,r=25) = 23.8 ∗ Ln(v) − 80.1

(4)

Equations (3) and (4) show a general function that can be used to describe SMA for the condition of comfort and can be generalized in Eq. (5).

SMA(E_j, r(comfort)) = φ_j ∗ Ln(v) − ϕ_j

(5)

Figures 14, 15 and 16 show the SMA as a function of time and vehicle speed. The plots clearly show an increase in the SMA as vehicle speed increases for a fixed suspension stiffness-to-damping ratio (r = 0.25), fixed simulation duration, and road elevation profile (E = 1), while Figs. 17, 18 and 19 show a larger increase in SMA as a function of vehicle speed for (r = 0.25) and (E = 6), and also show a notable increase in SMA compared to the simulated values with (E = 1) for the same r, vehicle speed and simulation duration. However, the increase in SMA for r = 0.25 is much higher compared to the case where r


Fig. 8. Effect of vehicle speed and road profile on SMA for r = 25, v = 40 km/h, and E = 1

Fig. 9. Effect of vehicle speed and road profile on SMA for r = 25, v = 70 km/h, and E = 1

Fig. 10. Effect of vehicle speed and road profile on SMA for r = 25, v = 100 km/h, and E = 1

= 1 or r = 25. This is because r = 0.25 represents a tuned suspension stiffness-to-damping ratio that favours handling, giving more responsive SMA curves with higher SMA values per speed and time duration. Comparing the plots where (E = 1) and (E


Fig. 11. Effect of vehicle speed and road profile on SMA for r = 25, v = 40 km/h, and E = 6

Fig. 12. Effect of vehicle speed and road profile on SMA for r = 25, v = 70 km/h, and E = 6

Fig. 13. Effect of vehicle speed and road profile on SMA for r = 25, v = 100 km/h, and E = 6

= 6) for the same r, speed, and simulation duration, the maximum available swing for SMA variation over time for E = 1 is {30, 60, 100} compared to {200, 400, 600} for E = 6.


The response characteristic for E = 1 relating SMA to vehicle speed follows a power law, as expressed in Eq. (6), with Eq. (7) expressing the response at E = 6.

SMA(E=1, r=0.25) = 2.24 ∗ v^1.31

(6)

SMA(E=6, r=0.25) = 2.4 ∗ v^1.2

(7)

Fig. 14. Effect of vehicle speed and road profile on SMA for r = 0.25, v = 40 km/h, and E = 1

Fig. 15. Effect of vehicle speed and road profile on SMA for r = 0.25, v = 70 km/h, and E = 1

Equations (6) and (7) show a general function that can be used to describe SMA for the condition of handling and can be generalized in Eq. (8).

SMA(E_j, r(handling)) = ψ_j ∗ v^θ_j

(8)
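A short numerical check of Eqs. (1)–(8) makes the separation between the modes explicit. The sketch below, written in Python, simply evaluates the fits quoted above at the three simulated speeds; only the coefficients printed in Eqs. (1)–(4), (6) and (7) are used, and Ln is assumed to be the natural logarithm.

```python
import math

speeds_kmh = [40, 70, 100]   # the three simulated vehicle speeds

# Fitted SMA characteristics quoted in Eqs. (1)-(4), (6) and (7);
# Ln is taken to be the natural logarithm.
fits = {
    "default  r=1,    E=1": lambda v: 0.12 * v ** 1.31,           # Eq. (1), power law
    "default  r=1,    E=6": lambda v: 238 * math.log(v) - 801,    # Eq. (2), logarithmic
    "comfort  r=25,   E=1": lambda v: 3.9 * math.log(v) - 12.7,   # Eq. (3)
    "comfort  r=25,   E=6": lambda v: 23.8 * math.log(v) - 80.1,  # Eq. (4)
    "handling r=0.25, E=1": lambda v: 2.24 * v ** 1.31,           # Eq. (6)
    "handling r=0.25, E=6": lambda v: 2.4 * v ** 1.2,             # Eq. (7)
}

for label, f in fits.items():
    values = ", ".join(f"{f(v):7.1f}" for v in speeds_kmh)
    print(f"{label}: SMA at {speeds_kmh} km/h -> {values}")
```

Evaluated this way, the logarithmic comfort fits grow slowly with speed while the power-law handling fits grow much faster, which is the behaviour the mode-selection algorithm exploits.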

From Figs. 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18 and 19, the following is observed:


Fig. 16. Effect of vehicle speed and road profile on SMA for r = 0.25, v = 100 km/h, and E = 1

Fig. 17. Effect of vehicle speed and road profile on SMA for r = 0.25, v = 40 km/h, and E = 6

Fig. 18. Effect of vehicle speed and road profile on SMA for r = 0.25, v = 70 km/h, and E = 6


Fig. 19. Effect of vehicle speed and road profile on SMA for r = 0.25, v = 100 km/h, and E = 6

1. An increase in SMA as a function of both road profile and vehicle speed, with a much larger increase as a function of road profile.
2. A distinct characteristic difference between the three driving modes is evident:
   I. Not optimized for a specific driving mode: suspension stiffness-to-damping ratio (r) = 1
   II. Comfort: suspension stiffness-to-damping ratio (r) = 25
   III. Handling: suspension stiffness-to-damping ratio (r) = 0.25
3. Higher levels of SMA are recorded for handling, which reflects the adaptation of the vehicle to handling in relation to the road profile, with much lower values realized for comfort mode.
4. Both comfort and handling optimizations display similar correlating functions despite the change in road elevation profile, while the default state at r = 1 goes through a functional transformation.

4.2 Tire Spring Length Variation

From Figs. 20, 21, 22, 23, 24 and 25, it is realized that the tire spring length variation is higher at the higher road profile elevation (E = 6) for all r values (1, 25, 0.25), with the smoothest transitions at comfort level (r = 25) and the least change in the case of handling (r = 0.25). The maximum allowed tire spring length variation as a function of road elevation profile, for all considered speeds (40, 70, 100) km/h and per fixed simulation duration, is given as follows (a simple check against these envelopes is sketched below):

1. r = 1: {0.32, 0.9}, with phase reversal in part of the response - not optimized.
2. r = 25: {0.4, 1.0} - optimized for comfort.
3. r = 0.25: {0.3, 0.6} - optimized for handling.

The observed low variation as the road profile elevation changes from E = 1 to E = 6 for handling is expected, as the SMA is higher to enable better driving of the vehicle as the road profile changes.
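As referenced above, the per-mode envelopes quoted for the maximum tire spring length variation can be turned into a simple consistency check: given the measured swing of the tire spring length at E = 1 and E = 6, the sketch below reports which optimization the behaviour matches. This is an illustrative Python helper; the envelope values are taken from the list above, and the matching tolerance is an assumption.

```python
# Illustrative check of a measured tire-spring swing against the per-mode
# envelopes {E = 1, E = 6} quoted in Sect. 4.2. Tolerance is an assumption.
TIRE_ENVELOPES = {
    "not optimized (r = 1)": (0.32, 0.9),
    "comfort (r = 25)":      (0.4, 1.0),
    "handling (r = 0.25)":   (0.3, 0.6),
}

def closest_mode(swing_e1, swing_e6, envelopes=TIRE_ENVELOPES, tol=0.15):
    """Return the mode whose (E=1, E=6) envelope is nearest to the measured swings."""
    def distance(env):
        return abs(env[0] - swing_e1) + abs(env[1] - swing_e6)
    mode = min(envelopes, key=lambda m: distance(envelopes[m]))
    return mode if distance(envelopes[mode]) <= tol else "no clear match"

print(closest_mode(0.31, 0.58))   # -> "handling (r = 0.25)"
```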


Fig. 20. Effect of vehicle speed and road profile on tire spring response for E = 1, and r = 1

Fig. 21. Effect of vehicle speed and road profile on tire spring response for E = 6, and r = 1


Fig. 22. Effect of vehicle speed and road profile on tire spring response for E = 1, and r = 25

Fig. 23. Effect of vehicle speed and road profile on tire spring response for E = 6, and r = 25


Fig. 24. Effect of vehicle speed and road profile on tire spring response for E = 1, and r = 0.25

Fig. 25. Effect of vehicle speed and road profile on tire spring response for E = 6, and r = 0.25


4.3 Suspension Spring Length Variation

From Figs. 26, 27, 28, 29, 30 and 31, it is realized that the suspension spring length variation is higher at the higher road profile elevation (E = 6) for all r values (1, 25, 0.25), with the smoothest transitions at comfort level (r = 25) and the least change in the case of handling (r = 0.25). This is consistent with the plots for tire spring length variation, as it should be. The maximum allowed suspension spring length variation as a function of road elevation profile, for all considered speeds (40, 70, 100) km/h and per fixed simulation duration, is given as:

1. r = 1: {0.54, 0.9} - not optimized.
2. r = 25: {0.65, 1.4} - optimized for comfort.
3. r = 0.25: {0.51, 0.7} - optimized for handling.

The observed low variation as the road profile elevation changes from E = 1 to E = 6 for handling is expected, as the SMA is higher to enable better driving of the vehicle as the road profile changes. From Figs. 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 and 31, the following is observed:

1. An increase in Sprung Mass Position variation as a function of road profile is realized, with evidence in the case of comfort (r = 25) that for higher road profiles the vehicle control should switch to handling rather than stay at comfort level, as the spring position begins to attain negative values.
2. A distinct characteristic difference between the three driving modes is clear:
   I. Not optimized for a specific driving mode: suspension stiffness-to-damping ratio (r) = 1
   II. Comfort: suspension stiffness-to-damping ratio (r) = 25

Fig. 26. Effect of vehicle speed and road profile on suspension spring response for E = 1, and r =1


Fig. 27. Effect of vehicle speed and road profile on suspension spring response for E = 6, and r =1

Fig. 28. Effect of vehicle speed and road profile on suspension spring response for E = 1, and r = 25


Fig. 29. Effect of vehicle speed and road profile on suspension spring response for E = 6, and r = 25

Fig. 30. Effect of vehicle speed and road profile on suspension spring response for E = 1, and r = 0.25


Fig. 31. Effect of vehicle speed and road profile on suspension spring response for E = 6, and r = 0.25

   III. Handling: suspension stiffness-to-damping ratio (r) = 0.25

Figures 32, 33 and 34 present a statistical view of the effect of road elevation profile (E) and suspension stiffness-to-damping ratio (r) on the average sprung mass position (SMP). From the plots it is evident that the reference case is the scenario in which the vehicle suspension system is not optimized, a default state for which the average SMP variation lies in the middle. The lowest average SMP variation occurs in the comfort scenario, with the handling scenario showing the highest average SMP variation, as expected, to cope with road variations at various speeds. Also, the effect of road elevation profile on average SMP

Fig. 32. Effect of vehicle speed and road profile on sprung mass position at 40 km/h.


Fig. 33. Effect of vehicle speed and road profile on sprung mass position at 70 km/h.

Fig. 34. Effect of vehicle speed and road profile on sprung mass position at 100 km/h.

variation is evident, as the average variation increases significantly as a function of the road elevation profile. Figures 35, 36, 37, 38 and 39 present a statistical view of the effect of road elevation profile (E) on the average sprung mass position (SMP) and the average sprung mass acceleration (SMA). From the plots it is evident that there is a distinct level difference between the average SMP at E = 1 compared to E = 6. A change in the response characteristics and curve shape is also observed, which distinguishes the simulated speeds from each other. From Figs. 38 and 39, a clear difference in the average change in SMA is seen at comfort level (r = 25) compared to both the default state (r = 1) and the handling state (r = 0.25).


Fig. 35. The relationship between vehicle speed and average SMP for r = 0.25

Fig. 36. The relationship between vehicle speed and average SMP for r = 1

Fig. 37. The relationship between vehicle speed and average SMP for r = 25


Fig. 38. The relationship between vehicle speed and average SMA for E = 1

Fig. 39. The relationship between vehicle speed and average SMA for E = 6

5 Conclusions

This work demonstrated through simulation that different road profiles and speeds affect both the amplitude and the phase of the signals emanating from the sprung mass and the tires. An active suspension system can be interfaced to a switching algorithm that enables automatic selection of the driving mode, to achieve optimum comfort and optimum handling and to ensure the safety of vehicles, particularly autonomous vehicles, which require more rigorous reliability testing and which, if operated as connected vehicles, will affect other vehicles on the road through the exchange of trajectory messages. The analysis in this work showed that the comfort and handling responses are clearly separated in their characteristics, using both SMA and SMP, with a clear effect of road elevation profile on the response characteristics. It is also clear from the simulated work that the nature and shape of the response function differ for comfort (r = 25), handling (r = 0.25) and the default non-optimized setting (r = 1). The work also presented characteristic values for SMA and SMP which can be used to


further enable better analysis of the vehicle-road situation, in order to optimize performance and support adaptive vehicular control for safer and more comfortable journeys. Such characterization is supported by the presentation of functions that describe the relationship between vehicle speed and SMA for three different settings (non-optimized, comfort, handling). Evidence of optimization is presented through the functional stability in the cases of comfort and handling, where the mathematical description stays the same regardless of the road elevation profile, with only the coefficients changing, while in the non-optimized case a functional transformation is realized. In addition, the difference in the functional description between the comfort and handling cases is evident: in the case of comfort a logarithmic law is observed, while in the case of handling a power law is observed. This supports the two conditions in terms of the need for comfort and handling and their translation into mathematical descriptors.

Acknowledgments. The MATLAB program used in this work was written by James T. Allison, Assistant Professor, University of Illinois at Urbana-Champaign.

References 1. Nemeth, B., Costa, E., Quoc, G., Tran, B.: Adaptive speed control of an autonomous vehicle with a comfort objective. Adaptive Speed Control of An Autonomous Vehicle, 2101_ADS+Comfort, no. November, pp. 0–10 (2020) 2. Basargan, H., Mihály, A., Gáspár, P., Sename, O.: Adaptive semi-active suspension and cruise control through LPV technique. Appl. Sci. 11(1), 1–16 (2021). https://doi.org/10.3390/app 11010290 3. Du, Y., Liu, C., Li, Y.: Velocity control strategies to improve automated vehicle driving comfort. IEEE Intell. Transp. Syst. Mag. 10(1), 8–18 (2018). https://doi.org/10.1109/MITS. 2017.2776148 4. Anselma, P.G.: Optimization-driven powertrain-oriented adaptive cruise control to improve energy saving and passenger comfort. Energies 14(10), 1–28 (2021). https://doi.org/10.3390/ en14102897 5. Jain, S., Saboo, S., Pruncu, C.I., Unune, D.R.: Performance investigation of integrated model of quarter car semi-active seat suspension with human model. Appl. Sci. 10(9) (2020). https:// doi.org/10.3390/app10093185 6. Syed, K., Hemanth Kumar, C.H., Praveen Sai, V., Bhanu Prasad, D., Ram Prasanna Kumar, A.: Modelling of a suspension system in a car. Int. J. Mech. Eng. Technol. 9(4), 381–388 (2018) 7. Ali Ahmed, A.: Quarter car model optimization of active suspension system using fuzzy PID and linear quadratic regulator controllers. Glob. J. Eng. Technol. Adv. 6(3), 088–097 (2021).https://doi.org/10.30574/gjeta.2021.6.3.0041 8. Niculescu, A., Sireteanu, T., Kowalski, M., Jankowski, A.: Quarter car model to evaluate behaviour under road and body excitation. J. KONES 24(1), 265–273 (2017). https://doi.org/ 10.5604/01.3001.0010.2826 9. Manolache-Rusu, I.-C., Suciu, C., Mihai, I.: Analysis of passive vs. semi-active quarter car suspension models, no. December 2020, p. 76 (2020). https://doi.org/10.1117/12.2571225 10. Uddin, N.: Optimal control design of active suspension system based on quarter car model. J. Infotel 11(2), 55 (2019). https://doi.org/10.20895/infotel.v11i2.429


11. Tran, G.Q.B., Pham, T.P., Sename, O., Costa, E., Gaspar, P.: Integrated comfort-adaptive cruise and semi-active suspension control for an autonomous vehicle: an LPV approach. Electron 10(7) (2021). https://doi.org/10.3390/electronics10070813 12. Li, H., Wu, C., Chu, D., Lu, L., Cheng, K.: Combined trajectory planning and tracking for autonomous vehicle considering driving styles. IEEE Access 9, 9453–9463 (2021). https:// doi.org/10.1109/ACCESS.2021.3050005 13. Basargan, H., Mihály, A., Gáspár, P., Sename, O.: Road quality information based adaptive semi-active suspension control. Period. Polytech. Transp. Eng., 1–8 (2021). https://doi.org/ 10.3311/pptr.18577 14. Mihály, A., Kisari, Á., Gáspár, P., Németh, B.: Adaptive semi-active suspension design considering cloud-based road information. IFAC-PapersOnLine 52(5), 249–254 (2019). https:// doi.org/10.1016/j.ifacol.2019.09.040 15. Rezanoori, A., Anuar Ariffin, M.K., Delgoshaei, A.: A new method to improve passenger vehicle safety using intelligent functions in active suspension system. Eng. Solid Mech. 7(4), 313–330 (2019). https://doi.org/10.5267/j.esm.2019.6.005 16. Ulbrich , S. et al.: Towards a Functional System Architecture for Automated Vehicles, pp. 1–16 (2017). http://arxiv.org/abs/1703.08557 17. Pendleton, S.D., et al.: Perception, planning, control, and coordination for autonomous vehicles. Machines 5(1), 1–54 (2017). https://doi.org/10.3390/machines5010006 18. Lin, S.C., et al.: The architectural implications of autonomous driving: constraints and acceleration. ACM SIGPLAN Not. 53(2), 751–766 (2018). https://doi.org/10.1145/3173162.317 3191 19. Yeong, D.J., Velasco-hernandez, G., Barry, J., Walsh, J.: Sensor and sensor fusion technology in autonomous vehicles: a review. Sensors 21(6), 1–37 (2021). https://doi.org/10.3390/s21 062140 20. Yang, X., Ren, Y., Hu, L., Huang, Y., Lu, P.: Evaluating the impact of road quality in driving behavior of autonomous vehicles, vol. 1159106, p. 3 (2021). https://doi.org/10.1117/12.258 3641 21. Bruqi, M., Likaj, R., Shala, A.: Simulation of vertical quarter car model with one and two DOFs. Science Proceedings III International Science Conference “innovations,” vol. 112, pp. 110–112 (2017) 22. Lu, B., et al.: Adaptive potential field-based path planning for complex autonomous driving scenarios. IEEE Access 8, 225294–225305 (2020). https://doi.org/10.1109/ACCESS.2020. 3044909 23. Muhammad, K., Ullah, A., Lloret, J., Del Ser, J., De Albuquerque, V.H.C.: Deep learning for safe autonomous driving: current challenges and future directions. IEEE Trans. Intell. Transp. Syst. 22(7), 4316–4336 (2021). https://doi.org/10.1109/TITS.2020.3032227 24. Jiang, J., Seaid, M., Mohamed, M.S., Li, H.: Inverse algorithm for real-time road roughness estimation for autonomous vehicles. Arch. Appl. Mech. 90(6), 1333–1348 (2020). https:// doi.org/10.1007/s00419-020-01670-x 25. Gharieb, M., Nishikawa, T.: Development of roughness prediction models for Laos national road network. CivilEng 2(1), 158–173 (2021). https://doi.org/10.3390/civileng2010009 26. Saleh Mousavi-Bafrouyi, S.M., Kashyzadeh, K.R., Khorsandijou, S.M.: Effects of road roughness, aerodynamics, and weather conditions on automotive wheel force. Int. J. Eng. Trans. B Appl. 34(2), 536–546 (2021). https://doi.org/10.5829/IJE.2021.34.02B.27 27. Liu, C., Wu, D., Li, Y., Du, Y.: Large-scale pavement roughness measurements with vehicle crowdsourced data using semi-supervised learning. Transp. Res. Part C Emerg. Technol. 125, 2–3 (2021). 
https://doi.org/10.1016/j.trc.2021.103048

Compass Errors in Mobile Augmented Reality Navigation Apps: Consequences and Implications David S. Bowers(B) The Open University, Milton Keynes, UK [email protected]

Abstract. Augmented reality capabilities on smartphones have led to the development of mobile Augmented Reality (mAR) navigation apps. Whilst some mAR navigation apps annotate navigational points of interest within the camera image, mAR compass apps add only a compass rose and bearing to the image, enabling the device to function as a hand bearing compass. Compass apps do not attempt registration of the camera image, so these apps allow observation of the errors affecting smartphone magnetometers. This paper presents compass deviation (error) curves and offset errors for mAR compass apps on 17 mobile devices. The magnitude of deviation error measured (typically 5–10°) and the apparent transience of compass calibration, over periods of less than an hour, cast doubt on claims that the apps are equivalent to professional navigation compasses, and suitable for navigation use. As an mAR navigation app is unlikely to achieve reliable registration in an open-field context such as yacht navigation, the deviations observed also raise serious questions about the reliability of mAR navigation apps, and identify real safety concerns for the use of such apps. The paper concludes by speculating on interface designs for mAR navigation apps that could mitigate compass errors in mobile devices, identifying also some issues that require further investigation.

Keywords: Mixed/augmented reality · Compass · Error measurement · Calibration · Navigation interface

1 Introduction

The issues discussed in this paper arise from the widespread availability of augmented reality development platforms for mobile devices such as smartphones and tablets. This has enabled apps to be developed by those who may be unaware of the intrinsic limitations of general purpose devices such as smartphones. Specifically, following the success of urban mobile AR (mAR), as used in interactive games such as Pokemon Go, a range of open-field mAR navigation apps is emerging. Some claim to act as a high-precision, "professional" grade, hand bearing compass [1], with some offering the ability to overlay a desired course, or to label navigational markers [2–4].


The internal magnetometer (compass) in mobile devices is an important component of mobile augmented reality. Since the earliest days of AR, the magnetometer has rarely been used on its own to determine the orientation of a device, owing to the inevitability of significant compass errors [5]. Instead, considerable effort has been expended on registration techniques to align markers in the displayed image with their real-world counterparts [6]. Image processing techniques such as edge detection and 3D mapping are used to snap the AR universe onto the observed image. Where no map of objects exists, more advanced techniques such as Simultaneous Location and Mapping (SLAM) [7] are deployed. However, registration is not always possible. In an open outdoor setting there may be no distinguishable features nearby. The distances to landmarks may be significant, and the landscape as a whole may be observed from a distance rather than moved through, so techniques such as SLAM are not applicable. The challenge is complicated further in a marine context – for example, on a yacht or motor boat – by much of the view itself moving owing to surface waves. Even simple panorama stitching operations can be too difficult in these circumstances, let alone isolating landmarks. In such contexts, AR displays are completely reliant on the accuracy and precision of internal sensors – particularly the magnetometer. If the internal compass is being used to identify objects at a distance – either visually or by tagging them within the image – then its accuracy is critical. Each degree of error at a range of 2 km gives an offset error of about 35 m. With larger errors or distances, the offset may be comparable with the scale of the objects themselves, and possibly larger than their separation in the landscape. Mobile Augmented Reality compass apps do not usually attempt registration: they simply overlay a representation of the compass bearing on the camera view. By measuring compass errors (deviation) for augmented reality compass apps on a set of mobile devices, including tablets, iPhones and Android phones, it was possible to construct the inherent deviation curves for those devices. This gives an indication of the potential risks arising from relying on AR compasses for navigation.
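The offset quoted here follows directly from small-angle geometry: the lateral displacement of a sighted point is roughly the range multiplied by the error angle. A one-line check in Python (an illustrative calculation, not from the paper):

```python
import math

def lateral_offset(range_m, error_deg):
    """Lateral displacement of a sighted point caused by a compass error."""
    return range_m * math.tan(math.radians(error_deg))

print(round(lateral_offset(2000, 1)))   # ~35 m for 1 degree of error at 2 km
print(round(lateral_offset(2000, 5)))   # ~175 m for 5 degrees at 2 km
```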

2 Related Work Given the dominance of registration techniques for AR over reliance on the internal compass, it is unsurprising that there are few studies of the (in)accuracies of embedded compasses. Blum et al. [8] conducted an in-the-wild study using three devices held or carried by a subject walking in an urban environment. They deduce standard compass errors of up to 30°, but they report only limited checks on the orientation of the device during the trial. Indeed, they appear to equate heading (direction of travel) with the orientation of the device. They also appear not to have ensured calibration of the devices prior to their study. Both of these factors may explain the somewhat larger standard errors measured by Blum et al. than were found in our study. Smartphones have attracted attention recently in the geology community, where there is a need for rapid data collection in remote areas [9–11]. The analyses of the errors observed in comparison with professional magnetic compasses are rigorous, but appear to assume that the errors are random, rather than falling into any sort of pattern.


The form and causes of the deviation (error) curve for a magnetic compass on a ship have been understood since the 19th century. In [12], Doerfler characterizes the shape of a deviation curve as having:

• A constant component, due to misalignment of the compass;
• One or more semicircular components, with a period of 360°; and
• Quadrantal components, with a period of 180°.

There may be multiple components of each kind, which may not be in phase, so the resulting overall deviation curve can be complex, as with some reported in this paper. However, the important point is that a deviation curve represents reproducible errors for different orientations of a magnetic sensor: the errors are not simply random. In the context of a smartphone compass, an additional, variable, component is also significant. The currents powering major components of a smartphone, such as the processor or screen, and even the current through the battery, will generate magnetic fields comparable to that of the earth. How significant these dynamic influences will be will depend on the physical layout of the device, as well as on variable factors such as screen brightness, battery state and processor load.
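The constant, semicircular and quadrantal components described above correspond to the classical harmonic model of compass deviation, deviation(θ) ≈ A + B·sin θ + C·cos θ + D·sin 2θ + E·cos 2θ. The sketch below fits that model to a set of (heading, deviation) observations by least squares; it is an illustrative Python/NumPy example rather than code from the study, and the sample data are invented.

```python
import numpy as np

def fit_deviation_curve(headings_deg, deviations_deg):
    """Least-squares fit of A + B sin(h) + C cos(h) + D sin(2h) + E cos(2h)."""
    h = np.radians(np.asarray(headings_deg, dtype=float))
    design = np.column_stack([
        np.ones_like(h),              # A: constant (misalignment) term
        np.sin(h), np.cos(h),         # B, C: semicircular terms (period 360 deg)
        np.sin(2 * h), np.cos(2 * h)  # D, E: quadrantal terms (period 180 deg)
    ])
    coeffs, *_ = np.linalg.lstsq(design, np.asarray(deviations_deg, float), rcond=None)
    return coeffs                     # [A, B, C, D, E] in degrees

# Invented sample: 12 markers roughly 30 degrees apart, deviations in degrees.
headings = np.arange(0, 360, 30)
observed = 1.0 + 4.0 * np.sin(np.radians(headings + 20)) + np.random.normal(0, 0.5, 12)
print(np.round(fit_deviation_curve(headings, observed), 2))
```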

3 Experiment Design A brief overview of this experiment and summary results were presented in [13].

Fig. 1. Schematic layout of experiment from above

The principle of swinging a steering compass is well-established for ships. The compass (ship) is sited in a known position, away from magnetic influences, and headed (pointed) towards eight or more markers in known relative positions. Steering compass bearings towards each of the markers are compared with the known bearings; the difference between measured and known bearings is the compass error, or deviation, in the direction (heading) of the known bearing.


This approach was emulated on a sports field. The central point was a plastic stool, with twelve fluorescent markers positioned at the edge of the field at roughly equal angular separations, and distances of 50 to 90 m from the stool (Fig. 1). Participants were asked to remove potentially magnetic objects from their person – such as keys, coins, spectacles, etc. – and to sit on the stool. Using their device and their chosen AR compass app, they measured the bearing to each marker in turn. Each participant completed three or four circuits of the markers: the first without calibrating their device's compass, and the subsequent two or three following calibration. Finally, following calibration, four additional apps were compared on a single device (a 2018 iPad) against the app used in the main experiment.

3.1 Analysis of Errors: Reference Bearings

The reference bearings were themselves physical measurements, and therefore prone to errors, which must be quantified to validate the observed deviation curves. The reference bearings were measured using a classic "offshore instruments" magnetic hand bearing "mini-compass" with an "infinity" prism, critical damping, a reading precision of 0.5°, a precision of any single reading in use of less than 2° and a nominal accuracy of 1°. Designed specifically as a compass, its construction includes no magnetic components that might give rise to deviation. The experiment was run over four sessions. Each bearing was measured three times, and the reference bearing derived as the simple mean of the measurements, rounded to 0.5°. Differences between the measurements and the mean were typically just 0.5°. The calculated standard errors – although questionable for sample sizes of just three observations with highly quantized values – were all less than one degree. The differences between adjacent bearings were checked using a Davis Mk 15 sextant. Again, multiple measurements were made, leading to mean standard errors in the measurements of less than 6′ (minutes of arc). The parallax errors [14, p. 52] due to use of a sextant at short ranges were less than 5′. The combined sextant parallax error and measurement standard errors were much less than the measurement precision for the reference bearings. Hence, the sextant observations were sufficiently accurate to check the measured reference bearings. Comparison of the measured sextant angles with the measured reference bearings showed that all of the latter were accurate to within their estimated standard error, although a total of five reference bearings out of 48 (12 in each of four experiment sessions) were adjusted by 0.5° to agree better with the measured sextant angles. After this analysis, it was safe to conclude that they were accurate to half of one degree, which compares favorably with the precision (resolution) of 1° for the augmented reality compass software used in the experiment.


3.2 Analysis of Errors: Observed Bearings

The reading precision for the compass apps was one degree. In most cases, a three-figure bearing was shown on the screen; there was no opportunity to interpolate between markers on a compass rose. Only one set of observations was taken for each device before calibration. No attempt was made to mitigate small measurement errors, as the uncalibrated deviation curves were simply to illustrate the impact of calibration on the accuracy of the apps. Following calibration of each device's compass, at least two sets of observations were taken. The measurements were highly reproducible for 12 of the 17 devices, with most of the measured bearings being identical, or differing by only a single degree. Given the precision of one degree, each pair of observations was simply averaged. For the remaining 5 devices, a third set of bearings was taken. The maximum variation was 2° for any marker, which is comparable with the precision. Although the readings were highly reproducible, there were some anomalous results, due to user error (e.g., looking at the wrong object), or to dazzle from the sun; users commented on the difficulty of seeing the screen when its aspect was within about 30° of the direction of the sun. Errors of this type could not be mitigated, but they should be noted as risks when using this kind of device as a bearing compass.

4 Results

The differences between the observed bearings and the reference bearings are the deviation of the compass in the device, as mediated by the app used. The plot of deviation against known bearings (to each marker) is the deviation curve for the device.

4.1 All Devices

As expected for magnetic sensors, the majority of deviation curves have a distinctive sinusoidal character. Some appear also to have higher-frequency components, implying that the devices are subject to several sources of deviation. The amplitude of the deviation curves, although varying between devices, is at least a few degrees, and is over 10° in several cases. The key values for the deviation, summarized in Table 1, are the misalignment or offset error; the amplitude of the deviation curve; and the root mean square error (RMSE) after the curve has been corrected for any misalignment error. Rather than standard deviations, the maximum values indicate clearly the severity of the observed errors.
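The three summary values used throughout the results (offset, amplitude and offset-corrected RMSE) can be computed directly from the per-marker deviations. The sketch below is an illustrative Python helper under the assumption that "offset" is the mean deviation and "amplitude" the half peak-to-peak swing after that offset is removed; the paper does not spell out its exact estimators, so treat these definitions as one plausible reading.

```python
import numpy as np

def deviation_summary(observed_deg, reference_deg):
    """Offset, amplitude and offset-corrected RMSE of a measured deviation curve.

    Assumed definitions: offset = mean deviation; amplitude = half the
    peak-to-peak swing after removing the offset; RMSE computed on the
    offset-corrected deviations.
    """
    dev = (np.asarray(observed_deg, float) - np.asarray(reference_deg, float) + 180) % 360 - 180
    offset = dev.mean()
    corrected = dev - offset
    amplitude = (corrected.max() - corrected.min()) / 2
    rmse = np.sqrt(np.mean(corrected ** 2))
    return offset, amplitude, rmse

# Invented example: 12 markers, a 2 degree offset plus a sinusoidal deviation.
ref = np.arange(0, 360, 30)
obs = ref + 2 + 5 * np.sin(np.radians(ref))
print([round(x, 2) for x in deviation_summary(obs, ref)])
```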

Table 1 Mean/maximum deviation values. All data values in degrees.

| Device | Misalignment Mean | Misalignment Max | Amplitude Mean | Amplitude Max | RMSE Mean | RMSE Max |
| Tablets (n = 3) | | | | | | |
|   Calibrated | 0.8 | 2 | 7.0 | 12 | 4.4 | 8.0 |
|   Uncalibrated | 1.3 | 4 | 5.7 | 6.8 | 3.8 | 4.4 |
| iPhones (n = 6) | | | | | | |
|   Calibrated | 2.9 | 5 | 3.8 | 6.0 | 2.2 | 3.9 |
|   Uncalibrated | 27 | 96 | 24 | 118 | 19 | 99 |
| Android Phones (n = 8) | | | | | | |
|   Calibrated | 0.9 | −2 | 8.3 | 14 | 5.5 | 9.3 |
|   Uncalibrated | 5.8 | 14 | 29 | 80 | 20 | 51 |

4.2 Android Phones The curves for the eight Android phones are shown in Fig. 2(left), and the corresponding summary data in Table 2. Not only do the curves in Fig. 2 have a sinusoidal form, but the curves for different devices are not in phase – that is, there appears to be no common underlying deviation curve. This also confirms the lack of deviation in the original reference bearings. 4.3 IPhones The summary data for each of the six iPhones are shown in Table 3, with the corresponding deviation curves in Fig. 2 (right). Although the deviation curves are less sinusoidal than for the Android phones, their underlying form is still sinusoidal. The participant using the iPhone 8 plus (blue curve) reported the most trouble with sun dazzle, so the resulting deviation curve could be distorted.


Fig. 2. Deviation curves for Android Phones (Left) and iPhones (Right)

Table 2 Summary data for Android phones

| Device | Offset | Amplitude | RMSE |
| Samsung S4 | | 10.00 | 6.37 |
| Sony Xperia Z1 c | −2 | 9.00 | 5.81 |
| Redmi Note 3 | −2 | 10.00 | 6.94 |
| Redmi Note 4 | | 5.50 | 3.91 |
| Samsung Note 4 | | 13.50 | 9.32 |
| Samsung S6 | | 9.50 | 6.42 |
| Alcatel 3 | −1 | 4.50 | 2.76 |
| Pixel x1 | | 4.00 | 2.15 |
| Mean | −1.67 | 8.25 | 5.46 |
| Median | −2.00 | 9.25 | 6.09 |

Table 3 Summary data for iPhones

| Device | Offset | Amp | RMSE |
| iPhone 8 plus | 2 | 6.00 | 2.99 |
| iPhone SE | 5 | 5.75 | 3.89 |
| iPhone 8 | 2 | 1.50 | 0.96 |
| iPhone 6s | 3 | 3.25 | 1.65 |
| iPhone 5s | 3 | 4.25 | 2.47 |
| iPhone 8 | 1 | 2.00 | 1.40 |
| Mean | 2.67 | 3.79 | 2.23 |
| Median | 2.50 | 3.75 | 2.06 |

4.4 IPads The three iPads in the sample form a group that was too small to be statistically significant. Figure 3 shows their measured deviation curves. The (rather aged) iPad 2 was the only device for which the deviation curve appeared worse – with a larger amplitude – after calibration compared with before calibration. 4.5 Different Apps on the Same Device A further set of data, for different apps on a single device, is shown in Table 4 and Fig. 3 (right). The “reference” app, Compass Eye [3], was used first, then four other apps, and then the reference app again. The two (different!) deviation curves measured for the reference app are shown as thicker lines (orange and blue) in Fig. 3. The various apps have differently shaped deviation curves, with a greater amplitude than those of the reference app. Also, the reference app had a worse deviation curve when it was used a second time, an hour after its first curve was measured.

[Figure 3 legend: 2018 iPad, iPad Pro 18, iPad 2 (left); Compass Eye 1, AR Compass, Commander Compass, Compass Dig, Bearing Compass, Compass Eye 2 (right)]

Fig. 3. Deviation curves for iPads (Left) and different Apps on a single 2018 iPad (Right)

Table 4 Summary data for different Apps on a 2018 iPad

| App | Misalignment Mean | Max | Amplitude Mean | Max | RMSE Mean | Max |
| Compass Eye (Reference 1) | 4.5 | | 1.5 | | 0.85 | |
| Alternatives | 3.7 | 6.0 | 5.5 | 6.75 | 3.9 | 4.7 |
| Compass Eye (Reference 2) | 3.4 | | 3.75 | | 2.23 | |

4.6 Uncalibrated Results

Table 5 shows data for the devices prior to explicit calibration. The deviation errors were so large that the deviation curves would be difficult to interpret. As previously, the root mean square error is calculated after the observed bearings have been corrected for the linear offset. Thus, devices such as the Samsung S4, which has an amplitude for the deviation curve of nearly 80°, has an RMS error of over 50°,


whereas the second iPhone 8, with an offset of 96° but an amplitude of only 6.25 has an RMS error of 4.59. However, these statistical manipulations do nothing to disguise the large deviation errors in these devices prior to calibration.

Table 5 Summary data for devices prior to calibration

| Device | Offset | Amp | RMSE |
| Android phones | | | |
| Samsung S4 | −6 | 79.5 | 50.82 |
| Sony Xperia Z1 c | 14 | 22.75 | 17.67 |
| Redmi Note 4 | 2 | 9.5 | 6.50 |
| Samsung Note 4 | 2 | 13.5 | 8.90 |
| Samsung S6 | 1 | 5.75 | 3.79 |
| Alcatel 3 | 4 | 3 | 1.90 |
| Pixel x1 | −11 | 69 | 51.32 |
| Mean | 5.71 | 29.00 | 20.13 |
| iPhones | | | |
| iPhone 8 plus | 0 | 2.25 | 1.17 |
| iPhone SE | 7 | 4.25 | 2.93 |
| iPhone 8 | 22 | 117.5 | 99.37 |
| iPhone 6s | 5 | 3.75 | 2.70 |
| iPhone 5s | 3 | 7 | 3.34 |
| iPhone 8 | 96 | 6.25 | 4.59 |
| Mean | 26.60 | 23.50 | 19.02 |

5 Discussion The discussion of these results falls under several headings: scale of the deviation data; differences between iPhone and Android phones; the effect of calibration; and differences between apps on the same device. 5.1 The Scale of the Deviation Curves For a “real” hand-bearing compass, as used by mariners, the expected accuracy is about 1°, with no observable deviation in the compass itself. The reading precision should be half a degree. Ignoring for the moment the misalignment errors for the devices in this experiment, “calibrated” deviation curves with an amplitude of 4 or 5° are worrying. At two kilometers, a 5° error corresponds to a displacement of more than 150 m, which could be very serious if the device is being used for navigation. This area will be explored further in the next section.


5.2 IPhone Versus Android Phones Other authors have reported differences between the magnetometers in iPhones and Android phones [9]. This difference is seen also in this experiment, and appears to be significant (two-tailed t-test: t < 0.01). However, some iPhones were worse than some Android phones. A more extensive study would be needed to determine which brand of phone might be best, and under which circumstances. But “best” might still not mean “good enough for navigation”! 5.3 Calibration Calibration can lead to dramatic improvements in the observed error curve. Uncalibrated deviation curves with an amplitude of more than 90°, or misalignment offsets of over 90°, do not inspire confidence. However, standard calibration, by moving the device smoothly in a figure of 8, appears not always to be beneficial. For five of the devices – an iPad, two iPhones and two Android phones – the deviation curve was worse after calibration than before. In calibration, the rate of change of measured aspect is compared with the rate of rotation measured by the gyroscope; but this can never detect linear offsets. So, no matter how often a device is (re-)calibrated, it may still have an undetected offset error. Furthermore, calibration seems not to persist. For the reference app on a single device, the deviation curve was noticeably different after less than one hour. So, not only is calibration crucial – if it were possible to find an unambiguous, effective, calibration mechanism – but it appears also that it needs to be repeated far more often than any navigator would be likely to contemplate. 5.4 Differences Between Apps on the Same Device The differences in deviation for the five apps tested on a single device are worrying. There would seem to be two effects: First, different apps interpret the output from the magnetometer differently. The calculations are complex – interpreting the magnetic field measurements on three axes, taking into account the aspect of the device, and also its geographical location, which determines the dip and the variation of the magnetic field. Different apps should not give different answers – the calculations are deterministic and should produce just one result for any given position and orientation of the device. Perhaps standardized algorithms should be encapsulated in the drivers for the magnetometer. The second apparent effect is the “fading” of calibration. Since the calibration is in software/data, rather than any physical adjustments to the device, it also impacts directly on the calculations. Furthermore, there is no clarity about where the calibration data are located – and, indeed, whether or not they are specific to a particular app. The variation between apps must reduce further trust in smartphone compasses. 5.5 Other Observations Several other observations were made during the conduct of these experiments that might be relevant for those developing mobile mAR apps:


• For many of the smartphones, the mAR compass app consumed a lot of battery capacity (up to 30%) in an experiment lasting just 15 min;
• It was very difficult to use an mAR compass within 30° of the sun;
• "Pinch zoom" would have helped locate correct objects in the camera view;
• Large, clear readouts are essential, colored to contrast with the image.

5.6 Implications for Reliability of mAR Compasses

Overall, mAR compass apps are impressive, and their polished interfaces seem to imply accuracy. This is reinforced by the apparent reproducibility of compass bearings, at least over a few minutes. They appear to the user to be stable, easily read (except towards the sun) alternatives to traditional magnetic compasses. The apparent stability of the apps could well lull users – and developers – into a completely false sense of security. mAR compasses are prone to deviation, which can be significant, and could be dangerous, as we shall explore in the next section. Measuring the deviation errors for simple mAR compass apps has allowed us to measure the deviation for the embedded compasses, since the compass apps do not attempt to "register" the camera image against the real world. The severity of the deviation appears to be far too great to tolerate in a navigation instrument. Even for calibrated devices, the amplitude of the deviation curve is likely to be of the order of 4 to 8°, with an additional linear offset of perhaps a couple of degrees. For mAR navigation apps which rely on the internal compass, the magnitude of the observed deviation is a real problem, and could cause a serious maritime accident.

6 Compass Errors and Navigation Even with multiple satellite navigation systems, compasses are still vitally important for the navigation of small craft (yachts, motor boats and fishing vessels). Hand bearing compasses are used for determining position and locating landmarks. Mis-identifying a landmark can magnify the original compass error significantly. We conclude this discussion by storyboarding one possible approach to mitigating, in an AR navigation app, the deviation errors inherent in smartphones and tablets. 6.1 The Function of Hand Bearing Compasses The function of a hand bearing compass on a yacht or similar small craft is not the same as that of the steering compass. Rather than indicating the direction of travel of the yacht, a hand bearing compass is used to define a direction, typically to locate a landmark that will be helpful to the navigator. Landmarks may be targets towards which the yacht might be steered, or, in combination with a second landmark, can define, very precisely, an approach direction to be followed to avoid hazards. Landmarks are not always easy to identify, especially if the yacht is not already on the correct approach path; unlike large ships, the only constraint on a yacht’s course is usually that there is sufficient water to float. A yacht navigator will often be looking for


a particular landmark before they have reached the “proper” approach direction, and, most likely, to one side or the other of the direction in which the yacht is heading. Since most landmarks are marked on charts, it is simple to measure on the chart the direction from the yacht’s current position to a landmark; a satellite navigator or electronic chart plotter can provide the direction directly. It should then be a matter of looking in the required direction, as shown by a hand bearing compass, to locate and identify the landmark. So, for example, one might be looking for the lighthouse on the left of Fig. 4, which is somewhere in the coastline on the right (in the center of the red circle).

Fig. 4. Front lighthouse at St Malo (Left) and St Malo Coastline (Right)

6.2 Registration at Sea Given that an mAR compass will suffer from significant deviation, such an app could direct the navigator to the wrong building in the landscape, which could be dangerous. In urban environments, for more sophisticated mAR apps, the standard approach is to register the camera image against a model of the real world. For coastal navigation, however, registration is extremely challenging, to the point that it really is not reliable. The confounding factors are multiple. Firstly, the whole of the foreground is moving, with surface waves, causing considerable difficulty for image processing. Secondly, the coastline – which often contains all of the landmarks – is viewed from a distance, and occupies only a small vertical angle within the image. In Fig. 4, which is heavily cropped, the whole of the coast occupies less than 3° of vertical angle, or about 10% of the vertical field of view; more generally, at the distances over which navigators are seeking to locate landmarks, the coast could be even less of the image. At 5 km (~3 miles), a 100 m high coastline would subtend less than a degree of vertical angle. Furthermore, the majority of landmarks are likely to be contained within the visible coastline, rather than on it, and therefore even smaller portions of the image. Finally, everything is moving, because of the motion of the boat in the waves. The motion of a boat is asymmetric and punctuated by shocks, as the boat “slams” into waves – the kind of motion almost designed to confuse any inertial sensors that might help with image stabilization. Furthermore, the device on which the app is running will be held by the navigator, who may not be braced effectively against the boat, so the boat’s motion will be amplified by the movement of the navigator’s arms. Doubtless, there will be some circumstances in which registration will be possible. However, navigation is a safety-critical activity, so any navigation app must be robust against the (many) occasions when registration is not reliable, or is perhaps impossible.


6.3 Risk of Error Magnification

One key use of landmarks is to define a leading line (approach vector) to avoid hazards whilst entering a harbor, or clearing the obstacles that lie off a headland. Defining a leading line often involves identifying and aligning two separate landmarks, which may not be easy to spot, and for which a hand bearing compass is likely to be employed. The lighthouse in Fig. 4 is actually the outer (front) mark on the leading line to enter the harbor of St Malo, France. In Fig. 5, it corresponds to point B. The leading line is defined by aligning that lighthouse with a second one, further inland, at point A. The distance between A and B is 0.9 Nautical Miles (NM). The leading line defined by these two lights is actually used by vessels – both ships and yachts – approaching the port from some point D, several miles out, to point C, 3.2 NM from B.

Fig. 5. Schematic layout of leading line

Consider now the task of locating and identifying the outer mark during the approach towards point C. If the compass has a deviation error of, say, 3°, then a building that looks similar to the actual mark might be “identified” as the mark, even though it is actually at a position B′. Using the approximation tan θ ≈ θ ≈ sin θ for θ less than about 0.1 rad (≈ 5.73°), the distance B–B′ is approximately

3.2 × (3 / 57.3) ≈ 0.17 NM

or just over 300 m. The main impact of that offset is the use of the “wrong” front mark for the leading line. An error in identifying the front mark B due to compass error θ gives a corresponding error in the leading line direction, ϕ, where

ϕ ≈ θ × (d2 / d1) ≈ 3 × (3.2 / 0.9) ≈ 10°

with d2 the distance from C to the front mark B (3.2 NM) and d1 the distance between the two marks (0.9 NM). The green areas in the chart of the approach to St Malo, in Fig. 6, are rather unpleasant rocks. The red blocks A, B, C and D correspond to the same letters in Fig. 5. An error of 10° in the leading line, if pursued, would actually lead the navigator to approach the port over several of those rocks – or even over land!
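As a quick numerical check of these estimates, the short Python sketch below (illustrative only, using the distances from the St Malo example above) reproduces the offset and the magnified leading-line error:

```python
import math

# Distances taken from the St Malo example; the deviation is the assumed
# 3 degree compass error discussed in the text.
d2_nm = 3.2        # distance from the observer (point C) to the front mark B, in NM
d1_nm = 0.9        # distance between the front mark B and the rear mark A, in NM
deviation_deg = 3.0

# Small-angle approximation: offset of the mis-identified mark B' from B
offset_nm = d2_nm * math.radians(deviation_deg)
offset_m = offset_nm * 1852                      # 1 NM = 1852 m

# The offset B-B', seen over the much shorter baseline A-B, magnifies the
# error in the leading-line direction.
leading_line_error_deg = math.degrees(offset_nm / d1_nm)

print(f"Offset B-B': {offset_nm:.2f} NM (about {offset_m:.0f} m)")
print(f"Leading-line error: {leading_line_error_deg:.1f} degrees")
# Prints roughly 0.17 NM (~310 m) and ~10.7 degrees, matching the figures above.
```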

Fig. 6. Chart of approach to St. Malo (Extract from BA 2669 - The Channel Islands)

Even if the compass deviation error is “only” 3°, two factors exacerbate the situation. First, the mAR compasses appear to be very reproducible, so that successive readings are often the same, whereas the precision of a magnetic compass is readily improved simply by repeating the reading. Second, should the app actually attach a label to a landmark (as a “point of interest”), possibly in the wrong place, the label itself would add a spurious authority to the identification of that landmark.

6.4 Augmented Reality Navigation Apps
There are several mAR apps on the market, such as Spyglass [2] and NV Chart App [15], which claim not only to be as accurate as a professional hand bearing compass, but also to be able to label landmarks. There is no documentation available that indicates whether any attempt is made to mitigate the deviation errors in the internal compass or the difficulty of registration at sea. Indeed, private email communication with one company suggests that they may be unaware of the scale of the possible deviation, and apparently do not appreciate the potential impact. Having demonstrated the scale of deviation errors, and indicated why registration would be challenging and potentially unreliable, we conclude this section by proposing, as a “storyboard”, a possible design for an mAR navigation app that harnesses the power of augmented reality whilst mitigating the associated risks to accuracy. Figure 7 shows the approach to St Peter Port in the Channel Islands from the North, through the “Little Russel”. The apparently innocuous stretch of water is actually littered with rocks, and there are fast-flowing tides and cross-currents.

In order to maintain situational awareness, and to be able to correct for tidal drift – to avoid some of the more unpleasant obstacles – navigators need to locate and monitor three landmarks, circled in red in Fig. 7, and shown in close-up in Fig. 8.

Fig. 7. Approaching St Peter Port along the Little Russel, from the North, with Three Key Landmarks Highlighted

Fig. 8. Key landmarks for the approach to St Peter Port along the Little Russel: From Left to Right, The Brehon Tower, Roustel and Platte

Unfortunately, as is often the case in real life, two of the three landmarks are very difficult to see, and yet knowing where they are is important for safe navigation. Also, in the slightly reduced visibility captured in Fig. 7, the land is an amorphous grey, with little visible structure, and two of the landmarks merge into the background. It is a daunting first-time approach, even for an experienced yachtsman: it should be an ideal situation for an mAR app to assist the worried navigator, by highlighting the position of the three landmarks. Although, in this particular case, the presence of a prominent headland might allow registration of the image, this is an exception, resulting from the approach being along the coast of the island (Guernsey). In the general case, the approach is likely to be more perpendicular to the coast, with little scope for registration. There would still be the problems of movement of the foreground (surface waves) and of the image as a whole. It would be unhelpful – if not dangerously misleading – to have an app that indicated the positions for points of interest, but not whether or not registration has been possible.

Since registration is likely often to be infeasible, an mAR app should specifically assume that registration has not been possible. As this implies that it must rely on the device’s internal compass, the app must be explicit about mitigating deviation errors. To explore further the potential impact of deviation errors on the accuracy of markers for landmarks, it is important to appreciate the relatively small angles between the landmarks, and the even smaller “scope for error” available if a yacht strays outside the safe water indicated by those landmarks. Figure 9 is an extract from the British Admiralty Chart BA0808, East Guernsey Herm and Sark, showing the position of the yacht when the image in Fig. 7 was taken. The yacht’s position, at the top right of the image, and the three landmarks are marked by red flags. The three blue lines from the yacht’s position to the landmarks show how small the angles are between the three bearings: the angle between the Brehon Tower and Roustel is just 4°, and that between Roustel and Platte 10°. With deviation errors of the order of 5°, an mAR app could easily mis-position labels for the three landmarks so that they “identify” the wrong objects. Furthermore, the areas outside the lines from the yacht to Platte, and from the yacht to the Brehon Tower, are full of green areas, representing rocks. The scope for error is less than one degree.

Fig. 9. Extract from Chart BA 0808 Showing Little Russel approach to St Peter Port

In Fig. 10, the three markers are supposed to indicate the positions of the Brehon Tower (A), Roustel (B) and Platte (C). Unfortunately, “B” is actually pointing to the Brehon Tower, while A is “identifying” a rock in amongst several others.

Fig. 10. Markers for the Three Landmarks… in the Wrong Positions

A classic approach to indicate the extent of the possible errors would be to use error bars, as suggested in Fig. 11. However, there are two problems: first, the error bars overlap, confusing the presentation, and second, the app itself will not know the deviation error, since it will depend on the orientation of the device, the calibration state, the battery level and probably several other factors beyond the app’s control.

Fig. 11. Putative error bars on the three markers

Not only do the error bars overlap, but they are also all the same length. The deviation error is dependent on the orientation of the device, not on where a marker is positioned within the image. So, the correction needed for each of the markers is the same: if one marker could be aligned correctly with its target, and the same correction applied to the other two, then all three would be correct. This could help the navigator locate the two landmarks that are difficult to see. One might connect all of the markers with a “yoke”, and provide “nudge arrows”, as suggested in Fig. 12, to allow the user to move the set of markers to the left or right. So, the three markers could be adjusted to the right, as a group, until A points, correctly, to the Brehon Tower; B would then be over Roustel, and C over Platte.

Fig. 12. Markers for three landmarks connected by yoke, with Nudge Arrows
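A minimal sketch of the idea behind the yoke and nudge arrows is given below (Python; the bearings are illustrative values only, chosen to match the angular separations read from the chart, and the sensed deviation is an assumed figure). It shows why a single user-applied offset corrects all markers at once: they all share the same deviation error.

```python
# True bearings (degrees) to the three landmarks -- illustrative values only,
# chosen so that Brehon->Roustel is ~4 deg and Roustel->Platte is ~10 deg.
true_bearings = {"Brehon Tower": 200.0, "Roustel": 204.0, "Platte": 214.0}

compass_deviation = 5.0   # unknown to the app: what the device mis-reads by
nudge_offset = 0.0        # adjusted by the user via the nudge arrows

def marker_bearing(true_bearing):
    """Bearing at which the app draws a marker: the sensed (wrong) bearing
    plus whatever manual correction the user has applied to the whole yoke."""
    return true_bearing + compass_deviation + nudge_offset

# Initially every marker is displaced by the same 5 degree deviation ...
errors = {name: marker_bearing(b) - b for name, b in true_bearings.items()}
print(errors)   # {'Brehon Tower': 5.0, 'Roustel': 5.0, 'Platte': 5.0}

# ... so one nudge that aligns any single marker with its landmark
# corrects all three at once.
nudge_offset = -5.0
errors = {name: marker_bearing(b) - b for name, b in true_bearings.items()}
print(errors)   # all zero
```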

The question then is, how could the navigator know when any of the markers were aligned correctly? To help with this, we look again at the chart extract in Fig. 9 which, rather unusually for British Admiralty charts, includes monochrome sketches of the three landmarks. In the box of six sketches at the center left of the extract, Platte is the bottom left, Roustel is top center, and the Brehon Tower is bottom right. The sketches are included on the chart because their recognition is so important for navigation – and also because they are not standard navigation marks. Providing a sketch or image of a landmark, to help the navigator recognize it, harnesses what Adorni refers to as the human ability to “perform mysteriously well on [landmark] recognition tasks” [16]. Indeed, by comparing the close-up images in Fig. 8 with the position of the markers in Fig. 10, the marker for the Brehon Tower is “obviously” in the wrong position. But, without those reference images, would a navigator unfamiliar with the area have been able to reach the same conclusion? This suggests that a useful addition to the mAR display might be inset images of the three landmarks, as in Fig. 13.


Fig. 13. A possible mobile AR display including reference images, landmark markers, yoke and nudge arrows.

6.5 Questions to Address
Inevitably, proposing a design such as that in Fig. 13 for an mAR app raises many questions that would need to be addressed before the design could be a solution to the compass deviation problem. These questions, which indicate only the initial scope of the necessary investigations, are mentioned only in general terms here. For example, the overall format of the display and its usability, non-ambiguity and robustness would need extensive user testing and evaluation. Obvious issues include which landmarks should have reference images and markers displayed, and whether these should be selectable by the user or determined automatically by the app. How that selection should change as the aspect of the device changes could also be problematic.

Another issue is that the deviation of the device’s compass will change with its aspect: how far could the device be rotated before the manual alignment was no longer valid? For deviation curves of the form A·sin(θ + k), where A and k are constants, the maximum rate of change of the deviation error with aspect is simply A degrees per radian (i.e., max[A·cos(θ + k)] = A). For a deviation curve amplitude of 10°, the maximum rate of change of deviation with aspect would be 10° per radian – that is, less than 10° over a 57° change of aspect. So, a change in aspect of 10° would change the deviation by less than 2°. A deviation amplitude of 10° is at the top end of those reported in this paper, but for a particular device it could be greater, particularly if the device is uncalibrated. If reference (inset) images are helpful, there are questions about the appropriate direction and distance from the landmark for each reference image; under what lighting conditions they should be shown – as landmarks can appear very different in overcast weather and bright sunlight; and whether weather conditions, e.g. rain or cloud, would be important. These questions are the subject of ongoing investigation by the author. There would also be feasibility questions about compiling and deploying a corpus of reference images, impacting the viability of the overall approach. Having identified issues still to be addressed, it should be emphasized that something needs to be done immediately to avoid the risk of a navigator being misled by an app which appears very polished, and asserts with apparent authority that a landmark that is crucial for safe navigation is somewhere it is not. Simply warning users about the risk of deviation errors is insufficient – errors of the magnitude reported here are much greater than any yacht navigator will have encountered in a steering compass.
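A short check of the aspect-sensitivity estimate above (Python; the amplitude and phase of the deviation curve are assumed values, not measured ones):

```python
import math

A = 10.0                          # assumed deviation-curve amplitude, degrees
k = 0.0                           # assumed phase of the curve, radians
delta_aspect = math.radians(10)   # a 10 degree change of aspect

def deviation(theta_rad):
    # Sinusoidal deviation curve of the form A*sin(theta + k)
    return A * math.sin(theta_rad + k)

# Worst case: sample the change in deviation over a 10 degree rotation
# starting from many different aspects.
worst = max(abs(deviation(t + delta_aspect) - deviation(t))
            for t in (math.radians(x) for x in range(360)))
print(f"Worst-case change over 10 deg of aspect: {worst:.2f} deg")
# ~1.75 deg, i.e. "less than 2 degrees", consistent with A degrees per radian.
```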

7 Conclusion
AR offers exciting opportunities to assist navigators in unfamiliar situations. However, the polished appearance of AR navigation apps, and their apparent reproducibility and precision, can generate a false, and potentially dangerous, sense of security. In situations such as coastal navigation, or other open-field applications, registration is usually infeasible. Hence, an AR system is completely reliant on sensed orientation. mAR apps suffer further from the limitation that the only possible direction sensor is the embedded compass, since, by definition, the device is not attached to the vessel, and therefore not able to link to any navionic data such as the (corrected) compass orientation. But the internal compasses within mobile devices are prone to deviation errors which can be large compared with the discrimination required to place markers for navigational points of interest. Furthermore, there would seem to be issues with the calibration of compasses in mobile devices, and, in particular, the persistence of any calibration action. The deviation errors for uncalibrated devices can be enormous. Open-field navigation is safety-critical, requiring high reliability rather than occasional success. It follows that app developers must understand the limitations due to deviation, warn prospective users, and find a safe design for mAR apps that are to be used for coastal navigation in particular, and for open-field navigation in general, whenever there is any restriction on the possibility or reliability of registration. These issues do not arise with quasi-static AR systems installed in ships’ bridges or plane cockpits, where the direction of view of an AR system is related directly to the

(corrected) compass orientation of the ship or plane [17]. For such systems, compass deviation should not be a problem. Nor are we considering immersive AR environments, which would be impractical for yacht navigation, since the navigator needs to maintain peripheral vision and the ability to switch visual contexts frequently. We have outlined one possible approach for the interface of a mobile AR solution, one that involves the navigator – that is, the user – in correcting for deviation errors. This approach has the significant added benefit of ensuring that the user maintains full situational awareness, rather than focusing purely on a computer display. Several questions remain to be explored in the proposed design, particularly its usability and robustness, and there may also be questions about the feasibility, construction and dissemination of an appropriate image database. In the short term, this paper should serve to raise awareness of the serious risks of relying on non-registered mAR solutions for navigation. One would hope that the issues raised will be addressed in the design of future mAR navigation apps, or in technological solutions to the underlying problem of compass deviation.

References 1. Shimizu, H.: AR Compass Pro. https://appadvice.com/app/ar-compass-pro/1412221726. Accessed 11 Nov 21 2. Ahafonau, P.: Spyglass. http://happymagenta.com/spyglass/. Accessed 11 Nov 21 3. Pocket Mariner: Compass Eye. https://pocketmariner.com/mobile-apps/compass-eye/. Accessed 11 Nov 21 4. Jeon, J.-S.: A Study on implementation of the mobile application of aid to navigation using location-based augmented reality. J. Navig. Port Res. 43(5), 281–288. https://doi.org/10.5394/ KINPR.2019.43.5.281(2019) 5. Azuma, R.T.: The challenge of making augmented reality work outdoors. In: Ohta, Y., Tamura, H. (ed.) Mixed Reality: Merging Real and Virtual Worlds. Springer. Chp 21 pp. 379–390. ISBN 3-540-65623-5 (1999) 6. Krevelen, D., Poelman, R.: A survey of augmented reality technologies, applications and limitations. Int. J. Virtual Real. 9(2), 1–20 (2010) 7. Reitmayr, G. et al.: simultaneous localization and mapping for augmented reality. In: International Symposium on Ubiquitous Virtual Reality, Gwangju, pp. 5–8 (2010). https://doi.org/ 10.1109/ISUVR.2010.12 8. Blum, J.R., Greencorn, D.G., Cooperstock, J.R.: Smartphone sensor reliability for augmented reality applications. In: Mobile and Ubiquitous Systems: Computing, Networking, and Services (MobiQuitous 2012), pp. 127–138 (2013) 9. Lucie, N., Pavlis, T.L.: Assessment of the precision of smart phones and tablets for measurement of planar orientations: a case study. J. Struct. Geol. 97, 93–103, ISSN 0191-8141, https://doi.org/10.1016/j.jsg.2017.02.015 (2017) 10. Whitmeyer, S.J., Pyle, E.J., Pavlis, T.L., Swanger, W., Roberts, L.: Modern approaches to field data collection and mapping: digital methods, crowdsourcing, and the future of statistical analyses. J. Struct. Geol. (2018), ISSN: 0191-8141, https://doi.org/10.1016/j.jsg.2018.06.023 11. Katsiokalis, M., Ragia, L., Mania, K.: Outdoors mobile augmented reality for coastal erosion visualization based on geographical data. In: Cross-Reality (XR) Interaction, ACM ISS 2020 (International Workshop on XR Interaction 2020) 12. Doerfler Ron: Magnetic Deviation: Comprehension, Compensation and Computation (Part I). https://deadreckonings.com/2009/04/18/magnetic-deviation-comprehension-com pensation-and-computation-part-i/. Accessed 12 June 19 (2009)

13. Bowers, D.: Augmented reality smartphone compasses: opportunity or oxymoron? In: Adjunct Proceedings of the ACM International Joint Conference on Pervasive and Ubiquitous Computing and the International Symposium on Wearable Computers, 9–13 Sep, London, ACM (2019) 14. Mills, G.B.: Analysis of random errors in horizontal sextant angles, Naval Postgraduate School Monteray, CA. https://core.ac.uk/download/pdf/36712161.pdf (1980). Accessed 10 Feb 2020 15. NV Chart Group GmbH: NV Chart App. https://eu.nvcharts.com/digital-charts/nv-chartsapp/ (2021). Accessed 10 Nov 21 16. Adorni, G., Gori, M., Mordonini, M.: Just-in-time landmarks recognition. Real-Time Imaging 5, 95–107 (1999) 17. Gernez, E., Nordby, K., Eikenes, J.O., Hareide, O.S.: A review of augmented reality applications for ships bridges. Necesse 5(3), 159–186, https://hdl.handle.net/11250/2721515 (2020)

Virtual Reality as a Powerful Persuasive Technology to Change Attitude During the Spread of COVID-19

Sara Alami(B) and Mostafa Hanoune

Laboratory of Information Technology and Modeling, Faculty of Sciences Ben M’sik, Hassan II University of Casablanca, Casablanca, Morocco
[email protected]

Abstract. Virtual reality is a powerful simulation tool that has the potential to play a key role in several areas. In this article, we focus on different simulation tools and on different ways of providing negative feedback for persuasive purposes through simulated experiences. The persuasive objective we consider is persuasion towards the right behavior: we rely on the emotional aspect to introduce a fairly high level of anxiety as negative feedback, using techniques addressed in what follows, in order to fight against the spread of COVID-19 and to strengthen the safety and health of citizens. In general, this research aims to develop a VR application that seeks to change the behavior of citizens and to encourage persuasion researchers to explore immersive virtual environments.

Keywords: Behavioral change · Persuasive systems · Virtual reality · Simulation · Mixed reality · Augmented reality

1 Introduction
Simulation has proved to be a very successful tool for changing people’s attitudes and behavior, using powerful technologies such as virtual reality (VR), which can immerse users in a new experience and let them touch and even live the consequences of their decisions and actions, by simulating a virtual world into which people mentally transport themselves and by simulating objects. In terms of research, articles that explore and explain this aspect of VR technology are scarce. Moreover, we note that the persuasive technology literature focuses mainly on non-virtual reality, even though virtual reality is considered a more effective means of persuasion. This scarcity of studies has been noted in the mainstream persuasion literature, and for that reason Guadagno and Cialdini [1] encourage researchers to explore immersive virtual environments further.

2 Simulation
We start with B.J. Fogg, who proposes three important categories of simulation in persuasive technologies:

2.1 Simulated Cause and Effect Scenarios
This kind of simulation can immediately show the link between a behavior and its consequences, which can help people change their attitudes or behaviors; the goal is to simulate experiences without seeming to preach. For example, eating hamburgers and fries every day can lead to heart disease in the future, and smoking increases the risk of cardiovascular disease, but these effects are not immediately visible in the real world. Simulating cause and effect can make this link clear and convincing, prompting a change of habits. This kind of simulation is considered a powerful persuasive tool because it lets users explore the relation between their behavior or attitude and its consequences in a very short time, and because it can convey the effects in a vivid and credible way. Because of these two points, this type of simulation allows users to better understand the likely consequences of their attitudes or behaviors.

2.2 Simulated Environments
This type of simulation creates a virtual environment into which people are mentally transported. It offers a new, controllable place to explore new behaviors and perspectives, and to create situations where people who adopt the desired behavior are rewarded in order to stay motivated. It also allows people to try a specific behavior, to practice self-control when they first encounter anxiety-provoking situations, and to take another person’s point of view by living it.

2.3 Simulated Objects
Object simulations do the opposite of what was discussed in the previous point: they simulate features that accompany users in a real environment. This approach allows users to effectively touch the consequences of what they are simulating on their daily life. What makes technologies based on simulated objects so powerful in convincing users is that they fit into users’ daily routines, depend little on imagination or the suspension of disbelief, and can demonstrate clearly the impact of certain attitudes or behaviors.

There are several tools for carrying out the simulation:

Virtual Reality (VR)
When we talk about virtual reality, we immediately think of VR headsets, which are becoming more and more popular in a wide range of sectors such as video games, real estate, industry and events. Virtual reality actually includes all technologies that immerse a user in an artificial environment, whatever the senses involved [2]. Virtual reality presents its users with a computer-generated environment composed of scenes and objects that look and feel real, allowing them to experience the consequences of their actions in a vivid way and giving them the impression of being immersed in that environment [3].

3 History
The beginning was in 1962 with Morton Heilig, who invented an immersive cinema presenting five short films that allowed viewers to better feel the scenes of the film through several senses, with the help of a fan, smells and a vibrating seat [4]. Six years later, in 1968, the Ultimate Display, the first VR helmet (Fig. 1), was invented and nicknamed the “Sword of Damocles”, a term used to describe a particularly perilous situation. This helmet was very heavy, hence the need for a mechanical arm to hold it, which also prevented the user from moving freely.

Fig. 1. The first VR prototype.

The previous paragraphs focused on the first innovative VR helmets; we now move on to another important component, the gloves or “datagloves” invented in 1982 by Thomas Zimmerman, a research engineer at the Atari Research Lab. These gloves were able to transcribe the movements of the hand into the virtual universe in real time, thanks to fiber-optic sensors, and were modernized afterwards. However, the laboratory was closed by Atari, the games and video console company, in 1983, which pushed the inventor of these gloves to found his own research lab, VPL Research, in order to commercialize the DataGlove in 1984. In 1985, NASA began using the principle of virtual reality in astronaut training. For this reason it developed the VIEW helmet (Virtual Interface Environment Workstation), compatible with the DataGlove, which allowed astronauts to experience complete, life-like immersion. In 1989, virtual reality nevertheless continued to develop in the video-game environment with the arrival of the Nintendo Power Glove. Developed in partnership with VPL Research, this glove controller was based on the principle of the DataGlove. Unfortunately for Nintendo, the project was a commercial failure due to the limited number of games available and its limited use. In 1995, Nintendo tried again to stand out in the virtual reality market, this time with a helmet named the Nintendo Virtual Boy. But this one was far from perfect and had many drawbacks for the gaming experience: it rested on a stand, the games available were too poor in terms of quality, and they used only two colors, red and black. On top of that, users complained of motion sickness (kinetosis), and Nintendo withdrew the helmet from the market less than a year after its release, which makes the Nintendo Virtual Boy the least sold Nintendo system. After these commercial failures, virtual reality projects were abandoned by companies for a long period of time.

In 2012, a new project was born: the Oculus Rift, by Palmer Luckey’s company. Virtual reality then accelerated enormously, especially in 2014 with the arrival of major companies: Facebook with the purchase of Oculus, and Sony with the announcement of the future marketing of its PlayStation VR headset for the PS4. Google also presented the Google Cardboard, a simple piece of cardboard into which one slides a phone to create a headset on a very small budget. Samsung continued the development of headsets for smartphones with the announcement of the Samsung Gear VR, compatible, obviously, only with Samsung phones. In 2015, HTC, in partnership with Valve, presented the HTC Vive, a headset with two controllers that allows the player’s hand movements to be included without being too cumbersome. By 2017, all these projects were on the market, but the HTC Vive was to date the most accomplished headset, offering one of the best immersive experiences while imposing the least inconvenience. In 2020, using augmented reality became very simple: you just need to wear simple glasses instead of holding a smartphone in front of your eyes (take the game “Pokémon Go” as an example). That year the manufacturer Nreal launched its first consumer AR glasses, a powerful device (with high specifications such as a Snapdragon 855 or higher chip, multiple cameras, a 1080p display, and spatial tracking functionality) compatible with most smartphones, connecting to them via USB Type-C and offering new and better augmented reality experiences using the 5G network.

4 VR Explication
Nowadays it is clear that virtual reality plays a key role in many sectors:

• Industries [5]
• Retail [6, 7]
• Tourism [8]
• Education [9]
• Healthcare [10]
• Entertainment [11]
• Research [12, 13]

Many reports show that shipments of VR head-mounted displays (HMDs) passed the 1 million mark for the first time [14], and related studies forecast 9.1 million units by 2021, up from 1.5 million in 2017 [15]. The same reports also show that young people are more inclined towards and interested in this technology [16]. Upcoming releases of standalone VR HMDs (for example, the Oculus Go and HTC Vive Focus) bring new features and declining prices, which makes this technology accessible to most people [17, 18].

Virtual reality adds an additional dimension to these different sectors and creates a new form of interactivity through entertainment, which attracts attention and a particular interest in the proposed environment [19]. According to the report shown in Fig. 2, in 2016 VR was used mostly in the sectors of sports, marketing, advertising, gaming and television, and also in education, where VR applies principles of active pedagogy that transform the passive student into an actor; that actor feels more involved in his training program and his attention to the courses becomes stronger [20]. By 2019, VR had gone further into the health sector, where it is associated with medical techniques such as phobia therapy (fear of heights, driving anxiety, arachnophobia, fear of crowds, etc.).

Fig. 2. VR usage report in different areas between 2016 and 2019.

The commercial application segment dominated the global virtual reality market with a share of over 53% in 2020, as shown in Fig. 3, and is anticipated to maintain its lead for the next seven years. The growing adoption of VR headsets in the commercial sector, such as retail stores, car showrooms, and real estate, is providing new growth opportunities to VR companies. The rapid penetration of smartphones has resulted in the vast application of VR technology in the commercial segment. Many companies are incorporating VR technology for introducing their new products to reach the masses. For instance, in April 2020, AUDI AG, a luxury automobile manufacturer, declared that it would unveil the Audi e-Tron Sportback using the VR event, Virtual Market 4, as a digital platform dedicated to its customers [21]. The healthcare segment is expected to witness the fastest growth rate from 2019 to 2028, which is attributed to the wide spectrum of opportunities for VR in the healthcare sector, such as in medical learning and training, medical marketing, and disease awareness. Companies such as Ossi VR and Immersive Touch offer VR solutions to train surgeons and medical students. There is an ongoing demand for emergency training, virtual surgeries, and VR anatomy applications to help educate medical professionals with more precision. The consumer segment is estimated to have a significant growth due

Fig. 3. VR usage report in different areas from 2020.

to the increasing demand for VR technology in the gaming and entertainment industry. The need for effective enterprise training and better communication and collaboration tools across organizations is driving the growth of the enterprise segment.

5 Equipment Used in Virtual Reality
• Virtual World
A virtual world is a three-dimensional environment that is often, but not necessarily, realized through a medium (i.e. rendering, display, etc.) where one can interact with others and create objects as part of that interaction. In a virtual world, visual perspectives are responsive to changes in movement, and interactions mimic those experienced in the real world [22].
• Immersion
Virtual reality immersion is the fact of being present in a virtual world. It introduces to the user’s brain a simulated sense of presence which makes it believe that what the user sees is real; to reach a totally immersive level, enough senses must be activated to mentally teleport the user into the virtual world [23]. There are two common types of immersion: mental immersion and physical immersion.
• Sensory Feedback
As stated before, total immersion in a virtual world requires engaging most of our senses, including vision, hearing, touch and others. To simulate those senses properly, we need sensory feedback, which can be achieved through the

use of specific hardware and software that are considered the key components of a basic virtual reality system:

• Head-mounted displays (HMD)
• Special gloves or hand accessories
• Hand controls [24]

• Interactivity
Interaction is an essential element for providing a good and impressive virtual reality experience, one that lets users interact naturally with the virtual environment. According to [25], the sense of immersion will remain during the virtual experience only if the environment in which the user is immersed responds appropriately to his actions; if not, the human brain will notice very quickly and the sense of immersion will decrease. When we talk about the virtual environment responding, we must take into consideration how the user moves around, how he changes his viewpoint, and the movements of his hands and head [26].

Emerging Technologies
Virtual reality (VR), augmented reality (AR) and mixed reality (MR) are emerging technologies based on the integration of virtual digital elements into the real world: superimposed on it (AR), interacting with elements of the real environment (MR), or with the user in total immersion in a virtual universe cut off from the real world (VR). Virtual Reality (VR) encompasses all the immersive experiences available via a virtual reality headset (also called an HMD, for head-mounted display). The user is completely cut off from the real world, in a virtual environment visible through the headset. Mixed Reality is a combination of real-world objects and virtual objects [27]. The term encompasses a wide range of technologies, from augmented reality (AR) to augmented virtuality (AV). Augmented reality is essentially adding digital content to a real-world environment, while augmented virtuality is adding physical content to a virtual environment. Figure 4 summarizes the explanation above:

Fig. 4. Emerging technologies.

Related Work and Motivations
Baby Think It Over
Let us begin with the famous Baby Think It Over project [28]. The project involves a simulator in the form of a high-tech doll that looks like a human baby. The objective of this program, which is used for training many would-be parents at different schools, is to persuade teenagers to avoid becoming parents in their teens. The baby cries five to fifteen times at any time of the day, for 2 or 3 min, and sometimes the dolls cry for more than 15 min. Whenever the baby cries, the student caring for it must pay immediate attention to the doll: he must insert a key into the baby’s back and hold it in place to stop the crying. Among the most important clauses of the program’s contract is that participants must carry the baby with them wherever they go, be it to training, to the movies, to parties, or even to bed, in order to touch, feel and experience the impact that a baby, which always needs care, has on their own life. Because of this, the majority of them decided to stop doing certain activities, such as attending parties, because they knew that their baby simulator would cry and that the unexpected bothered them. In the end, 143 out of 150 (95%) teens who participated in a Baby Think It Over program said afterwards that they were not yet ready to take on such responsibility, which shows that simulated objects can be very effective tools in terms of persuasion.
Fitness with Virtual Reality
This is one of the projects that use mixed reality; it is called the LifeFitness VR Rowing Machine [29]. It starts with a stationary rowing machine containing a screen that shows the exerciser rowing a boat on virtual water: when you row faster, your boat moves faster through the water. The goal is to provide more fun and motivation during training [30]. The regular practice of sport has many physical and mental benefits, and for this reason every day a very large number of women and men of different age groups go to the gym. But after some months of excessive training, 90% end up giving up regular training. This is where virtual reality, which has seen remarkable evolution recently with the improvement of graphics, can intervene, by offering the user a dive into a new environment in order to provide more pleasure and motivation during training.
Target Behavior
Since the establishment of the state of health emergency to fight against the spread of COVID-19, citizens have been asked to stay at home and only go out in case of urgent needs (doctor, shopping, pharmacy). We have found that the quarantine is violated by people not respecting the containment and the rules established by the authorities. There is a category that does not want to submit to the orders of the authorities – young people and teenagers – that is not able to stay at home all day and instead spends the whole day playing cat and mouse (Tom and Jerry) with the police. All these efforts are made by governments in order to strengthen the safety and health of their citizens; however, we find a group of citizens who are not aware of the seriousness of the situation around them, which can be seen in their lack of respect for the quarantine imposed by the epidemiological situation. So the objective now is to make users choose between two actions: staying at home, or going out and spreading the virus more and more. A VR simulation could be a very powerful tool to persuade those users and push them to stay at home; to do that, we need to present

the effects of non-compliance with the preventive measures during the epidemic in a vivid and memorable way.
Use Case
We will create a virtual experience that allows users to realistically experience the risks of breaching the emergency measures during the spread of the COVID-19 virus, and to try for themselves the effects of going out without an urgent need while not respecting the precautions and measures that protect against the virus – namely contact with surfaces, failure to sanitize, not wearing a protective mask, and not respecting the safety distance. Moreover, aversive feedback is introduced when users violate one of the previously mentioned preventive measures. To achieve that, we rely on the emotional side, using techniques that produce a high level of anxiety in the user. This technique, called HighAnx, introduces to the user different elements that noticeably increase his anxiety. HighAnx is based on two types of feedback at the same time – the visual and audio feedback found in first-person videogames – augmented with other effects and ideas that we propose in the following. The negative feedback used when the user violates one of the preventive measures, such as the safety distance or going out without a protective mask, consists of: playing the sound of a person coughing suspiciously and breathing painfully, sounds that become more disturbing if the user does not withdraw from the action; and, to simulate the phenomenon of tunnel vision, which is always linked to extreme states of stress, a progressive reduction of the visual field with a sequence of black flashes that follows a rhythm like the heart rate, while the life bar turns red with a white flash when the character is near death (Fig. 5).

Fig. 5. Black flashes synchronized with heartbeat sound.

The new idea added to these audio-visual stimuli in our technique is the exploitation of what we call biofeedback in order to trigger and induce user anxiety. It is based on the fact that changing our heartbeat from a normal to an abnormal rhythm creates a high level of anxiety and fear.

For example, studies conducted on groups of panic patients have shown that they rate cardiac symptoms as among the most fear-provoking and as a tool to induce anxiety, and that merely hearing the sound of an abnormal heartbeat through headphones can be a fear-relevant cue for anxiety-sensitive individuals [31]. In order to detect the cardiac frequency, we use a pulse oximeter attached to the user’s earlobe; this data is used as follows (a simplified sketch of the logic is given after the list below):

• When users are in a safe situation in the virtual environment, we make them hear their own heartbeat sound in the headphones while they explore the world (Fig. 6).

Fig. 6. The first scene of the virtual experience (Safe Situation).

• When users violate one of the preventive measures, we digitally speed up their heartbeat sound in the headphones in order to give them the impression that their own heartbeat is becoming abnormal (Fig. 7).

Fig. 7. The scene when the user violates any of the preventive measures.

• When users undo the breach, we progressively return to the replay of their actual heartbeat (Fig. 8).

Fig. 8. The scene when the user undoes the breach.
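The following is a simplified sketch of this biofeedback loop (Python-style; `read_pulse_oximeter_bpm` and `set_heartbeat_playback` are hypothetical placeholders for the sensor and audio-engine calls, and the speed-up and recovery factors are assumptions, not values taken from the HighAnx implementation):

```python
import random

def read_pulse_oximeter_bpm():
    """Hypothetical stand-in for the ear-lobe pulse oximeter reading."""
    return 70 + random.uniform(-3, 3)

def set_heartbeat_playback(bpm):
    """Hypothetical stand-in for the audio engine: replay the heartbeat
    sample in the headphones at the given rate."""
    print(f"playing heartbeat at {bpm:.0f} bpm")

SPEED_UP = 1.6        # assumed factor for the "abnormal" rhythm
RECOVERY_STEP = 0.05  # assumed rate at which playback eases back to normal

current_factor = 1.0

def update(breach_active):
    """Called once per frame/interval with whether a preventive measure
    (mask, safety distance, surface contact...) is currently violated."""
    global current_factor
    if breach_active:
        current_factor = SPEED_UP                                   # jump to abnormal rhythm
    else:
        current_factor = max(1.0, current_factor - RECOVERY_STEP)   # ease back to the real rhythm
    set_heartbeat_playback(read_pulse_oximeter_bpm() * current_factor)

# Example: the user breaches the safety distance, then withdraws from it.
for violating in [False, True, True, False, False, False]:
    update(violating)
```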

To implement this approach, we follow the ten steps of the Pantelidis methodology for determining when to use VR [32] (Fig. 9). However, that methodology gives no detailed explanation of how to assess the user experience gained after using the VR app, so we use it with the addition of an evaluation step based on the IVE questionnaire [33].

Fig. 9. VR research methodology.

Step 1 - Define the purpose and reason for using VR
In general, this research aims to develop a VR application that seeks to change the behavior of citizens and to encourage persuasion researchers to explore immersive virtual environments. Our future work consists of a project with a broader scope on simulation, producing an application that responds to the situation already invoked in the previous section (the fight against the spread of COVID-19).

Step 2 - Determine more specific goals
The objective is based on the emotional side, using techniques that provide a high level of anxiety to the user. This technique, called HighAnx, must present the user with different elements that significantly increase their anxiety. HighAnx introduces several lessons on personal prevention by combining several video game and simulation features to further increase user engagement and persuasion. The topic of COVID-19 here is just one case in which the related behavior requires change; as a perspective, we want to develop our system into a versatile platform adaptable to any situation, and to use the rapid development of simulation technologies, as well as the competition between the leaders in this field (Samsung, Meta, Sony, etc.), in order to achieve a very high level of immersion touching all the human senses, which will make our system strong enough in terms of changing the behavior of citizens within smart and sustainable cities.
Step 3 - Determine the level of realism, the type of immersion and the presence
Mobile VR was selected as the platform to run this app, so it needs a smartphone and a VR headset. Mobile VR can be easier to set up and is relatively inexpensive compared to desktop VR (HTC Vive and Oculus Rift). There is no cable plugged into a Mobile VR setup, which can reduce the risk of people falling when using the VR app.
Step 4 - Determine the type of interaction
Developing mobile VR apps requires a smartphone with inertial tracking sensors, namely an accelerometer and a gyroscope [23]. Google VR and Unity were used to develop this app. Google VR is developed by Google Developers, who provide a virtual reality SDK for certain development environments. Google VR has several API functions that can be used for rendering, input management, and VR controller device activation. Unity is a game engine and editor for developing multimedia and gaming applications, and it can also be used to develop VR applications.
Step 5 - Design the virtual environments and the whole system, and build the system on a predefined virtual reality platform
Figure 10 represents the complete toolset of the system:
• The communication between the user and the VR system is obtained through the VR headset and the smartphone.
• A heart rate monitoring system on a wearable device (bracelet) is combined with the VR system. These devices act as input or output media. The continuous loop between the user and the virtual environment runs until the user wants to end the virtual experience.

Fig. 10. The complete tools of the VR system.

Environment: We start with the creation of a virtual environment close, in terms of appearance and composition, to those frequented by users in their daily lives, which brings more immersion as well as familiarization with virtual environments. To this new environment we add real-world objects such as protective masks and hand sanitizer, in order to offer augmented virtuality.
Duration: Each virtual reality experience must be short (no more than 10 min) in order to maintain the user’s attention and not to frustrate potential users who are waiting to test the device (thus avoiding an overly long waiting time at trade fairs, for example).
Step 6 - In this step, there are three implementation and evaluation sub-steps.
Step 6.1. Do a vital-sign check on the target user before the virtual reality session. Vital signs are checked by measuring heart rate (HR), blood pressure (BP), and respiration rate (RR).
Step 6.2. Run the VR application for the target user.
Step 6.3. Check the vital signs again after the session with VR, and carry out the IVE questionnaire assessment.
In the next study, the development of the virtual reality application for behavior change is adapted from the methodology for determining when to use virtual reality in education and training, combined with the user experience evaluation method of the Immersive Virtual Environment (IVE) questionnaire. The IVE questionnaire is designed to measure the user experience of residing in an immersive virtual environment.

6 Conclusion
Our research is one of many studies that focus on the persuasive effects of immersive augmented virtuality (AV), combined with ideas inspired by games, to fight the spread of COVID-19. Moreover, we focused on anxiety-based aversive feedback

in persuasive applications. In our experiment we produce aversive feedback for simulated risk experiences, using the audio-visual system found in first-person videogames together with a biofeedback technique. In a virtual world based on AV technology, the subject of the experiment faces the spread of the coronavirus by following a set of precautionary measures that must be respected in order to continue moving comfortably in the virtual world. If these rules are not followed, the program works to increase the subject’s anxiety and bring him/her into a psychological state in which fear prevails, in order to reinforce the need for the previously imposed preventive measures. This article has analyzed in depth many important aspects, such as simulation and simulation tools that aim to change the behavior of citizens, and it encourages persuasion researchers to explore immersive virtual environments. Our future work consists of a project with a broader scope on simulation, producing an application that addresses the situation already invoked in the previous section (the fight against the spread of COVID-19). HighAnx introduces several lessons on personal prevention by combining several features of video games and simulations to further increase the engagement and persuasion of users. The theme of COVID-19 here is only one case where the related behavior needs to be changed.

References 1. Guadagno, R., Muscanell, N., Greenlee, L., Roberts, N.: Social influence online: the impact of social validation and likability on compliance. Psychol. Pop. Media Cult. 2, 51–60 (2013). https://doi.org/10.1037/a0030592 2. Définition: Qu’est-ce que la réalité virtuelle? Artefacto. https://www.artefacto-ar.com/realitevirtuelle/. Accessed 6 Dec 2020 3. Virtual Reality: another world within sight, Iberdrola. https://www.iberdrola.com/innovation/ virtual-reality. Accessed 6 Dec 2020 4. “Réalité virtuelle,” Wikipédia (2020). https://fr.wikipedia.org/w/index.php?title=R%C3% A9alit%C3%A9_virtuelle&oldid=176491603. Accessed 6 Dec 2020 5. Berg, L.P., Vance, J.M.: Industry use of virtual reality in product design and manufacturing: a survey. Virtual Real. 21(1), 1–17 (2016). https://doi.org/10.1007/s10055-016-0293-9 6. Bonetti, F., Warnaby, G., Quinn, L.: Augmented reality and virtual reality in physical and online retailing: a review. Synth. Res. Agenda, 119–132 (2018) 7. Van Kerrebroeck, H., Brengman, M., Willems, K.: Escaping the crowd. Comput. Hum. Behav. 77(C), 437–450 (2017). https://doi.org/10.1016/j.chb.2017.07.019 8. Griffin, T. et al.: Virtual reality and implications for destination marketing (2017). http://sch olarworks.umass.edu/ttra/2017/Academic_Papers_Oral/29/ 9. Merchant, Z., Goetz, E., Cifuentes, L., Keeney-Kennicutt, W., Davis, T.: Effectiveness of virtual reality-based instruction on students’ learning outcomes in K-12 and higher education: a meta-analysis. Comput. Educ. 70, 29–40 (2014). https://doi.org/10.1016/j.compedu.2013. 07.033 10. Freeman, D., et al.: Virtual reality in the assessment, understanding, and treatment of mental health disorders. Psychol. Med. 47, 1–8 (2017). https://doi.org/10.1017/S003329171700040X 11. Lin, J.-H.T., Wu, D.-Y., Tao, C.-C.: So scary, yet so fun: the role of self-efficacy in enjoyment of a virtual reality horror game. New Media Soc., p. 146144481774485 (2017). https://doi. org/10.1177/1461444817744850

12. Bigne, E., Llinares, C., Torrecilla Moreno, C.: Elapsed time on first buying triggers brand choices within a category: a virtual reality-based study. J. Bus. Res. 69 (2015). https://doi. org/10.1016/j.jbusres.2015.10.119 13. Meißner, M., Pfeiffer, J., Pfeiffer, T., Oppewal, H.: Combining virtual reality and mobile eye tracking to provide a naturalistic experimental environment for shopper research. J. Bus. Res. 100 (2017). https://doi.org/10.1016/j.jbusres.2017.09.028 14. Media alert: Virtual reality headset shipments top 1 million for the first time. https://www.can alys.com/newsroom/media-alert-virtual-reality-headset-shipments-top-1-million-first-time? time=1607635115. Accessed 10 Dec 2020 15. Clear Potential for Virtual Reality Headsets After a Slow Start, CCS Insight. https://www. ccsinsight.com/press/company-news/2919-clear-potential-for-virtual-reality-headsets-aftera-slow-start/. Accessed 10 Dec 2020 16. “The 2015 Virtual Reality Consumer Infographic,” Greenlight Insights (2015). https://greenl ightinsights.com/2015-vr-consumer-report-infographic/. Accessed 10 Dec 2020 17. Terdiman, D., Terdiman, D., Terdiman, D.: Why 2018 Will Be The Year Of VR 2.0. Fast Company (2018). https://www.fastcompany.com/40503648/why-2018-will-be-the-yearof-vr-2-0. Accessed 10 Dec 2020 18. Media alert: Virtual reality headset shipments top 1 million for the first time. https://www.can alys.com/newsroom/media-alert-virtual-reality-headset-shipments-top-1-million-first-time? time=1607635599. Accessed 10 Dec 2020 19. VR/AR/MR/XR investment focus worldwide 2019, Statista. https://www.statista.com/statis tics/829729/investments-focus-vr-augmented-reality-worldwide/. Accessed 8 Dec 2020 20. Paule, L., Paule, L.: Quelle place pour la réalité virtuelle dans l’éducation?” Laval Virtual (2019). https://blog.laval-virtual.com/quelle-place-pour-la-realite-virtuelle-dans-educat ion/. Accessed 8 Dec 2020 21. Virtual Reality Market Size, Share & Trends Analysis Report By Technology (Semi & Fully Immersive, Non-immersive), By Device (HMD, GTD), By Component (Hardware, Software), By Application, And Segment Forecasts, 2021–2028, Jan. 01, 2020. Virtual Reality Market Share & Trends Report, 2021–2028 (grandviewresearch.com) 22. What is a Virtual World? - Definition from Techopedia, Techopedia.com. http://www.techop edia.com/definition/25604/virtual-world. Accessed 8 Dec 2020 23. Réalité virtuelle immersive — EduTech Wiki. http://edutechwiki.unige.ch/fr/R%C3%A9a lit%C3%A9_virtuelle_immersive. Accessed 8 Dec 2020 24. Richardson, B., Symmons, M., Wuillemin, D.: The contribution of virtual reality to research on sensory feedback in remote control. Virtual Real. 9(4), 234–242 (2006). https://doi.org/ 10.1007/s10055-006-0020-z 25. Ouramdane, N., Otmane, S., Mallem, M.: Interaction 3D en Réalité Virtuelle - Etat de l’art. Rev. Sci. Technol. Inf. - Sér. TSI Tech. Sci. Inform. 28(8), 1017–1049 (2009). https://doi.org/ 10.3166/tsi.28.1017-1049 26. Lucas, “V-Cult - Blog : Introduction à la réalité virtuelle,” V-Cu (2017). https://www.v-cult. com/blog/2017/09/28/quest-ce-que-la-realite-virtuelle/. Accessed 8 Dec 2020 27. Milgram, P., Kishino, F.: A taxonomy of mixed reality visual displays. IEICE Trans Inf. Syst. E77-D(12), 1321–1329 (1994) 28. Baby Think It Over Project in middle schools. http://www.timolson.com/babythink.htm. Accessed 8 Dec 2020 29. B. L, “Sport VR – Comment la réalité virtuelle transforme le fitness,” Réalité-Virtuelle.com, Jan. 16, 2018. https://www.realite-virtuelle.com/vr-fitness-salles-de-sport-2702/. 
Accessed 8 Dec 2020 30. La réalité virtuelle est-elle l’avenir du fitness ? VR. https://www.redbull.com/fr-fr/realite-vir tuelle-vr-fitness. Accessed 8 Dec 2020

31. Pollock, R.A., Carter, A.S., Amir, N., Marks, L.E.: Anxiety sensitivity and auditory perception of heartbeat. Behav. Res. Ther. 44(12), 1739–1756 (2006). https://doi.org/10.1016/j.brat. 2005.12.013 32. Pantelidis, V.S.: Reasons to use virtual reality in education and training courses and à model to determine when to use virtual reality. Themes Sci. Technol. Educ. 2, 59–70 (2009) 33. Tcha-Tokey, K., Loup-Escande, E., Christmann, O., Richir, S.: Proposition and validation of a questionnaire to measure the user experience. Int. J. Virtual Real. 16, 33–48 (2016)

Predicting Traffic Indexes on Urban Roads Based on Public Transportation Vehicle Data in Experimental Environment

Georgi Yosifov(B) and Milen Petrov

Sofia University “St. Kliment Ohridski”, Sofia, Bulgaria
{gkjosifov,milenp}@fmi.uni-sofia.bg

Abstract. Having the ability to accurately predict the traffic situation on urban roads can be useful for creating administrative traffic control systems, navigation software solutions or for general public awareness. In this paper we are describing our methodology for creating a new experimental setup and then analyzing the performance results of different neural network models for predicting Traffic Indexes calculated from public transportation positioning data as time series. Keywords: Traffic prediction · Neural network · Time series · Buses · GPS

1 Introduction
The Traffic Indexes used in this paper are calculated based solely on positioning data gathered from periodic public transportation vehicles in an urban environment. In [1] a Traffic Index is defined as a discrete value from 0 to 5 which describes the traffic load during some 30-min interval for a road segment or an entire city, where 0 is considered “no traffic” and 5 stands for “very heavy traffic”. The algorithm used in this paper for calculating the indexes is an improved version of the algorithm introduced in [1]. In the new version the traffic indexes are calculated based on the distribution of the times the public transportation vehicles remain in the road segments, and not only on the quartiles of the data. In this paper we explore different ways to use the gathered indexes as time series and, on their basis, to predict future index values – one index in the future or many. In Sect. 2, we describe our methodology for creating a new dataset of traffic history for both personal and public vehicles, using SUMO (Simulation of Urban MObility) [2], a mature, specialized urban mobility simulation software widely used by researchers and supported by the Eclipse Foundation, and the run of the algorithm for assigning the Traffic Indexes on that dataset. The simulated data covers a 1 km road stretch in Sofia, Bulgaria, for a period of 365 days. Then, in Sect. 3, we compare the performance of the different models used for making the predictions, and finally we present our conclusions in Sect. 4.
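As an illustration only, the sketch below shows one way such a dwell-time-based index could be derived; the percentile thresholds and the synthetic data are assumptions for demonstration, not the actual algorithm defined in [1] or its improved version:

```python
import numpy as np

def traffic_index(current_dwell_s, historical_dwell_s):
    """Illustrative mapping of a 30-min interval's average dwell time in a
    road segment to a discrete index 0..5, based on where the value falls
    within the historical dwell-time distribution. The percentile cut points
    are assumptions for illustration only."""
    percentile = (np.asarray(historical_dwell_s) < current_dwell_s).mean() * 100
    thresholds = [20, 40, 60, 80, 95]                 # assumed cut points
    return sum(percentile >= t for t in thresholds)   # 0 = no traffic, 5 = very heavy

history = np.random.gamma(4.0, 20.0, 10_000)          # synthetic dwell times, seconds
print(traffic_index(np.percentile(history, 50), history))  # mid-range dwell time -> moderate index
print(traffic_index(np.percentile(history, 99), history))  # extreme dwell time -> index 5
```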


2 Methodology
The methodology of our work consists of the following experimental setup with five main stages, visually shown in Fig. 1. The output data of each stage becomes the input of the next one. The stages are the following:
• The first stage is the preparation of the experiment - generating a schedule for the vehicles that will be involved in the traffic simulation.
• The second stage is a traffic simulation based on the schedule generated by the first stage.
• The third stage is the execution of the algorithm for calculating the Traffic Indexes for all the time intervals generated by the simulation.
• The fourth stage is the transformation of the output of the algorithm into time series in a format suitable for consumption by the machine learning (ML) platform.
• The fifth and final stage uses the data from the time series to make predictions about the state of traffic in the future.

Fig. 1. Experimental stages

In the next section we will look at each of the stages in more detail.


2.1 First Stage - Generating a Schedule

The Simulation of Urban MObility (SUMO) software can be configured with a specific schedule which determines how many vehicles, and of which vehicle types, travel on predefined routes on the transport map [3]. The parameters of this schedule determine the characteristics and the flow of vehicles. These characteristics are as follows: in which second the flow starts, in which second it ends, the identifier of the flow, the type of vehicles, the color with which to indicate the flow (only relevant for simulation with a graphical interface), the lane from which vehicles start their journey, the starting speed, the starting segment, the end segment and the number of vehicles per hour. All this allows for the precise definition of the traffic and road situation. For the purpose of the experiment, the compilation of a schedule for a period of 365 days was chosen, taking into account the differences between weekends and working days.

Fig. 2. Traffic schedule (vehicles per 30-min interval over the day, for workdays, Saturdays and Sundays)

Figure 2 shows the canonical number of personal vehicles passing in each half-hour interval of the day, relative to the day of the week. The observations that guided the choice of those values on weekdays are the beginning and end of the working day, as well as the lunch breaks. On Saturday, traffic from the morning until noon is heavier because of people who want to take advantage of weekend travel. On Sunday evening people return, and this creates the afternoon-evening peak. To the canonical numbers, the schedule generation program adds a randomness factor of ±10%. This is done in order to diversify the output and leads to different values for each hour, making it more unique. The program introduces another random element to better model the traffic, namely it defines the following distribution of car types to be present in the traffic. We have three main flows with different parameters that get mixed in the traffic. The parameters that are


not defined explicitly remain the default ones defined by SUMO, namely: vehicle length 4.3 m, minimum distance from other participants 2.5 m, maximum acceleration 2.9 m/s², deceleration 3 m/s², emergency deceleration 7 m/s², maximum speed 180 km/h, number of seats 5, emission class PC_G_EU4, speed deviation 0.1 [4].
• With probability 0.5 we have traffic for which only a maximum speed of 80 km/h is defined. All other parameters are the defaults.
• With a probability of 0.4 we define cars with a lower acceleration parameter (2 m/s²) and a maximum speed of 40 km/h. They have an increased length of 5 m.
• With a probability of 0.1 we add participants in traffic with a length of 6 m, an acceleration of 1.3 m/s², and a maximum speed of 30 km/h.
New software was developed that takes the number of days as input and generates a SUMO schedule XML file according to the above parameters.

2.2 Second Stage - SUMO Simulation

Once we have generated a new SUMO schedule file, we can run the simulation software. We run it without a graphical interface for optimal operation and with the parameters --fcd-output and --fcd-output.geo, to specify that we want to generate detailed GPS records for all participants on the road. As a result, an XML file is produced containing the exact positions of all vehicles for each second. In our case, for a road section of length 1 km and a duration of 365 days, the size of the file on disk was 172 GB.

2.3 Third Stage - Running the Algorithm on the Generated Data

The next step is to run the algorithm on the generated traffic simulation data to determine the Traffic Indexes. The algorithm operates only on the records produced by the public transportation vehicles in the simulation. The result of this stage is an XML document that contains information about the Traffic Indexes in all the road segments during all the 30-min intervals of the simulation.

2.4 Fourth Stage - Transformation of the Generated Data

In order to make a dataset usable by the machine learning (ML) models, a special software tool was created that reads the output of the algorithm and transforms it into a CSV file. This software tool also allows the users to define which segments to include as columns in the CSV file and what the name of the output file should be. The columns of the CSV are "Start of interval time" and one column for each segment containing the Index values corresponding to that time.

2.5 Fifth Stage - Machine Learning Platform

In our case TensorFlow was chosen as the machine learning platform [5].


The ML experiment consists of creating different prediction models, applying them to the dataset generated in the first four stages described above, and then comparing the results. The focus is on solving two main tasks:
• Prediction of the Traffic Index for the next 30-min time interval on a certain road segment.
• Prediction of Traffic Indexes in several consecutive time intervals, based on a set of measurements in the past, for a certain road segment.
For each of the two tasks, different machine learning models are trained and their results are compared.
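Not part of the authors' toolchain, but as a rough sketch of how the two tasks can be framed from the CSV described in Sect. 2.4: the snippet below (assuming pandas/NumPy and a hypothetical column name "Segment_1") builds input/target windows for the single-step and multi-step settings.

```python
import numpy as np
import pandas as pd

def make_windows(series: np.ndarray, n_in: int, n_out: int):
    """Build (input window, target window) pairs from a 1-D Traffic Index series."""
    X, y = [], []
    for i in range(len(series) - n_in - n_out + 1):
        X.append(series[i:i + n_in])
        y.append(series[i + n_in:i + n_in + n_out])
    return np.array(X), np.array(y)

# Hypothetical file and column names; the real CSV layout is described in Sect. 2.4.
df = pd.read_csv("traffic_indexes.csv")
values = df["Segment_1"].to_numpy(dtype=float)

X_single, y_single = make_windows(values, n_in=16, n_out=1)   # single-step task
X_multi,  y_multi  = make_windows(values, n_in=48, n_out=8)   # multi-step task (4 h ahead)
```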

3 Running the Prediction Models

3.1 Data Preparation

Before using the models on the training data, it is important to understand its characteristics and ensure that it is in a suitable format for our purposes. For the purposes of the experiment, a specific road segment was selected and the history of its indexes was taken. As mentioned, the raw CSV dataset in our case has only two columns – the start of the interval time and the segment name. Measuring and calculating the characteristics of the dataset we find that:
• The number of rows is 13,869
• The arithmetic mean of the values is 2.577547
• The standard deviation is 0.852635
The start date and time of the time intervals at this point is a sequence of characters in the format "Year/Month/Day Hours:Minutes:Seconds". In this format they are not convenient for use by ML models. For this purpose, we can transform them into POSIX time (the number of seconds since 01.01.1970). Taking advantage of our experience and observations of the traffic situation, we know that traffic has clear daily and weekly periodicity. To it we can add the 8-h periodicity of the working day. One way to convert this information into usable values is to convert the seconds into signals, obtaining "Time of the day", "Time of the week" and "Time of 8-h interval" columns using (1) and (2), where period is the number of seconds in a day/week/8 h and t is the current interval time in POSIX seconds.

period\_sin = \sin\left(t \times \frac{2\pi}{period}\right)   (1)

period\_cos = \cos\left(t \times \frac{2\pi}{period}\right)   (2)

As an example, Fig. 3 shows the graph of the time signals for the 8-h period.


Fig. 3. Time signal for 8-h period
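As a concrete illustration of Eqs. (1) and (2), the periodic time features can be computed as in the sketch below (this is not the authors' code; pandas/NumPy are assumed, and the column name follows the CSV layout from Sect. 2.4).

```python
import numpy as np
import pandas as pd

DAY = 24 * 60 * 60      # seconds in a day
WEEK = 7 * DAY          # seconds in a week
SHIFT = 8 * 60 * 60     # seconds in an 8-hour working period

def add_time_signals(df: pd.DataFrame, time_col: str = "Start of interval time") -> pd.DataFrame:
    # Convert the textual interval start into POSIX seconds.
    t = pd.to_datetime(df[time_col]).astype("int64") // 10**9
    for name, period in [("day", DAY), ("week", WEEK), ("8h", SHIFT)]:
        df[f"{name}_sin"] = np.sin(t * (2 * np.pi / period))   # Eq. (1)
        df[f"{name}_cos"] = np.cos(t * (2 * np.pi / period))   # Eq. (2)
    return df
```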

As the next step, we separate the dataset into training and test data. The ratio chosen is 80 to 20. By comparing their mean absolute errors (MAEs), we can test our models for overfitting. Next, we scale the features using data normalization. To achieve this, for each of the columns we subtract its arithmetic mean and divide by its standard deviation. The mean and standard deviation are derived only from the training set, to ensure that the models do not have any access to the test data.

3.2 Single-Step Models

Single-step models provide predictions for just one Traffic Index in the future. The different single-step models that we tested are described in the list below.
• Baseline - To understand and compare the results obtained from the other ML models, we use a naïve model as our initial benchmark. The baseline we chose takes the value in the last interval and returns it as a forecast for the next, assuming that there will not be sharp declines and increases in the level of traffic in two adjacent time intervals.
• Dense Neural Network – a neural network with several consecutive densely connected layers. To make a prediction, the neural network uses the 16 previous states of the index as input.


• Convolutional Neural Network (CNN) – Similar to the previous model, but with the introduction of one Conv1D layer [6] with 16 neurons.
• Recurrent Neural Network (RNN) – a neural network known to work well with time series, as it maintains an internal state from one time step to the next. We are using a Long Short-Term Memory (LSTM) [7] network with one layer of 32 neurons.
Results of the single-step models are shown below in Table 1 and Fig. 4.

Fig. 4. Single-step models performance

Table 1. Single-step models error

Model            | Mean absolute error
-----------------|--------------------
Baseline         | 0.4052
Multi step dense | 0.2146
Conv             | 0.2013
LSTM             | 0.1881
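For concreteness, a minimal single-step LSTM of the kind compared in Table 1 could be defined as in the sketch below. This is an illustrative tf.keras setup, not the authors' exact code; the 32 LSTM units follow the description above, while the window of 16 past indexes is borrowed from the dense/CNN description and other details (optimizer, epochs) are assumptions.

```python
import tensorflow as tf

WINDOW = 16   # past (normalised) Traffic Indexes used as input

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(WINDOW, 1)),
    tf.keras.layers.LSTM(32),      # one recurrent layer of 32 neurons
    tf.keras.layers.Dense(1),      # predicts the next Traffic Index
])

model.compile(loss="mae", optimizer="adam", metrics=["mae"])
# model.fit(X_single, y_single, validation_split=0.2, epochs=20)
```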

3.3 Multi-step Models

Unlike single-step models, multi-step models can predict several steps forward in the time series. In this section of the paper, we will look at the results of different multi-step models, and finally we will compare their performances.


• Baseline - As in the previous section, here we also need to compare the results we obtain with some baseline. The baseline here returns the last known result as all subsequent predictions. It is expected that we get worse results here than with the single-step baseline, as the probability of the traffic volume changing over a period of 4 h is higher.
• Linear model - a neural network that uses the last reported feature as input and predicts what the next 8 will be based on a linear projection.
• Dense model - The structure of this neural network differs from the linear one in that there is a layer with 512 neurons between the input and output layers. However, this configuration is similar to the linear one in that it only accepts the last feature and uses it to determine the next 8 Traffic Indexes.
• CNN - For our experiment, it is configured to work with the last 16 records, and again to predict the next 8. The convolutional layer of the network is composed of 256 neurons, and the activation used is ReLU.
• LSTM - The network is configured to accumulate input data for a period of 48 time intervals backwards and to generate information for the next 4 h (8 Indexes).
• Autoregressive RNN – in this model the result is generated in steps, and each generated step is added as input for the generation of the next one [8].
Results of the multi-step models are shown in Table 2 and Fig. 5 below.

Fig. 5. Multi-step models performance


Table 2. Multi-step models error

Model   | Mean absolute error
--------|--------------------
Last    | 1.0802
Linear  | 0.6338
Dense   | 0.2429
Conv    | 0.2371
LSTM    | 0.2093
AR LSTM | 0.1985
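To make the autoregressive idea behind the best-performing model concrete: the loop below feeds each prediction back as the newest input. It is a simplified wrapper around any single-step model, not the paper's AR LSTM (which keeps the recurrent state across steps); window length and shapes are assumptions for illustration.

```python
import numpy as np
import tensorflow as tf

def autoregressive_forecast(single_step_model: tf.keras.Model,
                            history: np.ndarray, steps: int = 8) -> np.ndarray:
    """Generate `steps` forecasts, feeding each prediction back as the next input."""
    window = list(history[-16:])          # last 16 known (normalised) indexes
    preds = []
    for _ in range(steps):
        x = np.array(window[-16:], dtype="float32").reshape(1, 16, 1)
        next_value = float(single_step_model.predict(x, verbose=0)[0, 0])
        preds.append(next_value)
        window.append(next_value)         # generated step becomes input for the next one
    return np.array(preds)
```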

4 Conclusion

In this paper we have shown how we can predict the Traffic Indexes for the next 30 min or the next 4 h. For this purpose, we created a system that generates SUMO traffic schedules based on user-defined parameters. Then, based on a one-year schedule, we created a traffic simulation with the specialized SUMO software. We processed the output generated by the simulation and ran an algorithm for calculating Traffic Indexes for every half hour of that year in ten separate segments. By selecting one of these segments and taking the Traffic Indexes calculated sequentially, we compiled a time series dataset to use with different ML prediction models. We investigated 4 single-step and 6 multi-step models. Of the single-step models, the one with the smallest MAE was the RNN (LSTM), and of the multi-step ones, the autoregressive RNN gave the best results. These results could help in the creation of urban traffic control systems, and could also be used in routing and navigation software solutions. The benefits would be great both for administrations responsible for urban mobility and for individuals and businesses.

Acknowledgment. The research reported here was partially supported by "An innovative software platform for big data learning and gaming analytics for a user-centric adaptation of technology enhanced learning (APTITUDE)" - research projects on the societal challenges - 2018 by the Bulgarian National Science Fund with contract №: KP-06OPR03/1 from 13.12.2018, and by project FNI-SU-80-10-152/05.04.2021, an FNI project of Sofia University "St. Kliment Ohridski" (Bulgaria), "Challenges of developing advanced software systems and tools for big data in cloud environment (DB2BD-4)".

References
1. Yosifov, G., Petrov, M.: Traffic flow city index based on public transportation vehicles data. In: Proceedings of the 21st International Conference on Computer Systems and Technologies, June 2020, pp. 201–207. https://doi.org/10.1145/3407982.3408007
2. Lopez, P.A., et al.: Microscopic traffic simulation using SUMO (2018). https://elib.dlr.de/124092/
3. "Simulation/Output/FCDOutput," Eclipse SUMO. https://sumo.dlr.de/docs/Simulation/Output/FCDOutput.html. Accessed 8 Nov 2020
4. "Vehicle Type Parameter Defaults," SUMO (2021). https://sumo.dlr.de/docs/Vehicle_Type_Parameter_Defaults.html. Accessed 8 May 2021
5. Abadi, M., et al.: TensorFlow: large-scale machine learning on heterogeneous systems (2015). https://www.tensorflow.org/
6. Chollet, F., et al.: Keras. GitHub (2015). https://github.com/fchollet/keras
7. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997). https://doi.org/10.1162/neco.1997.9.8.1735
8. Tealab, A.: Time series forecasting using artificial neural networks methodologies: a systematic review. Future Comput. Inf. J. 3(2), 334–340 (2018). https://doi.org/10.1016/j.fcij.2018.10.003

Corona Virus and Entropy of Shannon at the Cardiac Cycle: A Mathematical Model

Huber Nieto-Chaupis(B)

Universidad Autónoma del Perú, Panamericana Sur Km 16.3, Villa el Salvador, Lima, Peru
[email protected], [email protected]

Abstract. Along the weeks of symptoms due to Corona Virus Disease 2019 (Covid-19 for short), patients exhibit an inverse relation between oxygen saturation and beats per minute. Thus, scenarios with the highest cardiac pulse might work against an acceptable recovery of the patient, who in some cases would need an extra assessment by a cardiologist in order to discard arrhythmias that might emerge from the unknown effects of the Corona virus on the cardiac muscle. Therefore, increasing the oxygen saturation is seen as a rapid action to be taken in the shortest possible time. In this paper, a theoretical proposal based entirely on the interrelation among oxygen saturation, cardiac pulse and weight is presented.

Keywords: Covid-19 · Oxygen saturation · Cardiopathy

1 Introduction

As seen in the global reports of public health institutions, Corona Virus Disease 2019 [1] (or Covid-19 for short) and its variants have propagated to all countries since the beginning of 2020, producing a large-scale global pandemic with thousands of fatalities and millions of infections across all continents. Although massive vaccination plans have been carried out in most countries, new strains keep appearing that to some extent might not be vulnerable to vaccines. In this manner, the prolongation of the global pandemic requires solid and robust preparedness strategies to face new manifestations and complications of pneumonia, as well as to apply the best pharmacological schemes. As seen in various countries, the first wave was characterized by an unambiguous peak in the histograms of number of infections versus time. Various public care programs were therefore implemented in order to minimize the fast propagation of the disease as well as to reduce the number of fatalities, in particular among patients carrying some type of comorbidity, a fact that places them in a position of full vulnerability against infection by Covid-19. In clear contrast to the A(H1N1) outbreak, the present pandemic yields two groups of Covid-19 patients:


– The so-called asymptomatic ones, i.e. all those patients that might not manifest any symptom or complication in their health.
– Those that exhibit visible symptoms and who can carry out treatment either at home or in health centers.

For both groups the time of treatment might take one or several weeks depending on the percentage of lung compromise [2]. Once patients exhibit any kind of symptom, from the clinical point of view it is mandatory to know the following variables, which allow physicians to choose the best way to minimize the emergent pneumonia [2] and its parallel consequences:
– Percentage of compromised lungs,
– Corporal temperature,
– Reading of SpO2,
– Reading of the number of beats per minute, and cardiac surveillance,
– Renal damage,
– Surveillance of appetite and weight.

Because the appearance of Covid-19 has opened a new avenue of research into and understanding of virus infection in humans, most fatalities have been correlated with cardiac failure [3] and similar events. As is well known, a critical parameter for deciding urgent actions is the patient's oxygen value [4]. In most Covid-19 patients, one can see that the oxygen saturation can fall to as low as 60% ± 10 [5], demonstrating the level of attack of the virus on the lungs. In those cases, patients have manifested a high cardiac pulse reaching 150 bpm. Thus one can wonder whether overweight in conjunction with comorbidity can be a factor that triggers irreversible consequences in patients. In addition, the presence of high concentrations of glucose in blood can also be a negative factor that makes it difficult to recover the functionality of the lungs in a sustainable manner. Therefore, in this paper the issue of the uncontrollable evolution of the cardiac pulse in Covid-19 patients is investigated mathematically. To accomplish this, a theory is implemented that involves: (i) SpO2, (ii) cardiac pulse, and (iii) weight, along the weeks of symptom manifestation [6]. In the second section the mathematical machinery is presented; here all equations needed along the paper are derived. In the third section the applications of the proposed theory are presented. From these sections, the interpretation of a general equation for the number of beats per minute, its alleviation by the usage of Bisoprolol to minimize the cardiac risk, and the conclusion of the paper are presented.

2 The Mathematical Proposal

2.1 Naive Model

In Fig. 1 the weight, beats per minute (bpm for short) and oxygen saturation SpO2 are shown. The model is inspired by average values of adult patients aged between 40 and 55 years at the end of the 4th week after having started with symptoms of Covid-19 [7].

Fig. 1. Plotting of a naive model of Covid-19 complication through the normalized weight, oxygen saturation SpO2 and beats per minute (bpm) of a 48-year-old, 80 kg patient. Equations (1), (2) and (3) are used.

A naive model has been used, and it is described by:

W(t) = 0.8\,\exp\left[-\left(\frac{t}{4}\right)^{2}\right]   (1)

O_X = S_pO_2(t) = \exp\left[-\left(\frac{t}{10}\right)^{2}\right]   (2)

h = \mathrm{bpm}(t) = 0.70 + \frac{0.75}{1+(t-5)^{2}}   (3)

From experience one can see that the beats per minute increase when weight and oxygen saturation decrease, so that from Eq. (1), Eq. (2) and Eq. (3) one has:

h = \frac{\gamma}{\beta O_X + \alpha W},   (4)

with α, β and γ constants. The evolution of h from Eq. (4) can, for example, be illustrated with Eq. (1) as well as numerical inputs:

h = \frac{1.5}{0.01\exp(-0.01 t^{2}) + 0.01\exp(-0.01 t^{2})},   (5)

which can be seen in Fig. 2, with the arrow indicating the end of treatment, as commonly expected. However, a rapid increase of beats was seen that to some extent could mimic a type of arrhythmia [8]. The drawn arrow denotes the day on which the cardiologist starts to apply concrete methodologies to decrease the value of bpm [8]. In this way, one can see that there is an inverse relation between bpm and both weight and SpO2.

Fig. 2. Plotting of the curve "beats per minute" using Eq. (5) without normalization. Data come from a 48-year-old patient. The arrow indicates, approximately, the end of medical treatment.

Therefore, from Eq. (4) as well as experience, one can write down the mathematical relation that links SpO2 (represented by O_X), weight W and bpm h in a linear manner as follows:

\beta O_X + \alpha W = \frac{\gamma}{h}   (6)

with α, β and γ free parameters that are highly correlated among themselves, characterizing the status of the patient under the symptoms of Covid-19. A trivial operation leads to writing O_X as:

O_X = \frac{\gamma}{h\beta}\left(1 - \frac{h\alpha W}{\gamma}\right).   (7)

Since γ denotes the factor linked to the value of beats per minute, a simple connection of this value with the patient's weight is seen in the second term in brackets of Eq. (7). Thus this can be written again as:

O_X = \frac{\gamma}{h\beta}\left[1 - \frac{h\alpha W}{\gamma} + \left(\frac{h\alpha W}{\gamma}\right)^{2} - \left(\frac{h\alpha W}{\gamma}\right)^{3} + \dots\right] \approx \frac{\gamma}{h\beta}\sum_{\ell=0}^{\infty}\frac{1}{\ell!}\left(-\frac{h\alpha W}{\gamma}\right)^{\ell} = \frac{\gamma}{h\beta}\exp\left(-\frac{h\alpha W}{\gamma}\right),   (8)

with the condition that if ℓ ≥ 2

\left(\frac{h\alpha W}{\gamma}\right)^{\ell} = 0 \iff \gamma \gg h\alpha W,   (9)

a fact that supports the usage of the exponential function. One can see from Eq. (6) that the parameter γ is relevant in the sense that it becomes larger than the parameters related to weight and oxygen saturation.

2.2 Parameters Correlation

It is feasible to link all free parameters in an "ad hoc" manner (there is no compromise among them because Eq. (4) and Eq. (6) are also defined in an "ad hoc" way) through trigonometric relations given by: \sin\theta = \gamma/h, \cos\theta = \alpha/\beta as well as \tan\theta = \gamma/\alpha. With this one can write down:

\beta = \sqrt{\alpha^{2} + \gamma^{2}},   (10)

and with this Eq. (7) can be written as:

O_X = \frac{\gamma}{h\sqrt{\alpha^{2}+\gamma^{2}}}\exp\left(-\frac{h\alpha W}{\gamma}\right).   (11)

From Eq. (8) it is feasible to establish conditions for the ideal case, in which one can assume that O_X is normalized to 1 (or 100%). Thus one arrives at:

\frac{\gamma}{h\sqrt{\alpha^{2}+\gamma^{2}}} = \exp\left(-\frac{h\alpha W}{\gamma}\right).   (12)

How strongly the weight might depend on the cardiac pulse of the patient during the days of symptom manifestation by Covid-19 can be obtained from Eq. (12) by solving for W:

W = -\frac{\gamma}{h\alpha}\,\mathrm{Log}\left[\frac{\gamma}{h\sqrt{\alpha^{2}+\gamma^{2}}}\right].   (13)

From Eq. (8) one can write again:

h = \frac{\gamma}{O_X\beta}\exp\left(-\frac{h\alpha W}{\gamma}\right),   (14)

so that one gets:

\frac{h O_X \beta}{\gamma} = \exp\left(-\frac{h\alpha W}{\gamma}\right),   (15)

and from Eq. (9) and γ ≫ β one arrives at:

\frac{h O_X \beta}{\gamma} \approx \exp\left(-\frac{h O_X \beta}{\gamma}\right) = \exp\left(-\frac{h\alpha W}{\gamma}\right),   (16)

a fact that implies:

\beta O_X = \alpha W \;\Rightarrow\; W = \frac{\beta O_X}{\alpha},   (17)

and with the substitution of this into Eq. (14) and the approximation

\alpha \approx \frac{\beta}{h},   (18)


then one arrives at:

h = \frac{\gamma}{\beta O_X}\exp\left(-\frac{\beta O_X}{\gamma}\right).   (19)

In Fig. 3, the number of beats per minute as a function of the parameter γ and of O_X is plotted. One can see an entirely linear relation with respect to γ. The inverse relation with O_X is in accordance with the initial assumption. For this exercise β = 1, which also means that α = γ = 1/√2.
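To make the relation of Eq. (19) concrete, the small numerical sketch below evaluates h over a grid of O_X and γ values (all values are illustrative, not clinical data); it reproduces the qualitative behaviour described for Fig. 3: h grows with γ and falls as O_X rises.

```python
import numpy as np

def beats_per_minute(o_x: np.ndarray, gamma: np.ndarray, beta: float = 1.0) -> np.ndarray:
    """Eq. (19): h = (gamma / (beta * O_X)) * exp(-beta * O_X / gamma)."""
    return (gamma / (beta * o_x)) * np.exp(-beta * o_x / gamma)

o_x = np.linspace(0.6, 1.0, 5)        # normalised oxygen saturation (assumed range)
gamma = np.linspace(0.5, 2.0, 4)      # free parameter of the model (assumed range)
O, G = np.meshgrid(o_x, gamma)
print(np.round(beats_per_minute(O, G), 3))
```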

3 Implications

With the approximation O_X ≈ 1 as well as:

\frac{W}{O_X} = -\frac{\lambda}{\gamma},   (20)

and in conjunction with Eq. (4), one arrives at:

h(\gamma) = \frac{\gamma}{\alpha^{2}+\gamma^{2}}\exp\left(\frac{\lambda\alpha}{\gamma}\right).   (21)

Under the scenario that establishes α ≫ γ, from Eq. (21) one gets:

h(\gamma) = \frac{\gamma}{\alpha}\exp\left(\frac{\lambda\alpha}{\gamma}\right),   (22)

which can be perceived as a kind of entropy, so that it is possible to postulate a probability distribution function [9] that tentatively can be written as:

p_G(\gamma) = \mathrm{Exp}\left\{\left(\mathrm{Exp}\left[\frac{\lambda\alpha}{\gamma}\right]\right)^{[\gamma/\alpha]}\right\},   (23)

which returns Eq. (22) through the condition:

h(\gamma) = \mathrm{Log}\,p_G(\gamma),   (24)

telling us that the behavior of the number of beats per minute as a function of the parameter γ, which measures the coupling to α and β, obeys a fully logarithmic relation. From Eq. (22), when γ is set constant and equal to 1, one arrives at:

h(\alpha) = \exp[\lambda\alpha]\,\frac{1}{\sqrt{\alpha^{2}+1}}.   (25)


Fig. 3. Behavior of h from Eq. 19 as function of OX and parameter γ. The inverse dependence on OX is a pure consequence of the direct relation of heart and lungs coordination.

Concretely, for the case of large values of α one can rewrite Eq. (25) as:

h(\alpha) = \frac{1}{\alpha}\exp[\lambda\alpha],   (26)

suggesting again to write a kind of probability distribution function such as:

p_G(\alpha) = \mathrm{Exp}\left\{\left(\mathrm{Exp}[\lambda\alpha]\right)^{[1/\alpha]}\right\},   (27)

which tests Shannon's entropy [9], so that the dependence on α makes the number of beats per minute follow an entropic evolution for large values of α. It is interesting that the formulation of a kind of event probability, although derived in this theoretical context, might have a genuine physiological support.

3.1 Derivation of Fundamental Equation

From Eq. (6) one can consider the case in which α falls down to small values, indicating that the coupling to the weight W would be negligible in practice. Again from Eq. (6) one might arrive at:

\beta O_X + (\alpha W \to 0) = \frac{\gamma}{h},   (28)

a fact that is interpreted as a possibly negligible dependence on the weight of the patient while under infection treatment. Thus one arrives at a fundamental equation between O_X and h, in the form:

h = \frac{\gamma}{\beta}\,\frac{1}{O_X},   (29)


Fig. 4. The number of beats per minute as function of time expressed in weeks. In both cases one can see a peak that would denote abnormalities in the cardiac function of patients that have ended their treatment of Covid-19.

which is in concordance with clinical data, by which it was observed in adult patients that the lower O_X is, or the more it falls, the more the number of beats per minute has to increase. Equation (29) is actually entirely subject to errors of measurement. Mathematically, this is expressed as the incorporation of a quantity Δ:

h = \frac{\gamma}{\beta}\,\frac{1}{O_X+\Delta},   (30)

whose meaning is the distortion of the "true" value of O_X as a consequence of an eventual disorder, dictated by entropy laws, that the Corona virus might initiate in all organs, for example. With this one can rewrite Eq. (30) as:

h = \frac{\gamma}{\Delta\beta}\,\frac{1}{1+\frac{O_X}{\Delta}}.   (31)

If Δ has a random origin, then h is also subject to stochastic contributions that might not come directly from a kind of anomaly linked to the cardiac function, but also from the interrelation with other organs such as the kidneys, lungs, etc. All this allows the departure from a linear to a nonlinear scenario. Thus one can write:

\frac{O_X}{\Delta} = \left(\frac{t-t_0}{\delta}\right)^{2},   (32)

where the quadratic form sustains the hypothesis of a nonlinear scenario. Here δ is the width of the error, and with it the "true" measurement of the oxygen saturation is defined by:

t = t_0 + \delta\sqrt{\frac{O_X}{\Delta}}.   (33)

With Eq. (32), Eq. (31) can finally be expressed in the following form:

h = \frac{\gamma}{\Delta\beta}\exp\left[-\left(\frac{t-t_0}{\delta}\right)^{2}\right].   (34)

4 Interpretation of Eq. 34

Because Eq. (32) gives the new definition of the O_X measurement, including a potentially realistic error, one would expect the new variable η to be directly proportional to time. Thus, in Fig. 4 a Gaussian profile exhibits a peak that is associated with the maximum value of the number of beats per minute. The orange line displays a theoretical dependence of h on the weeks with an error of 10.5%. Here one would expect to implement a prescription of Bisoprolol 2.5 mg [10], commonly used for heart disease but also an effective pharmacology to bring down the peaks of beats per minute. A potential case is given by the magenta line, with an error of 38%, showing a deep width. It is actually a risk scenario that might require the fast decision of a cardiology specialist in the shortest time, meaning extra care of the patient. In the present study the patient evolved week by week until the 4th week of treatment; however, peaks of beats per minute were registered at the end of the 4th week. The recommendation of a cardiologist and the usage of Bisoprolol ended in a decrease of the number of beats.

4.1 Shannon Entropy from Covid-19 Infection

From Eq. (34) one can see that there is in fact a kind of probability distribution function that generates a scenario of entropy, and it can be written as:

P(t) = \mathrm{Exp}\left\{\left(\mathrm{Exp}\left[-\left(\frac{t-t_0}{\delta}\right)^{2}\right]\right)^{\gamma/(\Delta\beta)}\right\},   (35)

denoting the fact that the number of cardiac pulsations h in a Covid-19 patient would follow an entropy-like behavior dictated by Shannon's entropy:

h(t) = \mathrm{Log}\,P(t).   (36)

In this manner one can anticipate a kind of disorder that, although it cannot be seen as an arrhythmia, would be an imminent consequence of the multisystem attack by the Corona virus. It would produce a physiological correlation between the heart and other organs that depend strongly on the blood flux, such as the brain and the lungs. The fact that one mathematically finds a kind of entropy might suggest events of chaos or disorder through which the Corona virus would trigger a wrong behavior of the heart. It is noteworthy to underline that these possible disorders might be sharper in diabetes patients [11, 12]. As seen in Fig. 4, the central role of Bisoprolol is that of minimizing the "entropy", in the sense that the effect of this pharmacological prescription is to return a kind of order to the cardiac beats. Clearly, the pharmacology would have to be accompanied by a diet that allows the heart to come back to its regular functionality.
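As a purely illustrative numerical sketch of the Gaussian-like profile of Eq. (34) (parameter values are assumptions, not fitted to any patient), the snippet below compares a narrow and a wide error width δ, mirroring the two curves discussed for Fig. 4.

```python
import numpy as np

def beats_profile(t, gamma=1.5, Delta=1.0, beta=1.0, t0=4.0, delta=1.2):
    """Eq. (34): h(t) = gamma/(Delta*beta) * exp(-((t - t0)/delta)**2)."""
    return (gamma / (Delta * beta)) * np.exp(-((t - t0) / delta) ** 2)

weeks = np.linspace(0, 6, 13)
narrow = beats_profile(weeks, delta=1.0)   # smaller error width: sharp peak around week t0
wide   = beats_profile(weeks, delta=2.0)   # larger error width: broader, riskier profile
print(np.round(narrow, 3))
print(np.round(wide, 3))
```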

5 Conclusion

In this paper, a simple mathematical model inspired by real data from adults that have surpassed the symptomatology of Covid-19 has been proposed. In essence, three variables have been taken into account. A fundamental equation has been derived, and from it a Shannon entropy has been identified. The fact that entropic relations have been found might be interpreted as a chaotic response of the heart to the attack by the Corona virus. From the proposed mathematical model one can conclude that the Corona virus might be a potential generator of scenarios producing disordered basic functionalities of the heart, particularly in its relation to the lungs, which would be (in the deepest conditional scenario) the cause of an anomalous number of beats per minute. Clearly, more clinical data are needed in order to improve the accuracy of the presently proposed model, as well as to extend it to scenarios where diabetes patients are under the attack of the Corona virus.

References
1. Wu, F., et al.: A new coronavirus associated with human respiratory disease in China. Nature 579, 265–269 (2020)
2. Wu, G., et al.: A prediction model of outcome of SARS-CoV-2 pneumonia based on laboratory findings. Sci. Rep. 10, 1–9 (2020)
3. Tschope, C., et al.: Myocarditis and inflammatory cardiomyopathy: current evidence and future directions. Nat. Rev. Cardiol. 18, 169–193 (2020)
4. Levy, J., et al.: Digital oximetry biomarkers for assessing respiratory function: standards of measurement, physiological interpretation, and clinical use. npj Digit. Med. 4, 1–14 (2021)
5. Maldonado, L.L., Bertelli, A.M., Kamenetzky, L.: Molecular features similarities between SARS-CoV-2, SARS, MERS and key human genes could favour the viral infections and trigger collateral effects. Sci. Rep. 11, 1–17 (2021)
6. Johansson, C., Kirsebom, F.C.M.: Neutrophils in respiratory viral infections. Mucosal Immunol. 14, 815–827 (2021)
7. Olivos, L., Lima, Perú: Private communication with a Covid-19 patient. Data from 1st to 30th April 2021
8. Robson, A.: Preventing cardiac damage in patients with COVID-19. Nat. Rev. Cardiol. 18, 387–387 (2021)
9. Belenchia, A., et al.: Entropy production in continuously measured Gaussian quantum systems. npj Quantum Inf. 6, 1–7 (2020)
10. Vanmolkot, F.H.M., et al.: Impact of antihypertensive treatment on quality of life: comparison between bisoprolol and bendrofluazide. J. Hum. Hypertension 13, 559–563 (1999)
11. Laurenzi, A., et al.: Pre-existing diabetes and COVID-associated hyperglycaemia in patients with COVID-19 pneumonia. MDPI Biol. 10, 753 (2021)
12. Yang, J., et al.: Plasma glucose levels and diabetes are independent predictors for mortality and morbidity in patients with SARS. Diabet. Med. 23, 623–628 (2006)

Efficiency of Local Binarization Methods in Segmentation of Selected Objects in Echocardiographic Images

Joanna Sorysz1(B) and Danuta Sorysz2

1 Department of Biocybernetics and Biomedical Engineering, AGH University of Science and Technology, Mickiewicza 30 Avenue, 30-059 Kraków, Poland
[email protected]
2 II Department of Cardiology, University Hospital in Cracow, Jakubowskiego 2, 30-688 Kraków, Poland

Abstract. Nowadays echocardiography devices have become more and more precise, allowing for more detailed diagnostics. With the use of automatic or semi-automatic algorithms, the time needed for diagnostics can be shortened, which can allow for better medical treatment and results. An example can be found in the examination of echocardiographic images of a patient suspected of PFO. To accelerate that process, an automatic algorithm can be used to segment the first-generation contrast agent bubbles used in assessing the patient's condition. We propose a method based on local thresholding methods which, unlike neural networks, does not need a big dataset of images. The obtained results are at a Dice coefficient level of 0.5–0.8, depending on the image.

Keywords: Segmentation · PFO · Local thresholding · Heart

1 Introduction

Lately, technological development has led to the creation of more precise and sensitive echocardiographic devices and probes, which in turn has allowed for better medical examination. Patent Foramen Ovale (PFO) is one of the structural abnormalities of the heart, and its diagnosis depends on the quality of the image obtained in echocardiography. In general, the evaluation starts with an intravenous injection of a first-generation echocardiographic contrast agent during a Transesophageal Echocardiography Examination (TEE). The transition of contrast from the right to the left atrium confirms the diagnosis of PFO, and the assessment of the number of its bubbles is the size criterion. The number of contrast bubbles is estimated by doctors, which is time-consuming; it could instead be done automatically, faster and more accurately, using computer software. There are many different approaches to choose from to obtain the best results in medical image segmentation. The best choice depends, among others, on the type of the images and the available computing power. As lately neural networks


gain more and more popularity, most scientists would propose them as the solution to the presented problem, and it is probably possible to train one to return the number of contrast bubbles in echocardiographic images. However, after deliberation another method was chosen, one based on mathematical and statistical operations on the images. The reasons for rejecting machine learning were simple: firstly, to train a neural network a big dataset is needed, and due to the type of examination and the way it is performed it is not widely used in many cases, so it is hard to find a big enough dataset. Moreover, the final algorithm should be able to run on all types of computers, as doctors usually use many applications simultaneously, which weighs down RAM; in many cases starting one more program with a machine learning algorithm would freeze computers, which are usually not high-end machines. This article presents the results of ongoing research whose first outcomes were presented in another publication, where the three most promising methods were described; after the need for a more in-depth investigation was concluded, the examination reported here took place [12]. The order of the article is as follows: firstly the methods are introduced, then the results are discussed, and lastly the conclusions are presented.

2 Data

The echocardiographic data obtained were of the following types: 16-bit grayscale, 8-bit grayscale, and RGB images (the colour carried no important information, only overlays such as heart rate, etc.). The sets featured five images from two examinations that depicted the heart from various planes. The left and right atrium as well as the interatrial septum were visible. In all images contrast-bubble-like objects were present. In this experiment the masks were revised, as the decision was made to extend the desired segmentation to all similar contrast-bubble-like objects. Figure 1 presents the original images with ground truth and ROI. The properties of the images and statistical information about the objects in the ground truth are presented in Table 1. Unfortunately, the examination files did not contain any details about the physical size of the pixels.

Table 1. Size of images and statistical information about segmented objects in ground truth

Image | Dimensions [pixels × pixels] | No. of objects | Mean area of objects (AoO) | SD of AoO | Mean of max Feret | SD of max Feret | Mean of min Feret | SD of min Feret
A | 1016 × 708 | 19 | 62.316 | 28.443 | 14.032 | 5.41  | 5.105 | 1.923
B | 1016 × 708 | 23 | 35.217 | 26.922 | 7.981  | 3.693 | 4.29  | 1.967
C | 1016 × 708 | 36 | 28.194 | 21.131 | 6.633  | 3.326 | 3.642 | 1.717
D | 1016 × 708 | 11 | 14.364 | 7.953  | 4.190  | 1.514 | 2.649 | 1.216
E | 800 × 600  | 33 | 28.818 | 17.307 | 7.698  | 3.805 | 3.759 | 1.754


Fig. 1. Original images and corresponding manual segmentation (red) and ROI (white)

3 Methods

All implementations were made in ImageJ [10], and all presented methods and algorithms are the variants available in that program.

3.1 Local Thresholding

Local Threshold is a group of methods that execute the thresholding operation on a neighborhood specified by the user rather than on the entire image. The algorithm used during the transformation is also chosen by the user. These techniques are available as a function in ImageJ, and the algorithms described below are the implementations built into that platform. One important change made by the authors of that plug-in is that the neighborhood is not rectangular, as in most of the original algorithms, but circular; hence a radius is used instead of edge lengths [2]. Table 2 lists the methods organised by the number of parameters. This number correlates strongly with the computation time: the more parameters there are, the longer the search takes.

Table 2. Local thresholding methods used during experiment

Number of parameters | Methods
1 | Contrast, Otsu
2 | Bernsen, Mean, Median, Mid Grey
3 | Niblack, Phansalkar, Sauvola

Contrast. The pixel under consideration is set as a background or foreground depending on which distance is shorter: between its value and the maximum in the neighborhood or between its value and the minimum in the neighborhood [10]. Otsu. According to its author, the Otsu thresholding approach is based on the “goodness” of the chosen threshold. Two classes are created for each neighborhood: background and object. Following that the search for the division point begins, with the criterion for the best outcomes being to minimize the variation in the classes [6].


Bernsen. The approach uses a contrast threshold supplied by the user. The threshold is set at the local mid-grey value (the mean of the minimum and maximum grey values in the local window) if the local contrast (max − min) reaches the contrast threshold. If the local contrast is below the contrast threshold, the neighborhood is regarded as having only one class and the pixel is set to object or background according to the mid-grey value [4]. It is an adaptive thresholding algorithm based on the image's contrast. In the local form, two parameters are used: the radius of the neighborhood and the contrast threshold [10].

Mean. This method computes the mean value for each neighborhood and then compares the value of each pixel in that area to the mean − C value, where C is a user-specified parameter, and assigns that pixel to one of two categories: object or background [6].

Median. Analogous to the Mean algorithm, however instead of the neighborhood mean, the median value of each neighborhood is used [6].

Mid Grey. Analogous to the Mean and Median algorithms, however instead of the mean of the neighborhood, the MidGrey value, equal to (max + min)/2 of each area, is used [6].

Niblack. Wayne Niblack's algorithm is based on the standard deviation (SD) and the neighborhood mean. Furthermore, the user can modify those values by giving two numbers: k, which is the SD scale, and C, which is the offset [5]. The entire formula used to compare the value of a pixel P is as follows:

P = \mathrm{Mean} + k \cdot \mathrm{SD} - C   (1)
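For illustration, a rough NumPy/SciPy sketch of the Niblack rule in Eq. (1) is given below. It is not the ImageJ plug-in: a square window of side 2·radius+1 is used instead of the plug-in's circular neighborhood, to keep the sketch short.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def niblack_threshold(image: np.ndarray, radius: int = 7, k: float = 0.8, c: float = 0.0) -> np.ndarray:
    """Binarise an image with the Niblack rule P = mean + k*SD - C over a square window."""
    img = image.astype(float)
    size = 2 * radius + 1
    mean = uniform_filter(img, size=size)                       # local mean
    sq_mean = uniform_filter(img * img, size=size)              # local mean of squares
    std = np.sqrt(np.maximum(sq_mean - mean ** 2, 0.0))         # local standard deviation
    threshold = mean + k * std - c
    return img > threshold                                      # foreground mask
```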

Sauvola. The Sauvola technique is a variant of the Niblack algorithm in which the number to which the pixel value P is compared is defined by the equation [9]:

P = \mathrm{Mean} \cdot \left(1 + k\left(\frac{\mathrm{SD}}{r} - 1\right)\right)   (2)

That method was created for thresholding mixed pages of text and images [9].


Phansalkar. An algorithm based on the Sauvola approach, but adapted for nuclei images, where a low standard deviation does not necessarily indicate background [7]. The whole expression for the comparison pixel value P in the ImageJ algorithm is as follows:

P = \mathrm{Mean} \cdot \left(1 + p\,e^{-q\cdot \mathrm{mean}} + k\left(\frac{\mathrm{SD}}{r} - 1\right)\right)   (3)

Nonetheless, the only variables that can be changed in that implementation are k and r; the others are fixed: p = 2 and q = 10, as proposed by Phansalkar. It should be noted that the r parameter does not represent the radius of the neighborhood, which is another value supplied by the user.

3.2 Statistical Dominance Algorithm

This technique counts, for each core pixel, the pixels in its neighborhood within a radius specified by the user whose values are greater than or equal to the central pixel value; this count becomes the pixel's final value. Because the output image is the outcome of a statistical dominance operation, its values are tied to the distribution of the image's grey levels [8].
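A minimal sketch of this counting step is given below, following the description above rather than the reference implementation of [8]; it is written with plain loops for clarity and would be far too slow for production use.

```python
import numpy as np

def statistical_dominance(image: np.ndarray, radius: int) -> np.ndarray:
    """For every pixel, count neighbours within `radius` whose value is >= the centre value."""
    img = image.astype(float)
    h, w = img.shape
    out = np.zeros((h, w), dtype=int)
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    offsets = [(dy, dx) for dy, dx in zip(ys.ravel(), xs.ravel())
               if dy * dy + dx * dx <= radius * radius and (dy, dx) != (0, 0)]
    for y in range(h):
        for x in range(w):
            for dy, dx in offsets:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and img[ny, nx] >= img[y, x]:
                    out[y, x] += 1
    return out
```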

3.3 Local Normalization

This algorithm unifies an image's mean and variance around a local neighborhood. In this implementation a fast recursive Gaussian filter was used for the spatial smoothing, which greatly improves the efficiency of the technique. It is particularly useful for correcting uneven lighting or shading issues [1].

3.4 Dice Coefficient

Originally developed as a statistical tool for ecologists, this coefficient is currently one of the most often used evaluation tools for medical segmentation. Other names for it are the Sørensen coefficient, Dice-Sørensen coefficient, and so on. The following relationship defines the Dice coefficient when applied to binarized data [3]:

DSC = \frac{2TP}{2TP + FP + FN}   (4)

The version of the coefficient used here is calculated pixel-by-pixel.
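A short sketch of the pixel-wise computation of Eq. (4) for two binary masks is shown below (illustrative NumPy code, not the evaluation script used in the experiment).

```python
import numpy as np

def dice_coefficient(prediction: np.ndarray, ground_truth: np.ndarray) -> float:
    """Pixel-wise Dice score, DSC = 2*TP / (2*TP + FP + FN), for binary masks."""
    pred = prediction.astype(bool)
    truth = ground_truth.astype(bool)
    tp = np.logical_and(pred, truth).sum()
    fp = np.logical_and(pred, ~truth).sum()
    fn = np.logical_and(~pred, truth).sum()
    denom = 2.0 * tp + fp + fn
    return 2.0 * tp / denom if denom else 1.0   # both masks empty -> perfect agreement
```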

3.5 Processing

The preparation of the images consisted of a transformation to grayscale (if needed). Then local thresholding was performed and for SDA and Local Normalisation methods global thresholding took place. The results were evaluated with Dice coefficient.

4 Results

All presented images were processed in the same way, with the algorithms described in Sect. 3. ImageJ in version 1.53j was used as the environment for the processing. To evaluate the results, the Dice coefficient was computed, and the highest scores are presented in Table 3. The values of the Sørensen coefficient for Image A are significantly lower than for the rest of the images. This can be caused by the lack of the desired objects in that image: the objects highlighted in the mask only resemble them in shape. The dependency of the Dice coefficient on the radius of the neighborhood of the local algorithms is presented in Fig. 2. It is very clear that the best results are reached for radii under 60 pixels for all of the images, and for most of them under 40 or even 20 pixels. That conclusion can be easily explained: the segmented objects' dimensions are in the range of 3–14 pixels, and therefore the size of the neighborhood must be similar to them for the segmentation to be performed thoroughly. In Fig. 3 it can be seen that the SDA algorithm gives very clear results across the range of its parameters (threshold and radius). The maximum lies in a small peak of the graphs and is easy to recognise. The parameters could only be improved if the step used for searching the parameter space were smaller than the used value of 0.1; however, previous experience in this field suggests that for this method there will not be any significant improvement. On the other hand, the Phansalkar algorithm (Fig. 4) presents a really interesting parameter space, where the maximum lies on a long ridge of different pairs of parameters. Moreover, there is a possibility of better results for values of the parameters P1 under −1 and P2 under 0. There is also a small elevation of the plane of the graph for P2 values near 1, which suggests another local maximum; only further examination can unveil the final value of the maximal Dice coefficient for this method.

Table 3. The best results for each image of each algorithm

Image | Method | Parameters | Dice coefficient
A | Otsu | Radius 21 | 0.0170
A | Contrast | Radius 102 | 0.0176
A | Bernsen | Radius 14, P1 97 | 0.0219
A | Mean | Radius 6, C -41 | 0.2010
A | Median | Radius 6, C -40 | 0.3701
A | Mid Grey | Radius 3, C -36 | 0.1036
A | Niblack | Radius 7, K 0.8, C -22.95 | 0.2762
A | Sauvola | Radius 14, KV -2.35, RV 0 | 0.3684
A | Phansalkar | Radius 8, P1 -0.24, P2 -0.14 | 0.1798
A | SDA | Radius 7, SDA radius 35, threshold 158 | 0.3854
A | Local Normalization | Sigma 1 6, sigma 2 14, threshold 174 | 0.1414
B | Otsu | Radius 18 | 0.0123
B | Contrast | Radius 101 | 0.0112
B | Bernsen | Radius 5, P1 57 | 0.0080
B | Mean | Radius 11, P1 -69 | 0.5691
B | Median | Radius 9, P1 -62 | 0.7068
B | Mid Grey | Radius 6, P1 -65 | 0.2394
B | Niblack | Radius 10, K 0.4, C -50 | 0.5845
B | Sauvola | Radius 13, KV -2.15, RV 0 | 0.3933
B | Phansalkar | Radius 12, Kv -0.99, Rv 0.99 | 0.4370
B | SDA | Radius 10, SDA radius 64, threshold 136 | 0.7117
B | Local Normalization | Sigma 1 14, sigma 2 17, threshold 132 | 0.3047
C | Otsu | Radius 11 | 0.0154
C | Contrast | Radius 3 | 0.0103
C | Bernsen | Radius 5, P1 57 | 0.0080
C | Mean | Radius 9, P1 -62 | 0.4722
C | Median | Radius 8, P1 -58 | 0.6393
C | Mid Grey | Radius 4, P1 -47 | 0.2055
C | Niblack | Radius 11, K 0.85, C -35.95 | 0.5239
C | Sauvola | Radius 15, KV -2.45, RV 0 | 0.4174
C | Phansalkar | Radius 12, Kv -0.75, Rv -0.88 | 0.4741
C | SDA | Radius 10, SDA radius 55, threshold 157 | 0.6579
C | Local Normalization | Sigma 1 5, sigma 2 12, threshold 106 | 0.2380
D | Otsu | Radius 21 | 0.0025
D | Contrast | Radius 60 | 0.0019
D | Bernsen | Radius 7, P1 65 | 0.0027
D | Mean | Radius 7, P1 -85 | 0.5705
D | Median | Radius 5, P1 -59 | 0.7701
D | Mid Grey | Radius 4, P1 -87 | 0.1340
D | Niblack | Radius 8, K 1.35, C -34.95 | 0.7041
D | Sauvola | Radius 14, KV -4.6, RV 0 | 0.5051
D | Phansalkar | Radius 20, Kv -1, Rv -0.17 | 0.4857
D | SDA | Radius 8, SDA radius 60, threshold 186 | 0.8294
D | Local Normalization | Sigma 1 49, sigma 2 7, threshold 131 | 0.5058
E | Otsu | Radius 12 | 0.0171
E | Contrast | Radius 11 | 0.0154
E | Bernsen | Radius 1, P1 23 | 0.0199
E | Mean | Radius 7, P1 -48 | 0.0863
E | Median | Radius 8, P1 -58 | 0.2490
E | Mid Grey | Radius 4, P1 -36 | 0.0422
E | Niblack | Radius 12, K 1.5, C -20.95 | 0.3196
E | Sauvola | Radius 23, KV -0.75, RV -8.5 | 0.5496
E | Phansalkar | Radius 15, Kv -0.66, Rv -0.01 | 0.5622
E | SDA | Radius 12, SDA radius 26, threshold 219 | 0.3411
E | Local Normalization | Sigma 1 8, sigma 2 9, threshold 124 | 0.3283


Fig. 2. Parameters space for the best method for images a) A, b) B, c) C, d) D and e) E

The visual comparison of the best segmentation and the ground truth for each image is presented in Fig. 5. Although many false negatives are visible, it is important to note that these are mostly objects other than the contrast bubbles in the left atrium. Most of those bubbles are segmented correctly, which is a very good result of this experiment.


Fig. 3. Parameter space for SDA method

Fig. 4. Parameter space for the Phansalkar method


Fig. 5. Best segmentation for each image: a) - A, b) - B, c) - C, d) - D, e) - E. The coloring scheme is as follows: green is the background, light blue are false negative results, black are true positive results, red indicates false positive results, and white shows true negative results

5 Conclusions

The aim of this article was to select the best method for the segmentation of small, round, bright objects in echocardiographic images. The results presented in Table 3 clearly show that neither the algorithms with one parameter nor Local Normalization returned acceptable Dice scores. In contrast to the article on a similar problem (segmentation of small, round objects in medical imaging) by A. Piórkowski and J. Lasek, where the three-parameter algorithms gave the best scores [11], the main conclusion of this experiment is that in echocardiographic images some of the best results were given by the Median, Mean, Phansalkar and SDA algorithms. Moreover, for the images from one examination (Images A, B, C and D) the top score was achieved by the Statistical Dominance Algorithm (SDA), and only for Image E (which is from another examination) was the Phansalkar algorithm better. This indicates that there is a possibility to adjust not only the parameters but also the method based on the statistical properties of the images. To prove this hypothesis it is necessary to perform an experiment with a much bigger dataset consisting of multiple groups of at least two images from different examinations and different echocardiographic machines. This would allow for a statistical analysis of the images and for choosing the best method for each one of them. Another path of experiment is to change the evaluation method to one that checks the number and position of the segmented objects, instead of calculating their size pixel-by-pixel. Furthermore, there is a great possibility that algorithms from the "cell counting" group, such as ellipse fitting, convolution masks, blob detection or the circlet transform [13], will give good results. Another idea that should improve the accuracy of the final algorithm (prepared for usage by echocardiographers) is the possibility of choosing the ROI according to the position of the left atrium; however, it would appear only in the end product of the experiments.

Acknowledgment. This work was financed by the AGH University of Science and Technology, Faculty of EAIIB, KBIB no 16.16.120.773.

References
1. Sage, D., Unser, M.: Easy Java programming for teaching image processing. In: Proceedings of the IEEE International Conference on Image Processing (ICIP 2001), Thessaloniki, Hellenic Republic (2001)
2. Landini, G.: Auto Local Threshold. https://imagej.net/Auto_Local_Threshold. Accessed 10 Sep 2021
3. Dice, L.R.: Measures of the amount of ecologic association between species. Ecology 5(3), 297–302 (1945)
4. Bernsen, J.: Dynamic thresholding of gray-level images. In: Proceedings Eighth International Conference on Pattern Recognition, Paris (1986)
5. Niblack, W.: An Introduction to Digital Image Processing, pp. 11–116. Prentice Hall, Englewood Cliffs (1986)
6. Otsu, N.: A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 9(1), 62–66 (1979)
7. Phansalkar, N., More, S., Sabale, A., Joshi, M.: Adaptive local thresholding for detection of nuclei in diversity stained cytology images. In: 2011 International Conference on Communications and Signal Processing, pp. 218–220. IEEE (2011)
8. Piórkowski, A.: A statistical dominance algorithm for edge detection and segmentation of medical images. In: Pietka, E., Badura, P., Kawa, J., Wieclawek, W. (eds.) Information Technologies in Medicine. AISC, vol. 471, pp. 3–14. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-39796-2_1
9. Sauvola, J., Pietikäinen, M.: Adaptive document image binarization. Pattern Recogn. 33(2), 225–236 (2000)
10. Schneider, C.A., Rasband, W.S., Eliceiri, K.W.: NIH Image to ImageJ: 25 years of image analysis. Nat. Methods 9(7), 671–675 (2012). https://doi.org/10.1038/nmeth.2089
11. Piórkowski, A., Lasek, J.: Evaluation of local thresholding algorithms for segmentation of white matter hyperintensities in magnetic resonance images of the brain. In: Florez, H., Pollo-Cattaneo, M.F. (eds.) ICAI 2021. CCIS, vol. 1455, pp. 331–345. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-89654-6_24
12. Sorysz, J., Sorysz, D., Piórkowski, A.: Segmentation of a first generation agent bubbles in the B-mode echocardiographic images. In: Piaseczna, N., Gorczowska, M., Łach, A. (eds.) EMBS ICS 2020. AISC, vol. 1360, pp. 127–135. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-88976-0_17
13. Bala, J., Dwornik, M., Franczyk, A.: Automatic subsidence troughs detection in SAR interferograms using circlet transform. Sensors 21(5), 1706 (2021). https://doi.org/10.3390/s21051706

Automatic Speech-Based Smoking Status Identification

Zhizhong Ma1, Satwinder Singh1, Yuanhang Qiu1, Feng Hou1(B), Ruili Wang1, Christopher Bullen2, and Joanna Ting Wai Chu2

1 School of Mathematical and Computational Sciences, Massey University, Auckland, New Zealand
[email protected]
2 National Institute for Health Innovation, University of Auckland, Auckland, New Zealand

Abstract. Identifying the smoking status of a speaker from speech has a range of applications including smoking status validation, smoking cessation tracking, and speaker profiling. Previous research on smoking status identification mainly focuses on employing the speaker's low-level acoustic features such as fundamental frequency (F0), jitter, and shimmer. However, the use of high-level acoustic features, such as Mel Frequency Cepstral Coefficients (MFCC) and filter bank (Fbank) features, for smoking status identification has rarely been explored. In this study, we utilise both high-level acoustic features (i.e., MFCC, Fbank) and low-level acoustic features (i.e., F0, jitter, shimmer) for smoking status identification. Furthermore, we propose a deep neural network approach for smoking status identification by employing ResNet along with these acoustic features. We also explore a data augmentation technique for smoking status identification to further improve the performance. Finally, we present a comparison of identification accuracy results for each feature setting, and obtain a best accuracy of 82.3%, a relative improvement of 12.7% and 29.8% over the initial audio classification approach and the rule-based approach, respectively.

Keywords: Smoking status identification · Speech processing · Acoustic features

1 Introduction

Automatic smoking status identification is to identify a speaker's smoking status by extracting and analysing the acoustic features that can be affected by cigarette smoking, based on the spoken utterances. Speech signals carry a speaker's basic information, such as age, gender, emotional status, psychological status, intoxication level, and smoking status [1]. Compared to traditional biochemical smoking status validation methods (such as biochemical testing of urine or saliva for the nicotine metabolite cotinine, or exhaled breath carbon monoxide) and on-site speech assessments operated by experts, automatic smoking status identification from the speech signal is a simple, non-invasive, low-cost


method that can be applied across a large population and does not require face-to-face contact. Automatic smoking status identification has a variety of applications such as smoking status validation, smoking cessation tracking, and speaker profiling. Smoking cessation tracking applications implicitly or explicitly employ smoking status information to record users' quit-smoking timelines. In speaker profiling systems, knowledge of smoking status can be utilised for the normalisation of acoustic features to increase the system performance. In general, automatic smoking status identification from speech is essential for improving the flexibility of smoking status validation and the performance of speaker profiling systems.
There is a rich literature on the effects of cigarette smoking on a smoker's throat tissues, including their vocal cords [2–5]. Smoking can also degrade lung function by decreasing the airflow through the smoker's vocal cords [6–10]. The signs of laryngeal irritation and disturbed phonatory physiology caused by smoking occur even in young smokers and affect women's voices more than men's voices [4, 8, 11, 12]. Changes in the vocal tract can result in a significant variation in the speaker's speech signals. Previous studies on smoking status identification have concluded that there is a relationship between a smoker's speech signals and the corresponding smoking status. Researchers suggest that the primary acoustic features affected by smoking are fundamental frequency (F0), jitter, and shimmer [10, 13–15].
The typical approach to identifying smoking status has focused on statistics of the low-level acoustic features (e.g., F0, jitter, and shimmer), such as the mean, maximum, minimum, and standard deviation (SD), obtained from on-site speech assessments including sustained vowels, oral reading, and spontaneous speech tasks. Recent research focused on adopting high-level acoustic features such as the Mel-Frequency Cepstral Coefficients (MFCC) as the input to smoking status identification models [16].
Recently, the performance of audio classification tasks such as emotion recognition [17], acoustic event detection [18] and speaker verification [19] has been improved by using a specific type of deep neural network (DNN) – the Residual Network (ResNet). ResNet [20] was initially designed for image classification and has shown more reliable performance than shallower convolutional neural network (CNN) architectures. Inspired by Google's recent work on audio classification [18], we adapted ResNet for smoking status identification. To the best of our knowledge, our work is the first use of ResNet for smoking status identification.
Our contributions include: (i) the combination of both high- and low-level acoustic features is used for automatic speech-based smoking status identification for the first time; (ii) deep learning is used for automatic speech-based smoking status identification for the first time; (iii) we develop a new smoking status identification dataset based on two existing corpora.
This paper is organised hereon as follows: Sect. 2 introduces the various acoustic features for smoking status identification. Section 3 explains our proposed method. Section 4 describes the dataset we used and explains our experimental setup. Section 5 presents our experimental results. Finally, the conclusion and future directions are described in Sect. 6.


2 Acoustic Features for Smoking Status Identification

Acoustic features are the acoustic components of a speech signal that can be experimentally observed, recorded, and reproduced. The following features, which include both high- and low-level acoustic features, are used in our method.

2.1 MFCC and Fbank

The Mel-Frequency Cepstral Coefficient (MFCC) and the filter bank (Fbank) are two standard high-level acoustic features that are widely utilised in audio classification tasks [21–23] and are typically derived from a sub-band spectrum. MFCC is a method for converting the real cepstrum of a windowed short-time speech signal, derived with the Fast Fourier Transform (FFT), into parameters according to the Mel scale [24]. It represents the short-term spectral features of a speech signal [25]. The filter bank (Fbank) feature is a common alternative to MFCC [26] and has become a trend in acoustic feature learning for very deep neural networks because it retains additional information such as short-range temporal correlations [27].

2.2 Fundamental Frequency (F0)

The fundamental frequency (F0) is an important low-level acoustic feature of speech signals. F0 is the lowest, and typically the strongest, frequency produced by the complex vocal fold vibrations, measured in Hertz (Hz). Typical F0 values captured in speech signals are around 120 Hz for men and 210 Hz for women [3]. Studies have consistently shown that lower F0 values exist in smokers in comparison to age- and sex-matched non-smokers. In [7], F0 was assessed through oral reading and spontaneous speech for 80 individuals, half of whom were classified as smokers. The results indicated that the average F0 values for smokers were lower than for non-smokers. The differences between the F0 values of female smokers and female non-smokers (182.70 Hz vs 186.45 Hz) were not as large as in the male group (105.65 Hz vs 115.95 Hz), but the same trend was noted. Guimarães et al. [5] selected 32 adult subjects (20 smokers and 12 non-smokers) based on their age, gender, and smoking history. The smokers were aged between 27 and 51 years, with a mean age of 37 years. The non-smokers ranged in age from 20 to 51 years, with a mean age of 32 years. The smokers in the study were all regular smokers when they underwent the speech assessment, and, with the exception of one subject who had quit smoking ten years earlier, all non-smokers had no smoking history. The speech assessment included oral reading, sustained vowel, and conversation tasks. The results indicate a lower mean F0 value for the smoker group across all speech assessments.
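For concreteness, the short sketch below computes 40-dimensional MFCC and log Mel-filterbank (Fbank) features of the kind described in Sect. 2.1, using the 40 ms frame length and 50% overlap later given in Sect. 4.2. The use of torchaudio and the 16 kHz sampling rate are our own illustrative assumptions, not choices stated by the authors.

```python
import torch
import torchaudio

SAMPLE_RATE = 16_000                 # assumed sampling rate
WIN = int(0.040 * SAMPLE_RATE)       # 40 ms frame length
HOP = WIN // 2                       # 50% overlap

# 40-dimensional Mel-filterbank (Fbank) features, taken in the log domain.
fbank_transform = torchaudio.transforms.MelSpectrogram(
    sample_rate=SAMPLE_RATE, n_fft=1024,
    win_length=WIN, hop_length=HOP, n_mels=40)

# 40-dimensional MFCC features derived from the same Mel analysis.
mfcc_transform = torchaudio.transforms.MFCC(
    sample_rate=SAMPLE_RATE, n_mfcc=40,
    melkwargs={"n_fft": 1024, "win_length": WIN,
               "hop_length": HOP, "n_mels": 40})

waveform = torch.randn(1, 3 * SAMPLE_RATE)             # placeholder 3-second utterance
fbank = torch.log(fbank_transform(waveform) + 1e-6)    # shape (1, 40, frames)
mfcc = mfcc_transform(waveform)                        # shape (1, 40, frames)
```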


2.3 Jitter

Jitter (measured in microseconds or % jitter) is a common low-level acoustic feature used in smoking status validation. It is a measure of the cycle-to-cycle frequency variation, or instability, of a speech signal, which is mainly caused by a lack of control of vocal cord vibration. Many studies have shown higher jitter values in smokers than in non-smokers. In [28], male non-smokers had substantially lower jitter values than male smokers who had smoked a minimum of five cigarettes per day for five years or longer (0.364% smokers vs 0.283% non-smokers). Gonzalez and Carpi [4] reported differences in jitter between male non-smokers and male smokers who had been smoking for less than 10 years (47.67 µs non-smokers vs 62.78 µs smokers), implying that changes in jitter are not only associated with long-term smoking. A more recent study [29] found that smoking women aged 18–24 years had a higher jitter value than non-smoking women, although the difference was not significant, the smokers' smoking history being relatively short (3.5 years on average). In [6], an increasing trend of jitter was found in female smokers compared to female non-smokers, with differences also between smokers who had smoked for more than 10 years and those who had smoked for less than 10 years (1.11% smokers ≥ 10 years vs 0.92% smokers < 10 years vs 0.69% non-smokers). However, the authors also observed that the women with a longer smoking habit smoked more cigarettes per day and were older than the other groups, which might explain the difference in voice perturbation.

2.4 Shimmer

Another common low-level acoustic feature used in smoking status analysis is shimmer (measured in decibels [dB] or % shimmer), a measure of the amplitude instability of the sound wave. Studies have also found that smokers have higher shimmer values than non-smokers [6, 10, 28]. When compared to male non-smokers, male smokers had a considerably higher shimmer (4.57% smokers vs 2.50% non-smokers) [28]. Likewise, shimmer was substantially higher for female smokers who had smoked for more than 10 years than for either non-smokers or smokers who had smoked for less than 10 years (0.37 dB smokers ≥ 10 years vs 0.25 dB smokers < 10 years vs 0.21 dB non-smokers) [6]. Zealouk et al. [10] studied the vocal characteristics of 40 male subjects, 20 of whom were smokers with an average smoking history of 13 years. Smokers had substantially higher shimmer values than non-smokers (0.570 dB smokers vs 0.378 dB non-smokers).

3 Our Method

The smoking status identification task is typically treated as an audio classification problem in the speech processing domain. Previous studies utilised either low-level acoustic features (e.g., fundamental frequency (F0), jitter, and shimmer) based on on-site speech assessments [10, 13–15] or high-level acoustic features such as the Mel-Frequency Cepstral Coefficient (MFCC) with an i-vector framework that was designed for


speaker recognition tasks [16]. However, no studies have combined both low- and high-level acoustic features for smoking status identification. Further, as far as we know, the deep neural network (DNN) approach has not previously been used to model smoking status information from speech signals. Our proposed method utilises both low- and high-level acoustic features, together with deep neural network techniques, to distinguish smokers from non-smokers. Our proposed architecture, as illustrated in Fig. 1, takes a speech recording with the sensor module and subsequently passes it to the signal preprocessing module. Speech signal preprocessing is used to reduce the influence of acoustic noise on acoustic feature extraction, allowing for more accurate identification of the smoking status output. The feature extraction module then extracts acoustic features such as MFCC, filter bank (Fbank), F0, jitter, and shimmer from the speech signal inputs. In our method, we use ResNet-18 as the deep learning network: it is trained with the labelled data to obtain a classifier that determines the smoker/non-smoker label of unlabelled input test speech data (for more details please refer to Sect. 4.2 below). We choose ResNet-18 rather than other variants of ResNet (e.g., ResNet-34, ResNet-50, ResNet-101) because our dataset is relatively small, containing approximately 45 h of speech data, and ResNet-18 provides a better trade-off between depth and performance for such a dataset.

Fig. 1. Our architecture of automatic smoking status identification.
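As a rough PyTorch sketch of the classifier stage in Fig. 1, the snippet below adapts torchvision's ResNet-18 to single-channel acoustic feature maps and a two-class (smoker/non-smoker) output, with the SGD settings reported in Sect. 4.2. Note that the stock torchvision network uses 64-128-256-512 channels, whereas the authors train a narrower 16-32-64-128 variant, so this is only an approximation of their model, not a reproduction.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

def build_smoking_classifier() -> nn.Module:
    model = resnet18(weights=None)      # trained from scratch, no ImageNet weights
    # Accept a single-channel input: one MFCC/Fbank map of shape (bins, frames).
    model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
    # Two output classes: smoker vs non-smoker.
    model.fc = nn.Linear(model.fc.in_features, 2)
    return model

model = build_smoking_classifier()
optimiser = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)

features = torch.randn(8, 1, 40, 300)   # batch of 8 utterances, 40 bins, 300 frames
logits = model(features)                # shape (8, 2)
```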


4 Experiments

4.1 Datasets

In the absence of a large-scale, well-designed dataset specifically for smoking status identification experiments, we collected and created a new smoking status identification dataset based on two corpora available from the Linguistic Data Consortium (LDC): (1) the Mixer 4 and 5 Speech Corpus [30] and (2) the Mixer 6 Speech Corpus [31]. Both corpora comprise recordings made via the public telephone network and multiple microphones in office-room settings. The main difference is that most of the 616 distinct speakers in the Mixer 4 and 5 Speech Corpus have English as their native language, whereas all 594 distinct speakers in the Mixer 6 Speech Corpus have English as their native language. In the Mixer 4 and 5 Speech Corpus, only 89 of the 616 speakers have valid smoking status labels: 40 female smokers, 8 female non-smokers, 37 male smokers, and 4 male non-smokers. In the Mixer 6 Speech Corpus, 589 of the 594 speakers have valid smoking status labels: 48 female smokers, 252 female non-smokers, 70 male smokers, and 219 male non-smokers. For balanced training, 200 speakers (50 female smokers, 50 female non-smokers, 50 male smokers, and 50 male non-smokers) are selected jointly from both corpora for the experiments. Most of the speakers have two to three 12-minute transcript-reading audio segments; a few have only one. We split the data into training, validation, and test sets following an 8:1:1 ratio. We choose 5 female smokers, 5 female non-smokers, 5 male smokers, and 5 male non-smokers as the test set and the rest of the speakers as the training set. The fundamental frequency (F0), jitter, and shimmer statistics for smokers and non-smokers in the training set are shown in Table 1.

Table 1. Speech feature statistics divided by smoking status and gender.

                           F0 (Hz)    Jitter (µs)   Shimmer (%)
Male smokers        Min    92.528     23.422        5.642
                    Max    220.618    59.754        13.624
                    Mean   108.287    35.199        8.979
                    SD     25.749     1.337         4.325
Male non-smokers    Min    97.183     20.278        4.971
                    Max    249.035    47.925        12.480
                    Mean   116.592    24.734        6.491
                    SD     25.153     0.942         2.903
Female smokers      Min    124.078    29.351        8.472
                    Max    277.399    53.201        13.172
                    Mean   181.021    33.635        11.716
                    SD     22.572     1.473         2.673
Female non-smokers  Min    126.334    20.653        5.416
                    Max    297.481    39.714        11.669
                    Mean   210.359    23.786        7.695
                    SD     21.437     1.274         2.351

4.2 Implementation Details

The input features are either 40-dimensional MFCC features or 40-dimensional log Mel-filterbank features, computed with a frame length of 40 ms and 50% overlap. We extracted the fundamental frequency (F0) of each recording in the dataset using Praat [32], an open-source toolbox. Jitter and shimmer were calculated on frames of 40 ms with a time shift of 20 ms using the DisVoice toolkit [33]. Before being fed into the ResNet, the input features are mean-normalised along the time axis, and non-speech (silent) frames are removed using an energy-based voice activity detection (VAD) method. Data augmentation (SpecAugment [34]) is utilised in the training process to increase the diversity of the training set and further improve robustness. We choose SpecAugment in this study because it is applied directly to the feature inputs of the neural network (i.e., MFCC or Fbank); traditional data augmentation methods that deform the raw waveform by speeding it up or slowing it down are not suitable for smoking status identification. In training, we use ResNet-18 with 16-32-64-128 channels for the residual blocks. The model with the best validation loss was selected for testing. In testing, the entire speech recording is evaluated at once. The models are implemented using PyTorch [35] and optimised with the stochastic gradient descent (SGD) optimiser [36] with a momentum of 0.9. The mini-batch size is 64, and the weight decay parameter is 0.0001. We set the initial learning rate to 0.1 and decay it by a factor of 10 until convergence. All models were trained for 100 epochs.

4.3 Performance Metrics

We evaluate our proposed approach using two metrics: accuracy and the F1-score. The confusion matrix for smoking status identification is a 2 × 2 matrix whose axes are the predicted class and the actual class (smoker, non-smoker). Each cell of the matrix counts the number of True Smokers (TS), True Non-smokers (TN), False Smokers (FS), and False Non-smokers (FN). Accuracy (ACC) is given by Eq. (1):

ACC = \frac{TS + TN}{TS + TN + FS + FN}  (1)

Accuracy can be further analysed in terms of precision (2) and recall (3). High precision indicates a low rate of false positives, while high recall indicates a high degree of class recognition. The following equations give precision and recall for the validation of smokers:

Precision = \frac{TS}{TS + FS}  (2)

Recall = \frac{TS}{TS + FN}  (3)

The F1-score can be used to evaluate the accuracy for each class label:

F_1 = 2 \times \frac{Precision \times Recall}{Precision + Recall}  (4)
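The following minimal Python sketch implements Eqs. (1)-(4) directly from the four confusion-matrix counts; the function and variable names are ours, and the example numbers are purely illustrative rather than results from the paper.

```python
def smoking_metrics(ts: int, tn: int, fs: int, fn: int) -> dict:
    """Accuracy, precision, recall and F1 from TS/TN/FS/FN counts (Eqs. 1-4)."""
    accuracy = (ts + tn) / (ts + tn + fs + fn)
    precision = ts / (ts + fs) if (ts + fs) else 0.0
    recall = ts / (ts + fn) if (ts + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Illustrative example with 20 test speakers.
print(smoking_metrics(ts=8, tn=7, fs=3, fn=2))
```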

5 Results

According to the statistics of the acoustic features in Table 1, there is considerable variation in fundamental frequency (F0), jitter, and shimmer between smokers and non-smokers for both genders. Based on the differences in the mean F0, jitter, and shimmer between smokers and non-smokers, we developed the following rule to act as our baseline: if the mean F0 of the speaker is closer to the mean F0 of male smokers than to the mean F0 of female smokers and, at the same time, both the mean jitter and the mean shimmer of the speaker are closer to the smokers' means than to the non-smokers' means, we identify the speaker as a smoker; otherwise, we identify the speaker as a non-smoker. We applied this simple classification rule to our test dataset, and it achieved an accuracy of 63.4%. Although this rule requires knowledge of the ground-truth mean F0 for smokers and non-smokers, it indicates that even a simple rule can identify smoking status from speech signals.

Table 2. Smoking status identification experiment results.

Features                                               Accuracy   F1-score
Rule-based                                             0.634      0.617
MFCC                            w/o SpecAugment        0.714      0.714
                                with SpecAugment       0.734      0.745
Fbank                           w/o SpecAugment        0.730      0.724
                                with SpecAugment       0.770      0.766
MFCC + F0 + jitter + shimmer    w/o SpecAugment        0.754      0.754
                                with SpecAugment       0.769      0.765
Fbank + F0 + jitter + shimmer   w/o SpecAugment        0.787      0.795
                                with SpecAugment       0.823      0.823


For the remaining rows of Table 2, the models are trained with the ResNet-18 described in Sect. 4.2. The experimental results are presented with and without (w/o) SpecAugment for two types of feature settings (high-level acoustic features only, and a combination of both high- and low-level acoustic features) as inputs. We obtained better smoking status identification accuracy and F1-scores by using the data augmentation technique for the different acoustic feature settings: MFCC with SpecAugment achieved a relative improvement of 2.8% over MFCC without SpecAugment, and Fbank with SpecAugment achieved a relative improvement of 5.5% over Fbank without SpecAugment. We can also see that Fbank always yielded better performance than MFCC, whether used on its own or jointly with the low-level acoustic features. A combination of Fbank with SpecAugment and the low-level acoustic features (i.e., F0, jitter, and shimmer) provides the best accuracy of 82.3%, which is a relative improvement of 12.7% and 29.8% over the Fbank-only approach and the rule-based approach, respectively.

6 Conclusion

Our experimental results indicate that Fbank outperforms MFCC when only high-level acoustic features are used. We have demonstrated for the first time that the combination of both high- and low-level acoustic features, together with a deep neural network, can achieve high performance in smoking status identification, and that a data augmentation technique (SpecAugment) can further improve the identification accuracy. The proposed automatic smoking status identification model could be an alternative way to obtain an accurate and objective smoking status when biological verification methods are not feasible. Our proposed method outperformed the rule-based approach and obtained a best accuracy of 82.3%, which is a relative improvement of 12.7% and 29.8% over the high-level-features-only approach and the rule-based approach, respectively. In future work, we will build a long-term smoking-status-related speech recording corpus. Additional attributes such as age, gender, smoking history, and smoking frequency will also be considered in the data collection and smoking status identification process. We will also explore the smoking status identification deep neural network model further to improve performance.

References 1. Poorjam, A.H., Bahari, M.H., et al.: Multitask speaker profiling for estimating age, height, weight and smoking habits from spontaneous telephone speech signals. In: 2014 4th International Conference on Computer and Knowledge Engineering (ICCKE), pp. 7–12 (2014) 2. Murphy, C.H., Doyle, P.C.: The effects of cigarette smoking on voice-fundamental frequency. Otolaryngol. Neck Surg. 97(4), 376–380 (1987). https://doi.org/10.1177/019459988 709700406 3. Traunmüller, H., Eriksson, A.: The frequency range of the voice fundamental in the speech of male and female adults. Dep. Linguist. Univ. Stock. 97, 1905191–1905195 (1994)


4. Gonzalez, J., Carpi, A.: Early effects of smoking on the voice: a multidimensional study. Med. Sci. Monit. 10(12) (2004) 5. Guimarães, I., Abberton, E.: Health and voice quality in smokers: an exploratory investigation. Logop. Phoniatr. Vocology 30(3–4), 185–191 (2005). https://doi.org/10.1080/140154305002 94114 6. Vincent, I., Gilbert, H.R.: The effects of cigarette smoking on the female voice. Logop. Phoniatr. Vocology 37(1), 22–32 (2012). https://doi.org/10.3109/14015439.2011.638673 7. Horii and Sorenson: Cigarette smoking and voice fundamental frequency. J. Commun. Disord. 15, 135–144 (1982) 8. Awan, S.N., Morrow, D.L.: Videostroboscopic characteristics of young adult female smokers vs. nonsmokers. J. Voice 21(2), 211–223 (2007). https://doi.org/10.1016/j.jvoice.2005.10.009 9. Dirk, L., Braun, A.: Voice parameter changes in smokers during abstinence from cigarette smoking. In: Proceedings 17th International Congress Phonetic Sciences (ICPhS 2011), August, pp. 1–3 (2011) 10. Zealouk, O., Satori, H., Hamidi, M., Laaidi, N., Satori, K.: Vocal parameters analysis of smoker using Amazigh language. Int. J. Speech Technol. 21(1), 85–91 (2018). https://doi. org/10.1007/s10772-017-9487-0 11. Pinar, D., Cincik, H., Erkul, E., Gungor, A.: Investigating the effects of smoking on young adult male voice by using multidimensional methods. J. Voice 30(6), 721–725 (2016). https:// doi.org/10.1016/j.jvoice.2015.07.007 12. Simberg, S., Udd, H., Santtila, P.: Gender differences in the prevalence of vocal symptoms in smokers. J. Voice 29(5), 588–591 (2015) 13. Lee, L., Stemple, J.C., Geiger, D., Goldwasser, R.: Effects of environmental tobacco smoke on objective measures of voice production. Laryngoscope 109(9), 1531–1534 (1999). https:// doi.org/10.1097/00005537-199909000-00032 14. Braun, A.: The effect of cigarette smoking on vocal parameters, ESCA work. In: Automatic Speaker Recognition, Identification, Verification ASRIV 1994, pp. 161–164 (2019) 15. Ma, Z., Bullen, C., Chu, J.T.W., Wang, R., Wang, Y., Singh, S.: Towards the objective speech assessment of smoking status based on voice features: a review of the literature. J. Voice (2021) 16. Poorjam, A.H., Hesaraki, S., Safavi, S., van Hamme, H., Bahari, M.H.: Automatic smoker detection from telephone speech signals. In: International Conference on Speech and Computer, pp. 200–210 (2017) 17. Han, S., Leng, F., Jin, Z.: Speech emotion recognition with a ResNet-CNN-transformer parallel neural network. In: 2021 International Conference on Communications, Information System and Computer Engineering (CISCE), pp. 803–807 (2021) 18. Hershey, S. et al.: CNN architectures for large-scale audio classification. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 131–135 (2017) 19. Liu, Y., Song, Y., McLoughlin, I., Liu, L., Dai, L.: An effective deep embedding learning method based on dense-residual networks for speaker verification. In: ICASSP 2021– 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6683–6687 (2021) 20. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) 21. Pu, J., Panagakis, Y., Pantic, M.: Learning separable time-frequency filterbanks for audio classification. In: ICASSP 2021–2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3000–3004 (2021) 22. 
Fujioka, T., Homma, T., Nagamatsu, K.: Meta-learning for speech emotion recognition considering ambiguity of emotion labels. Proc. Interspeech 2020, 2332–2336 (2020)


23. Tang, R., Lin, J.: Deep residual learning for small-footprint keyword spotting. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5484– 5488 (2018) 24. Dave, N.: Feature extraction methods LPC, PLP and MFCC in speech recognition. Int. J. Adv. Res. Eng. Technol. 1(6), 1–4 (2013) 25. Mittal, V.K., Yegnanarayana, B.: Production features for detection of shouted speech. In: 2013 IEEE 10th Consumer Communications and Networking Conference (CCNC), pp. 106–111 (2013) 26. Hinton, G., et al.: Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Process. Mag. 29(6), 82–97 (2012) 27. Yoshioka, T., Ragni, A., Gales, M.J.F.: Investigation of unsupervised adaptation of DNN acoustic models with filter bank input. In: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6344–6348 (2014) 28. Chai, L., Sprecher, A.J., Zhang, Y., Liang, Y., Chen, H., Jiang, J.J.: Perturbation and nonlinear dynamic analysis of adult male smokers. J. Voice 25(3), 342–347 (2011). https://doi.org/10. 1016/j.jvoice.2010.01.006 29. Awan, S.N.: The effect of smoking on the dysphonia severity index in females. Folia Phoniatr. Logop. 63(2), 65–71 (2011). https://doi.org/10.1159/000316142 30. Brandschain, L., Cieri, C., Graff, D., Neely, A., Walker, K.: Speaker recognition: building the mixer 4 and 5 Corpora. In: LREC (2008) 31. Brandschain, L., Graff, D., Cieri, C., Walker, K., Caruso, C., Neely, A.: The mixer 6 corpus: resources for cross-channel and text independent speaker recognition. In: Proceedings of LREC (2010) 32. Boersma, P.: Praat, a system for doing phonetics by computer. Glot. Int. 5(9), 341–345 (2001) 33. Vásquez-Correa, J.C., Orozco-Arroyave, J.R., Bocklet, T., Nöth, E.: Towards an automatic evaluation of the dysarthria level of patients with Parkinson’s disease. J. Commun. Disord. 76, 21–36 (2018) 34. Park, D.S. et al.: Specaugment: a simple data augmentation method for automatic speech recognition. arXiv Prepr. arXiv:1904.08779 (2019) 35. Paszke, A. et al.: PyTorch: an imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 32, 8024–8035 (2019) 36. Ruder, S.: An overview of gradient descent optimisation algorithms. arXiv Prepr. arXiv:1609. 04747 (2016)

Independent Component Analysis for Spectral Unmixing of Raman Microscopic Images of Single Human Cells

M. Hamed Mozaffari1,2(B) and Li-Lin Tay1

1 Metrology Research Centre, National Research Council Canada, Ottawa, ON, Canada
{mohammadhamed.mozaffarimaaref, mhamed.mozaffarimaaref}@nrc-cnrc.gc.ca
2 Construction Research Centre, Fire Safety, National Research Council Canada, Ottawa, ON, Canada

Abstract. The application of independent component analysis (ICA) as an unmixing and image clustering technique for high spatial resolution Raman maps is reported. A hyperspectral map of a fixed human cell was collected by a Raman micro-spectrometer in a raster pattern on a 0.5-µm grid. Unlike previously used unsupervised machine learning techniques such as principal component analysis (PCA), ICA is based on the non-Gaussianity and statistical independence of the data, which is the case for mixture Raman spectra. Hence, ICA is a strong candidate for assembling pseudocolour maps from the spectral hypercube of Raman spectra. Our experimental results reveal that ICA is capable of reconstructing false colour maps of Raman hyperspectral data of human cells, showing the nuclear region constituents as well as subcellular organelles in the cytoplasm and the distribution of mitochondria in the perinuclear region. The minimal pre-processing requirements and label-free nature of the ICA method make it a strong unmixing method for the extraction of endmembers in Raman hyperspectral maps of living cells.

Keywords: Raman spectroscopy · Independent component analysis · Machine learning · Hyperspectral unmixing · Single human cell analysis · Blind source separation

1 Introduction

Raman micro-spectroscopy is becoming more and more popular as a medical imaging modality to study and analyze living human tissues. The main advantage of Raman imaging over other techniques is the high achievable spatial resolution of Raman microscopy, which is of the same order as visible microscopy [1, 2]. Furthermore, compared to other micro-spectroscopic techniques, Raman imaging does not require any external molecular labels or dyes [3, 4]. However, the differences between Raman spectra from different regions of a human cell are generally quite subtle, which makes it difficult to interpret the differences simply by visual inspection of the Raman spectra and their peaks. Recently, fully automatic multivariate machine learning (ML) methods [5, 6] have been successfully employed


for reconstructing and clustering Raman images collected from individual human cells [7]. In fact, Raman micro-spectroscopy coupled with ML techniques can be considered a non-invasive, fast, and automatic imaging technique to utilize for future applications in living cells and in vitro. Multivariate ML methods use the entire spectrum collected at each x-y plane coordinate to reconstruct a false colour image of the cell under investigation. In other words, ML methods aim to reconstruct an image from a three-dimensional tensor of Raman spectra. Therefore, although univariate methods, including visualizing specific band intensity or intensity ratios, reveal some information on the distribution of biochemical constituents, they cannot rival multivariate techniques in analyzing Raman hyperspectral data sets. Application of several unsupervised ML methods has been investigated for extraction and clustering constituents of human cells, including agglomerative hierarchical clustering analysis (HCA) [1, 4, 8], k-means clustering analysis (KCA) [9], divisive correlation cluster analysis (DCCA) [7] and fuzzy c-means analysis (FCA) [7, 10]. Similarly, hyperspectral unmixing methods, such as principal component analysis (PCA) [7, 10], vertex component analysis (VCA) [11], and N-Finder [12], have been utilized to extract significant pure Raman spectra (also known as endmembers) from hyperspectral Raman data. Previous studies revealed that conventional clustering techniques such as KCA perform image reconstruction better than dimensionality reduction and unmixing methods such as PCA or VCA [4, 7]. When performing statistical analysis, one assumption is that there is a probability distribution that probabilities of random variables can be drawn from. For many applications, specifically natural phenomena, Gaussian (also called Normal) distribution is a sensible choice. However, there are many situations where the Gaussianity of data does not hold [13]. Electroencephalogram (EEG), electrical signals from different parts of the scalp, natural images, or human speech are all examples not normally distributed. Raman spectra are another example of signals with no Gaussian distribution [14]. For this reason, methods such as PCA that assumes data as both normal and uncorrelated are weak in extracting components of Raman mixture spectra [15]. Fortunately, unlike PCA, independent component analysis (ICA) is powerful in extracting individual signals from mixtures of signals based on two assumptions for data points, mutually statistically independent and non-Gaussianity [14, 16, 17]. In this study, we have demonstrated that the ICA algorithm, as a multivariate method applied to micro-spectroscopic Raman data, can distinguish the nucleus, nucleoli, cytoplasm, and mitochondria of intact fixed cells by their biochemical compositions.

2 Independent Component Analysis

Independent component analysis (ICA) is one of the most powerful analytical techniques in blind source separation (BSS) [13, 18]. A famous application of ICA is the "cocktail party problem", where the underlying human speech signals are separated from sample data consisting of people talking simultaneously in a room [19]. The ICA method has been developed to retrieve unknown pure underlying components from a set of linearly mixed signals. The feasibility of the ICA method for the extraction of pure signals from mixtures is based on two important assumptions [13]. First, the pure components are statistically


fully independent, rather than merely uncorrelated as in PCA. This means that the variation of one pure signal has no influence on the variations of the other pure signals. For the case of Raman spectroscopy, this assumption is fair when pure spectra have no influence on the variation of other pure spectra in a mixture spectrum [14]. The second assumption is that ICA considers the distribution of the data to be non-Gaussian. In general, the main idea of ICA is to perform dimensionality reduction to express the data in terms of independent components (ICs). A Raman spectrometer connected to a confocal microscope with a motorized stage or a scanning mirror acquires Raman microscopic images in the form of three-dimensional tensors X_{N_x \times N_y \times N_\nu}, where N_x \times N_y denotes the number of pixels in the x-y plane and N_\nu is the number of Raman shifts in each spectrum. To use matrix algebra algorithms, the hyperspectral tensor can be unfolded into a matrix X_{N_x N_y \times N_\nu} with N_x N_y samples and N_\nu Raman shifts in each spectrum. Considering that the Raman hyperspectral data consist of spectra generated from a linear combination of pure Raman spectra plus additive noise, the ICA linear model can be written as:

X_{N_x N_y \times N_\nu} = A_{N_x N_y \times I_c} S_{I_c \times N_\nu} + E_{N_x N_y \times N_\nu}  (1)

where X is the matrix of the observed Raman spectra, A is the mixing or concentration matrix, S is the matrix of pure source Raman spectra that contains the ICs, and E is the error matrix. From the main assumptions of ICA, the columns of A are independent, and the components in S are mutually statistically independent. On the other hand, significant source spectra have a definite structure (spectra have non-overlapping peaks [14]), and so their intensity does not have a Gaussian distribution [20]. For a noise-free ICA model, the objective is to estimate an unmixing matrix W that, when applied to X, produces the estimated matrix U. Mathematically, W should be the inverse of A, and U should be equal to S:

U_{I_c \times N_\nu} = W_{I_c \times N_x N_y} X_{N_x N_y \times N_\nu} = W_{I_c \times N_x N_y} (A_{N_x N_y \times I_c} S_{I_c \times N_\nu}) = S_{I_c \times N_\nu}  (2)

The mixing matrix A can then be calculated as:

A_{N_x N_y \times I_c} = X_{N_x N_y \times N_\nu} S^{T}_{N_\nu \times I_c} \left( S_{I_c \times N_\nu} S^{T}_{N_\nu \times I_c} \right)^{-1}  (3)
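To make Eqs. (1)-(3) concrete, the sketch below unfolds a Raman hypercube and estimates the concentration matrix A and the pure spectra S with scikit-learn's FastICA (the library the authors report using in Sect. 3.2). The array shapes follow the notation above; the data here are random placeholders, and the choice of five components anticipates the model-selection result discussed later.

```python
import numpy as np
from sklearn.decomposition import FastICA

nx, ny, nnu = 64, 64, 1024                      # pixels in x, y and number of Raman shifts
hypercube = np.random.rand(nx, ny, nnu)         # placeholder Raman hypercube

X = hypercube.reshape(nx * ny, nnu)             # unfold to (Nx*Ny, Nnu)

ica = FastICA(n_components=5, whiten="unit-variance", random_state=0)
A = ica.fit_transform(X)    # concentration matrix, shape (Nx*Ny, Ic)
S = ica.mixing_.T           # estimated pure source spectra, shape (Ic, Nnu)

# Reshaping each column of A onto the pixel grid gives one pseudocolour IC map.
ic_maps = A.T.reshape(-1, nx, ny)
```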

Like other BSS techniques, the ICA solution may present some drawbacks and ambiguities. Unlike in the PCA algorithm, since it is not possible to determine the variances of the ICs (because both S and A are unknown), their order and sign cannot be determined [21]. Therefore, the significant independent component can be any of the ICs, with either a positive or a negative sign. Moreover, the choice of ICA algorithm and the number of ICs is another challenge. Many algorithms have been proposed to perform the ICA calculation, such as FastICA, JADE, and MF-ICA [21]. In this work, the FastICA algorithm (explained in [16]), which approximates the differential entropy (negentropy) to estimate W, was used. FastICA is an iterative approach in which the non-Gaussianity of the data is maximized by updating the differential entropy of W so as to minimize mutual information.


In order to determine the optimum number of ICs, we utilized two methods proposed in the literature, the sum of squared residues (SSR) and Durbin-Watson (DW) tests [22]. To calculate the SSR test values, one ICA model is generated for each maximum number of ICs (IC_max = 1, ..., 10). Using the reconstructed matrix X̂ and the original observed matrix X, the SSR values are calculated by the following expression:

SSR = \sum_{i=1}^{N_x N_y} \sum_{j=1}^{N_\nu} \left( X_{ij} - \hat{X}_{ij} \right)^{2}  (4)

The ICA model corresponding to a minimal SSR is the model with the optimal IC_max value. Similarly, the value of the DW criterion is defined as:

DW = \frac{\sum_{i=2}^{N_\nu} \left( s(i) - s(i-1) \right)^{2}}{\sum_{i=1}^{N_\nu} \left( s(i) \right)^{2}}  (5)

where s(i) is the value at the i-th Raman shift in the estimated matrix S. The DW value tends to zero when there is no noise in the signal, and tends towards two if the signal contains only noise. Figure 1 shows the values of the SSR test for FastICA models applied to our experimental data. The elbow point can be clearly seen at IC_max = 5. Results of the same testing scenario with the DW criterion are shown in Fig. 2. The optimum number of ICs from the figure is around five, which is consistent with the result of the SSR test.
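A minimal sketch of this model-selection scan is given below: for each candidate number of ICs, a FastICA model is fitted and the SSR of Eq. (4) and the DW value of Eq. (5) are computed (here DW is evaluated on the estimated source spectra, following the equation as written). Function names are ours, and the input is assumed to be the unfolded matrix X from the previous snippet.

```python
import numpy as np
from sklearn.decomposition import FastICA

def ssr_and_dw(X: np.ndarray, n_ics: int, seed: int = 0):
    """Fit FastICA with n_ics components and return (SSR, mean DW)."""
    ica = FastICA(n_components=n_ics, whiten="unit-variance", random_state=seed)
    A = ica.fit_transform(X)                  # concentrations, (Nx*Ny, Ic)
    X_hat = ica.inverse_transform(A)          # reconstructed spectra
    ssr = np.sum((X - X_hat) ** 2)            # Eq. (4)

    S = ica.mixing_.T                         # estimated pure spectra, (Ic, Nnu)
    dw_per_ic = [np.sum(np.diff(s) ** 2) / np.sum(s ** 2) for s in S]   # Eq. (5)
    return ssr, float(np.mean(dw_per_ic))

# Scan IC_max = 1..10, as in the text, and look for the SSR elbow / DW plateau.
# scores = [ssr_and_dw(X, k) for k in range(1, 11)]
```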

Fig. 1. The sum of squared residuals between the original and the reconstructed matrix X for FastICA models with increasing number of ICs.


Fig. 2. The DW plot: Durbin-Watson values for the residues after subtracting signals calculated using FastICA models with increasing number of ICs.

3 Results and Discussion

3.1 Raman Data Acquisition and Pre-processing

It is vital to pre-process the data with a "whitening" strategy before applying the ICA algorithm [16, 21]. This means that the observed data are mapped linearly to a new space in which the components are uncorrelated and their variances equal unity; in other words, the covariance matrix of the mapped observed data equals the identity matrix. Owing to the pre-whitening of the observed data, the impact of additive noise on the extraction is severely mitigated [16], and ICA is therefore extremely robust to additive noise in the mixed Raman spectra. Our experimental results indicate that other pre-processing enhancements, such as denoising with low-pass filters and baseline correction, have minimal impact on the results of ICA. For this reason, pre-processing was kept to a minimum, with just whitening and normalization of the data.

3.2 Experimental Results

To identify subcellular features of the HeLa cell, such as the nucleus, nucleoli, cytoplasm, and mitochondrial regions, ICA was performed over the whole spectral region. The major component in all of these regions is protein. From the previous section, the optimum number of components for ICA was determined to be five. We implemented our ICA model using the publicly available SciKit-Learn Python library on a 64-bit Windows PC with 20 CPU cores, 192 GB of memory, and an NVidia TITAN RTX GPU. The results of ICA in the form of five ICs are displayed in Fig. 3. It is worth noting again that, unlike other component analysis techniques such as PCA, the output set of ICs is not ordered by significance, and the sign of each IC is not defined. Therefore, the negative versions of the ICs are also shown, as they might provide useful information. The image clearly presents different regions in the cytoplasm (red in Fig. 3.c), regions that correlate with a high concentration of mitochondria (yellow in Fig. 3.h), and the nuclear regions consisting of the nucleus and nucleoli, respectively (yellow in Fig. 3.g and Fig. 3.d).


Fig. 3. Independent components (ICs) generated by the ICA technique from Raman microscopy of a HeLa cell. (A-E) Five positive ICs. (F-J) Corresponding negatives of the five ICs. Each IC is scaled to [0 = black, 255 = yellow] for the sake of better visualization.

To better distinguish the different nuclear and cytoplasmic regions, a thresholding process was performed on the ICs. Figure 4 shows the white light image of the cell, an intensity plot of the 2935 cm−1 band as a univariate method, the concatenation of the ICs after thresholding, as well as two ICs rendered with a different colour map for better illustration of the cell organelles. Figure 4.c clearly shows segmented cytoplasmic regions in green (also light green and blue in +IC2 in Fig. 4.d), a high concentration of mitochondria in red (also yellow, orange, and red in +IC2 in Fig. 4.d), and nuclear regions in light and dark blue (also yellow, orange, and red in +IC3 in Fig. 4.e).
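A small sketch of this thresholding step is given below, assuming the IC maps produced by the earlier snippet: each IC score map is thresholded independently (Otsu's threshold is our illustrative choice, since the paper does not specify one) and the surviving pixels are labelled by their strongest IC to form a concatenated segmentation map like the one in Fig. 4.c.

```python
import numpy as np
from skimage.filters import threshold_otsu

def ic_segmentation_map(ic_maps: np.ndarray) -> np.ndarray:
    """ic_maps: array (n_ics, ny, nx) of IC scores -> integer label map."""
    masks = np.stack([m > threshold_otsu(m) for m in ic_maps])   # per-IC foreground
    strongest = np.argmax(ic_maps, axis=0)                       # dominant IC per pixel
    labels = np.zeros(ic_maps.shape[1:], dtype=int)              # 0 = background
    foreground = masks.any(axis=0)
    labels[foreground] = strongest[foreground] + 1               # 1-based IC labels
    return labels
```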

Fig. 4. (A) Visual photomicrograph of a HeLa cell. (B) Raman spectral image constructed from only Raman shift 2935 cm−1 which has the sharpest Raman Peak. (C) Segmentation map of the same cell using based on results of ICA. (D and E) + IC2 and + IC3 in different color map.


As the ICs are estimated from Raman spectra based on wavenumbers and their intensity variations, the average cluster spectra corresponding to each distinguished region contain valuable information of the underlying biochemical differences. Several hundred spectra contribute to each mean cluster spectra, resulting in a very good signalto-noise ratio for the mean cluster spectra, allowing interpretation of these spectra and the spectral differences corresponding to different regions in the cell. Mean spectra representing four clusters are shown in Fig. 5. ICA was performed in the spectral range between 1050 cm−1 and 1800 cm−1 region, which exhibits the most predominant protein peaks. Spectral information in this range contained sufficient spectral information to give the best clustering result. Furthermore, ICA was performed on the C-H stretching region between 2800 and 3400 cm−1 . Trace 1 and 2 represent the nuclear region consisting of nucleoli and nucleus, respectively (dark and light blue in Fig. 4.c), trace 3 the perinuclear regions contain regions with a high concentration of mitochondria (red in Fig. 4.c) and trace 4 the region containing cytoplasm (green in Fig. 4. c). Main regions related to the contribution of proteins can be seen in all traces with distinct bands at 1655 cm−1 (amide I vibration) and the extended amid III region between 1250 and 1350 cm−1 (Peptide backbone and coupled C-H, N-H deformation modes). Furthermore, distinct peaks, which can be related to deformations of saturated and unsaturated fatty acid side chains, appear between 1200 and 1350 cm−1 . In Raman bands between 2850 and 2900 cm−1 , spectral differences of the C-Hstretching regions (CH3, CH2, and CH), more related to the cytoplasm and the mitochondrial regions are distinguishable. In fact, the rise of intensities between 2850 and 2935 cm−1 is given by relatively long alkane chains of lipids. The spectral difference is better distinguishable for cytoplasm (trace 4) and perinuclear regions (trace 3) compared with the nucleus (trace 1 and 2). Antisymmetric methyl and methylene deformations, peptide side chains, and phospholipids contribute to all traces by a sharp peak in region between 1425 and 1475 cm−1 . A rising shoulder at region between 1050 and 1100 cm−1 is related to symmetric stretching mode of phosphate esters, DNA, RNA, and phospholipids. Fingerprints of other membranous cell organelles in perinuclear area could be weakly seen in trace 3, including endoplasmic reticulum with ribosomes, the Golgi apparatus, mitochondria, lysosomes, intracellular vesicles, cholesterol, phospholipids, and fatty acids. Due to the significant contribution of protein bands in all Raman spectra, subtracting cytoplasm and nucleus Raman spectra might provide a better visualization of other constituents. This difference spectrum is shown in trace 0 of Fig. 5. For instance, negative differences might be associated with the contribution DNA.


Fig. 5. Average spectra from ICA representing nucleoli (Trace 1), nucleus (Trace 2), perinuclear area with high concentration of mitochondrial (Trace 3), cytoplasm (Trace 4), and difference spectrum between perinuclear areas and nucleus (Trace 0).

4 Conclusion Previously, it has been demonstrated that the Raman spectrum of a biological sample might be a linear combination of several pure Raman spectra when they are mutually statistically independent [14, 20]. Also, the statistical distribution of Raman spectra is less likely to be Gaussian. These two characteristics have a large correlation with the


two fundamental assumptions for the independent component analysis (ICA) technique. Therefore, ICA can be a great unmixing alternative to apply to hyperspectral Raman map to extract source Raman spectra (i.e. endmembers). We have demonstrated that Raman micro-spectroscopy, coupled to a multivariate unsupervised machine learning technique such as ICA, can distinguish the nucleus, nucleolus, cytoplasm, and mitochondria of human cell by their biochemical compositions. This novel idea provides a new tool for non-invasive, in vitro studies of cell biological aspects without external labels.

References 1. Matthaus, C., Boydston-White, S., Miljkovic, M., Romeo, M., Diem, M.: Raman and infrared microspectral imaging of mitotic cells. Appl. Spectrosc. 60(1), 1–8 (2006) 2. Mozaffari, M.H., Tay, L.-L.: A review of 1D convolutional neural networks toward unknown substance identification in portable raman spectrometer. arXiv preprint arXiv:2006.10575, https://arxiv.org/abs/2006.10575v1 (2020) 3. Matthaus, C., Chernenko, T., Newmark, J.A., Warner, C.M., Diem, M.: Label-free detection of mitochondrial distribution in cells by nonresonant Raman microspectroscopy. Biophys. J. 93(2), 668–673 (2007) 4. Miljkovic, M., Chernenko, T., Romeo, M.J., Bird, B., Matthaus, C., Diem, M.: Label-free imaging of human cells: algorithms for image reconstruction of Raman hyperspectral datasets. Analyst 135(8), 2002–2013 (2010) 5. Mozaffari, M.H., Tay, L.-L.: Convolutional Neural Networks for Raman spectral analysis of chemical mixtures. In: 2021 5th SLAAI International Conference on Artificial Intelligence (SLAAI-ICAI), pp. 1–6, IEEE (2021) 6. Mozaffari, M.H., Tay, L.-L.: Anomaly detection using 1D convolutional neural networks for surface enhanced raman scattering. Presented at the SPIE Future Sensing Technologies, (2020) 7. Hedegaard, M., Matthäus, C., Hassing, S., Krafft, C., Diem, M., Popp, J.: Spectral unmixing and clustering algorithms for assessment of single cells by Raman microscopic imaging. Theoret. Chem. Acc. 130(4–6), 1249–1260 (2011) 8. Diem, M., Romeo, M., Boydston-White, S., Miljkovic, M., Matthaus, C.: A decade of vibrational micro-spectroscopy of human cells and tissue (1994–2004). Analyst 129(10), 880–885 (2004) 9. Hedegaard, M., Krafft, C., Ditzel, H.J., Johansen, L.E., Hassing, S., Popp, J.: Discriminating isogenic cancer cells and identifying altered unsaturated fatty acid content as associated with metastasis status, using k-means clustering and partial least squares-discriminant analysis of Raman maps. Anal. Chem. 82(7), 2797–2802 (2010) 10. Krafft, C., Diderhoshan, M.A., Recknagel, P., Miljkovic, M., Bauer, M., Popp, J.: Crisp and soft multivariate methods visualize individual cell nuclei in Raman images of liver tissue sections. Vib. Spectrosc. 55(1), 90–100 (2011) 11. Nascimento, J.M.P., Dias, J.M.B.: Vertex component analysis: a fast algorithm to unmix hyperspectral data. IEEE Trans. Geosci. Remote Sens. 43(4), 898–910 (2005) 12. Descour, M.R., Winter, M.E., Shen, S.S.: N-FINDR: an algorithm for fast autonomous spectral end-member determination in hyperspectral data. Presented at the Imaging Spectrometry V (1999) 13. Hyvärinen, A.: Survey on independent component analysis. Neural Computing Surveys, vol. 2. (1999) 14. Vrabie, V., et al.: Independent component analysis of Raman spectra: application on paraffinembedded skin biopsies. Biomed. Signal Process. Control 2(1), 40–50 (2007)


15. De Lathauwer, L., De Moor, B., Vandewalle, J.: An introduction to independent component analysis. J. Chemom. 14(3), 123–149 (2000) 16. Hyvärinen, A., Oja, E.: Independent component analysis: algorithms and applications. Neural Netw. 13(4–5), 411–430 (2000) 17. Wang, G., Ding, Q., Hou, Z.: Independent component analysis and its applications in signal processing for analytical chemistry. TrAC Trends Anal. Chem. 27(4), 368–376 (2008) 18. Langlois, D., Chartier, S., Gosselin, D.: An introduction to independent component analysis: InfoMax and FastICA algorithms. Tutor. Quant. Methods Psychol. 6(1), 31–38 (2010) 19. Stone, J.V.: Independent component analysis: an introduction. Trends Cogn. Sci. 6(2), 59–64 (2002) 20. Boiret, M., Rutledge, D.N., Gorretta, N., Ginot, Y.M., Roger, J.M.: Application of independent component analysis on Raman images of a pharmaceutical drug product: pure spectra determination and spatial distribution of constituents. J. Pharm. Biomed. Anal. 90, 78–84 (2014) 21. Hyvarinen, A.: Independent component analysis: recent advances. Philos. Trans. A Math. Phys. Eng. Sci. 371(1984), 20110534 (2013) 22. Jouan-Rimbaud Bouveresse, D., Moya-González, A., Ammari, F., Rutledge, D.N.: Two novel methods for the determination of the number of components in independent components analysis models. Chemometr. Intell. Lab. Syst. 112, 24–32 (2012)

A Method of Hepatocytes Segmentation in Microscopic Images of Trypan Blue Stained Cellular Suspension

Kuba Chrobociński1(B), Wojciech Witarski2, and Katarzyna Piórkowska2

1 Department of Biocybernetics and Biomedical Engineering, AGH University of Science and Technology, A. Mickiewicza 30 Avenue, 30-059 Cracow, Poland
[email protected]
2 Department of Animal Molecular Biology, National Research Institute of Animal Production, ul. Krakowska 1, 32-083 Balice, Poland
{wojciech.witarski,katarzyna.piorkowska}@iz.edu.pl

Abstract. Automatic detection and counting of cells in microscopic images of cellular suspensions could provide a useful tool for scientists who currently have to perform this task by hand. This paper presents an algorithm that can outline and count hepatocytes using a set of filters and texture analysis. The filters and algorithms applied prior to morphological segmentation were, in order: local contrast enhancement using the CLAHE algorithm, morphological gradient, anisotropic diffusion, and the Sobel edge detector. After this set of filters, morphological segmentation is carried out, consisting of: morphological gradient, imposing of minima, and watershed flooding. Further improvement and development of the algorithm are planned. Clustered cells need to be separated; once this goal is accomplished, cells can be counted and an algorithm capable of classifying cells as dead or alive will be developed.

Keywords: Texture analysis · Segmentation · Trypan blue · Hepatocyte

1 Introduction

Trypan blue staining is one of the most common techniques to determine cell viability [14]. It stains exclusively dead cells, because the intact cell membranes of living cells stop the dye from entering the cell. When properly prepared and viewed under light microscopy, dead cells should be coloured blue while live ones should remain unstained. Under a microscope, cells can be counted, and the percentage of dead versus live cells can then be calculated. This task can be time-consuming and is susceptible to human error. The objective of the research presented in this paper is to develop a method for the segmentation of liver cells from microscopic images of a suspension of trypan blue stained cells. An algorithm that is capable of segmenting cells robustly


and accurately can help researchers who have to manually count cells to determine their mortality rate. The algorithm presented in this paper consists of two parts: preprocessing and the segmentation itself. The preprocessing part consists of a set of filters and algorithms that enhance the image and provide a good baseline for segmentation, which is carried out using morphological segmentation, consisting of morphological gradient, imposing of minima, and watershed flooding. The proposed method is based on the observation that hepatocytes exhibit visible texture, which is highlighted using the morphological gradient. Applying this procedure to an image returns an image in which areas where texture is present are highlighted, while areas of even illumination are suppressed. This operation helps to distinguish liver cells from other cells such as erythrocytes.

1.1 Related Works

There are developed approaches to cells segmentation that are used to count them. In [1] presented method is based on thresholding and morphological operations. Images used in the development and testing of this algorithm were microscopic pictures of human breast cancer cells, stained using trypan blue. Image transformed to HSV colorspace is subjected to contrast enhancement via brightness adjustment. Segmentation itself is carried out using Otsu thresholding. The resulting mask is subjected to morphological operations, opening, and dilation to eliminate small bright objects and connect gaps in bright objects. To resulting binary image labeling technique was applied to count the cells. Accuracy of >85% and correlation coefficient of the counted dead cells compared to experts count was 0.74 and for alive cells was 0.99. Article [16] presents an approach to blood cell counting using Hugh transform. Microscopic images of blood cells are transformed into HSV colorspace and the Saturation channel is extracted. The resulting image is then subjected to a segmentation procedure based on histogram thresholding and morphological operations. Two thresholds are calculated: upper and lower. Those values are used to obtain two binary images, which then are subjected to morphological operations to fill small holes. Logical operation XOR is then applied to those two images. The result of this operation is subjected to Circular Hugh transform. In [3] approach to segmentation of cells from microscopic images is presented that was developed to assist in research on photodynamic therapy. The developed method consisted of four steps. First was filtration using a 3 × 3 spacial adaptive filter in order to reduce the impact of noise on segmentation. Next location of pixels with the lowest value in 5×5 neighborhood. Pixels chosen were also checked if their gradient along every direction is above a certain threshold. The third step is the flooding process. Using previously established minima, the flooding process is carried out using a FIFO data structure. The last step is post-processing. Regions centers are calculated and if they are not contained in the region, the region is discarded. Regions too big to be cells are also discarded. Comparison of this algorithm with two human counters showed an error of algorithm within ±7%.


Article [10] describes a method for cell segmentation using the Statistical Dominance Algorithm (SDA). For every pixel, this algorithm determines whether the pixel dominates over its neighboring pixels. Input parameters describe how big the neighborhood is, using a radius given by the user. The H-minima value, also given by the user, can be used to establish an additional threshold for a pixel to be determined as dominant. If a pixel is labeled as dominant, it is part of the object that the user is looking for. This algorithm, when applied to different images, produced good results and could be used for the segmentation of cells from different tissues and different staining methods.

1.2 Contents of the Paper

First, the Materials and Methods section describes and presents the data used, and then explains the algorithm. The algorithm description is divided into two parts: preprocessing and segmentation. The last part of the Materials and Methods section concerns the metric used for evaluating the tested algorithm. The next section presents the results that have been achieved: the Dice coefficient and pictures with overlaid outlines are displayed, and a discussion of the results is also included in that section. The last section presents the conclusions.

2 Materials and Methods

2.1 Used Data

The data set used consisted of seventeen microscope images of trypan blue stained hepatocytes in water suspension and corresponding hand-made masks.

Fig. 1. Example 1


Uneven lighting, debris, lines from the hemocytometer, and clustered cells provide additional challenges, but can be expected in images of this nature. Figures 1, 2, 3 and 4 are four examples that showcase the above-mentioned flaws.

Fig. 2. Example 2

Fig. 3. Example 3


Fig. 4. Example 4

2.2 Proposed Algorithm

The presented algorithm was developed and tested in Fiji [13], which is a distribution of ImageJ. Algorithms included in the MorphoLibJ [7] plugin for ImageJ were used. All of the images were previously normalized using the algorithm presented in [12].

Preprocessing. Preprocessing of images consisted of:
1. Contrast Limited Adaptive Histogram Equalization (CLAHE) [11],
2. Wavelet-FFT Stripe Filter [9],
3. Morphological gradient,
4. Anisotropic Diffusion [15],
5. Grayscale Attribute Filtering,
6. Sobel Edge Detector [5].

After those operations results of Grayscale Attribute Filtering and Edge Detection were subtracted from each other. The resulting image was filtered with a median filter. CLAHE algorithm enhances image contrast using small regions instead of the entire image. Wavelet-FFT Stripe Filter is used in order to suppress long lines from the hemocytometer. The rest of the filters are used mostly to enhance edges and filter objects like erythrocytes.
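The pipeline above is built from Fiji/MorphoLibJ plugins; purely as a rough illustration, the Python sketch below chains analogous scikit-image operations for some of the steps (CLAHE, morphological gradient, Sobel, subtraction, median filtering). The stripe filter, anisotropic diffusion and grayscale attribute filtering have no drop-in counterparts here and are omitted, and all parameter values are illustrative rather than the authors' settings.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage import exposure, filters, io, morphology, util

def preprocess(path: str) -> np.ndarray:
    img = util.img_as_float(io.imread(path, as_gray=True))

    # Step 1: local contrast enhancement (CLAHE).
    eq = exposure.equalize_adapthist(img, clip_limit=0.02)

    # Step 3: morphological gradient (dilation minus erosion) highlights texture.
    selem = morphology.disk(2)
    grad = morphology.dilation(eq, selem) - morphology.erosion(eq, selem)

    # Steps 2, 4 and 5 (stripe filter, anisotropic diffusion, attribute filtering)
    # are performed with Fiji plugins in the paper and are omitted in this sketch.

    # Step 6 and the follow-up described in the text: Sobel edges are subtracted
    # from the filtered image (here the gradient image stands in for the
    # attribute-filtered one) and the result is smoothed with a median filter.
    edges = filters.sobel(grad)
    return ndi.median_filter(grad - edges, size=3)
```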


Segmentation. Morphological segmentation [8] consists of three steps:
1. Morphological gradient,
2. Imposing of the minima,
3. Watershed flooding.
The morphological gradient is defined as the difference between the dilation and the erosion of a given image. In an image calculated this way, the intensity of a pixel indicates the contrast intensity of its neighborhood. In segmentation, it provides "valleys" for the watershed to flood. Imposing of the minima ensures that those "valleys" are as "deep" as possible.

Metric. The Dice coefficient [4] is chosen as the metric for evaluation of the segmentation. It is defined as:

Dice = \frac{2TP}{2TP + FN + FP}  (1)

where
– TP - True Positive,
– TN - True Negative,
– FP - False Positive,
– FN - False Negative.
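For illustration only, the sketch below approximates the three segmentation steps with scikit-image's marker-controlled watershed (the authors use the MorphoLibJ Morphological Segmentation plugin in Fiji) and evaluates a binary result with the Dice coefficient of Eq. (1); the h-minima depth and structuring-element size are arbitrary choices, not the paper's parameters.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage import morphology, segmentation

def morphological_segmentation(image: np.ndarray, h: float = 0.05) -> np.ndarray:
    selem = morphology.disk(2)
    # 1. Morphological gradient of the (preprocessed) image.
    grad = morphology.dilation(image, selem) - morphology.erosion(image, selem)
    # 2. Impose minima: keep only minima deeper than h and label them as markers.
    markers, _ = ndi.label(morphology.h_minima(grad, h))
    # 3. Watershed flooding of the gradient starting from the imposed minima.
    return segmentation.watershed(grad, markers)

def dice(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice coefficient of Eq. (1) for two binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.logical_and(pred, truth).sum()
    return 2.0 * tp / (pred.sum() + truth.sum())
```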

3 Results

Table 1 below lists the Dice coefficient score for every image used, together with the mean and standard deviation. Below the table, five examples (Figs. 5, 6, 7, 8 and 9) show some of the shortcomings of the algorithm. The mean Dice coefficient of 0.703540 is a promising result, and the standard deviation of 0.082217 shows the consistency of the results. Three of the images have a Dice score below the 0.6 mark; those images were predicted to pose some challenges to the algorithm, as the cells in them are clustered or damaged. Hemocytometer lines are visible in images number 2 and 17; the Dice score for image 2 is 0.767479 and for image 17 it is 0.555211.


Table 1. Dice coefficient

Id of image   Dice coef.
1             0.637407
2             0.767479
3             0.542838
4             0.739959
5             0.707085
6             0.697259
7             0.644568
8             0.599803
9             0.792181
10            0.753236
11            0.691903
12            0.701589
13            0.859074
14            0.746502
15            0.769790
16            0.752306
17            0.555211
Mean          0.703540
Std           0.082217

Fig. 5. Result example 1


Fig. 6. Result example 2

Fig. 7. Result example 3


Fig. 8. Result example 4

Fig. 9. Result example 5

3.1 Discussion

Results obtained are promising but the algorithm struggles with certain situations. Example 1 shows that some debris or destroyed cell has been picked up and outlined. In example 2 most of the cells have been outlined, but still, small debris was picked up.


The same situation is seen in example 3, where, in addition, two cells are outlined as one in three instances. A similar case of two cells being counted as one appears in example 4. In example 4 one can also notice that cells picked up by trypan blue staining that lie further in the background are not outlined, due to a lack of visible texture. In the last example, a few cells have not been picked up by the algorithm, but the hemocytometer lines are not highlighted; the poor quality of this picture may also have affected the result. The stripe filter can be used to suppress hemocytometer lines when they appear in a picture and does not affect the quality of segmentation in any significant way. Cells that are fused cannot be segmented into individual cells successfully. It was also noticed that cells that were out of focus were not picked up by the algorithm presented in this article.

4 Conclusions

The current results of the algorithm are promising, but further research is needed to achieve a more satisfactory quality of segmentation. The low standard deviation of the Dice coefficient is encouraging, as it reflects consistent segmentation quality across different images. The method described in this paper can be used to assist manual segmentation in order to acquire data for training deep learning methods and other approaches that depend on model training. The use of the circlet transform [2] is being considered to enhance the discrimination of random debris. Converting images to grayscale with different conversion equations is also being considered, in order to extract the most useful information for this purpose [6].

Acknowledgment. This study was supported by statutory activity of the National Research Institute of Animal Production no. 01-18-05-21 and The National Centre for Research and Development in Poland no. BIOSTRATEG2/297267/14/NCBR/2016.

References

1. Aung, S., Kanokwiroon, K., Phairatana, T., Chatpun, S.: Live and dead cells counting from microscopic trypan blue staining images using thresholding and morphological operation techniques. Int. J. Electr. Comput. Eng. (IJECE) 9, 2460 (2019)
2. Bala, J., Dwornik, M., Franczyk, A.: Automatic subsidence troughs detection in SAR interferograms using circlet transform. Sensors 21(5), 1706 (2021)
3. Chen, Y., Biddell, K., Sun, A., Relue, P.A., Johnson, J.D.: An automatic cell counting method for optical images. In: Proceedings of the First Joint BMES/EMBS Conference. 1999 IEEE Engineering in Medicine and Biology 21st Annual Conference and the 1999 Annual Fall Meeting of the Biomedical Engineering Society (Cat. N), vol. 2, p. 819 (1999)
4. Dice, L.R.: Measures of the amount of ecologic association between species. Ecology 26(3), 297–302 (1945)


5. Kanopoulos, N., Vasanthavada, N., Baker, R.L.: Design of an image edge detection filter using the Sobel operator. IEEE J. Solid-State Circuits 23(2), 358–367 (1988)
6. Kolodziejczyk, A., Ladniak, M., Piorkowski, A.: Constructing software for analysis of neuron, glial and endothelial cell numbers and density in histological Nissl-stained rodent brain tissue. J. Med. Inform. Technol. 23, 77–86 (2014)
7. Legland, D., Arganda-Carreras, I., Andrey, P.: MorphoLibJ: integrated library and plugins for mathematical morphology with ImageJ. Bioinformatics 32(22), 3532–3534 (2016)
8. Meyer, F., Beucher, S.: Morphological segmentation. J. Vis. Commun. Image Represent. 1(1), 21–46 (1990)
9. Münch, B., Trtik, P., Marone, F., Stampanoni, M.: Stripe and ring artifact removal with combined wavelet – Fourier filtering. Opt. Express 17(10), 8567–8591 (2009)
10. Nurzynska, K., Mikhalkin, A., Piorkowski, A.: CAS: cell annotation software – research on neuronal tissue has never been so transparent. Neuroinformatics 15(4), 365–382 (2017)
11. Pisano, E.D., et al.: Contrast limited adaptive histogram equalization image processing to improve the detection of simulated spiculations in dense mammograms. J. Digit. Imaging 11(4), 193 (1998)
12. Reinhard, E., Adhikhmin, M., Gooch, B., Shirley, P.: Color transfer between images. IEEE Comput. Graphics Appl. 21(5), 34–41 (2001)
13. Schindelin, J., et al.: Fiji: an open-source platform for biological-image analysis. Nat. Methods 9(7), 676–682 (2012)
14. Stoddart, M.J.: Cell Viability Assays: Introduction, pp. 1–6. Humana Press, Totowa (2011)
15. Tschumperle, D., Deriche, R.: Vector-valued image regularization with PDEs: a common framework for different applications. IEEE Trans. Pattern Anal. Mach. Intell. 27(4), 506–517 (2005)
16. Venkatalakshmi, B., Thilagavathi, K.: Automatic red blood cell counting using Hough transform. In: 2013 IEEE Conference on Information Communication Technologies, pp. 267–271 (2013)

Using Technology to Create Personalised Environments for Dementia Care: Results of an Empathy Map Study Ronny Broekx1(B) , J. Artur Serrano2,4 , Ileana Ciobanu3 , Alina Iliescu3 , Andreea Marin3 , and Mihai Berteanu3 1 Innovation Department, ePoint, Hamont, Limbourg, Belgium

[email protected], [email protected]

2 Department of Neuromedicine and Movement Science, Faculty of Medicine and Health Sciences, NTNU/Norwegian University of Science and Technology, Trondheim, Norway [email protected]
3 University of Medicine and Pharmacy Carol Davila, ELIAS University Hospital, Bucharest, Romania
4 Norwegian Centre for eHealth Research, University Hospital of North Norway, Tromsø, Norway

Abstract. Empathy maps are currently used to help developers understand their users’ needs and aims. We present the results of an empathy map-based design process, in terms of user satisfaction with user experience. The goal of the project was to develop a new technology for a reminiscence therapy-based multimodal intervention aimed to improve the reality orientation and the behaviour of older people with major neurocognitive disorders. Keywords: Neurocognitive disorder · Virtual environments · Human computer interaction · Interpersonal relationships

1 Premises

While living longer is a triumph of modern society, it comes with a price. On an individual basis, we face long-term health conditions and disabilities [1]. While ageing, we enter a vicious circle of frailty and functional decline [2], and the world of the older individual shrinks accordingly. When the decline in physical functioning is accompanied by cognitive decline, the situation becomes even worse, as the person with a neurocognitive disorder slowly disconnects from the reality around them. Activities of daily living and communication with peers are qualitatively and quantitatively disturbed, distress and depression deepen, and behaviour becomes challenging [3]. The vicious circle continues, the impairments and the disabilities progress, restraining the activity and the participation of the individual even more [4]. For family and friends, this brings the feeling of losing the beloved one through a long process of decomposition, while the person with a neurocognitive disorder suffers


the progressive decline of all the cognitive domains (executive functioning, orientation, memory, language and communication). For society, this means increased costs of care. For professional caregivers, it means burden and frustration. And the number of people with neurocognitive disorders is increasing as our population ages [5]. There is no cure for major neurocognitive disorders, but there are different therapy approaches, including psychosocial interventions, which have been shown to improve the quality of life of the beneficiaries. Multicomponent exercise-based interventions of sufficient intensity have proved beneficial in improving the global functioning of people with major neurocognitive disorders [6]. Nonpharmaceutical multimodal interventions aimed at improving the reality orientation of people with neurocognitive disorders seem promising, as they improve behaviour and promote everyday functioning, activity and communication, effects which can indirectly slow the progression of symptoms [7]. Designing technology for such an approach is challenging but can be an eye-opener and a source of contentment for developers as well as for users. The main challenge comes from the fact that the intended users are not always able to define their own needs and requirements; an indirect approach is therefore needed, and the involvement of family and friends (informal caregivers), of healthcare professionals (formal caregivers) and of experts (geriatricians, clinical psychologists, rehabilitation medicine specialists) in the design process is mandatory. The process of developing and validating such technologies requires the dedicated work of multidisciplinary teams including specialists from the medical as well as the technological domains.

2 Objective

The main objective of this paper is to present the rationale used in designing a technical solution able to support a reminiscence therapy-based multimodal intervention for people with neurocognitive disorders. User satisfaction with the experience provided through interaction with the new SENSE-GARDEN technology is also tackled in this paper.

3 Method

Aggregate empathy maps take into consideration the input from the family and professional caregivers regarding the behaviour of the persons with major neurocognitive disorders. The primary users taken into consideration are the participants in the study on the impact of a 20-session programme of interventions in SENSE-GARDEN on the quality of life of people with a major neurocognitive disorder, a study conducted during 2019–2020 at the four testing sites of the SENSE-GARDEN project. The solution was validated in a pilot study [8] (clinical results are to be published elsewhere). The aggregated empathy maps were created before and after the intensive programme of interventions in SENSE-GARDEN. The intervention of 40–60 min consisted of the following components: decompression (to adapt the user to the space of SENSE-GARDEN and to the immersive experience), Reality Wall (immersive non-interactive projection of images and movies presenting landscapes and nature, with relaxing music and soundscapes), Memory Lane (interactive experience of emotional recall based on


one’s life story photos, with music from the relevant historic period), Move to Improve (exergaming and cognitive training: an interactive experience delivered in a non-concurrent manner), Life Road (space and time orientation training associated with physical activity based on procedural memory: biking while a scenery develops in front of the user), and decompression (preparing the now better reality-oriented user for the activities of daily living outside the SENSE-GARDEN). The sessions were personalised in terms of triggers, intensity and duration and were conducted by professional caregivers (clinical psychologist, trained nurse or occupational therapist). The interaction during the workflow was directed at improving communication, at using the emotional memory anchors to reconnect the person with a neurocognitive disorder with self, with his/her own history and with present caregivers, and at improving cognitive functioning, motivation and mood. Family members were encouraged to participate. The procedure helped improve feelings of respect and belonging between users, and helped the professional caregivers bond with and find out important details regarding the preferences and life history of the persons they take care of, with benefits for daily care. A pyramid of the hierarchy of needs in technology design and empathy mapping were used to design the technical solution and the intervention. Empathy mapping was also used as an assessment tool, to highlight the part of the impact of the SENSE-GARDEN-based intervention programme that is not quantifiable through usual assessment tools. Empathy mapping can provide a comprehensive image of the functioning of persons with disabilities, with the direct involvement of these people, in the effort to empower the co-creation of assistive and therapy technologies with increased usability [9]. We used the method to gain insight into the psycho-social impact of the intervention conducted using SENSE-GARDEN with persons with a major neurocognitive disorder, in terms of the changes noticed by the professional and family caregivers in the behaviour of the primary users after the programme of interventions. User satisfaction with the SENSE-GARDEN experience is presented in a narrative manner and in quotes (primary users: old people with neurocognitive disorders; secondary users: family and professional caregivers sharing the experience with the primary users).

4 SENSE-GARDEN Concept

The SENSE-GARDEN project aimed at creating garden-like virtual spaces in which several experiences are provided to people with major neurocognitive disorders. The goal of the intervention is to improve the quality of life of the person with a neurocognitive disorder directly, by improving the emotional condition and reality orientation, and indirectly, by improving the caregivers' knowledge of the real feelings and needs of the person they care for, as well as by improving the relationship and communication between the caregivers (family and professional caregivers alike) and the older person with a neurocognitive disorder in need of care.


4.1 Design Hierarchy of Needs for SENSE-GARDEN

In Maslow's hierarchy of needs applied to technology design, called the design hierarchy of needs, the basic level is represented by the functionality of the new technology. The next levels are the reliability and the usability of the product (the latter including proficiency). The highest level of the pyramid is empathic design (creativity) [10], which relates to all the human–technology interfacing aspects. It may seem essential for a design to meet the lowest need on the pyramid before progressing to meet further needs, but in practice things are quite different: all aspects must progress together, and all must respond to a defined need of the future user, tailoring the product to specific requirements. If this hierarchy is followed, respected and tested with the empathy map, we see that it reflects the real needs of the users and of the people surrounding them with love and care. Using these two principles we created a SENSE-GARDEN design which is not only functional and usable for the person with dementia but also a game changer for the formal caregiver, who can do her job better and communicate better with the people in her care; SENSE-GARDEN is a helpful tool as well, improving the function and activity level of the person with a neurocognitive disorder. For the informal caregiver, SENSE-GARDEN is a beautiful and immersive experience that reconnects them, through memories and emotions, with their ageing beloved. The multidisciplinary SENSE-GARDEN team analysed the users' profiles (in terms of sensory, cognitive and functional limitations but also remaining abilities such as procedural memory and the ability to learn) and the users' needs (belonging, motivation, emotional connection, space and time orientation, mood, executive functions, memory and attention training, self-esteem, sense of usefulness, along with balance and improved general physical condition). In terms of functionality, the new technology should provide the possibility to enable, command and control triggers of reminiscence therapy through multisensory stimulation, adjusted in a personalised manner in terms of information and emotional content, complexity, intensity and duration, so as to provide the optimal triggers for emotional reminiscence. The conclusion of Cochrane systematic reviews on the subject is that reminiscence therapy can improve quality of life, cognition, communication and possibly mood in people with major neurocognitive disorders, the effect varying according to the manner in which the intervention is conducted and the environment of choice [11]. Technologies of different degrees of complexity are used to enable reminiscence therapy interventions [12] in safe and constructive ways [13]. The participation of peers and/or family in the reminiscence therapy sessions seems to be the most beneficial for all parties, so the intervention must be designed to address this situation [14]. Then the design has to be reliable: it must get you there without any flaws or stops caused by technical error, yet when it does so it does not add much value, because this is simply what it is meant to do. Next, it should be usable, so it has to be simple to operate, like the home button everyone knows how to use; but that is not the real USP (Unique Selling Proposition) of a smartphone, people just expect such a solution as basic, nice to have but not exceptional in their experience of the phone. Proficiency is a game-changer:


if you can do the same amount of work faster or in a more efficient way, your design will be more desired as a high-level value and will fulfil user needs more effectively. The final step in the design pyramid is to create beautiful products, not only in appearance but also in form of interaction and function. This will make the product desirable and will create its real USP (Unique Selling Proposition). How do we translate these needs into experiences provided through technology? SENSE-GARDEN is a virtual garden, so a basic bird sound surrounds you from the moment you enter the room. It also closes the silent gaps between music and film files, and while watching the Memory Lane photo album it gives you the atmosphere of walking through the milestones of your life. The accompanying scent also helps to meet this basic requirement. The SENSE-GARDEN experience has to be reliable; it may not stop suddenly or switch abruptly to wholly new content, and the right technology is needed to preserve the cosiness of the place and not break the atmosphere. Even a small hesitation, or a film that will not start, can disturb the experience, so the NFC technology used has to be reliable and the experience flow has to be fluent and non-intrusive to the person with dementia, even though the experience itself must be immersive. A sign outside the SENSE-GARDEN indicating that a session is ongoing helps to preserve the cosiness of the room and the overall experience. These are all very basic things which simply create an atmosphere, nothing special at all. Most media devices can provide the required parameters for images and movies (wide digital screens and projections) and for soundscapes and music (surround systems), to address the senses of sight and hearing correctly. Scent dispensers and fans may also be used, to address the sense of smell. Procedural memory can be addressed through a repetitive activity with the hands or legs [15]. The team decided to use biking (on a room pedalier), which also serves as physical activity, recognised as beneficial for the physical condition as well as for the cognitive functioning of persons with neurocognitive disorders [16]. To make better use of the enhanced awareness created by the reminiscence experience, the team also designed an app for exergaming, balance and upper-limb training, and cognitive stimulation, with the aim of improving the cognitive functioning of the primary user [17]. To manage all the stimuli provided to the user, creating a meaningful flow of experiences while avoiding overstimulation, dedicated software was designed. The software allows the generation of individual profiles for primary users, the loading into these profiles of personal reminiscence triggers taken from one's life history and preferences, and the creation of personalised workflows; it also offers the secondary user (caregiver) the possibility to control the flow of the session and to help the system prepare for future sessions by giving feedback on the emotional reaction to, and the benefit of, each trigger applied. The environment was designed in four different manners at the four test sites, with respect to minimal space requirements for physical and sensorial training and with finishing designed to maximise the experience provided by the media devices (white projection walls, adjusted room acoustics, and a balance between natural and artificial lighting, with mood lights added and adjusted to personal preferences and reactivity).
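To make the role of the session software described above more concrete, a minimal data-model sketch is given below. It is purely illustrative: the class and field names are hypothetical and are not taken from the actual SENSE-GARDEN implementation.

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Trigger:
    kind: str                    # e.g. "photo", "music", "film", "scent"
    source: str                  # file path or dispenser channel
    era: str = ""                # historic period the content relates to
    ratings: List[int] = field(default_factory=list)   # -1 / 0 / +1 per use

@dataclass
class PrimaryUserProfile:
    name: str
    life_story_triggers: List[Trigger]
    sensory_limits: Dict[str, str]            # e.g. {"hearing": "hearing aid"}
    workflow: List[str] = field(default_factory=lambda: [
        "decompression", "reality_wall", "memory_lane",
        "move_to_improve", "life_road", "decompression"])

def record_feedback(trigger: Trigger, rating: int) -> None:
    # Caregiver feedback (thumbs up / neutral / thumbs down) is stored so
    # the system can prepare better-suited content for future sessions.
    trigger.ratings.append(rating)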


The system was designed initially for use in care centres. A simplified, minimalist version was also developed by the team for home use, to respond to the requirements of ageing in place and of the social distancing requested to reduce the spreading of the new coronavirus.

4.2 SENSE-GARDEN Usability – the Ace up the Sleeve

The higher usability of SENSE-GARDEN is provided with the help of NFC technology, a novelty but not a real USP; it just happens to be an understandable interfacing model. When you enter the place, the SENSE-GARDEN adjusts the coloured lights, the chosen scent and the background sound; the media screens are loaded with all the content chosen on the basis of the intake conversation and a dedicated life-story questionnaire, and every screen is made ready to be triggered with the neutral rating/play card. Once you swipe this neutral/play card, the content flows into the SENSE-GARDEN: a very slow fade-in of a natural environment fills the screen and opens the primary user to a conversation about their past life, guided by the indications the caregiver obtained from the questionnaire. If they want to change content, for instance another scenery or a presentation of colourful butterflies, the caregiver or family member swipes the thumbs up/down or neutral/play card to show the system how they reacted to the previous content; the system records this in the user's workflow history and presents the next file. When they want to change screens, they simply go to the other screen and present the neutral/play card again; a rating sign appears asking whether they liked the previous experience, and once it is rated with the thumbs up/down or neutral card, the system presents the new content.

Memory Lane Example. Consider the Memory Lane experience with the family photos. Once a photo is discussed, the caregiver can rate the discussion and go on to the next photo. The rating cards are thus used as the "next button" in the experience. If the users want to start cycling, they just leave the Memory Lane and swipe the neutral/play card against the NFC scanner of the new experience. This describes the flow of content and how the content is rated by family members or caregivers. If they want to stop the SENSE-GARDEN experience, they use the stop card, the fourth card to deal with. So only four cards are needed for the simple usability of the SENSE-GARDEN experience: thumbs up, thumbs down, the neutral/play card, and the stop card, which stops the session and logs you out. If the system is left idle for more than 15 min, it automatically goes into a generic demo mode, so that family members or visitors who are not registered and have no personal key tag can still experience a full SENSE-GARDEN session with generic content.

SENSE-GARDEN is not only an immersive experience but also a beautiful, well-designed nature environment that recalls memories and helps caregivers bring a sense of home into institutional care homes. The whole design of the sense space is matched by digitecture and artificial, low-maintenance gardening objects. The throne standing in the middle of the room creates respect for the person whose memories are presented in this cosy and safe place. The SENSE-GARDEN room measures four by four metres, which helps people stay close together and is spacious enough for wheelchairs or walking aids.
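As an illustration only, the four-card interaction described above can be summarised as a small state machine. The sketch below is hypothetical pseudo-controller code, not the actual SENSE-GARDEN software; all names and values other than the 15-minute idle timeout mentioned above are assumptions.

import time

RATINGS = {"thumbs_up": 1, "neutral_play": 0, "thumbs_down": -1}
IDLE_TIMEOUT_S = 15 * 60   # after 15 minutes idle the room falls back to demo mode

def handle_card(session, card, now=None):
    now = time.time() if now is None else now
    if now - session["last_activity"] > IDLE_TIMEOUT_S:
        session["mode"] = "demo"              # generic content for unregistered visitors
    session["last_activity"] = now
    if card == "stop":
        session["mode"] = "logged_out"        # stop card ends the session and logs out
    elif card == "neutral_play" and not session["playing"]:
        session["playing"] = True             # first swipe starts the slow fade-in
    else:
        # Thumbs up/down or neutral rate the previous content and act as
        # the "next button", advancing the personalised workflow.
        session["ratings"].append(RATINGS[card])
        session["item_index"] += 1
    return session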


Sense Garden brings people back to themselves and to nature and uses this basic relaxing content to show the inner soul to people who are joining them in search for themselves and creating a sense of home in institutional environments, taking out the stress and burden of such a clinical facility. The SENSE-GARDEN team took all these needs into account and translated them to an empathy map to describe the user behavior before and after the customer journey the user is getting through in SENSE-GARDEN.

5 Empathy Mapping for SENSE-GARDEN

The empathy map externalises knowledge about users in order to create a shared understanding of user needs and to aid decision making. Traditional empathy maps are individual and are split into four quadrants (what the user Says, Thinks, Does and Feels), with the user or persona in the middle. Empathy maps provide a glance into who a user is as a whole; they contain direct input and observational results and are not chronological or sequential. As UX designers we need to advocate on behalf of the user: we must understand the needs and wishes of our users and prioritise them, and empathy maps are a powerful tool for this task. Empathy maps can also be prepared as an aggregate, in which case the empathy map is a collaborative visualisation used to articulate what we know about a particular type of user. We decided to go even further and use empathy mapping as an assessment tool for the impact of a technology-based intervention programme, an impact not quantifiable with the standard assessment tools used in either clinical or social research.

5.1 Initial Empathy Map Quadrants

This initial empathy map was created by the experts in rehabilitation medicine, clinical psychology and care, from clinical reasoning and their own experience in the care of old people. The participants enrolled in the study as primary users, older persons with a major neurocognitive disorder (Clinical Dementia Rating 2 and 3), presented at least in part the features described in this initial empathy map. The following represents the behaviour of the primary users of SENSE-GARDEN before SENSE-GARDEN, in their daily routine, at home or in a care home. The Says quadrant indicates reduced capacity and reduced will to communicate, sometimes the use of inappropriate verbal, paraverbal and nonverbal content of communication, and exacerbated reactions to stimuli from the environment. The Thoughts quadrant indicates altered attention, perception and reasoning, disturbed judgement, impaired short- and long-term memory, and disturbed space, time and reality orientation. The Does quadrant presents disturbed behaviour, from hoarding and cluttering to wandering or space gazing, with an altered sleep cycle and a lack of interest, initiative and capacity to focus on a certain activity, to perform it in an appropriate manner or to finalise it in time or otherwise. Sometimes repetitive mannerisms cover anxiety and the impossibility of expressing frustration, desperation or anger. The frail physical and psychological condition of the user makes physical activity and the performance


of the activities of daily living more and more difficult. Social participation becomes a long-lost dream and the individual is isolated. The Feels quadrant is difficult to fill with direct observation results, as feelings become less and less visible while the person disconnects from the present and from self. Low self-esteem and feelings of uselessness, emptiness and abandonment are sometimes expressed verbally, but manifest anger and lack of compliance with care and therapy procedures are disturbing and challenging behaviours that increase the burden and the perceived lack of relevance and sense felt by the caregivers.

5.2 User Satisfaction with SENSE-GARDEN – the Empathy Map of the First Sessions

This was the basic feedback required to improve and adapt the technology and the intervention in SENSE-GARDEN, for a better experience, a smooth flow and improved impact. A dedicated session, carried out during an in-person project consortium meeting, collected opinions on the current version of the SENSE-GARDEN in operation and receiving users with neurocognitive disorders in the four countries of the test sites. The empathy map session included members of the four test sites (Norway, Portugal, Belgium, Romania) with four background areas: clinical, technological, business and managerial. The Says quadrant contains what the user says out loud in an interview or some other usability study; ideally, it contains verbatim, direct quotes from research. After the first sessions in SENSE-GARDEN, the family caregivers expressed their thoughts and feelings regarding their beloved: “It was my mother like she was five years ago”, “It was a rebirth of my father”, “Felt sixty years younger”, “Feeling so emotional”, “We must tell everyone the great work we are doing”, “I saw how she enjoys it and that gives me a good feeling”, “I saw the sparkle back into those gazing eyes”. Another person told us she saw her mother like she was before this terrible decaying disease. The SENSE-GARDEN adds value to the visits in the care homes, brings the sparkle back into the eyes of the person with dementia, and generates hope and an understanding of how these people will be taken care of. So, SENSE-GARDEN is a kind of connection space linking the person with dementia to the family members but also to the formal and informal caregivers, which makes it easier to cope with the disease as well as with the demands and wishes of the person with dementia and their loved ones: “It's a great place to know my patient”, “I felt 60 years younger” (family caregiver quotes and defining words); this is a “feel good” room (feelings and emotions); “It is important that we create a space for the patient to feel empowered” (thoughts and beliefs); “It is important to see the person, not the patient” (formal caregiver quotes and defining words); “I'm feeling so relaxed and calm here!” (old person); “This space is going to help me to work better with my patient.” From the qualitative research we gathered what occupies the users' Thoughts quadrant but what they may not be willing to vocalise. Through the attitude and micro facial expressions of the primary and secondary users, as well as from the peace and sometimes joy in their voices, we know that the place feels safe and cosy. It is a good room. It is, surprisingly, a completely different experience from usual care, or even from traditional cognitive stimulation sessions. Informal caregivers think this place is going to help their family members. They find in the SENSE-GARDEN a solution to enhance


their visits with the person with dementia. The sound is very warm and pleasant. The artificial decoration is ok and feels like a garden when entering the room. When entering with a wheelchair there are still some things to improve, like the height of the memory lane table it should be accessible for a person with a wheelchair. The mood lights are always green but wouldn’t it be nice if they adapted to the shown content, like green with a forest, blue with water and orange when playing music. The background bird singing sound does not annoy me as it creates an idea of being outside. The thought of the Formal caregiver is that this space is going to help her get along with their patients. Formal caregiver thinks that she really believes that the SENSE-GARDEN room should be a very simple and clean space which can be built with digitecture and could be remembered as your special space. The Does quadrant encloses the actions the user takes. They hesitate when entering the SENSE-GARDEN as it is something they have never seen before. You see them adapting the space and looking to all the items that are displayed. They see their name on the screen and are a bit surprised and delighted at the same time that they are welcomed and feel important again, this way just by swiping their tag ID before the NFC reader. They experience the armchair and know exactly what the sitting space will be for the person with dementia and which seat they have to take in this safe cosy place. They look at the sign that this session will be starting and know that nobody will suddenly enter the space without a warning. They are pointing to pictures and encouraging the person with dementia to remember, this is wrong you may help but not interrogate the person with dementia as it is important to build self-esteem, not to break it. Participant users feel so relaxed and calm by these stimuli. They are getting close to the person with dementia as they want to enjoy this experience in close contact with him or her. They hold hands and are playful, they stroke and pat their backs just to show they like it. Some people like it so much they will sing along and some even start to dance. The SENSE-GARDEN is empowering the person with dementia. The informal caregiver believes this place is going to help their mom or dad. It’s important to know the person, not the patient says the formal caregiver. This all happened on all four sites, with all the users. The emotional state of the Feels quadrant is clearly the one that is the most proficient of the Sense Garden design. Caregivers are asking how they felt and many loved ones and family members all replying with the same sentence: “they felt very emotional about this experience and tears came into their eyes watching what still is there in the mind of the person with dementia, they want to hold that thought because it is reducing the worries they had and gives them peace of mind dealing with the person with dementia. Feelings of family connections and nostalgia. Share room feelings. We sometimes see conflicts when investigating these complex thoughts and see juxtapositions between quadrants, some inconsistencies like a positive reaction in one quadrant and a negative one in the other only helps us understand the worries or burdens of our users and we used this knowledge to improve the personalised experience offered in SENSE-GARDEN. 
We have also distilled and investigated the inconsistencies in the SENSE-GARDEN design and resolved the conflicts: the Memory Lane is now accessible for wheelchair users as well, and the sound level can be raised for persons with hearing aids. For the Life Road, we saw that using a fitness bike in front of a Google Street View experience worked well for persons in an early or even intermediate phase of dementia but was of no use for


persons in a more advanced phase of dementia. We saw that the virtual reality could not be understood, and many streets have changed so much that the users do not recognise their own street, which really alienated them from their perceptive world. So we replaced it with generic treadmill videos, so they could enjoy the experience as the video progresses fluently; the theme is always a nature or forest walk, which does not strain their memory or cognitive function. The initial idea was to let them enjoy cycling, not to question what they saw or remembered. For the music files we do not use moving video clips, as we want to let them focus on the musical piece; we do that by presenting a record needle moving on a vinyl disc or by showing an album cover that they recognise and that informs them about what they hear. It also works the other way around: when we use old generic pictures, we accompany them with a very generic music file, to let users enjoy how the old days were and to let them discuss and tell their own memories about their past lives. Sometimes we take paintings and present them slowly to the rhythm of the music, so that users start humming along and really enjoy the rich colours and scenery of the paintings. The general impression of film clips is, surprisingly, not so great, as we observed. Film asks for reactions that mainly come from family members, but it rarely creates a discussion or comments from the person with dementia. We think the sequence of movie scenery is too complex for the person with dementia and poses too many questions at the same time, so they stay quiet and do not comment during this part of the SENSE-GARDEN experience. However, showing some movie content really creates an atmosphere among the family members, and this reflects on the person with dementia, so that they do enjoy it because the others enjoy it. The SENSE-GARDEN customer journey starts with the life-based questionnaire, aimed at gathering the appropriate triggers, and ends in the decompression room: after the experience, which can be emotionally very demanding for the primary user but even more so for the family members and loved ones, they enjoy a fresh cup of tea or coffee and finish with a bar of favourite chocolate or a fresh piece of cake. This decompression is very useful for gathering extra information on the way family members deal with the condition of the primary user, but it also lets them speak from the heart about how they enjoyed the experience and tell how they saw the older person again as a loved person still worthy of being explored with the help of the SENSE-GARDEN technology.

5.3 The Final Empathy Map

The aftermath empathy map of the SENSE-GARDEN user after the intervention programme shows us the same old, frail primary users, but with a renewed presence. The SENSE-GARDEN is not about showing some meaningless content that will snooze users in a comfortable position. The SENSE-GARDEN is a collection of memories, fine-tuned by family members with the help of caregivers who really want to trigger the emotional memory of the person with a neurocognitive disorder, so the interaction is meaningful and without any pressure. It is like a spa or wellness centre for a weary and tired mind; every piece of the experience, like the scent, soft music, perfect microclimate and sounds of water, is connected with the others, and that is why the whole is so great.
The direct observation during the SENSE-GARDEN sessions and the feedback given by the family caregivers provided researchers with important insight regarding the impact


of the intervention programme, meaning 40–60 min per day, 18–20 sessions of something different from usual care, offered in an intensive manner (almost daily, consecutively) or an extensive manner (2 sessions per week). The Says quadrant of the empathy map shows improved communication capacity (more words, longer phrases, improved paraverbal and nonverbal content, more comfort in communicating with caregivers and family members). The Thoughts quadrant is enriched: more memories are found and recalled interactively, and the primary users involve themselves more in present activities, use strategies to cover gaps in their own cognitive functioning and even find the power to help others. The Does quadrant reveals better orientation to the surrounding reality, increased engagement in activities of daily living and improved levels of social participation. An improved capacity to focus on required tasks, even long and complex activities, was observed, along with an improved capacity to understand and process working tasks in other interventions and activities. Even if the user leaves an activity, he or she comes back at some point and resumes it in order to finalise it, even if later. Sometimes the users initiate activities, initiate and conduct meaningful communication, and even offer help and practical advice to other persons on their own tasks. The Feels quadrant shows improved mood and improved emotional reactivity and connectivity with the family members and with the caregivers (formal and informal caregivers alike). They feel again part of the “hive”, alongside the family or the crew of the care home; they express fewer challenging behaviours and more compliance, and they engage whole-heartedly in family discussions and problems as well as in games. We can see that with just 40–60 min of personalised non-pharmacological multimodal intervention based on reminiscence therapy emotional triggers, if the technology used enables a smooth, continuous cognitive flow, we obtain improvements in all the areas covered by the empathy map. We improve what the user feels, thinks, says and does, by enabling memory, executive function, attention, perception, orientation and communication training, and we enhance bonding with the family and with the professional caregivers again. This improves users' feelings of usefulness, their sense of coherence and their ability to engage in activities of daily living and to participate in social activities as well (for CDR 2 users). Users' behaviour related to usual care activities improves, the caregivers are less stressed and can provide better care, and family members are empowered to help their beloved have a better quality of life.

6 Conclusion

“Emotions reconnect us” is the slogan of SENSE-GARDEN. This is very true, because we saw it not only with informal caregivers, family and friends, but even with professional caregivers, who had to take a moment for themselves to process what had happened in the SENSE-GARDEN session. This was not an isolated case; it happened over and over again. So what will be the game changer of the SENSE-GARDEN design? It will be the proficiency, the WHY of using the SENSE-GARDEN. The point is that we want to change the experience of the person with dementia and give them a sense of home by presenting to them old memories which connect them to their past but also to their family


members again, and with caregivers, in the present time, through emotions. This is possible if we use the right metaphors for each user and provide sessions organised as continuous and empathetic workflows with welcoming and meaningful experiences. Even if the clinical impact of the intervention programme is not large enough to be expressed in terms of changes of minimal clinical importance, these changes are consistent, and the unquantifiable changes we see in the daily life of our users are even more important than a set of scores. This kind of proficiency was demonstrated by all the test persons who visited the SENSE-GARDEN in four different countries, using a tool able to integrate a manifold experience into a personalised cognitive flow and a well-designed and well-conducted intervention programme. They showed the same effect everywhere. Persons with dementia felt attached to this cosy and safe place, and family members gave highly valued testimonials such as “it was a rebirth of my father” or “we saw our mother really liked it and that's why we were so impressed by this experience”, or the caregiver telling the family members that they too knew the memory would not stick but the feeling and joy would last after leaving the SENSE-GARDEN experience. Empathy maps can provide researchers with a tool to assess the impact of a non-pharmacological multimodal intervention aiming to improve the delicate balance of the quality of life of old people with neurocognitive disorders and their relationships with caregivers.

Acknowledgments. This work was performed in the frame of the EU projects SENSE-GARDEN (AAL/Call2016/054-b/2017) and SENSE-GARDEN Home (AAL-SCP-2020-7-270-SGH-1). We thank the entire SENSE-GARDEN team and all the participants in our testing sessions and studies.

References 1. Harper, S.: Living longer within ageing societies. J. Popul. Age. 12(2), 133–136 (2019). https://doi.org/10.1007/s12062-019-09248-4 2. Belikov, A.V.: Age-related diseases as vicious cycles. Ageing Res. Rev. 49, 11–26 (2018). doi:https://doi.org/10.1016/j.arr.2018.11.002 3. Dindelegan, C.M., Faur, D., Purza, L., Bumbu, A., Sabau, M.: Distress in neurocognitive disorders due to Alzheimer’s disease and stroke. Exp. Ther. Med. 20, 2501–2509 (2020) 4. Claudio, A., et al.: Position statement of the Brazilian society of sports medicine and Brazilian society of geriatrics and gerontology: physical activity and health in the elderly. Revista Brasileira de Medicina do Esporte 6(2), (2000) 5. Eurostat.: Ageing Europe-statistics on population developments. https://ec.europa.eu/eur ostat/statistics-explained/index.php?title=Ageing_Europe_-_statistics_on_population_dev elopments#Older_people_.E2.80.94_population_overview 6. McDermott, O., et al.: Psychosocial interventions for people with dementia: a synthesis of systematic reviews. Aging Ment. Health 23(4), 393–403 (2019) 7. Kurz, A.: Psychosoziale interventionen bei demenz [Psychosocial interventions in dementia]. Nervenarzt 84(1), 93–103; quiz 104-5 (2013) 8. Goodall, G., et al.: The use of virtual and immersive technology in creating personalized multisensory spaces for people living with dementia (SENSE-GARDEN): protocol for a multisite before-after trial. JMIR Res. Protoc. 8(9), e14096 (2019)


9. Tochetto, J., Guimarães, C., Maranho, A.L., Tartari, A.L.: Design with me: i have special needs! the case for cerebral palsy. In: Antona, M., Stephanidis, C. (eds) Universal Access in Human-Computer Interaction. Methods, Techniques, and Best Practices. UAHCI 2016. Lecture Notes in Computer Science, vol 9737, pp. 214–222. Springer, Cham (2016) 10. Mitchel, E.: Design Defined: What Does “Hierarchy of Needs” Mean To Product Designers? (2019) https://www.bresslergroup.com/blog/design-defined-hierarchy-of-needs-product-des ign-principles/ 11. Woods, B., O’Philbin, L., Farrell, E.M., Spector, A.E., Orrell, M.: Reminiscence therapy for dementia. Cochrane Database Syst. Rev. (3), CD001120 (2018) 12. Goodall, G., Taraldsen, K., Serrano, J.A.: The use of technology in creating individualized, meaningful activities for people living with dementia: a systematic review. Dementia 20(4), 1442–1469 (2021) 13. Ciobanu, I., et al.: Safety aspects in developing new technologies for reminiscence therapy: insights from the SENSE-GARDEN Project. RJGG, 8(2) (2018) 14. Macleod, F., Storey, L., Rushe, T., McLaughlin, K.: Towards an increased understanding of reminiscence therapy for people with dementia: a narrative analysis. Dementia 20(4), 1375–1407 (2021) 15. Kawai, H., et al.: Longitudinal study of procedural memory in patients with Alzheimer-type dementia. No To Shinkei 54(4), 307–311 (2002) 16. Jia, R.X., Liang, J.H., Xu, Y., Wang, Y.Q.: Effects of physical activity and exercise on the cognitive function of patients with Alzheimer disease: a meta-analysis. BMC Geriatr. 19(1), 181 (2019) 17. Woods, B., Aguirre, E., Spector, A.E., Orrell, M.: Cognitive stimulation to improve cognitive functioning in people with dementia. Cochrane Database Syst. Rev. 15(2), CD005562 (2012)

Learning Analytics for Knowledge Creation and Inventing in K-12: A Systematic Review

Mikko-Ville Apiola1(B), Sofia Lipponen2, Aino Seitamaa2, Tiina Korhonen2, and Kai Hakkarainen2

1 Department of Computing, University of Turku, Turku, Finland [email protected]
2 Department of Educational Sciences, University of Helsinki, Helsinki, Finland {sofia.lipponen,aino.seitamaa,tiina.korhonen,kai.hakkarainen}@helsinki.fi

Abstract. This paper presents our systematic review of empirical learning analytics studies carried out in K-12 education, with a specific focus on pedagogically innovative (constructive) approaches to technology-mediated learning, such as knowledge building, knowledge creation, maker-centered learning and maker culture. After reading the abstracts of the 236 identified articles, we zoomed in on 22 articles. We identified three categories of studies: 1) articles oriented toward methodology development, 2) articles relying on digital tools (learning environments with LA functions), and 3) articles investigating the impact of LA.

Keywords: Learning analytics · K-12 · Inventing · Creating · Constructing

1 Introduction

The purpose of the present investigation is to carry out a systematic review of research and development of learning analytics in the context of K-12 education. The study is specifically focused on analytics regarding learning processes and practices that require personal and collaborative inquiry, creation and building of knowledge, and making of artifacts. Such constructionist pedagogies [28] appear critical for the educational transformations that societal changes call for. Productive participation in a rapidly changing, innovation-driven knowledge society requires that K-12 students start practicing personal and social creative competences, including the solving of complex open-ended problems, working creatively with knowledge, designing and making artefacts, cultivating inventive capacity, entrepreneurial skills and risk-taking adaptability, and skills related to effective teamwork and the sharing of knowledge. Such epistemic-social skills are necessary for young citizens so that they may adapt to, thrive in and contribute to the rapid societal changes. The targeted innovative practices of learning and teaching may be fostered by engaging students in innovative practices of using various sociodigital technologies [12], i.e., the recently emerged integrated system of mobile devices,


social media, and the internet, and especially computer-supported collaborative learning (CSCL) environments [47]. Sophisticated technology support enables teachers and researchers to implement complex personal and collaborative study projects at schools that either build on specifically designed learning environments or utilize diverse sociodigital technologies and tools. Over the recent decade, digitalization has resulted in the increased use of heterogeneous learning platforms, learning management systems and other educational technologies in schools [34]. However, the mere increased use of digital technologies at school does not imply that the practices of using those technologies have improved the quality of learning or promoted progressive transformations in teaching and learning. Because sociodigital technologies are, further, rapidly transforming and remediating learning activities within and beyond schools, the field is turbulent: the potentially usable tools and technologies of learning are constantly changing, which makes analytic efforts especially challenging. The above considerations highlight the importance of learning analytics research that is conducted in the context of constructivist pedagogies [7] informed by the cross-disciplinary learning sciences. In such modern learning scenarios, students cultivate their epistemic agency and design thinking by proactively engaging in their own learning processes, setting goals, choosing problems to work on, making choices and monitoring progress [7]. In this paper, we report our systematic literature review (SLR), in which we review the state of the art in learning analytics on innovation and knowledge-creation pedagogies in the K-12 context. We review articles from the recent decade and investigate the pedagogical approaches, educational technologies, data used, analytics methods, and the most important findings.

2 Background

2.1 Constructivist Pedagogies

Behaviorism and constructivism are dominant theoretical conceptualizations of learning [10]. In behaviorism, learning is conceptualised as reactions to conditions in the learning environment, as opposed to taking an active role in meaning making and exploring the environment [10]. According to behaviorism, teaching should be based on observation, control, instructional cues, practice, and reinforcement [10]. Such philosophy appears to have shaped instructivist [36] approaches of externally controlling students’ activity and transmitting relevant information to them in carefully designed sequences. In constructivism, the goal of teaching is not to ensure knowledge of particular facts but rather the elaboration and interpretation of information. Teaching is seen as providing learners with means to create understanding, explore complex topics, and learn to think like experts in given domains do [10]. Further, Papert’s constructionism [27,28] emphasizes importance of engaging students in learning through designing and making tangible artifacts; efforts of bringing maker culture to schools highlight educational importance of his ideas. While instructivist and constructionist pedagogies have been around for decades, and not without heated debates between


proponents of each view (e.g. [13,17]), recent turbulent times appear to call for sophisticated constructivist pedagogies, fueled by the radical societal changes that such megatrends as computerization and digitalisation are bringing about.

2.2 Collaborative Learning

Sociodigital technologies provide a wide variety of tools and methods for collaborative practices of learning. Consequently, collaborative learning in general and CSCL in particular have grown into academic research tracks of their own. Collaboration in learning has attracted extensive research, discussions, and debates (e.g. [2,18]). A high number of complex elements are in play in the successful pursuit of collaborative learning, such as focusing on attaining shared objects of learning and mutually regulating the process. Further, it has been suggested that perhaps the essence of collaboration comes from the “extra” activities required in collaboration, such as explanation, disagreement and mutual regulation, which trigger cognitive mechanisms such as knowledge elicitation and internalisation that may not be triggered to the same extent in individual learning settings [9]. Research on collaborative learning is then precisely about such mechanisms and activities [9], and may ask: “What are the mechanisms that may or may not foster or distract learning in collaborative versus individual settings?” It has been argued that, in researching collaboration, it is crucial to zoom in on the interactions in order to understand collaboration [9]. With regard to quantifying and measuring collaboration patterns in face-to-face interactions, extensive research has been conducted with the use of sociometric badges, with a number of important findings about the dynamics of small collaborative groups [30,31,50]. Other common approaches include social network analysis, which can be used to trace and visualize the networking interaction taking place in collaborative teamwork (e.g. [21]); a minimal sketch of this idea follows below. Research and development of social learning analytics, i.e., analytics for tracing learners' interaction within teams, communities, and networks appear, indeed, very important [45].
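As an illustration of that kind of social network analysis (not drawn from any of the reviewed studies), the following sketch builds a directed interaction graph from hypothetical reply events in a CSCL environment and computes simple centrality measures; all names and data are invented.

import networkx as nx

# Hypothetical log of who replied to whom in a collaborative discussion.
reply_events = [
    ("Aino", "Ville"), ("Ville", "Aino"), ("Sofia", "Aino"),
    ("Aino", "Sofia"), ("Ville", "Sofia"), ("Kai", "Aino"),
]

G = nx.DiGraph()
for sender, receiver in reply_events:
    # Accumulate a weight for repeated interactions between the same pair.
    if G.has_edge(sender, receiver):
        G[sender][receiver]["weight"] += 1
    else:
        G.add_edge(sender, receiver, weight=1)

# Simple indicators of the participation structure in the team.
in_deg = dict(G.in_degree(weight="weight"))      # how often each learner is addressed
out_deg = dict(G.out_degree(weight="weight"))    # how often each learner contributes
betweenness = nx.betweenness_centrality(G)       # potential brokers between subgroups

print(in_deg, out_deg, betweenness)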

2.3 Knowledge-Creating Pedagogies

Although collaborative learning is an important avenue for advancing progressive pedagogy, sociodigital technologies can, further, be employed to empower learners to create and build knowledge and artifacts [33] in accordance with sophisticated constructivist approaches and the best practices of collaborative learning. The importance of integrated education, and of skills in critical thinking, communication, teamwork and computational thinking (CT) [2], is highlighted by many (e.g. [35,46]). In this regard, the most promising pedagogic approaches appear to be a) inquiry-based science education, b) knowledge building [38], c) learning through collaborative design [2] and d) maker-centered learning [28]. In such nonlinear pedagogies, students engage in pursuing real-world problems and open-ended learning challenges, in collaborative efforts of seeking solutions, in improvising to overcome obstacles encountered, and in creating knowledge and artefacts across the process. Such practices correspond to those of creative knowledge work


from science, engineering, and design, and, therefore, are in accordance with the next-generation practices of education.

2.4 Learning Analytics

Making sense of rich contextual data by using modern methods of data analysis has become its own academic discipline, commonly referred to as learning analytics or educational data mining [1,26,34,40,54]. In recent years, the instruments and methods of learning analytics have become more sophisticated, and the number of published articles, literature reviews, books, and events in learning analytics and educational data mining has expanded [34]. Yet, a major part of learning analytics research investigates learning in the context of higher education, and a lot of research still concentrates on, e.g., predictive modeling of student performance or drop-out behaviours [22], while learning analytics in K-12 education and informal learning are significantly less researched domains [1,26,34,40,54]. In order to improve the quality of learning analytics, it appears critical to integrate the development of the field with theoretical, methodological and empirical research in the learning sciences. Learning analytics is too often rooted in the dominant acquisition-oriented, reproductive and teacher-led practices of formal education systems, for instance using drill-and-practice types of tasks for the assessment of predefined learning objectives of acquiring specific curricular knowledge and skills [7]. Such procedures do not, however, adequately capture learning as a complex (messy) and systemic multilevel process that, beyond the individual, takes place in teams, classrooms, schools and networks and involves interacting epistemic (knowledge), social (peer interaction and collaboration), material (tools; artifacts produced) and ontological (emotion, identity) factors [25].

3 Research Design

In this SLR we investigated the current state of the art in learning analytics for invention and knowledge creation in the K-12 context. This SLR is part of our larger research project, in which we investigate and develop modern learning analytics methods. The research question for our SLR was set as follows:
– What are the educational technologies, analytics methodologies and key findings in learning analytics research that targets pedagogies of invention and knowledge-creation in the K-12 domain?

3.1 Search Strategy

In order to build our search, we derived key terms from the research question, identified possible replacement terms, constructed a search string using common boolean operators, and selected the databases for our search. A few pilot searches were conducted, and the search string was then refined based on the search results. Our search string was constructed on the following basis: (1) the article is about learning analytics, (2) the pedagogy must be constructive, (3) the article is in the K-12 context, and (4) the article must be published no earlier than 2010. The following search string was submitted to two major search engines, Scopus (http://www.scopus.com/) and Web of Science (http://www.webofknowledge.com/):

( TITLE-ABS-KEY (( "learning analytics" OR "educational data mining" OR "academic analytics" )) AND TITLE-ABS-KEY (( "knowledge creation" OR "social networks" OR "maker" OR "epistemic" OR "social learning analytics" OR "choice-based" OR "21st century skills" OR "collaborative learning" OR "inquiry learning" OR "STEAM" OR "project-based learning" OR "knowledge building" OR "making" OR creati* OR maker* OR disruptive* OR innovat* )) AND NOT TITLE-ABS-KEY ( "higher education" ) ) AND TITLE-ABS-KEY ( ( school* ) OR ( "secondary education" ) OR ( "primary education" ) OR ( "elementary education" ) OR ( secondary-age* ) OR ( primary-age* ) OR ( elementary-age* ) OR ( k-12 ) OR ( p-12 ) OR ( 7-12 ) OR ( k-6 ) OR ( p-6 ) OR ( 7-10 ) OR ( k12 ) OR ( k6 ) OR ( p12 ) OR ( p6 ) OR ( youth ) OR ( teen* ) OR ( adolescen* ) OR ( child* ) OR ( tween ) ) AND PUBYEAR > 2009 AND ( LIMIT-TO ( DOCTYPE, "ar" ) OR LIMIT-TO ( DOCTYPE, "ch" ) OR LIMIT-TO ( DOCTYPE, "cp" ) )
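The block structure of the query (analytics terms AND pedagogy terms AND NOT higher education AND school-level terms AND year limit) can also be assembled programmatically, which makes it easier to adapt; the sketch below is a rough illustration with abbreviated term lists and assumes Scopus-style syntax rather than reproducing the full string above.

```python
# Assemble a Scopus-style boolean query from term blocks (abbreviated, illustrative lists).
la_terms = ['"learning analytics"', '"educational data mining"', '"academic analytics"']
pedagogy_terms = ['"knowledge creation"', '"knowledge building"', '"collaborative learning"',
                  '"inquiry learning"', 'creati*', 'maker*', 'innovat*']
school_terms = ['school*', '"secondary education"', '"primary education"', 'k-12', 'child*']

def or_block(terms):
    """Join a list of terms into a parenthesised OR block."""
    return "( " + " OR ".join(terms) + " )"

query = (
    f"TITLE-ABS-KEY ( {or_block(la_terms)} ) "
    f"AND TITLE-ABS-KEY ( {or_block(pedagogy_terms)} ) "
    'AND NOT TITLE-ABS-KEY ( "higher education" ) '
    f"AND TITLE-ABS-KEY ( {or_block(school_terms)} ) "
    "AND PUBYEAR > 2009"
)
print(query)
```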

The search returned 172 articles from Web of Science and 142 articles from Scopus. After removing duplicates, 236 articles remained in the result set.
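A minimal sketch of this merge-and-deduplicate step is shown below; the export file names and column labels are assumptions made for illustration, since the review only states that the exports were combined and duplicates removed.

```python
# Merge the two database exports and drop duplicate records (hypothetical files/columns).
import pandas as pd

scopus = pd.read_csv("scopus_export.csv")          # assumed file name
wos = pd.read_csv("web_of_science_export.csv")     # assumed file name

combined = pd.concat([scopus, wos], ignore_index=True)
# Deduplicate on DOI where available, falling back to a normalised title.
combined["dedup_key"] = combined["DOI"].fillna(combined["Title"].str.lower().str.strip())
unique_articles = combined.drop_duplicates(subset="dedup_key").drop(columns="dedup_key")
print(len(scopus), len(wos), len(unique_articles))
```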

3.2 Study Selection for Full Read

The SLR was managed with spreadsheet software. The details of the articles were imported from the search engines as comma-separated text files, and a column was added for each of the exclusion criteria. Two independent reviewers then read the abstracts of the 236 articles and classified each article against the inclusion and exclusion criteria (see Table 1). Thus, for each article, each reviewer arrived at a verdict of inclusion or exclusion. The two independent reviewers' classifications were then combined. For 182 of the 236 articles (77.12%), the two reviewers fully agreed to either include or exclude the article. For the remaining 54 articles, a discussion was arranged in which both reviewers' choices were re-evaluated. In borderline cases, the articles were included for full read.

Table 1. Inclusion criteria (all must be true for inclusion)
– The article reports research at school (K-12) level.
– The article reports learning analytics of innovation processes.
– The article is a research article.
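The double-screening bookkeeping described above can be reproduced in a few lines of code; the reviewer verdicts in the sketch are hypothetical, and only the 182-of-236 figure comes from the review itself.

```python
# Percent agreement between two reviewers' include/exclude verdicts (toy data).
from sklearn.metrics import cohen_kappa_score  # optional chance-corrected agreement

def percent_agreement(r1, r2):
    """Share of articles on which both reviewers gave the same verdict."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)

reviewer_a = [1, 0, 1, 1, 0, 1]   # 1 = include, 0 = exclude (hypothetical)
reviewer_b = [1, 0, 0, 1, 0, 1]

print(percent_agreement(reviewer_a, reviewer_b))   # 0.833... on the toy data
print(cohen_kappa_score(reviewer_a, reviewer_b))   # agreement beyond chance
print(182 / 236)                                   # 0.7712..., the 77.12% reported above
```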

A total of 71 articles passed the initial screening and were subjected to a full read. The papers were then assessed for inclusion by having researchers read the full texts and determine whether: 1. The pedagogy reported in the article is innovative; our interest is in articles that deal with teaching based on knowledge building, innovation, maker projects, inventing, or following the scientific process.


2. The article is a research article which clearly reports its research questions, data, and research methods. 3. The article is a full peer-reviewed publication, published in an academic journal, in conference proceedings or as a book chapter; it should not be a review article, a poster, or a work-in-progress. Each article was independently read by two researchers, who each made a suggestion. Discussion sessions were arranged for those articles on which the reviewers disagreed; in these discussions the reviewers justified their choices and debated before reaching a consensus about inclusion or exclusion. Resulting from this process, a total of 22 articles were selected for inclusion in the review.

4 Descriptives: Final Set

Our final set included a total of 22 articles. All articles were in the K-12 domain and reported research within a range of taught disciplines, from mathematics, liberal arts, computational thinking, languages, science and critical reading to history. A range of educational technologies was used, from technologies that support web-based dialogue to robotics kits for constructing robots with related block-based programming, various simulation environments for learning science, and dashboards for visualising learning progressions. The pedagogies ranged from collaborative dialogue-based teaching to knowledge building in blended environments, constructionism, and mathematically geared explorations. An assortment of analytics approaches was taken, spanning the repertoire of qualitative, quantitative, and computational methods. Table 2 summarises the selected articles. Most of the included research articles were published in journals (12) or conference proceedings (8), while two were published as book chapters. In terms of geography, the studies were conducted in America (10), Europe (7), Asia (4), and the Middle East (1). The largest group (8) of the included articles took place in secondary school (grades 7–9, age 13–15). Four articles took place in primary school (grades 1–6) and four in high school. Another four articles involved a variety of age groups ranging between 10–18 years. Finally, two articles had teachers or teacher students as participants (see Table 2).

4.1 Disciplinary Focus

Table 2 shows an overview of the disciplines taught in the articles. The category of science included approaches to teach chemistry [20], scientific inquiry skills [11], water and the water cycle [16], combined science and computational thinking [4], and ecosystems with scientific inquiry [32]. The category of computing included approaches to teach block-based programming [5,15], visual programming with Lego Mindstorms [37], programming in Scratch [6], Go-Lab-based inquiry learning [23] and Arduino-based maker activities [8].

Table 2. Overview of Included Articles

| Discipline | Ref | Edtech & pedagogy | Collab | LA approach | LA method | Grade | Cl. |
| Science | [4]* | ABM (NetLogo) | Ind | Learning gains & CT | ENA, ML, EXP | HS | 1 |
| Science | [20] | ABM (NetLogo) | Ind | Inquiry-in-action | DM, QNT | HS | 1 |
| Science | [11] | Inq-ITS & EcoLife | Ind | Inquiry competence | ML | SS | 1 |
| Science | [16] | Knowledge Forum | F2F, Vrt | Identify crisscrossing topics | NLP | PS | 1 |
| Science | [32] | ABM (EcoXPT, VR) | Vrt | Track learning trajectory in VR | MVAR | SS | 1 |
| Computing | [4]* | — | — | — | — | — | — |
| Computing | [15] | Scratch 3 (CT) | Ind | Predict learning achievement | MVAR | SS, HS | 1 |
| Computing | [37] | Lego Mindstorms | F2F | Achievement and pathways | MVAR | SS, HS | 1 |
| Computing | [6] | Scratch & FUN | Ind | Programming behaviour | MVAR | SS, HS | 1 |
| Computing | [8] | Talkoo, Arduino | F2F | Collaboration analysis | QLT, QNT, ML | PS | 1 |
| Computing | [5] | Scratch | F2F | AI-based pairing of students | ML | SS | 1, 3 |
| Computing | [23] | Go-Lab | Ind | Visualise concept knowledge | ML | SS | 2 |
| Math | [39] | SAGLET, VMT, GeoGebra | Vrt | Analysis of teacher action | QLT | PS | 3 |
| Math | [53] | Math tutor, dashboard | Vrt | Teachers' pedagogical decisions | EXP | TEA | 3 |
| Math | [56] | Knowledge Forum | F2F, Vrt | Emotion analysis | QLT, QNT | PS | 1 |
| Math | [55] | LUNA dashboard | Vrt | Teacher decisions and learning | QLT, QNT | PS, SS | 3 |
| Humanities | [19] | WiREAD | Vrt | Social network metrics | MVAR | SS | 1 |
| Humanities | [24] | Wiki, inquiry learning | Vrt | Collaboration analysis | MVAR | SS | 2 |
| Humanities | [14] | WiREAD, dashboards | Vrt | Visualise collaboration | QNT | SS | 2 |
| Humanities | [52] | VCRI inquiry env. | Vrt | Analyse teacher actions | QLT | SS | 3 |
| Humanities | [49] | WiREAD | Vrt | Evaluation of tool, feedback | QNT | SS | 3 |
| Misc | [48]* | CoVAA dashboards | Vrt | Evaluate impact of LA | QLT, QNT | NA | 3 |
| Misc | [51]* | VCRI environment | Vrt | Simulated evaluation | EXP | TEA | 3 |

* = article in two or more categories. Grade: PS = primary school, SS = secondary school, HS = high school, TEA = teachers, NA = not accessible. Collab: F2F = face to face, Vrt = virtually, Ind = individual. LA method: QNT = quantitative, QLT = qualitative, MVAR = multivariate statistics, EXP = experimental pre/post, ML = machine learning, NLP = natural language processing, ENA = epistemic network analysis. Class: 1 = methodology development, 2 = tool-based analytics, 3 = evaluative research (see Sect. 5.2).

The category of mathematics included approaches to teach geometry [39], fraction assignments [53], the life stages of butterflies [56], and solving equations [55]. In the humanities, the articles included approaches to teach the English language [19], liberal studies [24], critical reading [14,49], and history [52]. In the misc category, we included articles that spanned humanities and science [51], and social studies, geography, and science [48]; this category represents integrated Science, Technology, Engineering, Arts and Mathematics (STEAM) projects.

5 Results

5.1 Educational Technology and Pedagogy

Science, Modelling, Inquiry. In the category of science, our review covered a number of educational technologies which focused mostly on following a scientific process in various inquiry environments or through agent-based modelling. In one case, computational thinking (CT) activities were integrated into a 10-day high school biology unit [4], where students learned by using NetLogo (https://ccl.northwestern.edu/netlogo/) in agent-based modelling activities.
Another NetLogo-based case was taught in chemistry [20], where the environment provided several forms of guidance, assistance and assessment while logging students' actions and responses in the activities [20]. In agent-based modelling, the idea is to provide mathematically geared explorations instead of learning by using equations (an algorithmic approach), which the authors argue may lead to gaps in conceptual understanding [20]. Other environments used to teach scientific inquiry included, e.g., Inq-ITS [11], which uses microworlds to represent real-world phenomena in an environment where students can change and inspect their properties, conduct inquiries and simulations, make hypotheses, collect data, run trials, analyse data, and communicate findings. Students worked in an environment called EcoLife, which represents an aquatic ecosystem [11]. Another educational technology used was Knowledge Forum, which in one case supported collaborative knowledge building on the topics "water and water cycle" and "rocks and minerals" [16]. Pupils worked collaboratively, raising issues of interest, deepening ideas and explorations through ongoing questioning, research, face-to-face talks (KB talks), and work in KF to generate theories, questions and alternative explanations and to record new information. In KB talks, students sit in a circle, discuss their ideas, and introduce new ideas and concepts [16]. In this case, students worked and built their knowledge on "crisscrossing" topics, i.e. topics that extend their knowledge beyond curriculum guidelines. In one case, a virtual environment called EcoXPT was used for learning scientific inquiry by making hypotheses and conducting experiments [32].

Computing, Constructionism, CT and Maker. With regard to computing, we included projects where students built tangible artefacts, such as robots or other types of construction kits, and conducted related maker activities and programming. The educational technologies range from robotics construction kits to visual programming, block programming and Scratch-based computational thinking platforms. The pedagogies in this category lean towards constructionism [29], open-ended problem solving, and small-group collaborative learning [29]. In one case, introductory robotics was taught with Lego Mindstorms, where the tasks included, e.g., using sensors to control robot vehicles [37]. In another case, students in small groups were set to build a prototype of an interactive toy using an Arduino-based kit with pluggable sensor and actuator modules; students were also provided with craft materials, such as coloured paper, paper cups, wooden sticks, glitter, and glue [8]. In other cases, block-based programming [15] and Scratch programming activities were conducted [5,6], and in another case the Go-Lab online laboratory learning environment was used as part of an inquiry-based learning scenario in which students were provided with an overview of their conceptual knowledge in the form of a concept cloud [23].

Math Environments. A variety of technologies were included in this category. In learning geometry, technologies such as SAGLET (System for Advancing Group Learning in Educational Technologies), VMT (Virtual Math Teams) and GeoGebra were used [39].


In this pedagogy, pupils sat in the classroom and were organised in small groups of two or three. The groups did not sit near each other but were connected only virtually while working on geometric tasks. The teacher observed the process of the teams virtually and received alerts from the SAGLET system about so-called critical moments which required the teacher to intervene [39]. In one case, knowledge-building pedagogies were used to teach topics in mathematics, with learning objectives covering the exploration of scientific topics, e.g. the life stages of butterflies, or the mathematical topic of shapes; students explored the definition of a shape, shape design, and knowledge sharing [56]. In another case, a math tutor and teacher dashboards were piloted and teachers' use was analysed [53], while yet another case taught the solving of equations with a LUNA visualisation tool while analysing teachers' pedagogical decisions with dashboards [55].

Humanities: Knowledge Forum, WiREAD and Wiki. In one case [19], English language was taught with the WiREAD technology to foster dialogic learning. Students engaged in web-based collaborative critical reading tasks. Pedagogical scaffolds in the WiREAD environment comprised several "critical lenses" and "critical talk types" for students to choose from, such as "What is the text/author telling the reader?" and "What presuppositions does the author make?" Students read texts uploaded by teachers and engaged in collaborative dialogue. WiREAD was also used in cases to teach English language [14] and critical reading [49]. In one case [24], as part of their liberal studies assignments, pupils used Wiki-based technology in a semester-long collaborative group inquiry project. Pupils used a software tool called Wikiglass, which visualises usage statistics of the Wiki, including students' participation and contributions (revision counts, word counts, and higher-order thinking in sentences). So, for example, students could see visualisations of their own and their fellow pupils' amounts of contribution in the Wiki [24]. In one case [52], history was taught in a CSCL environment where students worked on open-ended assignments in a software tool called VCRI, which supports inquiry tasks and student projects with various tools such as Chat, Sources, Debate and Cowriter. The students read and discussed, put arguments in a diagram, and collaboratively wrote essays [52].

STEAM Integrated. Three articles reported teaching with aspects from two or more categories, so they represent STEAM integration. In [4], computational thinking with biology was taught with NetLogo-based agent modelling. In another case [48], CoVAA dashboards were used in teaching that spanned social studies, geography and science, with a focus on evaluating the impact of the tool on teachers' actions [48]. In one case [51], humanities and science were taught in a simulated VCRI environment.

5.2 Three Classes of Learning Analytics

A variety of learning analytics approaches were taken in the articles, spanning qualitative, quantitative and computational methods. We identified three categories of analytics approaches. First, we labelled a number of articles as methodology development: in these articles, researchers collected data from teaching experiments and analysed that data with a variety of methods in order to provide insights into how such data can be analysed. In most of these cases, the analytics was conducted after teaching, so the analytics had no impact on the teaching experiment itself. Second, several articles were tool-based, which means that some analytics methods were implemented in the learning experiment, so analysis results were fed directly back into the teaching and learning environment. Third, there were articles which reported research specifically targeted at evaluating the impact of learning analytics tools in a teaching and learning situation. These three approaches are summarised in Table 3.

Methodology Development (Class 1). In the context of science, one case [4] analysed the development of students' CT-STEM knowledge in agent-based modelling in NetLogo by identifying discourse elements in the data using human coders as well as an automatic coding scheme. This was followed by modelling and visualising discourse elements and their connections with Epistemic Network Analysis (ENA) [43]. Differences were observed between the epistemic networks of students with negative and positive learning gains [4]. NetLogo was also used in another case, where students' actions and responses to questions were logged and analysed for the detection of knowledge-in-action [20]. In the context of scientific inquiry, system log data were subjected to learning analytics [11]: using RapidMiner 6.3 and the J48 decision tree algorithm, success in conducting controlled experiments was classified [11]. In another case [16], Knowledge Forum notes were subjected to learning analytics by investigating whether students "crisscrossed" science topics, which was seen as an indication of interdisciplinary thinking and productive knowledge work [16]. In another approach, log data of activities in exploring, collecting, analysing, experimenting and hypothesising were collected in the VR environment EcoXPT and subjected to multivariate statistical analyses [32], which revealed that the end point of the open-ended journey was less meaningful than the various possible paths of learning [32]. In other cases, multivariate analytics was applied to log data generated by block-based programming in order to identify students in need of assistance [15], and log files generated by Lego Mindstorms EV3 were analysed with multivariate statistics, with, e.g., indicators of "early achievement" calculated and several approaches to program construction activities identified [37]. Along similar lines, program state data saved by Scratch was subjected to learning analytics in one case, with results showing some differing approaches in programming behaviour [6]. The case of [8] developed various advanced measures to analyse collaboration in Arduino-based maker activities, while one case [19] developed measures of collaboration in Knowledge Forum-based activities; these cases are further discussed in Sect. 5.3. In the case of [5], AI-based recommendation for pairing students was conducted and evaluated. In the case of [56], emotion analysis was conducted in the context of knowledge building, with data collected from video recordings in KB circles and from textual discourses recorded via KB (notes and relationships of notes, including reading, building on and referring to notes) [56].

Table 3. Three classes of approaches to learning analytics

(1) Methodology development: tailored analysis of a specific case in education. There is typically no particular learning analytics tool, or the learners and teachers do not use or receive results from the learning analytics. The research uses a mixture of research methods, which may include qualitative, quantitative and computational methods. The approaches here provide valuable new ideas and insights to be implemented in future learning analytics tools.

(2) Tool-based analytics: analysis that involves an analytics tool that is implemented and is part of the learning situation. In many cases the tool is a visualisation tool which gives learners and/or teachers visual feedback on the learning process. The papers report on how the data from the learning is processed to provide the instant feedback.

(3) Tool evaluation: evaluation of the impact of a learning analytics tool. In this case, a learning analytics tool, such as a visualisation tool, is implemented in the learning situation, and its effects on, e.g., teachers' pedagogical decision-making or students' learning progressions are evaluated.

Mixed methods were used: first, the data were qualitatively coded for students' emotions and idea-improvement contributions; then, statistical associations between emotions and idea-improvement activities were calculated, and differences were found, e.g., in low- and high-participating groups' emotional patterns [56].

Tool-Based Analytics (Class 2). In this class, researchers used WiREAD with a dashboard implemented in the learning situation [14], and Wiki-based inquiry learning activities where visualisations of contributions were shown to pupils [24]. Both of these cases are discussed further, together with other research that deals with collaboration, in Sect. 5.3. In one case, students' concept knowledge was visualised as a concept cloud to support students' epistemic-level reflection in an inquiry-based learning scenario [23]. In this case, the use of a real-time representation of the students' conceptual knowledge strengthened engagement and increased the time spent with concept maps, which was also reflected in the quality of the concept maps [23].

Tool Evaluation (Class 3). This category included articles which evaluated the impact of a learning analytics tool. For example, in the case of [51], an experimental design was used to study the effects of a tool for visualising collaborative discussions. The experimental group of teachers was provided with an analysis of participation and discussions in simulated collaboration, while the control group did not receive such analytics; the results show differences in teachers' actions in diagnosing pedagogical situations [51]. In another case [55], the impact of a dashboard called LUNA on teachers' decision making in mathematics problem-solving classes was studied. In a quasi-experimental design, the impact of providing visualisations of topic mastery, types of errors in exercises, amount of practice, and time-versus-progress statistics was studied [55].


Video recordings and qualitative analysis were used to analyse the influence of this information on teachers' lesson plans and actual lessons. The results show that knowledge from LUNA was incorporated into teachers' lesson plans, but this was not found to translate directly into learning gains [55]. In another study [52], the impact of a learning analytics tool on teachers' pedagogical decisions was studied in a fully online setting, without actual classroom presence. The teachers used a CSCL environment in which they could observe students' work and contributions and intervene. Qualitative data were collected from the teachers afterwards. While the teachers had more information available from the LA tool, they missed visual cues and social presence. Communication was also delayed, since teachers could not respond to all groups at once, and some challenges in the feedback processes were observed. While the teachers found the tool useful, the high information load was considered a challenge [52]. Another case [53] studied teachers' pedagogical decisions when receiving different types of support from dashboards. In one case, geometry was taught [39], and qualitative methods with video data were used in order to understand the types of teachers' interventions that form part of the students' collaborative learning processes. Several types of intervention patterns for critical moments were identified: encouraging collaboration (in cases of disagreement, the teacher prompted pupils to reach an agreement); monitoring and supervising the execution of a task (e.g. when pupils started to work on many tasks simultaneously, the teacher prompted them to work on one task at a time); asking for justification (a prompt to justify their solution); and social validation (giving encouraging feedback in the case of a correct answer). Another type of teacher intervention was scaffolding argumentation. The exact numbers, percentages, and points in the learning trajectory of the teacher actions were mapped by the study [39]. The article is an interesting approach to zooming in on interaction patterns and on the best ways to support collaborative knowledge construction processes automatically and with teacher interventions. Another case [49] presents a mixed-methods approach to evaluating the perceived benefits and problems associated with a learning analytics tool. Teachers' reflections on a video-based analytics tool are discussed in [48], while another case [5] studied the impact of AI-driven pairing of students for collaborative work and its evaluation. To sum up, this category included important research approaches to evaluating the effect of learning analytics tools in actual teaching and learning scenarios. The methods were mostly qualitative and quantitative, with experimental designs to compare pedagogical situations.

5.3 Learning Analytics: Approaches

Collaboration. This class included learning analytics for collaboration. One analytics approach targeted WiREAD-based critical reading activities, which were analysed for social metrics such as the number of participants a student sent comments to (out-degree centrality), how closely students were connected, and the extent to which replies were matched by replies from the same student (arc reciprocity) [19].


The data also included a questionnaire and scales on reading fluency and reading quality as graded by teachers [19]. Multivariate statistical analysis indicated that the development of critical thinking may be associated with a conducive balance between the breadth and depth of interactions [19]. Also in the context of WiREAD, dashboard visualisations were used in analytics of critical reading activities [14]. Dashboards were used to visualise students' own learning activities: frequencies of comments and replies and types of replies (clarify, ideate, justify, challenge, validate). Mixed methods were used to assess learning fluency, engagement and ability. Feedback showed that participants felt the dashboards provided clear views of one's progress. This article contributes to evaluating the use of an LA tool and to developing methods for analytics of collaboration [14].

In the context of liberal studies, pupils learned through Wiki-based collaborative group inquiry activities [24], while the system recorded students' participation and contributions, such as revision counts, word counts, and the number of sentences automatically inspected for "higher order thinking" [24]. Students saw visualisations of their fellow pupils' contributions. Various metrics were collected from the Wiki-based learning activities: number of inputted words, number of visualisation views, scores, and the "unfairness" of the division of activity in the group. Statistical analysis revealed some interesting findings: e.g., students who deviated from the norm in groupwork activity had more views of peers' visualisations, and looking at peers' views might have stimulated competition [24]. This research contributed to developing methods for collaboration analytics, e.g. the unfairness index and the impact of looking at peer visualisations on learning.

In the case of [39], pupils learned geometry in groups of 2–3 with GeoGebra, VMT (Virtual Math Teams), and SAGLET (System for Advancing Group Learning in Educational Technologies). In this case, students did not speak but interacted virtually by justifying their ideas, arguing, refuting claims, and reasoning about the tasks to be solved. Teachers could observe the group activities and received signals about "critical moments", e.g. when pupils got incorrect or correct answers or were idle for too long. Qualitative methods with video data were used to zoom in on the teacher's interventions as part of the collaborative learning. Interaction patterns such as encouraging collaboration, monitoring and supervising the execution of a task, asking for justifications, social validation, and scaffolding argumentation were identified. The contribution of this article is to guide future analytics approaches towards more targeted automatic detection of "critical moments" in order to improve teachers' pedagogical decisions; the article contributed mostly to the evaluation of an LA tool and developed ideas for analytics to guide teachers' pedagogical decisions. In one case [8], the analytics targeted students' small-group maker projects in the context of the Arduino-based TALKOO kit.
The analytics used (1) observation data, qualitatively analysed for collaboration competencies (establishing and maintaining shared understanding, taking appropriate actions to solve the problem, and establishing and maintaining group organisation) as well as problem-solving competencies (identifying facts, representing and formulating knowledge, generating hypotheses, planning and executing, identifying knowledge and skill deficiencies, and monitoring, reflecting and applying).


Secondly, the analytics used (2) video data and automatic detection of hand and head positions and face directions to automatically detect physical engagement and degree of involvement [8]. The analytics thus involved a combination of human-coded and automatically observed data, and on this basis groupwork indicators were developed: group synchrony and individual accountability. The results show differences between high- and low-performing groups in the distribution of activities and provide insights into the development of modern indicators.

Evolving Methods. Table 2 summarises the analytics approaches and methods used, and several observations can be made. First, in many cases a combination of quantitative (QNT), qualitative (QLT) and machine learning-based (ML) approaches was employed, e.g. where learning data were first qualitatively analysed by human coders, followed by partly automatic coding based on the training data. In evaluative approaches, a combination of qualitative interviews and an experimental pre/post design was common. Some approaches relied solely on quantitative methods, e.g. multivariate statistics, to make sense of the learning data. Probably the most promising and fruitful approaches used a combination of methods, such as those relying on ENA [43], which is based on a combination of qualitative and quantitative methods; using a mixture of methods is also recommended in the approach of quantitative ethnography [42]. It seems that learning analytics, at least in scenarios of knowledge construction and inventing, essentially requires human analysts in the process. A crucial question then logically follows: how can human analysts be involved in analytics that also directly benefits its target, the learners and the teachers? Can human analysts be involved in the analytics loop, perhaps already at the time of the learning activities?
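The recurring "human coders first, automatic coding second" pattern can be illustrated with a short sketch; the discussion snippets, code labels and model choice below are hypothetical and not taken from any of the reviewed articles.

```python
# Train a simple automatic coder on human-coded discussion snippets (all data hypothetical).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

snippets = [
    "I think the model is wrong because the population never stabilises",
    "Let's split the task: you test the sensor and I will write the loop",
    "My theory is that temperature changes the reaction rate",
    "Can you explain again how you got that result?",
]
codes = ["idea_improvement", "regulation", "idea_improvement", "questioning"]

coder = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
coder.fit(snippets, codes)  # learn from the human-coded examples

# Apply the trained coder to new, uncoded discussion data.
print(coder.predict(["Maybe we should test the idea with a new experiment"]))
```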

6 Discussion

In this research, we reviewed the state of the art in learning analytics research that targets knowledge building and innovative pedagogies in the K-12 context. We set out to answer the following research question: What are the educational technologies, analytics methodologies and key findings in learning analytics research that targets pedagogies of invention and knowledge-creation in the K-12 domain? Our results reveal a number of interesting findings.

6.1 Edutech for Knowledge Creation and Inventing

The reviewed papers reveal that a range of advanced educational technologies exists that is designed to support learning based on socio-constructivism, knowledge creation, inventing, and collaboration. A number of educational technologies support construction processes, e.g. the building of concrete artefacts using various construction kits. On the other hand, specific educational technologies are available to support scientific thinking by following the methods of science, e.g. in inquiry or agent-based modelling environments. Knowledge Forum technologies were also used in science, mathematics and the humanities, but especially in the humanities, e.g. in collaborative knowledge-building activities.

6.2 Learning Analytics for Knowledge Creation and Inventing

Our results show a diverse range of analytics approaches for knowledge creation and inventing. Perhaps the major observation from this review is how challenging it is to digitally capture or model a learning situation that is based on socio-constructivism. Learning under constructive pedagogies is such a multidimensional and complex phenomenon that it is hard to quantify without losing something essential and oversimplifying the situation. A number of very promising attempts were, however, found in our result set. We divided the approaches into three classes: methodology development, tool-based analytics, and evaluation of analytics tools. With regard to methodology development, a number of highly promising approaches were identified, including analytics of the routes taken in open-ended inquiry activities [32], detection of knowledge-in-action [20], the formation of students' epistemic networks [4], detection of productive knowledge work [16], detecting success in building controlled science experiments [11], methods for AI-based pairing of students [5], indicators of early achievement in robotics [37], and detecting emotional patterns and idea-improvement contributions [56]. In many cases, human analysts played an essential part in the analytics process, and analytics that did not require human intervention were in many cases superficial or simple. Given this, a crucial question arises: how can future technologies support fluent as well as in-depth human classification of learning processes, perhaps already at the time of learning? With regard to articles that focused on using a learning analytics tool, several contributions were made, including various approaches to using dashboards and visualisations [14,24] and visualisations of epistemic-level reflections [23]. With regard to evaluative approaches in learning analytics, the approaches included studying how visualisations, such as amount of practice, time used, and progress, get incorporated into lesson plans and how that is reflected in learning outcomes [55]. Observations were made about visualisations, e.g. a confusingly high information load [52], and about the detection of various teacher acts, such as encouraging collaboration and scaffolding argumentation, which also builds a basis for analytics of teachers' actions. Evaluative research is crucially important as part of a holistic design process for learning analytics. It is important that learning analytics is based on a process of user-centered design, pedagogically wise methodology development, tool creation, and evaluation. While research on analytics methodologies and evaluative research are both important, they do not suffice alone, which highlights the need for multidisciplinary approaches in learning analytics. With regard to analytics that targeted collaborative processes in constructive learning, various interesting approaches were found. For example, social metrics were used to build indicators of the depth versus breadth of interactions and their relations to the development of critical reading capacities [19], measures of higher-order thinking [24], indicators of the unfairness of group activity [24], automatic detection of pedagogically critical moments which need teacher intervention [39], and new indicators of group synchrony and individual accountability [8].


To sum up, the category of collaboration analytics in creative pedagogies revealed various approaches to designing new metrics for analytics tools in collaborative knowledge construction. Many approaches to analysing collaboration use a mixture of qualitative, quantitative, and computational methods. Our results reveal a number of promising future methodologies, such as using epistemic networks [43] to model the development of computational thinking activities and group design activities, developing modern indicators of collaborative processes such as group synchrony and individual accountability, and detecting emotions. In the future, it is essential to move beyond simplistic indicators, e.g. of activity or of how many students are at different phases of inquiry processes, and to reflect extensively on how analytics can support self-determination and learning skills and also help in reforming educational practices, not only quantifying learning in the existing ones. Future learning analytics may include modern metrics such as mindset [3,41]. All in all, our results show that learning analytics for knowledge construction and invention pedagogies is an emerging area. The calls for analytics for C21 competencies, epistemic agency, and design-mode thinking (e.g. [7,44]) are indeed starting to bear fruit.

6.3 Limitations

The present sample of articles may have been limited because many investigators pursuing design experiments on collaborative knowledge creation did not use learning analytics as a keyword, in spite of developing methods for tracing multilevel personal, social and material learning processes. The present investigation indicated that many of the learning analytics studies sampled relied on mixed rather than formal methods. Although trivial learning processes, such as the proportion of correct solutions to closed tasks, may be formalised, nonlinear learning processes are contextual and domain-specific, and the processes and mechanisms of learning that matter are identified only after considerable effort. Because the mediating sociodigital instruments are constantly evolving, the relevant learning phenomena are also emergent rather than already known, and consequently still poorly understood.

Acknowledgment. This research was supported by the Growing Mind project (http://growingmind.fi), funded by the Strategic Research Council of the Academy of Finland.

References

1. Aldowah, H., Al-Samarraie, H., Fauzy, W.M.: Educational data mining and learning analytics for 21st century higher education: a review and synthesis. Telematics Inform. 37, 13–49 (2019)
2. Apiola, M., Sutinen, E.: Design science research for learning software engineering and computational thinking: four cases. Comput. Appl. Eng. Educ. 29(1), 1–19 (2020)


3. Apiola, M., Sutinen, E.: Mindset and study performance: new scales and research directions. In: Koli Calling 2020: Proceedings of the 20th Koli Calling International Conference on Computing Education Research, Koli Calling ’20, New York, Association for Computing Machinery (2020) 4. Arastoopour, G., et al.: Modeling and measuring high school students’ computational thinking practices in science. J. Sci. Educ. Technol. 29(1), 137–161 (2020) 5. Berland, M., Davis, D., Smith, C.P.: Amoeba: designing for collaboration in computer science classrooms through live learning analytics. Int. J. Comput. Support. Collaborative Learn. 10(4), 425–447 (2015) 6. Brasiel, S., Close, K., Jeong, S., Lawanto, K., Janisiewicz, P., Martin, T.: Measuring computational thinking development with the FUN! tool. In: Rich, P.J., Hodges, C.B. (eds.) Emerging Research, Practice, and Policy on Computational Thinking. ECTII, pp. 327–347. Springer, Cham (2017). https://doi.org/10.1007/978-3-31952691-1 20 7. Chen, B., Zhang, J.: Analytics for knowledge creation: towards epistemic agency and design-mode thinking. J. Learn. Anal. 3(2), 139–163 (2016) 8. Cukurova, M., Luckin, R., Mavrikis, M., Mill´ an, E.: Machine and human observable ´ differences in groups’ collaborative problem-solving behaviours. In: Lavou´e, E., Drachsler, H., Verbert, K., Broisin, J., P´erez-Sanagust´ın, M. (eds.) EC-TEL 2017. LNCS, vol. 10474, pp. 17–29. Springer, Cham (2017). https://doi.org/10.1007/9783-319-66610-5 2 9. Dillenbourg, P.: What do you mean by collaborative learning? In: Dillenbourg, P. (ed.) Collaborative learning: Cognitive and Computational Approaches, pp. 1–19. Elsevier, Oxford (1999) 10. Ertmer, P.A., Newby, T.J.: Behaviorism, cognitivism, constructivism: Comparing critical features from an instructional design perspective. Perform. Improv. Q. 26(2), 43–71 (2013) 11. Gobert, J.D., Kim, Y.J., Sao Pedro, M.A., Kennedy, M., Betts, C.G.: Using educational data mining to assess students’ skills at designing and conducting experiments within a complex systems microworld. Thinking Skills Creativity 18, 81–90 (2015) 12. Hakkarainen, K., Hietaj¨ arvi, L., Alho, K., Lonka, K., Salmela-Aro, K.: Sociodigital Revolution: digital natives vs digital immigrants, vol. 22, pp. 918–923. Elsevier Scientific Publ. Co, Unknown, 2nd edn, February 2015 13. Hmelo-Silver, C.E., Duncan, R.G., Chinn, C.A.: Scaffolding and achievement in problem-based and inquiry learning: a response to kirschner, sweller, and. Educ. Psychol. 42(2), 99–107 (2007) 14. Jonathan, C., Tan, J. P.-L., Koh, E., Caleon, I. S., Tay, S.H.: Enhancing students’ critical reading fluency, engagement and self-efficacy using self-referenced learning analytics dashboard visualizations. In: 25th International Conference on Computers in Education, pp. 457–462. New Zealand, Asia-Pacific Society for Computers in Education (2017) 15. Kesselbacher, M., Bollin, A.: Discriminating programming strategies in scratch: making the difference between novice and experienced programmers. In Proceedings of the 14th Workshop in Primary and Secondary Computing Education, WiPSCE 2019, New York, Association for Computing Machinery (2019) 16. Khanlari, A., Zhu, G., Scardamalia, M.: Knowledge building analytics to explore crossing disciplinary and grade-level boundaries. J. Learn. Anal. 6(3), 60–75 (2019) 17. 
Kirschner, P.A., Sweller, J., Clark, R.E.: Why minimal guidance during instruction does not work: an analysis of the failure of constructivist, discovery, problem-based, experiential, and inquiry-based teaching. Educ. Psychol. 41(2), 75–86 (2006)


18. Kivunja, C.: Teaching students to learn and to work well with 21st century skills: unpacking the career and life skills domain of the new learning paradigm. Int. J. High. Educ. 4(1), 1–11 (2015) 19. Koh, E., Jonathan, C., Tan, J.: Exploring conditions for enhancing critical thinking in networked learning: findings from a secondary school learning analytics environment. Educ. Sci. 9(4), 287 (2019) 20. Levy, S.T., Wilensky, U.: Mining students’ inquiry actions for understanding of complex systems. Comput. Educ. 56(3), 556–573 (2011) 21. Li, S., Hietaj¨ arvi, L., Palonen, T., Salmela-Aro, K., Hakkarainen, K.: Adolescents’ social networks: exploring different patterns of socio-digital participation. Scandinavian J. Educ. Res. 61(3), 255–274 (2017) 22. Liz-Dom´ınguez, M., Caeiro-Rodr´ıguez, M., Llamas-Nistal, M., Mikic-Fonte, F.: Systematic literature review of predictive analysis tools in higher education. Appl. Sci. 9(24), 5569 (2019) 23. Manske, S., Hoppe, H.U.: The ”Concept Cloud”: supporting collaborative knowledge construction based on semantic extraction from learner-generated artefacts. In: 2016 IEEE 16th International Conference on Advanced Learning Technologies (ICALT), pp. 302–306 (2016) 24. Ng, J., Hu, X., Luo, M., Chu, S.K.W.: Relations among participation, fairness and performance in collaborative learning with wiki-based analytics. Proc. Assoc. Inf. Sci. Technol. 56(1), 463–467 (2019) 25. Packer, M.J., Goicoechea, J.: Sociocultural and constructivist theories of learning: ontology, not just epistemology. Educ. Psychol. 35(4), 227–241 (2000) 26. Papamitsiou, Z., Economides, A.A.: Learning analytics and educational data mining in practice: a systematic literature review of empirical evidence. J. Educ. Technol. Soc. 17(4), 49–64 (2014) 27. Papert, S.: Teaching children to be mathematicians versus teaching about mathematics. Int. J. Math. Educ. Sci. Technol. 3(3), 249–262 (1972) 28. Papert, S.: An exploration in the space of mathematics educations. Int. J. Comput. Math. Learn. 1(1), 95–123 (1996) 29. Papert, S., Harel, I.: Situating constructionism. In: Papert, S., Harel, I. (eds.), Constructionism, vol. 36, pp. 1–11. Ablex Publishing, Norwood (1991) 30. Parker, J.N., Cardenas, E., Dorr, A.N., Hackett, E.J.: Using sociometers to advance small group research. Sociol. Methods Res. 49(4), 1064–1102 (2018) 31. Pentland, A.: Social Physics. Penguin Press, How Good Ideas Spread - The Lessons from a new Science (2014) 32. Reilly, J.M., Dede, C.: Differences in student trajectories via filtered time series analysis in an immersive virtual world. In: Proceedings of the 9th International Conference on Learning Analytics & Knowledge, LAK19, pp. 130–134, New York, Association for Computing Machinery (2019) 33. Ritella, G., Hakkarainen, K.: Instrumental genesis in technology-mediated learning: From double stimulation to expansive knowledge practices. Int. J. Comput Support. Collaborative Learn. 7(2), 239–258 (2012) 34. Romero, C., Ventura, S.: Educational data mining and learning analytics: an updated survey. Data Min. Knowl. Dis. 10(3), e1355 (2020) 35. Root-Bernstein, R.: STEMM education should get “HACD.” Science 361(6397), 22–23 (2018) 36. Glasner, R., Baraness, A.: Chapter 1: introduction. In: Alfonso’s Rectifying the Curved. SSHMPS, pp. 1–32. Springer, Cham (2021). https://doi.org/10.1007/9783-319-77303-2 1


37. Scaradozzi, D., Cesaretti, L., Screpanti, L., Mangina, E.: Identification of the students learning process during education robotics activities. Front. Robot. AI 7, 21 (2020) 38. Scardamalia, M., Bereiter, C.: Knowledge building and knowledge creation: theory, pedagogy, and technology. In: Sawyer, K. (ed.), The Cambridge Handbook of the Learning Sciences, 2nd ed, pp. 397–417. Cambridge University Press (2014) 39. Schwarz, B.B., Prusak, N., Swidan, O., Livny, A., Gal, K., Segal, A.: Orchestrating the emergence of conceptual learning: a case study in a geometry class. Int. J. Comput. Support. Collaborative Learn. 13(2), 189–211 (2018). https://doi.org/ 10.1007/s11412-018-9276-z 40. Schwendimann, B., et al.: Perceiving learning at a glance: a systematic literature review of learning dashboard research. IEEE Trans. Learn. Technol. 10(1), 30–41 (2017) 41. Seitamaa, A.: Exploring Middle School Students’ Growth Mindsets in Relation to Educational and Sociodigital Activity. Master’s thesis, University of Helsinki, Faculty of Education (2021) 42. Shaffer, D.W.: Quantitative Ethnography. Cathcart Press, Madison (2017) 43. Shaffer, D.W., Collier, W., Ruis, A.R.: A tutorial on epistemic network analysis: analyzing the structure of connections in cognitive, social, and interaction data. J. Learn. Anal. 3(3), 9–45 (2016) 44. Shum, S., Crick, R.: Learning Analytics for 21st Century Competencies. J. Learn. Anal. 3(2), 6–21 (2016) 45. Shum, S.B., Ferguson, R.: Social Learning Analytics. J. Educ. Technol. Soc. 15(3), 3–26 (2012) 46. Skorton, D., Bear, A.: The Integration of the Humanities and Arts with Sciences, Engineering, and Medicine in Higher Education: Branches from the Same Tree. The National Academies Press, Washington (2018) 47. Stahl, G., Hakkarainen, K.: Theories of CSCL. In: Cress, U., Ros´e, C., Wise, A.F., Oshima, J. (eds.) International Handbook of Computer-Supported Collaborative Learning. CCLS, vol. 19, pp. 23–43. Springer, Cham (2021). https://doi.org/10. 1007/978-3-030-65291-3 2 48. Tan, J., Koh, E., Ariffin, N., Teo, E., Tay, S., Singh, S.: Analytics environment (CoVAA) intervention: user experiences and reflections of teacher-practitioners. In: 26th International Conference on Computers in Education. Asia-Pacific Society for Computers in Education (2018) 49. Tan, J.P.-L., Yang, S., Koh, E., Jonathan, C.: Fostering 21st century literacies through a collaborative critical reading and learning analytics environment: Userperceived benefits and problematics. In: Proceedings of the Sixth International Conference on Learning Analytics & Knowledge, LAK 2016, pp. 430–434, New York, Association for Computing Machinery (2016) 50. Tripathi, P., Burleson, W.: Predicting creativity in the wild: experience sample and sociometric modeling of teams. In: Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work, CSCW 2012, pp. 1203–1212, New York, Association for Computing Machinery (2012) 51. van Leeuwen, A., Janssen, J., Erkens, G., Brekelmans, M.: Supporting teachers in guiding collaborating students: effects of learning analytics in cscl. Comput. Educ. 79, 28–39 (2014) 52. van Leeuwen, A., Janssen, J., Erkens, G., Brekelmans, M.: Teacher regulation of multiple computer-supported collaborating groups. Comput. Hum. Behav. 52, 233–242 (2015)


53. van Leeuwen, A., Rummel, N., van Gog, T.: What information should CSCL teacher dashboards provide to help teachers interpret CSCL situations? Int. J. Comput. Support. Collaborative Learn. 14(3), 261–289 (2019) 54. Viberg, O., Hatakka, M., B¨ alter, O., Mavroudi, A.: The current landscape of learning analytics in higher education. Comput. Hum. Behav. 89, 98–110 (2018) 55. Xhakaj, F., Aleven, V., McLaren, B.M.: Effects of a teacher dashboard for an intelligent tutoring system on teacher knowledge, lesson planning, lessons and student ´ Drachsler, H., Verbert, K., Broisin, J., P´erez-Sanagust´ın, learning. In: Lavou´e, E., M. (eds.) EC-TEL 2017. LNCS, vol. 10474, pp. 315–329. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66610-5 23 56. Zhu, G., Xing, W., Costa, S., Scardamalia, M., Pei, B.: Exploring emotional and cognitive dynamics of knowledge building in grades 1 and 2. User Model. UserAdap. Inter. 29(4), 789–820 (2019)

A Conceptual Framework for the Development of Argumentation Skills Using CSCL in a Graduate Students' Research Course

R. van der Merwe, J. van Biljon, and C. Pilkington
University of South Africa, Pretoria, South Africa
{vdmerwer,vbiljja,pilkicl}@unisa.ac.za

Abstract. Developing and presenting a well-formulated research argument is core to the learning journey of a graduate student. In open distance e-learning, computer-supported learning is instrumental in providing a platform for graduate students to develop their argumentation skills. However, there is little guidance on the elements required in using computer-supported collaborative learning (CSCL) to augment argumentation skills development (ASD). This paper reports on elements identified in the literature that should be present in a framework using CSCL to augment ASD. The thematically analysed data gathered during the focus group sessions were used to confirm the structure of the argumentation skills development framework (ASDF), and confirmed that there is a need for a framework to provide guidance in using CSCL to augment ASD. The contribution includes the conceptual ASDF using CSCL, comprising seven elements, which provides a strategy of scaffolded learning for implementation in a graduate course to augment ASD.

Keywords: Learning · Collaboration · Computer support · Argumentation skills · ODeL · CSCL · Scaffolded learning · Argumentation skills development

1 Introduction

Argumentation skill is seen as a derivative that develops along the academic route and involves the understanding, managing and formulation of arguments [1]. It is of interest to education as it contributes to the individual in "transforming, clarifying, changing ideas, personal growth and identifying of information" [2, p. 50]. The inclusion of the theoretical concepts of argumentation in a graduate course, along with the skills of writing academically, is not new, and positive results have been reported in that the "students were able ... to produce academic texts argumentatively more sophisticated" [3, p. 139]. In this study, we refer to graduate students as students who have completed their undergraduate qualification and are now enrolled for an honours qualification.


In open distance e-learning (ODeL), it often takes considerable time for a graduate student to develop argumentation skills and demonstrate them successfully in research outputs [3], as the student is often isolated from both peers and supervisors [4]. The use of the technologies available in CSCL platforms is imperative in education [5] and allows students in ODeL not only to join online discussions but also to augment their argumentation skills. One of the perceived advantages of using CSCL in graduate studies is the "ability to overcome obstacles of distance and time" [6, p. 272]. However, the availability of technology and applicable platforms are not sufficient conditions to ensure that graduate students will use the available collaboration platforms to engage critically in academic argumentation and consequently develop their argumentation skills [7,8]. In a study by Van Biljon et al. [9], it was noted that graduate students, even in a cohort supervision environment with guidance from supervisors, are reluctant to use the available collaboration platforms to engage critically in argumentation with their peers. The University of South Africa (UNISA), an ODeL institution [10], is progressively using CSCL to provide various solutions and platforms for collaboration. An example of using CSCL that is grounded in the Grasp of Evidence (GoE) framework is the platform presented by Mochizuki et al. [11]. The GoE framework posits five dimensions of evidence evaluation, i.e. evidence analysis, evidence evaluation, evidence interpretation, evidence integration and evidence credibility. The platform presented by Mochizuki et al. [11] allows users to collaboratively share and read multiple documents, synthesise the contents and resolve disagreements, using the scaffolded environment provided in the CSCL. Though various research exists in the multidisciplinary field of CSCL [12], the elements required for a conceptual CSCL framework that will augment argumentation skills in an ODeL environment could not be found. Furthermore, the researchers could not find evidence-based guidance on the elements required in a framework, purposefully designed for augmenting argumentation skills using CSCL, that can be implemented in a graduate course. Järvelä and Rosé [13, p. 146] also identified the need for more empirical research on the "design of the technological settings for collaboration and how people learn in the context of collaborative activity". It is against this background, and with a realisation of the complexity of the learning interactions in CSCL between graduate students and the supervisor as e-moderator, that the research question was formulated as: What are the key elements required in a CSCL conceptual framework that could contribute to the development of argumentation skills in a graduate course? In response to the research question, the researchers developed an evidence-based conceptual argumentation skills development framework (ASDF) and presented it to experts in focus groups consisting of supervisors with experience in postgraduate supervision and ODeL courseware developers. Evaluation by students and the institution falls outside the scope of this study, as we believe it is important to develop a mature and robust platform before involving the students in future research.


The remainder of this paper is structured as follows. In Sect. 2, the theoretical framework that underpins the development of the ASDF, based on the concept of community of practice as presented by Wenger [14], is discussed. The proposed ASDF is presented and described in Sect. 3. In Sect. 4, a scaffolded learning approach is proposed for the ASDF, and Toulmin's argumentation model [2,15] is used to augment argumentation skills development; Toulmin's model has been used in various studies to augment the development of students' argumentation skills [2,3,16] and is discussed in more detail in Sect. 5. The method of selecting the participants and the qualitative thematic analysis process followed in transcribing the data are explained in Sect. 6. The revised ASDF, based on the findings, is presented in Sect. 7, and the paper closes with conclusions, limitations and recommendations for future studies. The rationale of this study was to develop an ASDF that can be followed when implementing an argumentation model in a graduate course using CSCL. At the practical level, the research contributes to the body of knowledge by providing a framework that offers a philosophy and strategy of scaffolded procedures and techniques to be implemented in a course using CSCL to augment the argumentation skills development of the graduate student. At the theoretical level, the research contributes to the body of knowledge pertaining to scaffolded approaches that can be applied in graduate courses towards the development of argumentation skills.

2 Theoretical Framework

The theoretical framework that underpins the development of the ASDF is the community of practice concept of Wenger [14,17]. For a community of practice to exist, the three elements that comprise the theory, ‘the domain’, ‘the community’ and ‘the practice’, need to be developed in parallel to cultivate such a community [17]. The domain element points to a community of practice that is characterised by the participation and commitment of the members towards a collaborative goal. The participants are identified by contributing to the collaborative goal through meaning and identity [14]. We refer to meaning as the way the participants share their experience of life and the world and how it has brought about change, while identity refers to the way the participants share how learning changed them in the context of the community. The second element, the community, refers to the engagement among the participants, through which information and knowledge are shared and relationships are built in order to learn from one another [17]. The practice, the third element, refers to the sharing of resources: the participants build libraries of resources and find ways in which to address problems that may occur periodically [17]. In the evaluation of the ASDF, the community of practice among supervisors and course developers is significant, as it allows, among others, an increased sense of community, the sharing of years of experience, the construction of knowledge and experience, and critical thinking [18].

3 Proposed ASDF

Universities are adopting learning management systems (LMS) that provide collaboration platforms using CSCL [19], which allow scaffolded learning and environments that can foster higher-order thinking and critical thinking skills [20]. From a pedagogical perspective, the pedagogical approach and the course requirements, and not the technology, should drive the initiative in the development of the ASDF [21]. Furthermore, the ODeL technology infrastructure should provide an environment that is not only user-friendly, customisable and student-centred, but also provides the required privacy and anonymity [20]. Within the ODeL technology infrastructure, the affordances of collaborative tasks, ways to communicate using communication technologies, and the sharing of resources are of importance [22]. The learning approach followed should allow for productive processes, following strategies that allow scaffolded collaborative learning processes [22–24].

Fig. 1. Conceptual argumentation skills development framework (ASDF)

A conceptual ASDF, seen in Fig. 1, was presented to the focus groups. The conceptual ASDF comprises seven elements: the course requirements, the pedagogical approaches, the infrastructural requirements and the ODeL technology infrastructure, as identified in the literature, as well as human capacity from the perspective of the student as a researcher, the output in the form of a well-structured research problem, and the evaluation of the approach. These elements are explored in the next paragraphs in the context of a specific honours research course.

3.1 Course Requirements

In this study, one of the honours research courses (HRCOS82) offered at UNISA is chosen. HRCOS82 serves as a fundamental building block in equipping students with the knowledge and competencies to conduct research in the computing field, as well as giving students the opportunity to conduct a small research project under the supervision of a lecturing team in Computing. Students enrolled for HRCOS82 choose, from a selection of research projects, a project based on their area of study, which we refer to in this study as HRCOS82 P19. Embedded in the course outcomes are the South African Qualifications Authority (SAQA, https://www.saqa.org.za/) critical cross-field outcomes (CCFO). The CCFOs are of importance as they identify key terminology that is required when building an argument, including terms such as identifying, working, organising, collecting, communicating, using technology, demonstrating and contributing.

3.2 Pedagogical Approaches for ASD

Collaborative learning is seen as a pedagogy that can be adopted in most learning environments, including CSCL in ODeL [25]. Furthermore, scaffolded learning activities in collaborative learning can be used to enhance argumentation skills development among students [16,26]. The course developer should take cognisance of the technology available in the ODeL environment [27] that can be used to provide a scaffolded learning journey to assist in the development of argumentation skills.

3.3 Human Capacity: The Student as a Researcher

The student in HRCOS82 P19 contributes by applying their competencies and making contributions towards the collaborative goal.

3.4 Infrastructural Requirements

The infrastructural requirements include the resources that are required to implement CSCL in an ODeL environment. These resources include external resources, institutional resources and supervision resources, and can be accessed and used by the community. The external resources include the adoption of cloud computing services, such as open educational resources (OER), MOOCs and open data resources, as well as the use of popular multimedia platforms for communication and collaboration [28]. The inclusion of external resources is often left to the lecturer or supervisor [29]. From the student side, access to these external resources depends on accessibility and availability, and is in some instances device dependent. From the institution side, policies that will govern privacy, security and ethics, together with cost and scalability, are important factors that should be considered [30,31].

Institutional resources include access to resources that the university provides to students as part of their enrolment, such as the university’s online library, reference management software, statistical analysis software, webinars, and academic integrity and similarity tools, to name a few. As these resources are part of the institution, their governance is the responsibility of the institution. The supervisor is appointed by the department within the university, and the course requirements determine the qualification and capacity of the supervisor. Through institutional resources, training in supervision and capacity development programmes are provided.

3.5 ODeL Technology Infrastructure

At UNISA, ODeL is delivered through an online LMS. The LMS provides the technology infrastructure [20] for CSCL resources and includes, among others, the structure for the learning path, e-tivities, assessment and the learning approach. The CSCL affordances [22] should include the establishment of a joint task, space for online communication and the sharing of resources, an online interface for engaging in productive processes, and online technology tools for co-construction towards solving a shared problem. In the development of a course using CSCL, the course developers and e-moderator should keep in mind that, although students have access to technology through the internet, students “lack the necessary skills and competence to engage fully and efficiently in online learning” [32, p. 18].

3.6 Output

The course requirements define the outcomes for HRCOS82, which in this instance is “...mastering scientific writing, literature references and can complete an acceptable written research report”. In this study, following the scaffolded learning journey approach within CSCL and applying the argumentation model of Toulmin [15], the output will be “the presentation of a well formulated argument”. The students will submit their final report for assessment, which is externally examined by a panel of examiners. For future studies, the method to evaluate argumentation skills from argumentation records [33] can be considered.

3.7 Evaluation of the Approach

The evaluation of the approach followed in this study includes learning analytics and the gathering of data through questionnaires and expert focus groups. To monitor the students’ progress, learning analytics and data will be gathered over the learning journey regarding the elements of the community of practice: ‘the domain’, ‘the community’ and ‘the practice’ [17]. The evaluation of an implemented ASDF, through learning analytics and questionnaires among students, does not fall within the scope of this study and is considered for further research. The qualitative thematic analysis process followed in the evaluation of the proposed ASDF with experts in focus groups is discussed in Sect. 6.

4 The Scaffolded Learning Journey

Scaffolded learning refers to the use of a variety of activities in a learning journey that assist students in progressing towards a stronger understanding and ultimately to independence in the learning process [26,34]. In CSCL, a scaffolded learning journey, as presented by Salmon [24], is made up of activities (e-tivities) that promote “active and interactive online learning” and include sharing of resources, online discussions relating to the research, collaborating in the CSCL environment through writing messages, attending webinars and presenting research. The student starts with a low level of competence in argumentation skills and progresses to a point where a well-formulated argument can be presented. The participants, the e-moderator and other students as peers, provide support and transfer of information in a scaffolded manner as the level of challenge and the level of competence grow [34]. Refer to Fig. 2 for a presentation of this scaffolded learning journey.

Fig. 2. ASD through a scaffolded learning journey

The level of competence of the student is mapped against the horizontal axis, which represents the learning journey of the individual, and the vertical axis represents the increase in the level of competence as the student progresses. The e-moderator, as the supervisor, facilitates the learning journey by establishing the group, introducing the knowledge domain and the learning approach, and inducting the students into the ASD learning environment [23,24]. In the scaffolded learning journey, the role of the e-moderator changes as the student progresses. Initially starting as an instructor, the supervisor provides the required training and instruction in using Toulmin’s model by identifying the various elements of claim, grounds, and so forth. As the student progresses, the role of the instructor gradually changes to that of a facilitator (dotted line 1), allowing the students to build their competencies in developing argumentation skills from a low level of competence to a point where the student can create and present a well-formulated argument. Each stage requires the student to master argumentation skills in the scaffolded learning journey. The scaffolded levels of skills are presented in the categories of the revised version of Bloom’s taxonomy [35], and include competencies from remembering and understanding, to applying and analysing, and finally to the categories of evaluating, creating and implementing.

In this scaffolded learning journey, the students (as peers) are part of discussion groups and have the opportunity not only to present their arguments, but also to give and receive critique. The peers, travelling on the same learning journey as the individual student, collaborate in the space provided in the LMS. This is done through sharing, presenting, evaluating, critiquing, and applying the terminology of Toulmin’s model (presented in the dotted line labelled 2). The technology available in CSCL allows students to collaborate at their own convenience; however, the e-moderator should monitor the collaboration, as responses to discussions may appear in a disjointed way, making engagement in in-depth discussions difficult [36]. This is of importance, as the storyboard that will be designed for the implementation of the ASDF in a research course should provide guidelines on the e-tivities and the commitments required from the students to ensure that argumentation skills development is achieved. Refer to Table 1 for an example of a storyboard that maps the CSCL affordances to the needs that should be addressed and to design strategies, with examples of e-tivities that can be used.
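Purely as an illustration of how such a scaffolded path could be captured in a course configuration, a minimal Python sketch is given below. It is not part of the ASDF or the HRCOS82 P19 storyboard; the stage structure, Bloom categories chosen per stage and e-tivities are hypothetical examples only.

# Hypothetical sketch: a scaffolded learning path pairing each stage with Bloom's-taxonomy
# categories and example e-tivities. Names and activities are illustrative, not the actual
# HRCOS82 P19 storyboard.
SCAFFOLDED_PATH = [
    {"stage": 1, "bloom": ["remembering", "understanding"],
     "e_tivities": ["attend a webinar on Toulmin's model",
                    "identify the claim and grounds in a given text"]},
    {"stage": 2, "bloom": ["applying", "analysing"],
     "e_tivities": ["annotate two articles with Toulmin elements",
                    "post the annotations for peer discussion"]},
    {"stage": 3, "bloom": ["evaluating", "creating"],
     "e_tivities": ["critique a peer's argument using Toulmin terminology",
                    "present a well-formulated argument in a webinar"]},
]

def e_tivities_for(bloom_category: str) -> list:
    """Return the e-tivities of every stage that targets the given Bloom category."""
    return [t for stage in SCAFFOLDED_PATH
            if bloom_category in stage["bloom"]
            for t in stage["e_tivities"]]

print(e_tivities_for("applying"))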

5 Toulmin’s Argumentation Model

Toulmin’s argumentation model was chosen as the argumentation model to follow in this study. The model is a style of argumentation that breaks an argument into six components, namely claim, grounds, warrant, qualifier, rebuttal and backing, as seen in Fig. 3. Within this argumentation model, every argument has three fundamental parts: the claim, the grounds and the warrant. The claim is the main argument and represents the assertion that the author would like to convince the audience of or prove to them. The grounds of an argument are the evidence and facts that support the claim. The warrant, which is often not stated explicitly but should be part of the argument, comprises the assumptions that link the grounds to the claim. The backing, qualifier and rebuttal are not always present in an argument but are often required to help the author add nuance to the argument. The backing refers to any additional support of the warrant. The qualifier limits the claim to a specific context or time, or makes the reader aware that the claim may not be true in all circumstances. Finally, the rebuttal, which is either implied or stated explicitly, acknowledges other views from similar studies.


Table 1. Storyboard: CSCL affordances [22] mapped to the needs addressed, design strategies and e-tivities.

1. Establishing a joint task
   Needs addressed: Joint project is presented to the group. Instructions on how to use internal and external resources.
   Design strategies: Students are presented with a task that is outside their area of confidence.
   E-tivities: Using LMS collaboration spaces. Toulmin explained in a webinar. Searching and downloading of articles. Sharing work in the collaboration space. Assessing in the online space.

2. Communication
   Needs addressed: Group communicates using the LMS.
   Design strategies: Using the communication platform and applications available in the LMS. Timeous feedback.
   E-tivities: Using chats, webinars and threaded discussions available in the LMS. Presenting work to the group.

3. Sharing resources
   Needs addressed: Group shares resources (internal and external).
   Design strategies: Sharing of relevant links, channels and resources.
   E-tivities: Identifying and utilizing data repositories, websites, referencing tools and software. Accessing and using the online library.

4. Engaging in productive processes
   Needs addressed: Scaffolded learning journey, taking into account prerequisites, with a focus on the development of argumentation skills.
   Design strategies: Tasks are structured and students have to perform specific tasks in the group. Timeous feedback.
   E-tivities: Continuing peer assessment by applying argumentation tools.

5. Engaging in co-construction
   Needs addressed: Co-construction by providing input and feedback. Presentation of work.
   Design strategies: Keeping the shared goals and problems in context. Using elements of Toulmin to critique. Timeous feedback.
   E-tivities: Presenting research in a webinar. Peer critique by applying argumentation elements.

6. Monitoring and regulation
   Needs addressed: Evaluation of the approach.
   Design strategies: Self-evaluation, group evaluation. Data analytics.
   E-tivities: Self-evaluation by the individual student. Learning analytics by the e-moderator. Evaluating the approach within the group.

7. Finding and building groups and communities
   Needs addressed: Space provided in the LMS for students to join communities with similar interests. Create awareness of external resources.
   Design strategies: Through a scaffolded learning path, the student identifies relevant communities and uses applicable resources.
   E-tivities: Identifying relevant communities that have similar interests.

Table 2 presents a practical example illustrating the different elements in a Toulmin argument.


Table 2. Example of identifying elements of Toulmin’s argumentation model as part of the annotation of literature.

Claim: Graduate students have a problem with argumentation in research.
Grounds: Own experience. Other supervisors. Literature.
Warrant(s): Assuming that graduate students will need to use argumentation skills to present their argument in the final report.
Backing(s): Based on the last three years of research projects. Literature identified it as a problem area.
Rebuttal(s): Alternative research on addressing argumentation skills development. English literacy contributing to poor academic argumentation. E-skills are not what they should be. Students’ level of the course content not sufficient.
Qualifier/Modality: ODeL. Graduate research. Computing.

Fig. 3. Toulmin’s model of argumentation
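For readers who build CSCL tooling, the six components also lend themselves to a simple data structure for annotating literature. The following minimal Python sketch is purely illustrative; it is not part of the ASDF or of any platform described in this paper, and the class and field names are hypothetical. The example instance mirrors the content of Table 2.

# Illustrative sketch only: a minimal structure for capturing Toulmin's six components
# when annotating literature. Not part of the ASDF or any CSCL platform discussed here.
from dataclasses import dataclass, field
from typing import List

@dataclass
class ToulminArgument:
    claim: str                                          # assertion the author wants to prove
    grounds: List[str]                                  # evidence and facts supporting the claim
    warrant: str                                        # assumption linking grounds to claim
    backing: List[str] = field(default_factory=list)    # additional support for the warrant
    qualifier: str = ""                                 # limits the scope/certainty of the claim
    rebuttal: List[str] = field(default_factory=list)   # acknowledged alternative views

example = ToulminArgument(
    claim="Graduate students have a problem with argumentation in research.",
    grounds=["Own experience", "Other supervisors", "Literature"],
    warrant="Graduate students need argumentation skills to present their argument in the final report.",
    backing=["Last three years of research projects", "Literature identifies it as a problem area"],
    qualifier="ODeL; graduate research; Computing",
    rebuttal=["English literacy contributing to poor academic argumentation"],
)
print(example.claim)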

6 The Focus Groups

Ethical clearance was received, and by means of purposive and snowball sampling the researchers contacted 20 potential participants. Ten of the 20 agreed to participate in a focus group and nominated 15 more experts to contact, of which 10 accepted. In total, 19 expert university researchers with experience in postgraduate supervision and one ODeL curriculum designer formed part of the focus groups. These supervisors are from universities in South Africa and are responsible for postgraduate supervision in different subject disciplines. Although the experts varied in their years of postgraduate supervision, all participants had experience in either ODeL, distance education or blended learning. Furthermore, due to the COVID-19 pandemic, even the more traditional residential universities in South Africa relied on e-learning environments to engage with their graduate students and could relate to the online learning environment as presented in this study.

Nine focus group sessions were held via MS Teams. The number of participants varied between one and three experts per focus group. The following question guided the discussions in the focus groups: What are the key elements required in a CSCL conceptual framework that could contribute to the development of argumentation skills in a graduate research course? A summary of the research study and copies of the screens presented during the focus groups were distributed to the participants in advance. Each focus group lasted an hour. During the first 20 min, the purpose of the focus group was explained and the ASDF was presented. During the remainder of the session, the participants engaged in discussions and completed an online questionnaire. In Sect. 6.1 the findings are discussed in terms of the themes that emerged from the discussions, and in Sect. 6.2 the online questionnaires completed by the participants are discussed.

6.1 Focus Group Discussion Findings

The themes that emerged from the thematic analysis process were identified and labelled as ASDF, argumentation model, infrastructural requirements, collaboration and human capacity. The themes are discussed in the paragraphs that follow. The responses quoted from the participants are indicated in square brackets and refer to the specific focus group (for example, FG2) and the timestamp as recorded in the transcript.

The ASDF: The presentation of the ASDF was well received and included comments such as [FG2 [00:39:31]] “... this is really very comprehensive. There’s a lot of detail, but the framework is simple enough”, [FG2 [00:45:18]] “[the ASDF] is linked to different theoretical frameworks that are already existing on models that support [the ASDF] concepts” and [FG1 [00:03:58]] “...timewise in addressing the need for argumentation as this is a general concern, not only for studies but also when one needs to publish”. Concerns expressed included comments such as “...the person that will implement it will have to understand the environment” and “...buy-in is required as the framework may be difficult to implement”.

Argumentation Model: As to the theme of the use of an argumentation model to augment argumentation skills, in this instance Toulmin, the participants in the focus groups agreed that “Toulmin is an acceptable model” and [FG 3 [00:48:05]] “... it empowers them [the students] to make the difference between criticizing an argument and criticizing the person [other students]”, but warned that [FG3 [00:50:39]] “...having taught Toulmin’s to [postgraduate] students at previous university, it’s hard. It’s a very hard way of reasoning”. Although the presentation during the focus groups focused on the lack of argumentation skills and the implementation of the ASDF in a research course, it quickly became apparent from the participants that additional factors should be taken into consideration, such as language skills. As mentioned by one of the participants, [FG 1 [00:04:37]] “students need this for studies, ..., they are ultimately going to publish.... And if you can’t argue, you can’t publish. So it’s a problem ... made me wondered as to how much of the problem for some students is that they are so much battling understanding English and reading in English and writing in English that they’re ... never actually even get to the argumentation skills that they don’t have the basic language skills.” This was confirmed by [FG 7 [00:38:59]] “...the thing is people are not used to argumentation. I mean, they’re not critical even though they went through three years of an undergraduate degree”.

Infrastructural Requirements: From the discussions and the themes that emerged from the thematic analysis, it was clear that the initial presentation of the infrastructural requirements, which included the external resources, institutional resources and supervision resources, was problematic (see Fig. 1). In the revised version of the ASDF, the supervision resources were removed and grouped with the human capacity element (then representing both the supervisor and the student); this is discussed in Sect. 7. Participants further suggested that the students should receive life skills at each of the levels in the scaffolded learning: [FG 5 [00:37:02]] “... there is also skills and knowledge attached to each one of those steps, which is admin life skills”. This was further emphasised in the comment of [FG 5 [00:45:52]] “So many of these students don’t want to present. Not because they don’t think their research is good, they just don’t have the skills to present. And if you don’t figure that out, they cannot present the research”. Although not many of the participants commented on access to the external and internal resources, there were general comments on the “extended registration periods [due to the COVID-19 pandemic], students are not on the same space [some students enrolled much earlier than others]”. Suggestions to counteract this included “dividing the students into smaller groups as they register to counter the [current] problem”.

Collaboration: As to the theme of collaboration, it was observed that students can be categorised into three distinct groups, namely (1) those who do not want to work in groups, (2) the competitive student who will work in a group to gain information but is not willing to share, and (3) the student who uses the group to share and collaborate in order to grow and contribute. Another participant contributed to the three distinct groups of students and added that students should be trained on how to [FG 4 [00:31:41]] “peer-review and contribute to the rest of the group” and that “not enough is done in the development of the problem statement ... specifically when thinking of advancing to a Master’s”.

Another viewpoint the participants raised in the theme of collaboration was the discussion on sociotechnical perspectives and the social and cultural factors that come into the interactions and influence the behaviour of the students in the group, among each other and with the supervisor: [FG 3 [01:03:38]] “...it would be interesting to see in the first place, what collaborations are coming, is it only between the peers and the lecturer? Are those the only parties involved? What is the nature of those interactions?”. The researchers took note of this and will explore the factors of social and cultural interactions in future research. Further comments and discussions related to constructive learning, with comments such as “Will the learning be structured and facilitated? How to keep the students active in the learning process during the year as students are often eager to start but then wander off”, summarised by one of the participants as [FG 3 [01:03:09]] “[the researchers should] consider very carefully, the way you craft the interactions [in the collaborative space]”. As students come from different academic environments, they must be taught how to formulate and post questions in such a way that all can understand them. This was confirmed by [FG3 [00:47:23]] “...in the ODeL environment ... students don’t know each other and, it, this focusing on a specific tool helps them to understand that they need to, to engage with a person’s argument and then kind of applying that tool to [ask] ... where’s your backing?”

As one of the participants had already implemented group work among postgraduate students, note should be taken of the comments on the administrative part, specifically in terms of allowing the students to start the group and thus reducing administration on the side of the e-moderator: [FG3 [00:58:58]] “And then we got the students to contribute to it, ... [this] was simply like one big chat, what made it different was it wasn’t supervisor initiated the students actually did”.

Human Capacity - The Supervisor as an E-Moderator and the Student as a Researcher: The human capacity theme includes both the student as a researcher and the supervisor as an e-moderator. This is different from the original presentation in Fig. 1, where the supervisor was part of the infrastructural resources. From the discussions, it was clear that the ASDF does not take the capacity of the supervisor into consideration. Comments included [FG 1 [00:11:45]] “Different supervisors, different staff members have different levels of skills and have different ways of doing things”. Furthermore, the varying capacity of the supervisor to act as an e-moderator may mean training is required: [FG 6 [00:43:10]] “...there must be training for a module leader or a research person [because] we were never trained in any of this”. Adding to the human capacity theme, comments relating to the uniqueness of individual students are of importance, and more specific training relating to argumentation skills should be given in the learning path. For example, [FG 5 [00:41:00]] “...but you start with an easier one. Generic. So you give them that and they work through the process ... and then you do it on a different example and they have to do it then you can see if they understand it or not”. Of concern to one of the participants is the attrition rate of students in ODeL: [FG3 [00:52:54]] “... will [the course] be in some way structured ...[and]... facilitated ... because we started off with the number of them excited, energized, and then by the end of the year, they were very few in the discussion groups that we, that we had with them”.

General Feedback and Critical Success Factors: The critical success factors that should be taken into consideration when implementing the framework were highlighted by a participant: [FG 9 [00:48:42]] “From a supervisor perspective, but also from a student [side] ...... [there are] ... some critical success factors ... to make this framework work. So I’m wondering if some of these critical success factors for a supervisor could be something that the supervisor would need to be trained in this framework”. The participant also commented on the implementation of the framework in a large group, and that critical success factors should include the size of the group and the capacity of the supervisor: [FG 9 [00:49:16]] “Extremely large group of students, will this model still be practical and will the outcome still be successful? ... If you have five [students], then it’s easy. If you’re one supervisor and you have 20 or 30 students, then it might not be as feasible anymore. So ... I’m not sure if it’s a critical success factor or a dependability. In that view also, ... is the supervisor’s capacity”.

Furthermore, after the themes were identified, the codes that emerged under “critical success factors” were identified as collaboration, human capacity and infrastructural requirements. These critical success factors support the list of five factors, namely institutional management factors, learning environment factors, instructional design factors, support factors and course evaluation factors [39]. Though most of the participants indicated that the focus group discussions were well organised and presented, there were comments that the feedback on the ASDF is [FG 9 [00:37:34]] “theoretical” at this stage, as the ASDF has not yet been implemented and tested. The researchers take cognisance of this, and the implementation and testing of the ASDF is considered for future research.

6.2 Online Questionnaire Findings

In addition to the discussions in the focus group, the participants were asked to complete an anonymous online questionnaire, which also served as their consent to partake in the study. In the questionnaire, seven characteristics of the ASDF, relating to simplicity, comprehensiveness, generality, exactness and clarity [37], usefulness [25] and feasibility [38], were used to measure the extent to which the proposed ASDF contributes to CSCL in providing an environment that will augment the development of argumentation skills in graduate research. The questionnaire consisted of seven questions based on a five-point Likert scale. Following each of the seven questions, a space was provided in which the participants could respond in their own words. A final space was provided where participants could list any additional suggestions. An example of the online questionnaire can be found at https://forms.office.com/r/t5tmRYKWKj.

On the question relating to simplicity, 31.1% of the participants indicated that they agreed and 43.8% that they strongly agreed that the proposed conceptual framework is uncomplicated in form and design and comprehends the essence of the modelled concepts. Comments included “It is sufficiently simple enough with 7 stages - with some broken down into sub-tasks. The components and how they lead to other components is intuiti (sic)” and “I found it well explained”. However, there was a comment that indicated that it was “... not completely clear what the central focus is - should the contents of the conceptual framework itself be evaluated or is it about the act [should be evaluated]”. The last comment was made by a participant who was unsure whether the ASDF had already been implemented or whether the ASDF should be evaluated from principles. This was addressed in follow-up focus groups, ensuring that the focus should be on the evaluation of the ASDF as a guideline that can be used in the implementation of a graduate course.

On the question relating to comprehensiveness, 31.1% agreed and 62.5% strongly agreed that the proposed ASDF includes and addresses most of the requirements in CSCL that can be used to enhance argumentation skills in graduate research. Comments included “...the framework is (very) comprehensive, but it may need to accommodate social and cultural differences and affordances, on the part of both lecturers/supervisors and students”. Comments on human capacity critical success factors from the supervision point of view included governance from the university on supervisory capacity and different supervisory styles. Comments on group size included “The smaller the group size the easier the interaction and assessment and feedback is”, and various comments referred to the problem of English first language and other language barriers, as these could impact on the successful outcome of argumentation skills. There was also mention of alignment with existing frameworks and guidelines for graduateness.

On the question relating to generality, 56.3% strongly agreed and 37.5% agreed that the proposed ASDF could be implemented in similar scenarios in CSCL environments that could augment argumentation skills for graduate students in research. In the comments section, the participants generally commented that it could be implemented in most graduate and postgraduate courses and mentioned that “... the discussion groups are a great idea. I advocate certain discussions that have minimal facilitator-intervention”.

On the question relating to exactness, 43.8% strongly agreed and 37.5% agreed that the proposed ASDF is as far as possible accurate and addresses the perceived requirements for a CSCL environment for the augmenting of argumentation skills in graduate research. The accuracy of the framework, in terms of the success rate of the students’ final outcomes, falls outside the scope of this study. This is further emphasised in the comment “The framework does appear to be rigorous in addressing the requirements of CSCL and argumentation at a graduate level. But this will only be clear when it is implemented and evaluated!”

capacity critical success factors from the supervision point of view included governance from the university on supervisory capacity and different supervisory styles. Comments on group size included “The smaller the group size the easier the interaction and assessment and feedback is” and various comments referred to the problem of English first language and other language barriers as it could impact on the successful outcome of argumentation skills. There was also mention to alignment with existing frameworks and guidelines for graduateness. On the question relating to generality, 56.3% strongly agreed and 37.5% agreed that the proposed ASDF could be implemented in similar scenarios in CSCL environments that could augment argumentation skills for graduate students in research. In the comments section, the participants in general commented that it could be implemented in most graduate and postgraduate courses and mentioned that “... the discussion groups are a great idea. I advocate certain discussions that have minimal facilitator-intervention”. Of the question relating to exactness, 43.8% strongly agreed and 37.5% agreed that the proposed ASDF is as far as possible accurate and addresses the perceived requirements for a CSCL environment for the augmenting of argumentation skills in graduate research. The accurateness of the framework, in terms of the success rate of the student’s final outcomes, falls outside the scope of this study. This is further emphasised in the comment “The framework does appear to be rigorous in addressing the requirements of CSCL and argumentation at a graduate level. But this will only be clear when it is implemented and evaluated!”

Although 50% strongly agreed and 37.5% agreed on the question relating to clarity, the comments from the participants were more diverse. Comments included that although the flow is evident and correct, it was not clear what the purpose of the course represented in the ASDF was, as reflected by one of the participants: “Thought the subject matter was argumentation; did not gather that it was topic of own choice in which they APPLIED argumentation”. This comment was addressed in the follow-up focus groups and is discussed in detail in the section on pedagogical approaches for ASD. On the question concerning usefulness, 68.8% of the participants strongly agreed and 25% agreed that the proposed conceptual framework is applicable in providing an environment that will augment the development of argumentation skills for graduate research.


On the question about feasibility, 62.5% of the participants indicated that they strongly agreed and 18.8% agreed that the proposed conceptual framework is feasible in providing a CSCL environment that will augment the development of argumentation skills for graduate research. Comments included concerns about complexity, such as “The model may be too complex to comprehend in one go”, and human capacity critical success factors that may impact the implementation of the ASDF. In the additional comments and feedback section, the participants agreed that the ASDF is well designed and will be of use and “...that it will enhance the student argumentation”. From the comments, it was also noted that ‘measuring’ the efficiency of the framework will be difficult. The participants recommended that the process be recorded “from beginning to end in an LMS or tool such as WA [sic: WhatsApp] the qualitative data will be automatically recorded and can be used to show how the arguing skills of students improved - whether they are top students or those who struggle. The idea is to improve this skill as I understand it”. Other feedback included a broader approach to argumentation skills development, including the hermeneutical circle, and benchmarking the ASDF against the ACM and AIS Computing/IS curricula. Valuable links to academic articles and books were shared.

7 Revised ASDF

From the thematic analysis, the researchers identified that the human capacity code should encompass the student as the researcher and the lecturer as the e-moderator. Refer to Fig. 4, where the supervision resources element is removed from the infrastructural requirements element and presented as a separate node. The human capacity element then consists of the student as researcher and the e-moderator. The key elements of the revised ASDF thus include the course requirements, which determine the requirements of the human capacity (consisting of both the student and the e-moderator), the infrastructural requirements and the pedagogical approaches used in ODeL. The course requirements, pedagogical approaches, human capacity and infrastructural requirements are applied in the ODeL technology infrastructure. Evaluation of the approach is done through learning analytics and evaluation. For the development of argumentation skills, the scaffolded learning approach within the CSCL environment is provided by the LMS. The assessment of the output - in this study, the presentation of a well-formulated argument - is conducted through the technology provided by the LMS.


Fig. 4. Revised argumentation skills development framework (ASDF)

8 Conclusion

The development and presentation of a well-formulated research argument is core to the learning journey of a graduate student. The use of CSCL in ODeL plays an important role in providing a platform for graduate students to engage in academic discourse that supports the development of their argumentation skills. It was highlighted in the literature that there is a need for a framework using CSCL that will contribute to the development of argumentation skills in graduate studies.

From the online discussions, it was clear that the ASDF did not sufficiently focus on the human capacity of both the student as a researcher and the e-moderator. In the revised ASDF (Fig. 4), this was addressed by moving the e-moderator (as supervisor) from the infrastructural requirements to its own space. The findings confirm that there is a need for a framework that can be implemented in a graduate course to augment the development of argumentation skills. Furthermore, collaboration among students is important to foster their sense of working together to reach a higher goal, in this instance the development of a well-formulated argument. The participants in the focus groups provided valuable insights into the ASDF.


Furthermore, the themes that emerged from the discussions confirm the key elements required in a CSCL conceptual framework, and that the conceptual framework can be used as a guideline when developing a research course with argumentation skills development embedded. From the feedback relating to the element of human capacity, with the student as a researcher and the e-moderator, the researchers realised that more research should be done to measure the social, cognitive and teaching presence of the learning experience. The theme relating to the use of the argumentation model, with specific reference to Toulmin, was widely discussed. Although there were suggestions for other models, the participants all agreed that Toulmin is a good and well-researched model to implement. As to the theme relating to collaboration, the participants agreed that the scaffolded pathway and collaboration are to the advantage of the students’ research development. The mapping of Bloom’s taxonomy and the SAQA CCFOs in the learning path was commended, although some participants mentioned that some students may have to go back a step or two before advancing to the next level.

The researchers acknowledge that there are some limitations to this study, in that the ASDF is developed for incorporation into graduate courses in ODeL. Furthermore, the study included a relatively small number of participants in the various focus groups. To conclude, the researchers identified topics for further research, which include research into the element of human capacity, with a specific focus on the critical success factors that may influence the success of the ASDF. Measuring the educational experience from the students’ perspective in terms of social, cognitive and teaching presence has been identified as an area for further studies, as well as research into determining whether the arguments presented by the students who were part of this graduate course improved their final project and final results. The learning analytics concerning the experience of the elements of the community of practice, namely practice, domain and community, from the student’s perspective falls outside the scope of this study and is considered for future research. Reflecting on the use of MS Teams as a platform for conducting focus group sessions, the researchers propose a need to identify the strengths and weaknesses of using virtual platforms in a comparative research study.

Acknowledgement. This paper is based on the research supported by the South African Research Chairs Initiative of the Department of Science and Technology and National Research Foundation of South Africa (Grant No. 119606).

References

1. Rapanta, C., Walton, D.: The use of argument maps as an assessment tool in higher education. Int. J. Educ. Res. 79, 211–221 (2016). https://doi.org/10.1016/j.ijer.2016.03.002


2. Andrews, R.: Argumentation in Higher Education: Improving Practice Through Theory and Research, 1st edn. Routledge, Milton Park (2009). https://doi.org/10.4324/9780203872710
3. Rapanta, C., Macagno, F.: Evaluation and promotion of argumentative reasoning among university students: the case of academic writing. Revista Lusófona de Educação 45, 125–142 (2019). https://doi.org/10.24140/issn.1645-7250.rle45.09
4. Manyike, T.V.: Postgraduate supervision at an open distance e-learning institution in South Africa. S. Afr. J. Educ. 37(2), 1–11 (2017). https://doi.org/10.15700/saje.v37n2a1354
5. Laeeq, K., Memon, Z.A.: Strengthening virtual learning environments by incorporating modern technologies. In: Arai, K., Bhatia, R., Kapoor, S. (eds.) CompCom 2019. AISC, vol. 998, pp. 994–1008. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-22868-2_67
6. Pollard, R., Kumar, S.: Mentoring graduate students online: strategies and challenges. Int. Rev. Res. Open Distance Learn. 22(2), 267–284 (2021). https://doi.org/10.19173/irrodl.v22i2.5093
7. Vasquez-Colina, M.D., Maslin-Ostrowski, P., Baba, S.: Tapping into graduate students’ collaborative technology experience in a research methods class: insights on teaching research methods in a Malaysian and American setting. Int. J. Teach. Learn. High. Educ. 29(2), 281–292 (2017). https://eric.ed.gov/?id=EJ1146141
8. Fatimah, F., Rajiani, S.I., Abbas, E.W.: Cultural and individual characteristics in adopting computer-supported collaborative learning during Covid-19 outbreak: willingness or obligatory to accept technology? Manage. Sci. Lett. 11, 373–378 (2021). https://doi.org/10.5267/j.msl.2020.9.032
9. van Biljon, J., Pilkington, C., van der Merwe, R.: Cohort supervision: towards a sustainable model for distance learning. In: Tait, B., Kroeze, J., Gruner, S. (eds.) SACLA 2019. CCIS, vol. 1136, pp. 147–162. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-35629-3_10
10. Letseka, M.: Stimulating ODL research at UNISA: exploring the role and potential impact of the UNESCO chair. Open Learn. 36(2), 133–148 (2021). https://doi.org/10.1080/02680513.2020.1724780
11. Mochizuki, T., Chinn, C.A., Zimmerman, R.M.: Grasping evidence with EDDiE: a CSCL tool to support collaborative reasoning about disagreements in multiple documents. In: Proceedings of the 14th International Conference on Computer-Supported Collaborative Learning - CSCL 2021, pp. 271–272 (2021). https://repository.isls.org//handle/1/7339
12. Hmelo-Silver, C.E., Jeong, H.: Benefits and challenges of interdisciplinarity in CSCL research: a view from the literature. Front. Psychol. 11, 1–11 (2021). https://doi.org/10.3389/fpsyg.2020.579986
13. Järvelä, S., Rosé, C.P.: Advocating for group interaction in the age of COVID-19. Int. J. Comput. Support. Collaborative Learn. 15(2), 143–147 (2020). https://doi.org/10.1007/s11412-020-09324-4
14. Wenger, E.: Communities of Practice: Learning, Meaning, and Identity. In: Brown, J.S., et al. (eds.). Cambridge University Press, Cambridge (1999)
15. Toulmin, S.E.: The Uses of Argument. Cambridge University Press, Cambridge (2003)
16. Tsai, P.S., Tsai, C.C.: College students’ skills of online argumentation: the role of scaffolding and their conceptions. Internet High. Educ. 21, 1–8 (2014). https://doi.org/10.1016/j.iheduc.2013.10.005


17. Wenger-Trayner, E., Wenger-Trayner, B.: Communities of practice: a brief introduction. In: STEP Leadership Workshop, University of Oregon, National Science Foundation (US) (2011). http://hdl.handle.net/1794/11736
18. Andrew, M., Arnold, J.: Collaboration, community, identity: engaged e-learning and e-teaching in an online writing course. In: The Australasian Society for Computers in Learning in Tertiary Education, pp. 106–117 (2011)
19. Maor, D., Currie, J.K.: The use of technology in postgraduate supervision pedagogy in two Australian universities. Int. J. Educ. Technol. High. Educ. 14(1), 1–15 (2017). https://doi.org/10.1186/s41239-017-0046-1
20. Zanjani, N., Edwards, S.L., Nykvist, S., Geva, S.: The important elements of LMS design that affect user engagement with e-learning tools within LMSs in the higher education sector. Australas. J. Educ. Technol. 33(1), 19–31 (2017). https://doi.org/10.14742/ajet.2938
21. Blenker, P., Dreisler, P., Faergemann, H.M., Kjeldsen, J.: A framework for developing entrepreneurship education in a university context. Int. J. Entrepreneurship Educ. Univ. Context 5(1), 45–63
22. Jeong, H., Hmelo-Silver, C.E.: Seven affordances of computer-supported collaborative learning: how to support collaborative learning? How can technologies help? Educ. Psychol. 51(2), 247–265 (2016). https://doi.org/10.1080/00461520.2016.1158654
23. Salmon, G.: E-tivities: The Key to Active Online Learning. On the Horizon, 2nd edn. Kogan, London (2003). https://doi.org/10.4324/9780203074640
24. Salmon, G.: e-Moderating: The Key to Teaching and Learning Online, 3rd edn. Taylor & Francis Group/Routledge Falmer, London and New York (2011)
25. Li, K.M.: Learning styles and perceptions of student teachers of computer-supported collaborative learning strategy using Wikis. Australas. J. Educ. Technol. 31(1), 32–50 (2015). https://doi.org/10.14742/ajet.521
26. Oh, E.G., Kim, H.S.: Understanding cognitive engagement in online discussion: use of a scaffolded, audio-based argumentation activity. Int. Rev. Res. Open Distance Learn. 17(5), 28–48 (2016). https://doi.org/10.19173/irrodl.v17i5.2456
27. Ali, W.: Online and remote learning in higher education institutes: a necessity in light of COVID-19 pandemic. High. Educ. Stud. 10(3), 16–25 (2020). https://doi.org/10.5539/hes.v10n3p16
28. Manca, S.: Snapping, pinning, liking or texting: investigating social media in higher education beyond Facebook. Internet High. Educ. 44, 1–13 (2020). https://doi.org/10.1016/j.iheduc.2019.100707
29. Jung, I., Lee, J.: A cross-cultural approach to the adoption of open educational resources in higher education. Br. J. Educ. Technol. 51(1), 263–280 (2020). https://doi.org/10.1111/bjet.12820
30. Van der Merwe, R., van Biljon, J.: Trends, drivers and barriers influencing cloud computing services for mobile interactions in teaching and learning. In: Proceedings of the 2nd Conference on Information Communications Technology and Society (ICTAS), pp. 57–62 (2018). https://uir.unisa.ac.za/handle/10500/23686
31. Eneje, S.: Real-world applications of mobile learning tools in engineering: prospects, hindrances and accessibility in conjunction with scholastic views. In: 2020 IEEE Canadian Conference on Electrical and Computer Engineering, pp. 1–8 (2020). https://doi.org/10.1109/CCECE47787.2020.9255769
32. Kotze, D.A.: Theoretical framework for open distance learning: a South African case study. Independent J. Teach. Learn. 16(1), 10–23 (2021)


33. Hirata, H., Okada, S., Nitta, K.: Analysis of argumentation skills for argumentation training support. In: Arai, K., Bhatia, R., Kapoor, S. (eds.) Intelligent Computing. CompCom 2019. Advances in Intelligent Systems and Computing, vol. 997. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-22871-2_23
34. Lee, C.I., Yang, Y.F., Mai, S.Y.: The impact of a scaffolded assessment intervention on students’ academic achievement in web-based peer assessment activities. Int. J. Distance Educ. Technol. 14(4), 41–54 (2016). https://doi.org/10.4018/IJDET.2016100104
35. Anderson, L.W., Krathwohl, D.R.: A Taxonomy for Learning, Teaching and Assessing: A Revision of Bloom’s Taxonomy of Educational Objectives, 1st edn. Longman/Pearson, London (2001)
36. Altebarmakian, M., Alterman, R.: Cohesion in online environments. Int. J. Comput. Support. Collaborative Learn. 14(4), 443–465 (2019). https://doi.org/10.1007/s11412-019-09309-y
37. Oliver, M.S.: Information Technology Research: A Practical Guide for Computer Science and Informatics, 3rd edn. Van Schaik Publishers, Pretoria, South Africa (2013)
38. Jung, I., Sasaki, T., Latchem, C.: A framework for assessing fitness for purpose in open educational resources. Int. J. Educ. Technol. High. Educ. 13(1), 1–11 (2016). https://doi.org/10.1186/s41239-016-0002-5
39. Cheawjindakarn, B., Suwannatthachote, P., Theeraroungchaisri, A.: Critical success factors for online distance learning in higher education: a review of the literature. Creative Educ. 03(08), 61–66 (2012). https://doi.org/10.4236/ce.2012.38b014

Single Access for the Use of Information Technology in University Education During the SARS-CoV-2 Pandemic

José L. Cendejas Valdez1(B), Heberto Ferreira Medina2,3, María E. Benítez Ramírez1, Gustavo A. Vanegas Contreras1, Miguel A. Acuña López1, and Jesús L. Soto Sumuano4

1 Departamento de TI, Universidad Tecnológica de Morelia, Cuerpo academico TRATEC - PRODEP, 58200 Morelia, Michoacán, Mexico
[email protected]
2 Instituto de Investigaciones en Ecosistemas y Sustentabilidad-UNAM, 58190 Morelia, Michoacán, Mexico
3 Departamento de Sistemas y Computación, Tecnológico Nacional de México, 58117 Morelia, Michoacán, Mexico
4 Departamento de TI, CUCEA - Universidad de Guadalajara, Cuerpo académico de TI - PRODEP, 45130 Zapopan, Jalisco, Mexico

Abstract. The proper use of information technology (IT) in universities during the Covid-19 pandemic has been of vital importance, so it is necessary to generate a methodological-technological proposal that serves as a guide to meet the needs of educational programs and of the users (students, teachers, and administrative staff) who work from home. The present research presents an exploratory, descriptive, correlational, and cross-sectional study composed of (1) review of the literature, (2) application of a survey, (3) study of information reliability (Cronbach’s alpha), (4) study of correlations (Pearson’s bivariate) and (5) a proposed solution. An analysis of different universities in Mexico was generated, identifying how they are attending to their educational processes and the technological tools they use. Based on the above, a methodological-technological proposal based on resilience is presented. It provides access to information and the digitization of the educational processes of any university, which helps users adapt to the new hybrid environment, achieving an acceptable level of digital maturity at all levels. In addition, it provides a set of technological options from which the user can select and implement the most appropriate based on the needs of their academic processes, integrating academics, students, administrators, and those responsible for the computer centers through a single digital platform and with a single username and password.

Keywords: IT · Universities · Digital maturity · Resilience

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 K. Arai (Ed.): SAI 2022, LNNS 508, pp. 279–293, 2022. https://doi.org/10.1007/978-3-031-10467-1_17


1 Introduction

The outbreak of the coronavirus (2019-nCoV), known worldwide as Covid-19, has spread to practically the entire world. Following the report generated by the Municipal Health Commission of Wuhan, China, on December 31, 2019, the World Health Organization (WHO) announced on January 26, 2020 a high-risk epidemic in China and worldwide. The world has reacted to COVID-19 quickly and responsibly; government social distancing measures have helped slow the spread of the virus. However, this modified the lifestyle of all people, mainly in educational institutions of all levels, especially in universities and research centers commonly known as Higher Education Institutions (HEI). Technology gives strength to each process of organizations that are on the path of digital transformation and that can adapt to the new conditions; likewise, those that are slower to adopt digitization and automation in their processes will have to work in an accelerated manner to stay present for the duration of the pandemic. IT has become a necessary tool that is part of daily life and has transformed society to such a degree that it is difficult to live without it. Therefore, at this time, access to IT must be efficient, so that it satisfies the need for the automation of user information and, above all, allows technological aspects to be attended to, with the aim of improving administrative processes and, most importantly, providing quality services that contribute to the fulfillment of the teaching-learning process of universities.

IT is a crucial part of the teaching-learning process of universities, so it is essential to know the tools used to meet and achieve the objectives set, in addition to determining which are the most efficient for students, teachers, administrative staff and computer center managers, to help them trigger their creativity and skills to the fullest. For these reasons, this study investigates the main IT being used to attend to and comply with the teaching-learning process and the administrative processes. For this reason, a methodological-technological proposal was developed in which the levels and technologies that must be implemented or improved in a gradual way are determined, to serve the users of the universities, accompany them in the fulfillment of their daily activities and thus reach a level of digital maturity. The use of IT in higher education during the Covid-19 pandemic has made it possible to generate a methodological-technological proposal that helps Higher Education Institutions (HEIs) be more efficient in meeting educational programs and the needs of students, teachers, and administrators at a distance. Therefore, the presented methodological-technological proposal will give access to information and the digitization of processes in universities, helping them to progressively improve, adapt to the environment, and generate an acceptable level of digital maturity for all users (teachers, students, and administrators) of the HEIs.

This document contains four main parts: (I) introduction; (II) literature review, in which a theoretical investigation is presented of the main issues that contextualize IT during this pandemic and how it helps university processes; (III) methodology, which allowed the compilation of the information that supports the proposal, the correlations and the grouping of the data; and (IV) results, where the methodological proposal and the level of digital maturity are shown through the development of an application.
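As a purely illustrative sketch of steps (3) and (4) of the study design mentioned above - not the authors' actual analysis, data or tooling - the reliability (Cronbach's alpha) and a Pearson bivariate correlation could be computed from Likert-scale survey responses as follows; the toy data are invented:

# Illustrative sketch only: Cronbach's alpha and a Pearson correlation on invented
# Likert-scale data; this is not the study's data or the authors' analysis script.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: a respondents-by-items matrix of Likert scores."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(0)
survey = rng.integers(1, 6, size=(30, 5))          # 30 respondents, 5 Likert items (toy data)
alpha = cronbach_alpha(survey.astype(float))
r = np.corrcoef(survey[:, 0], survey[:, 1])[0, 1]  # Pearson r between two items
print(f"Cronbach's alpha = {alpha:.2f}, Pearson r = {r:.2f}")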


2 Literature Review

2.1 Information Technology Context

IT has become the engine and catalyst for organizations around the world, so workers must have an academic profile that is supported through IT. According to AMITI [1], which presented a report on the current state of IT in Mexico, the best jobs by income in Mexico related to the technology area were: (1) database administrator (DBA), (2) distribution services manager, (3) solution architect, (4) systems engineer (DevOps), (5) software engineer, (6) solution developer, (7) network engineer, (8) developer, (9) Java developer and (10) support engineer. According to [1], the following data primarily stand out:

• There are 75 million Internet users (mu), 83 mu of cell phones, 51 mu of computers, 69 mu of smartphones, 16 mu of homes with Internet, 17 mu with pay TV, and 31 mu with a digital signal at home.
• There is a clear disadvantage in the use of IT in the rural population; only 19% have a computer, 41% were Internet users, 19% have Internet access at home, 57% use a cell phone, and 38% use smartphones.
• The 10 states with the greatest advance in these indicators are: (1) Mexico City, (2) Baja California, (3) Sonora, (4) State of Mexico, (5) Nuevo León, (6) Chihuahua, (7) Quintana Roo, (8) Baja California Sur, (9) Querétaro and (10) Aguascalientes.

According to the report in [2], entrepreneurship in IT-oriented companies in Mexico focuses on:

• Innovation. This factor has little research and development (R&D); it is recognized that in Mexico 30% comes from the private sector, while in other countries up to 80% is reported. The lack of funding mechanisms for research, patents, and research centers is an example.
• Environment to do business. In 2016, Mexico ranked 65th in the Doing Business ranking (best place to do IT business); by 2020 it ranked 60th.
• Support infrastructure. The incubator project and the Prosoft program in 2006 were a great advance; however, by 2020 these projects were closed, and the incubators supporting the sector have practically closed. There is no government support.
• Human capital. In 2016 it was recognized that 56% of IT companies had difficulty finding trained personnel. Although progress in the areas of training has been significant, by 2020 45% of recruitment companies recognized that there is still a lack of personnel trained in IT.
• Financing. In 2016, 53% of companies recognized facilities to access financing; for the period from 2015 to 2020, [3] recognized a growth of 13.7% in financing to companies in the sector.
• Demand profile. In this context, IT companies in Mexico are doing well, since the demand for IT services grew by 50% up to 2020. Further important growth is estimated with the T-MEC.

• Integration with the IT industry. Although IT clusters were promoted and financing was provided through PROSOFT, integration with industry was only partially successful; by 2020 many companies reported difficulties in integrating or using IT services because they are outdated.
• Entrepreneurial culture. Mexico lags in this area: 58% of new businesses were started by IT entrepreneurs, compared with 80% in Brazil.
• Government strategy. Until 2019, 73% recognized that there were government programs to encourage the sector; however, the lack of support and government follow-up has been a barrier.

Therefore, it is necessary to train university workers and students to encourage the use of IT and achieve an optimal degree of digital maturity.

2.2 Digital Maturity

Today digital maturity is a vital element for any organization, which requires virtual technologies and platforms that help achieve the processes and objectives set. This requires not only technology but also trained IT human resources, enabling daily activities to be carried out through technology. The author in [4] describes digital maturity as the set of practices for developing organizational strategies using digital assets, allowing the creation of added value that differentiates the company from others. A study carried out by the consulting firm Capgemini and the Massachusetts Institute of Technology (MIT) [5] mentions that companies that carry out a digital transformation benefit particularly in financial performance. The authors of [6] show that it is relevant for organizations to undergo a digital transformation and consider this process slow but with constant progress. Digital maturity is not related to the technology investment the company requires, but rather to the company's management capacity oriented toward business intelligence (BI), the latter being understood as a combination of processes, policies, culture, and technologies to gather, manipulate, store, and analyze data collected from internal and external sources, in order to communicate information, generate knowledge, and support decision-making.

An organization that seeks digitization goes through different levels to achieve a degree of digital maturity. The difference between the levels is not linear but exponential, since a company with a digital approach is more efficient than a traditional one.


For this reason, native startups with a digital focus are able to compete even though they have fewer resources than traditional companies that are well positioned in the market. According to [7], there are four levels of maturity, which are described in Table 1.

Table 1. Levels of digital maturity in the organization.

Level of maturity: Beginner
Beginning companies in the search for digitization have developed their digital channels but have not exploited the potential of the web, and do not offer all their products and services through it. They also do not usually have mobile applications and, if they do, they are hardly useful to users in terms of the services they provide. Digital initiatives are beginning to be considered due to market reaction and competition, but business intelligence is still light-years away. Management is still reluctant to change and is characterized by its immaturity in digital culture and skepticism about the value that this process brings. These companies, regardless of the sector, are running a significant risk by not addressing digitization.

Level of maturity: Intermediate
Companies at this level are beginning to listen to customer needs, but are still far from having a customer-centric approach. Their digital channels are half developed and the experience is not uniform across them, with the mobile channel being the great victim. Although they are taking steps toward being customer-obsessed, they are still far from being able to personalize their business and integrate data-driven intelligence. These companies have already identified the need to transform, and a digital culture is often present in small groups or departments. However, they need a strategic plan (not project by project) that allows them to achieve objectives and extend that digital culture to the entire organization.

Level of maturity: Advanced
Companies with an advanced level of digitization have gone through many transformative initiatives, allowing them to build a digital culture and organization. All digital channels are fully developed, and companies at this level offer their products and services from all channels without problems; the omnichannel experience is complete. Besides, the user experience is complemented with some customization, and they have advanced data analysis, which allows segmentation beyond the socio-demographic one. To achieve the highest level, they have to put the customer at the center of their strategy, unite technology and business around this goal, and exploit the use of their data on a massive scale.

Level of maturity: Expert
These companies have mastered the digital transformation, not only in the current state, but are prepared to follow their customers and meet their needs. For these companies, their customers are special, and therefore their business has a customer-centric and omnichannel approach from the design stage. Moreover, they can anticipate the needs of customers to exceed their expectations. There is no need for synchronization or integration between channels; they share data in a centralized, unified, and customer-focused point. Besides, they use real-time intelligence for all company operations. They have an empirically adopted strategic plan with digital initiatives that are monitored based on Key Performance Indicators (KPIs) to measure both the results and the value generated. Technology and business areas go digital and work together in an integrated organization. They are responsible for changing the rules of the game; their organization and culture are digital and agile.

2.3 Digital Maturity

The appearance and growing importance of knowledge as a factor of production during this pandemic make the use and development of technologies, methodologies, innovation, and strategies for its measurement, creation, and dissemination a priority for organizations and universities. The authors of [8] consider that this development has made knowledge an indispensable element for economic and social development. From knowledge management arises the concept of business intelligence (BI). According to [9], this is the name given to the set of strategies, actions, and tools focused on the administration and creation of knowledge through the analysis of existing data in an organization or company. The author in [10] mentions that, in knowledge management, these strategies allow an intelligent company to follow a set of actions that give it an advantage over its competitors, mainly because the value added to services or products resulting from these actions makes their production and operation efficient in a way that can hardly be replicated by those that lack such defined processes or strategies.

2.4 Digital Disruption and Digital Platforms for Education

Currently, HEIs are immersed in the so-called digital era or society, which has caused a greater dependence on technology. IT innovates on a day-to-day basis; technological developments in different areas such as artificial intelligence, virtual and augmented reality, the Internet of Things, smart devices, and big data are examples of innovations generated by technological and scientific advancement. This is classified as digital disruption. According to [11], an innovation is disruptive when a product or service is born and becomes a leader, replacing a previous technology.


Never have so many disruptive innovations occurred in such a short time as now, in the digital society and during this pandemic. Digital disruption is a change that breaks with the existing model; what was previously the leader gives way to these new proposals, eventually changing certain ways of life and professional development [12]. An educational digital platform can be defined as a wide range of disruptive computer applications installed on a server whose function is to facilitate the creation, administration, management, and distribution of courses through the internet for teachers [13, 14]. Table 2 shows a descriptive summary of the e-learning platforms most used by educational institutions during the current pandemic.

Table 2. Descriptive summary of disruptive platforms during the SARS-CoV-2 pandemic.

Moodle: Learning platform designed to provide educators, administrators, and students with a single, robust, and secure integrated system for creating personalized learning environments [15]. Type of license: open source.

Edmodo: Communication and collaboration platform with the capabilities of a Learning Management System (LMS). It can be used anywhere learning occurs, whether in person, online, or a combination of both [16]. Type of license: free platform.

Blackboard: Virtual learning environment and learning management system developed by Blackboard Inc. [17]. Type of license: LMS - SaaS (Software as a Service).

Chamilo: Open-source virtual campus distributed under the GNU/GPLv3 license, which any person, institution, or company can freely use for the delivery of training actions over the internet [18]. Type of license: open source.

Evol Campus: LMS platform that simplifies online training, hosted in the cloud and available anytime, anywhere. One of its main characteristics is that it is scalable. Type of license: LMS - SaaS (Software as a Service).

Canvas: Learning Management System (LMS). In terms of adaptation, Canvas is open-source software that allows the platform to be customized according to the specific needs of the institution or users. It is usually contracted under the SaaS model. Type of license: open source; LMS - SaaS (Software as a Service).


The platforms listed above are not in order of priority, since the choice of one or another e-learning platform depends on the specific and concrete needs of the institution in its organizational, pedagogical, technological, and economic dimensions, as described by [19]. The purpose of this study is to evaluate the technological level of Mexican HEIs and the behavior of university users regarding the technology implemented during the pandemic. The research presents a methodological-technological proposal based on the following steps: (i) survey, (ii) data analysis, (iii) results, and (iv) discussion and conclusions.

3 Methodology

Having identified the problem facing the universities and the main IT they are using to meet the needs of their professors, students, and administrators, the scope of the research is a study of the following type: (1) exploratory, (2) cross-sectional, (3) descriptive, and (4) correlational. The steps followed in this research are shown in Fig. 1.

Fig. 1. Methodological model with the stages of the investigation.

To determine the population, a geographical search was made and the higher-education schools of the state of Michoacán, Mexico, were selected from the National Information System of Higher Schools of the Ministry of Public Education [20]. Thirty HEIs offering bachelor's, master's, and doctoral programs were selected. To obtain the sample size, the finite-population method was applied, as shown in Eq. (1):

n = (z^2 × p × q × N) / (e^2 × (N − 1) + z^2 × p × q)    (1)


where:
p = % of times a phenomenon is assumed to occur in the population,
q = the non-occurrence of the phenomenon (1 − p),
e = the maximum error allowed for the sample,
N = population size,
z = % of desired reliability for the sample.

The sample was determined with a confidence level of 95% and a margin of error of 5%, resulting in 23 universities surveyed. The substitution data are shown in Table 3.

Table 3. Data substitution.

Reliability level: 95%
Z = 1.96
p = 50%
q = 50%
e = 5%
N = 30
n = 28
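As a quick check of Eq. (1), the sample size can be computed directly; the following minimal Python sketch (not part of the original study) reproduces the calculation and, with the values in Table 3, yields n = 28.

```python
import math

def finite_population_sample_size(N, z=1.96, p=0.5, e=0.05):
    """Sample size for a finite population, Eq. (1)."""
    q = 1 - p
    n = (z**2 * p * q * N) / (e**2 * (N - 1) + z**2 * p * q)
    return math.ceil(n)

# Values from Table 3: N = 30 HEIs, 95% confidence (z = 1.96), p = q = 50%, e = 5%
print(finite_population_sample_size(30))   # -> 28
```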

The survey was divided into four sections: (1) respondent data, (2) services and technological tools, (3) digital maturity, and (4) user satisfaction. The survey consisted of a total of twenty-eight items using a Likert scale. According to [21], this type of scale "represents a valuable alternative for data collection in quantitative research that seeks to obtain information on the predispositions, attitudes, evaluations, and opinions that a population has on a particular issue". The survey items were answered mostly by the heads of the administration departments, academics, and heads of the IT departments of the selected universities. The reliability study was carried out through Cronbach's alpha, obtaining a reliability of 0.816. Subsequently, the correlation study was carried out; the highest correlation was 0.770, between item P20, which states that "Digital maturity is an opportunity that attracts new students to universities, either through improved classrooms or online learning experiences", and item P21, which asks whether "digitally mature organizations are considered competitive". There is clearly a relationship between these two variables, where the main issue is the level of digital maturity of the universities. This will make it possible to attract students who are interested in the use of technology and, at the same time, to be a competitive institution at the forefront in the use of new platforms, as shown in Table 4.

Table 4. Results of the correlation study.

        P8      P9      P10     P15     P19     P20     P22
P2     −.090    .297    .662    .180    .443    .117   −.188
P7      .024    .696    .420    .154    .060    .192    .238
P20     .375    .079    .194    .664    .171   1.000    .536
P21     .295    .140    .194    .664    .067    .770    .302
P22     .251    .119    .043    .267   −.005    .536   1.000
P23     .155    .125    .292    .632   −.309    .563    .426
P24    −.367    .227    .428    .090   −.153   −.010   −.329
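The reliability and correlation figures reported above follow from standard formulas; the sketch below is illustrative only, assuming a hypothetical file with one Likert-coded column per item (P1 ... P28), and computes Cronbach's alpha and the Pearson correlation matrix.

```python
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of the total score)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical export with one Likert-coded column per survey item (P1 ... P28)
responses = pd.read_csv("survey_responses.csv")

alpha = cronbach_alpha(responses)           # the paper reports 0.816
corr = responses.corr(method="pearson")     # e.g. corr.loc["P20", "P21"] should be about 0.770
print(round(alpha, 3))
print(corr.round(3))
```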

4 Results

The methodological-technological proposal for digital maturity includes the services and technologies that educational institutions have used to meet the management and control needs of remote school activities during the pandemic. The analysis of data on the digital services required to continue academic activities during the health emergency provided feedback on the current state of digital maturity of the universities and serves as a basis for a proposal that seeks to bring institutions and users to an expert level of digital maturity. The services required to guarantee the quality of distance education are diverse, and one of the main drawbacks is the lack of integration of all of them through a single tool and a single account for access; currently, each service requires a different application, which forces students and teachers to manage multiple user accounts and information from different sources at all times.

The proposal contains three levels, which are shown in Fig. 2 and described below:

(a) Level 1: Hyper-convergence. At this level, the digital services needed by the universities are integrated through the combination of computing resources such as physical and virtualized servers, computer networks, security policies, and storage services.
(b) Level 2: Application. At this level the different platforms are administered, covering educational and administrative processes as well as technical support.
(c) Level 3: Digital services for the user. This level allows interaction with students, teachers, area managers, and users in general who have a direct or indirect relationship with the university.

Figures 3 and 4 show the process to be followed through the interfaces of a mobile app that integrates the different digital services used in distance education.


Fig. 2. Methodological-technological proposal of single access to reach the digital maturity of the users and the organization.

Fig. 3. Educational service of the hyperconvergence system.


Fig. 4. Notifications through the hyperconvergence system.

The software that accompanies the digital maturity proposal aims to collect information from the digital management and control services for (1) academic activities, (2) administrative processes and school control, and (3) the learning platform, using a comprehensive hyper-converged communication application that guarantees quick, simple, and efficient access to academic monitoring information, as shown in Fig. 5.


Fig. 5. Report card through the hyperconvergence system.
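As an illustration of how the three levels could be represented behind one user account, the following sketch uses hypothetical class, service, and endpoint names; it is not the authors' implementation, only one possible way to model a single-access service registry.

```python
from dataclasses import dataclass, field

@dataclass
class Service:
    """One digital service exposed through the hyper-convergence layer (Level 1)."""
    name: str
    category: str     # "academic", "administrative" or "learning-platform"
    endpoint: str     # hypothetical URL of the underlying system

@dataclass
class SingleAccessPortal:
    """Levels 2-3 in miniature: one account, one entry point to every registered service."""
    user_id: str
    services: list[Service] = field(default_factory=list)

    def register(self, service: Service) -> None:
        self.services.append(service)

    def dashboard(self) -> dict[str, list[str]]:
        """Group the services by category for the user-facing mobile app (Level 3)."""
        grouped: dict[str, list[str]] = {}
        for s in self.services:
            grouped.setdefault(s.category, []).append(s.name)
        return grouped

portal = SingleAccessPortal(user_id="student-001")
portal.register(Service("Report card", "academic", "https://example.edu/grades"))
portal.register(Service("Enrollment", "administrative", "https://example.edu/enroll"))
portal.register(Service("Moodle", "learning-platform", "https://example.edu/moodle"))
print(portal.dashboard())
```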

5 Conclusion

The analysis of the current challenges faced by HEIs during the COVID-19 contingency made it possible to identify the tools each institution uses to provide distance education, as well as the importance for HEIs of directing their efforts toward achieving digital maturity, both in their technological infrastructure and among their users.


The use of digital assets allows HEIs to offer better services and gain a competitive advantage over institutions that do not adapt to the digital needs they face today. To guide HEIs along the path to digital maturity, a proposal was created that takes as a basis the technologies currently adopted by the institutions to solve their control needs and adds a new system implementing the concept of "hyper-convergence", which gathers in a single system the information contained in the different applications, favoring simple and efficient access to information in a single tool. The digital maturity proposal offers a guide so that HEIs can have access to virtual technologies and platforms that solve their process management needs, move toward digital maturity, and offer quality services in a market that every day demands greater technological adaptation. The digital maturity proposal is a simple way to integrate different technological tools and adapt them to the processes of HEIs.

Acknowledgments. Thanks to the Technological University of Morelia (UTM), the Institute for Research in Ecosystems and Sustainability (IIES) - UNAM, and the University of Guadalajara (UdeG) for their help in applying the survey and the supporting software, as well as for the publication of this paper.

References

1. AMITI Homepage: Informe sobre el estado de las TIC's en México. https://amiti.org.mx/wpcontent/uploads/2011/10/Informe-de-actividades-AMITI-2020.pdf. Accessed 22 Nov 2020
2. IMCO-MICROSOFT: Informe sobre emprendedores en México. https://imco.org.mx/wpcontent/uploads/2014/05/20140507_Los_Emprendedores_de_TIC_en_Mexico.pdf. Accessed 14 Sep 2020
3. CANIETI: Informe de financiamiento para empresas de TIC's. http://www.canieti.org/Comunicacion/noticias/canietinforma.aspx. Accessed 19 Sep 2020
4. Sandberg, J.: Digital capability: investigating coevolution of IT and business strategies. Ph.D. Dissertation, Umeå University (2014). http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-88722. ORCID iD: 0000-0003-0602-5404
5. MIT: Lifting the lid on corporate innovation in the digital age (2020). https://www.teksystems.com/en/insights/state-ofdigitaltransformation2021/?ecid=ps_tek_p_cligen_xx_sttfdgtltrns_google_xx_xx_20210305_ad2a2f81&vendor_id=4100&gclid=CjwKCAiA1aiMBhAUEiwACw25MaDDRRGgTTYuk6EUx7zGXKGPHxaKdpZOlJ3nLSbcIvQ27mDpGU_FrRoCaPkQAvD_BwE. Accessed 19 Oct 2020
6. Fitzgerald, M., Kruschwitz, N., Bonnet, D., Welch, M.: Embracing digital technology. MIT Sloan Manag. Rev. (2013)
7. Paradigmadigital: Estrategia Digital y Transformación Digital; Los 4 niveles de madurez de la transformación digital, ¿en cuál está tu compañía? https://www.paradigmadigital.com/techbiz/los-4-niveles-madurez-la-transformacion-digital-esta-compania/. Accessed 17 Sep 2020
8. Michelo, J., Medellin, E., Hidalgo, A., Jassó, J.: Conocimiento e innovación. In: Retos de la Gestión Empresarial, pp. 25–55. UAM-UNAM-Plaza y Valdés, México (2008)
9. Ahumada-Tello, E., Zárate Cornejo, R.E., Plascencia López, I., Perusquia-Velasco, J.M.: Modelo de competitividad basado en el conocimiento: el caso de las pymes del sector de tecnologías de información en Baja California. Revista Internacional Administración & Finanzas 5(4), 13–27 (2012)


10. Larson, B.: Delivering Business Intelligence. McGraw Hill, New York (2009)
11. Bower, J.L., Christensen, C.M.: Disruptive technologies: catching the wave. Harvard Bus. Rev. 73(1), 43–53 (1995)
12. Schwab, K.: La cuarta revolución industrial. Penguin Random House, Barcelona (2016)
13. Sánchez Rodríguez, J.: Plataformas de enseñanza virtual para entornos educativos. Pixel-Bit. Revista de Medios y Educación 34, 217–233. Universidad de Sevilla, Sevilla, España (2009)
14. García Aretio, L.: Necesidad de una educación digital en un mundo digital. RIED. Revista Iberoamericana de Educación a Distancia 22(2), 09–22 (2019). https://doi.org/10.5944/ried.22.2.23911
15. Moodle.org: Acerca de Moodle. https://docs.moodle.org/all/es/Acerca_de_Moodle. Accessed 17 Nov 2020
16. New.edmodo.com: Learn More. Obtenido del sitio web oficial de Edmodo. https://go.edmodo.com. Accessed 17 Nov 2019
17. Blackboard: Acerca de nosotros. Obtenido del sitio web oficial de Blackboard.com. https://www.blackboard.com/es-lac/try-blackboard. Accessed 11 Aug 2019
18. Chamilo: Chamilo LMS y la asociación. Obtenido del sitio web oficial de la plataforma educativa Chamilo. https://chamilo.org/es/chamilo/. Accessed 22 Sep 2020
19. Salinas Ibáñez, J., de Benito Crosetti, B., Pérez García, A., Gisbert Cervera, M.: Blended learning, más allá de la clase presencial. RIED. Revista Iberoamericana de Educación a Distancia, 195–213 (2018)
20. SEP-Secretaría de Educación Pública: Sistema Nacional de Información de Escuelas. Sistema Nacional de Información Estadística Educativa (2019). https://siged.sep.gob.mx/SIGED/escuelas.html. Accessed 17 Dec 2019
21. Fabila Echauri, A.M., Minami, H., Izquierdo Sandoval, M.J.: La escala de Likert en la evaluación docente: acercamiento a sus características y principios metodológicos. Perspectivas Docentes 50: TEXTOS Y CONTEXTOS, 31–40 (2013)

Camcorder as an e-Learning Tool

Isaeva Oksana(B), Boronenko Yuri, and Boronenko Marina

Yugra State University, 16 Chekhov street, 628012 Khanty-Mansiysk, Russia
[email protected]

Abstract. With distance learning, the issue of organizing and conducting laboratory work in physics becomes relevant. It is impossible to completely translate an experimental science into a remote format, but thanks to modern video cameras this is partially possible. Our goal is to develop a method for analyzing video files for remote laboratory work in physics without using programming skills. As examples, laboratory works from the sections "Mechanics" and "Thermal radiation" are given. An algorithm for obtaining experimental data and an algorithm for spatial calibration in the ImageJ program are described. The results of a correlation analysis of student feedback on the proposed methodology are presented. It is shown that 25% of students are ready to overcome the difficulties arising while mastering the methodology in order to obtain additional competencies (p = 0.66). The majority of students (75%) consider the tasks extremely difficult and believe they cannot acquire the desired skills and abilities. These facts indicate that the proposed methodology is intended for students with high motivation to learn. They also argue in favor of students' independent choice of the methods, techniques, and tools used in experiments. Keywords: Physics · Laboratory work · Video camera · ImageJ

1 Introduction

An important feature of physics, as both a science and an academic subject, is its experimental nature. Therefore, experiment should be at the forefront of teaching physics [1]. The most difficult thing in distance learning is to organize high-quality laboratory work. The work must, firstly, be actively performed by students and, secondly, teach them the most important elements of experimental work: the main methods of conducting an experiment and processing its results [2]. In addition, the acquired knowledge and skills should work for the future. By teaching students to use the technical means that artificial intelligence will rely on, we automatically give young professionals a chance to survive in a rapidly changing external environment. Thus, in the context of distance learning, the issue of improving the organization and conduct of a laboratory workshop in physics is very relevant. It is obvious that one of the integral vehicles of artificial intelligence will be the video camera. Using a video camera to record experiments, students gain an expanded set of competencies that allow them to find new solutions to existing problems in their field of professional activity. An additional advantage of this kind of laboratory work is the lack of the


possibility of cheating, because everyone works with their own, always new, video file. This type of laboratory work is intended for students who want to connect their lives with research and the development of new methods, that is, for future high-level specialists [3].

2 Overview

A laboratory practicum in physics taught at a distance is usually built on the basis of virtual laboratory works [2–5]. Yu. A. Portnov and I. R. Malshakova [6] consider methods of conducting laboratory work in physics in a remote format. The authors examine step by step the difficulties in organizing a remote laboratory workshop and suggest ways to solve them. The article presents a model of the organization of the measuring part of the laboratory work in physics "Determination of the coefficient of viscosity of a liquid by the Stokes method". However, in that work, students do not take part in the filming itself; the material is prepared in advance by the teacher. There is also the possibility of using digital tools to conduct research in natural science subjects [7]. For example, V. N. Budilov and P. V. Dremanovich [8] use a video camera to measure the frequency of mechanical vibrations; however, the authors use a webcam, which does not allow for high-definition video. O. I. Merkulov and A. I. Molovtseva [9] consider the use of high-speed filming to improve the accuracy of physical measurements; however, not all educational institutions can afford such a camera. Filippova I. Ya. [10] describes video analysis as a modern tool for a physics teacher and the sequence of actions in video analysis. Video analysis assumes the presence of video material. Video fragments can be filmed using a web camera, a consumer video camera, or a digital camera; however, these tools will not always produce high-definition video. Korneev V. S. and Raikhert V. A. [11] consider the process of introducing into a laboratory workshop on physics a complex in which the signal of a digital video camera is processed with the Matlab software package, using the example of the laboratory work "Uncertainty ratio for photons". To do this work, one needs to learn how to work in Matlab, which is not easy. Korneev V. S. and Raikhert V. A. [12] consider examples of laboratory work on wave optics in which computer processing of images of interference patterns (Newton's rings) obtained by a digital video camera is performed using MS Office Excel. Elliott K. H. and Mayhew C. A. [13] combined a commercial CCD camera with a computer video digitizer to provide an inexpensive linear 2D optical detector that can be used to collect spectral and spatial information. The equipment is capable of storing images in pixels with eight-bit precision. Software was developed to process and display the data. The system was used in a first-year physics laboratory for quantitative measurements of the position and intensity of Fraunhofer diffraction patterns and registration of optical radiation spectra. Bonato J. et al. [14] propose the use of smartphone-based slow-motion video analysis as a tool to investigate the physical concepts governing the propagation of mechanical waves.


As a result, students who do not have programming skills face an additional problem: how to extract the necessary data from a video file? In our work, we set ourselves the goal: to develop a method for analyzing video files for remote laboratory work in physics without using programming skills.

3 Methodology for Conducting Remote Laboratory Work

A feature of the proposed method of conducting remote laboratory work is that students independently plan the experiment after a goal has been set for them. Using a video camera, one can study the laws of kinematics, dynamics, photometry, etc.

3.1 Application of a Video Camera for Laboratory Work in the "Mechanics" Section

Let us consider the process using the example of the laboratory work "Study of variable motion" (Fig. 1). Before filming, students need to think over the conditions of the experiment: choose a body that will move, ensure a change in speed during the movement, etc. In addition, the video camera must be positioned correctly: it should be fixed motionless, the lens must be parallel to the plane of filming, and the distance from the filmed objects to the camera should be large enough.

Fig. 1. a) Filming the collision of two bodies; b) Variable motion of bodies.

It is difficult to perform the video filming correctly if the student does not understand what exactly is being investigated and what patterns are being checked. In order not to redo the work, students have to read the theory carefully before filming. In general, the algorithms in all the laboratory works include obtaining and analyzing video files, which belongs to video analytics; this is the first type of problem. The second type of problem relates directly to physics. After the video file has been shot, it is necessary to extract the coordinates of the moving body from it. You can use the


free software ImageJ to analyze video files. This program can process video files in AVI format (MJPEG codec). Since smartphones save videos in a different format, the captured video file must first be converted and then loaded into ImageJ. Spatial calibration is carried out by taking a snapshot of a ruler and entering the resulting scale into the ImageJ program. The spatial calibration algorithm is as follows.

1. Take a snapshot of the ruler with the same video camera and the same lens used in the experiment. The distance from the ruler to the lens should be equal to the distance from the object to the lens, and the ruler divisions should have clear outlines (as should the details of the surface of the investigated object in the experiment). After fulfilling these requirements, register a frame with the ruler image.
2. Load the ruler image into the ImageJ program. If the linear dimensions of the bodies used in the experiments are known, they can be used to set the scale instead; in this case a ruler is not needed. Students often use video files in which they themselves participate (for example, brisk running); their height is then used as the scale segment.
3. Using the line tool, select a segment of known length. Open the Analyze/Set Scale dialog box and set the known distance, specifying the units. Any selected objects can be "remembered" in the ROI Manager (Fig. 2) by selecting the "Add" command on the manager toolbar.

Fig. 2. a, b) Entering each selected segment in the ROI Manager, which can be found in the Drop-Down Menu.

After entering the scale, the coordinates corresponding to the position of the desired object can be specified on the images.

4. Mark the points on the frames whose coordinates are tracked. Automatic selection of the center of mass of a moving object can also be performed (Fig. 3). Measurements are made by selecting the "Measure" command in the ROI Manager.


Fig. 3. Automatic selection of the center of mass of a moving object.
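ImageJ itself requires no programming, as emphasized above; purely as an optional illustration for students who do script, the following minimal Python/OpenCV sketch reproduces the two steps just described: converting a segment of known length into a spatial scale and locating the center of mass of the object on a frame. The file name, pixel coordinates, and threshold value are assumptions.

```python
import cv2
import numpy as np

# 1) Spatial calibration: two end points (in pixels) of a segment of known physical length,
#    marked on the ruler frame. Coordinates and length here are assumptions.
(x1, y1), (x2, y2) = (120, 410), (620, 412)
known_length_m = 0.50
metres_per_pixel = known_length_m / np.hypot(x2 - x1, y2 - y1)

# 2) Centre of mass of the moving object on one frame (dark object on a light background).
frame = cv2.imread("frame_0001.png", cv2.IMREAD_GRAYSCALE)
_, mask = cv2.threshold(frame, 100, 255, cv2.THRESH_BINARY_INV)   # threshold value is an assumption
m = cv2.moments(mask, binaryImage=True)                           # assumes the object is visible (m00 > 0)
cx_px, cy_px = m["m10"] / m["m00"], m["m01"] / m["m00"]

print(f"scale: {metres_per_pixel:.6f} m/px")
print(f"centre of mass: ({cx_px * metres_per_pixel:.3f} m, {cy_px * metres_per_pixel:.3f} m)")
```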

Each captured video frame corresponds to a certain moment in time. To find the time corresponding to the moment of measurement, the frame number is divided by the number of frames per second (fps) at which the video was filmed. Graphs are built in any available program, including Excel (Fig. 4).

Fig. 4. Dependence of coordinates on time.
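The frame-to-time conversion can also be automated. The sketch below assumes a hypothetical coords.csv exported from ImageJ's Results table (columns frame, x_px, y_px) and writes a coordinate-time table that can then be plotted in Excel; the frame rate and spatial scale are placeholders.

```python
import csv

FPS = 30.0           # frame rate used when filming (assumption)
M_PER_PX = 0.001     # spatial scale from the calibration step, metres per pixel (assumption)

rows = []
with open("coords.csv", newline="") as f:              # columns: frame, x_px, y_px (assumed)
    for rec in csv.DictReader(f):
        t = int(rec["frame"]) / FPS                    # time = frame number / fps
        rows.append((t, float(rec["x_px"]) * M_PER_PX, float(rec["y_px"]) * M_PER_PX))

with open("coords_vs_time.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["t_s", "x_m", "y_m"])
    writer.writerows(rows)
```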


Next comes the selection of the best model (Table 1).

Table 1. Selection of the best model

Description            Model1                 Model2
Equation               y = A + B*x + C*x^2    y = a − b*c^x
Function               Parabola               Asymptotic1
Number of points       34                     34
Number of parameters   3                      3

Preferred model: Model1

Akaike's information criterion test (AIC)
Model name   RSS          N    Params   AIC          Akaike weight
Model1       2,74852E−4   34   3        −389,29237   0,99674
Model2       3,84841E−4   34   3        −377,8481    0,00326
Model1 has the lower AIC value and so is more likely to be correct. This model is 305.556 times more likely to be correct.

Fit parameters (Model1)
    Value         Standard error   t-Value      Prob > |t|    Dependency
A   −0,07849      0,00486          −16,15019    1,20841E−16   0,98896
B   0,00404       1,81287E−4       22,28566     1,21564E−20   0,99784
C   −2,48815E−5   1,55342E−6       −16,01728    1,52027E−16   0,99422
Fit converged. Chi-Sqr tolerance value of 1E−9 was reached.

Fit parameters (Model2)
    Value     Standard error   t-Value     Prob > |t|    Dependency
A   0,09574   0,00274          34,87694    1,93096E−26   0,95154
B   0,24383   0,01831          13,3194     2,28266E−14   0,96208
C   0,96153   0,00318          302,63512   2,21886E−55   0,9867
Fit converged. Chi-Sqr tolerance value of 1E−9 was reached.

Fit statistics               (Model1)          (Model2)
Number of points             34                34
Degrees of freedom           31                31
Reduced Chi-Sqr              8,8662E−6         1,24142E−5
Residual sum of squares      2,74852E−4        3,84841E−4
R-Square (COD)               0,98628           0,98079
Adj. R-Square                0,9854            0,97955
Fit status                   Succeeded (100)   Succeeded (100)
Fit converged. Chi-Sqr tolerance value of 1E−9 was reached.

Thus, the students conclude that the experimental data are better described by the quadratic function

X(t) = −0,08 + 0,004·t − 2,49·10^−5·t^2    (1)

Then the instantaneous speed is

V(t) = 0,004 − 4,98·10^−5·t    (2)
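The model comparison summarized in Table 1 can be reproduced with any fitting tool; the sketch below is an illustrative Python version using least squares and Akaike's criterion, not the software used by the authors. The input file name and starting parameters are assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit

def parabola(t, A, B, C):
    return A + B * t + C * t**2

def asymptotic(t, a, b, c):
    return a - b * c**t

def aic(rss, n, k):
    # Least-squares AIC: n*ln(RSS/n) + 2k (constant terms dropped; only differences matter)
    return n * np.log(rss / n) + 2 * k

# Coordinate-time data, e.g. the CSV written in the previous sketch (t_s and x_m columns)
t, x = np.loadtxt("coords_vs_time.csv", delimiter=",", skiprows=1, usecols=(0, 1), unpack=True)

fits = {}
for name, model, p0 in [("parabola", parabola, (0.0, 0.0, 0.0)),
                        ("asymptotic", asymptotic, (0.1, 0.2, 0.9))]:
    popt, _ = curve_fit(model, t, x, p0=p0, maxfev=10000)
    rss = float(np.sum((x - model(t, *popt)) ** 2))
    fits[name] = (popt, aic(rss, len(t), len(popt)))

for name, (popt, value) in fits.items():
    print(name, np.round(popt, 6), "AIC =", round(value, 2))
# The model with the lower AIC (the parabola, as in Table 1) is preferred.
```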

From the graph of the path (coordinates), the magnitude and sign of the projection of the speed can be determined from the tangent of the slope of the graph with respect to the time axis; from the orientation of the branches of the parabola (up or down), the sign of the projection of the acceleration can be determined (plus or minus, respectively). If the movement is not uniformly variable, then all sections that can be highlighted on the curve are subject to analysis.

3.2 Application of a Video Camera for Laboratory Work in the "Thermal Radiation" Section

Recently, high-speed video cameras have become a fundamental part of experimental physics as devices that make it possible to "slow down" time. With the help of optoelectronic complexes based on such video cameras, fast processes are successfully investigated by measuring thermophysical parameters. When video cameras are used as high-speed brightness micropyrometers, narrow-band light filters are usually employed, but narrow-band filters are too expensive for routine laboratory work. Modern color photo matrices are already equipped with appropriate light filters. Therefore, the idea of using such color video cameras for pyrometry without external light filters has been around for a long time; however, not all camcorders support RAW readout. The purpose of one of the remote works is to test the possibility of using inexpensive color video cameras for pyrometry without external additional light filters. Let us consider measurements with a Coocoo S70 color video camera based on the OmniVision OV4689 CMOS sensor. The OV4689 sensor is a CMOS matrix of 2720 columns and 1584 lines, of which 2720 × 1536 are active pixels (pixel size: 2 µm × 2 µm); the rest of the pixels are used for black-level calibration and interpolation. Photodiodes react only to light levels; therefore, color rendition is ensured by placing a color filter (red, green, or blue) above each photosensitive element. The filters are staggered


according to the Bayer pattern. In fact, the light filters installed on the photodiodes perform the functions of external light filters. Therefore, the raw data (the same frames) must contain information about images in three spectral ranges (R, G, B). Both the exposure time and the gain can be set manually, allowing the camcorder to be calibrated. The lens of the camcorder is not removable. Although the sensor supports RAW data transmission, the camcorder is a budget model and raw data cannot be read out. However, to measure the spectral characteristics of the individual color channels of color matrix photodetectors, it is necessary to have images in the "raw" format. The ImageJ program is able to present the resulting files in RGB format. In the RGB color model, the spectral function is the sum of the sensitivity curves for the three primary colors: red, green, and blue. In this sum, non-negative weights take into account the contribution of each color. As a result of the conversion, instead of one original image we obtain three images (R, G, B), which, with a certain degree of accuracy, can be interpreted as data from the R, G, and B channels separately (Fig. 5).

Fig. 5. Spectra recorded by a video camera: a) without external light filters; b) with an external green light filter; c) with an external red filter.

RGB images are converted to grayscale using the formula gray = (red + green + blue)/3, or gray = 0.299 × red + 0.587 × green + 0.114 × blue if weighted RGB conversion is enabled in the menu. The spectra obtained when converting a color image into separate RGB color channels are thus automatically converted to gray scale (Fig. 6), taking the weight coefficients into account. The area under the spectral sensitivity curve of the video camera is proportional to the signal recorded by the camera in the corresponding spectral range. The spectral ratio is then the ratio of the areas under the curves related to the corresponding R, G, B channel filters. This spectral ratio corresponds to a certain black-body temperature set when calibrating the video camera against a reference temperature lamp. In the laboratory work, the ratios obtained with and without external filters need to be compared. The research should be carried out with a video camera that allows saving video files in RAW format, so that pyrometric measurements can be compared after introducing a correction factor.


Fig. 6. Spectra obtained by converting a color image into separate RGB color channels.
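The channel separation and area ratio described above can be written compactly. The sketch below is an illustrative Python/NumPy version with an assumed input file, not the exact ImageJ procedure: it splits an RGB frame of the spectrum into channels, applies the weighted grayscale formula quoted above, and forms the ratio of the channel signals.

```python
import numpy as np
from PIL import Image

# One frame of the recorded spectrum; the file name is an assumption
img = np.asarray(Image.open("spectrum_frame.png").convert("RGB"), dtype=float)
r, g, b = img[..., 0], img[..., 1], img[..., 2]

# Weighted grayscale conversion (the second formula quoted above)
gray = 0.299 * r + 0.587 * g + 0.114 * b

# Per-channel signal: the total intensity registered through each Bayer filter,
# used here as the "area under the curve" of that channel
areas = {name: float(ch.sum()) for name, ch in (("R", r), ("G", g), ("B", b))}

# Spectral ratio that would be mapped to a black-body temperature after calibration
print("gray mean:", gray.mean())
print("areas:", areas, " R/G ratio:", areas["R"] / areas["G"])
```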

4 Discussion

To evaluate the method of obtaining experimental data using a video camera, an anonymous survey of students was conducted using a Google form. Two groups of students took part in the survey, in which part of the laboratory work in mechanics had been carried out using a video camera. In total, 53 people took part: 15 girls (28,3%) and 38 boys (71,7%). Of these, 3 people (5,7%) were under the age of 18, 33 people (62,3%) were 18 years old, 8 people (15,1%) were 19, and 4 people (7,5%) were 20. Multiple answers to the questions were allowed. Considering each answer with respect to 100%, the survey showed that it is difficult to plan an experiment and make a video correctly (28,3%), difficult to calibrate (26,4%) and to process the resulting files (56,6%), and difficult to analyze the experimental data from the point of view of physics and to visualize them (47,2%). Only 5,7% of students answered that every stage of the work was easy for them. Based on the results of the survey, the students can be divided into two groups: those who want to continue working with a video camera (41,5%) and those who do not (47,2%); the rest do not use a video camera. The difficulty of performing laboratory work using a video camera was rated as medium (11,3%) or easy (7,6%). The task was considered extremely difficult by 22,6%, very difficult by 24,5%, and difficult by 34%. Among those who do not use a video camera in experiments, 100% believe that the difficulty level is from 1 to 3 on a five-point scale. Among those who found it easy to complete the assignments, there were none who did not want to continue working according to this method. However, among the students who found it difficult (4 and 5 on the five-point difficulty scale) to do the work, 25% are ready to continue working with the video camera.


To conduct a correlation analysis of the survey results (Fig. 7), binary answers ("yes" or "no") were replaced by +1 and −1, respectively. Answers assessing the complexity of particular actions (planning and setting up an experiment, understanding the problem, analyzing the data) were left on the original five-point scale from 0 to 5 (where 0 is very easy and 5 is very difficult).

Fig. 7. Correlation matrix of answers to the questionnaire questions.
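The recoding and correlation step takes only a few lines; the sketch below uses hypothetical column names for the Google-form export, maps the yes/no answers to +1/−1, leaves the 0–5 difficulty ratings unchanged, and computes a correlation matrix of the kind shown in Fig. 7.

```python
import pandas as pd

answers = pd.read_csv("questionnaire.csv")              # hypothetical export of the Google form

binary_cols = ["wants_to_use_camera", "found_it_easy"]  # assumed names of the yes/no questions
for col in binary_cols:
    answers[col] = answers[col].map({"yes": 1, "no": -1})

# Difficulty ratings (0 = very easy ... 5 = very difficult) are left unchanged
corr = answers.corr(method="pearson")
print(corr.round(2))
```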

Correlation analysis of the answers to the questionnaire showed that students who are accustomed to working according to a template experience difficulty both in planning and conducting an experiment and in analyzing experimental data (p = 0,71).

5 Main Conclusions

This paper describes a technique for remote laboratory work using a video camera, illustrated by laboratory works from the sections "Mechanics" and "Thermal radiation". The advantage of the developed methodology is that no programming skills are needed. At the same time, using a video camera in the laboratory practicum together with the ImageJ program for analyzing video files allows students to master, in addition to physical laws, the basic principles of video analytics. Students have to think through the experimental conditions in detail, which deepens their understanding of physical laws.


An anonymous survey using a Google form was conducted for students to evaluate the method of obtaining experimental data with a video camera:

• The number of students who learn new methods relatively easily does not exceed 7%.
• Correlation analysis showed that the desire to use a video camera in experiments is associated with the number of competencies (p = 0.66) obtained in the learning process (25% are ready to overcome difficulties). If the tasks are extremely difficult for students, they believe that they cannot acquire the desired skills and abilities (75%).

These facts testify in favor of the need for students to freely choose the methods, techniques, and tools used in experiments.

References

1. Skvortsov, A.I., Fishman, A.I.: Video camera and computer. New approaches in organizing a laboratory physics workshop. KIO 1(5), 16–20 (2005)
2. Pribylov, N.N., Pribylova, E.I., Pritsepova, S.A.: Laboratory workshop on physics for distance learning. Phys. Educ. Univ. 9(2), 108–112 (2003)
3. Arkhipova, E.A., Polyanichko, K.S.: Trends in the development of video analytics in the world. Int. J. Human. Nat. Sci. 7(2), 69–72 (2019)
4. Drachev, K.A., Gubin, S.V.: Virtual laboratory work in physics for distance learning students. Sci. Herit. 44–1(44), 9–12 (2020)
5. Veselova, S.V., Stein, B.M.: Distance learning: laboratory practice in physics. At home and in nature. MNKO 1(62), 188–190 (2017)
6. Portnov, Y., Malshakova, I.L.: Organization of laboratory works in the conditions of distance learning. Probl. Modern Educ. 3, 218–226 (2021)
7. Hannanov, N.K.: Digital tools for research work on natural science subjects. OTO 1, 468–486 (2012)
8. Budilov, V.N., Dremanovich, P.V.: On the possibilities of measuring the frequency of mechanical vibrations using a video camera. Sci. Techn. Bull. Volga Reg. 5, 114–117 (2012)
9. Merkulov, O.I., Molovtseva, A.I.: Using video technologies to improve the accuracy of physical measurements. In: Materials of the All-Russian conference of students, graduate students and young scientists dedicated to the Year of Russian cinema, pp. 32–35 (2017)
10. Filippova, I.Y.: Video analysis as a modern tool for a physics teacher. Comparison of software products on the Russian market. Educ. Technol. Soc. 15(1), 487–504 (2012)
11. Korneev, V.S., Raikhert, V.A.: Computerization of laboratory works in physics on the example of laboratory works "Uncertainty ratio for photons". Actual Probl. Educ. 1, 200–204 (2018)
12. Korneev, V.S., Raikhert, V.A.: Digital technologies of optical images processing in the laboratory practice in physics. Actual Probl. Educ. 1, 185–190 (2020)
13. Elliott, K.H., Mayhew, C.A.: The use of commercial CCD cameras as linear detectors in the physics undergraduate teaching laboratory. Eur. J. Phys. 1(2) (1998)
14. Bonato, J., et al.: Using high speed smartphone cameras and video analysis techniques to teach mechanical wave physics. Phys. Educ. 52(4), 045017 (2017)

Smartphone in Detecting Developmental Disability in Infancy: A Theoretical Approach to Shared Intentionality for Assessment Tool of Cognitive Decline and e-Learning

Igor Val Danilov(B)

Academic Center for Coherent Intelligence, Riga, Latvia
[email protected]

Abstract. The paper aims to create a concept design of further research for developing a digital tool to improve the e-learning method. Specifically, one of its applications is a computerized assessment quiz for detecting developmental disabilities in infants online at the earliest stage. Notably, this smartphone app can detect a developmental delay at an age when even proven behavioral markers are challenging, since behavioral markers are based to a large extent on verbal communication. The theoretical approach to the problem discusses interaction modality at the onset of cognition from the perspectives of communication theory, embodied dynamicism, and genetics. Newborns and even fetuses demonstrate various social achievements, respectively: early imitation, the other-race effect, facial attractiveness, distinguishing mother and stranger, the other-species effect, preference for faces, recognition of other newborns' crying, as well as twin fetuses' co-movements, recognition of voices, and emotion expression, which are not consistent with their cognitive and communicative abilities. The article discusses the Primary Data Entry (PDE) problem: what interaction modality provides the acquisition of initial social data at the beginning of life? How does shared intentionality emerge in organisms? This question can guide the understanding of interaction modality at the onset of cognition, which is critical for many rapidly evolving knowledge domains, such as assessment tools for developmental delay and advanced e-learning. This article is one of the first attempts to build a framework for a theoretical model of shared intentionality by discussing theoretical approaches and empirical data on intelligence and cognitive processes in interpersonal interaction. Keywords: Shared Intentionality · Developmental Disability · e-Learning

1 Introduction

Parent-reported data from 88,530 children aged 3–17 years show that about 17% of children were diagnosed with developmental disabilities during the study period 2009–2017, and 1.74% with Autism Spectrum Disorders (ASD) [1]. Efficient markers for detecting developmental disabilities at the earliest stage are crucial in diagnosing


and treating cognitive decline in children [1–4]. The only proven therapy for the core symptoms of ASD is behavioral therapy [3]. The earliest possible assessment of developmental delay enables early intervention in ASD; the earlier the intervention, the better the long-term outcomes for the child [1–3, 5]. According to recent research [6], computer testing can measure interaction ability in infants before the age at which the typical developmental trajectory predetermines verbal communication. In that case, a deviant outcome in the computer quiz serves as an alert. The current study attempts to accentuate the crucial role of implicit social interaction modality at the onset of cognition and to explain why an assessment of shared intentionality can detect cognitive development in children. The following subsections present the problem of cognition at the onset and why this problem presupposes an implicit social interaction modality, shared intentionality. Section 2 discusses this issue from different theoretical perspectives. Next, the Discussion section attempts to answer why the disclosed lack of shared intentionality reveals cognitive delay in infants. Finally, the Future Work section proposes research questions for investigating the app's features for detecting cognitive development in children.

1.1 Shared Intentionality in Cognition

Recent findings extend knowledge about human cognition and interpersonal interaction by showing an increase in both neuronal activities [7] and the acquisition of new knowledge [8–12] in subjects in the absence of sensory cues between them in all these experiments. According to Danilov and Mihailova [10], interpersonal dynamics of individuals in psychophysiological coherence promote coordinated neuronal processes in mother-infant dyads. A supranormal environment pushes the inherited mechanism of infants' social entrainment to the mother's rhythm. Shared intentionality appears if an ongoing sequence of emotion-motion experiences also accompanies these dynamics: from emotional arousal with imitation to emotional contagion with interactional synchrony. The uplift of interpersonal dynamics promotes this first protoconversation in organisms with communication disabilities. A recent case study [12] tested shared intentionality in dyads with young children on the standard developmental trajectory (18, 28, and 31 months of age), comparing their results with a child with cognitive delay (aged 33 months). The experimental design checked their ability to interact with their mothers without sensory cues. The children were asked to create a bond between the sounds of spoken numbers (in a language unfamiliar to the children but familiar to their mothers) and the appropriate set of items [12]. Their mothers simultaneously solved the same problems without communicating with the children. The child with developmental delay showed a three-times-lower outcome in these tasks. In education, experimental data show that group collaboration in problem-solving can significantly increase memorization, by 28%, an effect that increases to 30% within a month [13]. The Problem-Based Learning (PBL) method requires including actual problem-solving activity in the learning process [14–16]. This makes students overcome staged difficulties through their own insight. The teacher needs to create quite complex problems to "switch on" the phenomenon of insight in students; it should not be just a set of tasks as a formal pretext to invite students into an entertaining lesson [16].
The crucial point of the PBL is a discovery or insight, which is still an


inexplicable, unpredictable and uncontrollable process [17]. Sudden and unexpected insights when solving problems in the learning process can also arise due to social interaction [8]. In problem-solving, shared intentionality contributes to this subjectively distinct perception of insights in the group [8]. Two case studies, with a 28-month-old child with no developmental disabilities and a 33-month-old child with epilepsy and cognitive delay, show toddlers' success in numerosity during a short e-learning course [8]. The subjects became "cardinal principle" knowers at an unexpectedly young age, when other children usually do not yet comprehend the meaning of numerosity [8]. These data are consistent with the statement that an increase in implicit memory on the issue improves students' performance, filling their solutions with confidence [18, 19].

1.2 Primary Data Entry Problem for Understanding Shared Intentionality

The onset of cognition is a crucial point in understanding social interaction modalities and the development of communication. A human neonate is born not ready to survive in society. This organism meets with new experience; it needs to acquire social knowledge to survive. Understanding social reality is essential for individuals to perform immediate reactions and strategic planning in environments with many rapidly changing elements; two arguments taken together challenge our knowledge about the onset of cognition. First, the acquisition of knowledge already implies initial data, because new knowledge can be assimilated only on the basis of discovering new key cause-and-effect relationships within previous knowledge and/or opening links between elements of initial knowledge and new information domains [9]. Second, the appearance of communication requires a shared understanding of a signal's meaning within a particular context among a community of users [20]. While communication appears with abstract symbols, it is widely argued that a newborn does not maintain abstract thought until the age of about 12 months. These arguments mean that at the initial stages of development, newborns are not able on their own to acquire knowledge or even to communicate with caregivers to acquire the first social phenomena. However, a growing body of experimental data shows that newborns and even fetuses demonstrate various social achievements, respectively: early imitation, the other-race effect, facial attractiveness, distinguishing mother and stranger, the other-species effect, preference for faces, recognition of other newborns' crying [16], as well as twin fetuses' co-movements, recognition of voices, and emotion expression [21], which are not consistent with their cognitive and communicative abilities. This dichotomy poses the Primary Data Entry (PDE) problem: what interaction modality provides the acquisition of initial social data? Understanding this issue enables the development of a digital tool for detecting ASD by measuring infants' interaction ability before the age at which the typical developmental trajectory predetermines verbal communication. Knowledge about social interaction modalities at the onset of cognition can advance learning methods; in particular, it contributes to assessment tools for developmental delay in children and to the e-learning curriculum.


2 Approaches to the PDE Problem

It seems uncontroversial to say that learning in organisms begins with a conditioned reflex. This quality develops the organism's behavior, which is no longer limited by fixed reflexes controlled by inherited neural pathways. However, in a multi-stimuli environment, the stimulus-consequence pair is unpredictable, because many irrelevant (unrelated) stimuli randomly compete to be associated with the embodied dynamic information. Binding the stimulus-consequence pair of a social phenomenon in the sensorimotor network requires the nervous system to categorize reality before applying the embodied network to a specific stimulus. All known sensory modalities simultaneously process a huge amount of neutral (irrelevant) stimuli. It is well known that reality is full of visual signals in the range of 400–790 THz that vision may detect. Many sounds in the range of 20 to 20,000 Hz may simultaneously stimulate the hearing sense. Thermal stimuli carrying social meaning frequently reach the human body through thermal sensing. Many irrelevant stimuli press on the human body, exciting tactile sensing. Proprioception and olfaction signals also enrich the chaos of irrelevant stimuli. At each instant, this cohort of many different irrelevant (neutral) stimuli simultaneously affects the nervous system through receptors. Meanwhile, a neutral stimulus (auditory, visual, etc.) by itself does not make sense; it is not yet a relevant stimulus. The mind is the "computer" that fills irrelevant stimuli with meanings, composing stimuli into a message. Therefore, it is a vicious circle: the pure mind already needs meaning in order to acquire even the first meaning. This section shows why this is a difficult problem for pure reason: discriminating meanings, and even irrelevant (unrelated, neutral) stimuli, at a stage of development when there is no categorization yet.

2.1 Communication Theory About the PDE Problem

According to psychology, stimuli may impact social behavior, changing the mental state, only if they are sensory cues (relevant stimuli with a specific reaction) that transmit information. According to Shannon [22], information is the resolution of uncertainty; information is understood as a statistical process. Only the probability of both the encoding of a message into symbols by the sender (the source of information) and its decoding by the receiver makes information transmitted. No probability of symbols means no transmission of information. "The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point" [22]. The information transmits symbols, which consist of bits (binary digits) based on the binary logarithm:

x = log2(N)    (1)

This equation gives the value of a message that bears N bits. According to Shannon [22], a device with two stable positions, such as a relay or a flip-flop circuit, can store one bit of information. N such devices can store N bits, since the total number of possible states is 2^N and:

log₂ 2^N = N    (2)


Information transmission is estimated by the entropy quantity (information probability in the statistical sense). Two equally likely outcomes (a binary input of one bit) provide lower entropy (less information) than specifying the outcome from a larger set of outcomes (more bits in the message). The actual message is one from a set of possible messages [22]. It acts on the receptors along with the noise. The noise includes all irrelevant (neutral) stimuli from all sensory modalities that may impact the pure nervous system simultaneously with the actual message; it is the sum of the impacts of all these different stimuli. There are two statistical processes at work: the source (which produces a message) and the noise [22]. Therefore, all of the entropies of all sources should be measured on a per-second basis:

H(am) = H(s) + Σ (H(n1) + H(n2) + … + H(ny))    (3)
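As a purely illustrative aside (not part of the original argument), the bookkeeping behind Eqs. (1)–(3) can be sketched numerically. The snippet below assumes every stimulus is an independent binary event with p = 1/2, and the stimulus count y is an invented figure used only to show the imbalance.

```python
import math

def binary_entropy(p: float) -> float:
    """Entropy, in bits, of a single binary stimulus with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# One relevant stimulus-consequence pair: the source entropy H(s).
h_source = binary_entropy(0.5)                        # 1 bit

# "y" irrelevant (neutral) stimuli arriving through all sensory modalities.
y = 10_000                                            # illustrative count, not measured data
h_noise = sum(binary_entropy(0.5) for _ in range(y))  # y bits

# Entropy of the actual message per Eq. (3): source plus accumulated noise.
h_actual_message = h_source + h_noise

print(f"H(s)  = {h_source:.0f} bit")
print(f"H(n)  = {h_noise:.0f} bits")
print(f"H(am) = {h_actual_message:.0f} bits -> the noise dwarfs the source")
```

Whatever value y takes, the noise term grows linearly with the number of simultaneous neutral stimuli, which is exactly the imbalance described next.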

The entropy of the actual message H(am) is the sum of the source entropy H(s) and the entropies of the other "y" irrelevant (neutral) stimuli. For a binary source (all messages are stimulus-consequence pairs), the source entropy H(s) has a small value: binary stimuli yield probability p = 1/2, and such a source has an entropy of one bit. In contrast, the noise yields an entropy equal to the sum over all irrelevant (unrelated) stimuli from all sensory modalities (Vision, Audition, Tactition, Gustation, Olfaction, and Thermoception), Σ (H(n1) + H(n2) + … + H(ny)). The noise consists of "y" binary stimuli with probability p = 1/2, and such a sum has an entropy of "y" bits. As discussed above, the nervous system receives many different irrelevant stimuli simultaneously. The noise value may tend towards infinity because the number "y" of irrelevant stimuli is extensive. The cross-interference of the entropies of all irrelevant stimuli should also be taken into account. The equation shows that the noise significantly exceeds the source value. With a transmission this noisy, the pure nervous system, without intentionality, cannot reconstruct the original message [22]. From the perspective of communication theory, at the beginning of life none of the known sensory modalities can provide the transfer of information through sensory cues in a multi-stimuli environment, since all stimuli possess equal meaning (value). The pure nervous system requires an instruction for the selection of stimuli (intentionality) in order to begin cognition. This instruction cannot come through sensory stimuli.

2.2 The PDE Problem in Cognitive Science
Three main approaches within cognitive science attempt to understand cognition: cognitivism, connectionism, and embodied dynamicism [23]. Many theories co-exist in various hybrid forms within this framework. The most notable of these are the embodied dynamic system [23], the theory of innate intersubjectivity and the innate foundations of neonatal imitation [24], the theory of natural pedagogy [25], and the theory of sensitivities and expectations [26]. While all these theories are well elaborated, they leave a gap in knowledge about the onset of cognition. Embodied cognitivists, the closest to the goal, proposed an autonomous system without inputs and outputs. That is, the mind is an autonomous system by virtue of its self-organizing and self-controlling dynamics; it determines the cognitive domain in which it operates [23, 27, 28]. Many embodied dynamic system theories follow Vygotsky's ideas of cognition arising through social contexts [29, 30].

Descriptions of embodied cognition might be organized around a larger number of narrower themes [31, 32]; however, efforts to broaden the themes, thereby reducing their number, risk generalizing the description of embodied cognition to the extent that its purported novelty is jeopardized [32]. In any case, context dependency is one of their principles. According to Juarrero, intentions are the phenomenological counterpart of neurological self-organization that arises and develops due to contextual constraints [29]. This embodied cognitivist approach is grounded in the dynamical hypothesis [33]. However, its interpretation of a dynamic system is not accurate:
1. According to this approach, the embodied features of cognition are deeply dependent upon characteristics of the physical body. If the agent's beyond-the-brain body plays a significant causal role, then primary data make sense.
2. In mathematics, a dynamic systems model is a set of evolution equations, which means that entering primary data is required. The dynamic system cannot begin its life cycle without the introduction of initial conditions corresponding to the specific situation's inputs and parameters.
3. The dynamical system hypothesis [33] has not claimed the absence of initial conditions. Dynamicists track primary data less than the dynamic changes inside; however, this does not mean that primary data do not exist or are not necessary.
Even the concept of dynamically embodied information does not solve the problem. According to the embodied cognition approach, symbols encode the local topological properties of neuronal maps [23], a dynamic action pattern. The sensorimotor network yields the pairing of the binary cue stimulus with the particular symbol saved in the structures and processes that embody meaning. This means intentionality should already be in place at birth, because the organism must distinguish input stimuli in order to pair the binary cue stimulus with the particular symbol. In short, we are discussing the ability of pure reason to recognize social cues. To begin cognizing, a fetus (or infant) must already categorize the infinite and continuous context (the monolithic and whole social reality) before the act of intentionality. How does she do this? Tomasello [34] tries to solve this problem by introducing the newborns' basic motive force of shared intentionality, appealing to protoconversation based on emotion coordination. However, the mechanism of such emotion coordination is not clear, because it is grounded in understanding emotional states, which is impossible for the fetus (or infant) at this developmental stage [10, 35]. Expressing emotions and understanding other people's expressions of emotions also includes knowledge of social meanings [21, 35, 36]. Even the well-elaborated theories of emotional contagion (such as the Perception-action model of empathy [37], Active intermodal matching [38], Associative sequence learning [39], the Neurocognitive model of emotional contagion [40], and Learned matching [41]) do not define how initial social knowledge is introduced through emotions for beginning a categorization of social reality. Thus, they also leave a gap in knowledge regarding the PDE problem.


2.3 Genetics About the PDE Problem
It is unlikely that a genetic mechanism can determine social behavior [41], which is a mediator between a particular individual's mental state and the set of meanings of a concrete social reality. That is, genes shape brain development from the outset, influencing the development of a particular composition of psychological traits [42]. All psychological traits show genetic influence; however, no trait is entirely heritable [42]. Psychological traits influence social behavior, but they do not define it uniquely, sharing their competence with the impact of the environment [16, 42] and free will. This argument calls into question even the simple genes–traits–behavior association. This association does not directly provoke a cause-and-effect relationship between genes and behavior [16, 21]. Moreover, it seems that the nature of social behavior is more complicated than just a stimulus-response scheme. The variability of the meanings of social events exceeds the capacity of a limited number of behavior patterns to represent them unambiguously. The expression of social behavior is even more complex than the simple genes–traits–behavior association: it connects phenomena from personal reality with appearances from social reality: phenomenon from personal reality – current psychophysiological state – psychological traits – current social reality – behavior [16, 21, 42, 43]. Genes cannot direct an individual in how to apply an understanding of the situation and the individual's intention to a specific social event:
a. Empirical data show that no trait is 100% heritable; heritability is caused by many genes of small effect [42].
b. There is a lack of empirical evidence, and even of an idea, for such a genetic mechanism [16].
c. The hypothesis of inherited social knowledge conflicts with generally accepted neo-Darwinian evolutionary principles [16].
The genetic code is no more than a rule of causal specificity based on the fact that cells use nucleic acids as templates for the primary structure of proteins; it is therefore unacceptable to say that DNA contains the information for phenotypic design [23]. Genetic knowledge cannot support an assumption of any inherited, a priori social knowledge.

3 Discussion
These different approaches show that, at the first step of cognition, organisms must learn to single out one stimulus against the background of many others. The PDE problem highlights the paradoxical ability of organisms to instantly select only one environmental stimulus from many others. Experimental data show that mother-child dyads succeed in this trial without sensory cues between them. What do we know about shared intentionality, the modality of interaction in the absence of sensory cues? Does shared intentionality bypass sensing? From the perspectives of communication theory, embodied dynamicism, and genetics, it must. Indeed, even in reflexes, intentionality should already be in place for conditioned reflexes to appear. In order to create them, the pure nervous system must already know which sensory stimulus to choose (from the many different irrelevant (unrelated) sensory stimuli stimulating the receptors simultaneously) in order to connect it with the corresponding neural network associated with the meaning of this sensory stimulus.

This is not as easy as it may seem. While unconditioned reflexes provide a definite connection of stimuli with the corresponding neural networks, conditioned reflexes require learning to associate a previously unrelated neutral stimulus with a different stimulus that excites some reaction. That is, conditioned reflexes require a categorization of reality prior to this act. So it is a vicious circle again. The arguments from Sect. 2 show that the blank mind cannot possess social knowledge, and that information cannot overcome the noise threshold through sensing at the beginning of life. That is, the pure nervous system cannot manage perception only with the help of prompts from perception itself. Pure reason cannot possess intentionality without an outside prompt. Intentionality can be neither innate nor self-created by the organism itself. This means that shared intentionality does bypass sensing: it is the learning of which sensory stimulus to choose. Therefore, shared intentionality can occur only through interaction beyond the receptors. Empirical evidence shows that emotion contagion bypasses sensory modalities [44]. The findings mentioned in Section 1 extend knowledge about human cognition and interpersonal interaction by showing an increase in neuronal activities [7] and the acquisition of new knowledge [8–12] in subjects without sensory cues between them. A recent case study [12] tested shared intentionality in dyads with young children on the standard developmental trajectory (18, 28, and 31 months of age), comparing their results with a child with cognitive delay (aged 33 months). The child with developmental delay showed a three times lower outcome in these tasks. Associating these empirical data with theoretical reflections on the issue yields a twofold inference. First, shared intentionality complements the set of interaction modalities; moreover, it is a crucial social interaction modality at the onset of cognition, contributing to efficient learning in infancy. Second, the lack of shared intentionality in infancy can lead to cognitive delay. Considering the above data about the cognitive particularities of children with ASD, a disclosed lack of shared intentionality can also reveal this delay. The above experimental data and theoretical arguments constitute a framework for research on the computerized measurement of infants' interaction ability before the age at which the typical developmental trajectory predetermines verbal communication.

4 Future Work
Understanding the interaction modality at the onset of cognition is critical for many rapidly evolving knowledge domains, such as advanced e-learning. Specifically, the research outcome can contribute to developing a computerized method for testing infants' communicative disability in order to detect developmental disability before the age at which the typical developmental trajectory predetermines verbal communication. It also means that a smartphone app (the smartphone being the device most readily available for use with children) would be able to detect developmental disabilities in infants at an age when applying the currently proven behavioral markers is challenging, since those markers are based to a large extent on verbal communication. Further research should verify the optimal conditions of this assessment procedure (e.g., independence from the cultural differences of users).

It should find (i) the minimum necessary set of tasks and (ii) the conditions of shared intentionality stimulation that are necessary and sufficient for any type of relationship in dyads. For example, similar sensory cues can possess different meanings in different cultures, and family habits and values differ from country to country. In addition, research should optimize the assessment procedure to avoid an excessive load on participants. This last task requires delicate tuning, since the app should stimulate the sequence of emotion-motion experiences in dyads: from emotional arousal with imitation to emotional contagion with interactional synchrony. The current article presents an entirely different approach to the problem, based on assessing shared intentionality in dyads by measuring coherent intelligence outcomes during testing. Since it uses non-perceptual interaction, this method can complement iPhone assessment algorithms, which Apple is likely to create soon for assessing sensory cues of child behavior. It also promotes earlier assessment of even non-verbal children, i.e., a computerized assessment of cognitive delay in children before the normal trajectory of cognitive development predicts communication. The method can underlie an e-learning intervention curriculum to correct the cognitive development trajectory.

References 1. Zablotsky, B., et al.: Prevalence and trends of developmental disabilities among children in the United States: 2009–2017. Pediatrics 144(4), e20190811 (2019). https://doi.org/10.1542/ peds.2019-0811 2. Pierce, K., et al.: Evaluation of the diagnostic stability of the early autism spectrum disorder phenotype in the general population starting at 12 months. AMA Pediatr. 173(6), 578–587 (2019). https://doi.org/10.1001/jamapediatrics.2019.0624 3. McCarty, P., Frye, R.E.: Early detection and diagnosis of autism spectrum disorder: why is it so difficult? Early diagnosis of autism. Semin. Pediatr. Neurol. 35, 100831 (2020). https:// doi.org/10.1016/j.spen.2020.100831 (Published by Elsevier Inc.) 4. Rojas-Torres, L.P., Alonso-Esteban, Y., Alcantud-Marín, F.: Early intervention with parents of children with autism spectrum disorders: a review of programs. Children 7, 294 (2020). https://doi.org/10.3390/children7120294 5. Hyman, S.L., Levy, S.E., Myers, S.M., et al.: Identification, evaluation, and management of children with autism spectrum disorder. Pediatrics 145, e20193447 (2020) 6. Val Danilov, I., Mihailova, S.: A New Perspective on Assessing Cognition in Children through Estimating Shared Intentionality. J. Intell. 10, 21 (2022). https://doi.org/10.3390/jintelligenc e10020021 7. Painter, D.R., Kim, J.J., Renton, A.I., et al.: Joint control of visually guided actions involves concordant increases in behavioural and neural coupling. Commun Biol 4, 816 (2021). https:// doi.org/10.1038/s42003-021-02319-3 8. Val Danilov, I., Mihailova, S., Reznikoff, I.: Shared intentionality in advanced problem based learning: deep levels of thinking in coherent intelligence. springer nature-book series: transactions on computational science & computational intelligence. In: The 17th Int’l Conf on Frontiers in Education: Computer Science and Computer Engineering, Along with CSCE congress (2021) 9. Val Danilov, I., Mihailova, S., Perepjolkina, V.: Unconscious social interaction: coherent intelligence in learning. In: Proceedings of the 12th annual conference ICERI Seville, Spain, pp. 2217–2222. (2019) https://doi.org/10.21125/iceri.2019.0606


10. Val Danilov, I., Mihailova, S.: Emotions in e-Learning: the review promotes advanced curriculum by studying social interaction. In: The proceedings of the 6th International Conference on Lifelong Education and Leadership for ALL, Sakarya University. (2020) https://faf348ef-5904-4b29-9cf9-98b675786628.filesusr.com/ugd/d546b1_738 4fbcb4f9a4c6981f56c0d6431fff2.pdf Accessed on July 2021 11. Val Danilov, I., Mihailova, S.: Intentionality vs Chaos: brain connectivity through emotions and cooperation levels beyond sensory modalities. In: The Thirteenth International Conference on Advanced Cognitive Technologies, and Applications, COGNITIVE 2021, 18–22 April, in Porto, Portugal (2021) 12. Val Danilov, I., Mihailova, S., Reznikoff, I.: Frontiers in cognition for education: coherent intelligence in e-Learning for beginners aged 1 to 3 years. Springer nature–book series: transactions on computational science & computational intelligence. In: The 20th Int’l Conf on e-Learning, e-Business, Enterprise Information Systems, and e-Government, along with CSCE congress (2021) 13. van Blankenstein, F.M., Dolmans, D.H.J.M., van der Vleuten, C.P.M., Schmidt, H.G.: Which cognitive process support learning during small-group discussion? The role of providing explanations and listening to others. Instr. Sci. 39(2), 189–204 (2011) 14. Masek, A.B., Yamin, S.: The effect of problem based learning on critical thinking ability: a theoretical and empirical review. Int. Rev. Soc. Sci. Human. 2(1), 215–221 (2011) 15. Schmidt, H.G., Rotgans, J.I., Yew, E.H.J.: Cognitive constructivist foundations of problembased learning. In: The Wiley Handbook of Problem-Based Learning. Miley Blackwell, p. 257 (2019) 16. Val Danilov, I.: Social interaction in knowledge acquisition: advanced curriculum. critical review of studies relevant to social behavior of infants. In: IARIA, Conference proceedings. 2020. ISBN: 978–1–61208–099–4. The article was re-published in the Journal of Higher Education Theory and Practice. 20; 12. (2020) https://articlegateway.com/index.php/JHETP/ article/view/3779/3596 https://doi.org/10.33423/jhetp.v20i12 17. Boud, D., Feletti, G.I.: The Challenge of Problem-based Learning, pp. 40–41. London: Kogan (1997) 18. Graf, P., Mandler, G.: Activation makes words more accessible, but not necessarily more retrievable. J. Verbal Learn. Verbal Behav. 23, 553–568 (1984). https://doi.org/10.1016/s00225371(84)90346-3 19. Hasher, L., Goldstein, D., Toppino, T.: Frequency and the conference of referential validity. J. Verbal Learn. Verbal Behav. 16, 107–112 (1977). https://doi.org/10.1016/s0022-5371(77)800 12-1 20. Wittgenstein, L.: Philosophical Investigations. Prentice Hall, Hoboken (1973) 21. Val Danilov, I.: Ontogenesis of social interaction: review of studies relevant to the fetal social behavior. J. Med. Clin. Res. Rev. 4(2), 1–7 (2020) 22. Shannon, C.E.: A mathematical theory of communication. Bell Syst. Techn. J. 27, 379–423 & 623–656 (1948) 23. Thompson, E.: Mind in Life: biology, phenomenology, and the sciences of mind. In: The Belknap Press of Harvard University Press, 1st paperback edition. Harvard University Press, Cambridge, Massachusetts London, England (2010) 24. Delafield-Butt, J.T., Trevarthen, C.: Theories of the development of human communication. (2012) https://strathprints.strath.ac.uk/39831/1/Delafield_Butt_Trevarthen_2012_Theo ries_of_the_development_of_human_communication_final_edit_060312.pdf Accessed 30 October 2021 25. Csibra, G., Gergely, G.: Natural pedagogy. Trends Cogn. Sci. 
13, 148–153 (2009)


26. Waxman, S.R., Leddon, E.M.: Early word-learning and conceptual development. In: The Wiley-Blackwell Handbook of Childhood Cognitive Development. (2010) https://www. academia.edu/12821552/Early_Word-Learning_and_Conceptual_Development Accessed 30 October 2021 27. Varela. F.J.: Principles of Biological Autonomy (1979). ISBN-10:0135009502, ISBN13:978–0135009505 28. Varela, F.J., Bourgine, P.: Towards a practice of autonomous systems. In: Varela, F.J., Bourgine, P. (eds) Towards a Practice of Autonomous Systems. The first European conference on Artificial Life, pp. xi–xviii. MIT Press, Cambridge (1992) 29. Juarrero, A.: The self-organization of intentional action. Rev. Int. Philos. 228, 189–204 (2004). https://doi.org/10.3917/rip.228.0189 30. Kull, K., Deacon, T., Emmeche, C., Hoffmeyer, J., Stjernfelt, F.: Theses on biosemiotics: prolegomena to a theoretical biology. Biol. Theory 4(2), 167–173 (2009) 31. Wilson, M.: Six views of embodied cognition. Psychon. Bull. Rev. 9(4), 625–636 (2002). https://doi.org/10.3758/BF03196322 32. Shapiro, L., Spaulding, S.: Embodied cognition. In: Edward, N.Z. (ed.) The Stanford Encyclopedia of Philosophy (Winter 2021 Edition). (2021) https://plato.stanford.edu/archives/win 2021/entries/embodied-cognition 33. van Gelder, T.: The dynamical hypothesis in cognitive science. Behav. Brain Sci. 21(5), 615–628 (1998). https://doi.org/10.1017/s0140525x98001733 34. Tomasello, M.: Becoming human: a theory of ontogeny. Belknap Press of Harvard University Press, Cambridge (2019). https://doi.org/10.4159/9780674988651 35. Val Danilov, I., Mihailova, S.: New findings in education: primary data entry in shaping intentionality and cognition. In: the Special Track: Primary Data Entry, Cognitive 2021, IARIA XPS Press (2021) 36. Val Danilov, I., Mihailova, S.: Knowledge sharing in social interaction: towards the problem of primary data entry. In: 11th Eurasian Conference on Language & Social Sciences which is held in Gjakova University, Kosovo. p. 226. (2021) http://eclss.org/publicationsfordoi/abs t11act8boo8k2021a.pdf Accessed 30 July 2021 37. Preston, S.D., de Waal, F.B.M.: Empathy: its ultimate and proximate bases. Behav. Brain Sci. 25, 1–20 (2002) 38. Meltzoff, A.N., Moore, M.K.: Explaining facial imitation: a theoretical model. Early Dev. Parent. 6, 179–192 (1997) 39. Ray, E., Heyes, C.: Imitation in infancy: the wealth of the stimulus. Dev. Sci. 14(1), 92–105 (2011). https://doi.org/10.1111/j.1467-7687.2010.00961.x 40. Prochazkova, E., Kret, M.E.: Connecting minds and sharing emotions through mimicry: a neurocognitive model of emotional contagion. Neurosci. Biobehav. Rev. 80, 99–114 (2017). https://doi.org/10.1016/j.neubiorev.2017.05.013 41. Heyes, C.: Empathy is not in our genes. Neurosci. Biobehav. Rev. 95, 499–507 (2018). https:// doi.org/10.1016/j.neubiorev.2018.11.001 42. Plomin, R., DeFries, J.C., Knopik, V.S., Neiderhiser, J.M.: Top 10 replicated findings from behavioral genetics. Perspect. Psychol. Sci. 11(1), 3–23 (2016). https://doi.org/10.1177/174 5691615617439 43. Val Danilov, I.: Emotions in learning towards coherent intelligence: the review of studies on social behavior in infants with visual impairment. J. Med. Clin. Res. Rev. 4(4), 1–6 (2020) 44. Tamietto, M., Castelli, L., Vighetti, S., et al. Unseen facial and bodily expressions trigger fast emotional reactions. PNAS 106(42), 17661–17666, (2009) . www.pnas.org. https://doi.org/ 10.1073/pnas.0908994106

Centralized Data Driven Decision Making System for Bangladeshi University Admission Fatema Tuj Johora, Aurpa Anindita, Noushin Islam, Mahmudul Islam(B) , and Mahady Hasan Department of Computer Science and Engineering, Independent University, Bangladesh (IUB), Dhaka, Bangladesh {2022932,2022937,2022931,mahmud,mahady}@iub.edu.bd

Abstract. The advancement of any modern nation depends largely on the standard of higher education. The fourth industrial revolution is all about the growth of information and frontier technologies. Consequently, it is impossible to maintain the standard of higher education without the help of emerging technologies. The higher education admission process has access to enormous heterogeneous data. This data can be used as an essential factor in making the appropriate decisions for educational institutions. With this in mind, we propose a Data-Driven Decision-Making System that will analyze available data from the previously proposed Centralized Admission Hub (CAH) system and generate consummate reports based on that data. Decision-makers from universities can take strategic decisions about the standards of admission tests, seat distribution, and admission criteria based on analyses such as department-wise student interest, overall interest in being admitted to certain universities, etc. Students can make decisions about choosing a university or major based on information such as the acceptance rate of the university, the student-to-faculty ratio, etc. The university's managing authorities can take the required decisions as well, based on relevant reports generated by this system. In this research work, we discuss different aspects of the proposed system, explore possible features, and align them with higher education admission-related problems. Keywords: Data analysis · Data driven solution · Higher education · Higher education admission system · Centralized admission system

1 Introduction
Nowadays, on a global scale, the higher education admission sector faces increasing pressure to change its admission-related operations, as admission data grows every year. Each year, many students apply to different universities and medical colleges for higher education in Bangladesh. Universities are therefore considering the available data as a fundamental element in settling on appropriate decisions and backing up their strategies [2]. They are additionally trying to find actionable insights to meet economic, social, and cultural agendas [1].


As a significant factor for the nation, higher education needs to be organized and transparent. The traditional admission system is exorbitantly expensive in terms of time, energy, physical toil, and the huge workforce involved. Students need to apply to multiple universities and travel physically to attend the exams, which can cause them stress as well as financial and time constraints [3–5]. Our peers, Pratik et al. [6], previously proposed a Centralized Admission Hub (CAH) that would be organized centrally for both students and universities to address this issue. Furthermore, governments and education-related organizations can access any admission-related data readily available in the CAH system. Their proposed solution inspired us to see the huge opportunity of incorporating data-driven solutions into the CAH system. Currently, the admission process in Bangladesh is decentralized, and universities process and store the admission data individually, which makes data analysis difficult and time-consuming. In our system, all the data will be stored centrally so that data analysis and processing will be easier. In this paper, we propose a Data-Driven Decision-Making System to improve the higher education admission process. Our system will collect data from multiple public and private universities, students' applications, and other sources. The system will then analyze that data and generate various types of reports to help decision-makers make appropriate decisions: students will be able to decide when choosing a department or major, universities will be able to decide on a strategy, and governments and their organizations can access data and make decisions if required as well.

2 Literature Review
Every year, approximately 1.37 million students in Bangladesh [10] pass the higher secondary exams and then appear in graduate admission tests, creating a massive database of admission aspirants. If this huge amount of data is handled properly, it can solve many problems, since modern data technologies can handle large amounts of data easily and readily [17, 19]. Data-driven solutions can handle enormous data sets that are huge in size and type, with intricate structures that are difficult to store, scrutinize, and visualize for further processing or consequences [12]. Basically, the concept of data-driven solutions is a research procedure, involving massive data, to disclose concealed patterns and undisclosed correlations, known as big data analytics [15]. Nowadays, micro-level learning platforms, as well as traditional learning institutes, are booming to serve students anywhere and anytime, as Bersin notes [25]. The next step in this kind of self-learning is smart learning: tailored to users, intelligent in nature, and machine-driven. So big data and AI are eminent in education as well. Margo et al. [27] suggested an educational ecosystem that consists of multiple types of users who have an extensive range of perspectives depending on their purpose, concept, and mission. Daniel et al. [26] suggested guidelines for Big Data in universities and developed frameworks for data accumulation, gathering, and a data set for users of the software. The users are learners, teachers, strategy makers, and researchers. They drew up a theoretical framework to recognize big data analytics in higher education. Overall, education-related data, including admission data, already constitutes a large amount of data that can be utilized to obtain valuable analyses for educational institutions [23].


González-Martínez et al. [28] analyzed the implications of cloud computing in the education sector; it is a novel tool applied in education. IBM suggested a model for big data with five categories [29], depicted as "volume, velocity, variety, veracity, and value." Data collected during admission, especially in a centralized system like ours, is a big data pool, and every year it becomes larger and larger in 'volume'. As different data types are included in admission-related data, it also exhibits the 'variety' of big data. The 'velocity' characteristic is present as well, as the data has also increased over the years. 'Veracity' is there, as data comes with proof and is institutionalized. To improve operational efficacy, or 'value', educational intelligence [24] may help support managerial processes, because Big Data makes improved decision-making possible. Educational data mining [30], another application of big data in education, has been applied to foresee the performance of students [21]. In recent years, large organizations have rapidly shifted to these technologies, implementing and adopting big data, and they are monetizing and profiting from different big data analytic applications [9], which can provide insights for decision-making [16, 20]. Like large corporations, e-governance can also benefit [8]. For example, Swapnil et al. [8] proposed a Big Data Analytics Framework for e-governance that would be able to do comprehensive mining and analyze event data from Enterprise Service Ecosystems. Viloria et al. [11] proposed a "student pattern recognition tool" that can facilitate the teaching-learning process with the help of Knowledge Data Discovery (Big Data). The methodology is organized in four phases: identifying patterns; analyzing the teaching-learning process; "Knowledge Data Discovery"; and finally developing, implementing, and validating the software. Mavroeidakos et al. [13] proposed a system for providing quality healthcare services and managing emergency medical needs via a centralized management (CM) system for medical data. The proposed "auto-calling computing environment" will be centralized and will ease errands such as the accumulation of data from diverse sources and the performance of analytic functions, and can boost the intensivists' perception. Implementing big data technology seems like a lucrative solution, but it needs intensive care to become a successful application. It involves analytical ability and structured procedures to unravel big data into business worth [7, 14]. Overall, organizations can increase their probable value by using big data analytics in the form of the following techniques: "descriptive, predictive, and prescriptive analytics" [18]. Moreover, as the number of data sources increases rapidly (every year, in our case) and involves sophisticated data analysis, so does the amount of data-related storage required; to let data analysts adapt and yield data swiftly, an agile database is needed that can acclimate both its physical and logical contents in synchronization with the fast evolution of newly growing data [31].

3 Proposed Solution
We propose to develop a Data-Driven Decision-Making System to improve the higher education admission process in Bangladesh. This system will be based on centralized admission, in which all public and private universities will be included, and students can apply for the admission test through the system.

The system will be able to collect admission-related data from applications, admission results, universities, the education ministry of Bangladesh, and other government organizations. The system will organize the data into types, such as structured and unstructured data. Then the system will process and validate that data to maintain data transparency. Data analysis will be done by the system to generate various types of strategic reports. A rich picture of our proposed Data-Driven Decision-Making System is presented in Fig. 1.

Fig. 1. Rich picture of proposed system

3.1 Research Method Used and Challenges Faced
Generally, research methodology is the science and technique of analyzing how research needs to be conducted; fundamentally, it comprises the procedures by which researchers describe, explain, and predict phenomena. We used several research techniques and methods to arrive at the proposal for this solution. Our background study found no such system elsewhere, but there were several systems that incorporated data analytics into different sectors, so we decided to incorporate data analytics into higher education. The challenge in doing so is to have a data feed from universities in order to analyze the data and produce reports from it. First, we used observation to understand the problem, and then we used business process re-engineering (BPR) to plan the proposed solution. Our data sources are proposed to come from a centralized university repository. We proposed this centralized system, which is yet to be analyzed and implemented by the government.

Our approach to the data-driven system proposal is thus rudimentary, an outline for building the actual system in the future. This theoretical proposal, with its possible procedures and ideas, will give an overview for future experimental studies that implement the system. We collected analog data and took statistical approaches and manual mathematics to show the power of such a data-driven system. The first challenge was to have a reliable data source that could reasonably confirm our idea that a data-driven solution would be better for the nation. The second was analyzing the data accurately and showing relations between different universities' data and their implications for decision-makers. As there is currently no centralized education system implemented in Bangladesh, our proposed system cannot show any real experimental data. So we used our manually collected data from several universities and produced the mathematical representations manually. By doing so, the power of such an automated, centralized, data-driven system could be visualized, and both of our challenges were addressed. Mainly, we have shown that even manual data, manually performed calculations, and graphs from merely two universities can yield valuable reports and enable comparison between universities. If our system gets implemented, it will have more data from more universities; furthermore, the reports will be automated and data-driven as proposed, so reports similar to those shown in this paper will be produced by our system automatically, without human interaction. These reports could be immensely beneficial for the whole nation when deciding on university entry. Moreover, decision-makers and authorities can also access prompt responses from these automated reports and decide accordingly. More on this is elaborated in Sect. 4.

3.2 Structure and Source of Data
Primarily, we collected admission data from 2015 up to the year 2021 from the admission office of Independent University, Bangladesh (IUB). They collected all this data from students who applied to the university over several years. These data were categorized and saved for future use, and all graphs and analyses were created by hand, with human interaction with the data. Because of the collaboration between IUB and North South University (NSU), Bangladesh, we also had the chance to collect NSU's admission data from their admission office as well. So, primarily, we are using structured data collected from universities; we propose that the system can work with both structured and unstructured data when required. If the CAH system gains acceptance, we will be able to collect data centrally and store it in our centralized database, and that data will be accessible to data analysts for further processing. The data sources would then include the application forms filled in by students when applying for admission tests, universities updating their admitted-student databases, the education ministry and the University Grant Commission regarding results and graduation data, the results of the admission tests, etc.

3.3 Data Pre-processing
As our data would be gathered from several structured and unstructured data sources, there could be different types of data structures, and the data could be unorganized.


We plan to preprocess the data to make them comparable. Data analysts have to work with the admission offices of different universities and with government organizations while doing so. They need to represent data in the same manner, with the same types of identifiers and units. Moreover, there could be data redundancy, inaccuracy, and misrepresentation, so, along with the data analysts, a team is needed to work on validating the final data.

3.4 Analyzing Gathered Data
Our proposed solution will perform further analysis of the gathered and categorized data to produce the reports needed by different stakeholders. At this stage, the data would be classified and categorized, and the analysis would be conducted over a vast period of time. From our proposed Data-Driven Decision-Making System, all kinds of reports could be available. From those reports, students can decide on their major or the university where they want to pursue their higher education; universities, on the other hand, can decide on their department expansion. Several other decisions can be taken from the reports generated by the system as well.
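As a purely illustrative sketch of the harmonization described in Sect. 3.3 (our assumption, not the system's actual implementation), the snippet below maps two hypothetical admission extracts with different column names onto one shared schema using pandas; every column name, institution label, and count is invented for the example.

```python
import pandas as pd

# Hypothetical raw extracts from two universities' admission offices.
iub = pd.DataFrame({
    "Year": [2020, 2021],
    "School": ["SETS", "SBE"],
    "Applicants": [5200, 4100],
    "Admitted": [2300, 1900],
})
nsu = pd.DataFrame({
    "admission_year": [2020, 2021],
    "faculty": ["SEPS", "SBE"],
    "total_applications": [9800, 9500],
    "total_admitted": [4700, 4300],
})

# Map each source's columns onto one shared schema before merging.
COMMON = ["university", "year", "school", "applicants", "admitted"]
iub = iub.rename(columns=str.lower).assign(university="IUB")[COMMON]
nsu = nsu.rename(columns={
    "admission_year": "year",
    "faculty": "school",
    "total_applications": "applicants",
    "total_admitted": "admitted",
}).assign(university="NSU")[COMMON]

# A single comparable table, plus one simple derived measure.
combined = pd.concat([iub, nsu], ignore_index=True)
combined["acceptance_rate"] = combined["admitted"] / combined["applicants"]
print(combined)
```

In practice, the validation team mentioned above would also check this merged table for duplicates, missing values, and inconsistent units before any report is generated.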

4 Result and Discussion
While several possibilities exist, we include the following decision-making perspectives in our system. With time and more data gathered in the system, it could work more consummately and precisely to pinpoint the decision and generate the report for the respective stakeholders. The more time passes and the more data accumulates, the more robust the system gets.

4.1 Insight on Decision Makers
Students. Every year, students apply to different universities of their choice. They also consider the popularity of the subject, the acceptance rate, the associated cost, the market trend, etc. In our centralized database, we currently have access to several of these data sets, and some we will be including in the future as well. Below, we show some data analysis; after analyzing, the system will display the final report, which will assist students in making a decision.

Universities. Universities can analyze various school- or department-related data, major-related information, the overall trend, the number of applicants, and the number of students to assess their situation and make decisions about budget allocation, laboratory and library needs, seats in residential halls, expanding any department's seats, appointing teachers, and so on.


Government Access and Decision on R&D. Governments can have access to data from all universities; thus, transparency in admissions, examination results, research done by different universities, and the overall situation can be easily understood. By learning about different majors and departments throughout the country, the government can sponsor or subsidize any particular department according to national needs.

University Grant Commission. The University Grant Commission of Bangladesh (UGC) can publish a consummate and acceptable ranking from the data analysis of our system. Furthermore, they can manage and instruct universities based on the transparent data and generated reports available in this system. Every year, universities face several challenges, accomplish several goals and milestones, conduct research, and so on. Observing all this data, UGC can prepare a ranking and manage other issues that can help the government of Bangladesh, organizations, and prospective students have a clear picture of any university at a glance. Above, we discussed the scope and possibilities; examples and results regarding these are included in Sect. 4.2.

4.2 Detailed Results
In this section, we show some examples of graphs that could be generated in our proposed Data-Driven Decision-Making System and discuss the decisions that could be made from those graphs. In Fig. 2, the first graph represents the trend of students' interest and the percentage change of interest for a university, while the second graph shows the acceptance trend of a university. By understanding how interest is changing, universities can take several decisions about resource allocation, room allocation, etc. for admission tests, the recruitment of teachers and other staff, and so on. With the acceptance graphs of different universities at hand, students can make their admission-related decisions. For example, from the acceptance graph shown in Fig. 2, we can see that from 2015 to 2019 the university accepted 40–52% of applicants. Hence, conceivably, we can say that this university is not for all those who applied, and thus students may include some other universities in their list as a plan B. Even though 2020 and 2021 show somewhat higher acceptance rates, this rise could be a result of the worldwide pandemic. The graphs in Fig. 3 represent the school-wise student distribution of NSU from 2015 to 2021. The bar chart shows the distribution among different schools in a summarized way, and from the line graph the distribution trends of different schools can be observed. By analyzing the trends, universities can take the required decisions about resource allocation, room allocation, etc. for classes, course offerings, seat allocation for offered courses, the recruitment of teachers and other staff, and so on. Figure 4 represents students' interest trends in the different schools within IUB: a decreasing trend for SBE (School of Business and Entrepreneurship) and an increasing trend for SETS (School of Engineering, Technology, and Sciences) and the other schools are observed at IUB. As this is happening over a longer period, we can say that there may be an overall decline in business-related studies in Bangladesh.
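As a small, purely illustrative sketch (ours, not the paper's implementation) of how the two measures plotted in Fig. 2 could be derived from centralized records, the snippet below computes the year-over-year change in applicant interest and the acceptance rate; the counts are invented and do not reproduce the actual IUB or NSU figures.

```python
import pandas as pd

# Illustrative applicant/admission counts for one university (invented numbers).
df = pd.DataFrame({
    "year": [2015, 2016, 2017, 2018, 2019, 2020, 2021],
    "applicants": [4800, 5100, 5350, 5600, 5900, 5200, 5000],
    "admitted": [2100, 2300, 2500, 2700, 2900, 3100, 3050],
})

# Percentage change of applicant interest year over year (first graph in Fig. 2).
df["interest_change_pct"] = df["applicants"].pct_change() * 100

# Acceptance rate per year (second graph in Fig. 2).
df["acceptance_rate_pct"] = df["admitted"] / df["applicants"] * 100

print(df.round(1))
```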


Fig. 2. Student’s interest and acceptance data.



Fig. 3. School wise student distribution from 2015–2021 of a university.


Fig. 4. School wise students’ interest for a university from 2014–2021.

We will get more precise information about this when we have other universities' data on the matter. Poor-quality and outdated curricula, the declining quality of infrastructure, a lack of accountability, etc. can be reasons for the decline in business-related studies, whereas the number of engineering- and technology-related students is increasing. The main reason for this could be the increasing demand of the job sector for CSE and IT graduates. In 2019, a study was conducted by the Asian Development Bank (ADB) on the graduates of Computer Science and Engineering (CSE) and the Institutes of Information Technology (IIT) from nine universities in Bangladesh. The study found that the demand for CSE and IT graduates from the IT/ITES industry is strong and that they have a higher job placement rate (77.1%) than any other graduates in Bangladesh's job market [22]. Improvements in the business studies curriculum and a proper job market for business graduates may be able to overcome this decline. In Fig. 5, the first graph depicts a decreasing trend for all the majors in the business department of IUB. As there is an overall decline in business-related studies in Bangladesh, the number of students in business-related majors may also decrease; universities and government ministries can take appropriate steps by finding the root causes of this decline. The second graph shows the major-wise student distribution of NSU, which reflects the change in demand for different business-related majors. From these kinds of graphs, decisions about resource allocation, course offerings, seat allocation, and the management and recruitment of teachers and staff can be made. Figure 6 represents business-major-related data from NSU and IUB. If we compare the economics majors of both universities, we can see that even though only 5% of NSU's business school is in economics, the total number of students there is higher than at IUB. This data can give much more precise and useful insight when we apply it to all the same kinds of majors throughout the country.


Fig. 5. Major wise student interest of a university for business department from 2014–2021


Fig. 6. Major wise student distribution for business department in 2021 of NSU and IUB


These kinds of comparisons among different universities' majors will help them grow. Universities can make decisions about resource allocation, seat allocation, course offerings, and other management matters by analyzing these kinds of charts. These charts also show which subjects students prefer most for pursuing higher education; from this, universities will be able to understand which subjects they need to focus on more to attract more students. The graphs we discussed above were manually generated based on the data of Independent University, Bangladesh and North South University, but the proposed Data-Driven Decision-Making System will be able to generate similar graphs automatically, which will make decision-making easier for the decision-makers.
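The kind of automated, cross-university chart described above could be produced along the following lines; this is a hedged sketch with invented major names and counts, not output of the proposed system.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical major-wise student counts for the business schools of two universities.
data = pd.DataFrame({
    "university": ["NSU", "NSU", "NSU", "IUB", "IUB", "IUB"],
    "major": ["Finance", "Marketing", "Economics", "Finance", "Marketing", "Economics"],
    "students": [1400, 1100, 180, 520, 430, 120],
})

# Pivot so each university becomes a column, then draw a grouped bar chart
# similar in spirit to Fig. 6.
pivot = data.pivot(index="major", columns="university", values="students")
pivot.plot(kind="bar", rot=0)
plt.ylabel("Number of admitted students (2021)")
plt.title("Major-wise distribution, business departments (illustrative data)")
plt.tight_layout()
plt.savefig("major_comparison.png")
```

Once the centralized database exists, the same script could be re-run on every intake cycle without any manual charting.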

5 Future Work and Conclusion
This study is about improving higher education admission-related decision-making to support managerial processes. By incorporating data analytics into the previously proposed Centralized Admission Hub (CAH), we have mainly tried to improve higher education admission-related decision-making, to make admission processes more effective, and to provide better services to students as well as to higher educational institutions. Our current research is based on manually collected data from only two universities. For the time being, our proposed system uses descriptive data analytics, which only shows what has happened over time through several graphs. We produced graphs for students, the University Grant Commission, the government, and university authorities. These graphs were produced manually and show how they can be helpful for the different stakeholders involved in decision-making. By analyzing graphs of overall admission trends, acceptance ratios, and school- and major-wise student interest, policy makers can take appropriate decisions easily by comparing the situations of different universities (Sect. 4.2). With our proposed system, finding any problem will be easier. If the CAH system gets final approval, then we will have access to centrally stored data that is more organized and accurate. This data would come from all of the higher education institutes in Bangladesh. This huge, properly categorized database will be incorporated into our system to observe the overall trends of different educational institutions and to compare similar departments at other universities across the country. Higher education aspirants countrywide can also reap benefits from this decision-making software. They can learn about acceptance rates and other information about different universities and can compare different departments, recent trends, popularity, etc. This system can also be used internally by universities. The possibilities of different data analyses can give the government, universities, and related organizations an idea about overall trends and decision-making. With the consummate application of Machine Learning, Artificial Intelligence, and proper data mining, the system will be able to predict possible occurrences in the immediate future and plan the required actions.

References 1. Daniel, B.: Big data in higher education: The big picture. In: Kei Daniel, Ben (ed.) Big data and learning analytics in higher education, pp. 19–28. Springer, Cham (2017). https://doi.org/ 10.1007/978-3-319-06520-5_3


2. Riahi, Y., Riahi, S.: Big data and big data analytics: concepts, types and technologies. Int. J. Res. Eng. 5(9), 524–528 (2018). https://doi.org/10.21276/ijre.2018.5.9.5 3. Mahundu, F.G.: E-governance: a sociological case study of the central admission system in Tanzania. Electron. J. Inf. Syst. Dev. Countries 76(1), 1–11 (2016). https://doi.org/10.1002/j. 1681-4835.2016.tb00557.x 4. Machado, C., Szerman, C.: The effects of a centralized college admission mechanism on migration and college enrollment: Evidence from Brazil. In SBE Meetings (2015) 5. Muralidhara, B.L.: Centralized admission: a novel student-centric E-governance process. Int. J. Comput. Appl. 66(23), 41–46 (2013) 6. Saha, P., Swarnaker, C., Bidushi, F.F., Islam, N., Hasan, M.: Centralized admission process: an E-governance approach for improving the higher education admission system of Bangladesh. In: Poonia, R.C., Singh, V., Singh, D., Diván, M.J., Khan, M.S. (eds.) Proceedings of Third International Conference on Sustainable Computing. AISC, vol. 1404, pp. 203–211. Springer, Singapore (2022). https://doi.org/10.1007/978-981-16-4538-9_21 7. Tsai, C.-W., Lai, C.-F., Chao, H.-C., Vasilakos, A.V.: Big data analytics: a survey. J. Big Data 2(1), 1–32 (2015). https://doi.org/10.1186/s40537-015-0030-3 8. Janowski, T., Baguma, R., De, R., Nielsen, M.: ICEGOV2017 special collection: Egovernment innovations in India. In: ICEGOV2017 Special Collection: E-Government Innovations in India. ACM Press, NY (2017) 9. Chatfield, A.T., Shlemoon, V.N., Redublado, W., Darbyshire, G.: Creating value through virtual teams: a current literature review. Australas. J. Inf. Syst. 18(3), (2014). https://doi.org/ 10.3127/ajis.v18i3.1104 10. HSC Results Without Exams: The Pros and Cons. The Daily Star. https://www.thedailystar. net/opinion/blowin-the-wind/news/hsc-results-without-exams-the-pros-and-cons-1975281. Accessed 18 Jan 2022 11. Viloria, A., Lis-Gutiérrez, J.P., Gaitán-Angulo, M., Godoy, A.R.M., Moreno, G.C., Kamatkar, S.J.: Methodology for the design of a student pattern recognition tool to facilitate the teachinglearning process through knowledge data discovery (big data). In: International Conference on Data Mining and Big Data, pp. 670–679. Springer, Cham (2018). https://doi.org/10.1007/ 978-3-319-93803-5_63 12. Schmidt, R., Möhring, M., Maier, S., Pietsch, J., Härting, R.-C.: Big Data as Strategic Enabler - Insights from Central European Enterprises. In: Abramowicz, Witold, Kokkinaki, Angelika (eds.) BIS 2014. LNBIP, vol. 176, pp. 50–60. Springer, Cham (2014). https://doi.org/10.1007/ 978-3-319-06695-0_5 13. Mavroeidakos, T., Tsolis, N., Vergados, D.D.: Centralized management of medical big data in intensive care unit: a security analysis. In: 2016 3rd Smart Cloud Networks & Systems (SCNS), pp. 1–5. IEEE (2016) 14. He, X.J.: Business intelligence and big data analytics: an overview. Commun. IIMA 14(3), 1 (2014) 15. Palem, G.: Formulating an executive strategy for big data analytics. Technol. Innov. Manag. Rev. 4(3), 25–34 (2014). https://doi.org/10.22215/timreview/773 16. Kowalczyk, M., Buxmann, P.: Big data and information processing in organizational decision processes. Bus. Inf. Sys. Eng. 6(5), 267–278 (2014). https://doi.org/10.1007/s12599-0140341-5 17. Hemapriya, M., Meikandaan, T.P., Bist, B.: Repair of damaged reinforced concrete beam by externally bonded with CFRP sheets. Int. J. Pure Appl. Math. 116(13), 473–479 (2017) 18. Hu, H., Wen, Y., Chua, T.S., Li, X.: Toward scalable systems for big data analytics: a technology tutorial. 
IEEE Access 2, 652–687 (2014). https://doi.org/10.1109/ACCESS.2014.233 2453 19. Kubick, W.R.: Big data, information and meaning. Appl. Clin. Trials 21(2), 26 (2012)


20. Chen, H., Chiang, R.H., Storey, V.C.: Business intelligence and analytics: from big data to big impact. MIS Q. 1165–1188 (2012). https://doi.org/10.2307/41703503 21. Asif, R., Merceron, A., Ali, S.A., Haider, N.G.: Analyzing undergraduate students’ performance using educational data mining. Comput. Educ. 113, 177–194 (2017). https://doi.org/ 10.1016/j.compedu.2017.05.007 22. CSE, IT Graduates Getting More Jobs. The Daily Star. https://www.thedailystar.net/frontp age/news/cse-it-graduates-better-fit-jobs-1820080. Accessed 18 Jan 2022 23. Daniel, B.: Big Data and analytics in higher education: opportunities and challenges. Br. J. Edu. Technol. 46(5), 904–920 (2015). https://doi.org/10.1111/bjet.12230 24. Khan, S., Shakil, K.A., Alam, M.: Educational intelligence: applying cloud-based big data analytics to the Indian education sector. In: 2016 2nd international conference on contemporary computing and informatics (IC3I), pp. 29–34. IEEE (2016) 25. The Disruption of Digital Learning: Ten Things We Have Learned. JOSH BERSIN. https:// joshbersin.com/2017/03/the-disruption-of-digital-learning-ten-things-we-have-learned/. Accessed 18 Jan 2022 26. Daniel, B.K., Butson, R.: Technology Enhanced Analytics (TEA) in Higher Education. International Association for the Development of the Information Society (2013) 27. Hanna, M.: Data Mining in the E-Learning Domain. Campus-Wide Information Systems (2004). https://doi.org/10.1108/10650740410512301 28. González-Martínez, J.A., Bote-Lorenzo, M.L., Gómez-Sánchez, E., Cano-Parra, R.: Cloud computing and education: a state-of-the-art survey. Comput. Educ. 80, 132–151 (2015). https://doi.org/10.1016/j.compedu.2014.08.017 29. Assunção, M.D., Calheiros, R.N., Bianchi, S., Netto, M.A., Buyya, R.: Big data computing and clouds: trends and future directions. J. Parallel Distrib. Comput. 79, 3–15 (2015) 30. Liñán, L.C., Pérez, Á.A.J.: Educational data mining and learning analytics: differences, similarities, and time evolution. Int. J. Educ. Technol. High. Educ. 12(3), 98–112 (2015) 31. Herodotou, H., Lim, H., Luo, G., Borisov, N., Dong, L., Cetin, F.B., Babu, S.: Starfish: a self-tuning system for big data analytics. In: Cidr, vol. 11, no. 2011, pp. 261–272 (2011)

Battle Card, Card Game for Teaching the History of the Incas Through Intelligent Techniques
Javier Apaza Humpire, Maria Guerra Vidal, Miguel Tupayachi Moina, Milagros Vega Colque, and José Sulla-Torres
Universidad Nacional de San Agustín de Arequipa, Arequipa, Peru
{japazah,mguerrav,mtupayachim,mvegaco,jsulla}@unsa.edu.pe

Abstract. History plays a vital role in knowing our past and thus improving and not repeating the same mistakes in the future. However, history teaching has some shortcomings, such as memorizing history without interpretation, which is boring, uncomfortable, and non-interactive when students learn only from books and articles. This work aims to cover this gap with the development of Battle Card, a card game based on the Inca Empire and implemented with artificial intelligence algorithms, such as the seek and flee and arrive movement algorithms, the Depth-First Search algorithm, and a state machine. The Scrum framework was applied to manage the project and maintain constant communication with the development team. In addition, Phaser.io was used as the framework for game development on desktop and mobile platforms. Tests were carried out, and the results were finally evaluated through game quality metrics. In conclusion, a fun app about the history of the Incas was obtained with an acceptable level of playability that educators and interested parties can use.
Keywords: Card game · Teaching · Incas · Artificial intelligence · History

1 Introduction
History is a fundamental social science that allows us to know our past and to improve in the future through the experiences and lessons learned from our ancestors and ourselves [1]. In addition, history allows us to know the place where people live and promotes patriotism. History education is therefore important, but in many schools history is taught in such a way that students memorize out of obligation rather than because they want to learn. History education plays a significant role in developing our society and creating a feeling of patriotism, even more so at the school stage. Children and adolescents find it boring to learn from books, stories, and articles. For them, an interactive and effective way of learning is through games with themes related to what they do in class, such as the history of the Incas, their Quechua language [2], and the Spanish conquest, among other topics. In the long term, it is difficult to remember the names of all the Incas unless it is through a game.


To address the lack of didactic, interactive teaching in the historical context and to promote patriotism, we developed a card game. In many schools a traditional model of memorization and repetition is still applied: the teacher delivers the lessons and the students carry out the same activities, which they find boring. In addition, these activities are often carried out "under pressure" to obtain a grade, rather than being activities of exploration, analysis, reflection, and interpretation [3, 4]. The first step in teaching history is to contextualize students about the period's most significant events and characters. The main objective of this work is to develop a card game [5], "Battle Card," where each card represents an Inca and their detailed information. The specific objectives are (1) to create a game capable of dynamically teaching the history of the Inca Empire and (2) to identify the influence of a video game as a pedagogical tool in student learning. The game implements artificial intelligence (A.I.) algorithms so that it gives the impression of playing against a real person, even though the opponent is in fact the game itself; these include movement algorithms and pathfinding algorithms such as Depth-First Search. Likewise, a state machine has been implemented to control the behavior and animation of the game, the cards, and the levels. For the development of Battle Card, the Scrum framework was applied because this is a short-term project that requires a lot of communication among team members. In addition, the Trello tool was used to manage the tasks of the different development sub-teams: the design team is in charge of the storytelling and the design of the cards, while the development team is in charge of programming the movement and artificial intelligence algorithms, the unit, performance, and acceptance tests, and the launch of the video game. The rest of this work is structured as follows: Section 2 presents the related work; Section 3 describes the materials and methods; Section 4 presents the results; finally, the conclusions and future work are presented.


2 Related Work
In card games, each player has a deck of cards, where each card symbolizes a character or being with specific abilities. The objective is to win battles in order to level up and obtain new cards: each battle grants experience that increases the player's level, and winning can yield new cards and more knowledge. According to [6], play is a fundamental means for structuring language and thought; it acts systematically on the psychosomatic balance, enables learning of vital significance, reduces the feeling of gravity in the face of mistakes and failures, invites participation, and develops creativity, intellectual competence, emotional strength, and personal stability. In short, playing constitutes a fundamental strategy to stimulate the integral development of people in general. Currently, video games are one of the most widely used forms of entertainment, although not all games are designed purely to entertain; some also demand specific skills such as adaptability or communication. Video games focused on education can foster the desire to learn, as in [7], where an attractive turn-based card game was created that aims to incorporate knowledge of mathematics and the history of mythology indirectly. The authors considered that the best way to make such a game is to base it on currently successful mobile/tablet games. The game is aimed at primary and secondary school students. The proposed educational content covers some concepts in mathematics, such as adding fractions and adding percentages. The game's central theme focuses on mythological history to reveal key myths, which allows students to practice mathematics and learn about mythology without feeling that they are doing so while having fun in the game. The implementation technology is the Unity game engine with the C# programming language, to guarantee deployment of the game on mobiles and tablets. Regarding the results, the authors mention that a new face-to-face evaluation is needed to observe the students directly and verify the effectiveness of the game; according to the user test, the main objective of the video game is to entertain, and in terms of gameplay the authors achieved the expected result. Although there are card games related to education as well as games built with engines such as Unity, it must be taken into account that there are also games implemented with artificial intelligence, which is widely used when a single player faces the machine. In [8], a turn-based card game using heuristics was implemented in the C# language. This is a strategy game where the player and the computer fight a duel to reduce the opponent's life to 0. The computer is controlled via a GameManager script, which allows consulting the turn and the cards that will be played; to select the best cards, two heuristics are used, one to maximize the allied side, where the state of the game is valued in terms of attack and health, while the other differentiates between sides and seeks the maximum advantage for the computer. In addition, the Monte Carlo tree search algorithm was used, which selects the best move from among different options. The paper by Garcia-Sanchez et al.
[9] proposes the implementation of an automatic player, or autonomous agent, capable of playing Hearthstone based on search-based artificial intelligence techniques that, given a game state, can establish subsequent conditions and take the best possible action. This work also uses Monte Carlo and genetic algorithms to tune the results computationally, improving the agent so that it chooses the move that leads to a better state without considering states beyond the next one. Another work related to the Hearthstone game is [10], where the algorithm's performance is analyzed through parameters such as total wins, time, tree status, heuristics, and the number of iterations. Herrerías [11] developed a Trading Card Game (TCG) with artificial intelligence, created with the Unity game engine; the software has several execution modes: player against player, player vs. A.I., and A.I. vs. A.I. An expert system was developed that emulates the decision-making ability of a human, and the Rete algorithm for efficient pattern recognition was applied to implement a rule production system, obtaining good results in effectiveness and speed. In the same way, [12] developed the Galactic Arena game, which contains an artificial intelligence system capable of playing with any combination of cards. It uses the discrete knapsack algorithm, which looks for the best solution among a finite set of possible solutions to a given problem. On the other hand, artificial intelligence search algorithms are important in games. In [13], the authors addressed Babylon Tower, a three-dimensional puzzle with six discs, each consisting of six columns of small balls along the side in six different colors. The article mentions that two methods can be used to find a solution to the Babylon Tower puzzle, namely disc by disc and column by column, and the method proposed in that research is a depth-first search column by column. Also, Koracharkornradt [14] created a block-based programming game called Tuk Tuk for children at the kindergarten, primary, and secondary levels. In this game, students create computer programs in a block-based language to control a car, complete a specific task, earn money, reach the next level, and unlock new coding blocks. Concepts of Depth-First Search (DFS) and Breadth-First Search (BFS) have been applied in the game so that children can develop their computational thinking.

3 Materials and Methods
The materials used were tools that helped manage the project, mainly Trello, together with the chosen framework, Scrum, and the artificial intelligence algorithms described below.
3.1 Trello
Trello was used to support the Scrum process [15]. This application bases its methodology on the Japanese Kanban work system, which incorporates boards and cards in a physical workspace to coordinate different activities [16]. Trello helps organize and improve the dynamics of virtual teams, since its main attributes are ease of use and versatility. Each requirement defined as a list card must have a name, as well as a label that characterizes the item so it can be identified more quickly; the team members who will develop the task must also be attached. This keeps the order, priority, and weight of each task visible.


After a task has been performed, it is moved to the next list; this gives the task a new status and makes the progress of all requirements visible in real time. When the task has been completed, it is moved to the "Done" list, which means it is finished.
3.2 Artificial Intelligence Algorithms
Artificial intelligence algorithms are applied in the proposed game to solve specific game problems such as the behavior of the cards, that is, player vs. artificial intelligence (A.I.). The algorithms used are (a) movement algorithms, "Seek and Flee" and "Arrive," and (b) search algorithms such as "Depth-First Search."
Seek and Flee algorithm. Seek moves an object from a start position toward a target position with a specific acceleration. Flee is the opposite of Seek, moving away from the target as fast as possible. The advantage of these algorithms is their complexity, which is O(1).
Arrive algorithm. This algorithm moves an object toward a target position while calculating the time optimally so that its arrival is as fast as possible. The advantage of this algorithm is its complexity, which is O(1).
Depth-first search algorithm. This search algorithm finds an element within a tree or graph. It starts from the root node and follows a single path down to its deepest node; it then backtracks and continues with the next path until a search result is obtained. The advantage of DFS is that it requires less memory, since only the nodes on the chosen path are stored.
3.3 Scrum
Scrum is a project management framework with which teams can tackle complex processes and finish on time. It establishes clear objectives and deadlines: the Scrum methodology is time-boxed, and each Sprint has a clearly defined objective and duration. The individual effort of each team member is visible; during the Daily Scrum and Sprint Review meetings, the Product Owner and Scrum Master can track the contribution of each team member during the Sprint. Scrum also improves communication; all team members participate in Scrum meetings and are motivated to express their opinions and contribute to all decisions, so the team can easily communicate and remove obstacles that get in the way.
3.4 Sprints
This section details what was done in the sprints, based on the planning of the project and the resolution of the problem.
Sprint 1. User stories help capture the product owner's requirements; they organize, in conversational form, what the stakeholder requires. Each user story must contain an identifier, a name, a priority, a description, and acceptance criteria. In our project we obtained 13 user stories; an example can be seen in Table 1.

Table 1. User story
Id: HU-001
Name: Start
User: Player
Priority: High
Risk: Medium
Difficulty: 5
Description: As a player, I want to enter the new game room to allow me to start playing against the machine
Acceptance: Redirect to the game room when the player presses the start button; all associated multimedia elements must be loaded

The project was programmed in the JavaScript language with the support of the Phaser.io framework; avatars were used to represent the opponents (see Fig. 1), along with assets such as the cards and the face-down deck from which new cards are drawn in the game (see Fig. 2).

Fig. 1. Game main menu.
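As a rough illustration of how such assets might be wired up in a Phaser 3 scene, a minimal sketch is given below; the asset keys, file paths, and coordinates are assumptions for illustration only, not the project's actual resources.

// Minimal Phaser 3 scene sketch: loads hypothetical card and avatar assets
// and places them on screen. Keys and paths are illustrative only.
class MenuScene extends Phaser.Scene {
  constructor() {
    super({ key: 'MenuScene' });
  }
  preload() {
    // Asset keys and paths are assumptions for this sketch
    this.load.image('avatar-player', 'assets/avatar_player.png');
    this.load.image('avatar-enemy', 'assets/avatar_enemy.png');
    this.load.image('card-back', 'assets/card_back.png');
  }
  create() {
    this.add.image(100, 80, 'avatar-player');
    this.add.image(700, 80, 'avatar-enemy');
    // Face-down deck from which new cards are drawn
    this.add.image(400, 500, 'card-back');
  }
}

const config = {
  type: Phaser.AUTO,
  width: 800,
  height: 600,
  scene: [MenuScene]
};

new Phaser.Game(config);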

Artificial Intelligence Techniques. The game was developed around artificial intelligence techniques. The first algorithms we implemented are "seek" and "flee," which are responsible for approaching or moving away from a target. Figure 3 shows how the target is the board and the card moves to that position [17].


Fig. 2. The initial game of the game.

Fig. 3. Seek move from card to board.

The second algorithm we developed is "arrive," which is in charge of approaching the target, but based on a radius: once the moving object enters that radius, its speed increases. In Fig. 4 the target is the enemy avatar; the card's effect attacks the avatar, and when the effect detects proximity to the target, it increases its speed [17].
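A minimal sketch of how these seek, flee, and arrive steering updates might be written in JavaScript follows; the function and parameter names, and the constants, are our own illustration rather than the project's actual code, and the speed-up-inside-radius behavior mirrors the description above.

// Steering-behavior sketch (illustrative names and constants).
// Each function returns a velocity to apply to the moving object this frame.

function seek(pos, target, maxSpeed) {
  // Move directly toward the target at maximum speed
  const dx = target.x - pos.x;
  const dy = target.y - pos.y;
  const dist = Math.hypot(dx, dy) || 1;
  return { x: (dx / dist) * maxSpeed, y: (dy / dist) * maxSpeed };
}

function flee(pos, target, maxSpeed) {
  // Opposite of seek: move directly away from the target
  const v = seek(pos, target, maxSpeed);
  return { x: -v.x, y: -v.y };
}

function arrive(pos, target, baseSpeed, boostRadius, boostFactor) {
  // As described above: speed up once the object enters the boost radius
  const dx = target.x - pos.x;
  const dy = target.y - pos.y;
  const dist = Math.hypot(dx, dy) || 1;
  const speed = dist < boostRadius ? baseSpeed * boostFactor : baseSpeed;
  return { x: (dx / dist) * speed, y: (dy / dist) * speed };
}

// Example per-frame update for a card sprite moving toward the board:
// const v = seek(card, board, 300); card.x += v.x * dt; card.y += v.y * dt;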


Fig. 4. Arrive Movement of the Effect of the Card to the Enemy.

Sprint 2. A redesign of the cards was carried out. Three card types were created: the first group is the water type, the second fire, and the last ice; each type has specific characteristics that give it advantages over the others. The initial design had three powers, and it was decided to add one more power at the top; each card shows its type as an icon in the upper right corner (see Fig. 5).

a) Water-type card

b) Fire-type card

c) Ice-type card

Fig. 5. Type cards.

The Battle Rules Between Cards. The game rules were added; they describe what each card can do and what actions are allowed in the game. Some of the rules that apply when a card in the deck attacks an enemy card are shown in Fig. 6.


Fig. 6. Rule of attack of a water card with a fire.

Depth-First Search Algorithm. The third algorithm used was depth-first search, which explores all possible routes from each node down to its maximum depth [18]; in the game it helps the machine make the best decision by searching for the best possible play. To determine the best route, the algorithm evaluates each one against the rules of the game; once it has the results of every branch, it assesses which is the best option and returns the best route found to the calling function. These decisions are made by the machine on each of its turns. Figure 7 shows how the algorithm distributes all the cards within a tree; after applying depth-first search, the best route found after analyzing all the results is colored.
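A minimal sketch of this kind of exhaustive depth-first search over candidate plays is shown below; the helper functions legalMoves, applyMove, and evaluate are hypothetical stand-ins for the game's rule evaluation, not the project's actual API.

// Depth-first search over candidate plays (illustrative sketch).
// legalMoves, applyMove, and evaluate are hypothetical helpers that would
// encode the battle rules described above.

function bestMove(state, depth) {
  let best = { move: null, score: -Infinity };
  for (const move of legalMoves(state)) {
    const next = applyMove(state, move);   // simulate the play
    const score = depth <= 1
      ? evaluate(next)                     // score the leaf by the rules
      : dfsScore(next, depth - 1);         // otherwise search deeper
    if (score > best.score) {
      best = { move, score };
    }
  }
  return best.move;
}

function dfsScore(state, depth) {
  const moves = legalMoves(state);
  if (depth === 0 || moves.length === 0) {
    return evaluate(state);
  }
  // Explore each branch fully before moving to the next (depth-first)
  let bestScore = -Infinity;
  for (const move of moves) {
    bestScore = Math.max(bestScore, dfsScore(applyMove(state, move), depth - 1));
  }
  return bestScore;
}

// On the machine's turn: const move = bestMove(currentState, 3);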

Fig. 7. Execution of the depth first search algorithm.

Sprint 3. A state machine was implemented as a decision-making mechanism to control the behavior and animations of the video game. Figure 8 shows the states and their transitions according to the player's different actions. States have also been implemented for the game and for the levels.
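A minimal sketch of such a state machine for a card is given below; the state names and transition events are assumptions for illustration, since the actual states and transitions are those defined in Fig. 8.

// Simple finite state machine for a card (illustrative states and events).
const cardStateMachine = {
  idle:      { select: 'selected' },
  selected:  { attack: 'attacking', deselect: 'idle' },
  attacking: { resolved: 'idle', defeated: 'destroyed' },
  destroyed: {}                        // terminal state
};

function transition(current, event) {
  const next = cardStateMachine[current] && cardStateMachine[current][event];
  if (!next) {
    return current;                    // ignore events with no defined transition
  }
  // Animation hooks could be triggered here, e.g. playAnimation(next)
  return next;
}

// Example: let state = 'idle'; state = transition(state, 'select'); // 'selected'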


Fig. 8. The state machine for the cards.

Three levels were implemented, corresponding to the natural regions of the coast, mountains, and jungle belonging to the Tahuantinsuyo, the empire of the Incas. In each level the difficulty increases because the number of available cards decreases, restricting the player's chances of drawing another card from the deck to save themselves from an enemy attack. For the first level (coast) the number of available cards is 6, for the second level (mountain) it is 5, and for the third level (jungle) it is 3. To pass a level, it is necessary to beat all the enemy cards on the board. The three levels are shown in Figs. 9, 10, and 11.

Fig. 9. Videogame level 1 (coast).


Fig. 10. Videogame level 2 (mountain).

Fig. 11. Videogame level 3 (jungle).

The video game instructions were added to the main menu; they indicate the controls, the game rules for each type of card (water, ice, and fire), the battle rules, and, finally, how to pass a level (see Fig. 12).


Fig. 12. Videogame instructions.

4 Results
For this work, three types of tests were carried out: unit tests, performance tests, and acceptance tests.
4.1 Unit Tests
The unit tests were developed using Jasmine, a JavaScript-based library for unit testing that consists of several parts: the test runner, the test framework, and plugins. All unit tests were successful; the methods used within the game behaved correctly. Figure 13 shows the results of the unit tests.

Fig. 13. Unit test results.
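A small Jasmine spec of the kind that could cover the battle rules is sketched below; the resolveAttack function, the water-beats-fire cycle, and the power arithmetic are hypothetical stand-ins for the game's actual methods and the rules of Fig. 6.

// Illustrative Jasmine spec; resolveAttack and its rules are assumed, not the
// project's real API.
function resolveAttack(attacker, defender) {
  // Hypothetical type-advantage cycle and power reduction
  const beats = { water: 'fire', fire: 'ice', ice: 'water' };
  const winner = beats[attacker.type] === defender.type ? attacker : defender;
  const weakened = { ...defender, power: Math.max(0, defender.power - attacker.power) };
  return { winner, defender: weakened };
}

describe('battle rules', function () {
  it('lets a water card defeat a fire card', function () {
    const attacker = { type: 'water', power: 3 };
    const defender = { type: 'fire', power: 3 };
    const result = resolveAttack(attacker, defender);
    expect(result.winner).toBe(attacker);
  });

  it('keeps card power non-negative after a battle', function () {
    const attacker = { type: 'ice', power: 2 };
    const defender = { type: 'water', power: 5 };
    const result = resolveAttack(attacker, defender);
    expect(result.defender.power).toBeGreaterThanOrEqual(0);
  });
});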

4.2 Performance Tests
The performance tests were aimed at testing the performance of the Battle Card game and identifying the delay time in the execution of the game. The tests were developed using the Puppeteer tool, a Node.js library that provides a high-level API for automating actions in Google browsers: Chrome and its open-source version, Chromium. The result obtained for the game's loading time is 5.386 s. Figure 14 shows the results of the performance tests.

Fig. 14. Test of performance results.
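A load-time measurement of this kind could be scripted with Puppeteer roughly as follows; the local URL is a placeholder assumption, not the project's actual deployment address.

// Puppeteer sketch for measuring page load time (URL is a placeholder).
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  const start = Date.now();
  // Wait until network activity settles so that game assets are included
  await page.goto('http://localhost:8080/battlecard', { waitUntil: 'networkidle0' });
  const loadTimeMs = Date.now() - start;

  console.log(`Game loaded in ${(loadTimeMs / 1000).toFixed(3)} s`);
  await browser.close();
})();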

4.3 Acceptance Tests
The acceptance test is the last test action before deploying the software. Its objective is to check whether the game is ready and can be used by users to perform the functions and tasks for which it was designed. For this test, the nine user stories were carried out. To verify the initial results, a series of evaluation metrics was used to measure the quality of the game [19], testing all its functions and mechanics and thus controlling its quality. The metrics used are detailed in Table 2. In accordance with this, multiple interviews were conducted with schoolchildren between the ages of 10 and 12 at the primary level to evaluate the video game; the surveys covered satisfaction as well as a pre-test and post-test, the latter to assess the game's usability. The analysis of the satisfaction survey carried out with the schoolchildren gave the following results: 80% found the game attractive and 60% clearly understood its dynamics. When asked about features to improve, the main ones mentioned were the graphic design and the visual effects. Finally, 100% of the schoolchildren said they would play it again, leading to the conclusion that the proposed educational game is engaging.

Table 2. Result of gameplay metrics

Effectiveness
- Target effectiveness. Purpose: What percentage of goals and challenges have been successfully achieved? Result: X = 1 − 16.66% = 83.33%
- Goal completeness. Purpose: What percentage of goals and challenges have been completed? Result: X = 5/6 = 0.83

Efficiency
- Finish time. Purpose: How long does it take for the player to achieve a goal? Result: X = 7 (min.)
- Efficiency relative to the user. Purpose: How efficient is an expert-level player versus a new player? Result: X = 6/10 = 0.6

Flexibility
- Accessibility. Purpose: What percentage of goals are achieved using forms of interaction other than those used by default? Result: X = 0/6 = 0

Finally, averaging all the factors described in the metrics yielded an initial result of 71.8% effectiveness and 60% efficiency; in the flexibility metrics there are no goals achieved through different forms of interaction; the safety metric reached 100%, fault tolerance 90%, and, finally, satisfaction 80%.

5 Conclusions
A Battle Card game was developed based on the Inca Empire and implemented with artificial intelligence algorithms, namely the seek and flee and arrive movement algorithms, the depth-first search algorithm, and a state machine. The Scrum framework was applied to manage the project and maintain constant communication with the development team. In addition, Phaser.io was used as the framework for game development on desktop and mobile platforms. Tests were carried out and the results were evaluated through game quality metrics, obtaining 71.8% effectiveness and 60% efficiency; in the flexibility metrics there are no goals achieved through different forms of interaction; safety reached 100%, fault tolerance 90%, and satisfaction 80%. The results showed that educators and specialists could use the video game "Battle Cards" as a pedagogical tool in student learning. As future work, it is suggested to apply other artificial intelligence algorithms that optimize the search and selection operations. During the project's development, several points to consider as future work were identified: the design of the cards could be improved in appearance and adapted to any device, since the game could be developed for mobile devices on both Android and iOS, and the structure of the instructions and of the board will have future versions depending on the difficulty levels that may be added to the game.


A virtual assistant could be added to help with a detailed explanation of the game, so that users can clearly understand how it works. In new versions, more rivals could be added to the game, and new functionality could be developed to play online against opponents in Peru or other countries. A multiplayer option could also be added, that is, two players against the machine.

References
1. Rahimi, F.B., Kim, B., Levy, R.M., Boyd, J.E.: A game design plot: exploring the educational potential of history-based video games. IEEE Trans. Games 12(3), 312–322 (2020). https://doi.org/10.1109/TG.2019.2954880
2. Zapata-Paulini, J.E., Soto-Cordova, M.M., Lapa-Asto, U.: A mobile application with augmented reality for the learning of the Quechua language in pre-school children. In: 2019 IEEE 39th Central America and Panama Convention, CONCAPAN 2019. IEEE (2019). https://doi.org/10.1109/CONCAPANXXXIX47272.2019.8976924
3. Gallego, F.A., Näslund-Hadley, E., Alfonso, M.: Changing pedagogy to improve skills in preschools: experimental evidence from Peru. World Bank Econ. Rev. 35, 261–286 (2021). https://doi.org/10.1093/WBER/LHZ022
4. Samuelsson, J.: History as performance: pupil perspectives on history in the age of 'pressure to perform.' Education 3–13(47), 333–347 (2019). https://doi.org/10.1080/03004279.2018.1446996
5. Phillip, N.A., Permana, S.D.H., Cendana, M.: Modification of game agent using genetic algorithm in card battle game. In: IOP Conference Series: Materials Science and Engineering, vol. 1098 (2021). https://doi.org/10.1088/1757-899x/1098/6/062011
6. Caserman, P., et al.: Quality criteria for serious games: serious part, game part, and balance. JMIR Serious Games 8(3), e19037 (2020). https://doi.org/10.2196/19037
7. Williams Monardez, F.J.: Clash of Myths: diseño de un juego de cartas (2018)
8. Niklaus, J., Alberti, M., Pondenkandath, V., Ingold, R., Liwicki, M.: Survey of artificial intelligence for card games and its application to the Swiss game Jass. In: Proceedings 6th Swiss Conference on Data Science, SDS 2019 (2019). https://doi.org/10.1109/SDS.2019.00-12
9. García-Sánchez, P., Tonda, A., Fernández-Leiva, A.J., Cotta, C.: Optimizing hearthstone agents using an evolutionary algorithm. Knowl.-Based Syst. 188, 105032 (2020). https://doi.org/10.1016/j.knosys.2019.105032
10. Marcos Alvés, Á.F. de., et al.: Análisis de la técnica 'Monte Carlo Tree Search' aplicada como Inteligencia Artificial en el videojuego 'Hearthstone: Heroes of Warcraft' (2020)
11. Herrerías Santos, J.M., María, J.: Diseño y desarrollo de un juego de estrategia de cartas coleccionables (TCG) con inteligencia artificial (2018)
12. Fin De Grado, T., Gimeno, A.B.: Galactic Arena, videojuego de cartas coleccionables de un solo jugador desarrollado en Unity 3D (2017)
13. Rahmat, R.F., Harry, Syahputra, M.F., Sitompul, O.S., Nababan, E.B.: The depth-first search column by column approach on the game of Babylon Tower. In: Proceedings of the 2nd International Conference on Informatics and Computing, ICIC 2017 (2018). https://doi.org/10.1109/IAC.2017.8280613
14. Koracharkornradt, C.: Tuk Tuk: a block-based programming game. In: IDC 2017 - Proceedings of the 2017 ACM Conference on Interaction Design and Children (2017). https://doi.org/10.1145/3078072.3091990
15. Parsons, D., Thorn, R., Inkila, M., MacCallum, K.: Using Trello to support agile and lean learning with Scrum and Kanban in teacher professional development. In: Proceedings of 2018 IEEE International Conference on Teaching, Assessment, and Learning for Engineering, TALE 2018. IEEE (2019). https://doi.org/10.1109/TALE.2018.8615399
16. Pisoni, G., Hoogeboom, M.: Investigating effective dynamics of virtual student teams through analysis of Trello boards. In: Proceedings of ICETA 2019 - 17th IEEE International Conference on Emerging eLearning Technologies and Applications. IEEE (2019). https://doi.org/10.1109/ICETA48886.2019.9039972
17. Lucas, S.M.: Computational intelligence and AI in games: a new IEEE transactions. IEEE Trans. Comput. Intell. AI Games 1(1), 1–3 (2009). https://doi.org/10.1109/TCIAIG.2009.2021433
18. Sunil, B., Naveen Kumar, M.R., Gowrishankar, B.N., Prema, N.S.: A comparative study on various search techniques for gaming applications. In: Ranganathan, G., Chen, J., Rocha, Á. (eds.) Inventive Communication and Computational Technologies. LNNS, vol. 89, pp. 1201–1205. Springer, Singapore (2020). https://doi.org/10.1007/978-981-15-0146-3_116
19. Amacifuén Cerna, L.B.: Métricas pedagógicas y tecnológicas para medir la calidad de los juegos on-line. Univ. San Ignacio Loyola (2017)

Barriers for Lecturers to Use Open Educational Resources and Participate in Inter-university Teaching Exchange Networks
Paul Greiff, Carla Reinken, and Uwe Hoppe
Osnabrück University, Osnabrück, Germany
[email protected]

Abstract. Concepts of Inter-University Teaching Exchange Networks (IUTENs), i.e. the exchange of courses between different universities, as well as the import of Open Educational Resources (OERs), offer the potential for innovative teaching-learning scenarios in higher education. Depending on the organization, discipline, and objectives, this opens up a variety of possibilities for shaping teaching. However, this change places new challenges on lecturers and involves uncertainties. For this reason, it is necessary to investigate barriers to these new concepts, as they could lead to a lack of acceptance and thus a low intention to participate. We examine barriers by conducting interviews with 19 university lecturers and analyzing them using qualitative content analysis. The survey aims to identify and categorize barriers related to the use of OERs as well as to participation in IUTENs from the perspective of teachers. Key findings show that the main barriers to using OERs are no perceived added value and issues related to quality. In terms of IUTENs, it appears that adherence to traditional teaching formats is the main barrier to participation in exchange networks. The categories can provide important indications of areas that need special attention in order to promote lecturers' willingness to participate in IUTENs and to use OERs.
Keywords: Barriers · Open Educational Resources · Teaching exchange networks · Lecturers

1 Introduction
The digitalization of analog processes has advanced in recent decades to become a ubiquitous component of social and work life. The way people communicate, exchange information, and develop and understand disciplinary knowledge has changed due to the development and availability of digital technologies [1]. The degree of digitalization is also steadily advancing in the field of university teaching, and it includes socio-technical, socio-economic, and didactic changes [2]. In higher education institutions (HEIs), the digital transformation (DT) is characterized by a high degree of dynamism, accelerated digitalization processes, and changes in the transfer of knowledge and expertise. Further characteristics are changed teaching and learning scenarios, role and requirement profiles, and legal frameworks [3]. Many technical solutions are available which support the digitalization in HEIs, and those technologies are proven to exert a decisive impact on types of teaching [4]. Digitalization in teaching opens up new possibilities, and many advocates emphasize the potential of low-threshold use of digital resources to share knowledge actively and, above all, cooperatively [5]. In this context, cross-university teaching networks can make teaching materials available to many students and lecturers from different institutions. Moreover, student heterogeneity has increased, as the Bologna Process promotes a wide range of study programs for students regardless of their social and educational background in European HEIs [6, 7]. As the heterogeneity of students increases, so does the individuality of their study needs and goals [8]. The growing number of students and their higher expectations, as well as the increased demand for internationalization at HEIs, is contrasted with a relatively constant number of teachers. In order to adequately meet the individual goals, expectations, and needs of students, IUTENs can serve as one solution approach. Participation in IUTENs, which means the exchange of courses between universities, brings many advantages [9]: the number of courses that can be chosen increases, synergies can be created through the use of distributed courses and the knowledge behind them, and a division of labor among lecturers becomes possible.

Furthermore, the global development of the OER movement in terms of learning content and educational resources plays a key role in the domain of education [10]. OERs have gained increased attention in recent years due to their potential to transcend educational, demographic, economic, and geographic boundaries and to promote lifelong as well as personalized learning [11]. The term OER was first introduced at a conference hosted by UNESCO in 2000 and was promoted in the context of providing free access to educational resources on a global scale. The most common definition of OER is "digitized materials offered freely and openly for educators, students and self-learners to use and reuse for teaching, learning and research" [12]. OERs have emerged as a concept with great potential to support educational change, because their essential advantage is that they can be shared very easily worldwide via the Internet [13]. It is important to note that there is only one key differentiator between an OER and any other educational resource: its license. Thus, an OER is simply an educational resource that includes a license facilitating reuse, and possibly adaptation, without the need to first obtain permission from the copyright holder [13]. In summary, OERs can range from individual images to entire courses, with licensing determined by the author and requiring no additional agreement with others. IUTENs, on the other hand, involve exclusively whole courses and are a form of institutionalized collaboration: IUTENs cooperate not only in the exchange of courses but also in exams, evaluations, and so on. These new teaching formats bring changes but also pose new challenges for lecturers [14]. In this context, it is important to investigate possible barriers, because based on them, potential solutions can be developed and obstacles can be overcome. This can facilitate the implementation of OERs and IUTENs and strengthen their acceptance by the stakeholders.

Therefore, the aim of this research is to survey lecturers of different universities about barriers that are likely to have an influence on the successful introduction and sustainable use of OERs and IUTENs. For the described problem, the following research question is formulated: What potential barriers do lecturers see by using OERs and participating in IUTENs? In order to answer this research question, 19 semi-structured, qualitative interviews were conducted with lecturers from different universities to collect potential barriers with regard to the adoption and use of OERs and IUTENs. The willingness to participate in an IUTEN, as well as the use of OERs, is currently still very restrained. OERs, for example, are very high on the agenda of social and inclusion policies and are supported by many stakeholders in education; however, OERs have still not reached a critical threshold among lecturers for use in higher education [15]. The perspective of the lecturers could shed light on the reasons for these problems. The barriers identified can provide educational institutions with information on which framework conditions need to be optimized or created, and specific requirements can be derived that should be taken into account when implementing IUTENs and OERs.

2 Open Educational Resources and Teaching Exchange Networks
Digitalization in higher education sets a trend of digital support in students' and teachers' daily working routines and thus has a decisive impact on the type of teaching [16]. OERs and IUTENs open up new ways of presenting and accessing teaching materials, designing courses, and structuring and processing studies. Such new teaching formats and innovations in teaching and learning require new teaching-learning competencies from lecturers and confront them with new challenges [14]. In this process, lecturers are expected to engage in something that triggers uncertainty and which they are not obliged to do. They are challenged to change their teaching habits (towards a more moderating and supportive teaching style) and to try new forms of assessment. For technical support, it may be necessary to work in cooperation with external departments and prepare courses together [17]. At the same time, their teaching becomes more transparent, as both OERs and exchange networks can reach a larger student body than the traditional lecture in the auditorium or classroom [17]. For this reason, there is a need to examine barriers, because they could lead to a lack of acceptance and therefore to a low intention to use. Barriers in digital higher education teaching are defined as factors "[that] decrease the desire to teach online by discouraging, constraining, providing sources of dissatisfaction, and reducing the reward/effort ratio" [4]. New learning formats in higher education require a certain level of user acceptance in order to survive sustainably in the education sector and, above all, to guarantee long-term added value for students and teachers [18]. Any barrier that can be identified, and thus avoided preventively before the introduction of an innovation, contributes to achieving this goal. Accordingly, to accomplish the maximum benefit of novelty in university teaching, overcoming barriers is necessary.

To meet the current developments in teaching at universities, it is becoming increasingly important to create IUTENs and make teaching materials available to all students and teachers from different universities and even across different countries [11]. Multimedia information and communication systems (such as Learning Management Systems) open up new options for designing cooperative teaching and learning environments for university education. There is no generally valid definition for university networks within higher education teaching and research and, moreover, they have hardly been researched beyond case studies so far [19]. In the business domain, social and economic exchanges are referred to as exchange rings [20]; in this context, the term "barter" was used to refer to exchange and was defined as a moneyless market exchange [21]. Following this principle, IUTENs can also be considered a form of barter. IUTENs are not a new topic per se. Already in the 1990s and the early 2000s, there were projects in Germany such as VIROR, ULI, or RION, which were often funded by the state to create such teaching associations. The response was in many cases very positive, but after the funding period many projects were not continued for various reasons. In order to continue supporting students in their individual needs and goals with the resources provided, additional teaching formats and teaching materials are offered. Participation in inter-university teaching exchange networks, i.e. the exchange of courses on key topics at different colleges and universities, can have a variety of effects: increasing the number of courses, creating synergies by leveraging distributed course offerings and the teachers' knowledge, and enabling a division of labor among the participating lecturers [10]. This happens in a bottom-up process, which requires commitment and motivation from lecturers, in the sense of an exchange culture of "give and take". The orientation towards OERs such as single or complete lectures, seminars, videos, podcasts, etc. offers the possibility to create one's own materials and make them available under license on a publicly accessible platform. Course materials can be individually tailored by lecturers and recommended to students. Linked to the topic of OERs, questions must be asked whether and how new forms of teaching and learning can be designed, how studies with compulsory attendance can be supplemented by other forms of teaching, and how competencies can be bundled, profiles expanded, or main focuses strengthened through inter-university cooperation. While OERs serve to enrich individual courses and can be used thematically, IUTENs are permanently organized and can establish themselves as a teaching format due to their long-term nature. In this context, Covid-19 is proving to be a driving force. Due to the ongoing pandemic situation, which almost excludes face-to-face courses with students, lecturers are challenged to operate under the prescribed conditions; that means continuing to offer students "online" courses in a way in which studying continues to be satisfying and engaging.

3 Method
For this study, we interviewed 19 lecturers about their use of OERs and their participation in IUTENs. The focus of the survey was the production and export or import of OERs, as well as the willingness to participate in IUTENs or the benefits experienced from participating. The survey has a strongly explorative character, as the goal is to generate many diverse perspectives in the form of freely formulated answers. This method was chosen because it is fundamental in an exploratory approach to ask for the opinions and expectations of the participants as freely and in as unbiased a manner as possible. The interviews were conducted between December and February 2019/2020. Of the participants, seven were female and twelve were male. Among the disciplines represented are, for example, biology, law, business informatics, human sciences, and mathematics. All interviews were conducted in German, as this is the native language of the interviewees; this ensured that all questions could be easily understood and answered. The interviews lasted an average of 45 min. The interview guideline starts with questions about the profession of the interviewee and the format of her/his own teaching. This is followed by questions about incentives and barriers for the use of external materials (OERs) and IUTENs. Since incentives have already been considered and published in previous research, this paper focuses exclusively on barriers. An overview of the questions relevant to this paper can be found in Table 1.

Table 1. Excerpt of the interview questions
- Initial question: In which department and subject area do you work?
- Initial question: What form of courses do you hold? (lecture, seminar, webinar, etc.)
- Initial question: How many students do you teach on average in a semester?
- Initial question: Are you familiar with the term OER? Do you already use OERs?
  Follow-up: If no: explanation of open educational resources. If yes: How? Do you also share your own content? Where do you get your content? Where do you publish your content? (in closed university systems, e.g. Stud.IP with login, or open channels, e.g. YouTube) For what reason? If no: How open are you to the idea of using and sharing other materials (OERs), and why?
- Initial question: Where do you see possible barriers for the use of OERs?
- Initial question: Do you know of any inter-university teaching exchange networks?
  Follow-up: If yes: Are you a participant/initiator in an exchange network? What general conditions do you experience? (advantages/disadvantages) If no: a brief explanation of exchange networks is given and an example is mentioned. What hinders you from participating in an exchange network?
- Initial question: Where do you see potential barriers to participate in an inter-university teaching exchange network?


Respondents were asked whether they were aware of OERs and IUTENs or whether they use OERs or participate in an IUTEN. Subsequently, the main question was about barriers that hinder the use of OERs or participation in an exchange network. All interviews were recorded, transcribed, and analyzed inductively according to Mayring's qualitative content analysis [22]. The process of analysis was divided into four phases. During transcription, a rough word count and classification were created to identify statements about OERs and IUTENs. First, the responses were classified according to the questions and then paraphrased (phase 1) [22]. This was followed by an initial abstraction through generalization to allow gradual reduction and subsumption of subcategories at a scientific level [22]: paraphrases were generalized to core sentences at an appropriate level of abstraction (phase 2). Subcategories were inductively formed from the given statements in order to include as much content as possible in the analysis. In the third phase, the first reduction was made by shortening semantically identical core sentences and those perceived as not contributing significantly to the content. Finally, as the second reduction, core sentences with similar or identical meanings were grouped together and thus classified into categories (phase 4). In doing so, care was taken to ensure that the categories are as precise, disjunctive, and comprehensive as possible [23]. In Table 2, these four phases are outlined by a concrete example taken from the survey.

Table 2. Example of the four-phase procedure
P1: Paraphrase: "We usually have certain standards at the locations with regard to the scope of the course in credit points. And that doesn't necessarily harmonize"
P2: Core sentence: Standardization of credit points is not consistent at all universities
P3: First reduction: No standardized allocation of credit points
P4: Category (2nd reduction): Unclear credit recognition

4 Results
4.1 Findings
The focus was deliberately placed on barriers for lecturers in order to be able to derive solutions that can increase the use of OERs as well as participation in IUTENs. Results and findings of our qualitative analysis of the interviews are presented in Table 3. The table contains the main categories that were derived from the set of questions as well as the categories that emerged from the core sentences after the first and second reduction. In addition, a short explanation is given for each category and the corresponding number of respondents who named the respective category. Categories that were only mentioned once are not covered in Table 3.


Table 3. Categories of barriers (mentions counted once per interview)

Main category: OER barriers
- No added value: The added value of OERs is not evident to lecturers (7 mentions)
- Quality issues: Doubts about quality, quality assurance, and the various quality expectations of OERs (6 mentions)
- Lack of knowledge: Lack of knowledge about conditions, availability, and general information about OERs (4 mentions)
- Fear of being overwhelmed: Fear of being overwhelmed by an intransparent search structure and potentially by an overwhelming offer of OERs (4 mentions)
- Adherence to traditional teaching formats: Lecturers do not want to give up the existing pool of tasks and traditional teaching (3 mentions)
- High effort: Fear of high effort due to the use and creation of OERs (3 mentions)
- Fear of criticism: Fear of criticism of own OERs by colleagues and peers (2 mentions)

Main category: IUTEN barriers
- Adherence to traditional teaching formats: Existing teaching formats (face-to-face teaching) are preferred and a change to this is met with resistance (8 mentions)
- Unclear credit recognition: Unresolved credit recognition (credits, teaching load) (3 mentions)
- Imbalance between production and use: Lecturers fear an imbalance between production and use (free-rider behavior) (3 mentions)
- Individuality: Lecturers have specific requirements for their own lectures and want to bring in their own interests (2 mentions)
- Discipline specificity: Own lectures too specific for exchange transfer (2 mentions)
- Quality issues: Own quality standards too high, unequal quality expectations (2 mentions)

As can be seen in Table 3, the category no added value is the most frequently mentioned OER barrier, with seven mentions. The interviewed lecturers therefore find that no added value is to be expected from the use and creation or sharing of OERs, or at least they do not currently see it. In second place is the category quality issues, with six mentions: lecturers stated that they had concerns about the quality of OERs, were often uncertain about compliance with quality standards, and pointed out that each lecturer has a different understanding of quality. With four mentions, the category lack of knowledge follows in third place; the respondents stated that they simply had no knowledge about OERs, their availability, and their creation. Also in third place with four mentions is the category fear of being overwhelmed, which includes statements indicating that lecturers fear that both the search mechanism for OERs and an excessive number of search results would be a barrier to their use. Fourth place, with three mentions each, is shared by two barrier categories, adherence to traditional teaching formats and high effort. The former refers to the unwillingness to abandon traditional formats and existing task pools; the latter, as the name suggests, refers to the fear of high and disproportionate effort. The last category formed from more than one mention is fear of criticism, with two mentions: the lecturers surveyed stated that they would sometimes expect harsh criticism of their OERs from colleagues and regard this as a barrier.

Among the barriers to participation in IUTENs, the category adherence to traditional teaching formats ranked first with a total of eight mentions. Here, similar to the corresponding OER barrier, the lecturers surveyed indicated that they wanted to stick to traditional teaching formats; the importance of face-to-face teaching in particular was emphasized. The two second places are the barrier categories unclear credit recognition and imbalance between production and use, which, with only three mentions each, already drop significantly from first place. Unclear credit recognition means that lecturers consider it a barrier to participate in an IUTEN as long as it is not clearly regulated how credit points are granted to students and how exchanged courses are recognized with regard to their own teaching loads. In the case of imbalance between production and use, the respondents expressed the fear that some colleagues make more use of an IUTEN (use external courses) than they provide themselves (produce). Third place is shared by three categories of barriers with two mentions each. First, there is individuality: lecturers stated that they had specific requirements for their teaching and wanted to bring in their own interests. In discipline specificity, lecturers saw another barrier in the fact that their own courses were too specific for exchange transfer. The last category is quality issues, where lecturers stated that their own quality standards are too high and that quality expectations differ greatly between exchange lecturers.

4.2 Discussion
The findings presented above are discussed below, focusing on the three barriers with the most mentions. The procedure is analogous to the findings, starting with the barriers to the use of OERs and then the most important barriers in relation to IUTENs. We briefly explain the barrier categories we consider most noteworthy, with at least three mentions, and provide possible guidance for resolving them. The first rank within the OER barriers is assigned to the category no added value, with seven mentions. As already explained, this category means that lecturers do not recognize the added value of OER usage. The obvious solution would be to inform lecturers about the added value of OERs in relation to themselves, the students, and the institution concerned; for example, a targeted information campaign with information events, flyers, and accompanying marketing could help. With six mentions, the quality issues category takes second place within the OER barriers. Here, doubts about the quality and the non-uniformity of the quality of OERs from the lecturers' point of view are in focus. A possible solution would be to introduce an elaborate quality assurance instrument embedded in a quality management model to support the introduction of OERs at an institution; informing lecturers about such quality assurance tools is also a crucial step toward dismantling this barrier. In third place, with four mentions each, are lack of knowledge and fear of being overwhelmed. For the former, as already described, an information campaign would help to break down the existing barrier; a leaflet with the most important information about OERs is also conceivable. The barrier fear of being overwhelmed could be minimized, for example, by specific development of media competence or by a workshop on targeted OER search. In the area of IUTEN barriers, the top barrier, with eight mentions, was adherence to traditional teaching formats: lecturers prefer traditional teaching and therefore do not see the need to participate in inter-university exchange networks. How to break down this barrier is not as evident as in the previous areas, and more research is needed on its exact reasons in order to be able to intervene specifically. Nevertheless, systematically educating lecturers about advantages such as a reduction of workload or an expansion of an institution's curriculum seems to be the first logical step to counteract this barrier. With three mentions each, the barriers unclear credit recognition and imbalance between production and use are in second place. For both barriers, clear institutional policy decisions are necessary, as are specific, binding agreements between the universities involved. Only when credit points are clearly defined and a distribution key for a generally fair distribution of the production and use of teaching modules is available can these barriers be reduced.

5 Conclusion
In this paper, we have collected the barriers perceived by lecturers to the use of OERs and to participation in IUTENs by means of qualitative interviews and have subsequently categorized them. To answer the research question (What potential barriers do lecturers see by using OERs and participating in IUTENs?), we reduced the mentioned barriers to core sentences and then categorized them. These categories provide an initial insight and also give indications of areas where improvements need to be made or where both structural and personal support is needed. Barrier research in the field of OERs and IUTENs can benefit from this, as can universities and institutions that plan to use OERs in the future or to participate in exchange networks. Figure 1 shows an overview of the surveyed barriers.

Fig. 1. Overview of the results: the OER barriers (no added value, quality issues, lack of knowledge, fear of being overwhelmed, adherence to traditional teaching formats, high effort, fear of criticism) and the IUTEN barriers (adherence to traditional teaching formats, unclear credit recognition, imbalance between production and use, individuality, discipline specificity, quality issues), each shown with its number of mentions.

A total of seven different categories were formed for the OER barriers and a total of six for the IUTEN barriers. The three most important categories for the use of OERs from the perspective of lecturers are no added value (7), quality issues (6) and, with four mentions each in third place, lack of knowledge and fear of being overwhelmed.

Among the categories for participation in IUTENs, adherence to traditional teaching formats is by far the most important barrier with eight mentions. This is followed by unclear credit recognition and imbalance between production and use with three mentions each. Certain overlaps are noticeable in this regard, as the categories quality issues and adherence to traditional teaching formats are found among both the OER barriers and the IUTEN barriers. However, the ranking of these barriers for the use of OERs is reversed in relation to participation in IUTENs; thus, for example, the barrier quality issues is considered more important for OERs than for IUTENs. As a first step, the survey aims to map possible barriers to the use of OERs and participation in IUTENs from the lecturers' perspective. Recommendations for action can only be derived to a limited extent, as the focus here is initially on identifying the barriers. It should be clearly emphasized that the results only give a small insight from the perspective of the lecturers. Further research, for example in the quantitative field, and also the involvement of further stakeholders, is necessary to reveal even more barriers in this context and then to be able to counteract them decisively in the next step. In addition, more detailed insights into the causes of the barriers would be needed in order to be able to develop targeted solutions. This approach would reduce the scope for interpretation. Our study has attempted to identify and categorize the first barriers of what we consider to be the most important stakeholder group, so that the expansion of OERs and IUTENs at universities can be approached in an accessible manner. We hope to have made a contribution to the rapid implementation of these concepts.


Lessons Learnt During the COVID-19 Pandemic: Preparing Tertiary Education in Japan for 4IR

Adam L. Miller
Aichi Shukutoku University, Hoshigaoka Campus, Nagoya, Japan
[email protected]

Abstract. This paper aims to explore the ways in which tertiary education was disrupted in Japan by the COVID-19 pandemic, and what new skills were acquired by teachers and students during this process. The paper will go on to explore in detail how these skills may well prepare both stakeholders for the uncertain future that 4IR (the Fourth Industrial Revolution) may have in store. The value of this study lies in its attempt to give an insight into how this global crisis has the potential to leave not only despair in its wake, but also the opportunity to improve learning environments for all; it will specifically look at the new teaching methods that were hurriedly put in place, and how familiarizing oneself with these new technologies and tactics could be of great benefit in the 4IR future that lies ahead. To gain a rounded understanding of this field of study, key texts written before, during and in reflection of the pandemic will be explored. These texts are largely focused on either 4IR, teaching approaches used during the pandemic, or a combination of these two fields of study. Paired with this, a small-scale questionnaire was distributed amongst 8 HE (Higher Education) teachers based in Japan; the questionnaire collected both quantitative data for statistical analysis and qualitative data for thematic analysis. The aim of this study is to explore the thoughts and opinions of Japanese HE teachers and the existing literature, in the hope of discovering successful teaching methods that could be of benefit to both teachers and students as the influence of 4IR continues to increase. Furthermore, this study will also look at any apprehension or fear associated with this influx of modern technology, and how these fears may be alleviated or avoided altogether.

Keywords: Fourth Industrial Revolution · Higher Education · Japan · Tertiary education

1 Literature Review

Although all of the literature that will be referred to throughout this study is connected through the theme of improving teaching practices, it can largely be divided into three broad sections: texts written before the COVID-19 pandemic, texts written during the pandemic, and texts that focus on the impact the pandemic had and how that may be used to map out future plans. Each of these categories will be supported by one or two key texts and a selection of supporting studies.

1.1 Pre-pandemic Texts

In regards to the texts written before the pandemic, Teaching in the Fourth Industrial Revolution: Standing at the Precipice, written by Armand Doucet, Jelmer Evers, Elisa Guerra, Dr. Nadia Lopez, Michael Soskil and Koen Timmers, offers an interesting insight into how 4IR is predicted to alter teaching practices in the future [1]. The book consists of 8 chapters, which range in focus from the ethical responsibilities of future curriculums to how technology has the potential to bridge equity gaps in education. Another text that offers a detailed explanation of 4IR and its potential is Klaus Schwab's 2017 text, The Fourth Industrial Revolution [2]. Throughout the book, Schwab attempts to note the influence 4IR has had on the world, and how that influence is almost certain to increase. He also attempts to highlight the responsibility that comes with such a dramatic societal shift: "Shaping the fourth industrial revolution to ensure that it is empowering and human-centered, rather than divisive and dehumanizing, is not a task for any single stakeholder or sector or for any one region, industry or culture. The fundamental and global nature of this revolution means it will affect and be influenced by all countries, economies, sectors and people" [2]. Here Schwab is clearly stating that 4IR is a global phenomenon, the responsibility of which does not rest on the shoulders of any one sector, culture or nation. Schwab calls for a global effort to confront and prepare for this change, a task that is huge in scale but one which could bring significant benefits to countless people, provided we unite not only to acknowledge that this change is imminent, but also to recognize that we all have a direct influence on the direction it will take. Schwab continues: "It is, therefore, critical that we invest attention and energy in multistakeholder cooperation across academic, social, political, national and industry boundaries. These interactions and collaborations are needed to create positive, common and hope-filled narratives" [2]. The text by Doucet et al. takes the subject of 4IR as seriously as Schwab, and it is as cautiously optimistic about its potential benefits if it is handled correctly, while also being fully aware of the dangers if 4IR is not handled with due consideration. However, the book by Doucet et al. looks at the specific implications for the world of education, and how teaching approaches, curriculum design, and desired course outcomes will be influenced and altered by 4IR. Although all of the chapters from the book offer interesting perspectives on this industrial revolution, "Evolution of Technology in the Classroom" by Koen Timmers will be drawn on most heavily during this study. "The Fourth Industrial Revolution will only exacerbate technology's influence [in classrooms]. Students need to know how to use technology to not only learn, but to also use technology to apply their knowledge to real situations. Teachers need to find effective ways to incorporate it in ways that boost understanding and provide avenues for making learning relevant" [3]. Timmers' chapter in particular is relevant to this study as it looks not only at how technology can be used in the classroom, but also at the responsibility teachers have to ensure their students are familiar with technology that may become ubiquitous in the employment market they find themselves in after graduation.

1.2 Texts Written During and in Response to the Pandemic

The second category of texts that have contributed to this study are research papers that were written and published during the pandemic. The papers cited in this study are focused on the changes that were brought about, the complications that arose, and how technology has stepped in to either replace or enhance traditional approaches to tertiary education. "Online teaching-learning in higher education during lockdown period of COVID-19 pandemic" [4], published in 2020, looks at how university-level education in India adapted to the pandemic; much like this study, its primary research has both quantitative and qualitative elements. However, the study conducted by Mishra et al. is far larger in scope and also includes data collected from students, an aspect that will not be replicated in this study. The paper titled "Education Innovations through Mobile Learning Technologies for the Industry 4.0 Readiness of Tertiary Students in Malaysia" [5] also looks at the impact COVID-19 had on education, but is more concerned with how technology was used in an attempt to bridge the gap between the teaching methods before and during the pandemic. Both of the above papers are closely connected to this study, which also looks at how teachers adapted to the state of emergency that was announced in Japan, regarding the flexibility and adaptability that was necessary in both the technology used and the teaching approach that needed to be taken. Other papers written during this time were also helpful in constructing a rounded understanding of this field of study, such as "A COVID-19 Re-envisioned Teaching Practicum Curriculum" [6], which looks at how curriculums were hurriedly adjusted to ensure education could continue. The wellspring of papers concerned with this topic was oftentimes very specific, such as "Curriculum Development Strategy Management for Student Mental Health in COVID-19 Pandemic" [7], which is concerned with Islamic boarding schools in Indonesia. While only tangentially connected to the topic of this paper, it did offer some interesting insights, such as areas of teaching that were lackluster, and how these problems were noted due to the upheaval caused by the pandemic and the observations that followed. Another text that looked at the impact the pandemic had on the world of education, and how teachers may be able to weather the difficult seas brought about by COVID-19, was the OECD's A framework to guide an education response to the COVID-19 Pandemic of 2020 [8], which may not be concerned solely with tertiary education, but attempts to map out strategies to help make these curriculum changes as smooth as possible for all involved stakeholders. Looking at a study with such a large, international scope will help support any claims made in this smaller study. COVID-19: The Great Reset [9] looks at how the pandemic has brought to the fore some of the major discrepancies and imbalances in modern society. It also argues that the global community now has an opportunity to either right the wrongs that have been highlighted by the pandemic, or attempt to return to a poor imitation of the life we lived before COVID-19. While some sections of the book focus on education (and will therefore be useful to this study), it is by and large an overview of the impact made by the pandemic.
While not all of the sectors mentioned in the book will be studied in this paper, the book is fruitful in that it puts forward the idea that the pandemic, while terrible, offers us all a chance for reflection and improvement moving forward.

While less formal in its approach than Schwab and Malleret's text, Teaching in the Post-COVID Classroom [10] offers a firsthand account of how a teacher adapted their teaching style, and it is hopeful that these changes can be applied to future lessons, even when face-to-face classes once again return to being the norm. Furthermore, Stevens speaks of the benefits of being forced to scramble up a very steep learning curve, as the skills she acquired during this tense period can enhance and improve her future lessons. While all of the above texts vary in regards to their scope, methodology, purpose and connection to this study, they are all concerned, in one way or another, with how modern technology has the potential to create a better learning environment for both teachers and students, a point that will be explored in much more detail in the following sections of this paper.

2 Introduction

The COVID-19 pandemic had an incalculable impact on global society, and we now find ourselves at a crossroads in regards to how we move on from such a devastating disaster; we have the opportunity either to return to a life that is possibly similar to the one familiar to us all from before the pandemic, or to reflect on the shortcomings that the pandemic highlighted and attempt to build a more supportive society for us all [9]. This paper attempts to map out the ways in which curriculums were suddenly and drastically altered in Japanese HEIs, how this change was brought about, and the skills both teachers and students were forced to acquire. It will then go on to explore the hypothesis that many of the lessons learnt during this traumatic time could be of benefit to the world of academia; while the devastation wrought by the pandemic should not be understated, there is the potential to take this opportunity to prepare all educational stakeholders for the looming and unavoidable impact of 4IR. While 4IR will touch on almost everyone's lives, this study will focus primarily on university teachers and students in Japan. When the effects of COVID-19 became apparent in Japan, many prefectures around the country announced a state of emergency, which limited travel, encouraged working from home, and also meant that many universities decided to teach online lessons and limit on-campus activity as much as possible [11]. This meant that many teachers were forced to navigate a new approach to teaching, mastering online learning platforms, becoming more comfortable with online classes, and altering materials and activities so they were more conducive to a digital learning environment. These changes were undertaken by educators worldwide, and while the learning curve may have been extremely steep, there is potential to continue to reap the benefits of these difficult lessons many were forced to learn, even after their use is no longer mandatory. "Many of the new skills I learned will undoubtedly be applicable and relevant to in-class instruction […] I'm thankful that I was pushed a little out of my comfort zone to learn how to best leverage educational technology" [10]. While Stevens is based in North America and not involved with tertiary education, her sentiments on being forced out of her comfort zone in order to learn new skills are perhaps familiar to teachers worldwide (no matter the level they teach). Furthermore, the optimism with which she states her gratefulness for learning these new

skills and their potential benefit in the future is a theme that will be studied in further depth throughout this study. While we may have the perception that these technologies have the opportunity to embrace the idea of a global community, allowing people to more easily access education worldwide, and not be limited by their geographic location, that ideal is not always realized: “Liberalization, Privatization and Globalization of education have been deteriorated remarkably due to limited mobility and limitedly confined exchange programmes of academic activities among the countries during the COVID-19 lockdown” [4]. Here, Mishra et al. explore the idea that by limiting travel, the pandemic has closed off many opportunities for people to interact directly with the international community, which obviously has a large impact on exchange or study abroad programmes. While the pandemic has had countless detrimental impacts on the world of education, including the limitation of international exchanges, it could also be argued that this same technology has the potential to actually increase international discourse, as people may no longer depend on international travel to attend international events, as they can be accessed from anywhere by moving the events online (or at the very least, allowing for online participants to join the events). Furthermore, the digital technology that allows for online (or hybrid) conferences to take place also presents a wide range of unexpected but hugely useful benefits, such as content being more easily and quickly digitized, so that it can be stored, shared and viewed at a later date much more freely [12]. Similarly, as we shall explore later, unexpected benefits were discovered in Japanese HEIs after the shift to online teaching was made. In order to track these changes that were made, and the potential benefits that could be secured to help make the transition into a world evermore influenced by 4IR, this paper takes a dual approach in its collating of data and information. Firstly, key texts that are focused on 4IR, COVID-19’s impact on society (most notably for this study, its impact on education) or a combination of both topics, will be cited. Added to this, a questionnaire was also distributed amongst HE teachers based in Japan, which had both qualitative and quantitative elements. By compounding the primary research with existing literature in this field, this paper hopes to carve a route through the difficult times many educators and students have faced in recent years, and point to how these obstacles could well have better prepared us for the encroaching impact of 4IR.

3 Methodology

As previously discussed, this study is dependent on a wide range of existing literature, which helped give a solid understanding of contemporary teaching environments and how they were (and still are) affected by both 4IR and the COVID-19 pandemic. Primary research was also used to help connect this literature with the current situation in EFL/ESL classrooms in Japanese universities. For this study, the data was collected via a small-scale questionnaire that was completed by 8 teachers working within the Japanese HE industry. Although the scale of the study was rather small, the questionnaire did include several open questions that also allowed for the collection of qualitative data. The questionnaire was conducted via

Google Forms, and the figures that follow were generated by Google Forms based on the results from the questionnaires. Throughout this study, the anonymity of the participants will be maintained, so that their names and the details of the HEI with which they are affiliated will not be mentioned. Although this anonymity was not requested by the participants, it was believed that it would allow them the freedom to share their true thoughts and opinions without the fear of repercussions. However, a minimal amount of demographic information was collected, including the age and gender of those who completed the questionnaire, as well as the type of institute at which they worked and the position they held there. This had the potential to unearth some trends in the decisions made and opinions held by the teachers, without any danger of their anonymity being compromised. The questionnaire itself was made up of 15 questions, 12 of which were closed, multiple-choice questions. This allowed for the collection of numerical data that could be analyzed and organized with relative ease. The remaining 3 questions were open questions, which allowed the participants to give more detailed and well-pondered answers. As previously mentioned, the sample size for this study may have been rather small, but each of the participants offered in-depth answers, which provided a rich source of qualitative information. A selection of these answers will be cited throughout this study, and the questionnaire responses can be found, in full, in Appendix 1. Of the multiple-choice questions that were asked, some were used to identify the "norm" in regards to teaching practices before the COVID-19 pandemic, as the disruption caused by the pandemic is the dominant variable in relation to this study. The remaining questions concerned themselves with topics such as how teaching practices changed, the length of time the teachers had to adjust, and what their overall opinion was on how their institute managed this change.
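Because the closed questions yield simple categorical counts, the Google Forms export can be tallied with a few lines of code before the open answers are read for themes. The sketch below is illustrative only: the CSV file name and the column labels are assumed placeholders rather than the actual export of this questionnaire.

```python
import csv
from collections import Counter

# Hypothetical export of the questionnaire (file name and column labels
# are assumptions, not the real Google Forms sheet).
CSV_PATH = "questionnaire_responses.csv"

CLOSED_COLUMNS = [
    "Age range",
    "Gender",
    "Type of institution",
    "Position held",
    "Teaching method during the State of Emergency",
    "Notice given before the change",
    "Involvement in the decision (1-5)",
    "Happiness with the final decision (1-5)",
]
OPEN_COLUMNS = [
    "Potential benefits of online teaching",
    "Potential drawbacks of online teaching",
    "Other thoughts on continued use of online platforms",
]


def tally(path: str) -> None:
    """Print counts for each closed question and collect open answers verbatim."""
    with open(path, newline="", encoding="utf-8") as handle:
        rows = list(csv.DictReader(handle))

    for column in CLOSED_COLUMNS:
        counts = Counter(row[column] for row in rows if row.get(column))
        print(f"\n{column}")
        for answer, n in counts.most_common():
            print(f"  {answer}: {n}/{len(rows)} ({100 * n / len(rows):.1f}%)")

    # Open answers are kept untouched for later thematic coding.
    open_answers = {c: [row[c] for row in rows if row.get(c)] for c in OPEN_COLUMNS}
    print(f"\nOpen answers collected for {len(open_answers)} questions.")


if __name__ == "__main__":
    tally(CSV_PATH)
```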

4 Results and Findings

4.1 Demographics of Interviewees

Fig. 1. Chart showing the age ranges of those interviewed.

Fig. 2. Chart showing the genders of those interviewed.

As Figs. 1 and 2 clearly show, the participants ranged in age, but the most common demographics were 30–39 and 40–49. While there was some variety in the age ranges, unfortunately only 12.5% of the participants (or just 1 of the 8 interviewed) identified as female. This study may have been more rounded if this figure was more evenly distributed, and therefore any variables regarding gender will not be explored in this paper, as deciphering trends would be difficult and more than likely inaccurate.

Fig. 3. Chart showing the types of HEIs to which the teachers are affiliated.

Figure 3 shows that ¼ of the interviewees work at a public university, and another interviewee works at public, private and vocational institutes. The remaining 62.5% of the participants are primarily affiliated with a private university. This may seem like an unbalanced spread of answers, but it is not too disproportionate to the types of universities in Aichi Prefecture (where the participants are based). Of the 47 listed universities in Aichi [13], 7 are public, meaning around 15% of universities in Aichi are public institutions.

Fig. 4. Chart showing the positions held by the teachers interviewed.

Figure 4 indicates that the majority of those interviewed work in full-time positions, but 3 of the 8 participants are working part-time, so it was very helpful to get feedback from both types of educators. No managerial or clerical staff responded to the questionnaire, but as this study is concerned with teaching methods and how teachers were forced to adapt, the omission of these two types of workers should not be too detrimental to this study.

4.2 Teaching Methods of Interviewees

All of the participants stated that teaching face-to-face was the standard way to conduct lessons before the pandemic. It can then be inferred, for the purposes of this study, that in-class lessons can be seen as the pre-pandemic norm.

Fig. 5. Chart showing the teaching method(s) used following the "State of Emergency" announced in Japan in response to the COVID-19 pandemic.

Figure 5 shows that half the participants taught asynchronous lessons (hereon referred to as on-demand) and the other half taught synchronous lessons (hereon referred to as real-time). This is beneficial to the study, as it not only allows for opinions on both of the main styles of online teaching, but the 50/50 split also allows for these responses to be balanced and not heavily weighted towards one particular style of teaching.

4.3 Preparations Made for Curriculum Change

Fig. 6. Chart showing the notice given to teachers before the curriculum change occurred.

Figure 6 clearly illustrates that teachers were given time to adjust to these changes, with 50% stating they had more than 2 weeks' notice. The shortest time given to a teacher to prepare was 4–6 days. This study will go on to look at whether the length of time given for preparation influenced the teachers' overall impression of the new teaching style.

Fig. 7. Chart showing the involvement the teachers had in the decision-making process.

In Fig. 7, a score of 1 means the teacher had no participation in the decision-making process, and 5 means they were heavily involved. It seems only 1 of the 8 participants played an active role in the decision-making process, and 62.5% of the participants had no influence on the decision-making process at all. This is another factor that will be examined, specifically whether it affected the teachers' opinion about the new curriculum.

Fig. 8. Chart showing the teachers’ thoughts on the new curriculum.

Once again, a sliding scale was used in Fig. 8 to show how happy the teachers were with the new curriculum: 1 indicates they were very unhappy with it and 5 represents them being extremely happy with it. Although, by and large, the teachers were not involved in the decision-making process, they were all either indifferent to or happy with the final decision made. None of the teachers responded by saying they were actively unhappy with the final decision.

4.4 Qualitative Data from Open Questions

For the first two open questions, all 8 of the participants responded. For the last question, 6 of the 8 participants wrote a response. These responses varied greatly in length, but gave a very clear insight into the opinions held by HE teachers in Japan. Although the responses in full are available in the appendix, only select quotes will be used in the main body of this paper.

4.4.1 If Any, Please Explain the Potential Benefits of Online Teaching

Looking at the responses from the questionnaire was very insightful, in that it gave some very clear indications of how these changes are thought of by the teachers. Although this paper will go on to examine these quotes in greater detail, it could be generally argued

that the responses showed that the teachers are largely in favour of embracing these new technology-enhanced approaches; however, there are certain points of apprehension, including the lack of human interaction, a potential drop in student motivation, teachers being exposed to overwork and even the fear of this technology leading to the overall detriment of quality in education. These questions were designed to encompass both real-time and on-demand online lessons. Many of the responses were positive and looked at how this technology has the potential to make the work-load for teachers a little lighter, especially in regards to organization, record keeping, and grading. In the interests of transparency, the quotes are left unedited, and any spelling or grammatical errors will not be corrected. Any edits will be clearly indicated: “It probably helps in terms of the overall organisation of the course. It’s easier to keep records of attendance and grades - you can set up activities that are automatically graded which make things much easier for the teacher.” These same benefits could be extended to the students as well as the teachers, and one respondent even claimed that the flexibility offered by this technology could enhance the motivational levels of the students: “I think it’s also a lot easier to get students to do classwork/projects outside of the normal class time as well, especially with on-demand lessons. Students can work on their assignments at any time and so they seem a bit more motivated to actually put time in.” This response specifically points to the benefits of on-demand classes. Later in this report, when we explore the potential drawbacks of these approaches, concerns about student motivation will be raised, as well as how increased accessibility of materials may infringe on a teacher’s personal time. However, the above quote shows that the flexibility and accessibility offered by on-demand classes can have a very positive impact on a student’s learning progress. Other teachers also explained that the learning curve for acquiring these skills was steep, but these new approaches have made lasting impressions on them, which may well enhance their teaching in the future. “Although, for me at least, it was a huge step out of my comfort zone and I went through periods of stress and anxiety, I picked up some useful new skills (e.g. I learned how to turn a PowerPoint into a video).” Here the teacher has familiarized themselves with a new approach and technology that has the potential to be a tool they draw upon in the future. The idea that the pandemic forced teachers to create new activities, or navigate new avenues of teaching, is possibly the best evidence for the pandemic inadvertently preparing teachers (and potentially students) for 4IR. “With the pandemic, the “digital transformation” that so many analysts have been referring to for years, without being exactly sure what it meant, has found its catalyst. One major effect of confinement will be the expansion and progression of the digital world in a decisive and often permanent manner.” [9]. Teachers’ growth was not only represented by their growing comfort with modern technology, it was also evident in how they have permanently altered their teaching

approach. In one quote, a teacher described how this technology has increased accessibility for students, and may have avoided the problem of students falling behind merely because they have missed class. "If a student or teacher needs to miss a class, having a recorded video of the necessary tools and content that can be shared makes a lot of sense now. I would have never thought of that as a solution before, and I hope that schools will be on board for this kind of makeup lesson approach for both students and teachers to use when necessary." This appears to be a clear example of the pandemic highlighting a problem that many may not have been aware of, i.e. that absenteeism is oftentimes unavoidable but still very detrimental to the learning process. Not only did the pandemic make this problem evident, it also offered up a solution to it that could continue to be useful even after all teaching restrictions are lifted and the mandatory dependency on technology has passed. By becoming familiar with streaming live lessons for real-time classes, or creating multimedia materials for on-demand classes, teachers now have multiple ways they can support students who may not be able to physically come to the classroom. Live streams (which could also be recorded and watched later) and multimedia materials both allow for easily accessible distance learning, improving flexibility without compromising on the quality of the education; indeed, studies have shown that VBL classes may actually be of assistance to many EFL or ESL learners: "Learning videos are considered capable of helping both educators and students because they can be listened to repeatedly and contain audio and visual content so that it is expected to be able to help the learning process from home be as concrete at school." [14]. VBL (video-based learning) therefore has an advantage over traditional classroom-based lessons, in that students have the opportunity to repeatedly watch the lesson, or particular sections they may find interesting or challenging. This again adds to the flexibility offered within the learning environment, giving students more autonomy in their learning progress. Another aspect of flexibility that was offered through the use of this technology was the way in which teachers became easier to contact, and questions could be asked of them outside of the time allotted by class schedules. As we will soon see, this aspect of the new teaching style was for some teachers a drawback, as it saw their workload increase, and their private time was encroached upon by students' questions. While that may well have been a problem for the teachers, it had the potential to be very beneficial to the students, as they were no longer limited to the school timetable to access or complete their work, or contact their teachers for assistance. "I am far more available as an online teacher to answer questions, and it ends up being a far more efficient and quick way of communicating with large groups of students at once (using Moodle.)" This tension between what the teachers view as a problem and what the students see as an advantage is an indication that the conversation around these new technologies and approaches, and how they can best improve learning environments for all stakeholders involved, is one that needs much consideration. In the following section we will look at some of the worries the teachers have, which could be used to navigate potential problem areas, and may well help stop these potential problems from forming.

4.4.2 If Any, Please Explain the Potential Drawbacks of Online Teaching

In direct contrast to the above comment about these digital lessons increasing teachers' accessibility, students having the means to contact teachers at any time of day could lead to teachers feeling hounded by their students. "Additional workload, higher potential for cheating, workday never ends as students contact 24-7." Not only does this comment refute the idea that students having more access to teachers is actually a positive outcome of these new approaches, it also alludes to the fact that, as this technology allows for distance learning and true mobility of lessons, teachers are no longer "off the clock" once the school day finishes. Students being able to "contact 24–7" means that a teacher's work-life balance may be knocked off kilter, and what is seen as a merit from a student's standpoint may not align with the opinions of their teachers. With lessons becoming more fluid due to increased flexibility, the collateral damage of this progression is that boundaries are no longer limited by time and space; a student is not limited to class or office hours, and direct messaging on learning platforms allows them to contact teachers at any time. Striking a balance between offering students a better learning environment and not encroaching on a teacher's privacy or free time is an important topic that could benefit from further study. The increased workload was not the only concern raised by the teachers; indeed, it was not even the most common. The concern that was explored by most of the teachers was in regards to the lack of human interaction, and how that can be detrimental to the learning environment, as well as to the mental health of both the teachers and the students. "There is a lack of real connection between the teacher and students on a personal level, and I'm assuming between the students themselves." The observation that online learning does not have the same level of human interaction is not unique to this study; a wide range of research has been done on this subject, and there are suggestions that this social isolation could be detrimental to the students, not only from an academic standpoint, but also in regards to their mental health. In a large-scale study that spoke to 200 students, the collected data seemed to echo the concerns of the teachers interviewed here. "Some commented that the lack of face-to-face interaction with her classmates had a detrimental effect on her learning […] and socialization skills […], while others reported that restrictions in mobility limited their learning experience […] The above findings suggest the pandemic had additive adverse effects on students' online learning experience" [15]. While this study was based in the Philippines and may not be wholly reflective of the views of Japanese students, it was founded on a large bank of data, with 200 participants, all of whom were tertiary students. Although the concerns of the teachers involved in this study seem well founded, without raw data drawn specifically for this study, this remains conjecture. That being said, it is more than likely that these worries are a cause for concern, as other teachers commented in a similar way. "I find that human connections are harder to build and maintain. It takes a lot of the fun and spontaneity out of teaching."

Here, the teacher points to a difficulty that has arisen due to this new teaching style, namely that teaching seemingly needs to be more organised to fit the digital learning environment. Other teachers also spoke of barriers created by this technology, namely that students can turn on their cameras and show their faces, but oftentimes they choose not to. It is "Difficult for the teacher to build up rapport and get to know the students. In the case of synchronous teaching, dealing with students' reluctance to turn their cameras on. In cases where students don't turn their cameras on, doubts about whether they are engaged with the class or just having the class in the background." While interaction via a screen is sub-optimal, it is still preferable to students turning their cameras off, as the teacher will otherwise have more difficulty assessing body language or facial cues. Furthermore, it can add to confusion, as it can be difficult to "identify who is speaking or even present" [16] during a class. But even when the technology is implemented well and used to its fullest by all participants, there is the unavoidable fact that most of these lessons will be conducted whilst both teachers and students are sat staring at some sort of screen (be it their computer, tablet or smartphone), which can cause a wide range of detrimental effects, as one teacher noted. "Increased sedentary time spent sitting at a desk in front of a computer led to clear physical and mental health issues." The mental health of students was a major concern for many of the participants, and is a topic that has featured in many academic papers, from very specific demographics, such as Islamic boarding schools in Indonesia [7], to international guidelines [8]. However, in Japan, the idea of being part of a community is central to a person's identity, and the social aspect of a class being taken away may well have had a lasting effect on the students. "Many Asian cultures have a belief in the interdependence of self with others. A major life task of the Asian cultures involves forming and maintaining a social relationship which the self sees as its meaningful part. For Japanese people with the view of self as interdependent, interpersonal relationships have a specific significance." [17]. While it is now possible to explore the methods that were adopted and how they were immediately received, many HEIs in Japan are still in the midst of the pandemic in some form or another, if not actively, then under the fear that a new strain may cause imminent disruption [18]. Seeing the lasting effects the pandemic and these altered curriculums had on both teachers and students will be of great benefit to anyone who is required to adapt teaching approaches due to devastating global events in the future; unfortunately, for now the only available information comes from fresh reactions, but continued study in this field could prove to be greatly beneficial.

4.4.3 Are There Any Other Thoughts/Opinions You Have Regarding the Continued Use of Online Learning Platforms?

6 of the 8 respondents answered this question. 4 of those responses can be categorized as largely positive towards these teaching techniques, one was wary but largely in praise of the potential benefits, and one saw these new approaches as having the potential to diminish the quality of education.

Beginning with the positive responses, one teacher in particular was very optimistic in regards to how these approaches could improve tertiary education. "I anticipate (and hope for) some kind of hybrid to come out of all this; something I think will benefit both teachers and students. Conferences are now far more accessible to those who previously couldn't participate due to budget constraints or family duties. It's win-win." This respondent does not see these technologies as replacing or displacing current teaching practices; instead they will be augmented with them. As previously mentioned, it could be argued this technology has made international communication and collaboration easier, and as this respondent points out, it also increases accessibility for those who were previously limited by funding constraints. Other teachers were just as welcoming of this technology and can see themselves continuing to use it. "I will continue to use the online learning platforms even once we are face to face again. It makes planning lessons and keeping track of multiple classes and projects much easier. It is also easier to give students feedback and edit their work." This response suggests that these tools can help with the administrative side of teaching, improving organisation and communication. Again, this highlights the potential to improve not only the learning environment for the students, but also the working environment for the teachers. Other responses, while still positive, were a little more wary in their praise. "It can be an extremely useful ingredient to have in the educational cookbook. However, it should be used sparingly, and carefully." So far, all of the respondents have agreed that while this technology may not have the capacity to replace the benefits of face-to-face classes, it does have the potential to improve them. However, the idea of using these approaches "sparingly" and "carefully" suggests the respondent has some underlying apprehension about this technology, a fear that was expressed very clearly by one of the teachers. "I am concerned about the industrialisation of education […] in the same way that mass production of goods in the last century drove modern capitalism; I'm worried that new-liberal education will be automated, commodified and outsourced." As has been seen in the studies by Schwab and Doucet et al., 4IR must be approached with caution and respect. While these new technologies and approaches do have the potential to improve learning environments, they should perhaps not be relied upon too heavily. The increasing power of AI and automation has already displaced countless jobs internationally, and the above response does raise concerns about teaching facing a similar future, although the automation of teaching is far from assured. "Almost half of all jobs are in danger of disappearing due to automation, but teaching is among the professions that is least threatened. The amount of creativity and social intelligence required to teach well is simply too "human" to be done by a machine." [19]. While it may be too optimistic to think that teaching will never be replaced by AI, Soskil argues that the responsibility of a teacher is so vast and unpredictable that it is extremely difficult (although not impossible) to replicate. Soskil continues.

"The most important things teachers do cannot be quantified or digitized easily. Teachers inspire their students to be intrinsically motivated learners, to overcome obstacles in their lives, and to dream big. […] Teachers recognize the struggles youngsters have outside school and help them develop the capacity to rise above adversity." [19]. Although Soskil may not be speaking of tertiary education students, his point regarding helping students with their struggles outside of school could be applied to the responsibility of HE teachers to prepare their students for life after their graduation. In the coming years, preparing students for a "quantified and digitized" future is a vital lesson they will need to learn, as although teaching may be safe from automation for now, many other professions will not be. Giving students a similarly important and difficult-to-replicate skill set to the one (Soskil claims) teachers have will help secure their future prosperity. Although Soskil puts forward a convincing argument as to why teachers are less likely to find their jobs being replaced by digital counterparts, there are possibly 2 main flaws in relying on this opinion alone. Firstly, Soskil wrote this in 2018, before the pandemic had started to alter the world in an irreversible way. Secondly, it only alleviates the minds of teachers, and students may not feel so reassured (unless they plan to become teachers themselves). Even before COVID-19, automation, AI and other technologies were threatening vast swathes of job markets, and that threat has only increased in recent years. "In the pre-pandemic era, new artificial intelligence (AI)-based technologies were being gradually introduced to automate some of the tasks performed by human employees. The COVID-19 crisis, and its accompanying measures of social distancing, has suddenly accelerated this process of innovation and technological change […] These innovations provoked by necessity (i.e. sanitary measures) will soon result in hundreds of thousands, and potentially millions, of job losses" [9]. If HE-level teachers have a responsibility to prepare their students for life after graduation, it is essential that students equip themselves with the skills to become valuable assets in professional environments. This could include being comfortable with this technology and having the skills to use it to their advantage, or being creative and flexible enough to be able to offer a skillset that cannot be easily replicated and replaced by automation or AI. While the quote by Schwab and Malleret and the cautious response from the teacher are both justified in their concerns, just because large portions of an employment market may be replaced, that does not necessarily mean there will be fewer opportunities to find work. As Schwab and Malleret go on to say, there is an opportunity for "a global explosion of hundreds of thousands of new micro industries that will hopefully employ hundreds of millions of people" [9]. Of course, the future cannot be accurately predicted, but nor can it be ignored. These technologies will have an influence on our society, but how we navigate that change is dependent on how prepared we are for it. Looking at the results of the questionnaire represented by Fig. 9, it is fair to say that the teachers were in general pleased with the new curriculum that was introduced following the state of emergency.
One quarter of the respondents said they were extremely happy with the new teaching approach, 3/8 said they were happy with it and the remaining 3/8 seemed neither happy nor unhappy with the response.

Fig. 9. Chart showing the potential influences on teachers' overall opinion of the new curriculum. For each teacher, the chart compares three responses: how happy they were with the final decision regarding the teaching approach during the State of Emergency, how involved they were in the decision-making process, and how much notice they were given before the change took place.

Figure 9 shows that there was not much of a correlation between the amount of involvement the teachers had in the decision-making process and their overall thoughts on the finalised decision, as even those who had no involvement in that process still thought positively of the new curriculum. The two respondents who did have more involvement in the decision-making process both responded that they were happy with the finished curriculum; but as their responses were no stronger than those of other respondents (two teachers who had no involvement said they were extremely happy with the curriculum), it should be safe to say that involvement in the decision-making process did not strongly bias the teachers' responses. What is a little clearer is that the more notice the teachers were given, the more favourable their thoughts were on the new curriculum. The three teachers who responded that they were neither happy nor unhappy with the finished curriculum all had one week's notice or less. Conversely, all of the teachers who stated they had two weeks' notice or more stated they were either happy or extremely happy with the finished curriculum. From this, it could be argued that of the two factors, teachers' involvement in the decision-making process does not shape their view of the curriculum as strongly as the amount of time they are given to adjust to it. In the best of times, curriculum change is a complicated and lengthy process. "Achieving change in a university is difficult, owing to organisational complexity, strongly held and diverse values and the power of vested interests." [20]. One factor that could have helped make this transition a little less complex was that it needed to be made very quickly in response to an unprecedented and tragic event; this could have meant that teachers' "diverse values" and "vested interests" were all focused on a singular goal, i.e. making the curriculum as good as it could be. Although the scale of this study was small, it was clear from the responses that all of the teachers, even those who were not overly thrilled by the new curriculum, learnt new skills, approaches and solutions to problems they may not have been aware of before the pandemic.
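The comparison sketched above, notice time and involvement set against overall satisfaction, can be made slightly more concrete with a rank correlation over the eight responses. The snippet below is purely illustrative: the eight tuples are invented placeholder values on the questionnaire's scales, not the raw data linked in Appendix 1, and with n = 8 any coefficient should be read as indicative at best.

```python
# Illustrative only: the eight response tuples are invented placeholders,
# not the raw questionnaire data from Appendix 1.
from scipy.stats import spearmanr

# Each tuple: (notice given, coded 1 = shortest ... 4 = more than 2 weeks,
#              involvement in the decision on the 1-5 scale,
#              happiness with the final decision on the 1-5 scale)
responses = [
    (4, 1, 5), (4, 2, 4), (3, 1, 4), (4, 1, 5),
    (2, 1, 3), (2, 3, 3), (1, 1, 3), (3, 4, 4),
]

notice = [r[0] for r in responses]
involvement = [r[1] for r in responses]
happiness = [r[2] for r in responses]

rho_notice, p_notice = spearmanr(notice, happiness)
rho_involvement, p_involvement = spearmanr(involvement, happiness)

print(f"notice vs happiness:      rho = {rho_notice:.2f} (p = {p_notice:.2f})")
print(f"involvement vs happiness: rho = {rho_involvement:.2f} (p = {p_involvement:.2f})")
```

Spearman's rank correlation is chosen here simply because both variables are ordinal; with a sample this small, a per-teacher visual comparison such as Fig. 9 is at least as informative.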

While there are many studies looking into the growing influence of 4IR before the pandemic, it is not beyond reason to argue that the crisis had a profound effect on education, and allowed stakeholders not only to learn new skills, but to test them in a working environment. Would these technologies and approaches have still become ubiquitous had there not been a COVID-19 outbreak? As that is unknowable, it could be seen as reductive to ponder that question, but it could be strongly argued that online learning would not be as rounded and effective as it is today, had it not been for the pandemic forcing both educators and students to familiarise themselves with the digital landscape. "Developing an online teaching practicum experience was not on the radar of most faculties of education before March 2020, as the school-based component has always been viewed as a necessary and untouchable aspect." [6]. Educators and HEIs worldwide now have the opportunity to embrace the changes they were forced to make, and create a new curriculum that takes full advantage of modern technology and teaching approaches.

5 Conclusion

Although the scope of this study was limited to only 8 teachers, the finding that the skills learnt and technology used during the pandemic could be of benefit in the future is supported by much larger studies, which collected similar results, with the OECD finding the following: "A significant percentage of the respondents of the survey see that unexpected positive educational results of the changes caused by the crisis include the introduction of technologies and other innovative solutions and an increase in the autonomy of students to manage their own learning." [8]. If further data could also be collected from students as well as teachers within Japanese HEIs, further study could be done to see how much impact these new approaches and technologies have had on tertiary education. As 4IR will impact all stakeholders involved in this section of society, there is obviously great benefit to having responses from demographics other than teachers. However, this small study has shown the complexities of the sudden shift brought about by the COVID-19 crisis, and how teachers are apprehensive of these technologies for a number of reasons, including the blurring of the lines between teachers' private and professional lives, and the adverse effect on the motivation and mental well-being of their students. These aspects of digital learning environments (or classrooms enhanced with modern technology) are worth serious consideration, and resolutions must be explored to make sure all stakeholders are comfortable embracing these teaching approaches going forward. Awareness of these technologies is vital to the future success of tertiary education in Japan, as much of the supporting literature featured in this study reports that the progress of 4IR is all but inevitable and that it is not a matter of if classrooms will change, but when. With that in mind, it is reassuring to see that, by and large, the participants in this study

Lessons Learnt During the COVID-19 Pandemic

377

were in favour of continuing to use much of what they have learnt during the pandemic, be that creating hybrid lessons, using the learning platforms to assist with lesson planning and organization, or using this technology to tackle problems such as absenteeism. While the list of concerns is very serious, the concerns are arguably outweighed by the potential benefits that could be realised if these methods of teaching are either adopted going forward or at least allowed to influence how future lessons are planned and implemented. While 4IR may well be unavoidable, with enough due consideration and preparation, it could well be indispensable.

Appendix 1: Raw Data Collected From Questionnaire The raw data can be downloaded and observed by following the link below: http://www.sb-publishing.com/wp-content/uploads/2021/12/HPEF7063-RAW-DATA.pdf.

References 1. Doucet, A., Evers, J., Guerra, E., Lopez, N., Soskil, M., Timmers, K.: Teaching in the fourth industrial revolution: standing on the precipice. In: Schwab, K. (ed.) 2017 Routledge Taylor & Francis Group, UK. The Fourth Industrial Revolution, p. 9. Penguin Books Limited, UK (2018) 2. Nankervis, A., Connell, J., Montague, A., Burgess, J. (eds.): The fourth industrial revolution. Springer, Singapore (2021). https://doi.org/10.1007/978-981-16-1614-3 3. Timmers, K.: Evolution of technology in the classroom. In: Doucet, A., Evers, J., Guerra, E., Lopez, N., Soskil, M., Timmers, K. (eds.) Teaching in the Fourth Industrial Revolution: Standing on the Precipice, p. 106. Routledge Taylor & Francis Group, UK (2018) 4. Mishra, L., Gupta, T., Shree, A.: Online teaching-learning in higher education during lockdown period of COVID-19 pandemic. Int. J. Educ. Res. Open 1(2020), 100012 (2020) 5. Karim, R.A., Adnan, A., Salim, M.S.A.M., Kamarudin, S., Zaidi, A.: Education innovations through mobile learning technologies for the industry 4.0 readiness of tertiary students in Malaysia. In: IOP Conference Series Materials Science and Engineering, vol. 917 (2020). https://doi.org/10.1088/1757-899X/917/1/012022 6. Nel, C., Botha, C., Marais, E.: A COVID-19 re-envisioned teaching practicum curriculum. Res. Soc. Sci. Technol. 6(2), 249–266 (2021). https://doi.org/10.46303/ressat.2021.29 7. Ritonga, M., Lahmi, A., Ayu, S., Firadaus, Asmaret, D., Afdhal, S.: Curriculum development strategy management for student mental health in COVID-19 pandemic. Int. J. Pharm. Res. 12 (2020). https://doi.org/10.31838/ijpr/2020.SP2.562 8. Reimers, F.M., Schleicher, A.: A framework to guide an education response to the COVID19 pandemic of 2020. Organisation for Economic Co-operation and Development (OECD) (2020). https://doi.org/10.1787/6ae21003-en 9. Schwab, K., Malleret, T.: COVID-19: The Great Reset. Forum Publishing (2020) 10. Stevens, G.: Teaching in the Post-COVID Classroom: Mindsets and Strategies to Cultivate Connection, Manage Behavior and Reduce Overwhelm in Classroom, Distance and Blended Learning, p. 60–61. Red Lotus Books, Mountain House, CA (2020) 11. O’Donoghue, J.J.: In era of COVID-19, a shift to digital forms of teaching in Japan: teachers are having to re-imagine their roles entirely amid school closures. The Japan Times (2020). https://www.japantimes.co.jp/news/2020/04/21/national/traditional-to-digitalteaching-coronavirus/

12. Moss, V.A., Adcock, M., Hotan, A.W., et al.: Forging a path to a better normal for conferences and collaboration. Nat. Astron. 5, 213–216 (2021). https://doi.org/10.1038/s41550-021-013 25-z 13. UniRank: Top Universities in Aichi: 2021 Aichi University Ranking (2021). https://www. 4icu.org/jp/aichi/ 14. Suryandari, S., Singgih, S.: Video-based learning for “learning from home” solution in pandemic. In: Journal of Physics: Conference Series, vol. 1760, p. 3. National Seminar of Physics Education (2021) 15. Barrot, J.S., Llenares, I.I., del Rosario, L.S.: Students’ online learning challenges during the pandemic and how they cope with them: the case of the Philippines. Educ. Inf. Technol. 26(6), 7321–7338 (2021). https://doi.org/10.1007/s10639-021-10589-x 16. Castelli, F., Sarvary, M.: Why students do not turn on their video cameras during online classes and an equitable and inclusive plan to encourage them to do so. Ecol. Evol. (2021). https:// doi.org/10.1002/ece3.7123.,p.6 17. Katsunori, S.: Interpersonal relationships and mental health among Japanese college students. In: Landow, M.V. (ed.) College Students: Mental Health and Coping Strategies. Nova Science Publishers, Inc., USA (2006) 18. Kyodo, J.: Japan raises omicron variant alert to highest level. The Japan Times (2021) 19. Soskil, M.: Education in a time of unprecedented change. In: Doucet, A., Evers, J., Guerra, E., Lopez, N., Soskil, M., Timmers, K. (eds.) Teaching in the Fourth Industrial Revolution: Standing on the Precipice, p. 22–23. Routledge Taylor & Francis Group, UK (2018) 20. Kandiko, C.B., Blackmore, P.: Towards more successful curriculum change. In: Kandiko, C.B., Blackmore, P. (eds.) Strategic Curriculum Change: Global Trends in University, p. 206. Routledge Taylor & Francis Group, UK (2012)

Ambient Intelligence in Learning Management System (LMS) Ilan Daniels Rahimi(B) Ono Academic College, Kiryat Ono, Israel [email protected]

Abstract. The common use of Learning Management Systems in academic institutions requires solutions that make these systems accessible and take the differences between students into account. Ambient Intelligence is a candidate tool for this purpose: because Ambient Intelligence-based systems allow personalized access for the end-user, they are an option for Learning Management System use. A Learning Management System provides students with web-based information content and educational resources, but in the same format for all users. Ambient Intelligence makes it possible to give a specific response to each student according to his character and personal characteristics, and thus to maximize the efficiency of his use of these systems. An advanced and innovative learning system will integrate all the learning processes relevant to the particular learner according to his abilities - his weaknesses and strengths. In this paper, we offer a structure for the use of Ambient Intelligence in academic Learning Management Systems: on the one hand, by defining the habits and needs of the student, and on the other hand, by mapping the skills that the systems need for relevant processing. This will create an interface customized to the specific needs of the student. Keywords: Learning Management System · Ambient Intelligence · Micro-segmentation

1 Introduction Learning Management Systems (LMSs), adopted for e-learning and teaching practices, are the most widely used educational technology in universities and colleges worldwide. These systems provide opportunities for knowledge sharing, building a community of learners, and supporting higher-order learning through conversation and collaboration [1]. This technology makes it possible to improve the learning process through proper planning, implementation, and evaluation in educational institutions [2]. It helps facilitate e-learning because it provides educational material with no time or space limit. In addition, it creates a direct network connection between students and their teachers that enables the sharing of resources and the availability of information related to courses [3]. LMS use will continue to increase dramatically, and so will its importance in both the corporate and academic worlds [4].

The use of online courses as part of the academic curriculum existed before the COVID-19 pandemic [5–7, 9]. The global epidemic caused the transition to online learning on a much larger scale. This transition is a challenge alongside opportunities, such as changing conceptions and introducing new ways of learning and teaching in higher education [9, 10]. Academic institutions can take advantage of the crisis and the necessity of online learning to adopt innovative pedagogical approaches and use online learning optimally [11]. Today, learning management systems for academic learning (LMSs) are uniform and accessible to all students in the same format, e.g., Moodle. They serve the gifted student and the student with ADHD in the same way. But what if each student had a customized LMS interface? One that also learns the user and adapts to his strengths and weaknesses? The contribution of this paper is to offer the possibility of adapting learning systems to the diversity of students according to their individual skills and abilities. We provide an initial theoretical framework for integrating Ambient Intelligence (AmI) into learning systems that can help in this regard. Ambient Intelligence (AmI) refers to interfaces that identify human presence and preferences and adjust the intelligent environment accordingly. These interfaces need to be sensitive, adaptive, autonomous, and customized to suit immediate human needs and requirements. Typical uses of AmI include smart homes, self-driving autonomous vehicles, healthcare systems, intelligent roads, the industry sector, smart facility management, the education sector, emergency services, and more [12]. The Ambient Intelligence paradigm is based on extensive and ubiquitous computing, profiling practices, contextual awareness, and human-centric computer interaction. The main elements of the AmI environment are: Embeddedness: Computers are not usually stand-alone devices; many systems are organic and have built-in intelligence and computing capabilities. Current Internet of Things (IoT) development leads to embedded computing because it includes many types of applications with the ability to transform. Awareness of the Context: This element is the ability to gather information about the environment at any time and respond accordingly. Data collection relies on sensors, efficient computing software, and automated data analysis such as emotion analysis. Invisible Computer: This paradigm uses powerful computers collectively in the background, while end-users communicate with small appliances. People interact with embedded systems in an indistinguishable way - the designs are effectively “invisible”. Machine Learning: This element allows devices to learn automatically from experience of the environment, extract knowledge from updated data, and create learning processes and capabilities [13, 14].

Segmentation Using AmI allows segmentation. The term “segmentation” is a concept from the field of marketing: it refers to the process of characterizing end-users according to the products they purchase. This paper uses segmentation to describe the ‘Learning Management System’ user. The process begins with “one size fits all”, moves to ‘segmentation’ and finally arrives at ‘micro-segmentation’, as shown in Fig. 1.

Fig. 1. From “One-size-fits-all” to micro-segmentation

“One-size-fits-all” is an approach based on the assumption that consumers’ requirements and behavior are homogeneous; therefore, there is no good reason to differentiate the offering in terms of service. The approach is seen as a contributor to efficiency since all customer needs are treated as the same [15]. Segmentation - Segmentation parameters make it easier to focus marketing efforts and resources. Segmentation makes it possible to know the customers’ habits and identify their needs, to meet those needs through service, and to execute better top-down strategies. The consumer market can be segmented according to the following customer characteristics: demographic, geographical, psychographic, and behavioral [16]. Micro-Segmentation - A “micro-segment” is a fine-grained division of the market built around the customization of individual customers and users. Micro-segmentation can be used in information technology, corporations, and marketing. It is formed from a dataset through data mining, artificial intelligence, and algorithms. Creating a user profile makes it possible to deliver well-customized service, with the advantage that resources can be used much more efficiently to suit the customer’s needs [17].
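To make the progression from segmentation to micro-segmentation concrete, a minimal Java sketch is given below; the learner attributes, class names and grouping keys are illustrative assumptions rather than part of any existing LMS.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class MicroSegmentationDemo {

    // Hypothetical learner attributes an LMS could collect.
    record LearnerProfile(String id, String faculty, int yearOfStudy,
                          boolean prefersCollaboration, String preferredMedium,
                          double avgPaceFactor) {

        // Coarse segmentation: one attribute, many learners per group.
        String segment() {
            return faculty + "-year" + yearOfStudy;
        }

        // Micro-segmentation: combine several attributes so that groups
        // shrink towards individually tailored profiles.
        String microSegment() {
            return segment()
                    + (prefersCollaboration ? "-collab" : "-solo")
                    + "-" + preferredMedium
                    + "-" + (avgPaceFactor < 0.9 ? "slow" : avgPaceFactor > 1.1 ? "fast" : "steady");
        }
    }

    public static void main(String[] args) {
        List<LearnerProfile> learners = List.of(
                new LearnerProfile("s1", "CS", 1, true, "video", 1.2),
                new LearnerProfile("s2", "CS", 1, false, "text", 0.8),
                new LearnerProfile("s3", "CS", 1, true, "video", 1.3));

        // "One size fits all" -> segmentation -> micro-segmentation (cf. Fig. 1).
        Map<String, List<String>> segments = learners.stream()
                .collect(Collectors.groupingBy(LearnerProfile::segment,
                        Collectors.mapping(LearnerProfile::id, Collectors.toList())));
        Map<String, List<String>> microSegments = learners.stream()
                .collect(Collectors.groupingBy(LearnerProfile::microSegment,
                        Collectors.mapping(LearnerProfile::id, Collectors.toList())));

        System.out.println("Segments:       " + segments);
        System.out.println("Micro-segments: " + microSegments);
    }
}
```

In this toy example all three learners fall into one coarse segment, while the combined attribute key separates the individual learner who prefers solo, text-based, slower-paced study.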

This paper presents a descriptive framework for the use of AmI in LMS based on Gill & Cormican’s paper [18]. They define the concept of AmI from a systems development area. Their research is based on a project about the use of AmI to explore and develop systematic innovation in the manufacturing of small to medium-sized businesses. They framed the use of AmI by classifying it into tasks and skills to create a better understanding of system structure. In this article, we propose a structure to adopt this classification for use in the Learning Management Systems (LMS). Advantages of AmI in LMS • Adapting a learning environment makes it possible to meet individual learners’ needs and learning styles. • Improves the learner’s learning quality and makes it more efficient and faster thanks to a customized approach for each student. • Curriculum automation.

2 AmI System Structure The AmI system includes tasks and skills [18]. The top box in Fig. 2 represents the tasks that the AmI system needs to respond to. The bottom box contains the skills that the AmI system should include. The ‘tasks’ are the events the system needs to identify, recognize, and respond to accordingly. The tasks are human-oriented - they represent a variety of human characteristics that the AmI needs to “know”. The skills are geared towards technology. Skills are AmI’s toolkit with which technology interacts with humans. They represent the tools that technology must naturally acquire to interact. The tasks and skills are interrelated and interdependent. The relationship between them can achieve micro-segmentation in LMS, as shown in Fig. 2.

Fig. 2. Model of micro-segmentation use in Learning Management Systems, based on Gill & Cormican’s diagram
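As a rough illustration of the task/skill split shown in Fig. 2, the human-oriented tasks can be modelled as categories of observations the system must recognize and the technology-oriented skills as capabilities that consume them. The sketch below is purely illustrative: the enum names and the mapping are assumptions and are not taken from Gill & Cormican [18].

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

public class TaskSkillModel {

    // Human-oriented tasks the AmI layer must recognize (Sect. 2.1).
    enum Task { HABITS, NEEDS, PREFERENCES, RESOURCE_AVAILABILITY, GESTURES, EMOTIONS }

    // Technology-oriented skills the AmI layer must offer (Sect. 2.2).
    enum Skill { RESPONSIVE, ADAPTIVE, LEARNER_CENTRED, FLEXIBLE_IN_TIME_AND_PLACE }

    // One possible (assumed) mapping of which skills are exercised when the
    // system responds to each task; their interplay yields micro-segmentation.
    static final Map<Task, Set<Skill>> MAPPING = Map.of(
            Task.HABITS, EnumSet.of(Skill.ADAPTIVE, Skill.LEARNER_CENTRED),
            Task.NEEDS, EnumSet.of(Skill.ADAPTIVE, Skill.RESPONSIVE),
            Task.PREFERENCES, EnumSet.of(Skill.LEARNER_CENTRED, Skill.FLEXIBLE_IN_TIME_AND_PLACE),
            Task.RESOURCE_AVAILABILITY, EnumSet.of(Skill.ADAPTIVE),
            Task.GESTURES, EnumSet.of(Skill.RESPONSIVE),
            Task.EMOTIONS, EnumSet.of(Skill.RESPONSIVE, Skill.LEARNER_CENTRED));

    public static void main(String[] args) {
        MAPPING.forEach((task, skills) -> System.out.println(task + " -> " + skills));
    }
}
```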

2.1 Tasks a. Habits - Routines, practices, traditions, conventions, patterns of behavior, tendencies, likes, and preferences. The system maps the user’s habits and acts accordingly. • Social-collaborative or individual learning Cooperative learning - Some students will need more frequent communication with other students; the system will adjust the accessibility of elements such as the chat or discussion forum according to this criterion. The system can also offer rich shared tools that allow students to actively participate in discussions, collaborate on assignments and learn from each other. On the other hand, some prefer individual learning and do not need these tools. The system will “offer” tasks to such individuals and will not burden their screen with the shared virtual spaces. • Self-paced learning adjustment The pace of learning in one subject can have an impact on the pace of learning in another subject when these topics are intertwined. Learning at a certain speed is also a user’s habit. Regardless, the learning speed dictates the total amount of time for teaching a unit of study (given linear instruction that moves from topic to topic). Despite this, current LMSs draw no conclusions from this behavioral monitoring and do not translate these behaviors into data collected about the student. b. Needs - Needs are the things that human beings need to survive. According to the needs mapping, the system will identify requirements and desires. It will also determine our desires, preconditions, and the things we cannot live without. For example, on identifying needs derived from ADHD, the system will fit the proper learning method. • Learning methods Options for deploying study materials into custom units include creating quick and focused learning units for those who need them—also the length of the lesson, short or long, and the same for reading material. • Training The system can recommend training based on the specific performance of the user in the course so that they can get the most out of it.

c. Preferences Each person has different methods and preferences for using computer systems, from choosing the display size to arranging folders into icons or rows. The system will adjust itself to these preferences. • Relevant learning channels The system will recommend relevant learning channels for the learner, for example, mobile learning. • Learning resource Some students will prefer to read electronic text, some will choose to listen to podcasts or recorded lessons, and some will pick to watch a video. • Workspace The system will adjust the LMS operating system and screen display to the student’s preference. d. Resources availability ‘Resources availability’ focuses on an interface that accommodates specific and different levels of knowledge. Although the content presented in the system is uniform, learners come with varying levels of knowledge. All learners come with prior knowledge, which constitutes their ‘internal resources’; the ‘external resources’ are what they meet in the study unit or course (Fig. 3). Given that, the management system can deal with prior levels of knowledge through study units customized according to the student’s level of knowledge.

Fig. 3. Internal and external resources by Frederick Reif diagram1

Study aids and learning contents are the resources that should be available.

1 https://achemicalorthodoxy.wordpress.com/2018/10/25/simplifying-cognitive-load-theory/.

• Use of study aids Some learners will need high accessibility of formula sheets, vocabulary lists, glossaries, etc. Because they use them often, their location in the interface will be prominent. Learners who do not need them will have the study aids kept aside. • Allocation of learning content We have already shown that different students have different levels of knowledge. Therefore the interface will allow each learner to use learning resources according to their level of expertise—for example, automatic skipping of content for those who can progress faster. e. Gestures Gestures here means the ability to recognize physical behaviour, i.e. body language. The idea is to allow “reading” of non-verbal communication to supplement the information collected about the student. Technologically, this is already possible today and even occurs in testing systems that accompany LMSs. For example, the eye movements of examinees are captured and analyzed to determine whether the student violates the exam instructions and uses prohibited study material during the exam. f. Emotions Emotions are feelings that a person has, such as happiness, sadness, anger, disgust, and surprise. AmI technology should be able to recognize the external expressions of the various emotions that humans experience. Identifying emotions can help in understanding the student’s use of the system: is it convenient? How accessible is it? Is it frustrating because it is impractical? 2.2 Skills a. Responsive The system should be responsive, with a sensitive range for the learner’s actions and usage habits. It needs to respond quickly to the various situations it encounters. The interface should be accurate, perceptive, “understanding”, sensitive, and well-tuned to the learner’s requirements.

b. Adaptive During use, the AmI system monitors the student’s activity (for example, how much time a student spent on each page). Automatic rules are set at each stage so that the system can make suggestions based on the student’s activity. c. The learner at the center An essential system requirement is placing the student’s learning at the top of the priorities. The interface capabilities come as a secondary component that helps the learner. d. Flexible in time and place A large degree of flexibility is required. The system must be “present” all the time and everywhere. ‘Everywhere’ means interface adjustment to every device, whether a mobile phone, home computer, tablet, etc.
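A minimal sketch of how the adaptive skill might turn the monitored signals from Sect. 2.1 into interface decisions is given below; the thresholds, field names and suggested actions are illustrative assumptions, not a specification.

```java
public class AdaptiveRuleDemo {

    // A snapshot of monitored activity for one learner (assumed fields).
    record ActivitySnapshot(String studentId, boolean prefersCollaboration,
                            double avgSecondsPerPage, String preferredMedium) { }

    // A very small rule set mimicking the "automatic rules" described above.
    static String suggest(ActivitySnapshot a) {
        StringBuilder ui = new StringBuilder("Layout for " + a.studentId() + ": ");
        // Habit: social vs. individual learning -> show or hide shared spaces.
        ui.append(a.prefersCollaboration() ? "show chat/forum; " : "hide shared spaces; ");
        // Habit: self-paced learning -> long dwell time triggers focused study units.
        if (a.avgSecondsPerPage() > 180) {
            ui.append("offer short, focused study units; ");
        }
        // Preference: recommend the learner's preferred resource format.
        ui.append("recommend ").append(a.preferredMedium()).append(" resources");
        return ui.toString();
    }

    public static void main(String[] args) {
        System.out.println(suggest(new ActivitySnapshot("s1", false, 240, "podcast")));
        System.out.println(suggest(new ActivitySnapshot("s2", true, 90, "video")));
    }
}
```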

3 Conclusions This paper examines the essential elements that need to be mapped in order to personalize an LMS. We discussed the definition of these components and their role in refining and upgrading LMS systems. We propose ways to integrate AmI into existing LMSs, based on previous experience with AmI applications for users and customers. This concept makes it possible to present an LMS model that provides an intelligent and user-friendly learning environment. The framework produces a micro-segmentation format. A systemic structure implementing AmI in LMSs will require several steps. First, a preliminary survey on student preferences: there is a lot of reliable information that can be collected from the students, and the goal at this stage is to gather baseline data firsthand for an initial fusion. The second step will be to create contact channels with the students: communication through chatbots, digital assistants, and smart agents. This stage is based entirely on the data from the first stage. The third is using deep learning, reinforcement learning, and genetic algorithms; these components will process the collected data to adapt the system to the individual. The fourth is using artificial intelligence methods such as classification, anomaly detection, and relationship analysis, whose applications include speech recognition, text mining, speaker recognition, webcam motion, expression monitoring, social network analysis, etc. These are some essential components for a system that learns its users. Combining AmI into the LMS’s components will guide the learner through the system and produce high-level learning.

References 1. Zanjani, N.: The important elements of LMS design that affect user engagement with elearning tools within LMSs in the higher education sector. Australas. J. Educ. Technol. 33(1), (2017)

2. Alias, N.A., Zainuddin, A.M.: Innovation for better teaching and learning: adopting the learning management system. Malays. Online J. Instr. Technol. 2(2), 27–40 (2005) 3. Ain, N., Kaur, K., Waheed, M.: The influence of learning value on learning management system use: an extension of UTAUT2. Inf. Dev. 32(5), 1306–1321 (2016). https://doi.org/10. 1177/0266666915597546 4. Valuates. Global learning management system market (2019) 5. Seaman, J.E., Allen, I.E., Seaman, J.: Grade Increase: Tracking Distance Education in the United States. Babson Survey Research Group (2018) 6. Soffer, T., Cohen, A.: Students’ engagement characteristics predict success and completion of online courses. J. Comput. Assist. Learn. 35(3), 378–389 (2019). https://doi.org/10.1111/ jcal.12340 7. Zilka, G.C., Cohen, R., Rahimi, I.D.: Teacher presence and social presence in virtual and blended courses. J. Inf. Technol. Educ. Res. 17(1), 103–126 (2018) 8. Rahimi, I.D., Zilka, G.C., Cohen, R.: Sense of challenge, threat, self-efficacy, and motivation of students learning in virtual and blended courses. Am. J. Distance Educ. 33(1), 2–15 (2019). https://doi.org/10.1080/08923647.2019.1554990 9. DePietro, A.: Here’s a look at the impact of coronavirus (COVID-19) on colleges and universities in the U.S. Forbes (2020). https://www.forbes.com/sites/andrewdepietro/2020/04/30/ impact-coronavirus-covid-19-colleges-universities/?sh=29de62a861a6 10. Yan, Z.: Unprecedented pandemic, unprecedented shift, and unprecedented opportunity. Hum. Behav. Emerg. Technol. (2020). https://doi.org/10.1002/hbe2.192 11. Dhawan, S.: Online learning: a panacea in the time of COVID-19 crisis. J. Educ. Technol. Syst. 49(1), 5–22 (2020). https://doi.org/10.1177/0047239520934018 12. Mahmood, Z. (ed.): Guide to Ambient Intelligence in the IoT Environment: Principles, Technologies and Applications. Springer, Preface (2019). https://doi.org/10.1007/978-3-030-041 73-1 13. Demir, K., Turan, B., Onel, T., Ekin, T., Demir, S.: Ambient intelligence in business environments and internet of things transformation guidelines. In: Mahmood, Zaigham (ed.) Guide to Ambient Intelligence in the IoT Environment. CCN, pp. 39–67. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-04173-1_3 14. Sharma, A., Kumar, A., Bhardawaj, A.: A review of ambient intelligence system: bringing intelligence to environments. Int. J. Inf. Comput. Technol. 4(9), 879–884 (2014) 15. Hjort, K., Lantz, B., Ericsson, D., Gattorna, J.: Customer segmentation based on buying and returning behaviour. Int. J. Phys. Distrib. Logist. Manag. (2013). https://doi.org/10.1108/IJP DLM-02-2013-0020 16. Goyat, S.: The basis of market segmentation: a critical review of literature. Eur. J. Bus. Manag. 3(9), 45–54 (2011) 17. Vieth, M.: Customer segmentation in B2B markets: the relationship between customer segmentation and market orientation. Bachelor’s thesis, University of Twente (2018) 18. Gill, S.K., Cormican, K.: Support ambient intelligence solutions for small to medium size enterprises: typologies and taxonomies for developers. In: 2006 IEEE International Technology Management Conference (ICE), pp. 1–8. IEEE (2006)

eMudra: A Leftover Foreign Currency Exchange System Utilizing the Blockchain Technology Rituparna Bhattacharya, Martin White(B) , and Natalia Beloff University of Sussex, Falmer, Brighton, UK {rb308,m.white,n.beloff}@sussex.ac.uk

Abstract. The global travel industry leaves almost every traveller with some amount of leftover foreign currency at the end of the trip. Most travellers are unable to use this leftover foreign currency efficiently and profitably because exchange bureau rates do not favour the traveller and bureaus do not accept low-denomination coins. In this paper, we explore existing currency exchange systems for leftover foreign currency and consider prevalent challenges. We propose an innovative system, eMudra, for exchanging cash-based leftover foreign currency by integrating smart kiosk-based systems with peer-to-peer currency exchange utilizing the concept of blockchain technology, thereby alleviating these challenges. A key component of our system advocates the role of wearable technology for secure identification with a kiosk-based transaction user interface. We have implemented, tested and evaluated the eMudra components—the User Wallet and the Transaction Management Network as a permissioned consortium blockchain—while simulating the kiosk interface and the associated physical bill and coin exchange. The leftover foreign currency lying unused in the wallets and drawers of travellers is significant and can easily add up to billions if considered globally. Our blockchain-based system, eMudra, can play an important role in bringing these currencies back into the global economy by making the money deposit, management and exchange process easy and lucrative. Keywords: Blockchain · Cash currency exchange · Permissioned consortium blockchain · Smart kiosk

1 Introduction Different countries have different currency systems, with a few exceptions such as the European Union. So, at the start and end of any foreign travel there emerges the need for some form of currency exchange. While there are a large number of exchange bureaus and banks providing exchange services, recent disruptive systems exploiting peer-to-peer (P2P) communication are starting to drive the evolution of online currency exchange platforms. However, the process of exchanging cash (particularly coin) based leftover foreign currency (LFC) still remains inconvenient and unprofitable. In this paper we describe eMudra, an innovative blockchain-based currency exchange system that provides competitive exchange rates while making cash-based LFC exchange ubiquitous and easy for the global traveller. The eMudra architecture
implements the concept of blockchain technology—a shared distributed ledger where all transactions are approved through consensus from the nodes in the network, thus enforcing decentralization. However, to prevent crimes such as money laundering, functionalities such as new user registration with initial identity verification do need to be controlled by an authority (e.g. a financial organization), usually as part of a country’s financial regulation framework. The LFC exchange model (used by eMudra) also proposes a human-kiosk interaction and authentication technique using a wearable device, such as a smart band, that enables money deposit or withdrawal along with identity verification. However, the kiosk ecosystem is beyond the scope of this project and as such is simulated. We focus here on the decentralized components of the system that leverage blockchain, such as the user’s multicurrency wallet and the Transaction Management Network, and keep detailed descriptions of the other components, which are directly controlled by the regulatory stakeholder organizations, beyond the scope of the project. This paper is organized as follows: Sect. 2 includes background research focused on current LFC utilization methods, problems with traditional LFC exchange methods, the potential market for LFC, drawbacks of some of the newer methods of LFC exchange and a discussion of P2P exchange systems. Section 3 presents eMudra, a new P2P currency exchange framework for leftover foreign currency exploiting the Internet of Things, wearable technology and, most importantly, blockchain technology, and specifies its system architecture. The conclusion is discussed in Sect. 4.

2 Background Research Conventional methods of carrying funds to a foreign destination are cash, debit, credit and prepaid cards and, less so these days, traveller’s cheques [30]. While we are shifting towards a digital age in which, in the future, we may make all payments abroad using cards and smart devices such as phones or wearables, travellers will more often than not have some amount of cash based LFC when they leave their foreign destination. 2.1 Current LFC Utilization Methods Utilization of cash LFC on return is generally based on options such as exchange through a bank or Bureau de Change, buy-back guarantee, shopping, charity, exchange with friends, saving for future travel and presenting as gifts. While the first two options may be considered for significant to moderate sums, and ‘exchange with friends’ and ‘save for future travel’ are suitable for moderate and small amounts, the rest of the options are for amounts regarded as relatively negligible to the traveller, but which globally can add up to billions in cash based LFC. 2.2 Issues with Existing LFC Exchange Methods A particular issue with existing or traditional currency exchange systems has been the difficulty in exchanging cash, whether coin or note based. It may be inconvenient to visit a bank for exchanging cash on a day when the exchange rate is good. Moreover, if the exchange amount is not considerable, users may not be willing to go to the bank to
exchange. Furthermore, exchange bureaus do not usually accept coins, and where coins are accepted, they are of particular currency types and values. Often there is an option to donate the LFC to a charity, for example when exiting a plane, but not all travellers may be willing to dispose of their LFC in this way.

2.5 Drawbacks of the New LFC Exchange Methods Common issues with the new cash based LFC exchange systems (both exchange via post or kiosk) include: Submission of Money: The exchange via post method incurs postal charges and it requires much time for the entire process to be completed. The exchange rate is chosen and offered by the company. Some countries do not allow sending money through post [6].

Decision of Exchange Rate: For both exchange via post and kiosk-based operations, the exchange rate is fixed by the converting company, therefore the traveller does not have any option for choosing the exchange rate. These systems also have limits in terms of the amount of currency that can be exchanged. Usage of the Exchanged Money: The cash dispensed from the machines of Fourex will be in the form of pounds, euros, or dollars, which potentially implies a second time exchange will be essential for travellers heading to countries with currency type different from these three. Other organizations provide exchanged money through bank transfer or PayPal depending on the traveller’s location. These redemption methods are not necessarily ideal and depend on travel destination or current travel location. Geographical Limitation: Most of the LFC exchange services available are for specific currencies and the kiosk-based operations are limited, they are usually located inside particular airports. Entry to and exit from a country are possible by road, rail or ships indicating that coverage of possible country entry and exit locations is currently sparse.

2.6 Peer to Peer Exchange Systems A contemporary trend of exchanging money involves the disruptive method of P2P currency exchange. A key advantage of the P2P currency exchange is the ability for peers to decide the exchange rate. There can be two different ways to exchange currency in this context: the first method permits currency exchange without the need of any associated crypto-currencies and the second method utilizes a virtual cryptocurrency system for currency exchange. The former method needs travellers to send money from their bank accounts to the exchange system. When a match is found, money is exchanged directly with a peer looking for currency in the reverse direction. There are also different ways by which the exchange rate is fixed for such transactions. User Specified Rate in Marketplace: This is a P2P currency exchange and transfer platform where the user decides the exchange rate. Currencyfair is one such system [9]. The user has to deposit money to the Currencyfair account. There is a minimum amount of currency to be exchanged and there is an associated bank transfer fee to send the exchanged currency to the user’s bank account. User Specified Rate Limit with Mid-Market Rate: This system allows users to set a rate limit, that is, the lowest possible rate at which users are willing to convert their money. The system uses the mid-market rate for exchange. If rate goes below the user specified rate limit, exchange is postponed till the rate reaches the preferred rate limit for the currency. One such system is Transferwise [10]. Transferwise charges a small fee for converting currencies, at the time of writing this paper. The money is deposited from the user’s bank account to her Transferwise account, and once converted is sent to the beneficiary’s bank account.

System Decided Rate: This group of systems allows currency exchange between peers holding accounts with the operating organization at a rate which is computed by the system based on live data feed from currency markets providing the mid-market rate for each currency market. MoneySwap [12] and Midpoint [14] are instances of such a system. WeSwap is another such system that additionally allows cash withdrawals over 200 GBP on a WeSwap card and purchases at shops, restaurants and online [13]. P2P Exchange for Companies: This allows financial exchanges between SME and mid-cap companies at mid-market rates. Kantox is one such example [11]. Another example of P2P currency exchange system would be KlickEx [26]. For the above-mentioned systems, client funds are generally kept in accounts segregated from their own thus ensuring protection of the client money. However, these P2P systems have certain challenges. • Cash Money Acceptability: They do not have the provision of accepting cash based LFC from a trip. • Transfer Fees: The transfer and/or exchange fees for the P2P systems are unsuitable for small exchange amounts. • Limited list of currencies accepted: These systems work on limited set of currency pairs. For the second category of exchange platforms based on cryptocurrency, the concept of Blockchain technology has been leveraged [15]. These systems can enable a decentralized P2P communication employing a shared distributed time-stamped ledger with transactions recorded based on agreement between the participating nodes. Some such examples include Bitcoin [15], Stellar [16] and A.I.Coin [17]. However, these systems do not resolve the problem of cash LFC exchange particularly, small amounts.

3 eMudra: The LFC Exchange System Based on the background research, it is found that though a few recent attempts have been made to solve the cash based LFC exchange problem, there are a number of issues still prevalent, particularly for small amounts. In our earlier research work [18, 27], we have outlined a blockchain based system that may alleviate the challenge of cash LFC exchange. We have implemented a prototype system that solves these challenges and seamlessly allows travellers to make use of their cash based LFC conveniently. Here, we describe in detail the cash based LFC exchange model that follows a P2P exchange framework deploying IoT (for interconnectivity), smart kiosks (for deposit/withdraw), Wearable technology (for identity verification) and Blockchain (for financial ledgers) with certain parts of the system controlled and regulated by stakeholder organization(s). In our opinion the LFC problem is unique as there needs to be an exchange of physical cash, which means at the least there needs to be a third-party stakeholder system that handles that cash. Our choice is to exploit foreign currency kiosk systems connected to a blockchain architecture.

3.1 Traveller’s Journey Use Case Let us consider the following use case scenario. John is on a business visit to Chicago and is now returning to London. He has some unused USDs with him. At the airport gate, he gets the location of the nearest currency-accepting smart kiosk with his smartphone. He brings his smartband close to the kiosk for identification. Through an ECG based identification mechanism in the device, his identity is authenticated. He deposits his cash money and walks off. He gets the receipt of the deposit in his smartphone instantly. Later, he checks in his computer the list of exchange rates published by peers in the network who need USD for GBP. He decides to publish his own exchange rate. John’s LFC gets exchanged with one of the peer travellers and the exchanged money is ready to be transferred to his associated account or collectable at a local kiosk. He gets an alert on his smartwatch and smartphone once the exchange is completed and the money is ready for retrieval. A critical component of this scenario is the ability to identify John at the smart currency exchange kiosk through an interaction between the wearable and kiosk, because for small amounts it has to be as simple as tossing one’s coins into a bin, just like tossing one’s coins into a road tollbooth to raise the barrier. Otherwise, a traveller is tempted to just keep the coins in her pocket. Further, the smart kiosk can allow coins not registered against a user at the time of deposit to be automatically assigned to a charity—perhaps John forgot to wear his smartband, but deposited anyway. Other preferences can be set up a priori when John registers with the system. John could also just simply toss his coins into the kiosk over many leisure and business trips not worrying how much was deposited nor what currencies were deposited. He may decide to check on his multiple currency balances on a future trip and have all balances convert to a new destination currency. 3.2 Business Model Because of the complex nature of the entire system, a combination of various business models may be adopted to satisfy the application. As the system will have an integrated online shopping platform where user can shop with their LFC after depositing the cash at the kiosk, the commission business model could be adopted. This model takes a percentage of the sale and generates revenue. In general, in decentralized applications, advertisements cannot be integrated as there will be no one to govern the advertising standards [22]. Despite the commission model that can act as a solution for the shopping section of the P2P application, a subscription model could be adopted by the overall P2P system where users pay a fee to access and use the application, say for instance, the user pays the fee to access the shopping section of the application. This model generally provides access to specific contents of any e-commerce website only if the user has an active subscription [3]. Users can pay to use such applications using internal currency or the cryptocurrency associated with the application. Instead of adding a transfer or exchange fee to the transactions that will discourage transfer or exchange of small amounts, a subscription fee may be paid to access the application or its specific components.

In addition to the above-mentioned models, we can envisage an interest-based model such that when the system is fully functional and runs worldwide at airports, users earn interests on the deposited money and fees are deducted from her earnings. 3.3 Blockchain Technology A blockchain “is a fully distributed, peer-to-peer software network which makes use of cryptography to … easily transfer digital instruments of value that represent real-world money” [25]. It facilitates creation of a shared distributed ledger [22] that consists of blocks threaded together in a serialized fashion where a block is composed of a number of transactions, a hash of the previous block, timestamp designating the time of block creation, reward for the miners who discover the blocks through a process called mining, block number among other information. Since every block has a hash of the previous block, blocks are linked together with each other forming a chain of blocks originating from the first block in the chain, i.e. the genesis block, block number 0 or 1. The copy of the blockchain is shared with every node in the network. Cryptography ensures a secure network without any central authority or ownership [25]. Blockchain is supported by the following ingredients: • Peer-to-peer Networking—this includes a group of computers communicating with each other without a single central authority resulting in absence of single point of failure. • Asymmetric Cryptography—this enables the creation of a set of credentials for user accounts such that anyone can verify the sender’s authenticity but only intended recipients can read message contents. • Cryptographic Hashing—in popular blockchain based applications such as Bitcoin or Ethereum, the Merkle tree data structure can be utilized to record the canonical order of transactions, “which is then hashed into a “fingerprint” that serves as a basis of comparison for computers on the network, and around which they can quickly synchronize” [25]. When we implement a digital transaction or ‘transfer of value’ we need to guard against fake transactions, such as double spending. Nakamoto discusses this in more detail in [15], but to summarize, he discusses existing financial systems relying on trusted third parties, i.e. banks, to handle fraud for example, whereby a certain amount of fraud is effectively accepted because of inherent weakness that cannot guarantee the possibility of reversing a transaction. In blockchain based cryptocurrency transactions we want to eliminate the third party, i.e. the bank who provides the trust mechanism to eliminate double spending. This gives rise to the need for electronic payments between two parties based on a cryptographic proof instead of the trust normally provided by a third party. The concept of a Blockchain is the foundation for bitcoin, the most widely used cryptocurrency. As mentioned above, it provides a public distributed time stamped ledger across a decentralized network where every node can access the ledger (this applies to public blockchain; consortium and private blockchains may involve restrictions in terms of who can access the ledger). A consensus mechanism, e.g. proof of work, or proof of

stake, is needed to append transactions to a blockchain. For all Bitcoin transactions, in every 10 min slot, a new block containing those transactions is added to the chain thus forming a Blockchain. Bitcoin blockchain mining is the process of adding transaction records to the Bitcoin blockchain where miners are challenged with a computationally expensive (takes around 10 min to compute) cryptographic puzzle. The miner solving the puzzle then adds a new block to the chain and receives a certain number of bitcoins in return [15]. Bitcoin works on a consensus model where any block not approved by a majority is eventually eliminated. It is a trustless system where users do not trust each other or any central body, but only the system that does not allow double spending and at the same time solves the Byzantine General’s problem [5]. So, a blockchain is a distributed digital ledger used for recording and verifying a transaction, e.g. a payment transaction or ‘transfer of value’. A block ‘appends only’ the details of the transaction in a ledger (effectively a book keeping system—the blockchain). This ‘append or record only’ is a key feature of the blockchain—unlike a physical paper (or traditional digital ledger) where you can cross or tear out a page (paper) or delete (digital) you can only write data to the blockchain, i.e. you cannot delete or update a block in the blockchain. Therefore, all previous transactions in a blockchain are theoretically always there chained together consecutively. Thus, the blockchain “provides us with a permanent and complete record of every transaction that’s ever-taken place, so that this record or ledger (the blockchain) represents the truth, which can’t be modified after the fact” [21]. 3.4 Proposed System Architecture of e-Mudra: A P2P Cash LFC Exchange Application In this section we describe our architectural model for the prototype of a cash based LFC exchange system (e-Mudra) where the following components are expected to work together. Figure 1 illustrates the proposed system architecture, and we describe each module below.
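Before the individual modules of Fig. 1 are described, the hash-chained, append-only ledger concept outlined in Sect. 3.3 can be summarized in a minimal Java sketch. It is illustrative only: it omits consensus, mining rewards and Merkle trees, and none of the identifiers are taken from the eMudra code base.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.List;

public class MiniChain {

    // A block references its predecessor by hash, which chains the blocks together.
    record Block(int number, long timestamp, List<String> transactions,
                 String previousHash, String hash) { }

    static String sha256(String data) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest(data.getBytes(StandardCharsets.UTF_8))) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    static Block newBlock(int number, List<String> txs, String previousHash) throws Exception {
        long ts = System.currentTimeMillis();
        String hash = sha256(number + ":" + ts + ":" + txs + ":" + previousHash);
        return new Block(number, ts, txs, previousHash, hash);
    }

    public static void main(String[] args) throws Exception {
        List<Block> chain = new ArrayList<>();
        chain.add(newBlock(0, List.of("genesis"), "0"));   // the genesis block
        chain.add(newBlock(1, List.of("alice deposits 20 USD at kiosk K1"), chain.get(0).hash()));
        chain.add(newBlock(2, List.of("alice exchanges 20 USD for GBP with bob"), chain.get(1).hash()));

        // Verify the chain: each block's hash must match its contents and must be
        // referenced by its successor; altering any earlier block breaks this check.
        for (int i = 1; i < chain.size(); i++) {
            Block b = chain.get(i);
            boolean contentOk = b.hash().equals(
                    sha256(b.number() + ":" + b.timestamp() + ":" + b.transactions() + ":" + b.previousHash()));
            boolean linkOk = b.previousHash().equals(chain.get(i - 1).hash());
            System.out.println("Block " + i + " valid: " + (contentOk && linkOk));
        }
    }
}
```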

Fig. 1. e-Mudra architecture

User Registration and Verification Unit: When a new user, say Alice opens an account to use the proposed LFC application, she will have to undergo verification of her identity, as required by most countries for anti-money laundering, e.g. by submitting a copy of her passport. Once her identification and enrolment processes are complete, and account is set up she is at liberty to use the system for her travels. A key component of the user registration and verification process is to record unique user features, such as the heart beat pattern (ECG), palm vein image using an infrared camera [8, 24], and so on, so that the user identity verification system can verify a user’s identity thus enabling her to access her account. So, if access management to the smart kiosk is based on heart beat authentication, at registration time, Alice will need to register the biometric template of her heart beat, which would be used later (at a kiosk) to submit or withdraw money from a smart kiosk during travel [7]. Smart Kiosks: A smart kiosk allows Alice to deposit cash based LFC after authentication. It will also let her withdraw currency as cash if she has sufficient balance. A smart LFC kiosk will be able to accept or provide currency local to the country where it is placed. For instance, if it is placed in Gatwick, it can allow deposit and withdraw of GBP. However, it can also be designed to accept different currencies during deposit, for example EURO in addition to GBP, to satisfy the scenario where a user is unable to deposit cash LFC before departure from the foreign country and attempts to convert the same at the destination airport, say Gatwick. Multi-currency kiosks such as the ones designed by Fourex [20] can be cited in this context as they can accept multiple currencies and allow money to be withdrawn in specific currencies. The smart kiosk also has an additional functionality: Sending alerts to a Kiosk Control Centre when it is full and is unable to accept any more cash currency or there is insufficient money to satisfy any withdrawal request. The set of connected kiosks can be ubiquitously placed at different exit and entrances around an airport or other ports of entry and exit from a country. Kiosk Control Center: When any of the kiosks sends a request for refill or cash collection to the server in the Kiosk Control Centre, it is dispatched to the respective attendant based on the kiosk location automatically with an alert on his smartphone or wearable. This unit will have different physical branches across different countries and locations. When the cash stored in the kiosk through deposit and withdraw by users reaches an upper threshold, an automated message is sent to the Kiosk Control Center. As this unit includes human beings as attendants, the request from the kiosk to collect cash above the threshold is usually sent as an alert to the attendant’s smartphone or smartwatch. Similarly, if the cash in the kiosk falls below a lower threshold, an alert is sent such that the attendant supplies cash to the kiosk. It is beyond the scope of this research to actually build such a kiosk system, instead we simulate the kiosk interface. Our work focuses more on the blockchain for LFC exchange. The kiosks remain connected to the servers in the Kiosk Control Centre so that they can convey their contained currency status in addition to remaining connected to the Transaction Management Network. 
While their connection to the servers in the Kiosk Control Centre follows a centralized approach, the connection to the Transaction Management Network or in other words, the permissioned consortium blockchain follows a decentralized approach.
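The threshold-based alerting described for the Kiosk Control Centre could look roughly like the sketch below; the thresholds, currency and kiosk identifier are illustrative assumptions, and in the prototype the kiosk side is simulated.

```java
public class KioskFillMonitor {

    // Assumed thresholds for a single kiosk's local cash float, in GBP.
    private static final double UPPER_THRESHOLD = 5_000.0;  // collect cash above this
    private static final double LOWER_THRESHOLD = 500.0;    // refill cash below this

    private final String kioskId;
    private double cashHeld;

    KioskFillMonitor(String kioskId, double initialCash) {
        this.kioskId = kioskId;
        this.cashHeld = initialCash;
    }

    // Called after every deposit (positive delta) or withdrawal (negative delta).
    void onCashMovement(double delta) {
        cashHeld += delta;
        if (cashHeld >= UPPER_THRESHOLD) {
            alertControlCentre("collect surplus cash");
        } else if (cashHeld <= LOWER_THRESHOLD) {
            alertControlCentre("refill cash");
        }
    }

    // Stand-in for the message sent to the Kiosk Control Centre, which would in
    // turn notify the attendant's smartphone or smartwatch.
    private void alertControlCentre(String action) {
        System.out.printf("ALERT [%s]: %s (current float %.2f GBP)%n", kioskId, action, cashHeld);
    }

    public static void main(String[] args) {
        KioskFillMonitor kiosk = new KioskFillMonitor("GATWICK-N-03", 4_800.0);
        kiosk.onCashMovement(+350.0);   // a deposit pushes the float over the upper threshold
        kiosk.onCashMovement(-4_700.0); // a large withdrawal drops it below the lower threshold
    }
}
```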

Fiat Currency Management Unit: This unit is responsible for managing the overall cash currency deposited or withdrawn from the kiosks across different locations. It maintains the accounts for the cash in circulation in different kiosks. Travellers’ money is kept in accounts segregated from organization’s money. In cases, where a matching peer during exchange is not found, the organization can act as an intermediary and provide its own exchange rate and allow exchange by the user. Identity Verification System: When a user, say Alice provides her credentials in front of the kiosk, her identity is verified through this component. The kiosk contacts the respective server for verification. Identity verification is also required when logging into her wallet and is managed through this system. Alice needs to provide username and password during logging into the wallet and these usernames and passwords are generated at the time of registration. User Wallet: This is the application in the Alice’s computer that will let her check and set exchange rates, check her balance of different currencies, exchange with a peer and send money to her bank account. Her multicurrency wallet can display both fiat money and cryptocurrency balances but before sending currencies to bank account, the cryptocurrencies must be converted to fiat currencies. She does not need to have a separate cryptocurrency wallet as her multicurrency wallet can display different cryptocurrency balances (however, the transfer of currencies to/from other external accounts has not been implemented as part of the prototype). She can also publish her preferred exchange rate and amount and wait for a peer to exchange. In addition, she can transfer money to her friend or relatives’ accounts or receive from her other account or friends and family. Alice can also use her money for shopping, buying gifts and charity. She can see the history of the exchanges performed or donated to charity or spent in buying gifts. She can also opt for auto-exchange where she does not need to select or publish an exchange rate. Exchange is done with a matching peer automatically. She can add other users as friends and exchange with them. Transaction Management Network: This is a decentralized network of different nodes maintaining a shared time-stamped ledger exploiting blockchain technology to eliminate any single point of failure. The ledger is updated every few seconds and any new block addition with transaction verification is based on consensus between the nodes. So, when any deposit or withdrawal happens at any of the kiosks, the associated kiosk broadcasts that to the nodes, and the same happens when a user transfers money to/from her account or exchanges it with a peer from her wallet. Additionally, a different registry of advertisements published by users depicting their preferred exchange rates is maintained. As this application spans across countries, it will be maintained by more than one regulatory body (organization/authority) and/or more than one branch of an organization. So, this is decentralized as any transaction is recorded and validated by nodes participating from multiple organizations or multiple nodes from different branches in a single organization competing to win rewards, but at the same time, this is permissioned as the identities of such nodes must be accounted for. This is a consortium blockchain where only a restricted set of nodes participates in the consensus process. 
The nodes participating in the consensus process here are internal to the participating

organization(s) and not the end users of the application as travellers cannot be expected to do block addition or validation. The ledger is readable to all or a specified list of participants or partially readable to some of the participants depending on the stakeholder organizations’ policy. The application cannot leverage a public blockchain or enforce complete decentralization as the transactions must be governed by regulatory bodies to prevent money laundering. Besides, it involves fiat currency and kiosks which must be controlled by some organization(s). To prevent money laundering, user identities are managed by stakeholder organizations that can relate which user performed which transactions by mapping their identities to usernames and public keys, however, the travellers may request to read the ledger which projects transactions mapped to public keys and not real user identities, hence, travellers cannot easily find out transaction details of other users. Wearable Technology: This can be used for authentication purpose, e.g. based on heartbeat [7]. In addition, sending alert to attendant to collect or refill currency in a kiosk can be accomplished using wearable devices such as smartwatch worn by the attendant. Exchange Rate Adjuster: Before Alice posts a new exchange advertisement, it is checked if the intended exchange rate falls within acceptable limits. This component checks against live rates for the day and adjusts the published rate to avoid too high or too low exchange rates, thus preventing money laundering. The adjusted rate is then broadcasted to the Gateway Nodes by the multicurrency user wallet.
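A minimal sketch of the Exchange Rate Adjuster’s clamping behaviour is shown below; the tolerance band, method names and example rates are assumptions, and a real deployment would obtain the live mid-market rate from a market data feed.

```java
public class ExchangeRateAdjuster {

    // Assumed tolerance: published rates may deviate at most 5% from the live mid-market rate.
    private static final double MAX_DEVIATION = 0.05;

    // Clamp a user-proposed rate into the acceptable band around the live rate,
    // preventing advertisements with rates that are suspiciously high or low.
    static double adjust(double proposedRate, double liveMidMarketRate) {
        double lower = liveMidMarketRate * (1 - MAX_DEVIATION);
        double upper = liveMidMarketRate * (1 + MAX_DEVIATION);
        return Math.max(lower, Math.min(upper, proposedRate));
    }

    public static void main(String[] args) {
        double liveUsdToGbp = 0.80;                     // illustrative mid-market rate
        System.out.println(adjust(0.79, liveUsdToGbp)); // within band -> unchanged (0.79)
        System.out.println(adjust(0.95, liveUsdToGbp)); // too high   -> clamped to 0.84
        System.out.println(adjust(0.50, liveUsdToGbp)); // too low    -> clamped to 0.76
    }
}
```

The adjusted value, rather than the raw user input, would then be broadcast to the Gateway Nodes as described above.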

3.5 e-Mudra: The Implementation of the Prototype

Our prototype, eMudra, builds on components we created previously, e.g. converting fiat money to cryptocurrency [29]. We have now developed a further prototype of the e-Mudra application components using Java technology. It includes the User Wallet, the Transaction Management Network and a simulation of the Kiosk User Interface. Here, we add a description of the prototype.

Transaction: There can be seven different types of transactions: (a) user deposits money at a kiosk, (b) user withdraws money from a kiosk, (c) user transfers money from her account to a peer’s account, (d) user exchanges currency with another user, (e) user donates money to charity, (f) user buys commodities with the money in her account, (g) user transfers money from her account to an external bank account. Donation is a specialized version of Transfer, where the user selects a charity and the money is transferred to the charity’s public key. For clarity, in this paper we limit our discussion to the first four types of transactions, as these are the most important functionalities designed to recirculate LFC back into the economy. Each transaction has a sender, a receiver, an amount, a currency unit or type (whether cryptocurrency or fiat currency), inputs, outputs, and a transaction id. The sender and the receiver are specified by their corresponding public keys. None of the transactions are associated with transaction fees, as the application follows a subscription model where the user pays a subscription fee. A subscription model will enable users to access different parts of the application based on subscription; for example, the user can access the buying and sending gifts section only if she has an active subscription.
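To make this structure concrete, the following sketch shows a minimal transaction record with the fields just listed. The class and field names are illustrative assumptions rather than the prototype’s exact code.

```java
import java.security.PublicKey;
import java.util.List;
import java.util.UUID;

/**
 * Illustrative sketch of a transaction record (hypothetical names). Note that
 * there is no fee field: the application follows a subscription model rather
 * than per-transaction fees.
 */
public class Transaction {
    public enum Type { DEPOSIT, WITHDRAW, TRANSFER, EXCHANGE }

    private final String transactionId = UUID.randomUUID().toString();
    private final Type type;
    private final PublicKey sender;        // sender identified by public key
    private final PublicKey receiver;      // receiver identified by public key
    private final double amount;
    private final String currencyUnit;     // e.g. "Mudra" or "GBP"
    private final List<TransactionInput> inputs;    // references to unspent outputs being spent
    private final List<TransactionOutput> outputs;  // new outputs created by this transaction
    private byte[] signature;              // set when the sender signs the transaction

    public Transaction(Type type, PublicKey sender, PublicKey receiver,
                       double amount, String currencyUnit,
                       List<TransactionInput> inputs, List<TransactionOutput> outputs) {
        this.type = type;
        this.sender = sender;
        this.receiver = receiver;
        this.amount = amount;
        this.currencyUnit = currencyUnit;
        this.inputs = inputs;
        this.outputs = outputs;
    }

    public String getTransactionId() { return transactionId; }
    public void setSignature(byte[] signature) { this.signature = signature; }
}

// Minimal placeholder types so the sketch stands alone.
class TransactionInput  { String referencedOutputId; }
class TransactionOutput { java.security.PublicKey recipient; double amount; String currencyUnit; }
```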


In addition, per-transaction exchange or transfer fees would discourage users from exchanging or transferring small amounts.

Nodes: The application utilizes blockchain technology; the network comprises four types of nodes: the Kiosk node and the User Wallet are clients, while the Gateway Nodes and the Miners perform block addition and validation. The Transaction Management Network described above involves the Gateway Nodes and the Miners. The users of the application access the wallet and do not need to participate in mining, block addition or validation. The application leverages a permissioned consortium blockchain such that only a selected set of nodes with known identities carry out block addition and validation. The Gateway nodes, miners, and kiosk nodes are managed by the stakeholder organizations. User identities must be recorded by the stakeholder organizations to prevent money laundering. The user wallets and the kiosk nodes are connected to the Gateway nodes, and the Gateway nodes are connected to each other. Each Gateway node is connected to a group of miners or mining nodes [27].

Kiosk Simulation: Every kiosk should serve two main functions: depositing and withdrawing money. Recall that we have simulated the Kiosk User Interface; therefore, implementation of user authentication using physical wearable devices, cards or smartphones is beyond the scope of this paper. Every kiosk has a pair of public and private keys that are stored in the kiosk locally against a kiosk name. These keys are generated when the kiosk application is launched for the first time. The keys are also stored in the Gateway Nodes through the “ADD_TO_DIRECTORY” instruction when they are generated; we will discuss this later in this paper. The Kiosk superclass has two subclasses: MoneyDeposit and MoneyWithdraw. The parent class exposes four methods that are leveraged by the child classes (a skeleton of this class follows the list):

• Retrieve Unspent Transaction Outputs—this method queries any of the Gateway Nodes about the Unspent Transaction Outputs for a particular user for a specific currency type. The instruction sent to the Gateway Node is “RETRIEVE_WALLET_UTXO”; we will discuss more about instructions when we describe the Gateway Node.
• Retrieve Balance for a currency type for a specific user—this method queries any of the Gateway Nodes about the available balance for a specific user for a particular currency type. The instruction sent to the Gateway Node is “RETRIEVE_BALANCE”.
• Retrieve Public Key for a particular user—this method queries any of the Gateway Nodes about the Public Key of a user by supplying the username of the user. The instruction sent to the Gateway Node is “RETRIEVE_PUBLIC_KEY”.
• Send Currency—this method sends a currency transfer instruction to the Gateway Nodes by supplying the recipient Public Key, currency amount and type. It fetches the Unspent Transaction Outputs for a user by querying the Gateway Node; it sends the currency type and public key of the user to the Gateway Node for this purpose. It then prepares the input for the new transaction and signs it with the kiosk’s private key. The instruction sent to the Gateway Node is “PROCESS_TRANSACTION”.
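The skeleton below illustrates one way the Kiosk superclass and its four methods could be organised. The helper types (DataPacket, GatewayClient) and all names are illustrative assumptions based on the description above, not the prototype’s actual code.

```java
import java.security.KeyPair;
import java.security.PublicKey;
import java.util.List;

// Minimal placeholders so the skeleton stands alone (hypothetical types).
class DataPacket {
    final String instruction; final Object[] payload;
    DataPacket(String instruction, Object... payload) { this.instruction = instruction; this.payload = payload; }
}
interface GatewayClient { Object send(DataPacket packet); }
class TransactionOutput { }

/** Illustrative skeleton of the Kiosk superclass; each method wraps its request
 *  in a Data Packet carrying one of the instructions named in the text. */
abstract class Kiosk {
    protected final String kioskName;
    protected final KeyPair kioskKeys;     // generated on first launch, registered via ADD_TO_DIRECTORY
    protected final GatewayClient gateway; // connection to a Gateway Node

    protected Kiosk(String kioskName, KeyPair kioskKeys, GatewayClient gateway) {
        this.kioskName = kioskName;
        this.kioskKeys = kioskKeys;
        this.gateway = gateway;
    }

    @SuppressWarnings("unchecked")
    protected List<TransactionOutput> retrieveUnspentTransactionOutputs(PublicKey user, String currencyUnit) {
        return (List<TransactionOutput>) gateway.send(new DataPacket("RETRIEVE_WALLET_UTXO", user, currencyUnit));
    }

    protected double retrieveBalance(PublicKey user, String currencyUnit) {
        return (Double) gateway.send(new DataPacket("RETRIEVE_BALANCE", user, currencyUnit));
    }

    protected PublicKey retrievePublicKey(String username) {
        return (PublicKey) gateway.send(new DataPacket("RETRIEVE_PUBLIC_KEY", username));
    }

    protected void sendCurrency(PublicKey recipient, double amount, String currencyUnit) {
        // The real method builds transaction inputs from the fetched UTXOs and signs
        // the new transaction with the kiosk's private key before broadcasting it.
        List<TransactionOutput> utxos =
                retrieveUnspentTransactionOutputs(kioskKeys.getPublic(), currencyUnit);
        gateway.send(new DataPacket("PROCESS_TRANSACTION", recipient, amount, currencyUnit, utxos));
    }
}

class MoneyDeposit extends Kiosk {
    MoneyDeposit(String name, KeyPair keys, GatewayClient gw) { super(name, keys, gw); }
    // The Deposit UI calls sendCurrency(...), signed with the kiosk's key.
}

class MoneyWithdraw extends Kiosk {
    MoneyWithdraw(String name, KeyPair keys, GatewayClient gw) { super(name, keys, gw); }
    // The Withdraw UI overrides sendCurrency(...) so that the user signs the transaction.
}
```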


The Money Deposit User Interface that extends the Kiosk class asks the user for her username, the currency type and the amount to be deposited. The User Interface then calls the Send Currency method defined in the superclass when the user hits the Deposit button. The corresponding transaction is signed by the kiosk’s private key because, when the user supplies cash to the kiosk machine, it is added to the kiosk balance and the kiosk then transfers an equal amount to the user’s balance. The Money Withdraw User Interface that extends the Kiosk class asks the user for her username, the currency type and the amount to be withdrawn. The User Interface then calls its Send Currency method, which overrides the superclass method, when the user hits the Withdraw button. The Withdraw operation needs the user to sign the transaction, as the amount will be debited from her account, so this class must have access to the user’s keys. Since we are only simulating the kiosk interface, we made the user keys available in this class. Each transaction generated at the kiosk, whether during deposit or withdrawal, is broadcast to all the Gateway Nodes.

User Wallet: This is a multi-currency wallet accessible to the users from their computers. Since we have implemented a prototype, we have kept the user interface simple and coded it in Java Swing, but it can be upgraded using other technologies such as Android for mobile use. At the time of user registration, each user is provided with a username and password, which she can use to access the wallet. For transferring money from any user’s account to a different account or exchanging currencies, the user must log into the wallet, select the currency balance and type from her multi-currency wallet and provide the username of the recipient. As mentioned, user authentication is kept beyond the scope of this paper. The User Wallet has four tabs: Manage Keys, Check Balance, Manage Friends and View History. We have implemented the first two functionalities.

• Manage Keys—This tab exposes a method to generate a new public key—private key pair. When the wallet is launched, the user can press the Generate button and get a pair of public and private keys. Once the keys are generated, they are stored locally in the user’s computer. At the time of generation of the keys, an instruction “ADD_TO_DIRECTORY” in a Data Packet (more about Data Packets below) is sent to the Gateway Nodes to add the username—public key map for the user to their directories.
• Check Balance—As the wallet corresponds to a multi-currency account, in the balance screen the user can check the amounts of currencies possessed for different currency types, for instance, the amounts of Mudra and GBP owned by the user, where Mudra is the internal cryptocurrency associated with the e-Mudra application. These currency balances are fetched from the Gateway Nodes to which the User Wallet is connected by sending a Data Packet with instruction “RETRIEVE_BALANCE”, the public key of the user and the currency type to the connected Gateway Nodes; this Data Packet exchange is sketched after Fig. 2. There are buttons to Transfer, Exchange, and Refresh, see Fig. 2.


Fig. 2. Check Balance User Interface—User6 selects Mudra
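As an illustration of the Data Packet exchange just described, the sketch below shows a minimal Data Packet type together with a Check Balance query. The class and field names are our own assumptions, repeated here so the sketch stands alone; they are not the prototype’s exact classes.

```java
import java.io.Serializable;
import java.security.PublicKey;

/**
 * Illustrative sketch (hypothetical names) of the Data Packet used for all
 * communication with the Gateway Nodes: an instruction string plus the data
 * needed to carry out that instruction.
 */
class DataPacket implements Serializable {
    final String instruction;   // e.g. "RETRIEVE_BALANCE" or "PROCESS_TRANSACTION"
    final Object[] payload;     // instruction-specific data

    DataPacket(String instruction, Object... payload) {
        this.instruction = instruction;
        this.payload = payload;
    }
}

/** Minimal connection abstraction standing in for the wallet's link to a Gateway Node. */
interface GatewayConnection {
    Object send(DataPacket packet);
}

class CheckBalanceScreen {
    private final GatewayConnection gateway;
    private final PublicKey userPublicKey;

    CheckBalanceScreen(GatewayConnection gateway, PublicKey userPublicKey) {
        this.gateway = gateway;
        this.userPublicKey = userPublicKey;
    }

    /** Fetches the balance for one currency type, as done when the tab is refreshed. */
    double fetchBalance(String currencyUnit) {
        DataPacket request = new DataPacket("RETRIEVE_BALANCE", userPublicKey, currencyUnit);
        return (Double) gateway.send(request);  // the Gateway Node computes it from the UTXO set
    }
}
```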

Transfer—When a currency balance is selected and the Transfer button is clicked, a screen opens where the user needs to enter the recipient’s username and the amount of currency she wants to transfer to the recipient. While transferring money, users only specify the recipient’s username; the corresponding public key is provided by the Gateway nodes. This enhances the user experience, as users are not required to deal with complex public keys; instead, they only need the username. When the user hits the Send button, the application checks which currency is selected from her multi-currency wallet. If the amount to be transferred is less than the amount of that currency possessed, the public key of the recipient is fetched from a Gateway Node by sending a Data Packet with instruction “RETRIEVE_PUBLIC_KEY” and the recipient’s username. The public keys of all peers are stored against their usernames in the Gateway nodes and can be requested at any time by the wallets. Then, funds are sent to the recipient. To transfer funds to the recipient, the sendFunds method is invoked with transaction type “TRANSFER”. It fetches the Unspent Transaction Outputs for the user’s public key for the chosen currency type by sending a Data Packet with instruction “RETRIEVE_WALLET_UTXO” to the Gateway Nodes, prepares the transaction inputs, creates a new transaction, signs it with the user’s private key, and broadcasts it to the Gateway Nodes with instruction “PROCESS_TRANSACTION” wrapped in a Data Packet.

Exchange—When a currency balance is selected and the Exchange button is clicked, a new screen opens that displays the list of available exchange rates published by other peers in the network for the selected currency. So, if the user selects Mudra from her currency balance, a Data Packet with instruction “FETCH_ADS”, currency type Mudra and the amount owned by the user is sent to the Gateway Nodes, and the Gateway Nodes send back a list of exchange rate ads published by other peers requesting Mudra in return for other currencies, such that the requested amount is less than or equal to the amount of Mudra owned by the user. The user can choose to select an advertisement and hit the Exchange button, or she can hit the Post Ad button on the screen. If the user selects any of the exchange rates from other peers and clicks the Exchange button, her currency gets exchanged. The sendFunds method is invoked with parameters: the ad publisher’s public key, the required currency amount and type as published by the peer, the transaction type “EXCHANGE” and the ad identifier. The method fetches the Unspent Transaction Outputs for the user’s public key for the currency to be exchanged by sending a Data Packet with instruction “RETRIEVE_WALLET_UTXO” to the Gateway Nodes, prepares the transaction inputs, creates a new transaction, and signs it with the user’s private key.


It then broadcasts the transaction to the Gateway Nodes with instruction “EXCHANGE_TRANSACTION”, along with the public key of the user and the ad identifier, wrapped in a Data Packet. If the user clicks the Post Ad button, a new screen opens that asks the user to supply the following information: the amount and type of currency needed (the target currency), and the amount of the selected currency balance available for exchange (the source currency). The amount of the selected currency balance for exchange must be less than or equal to the amount possessed by the user. There is a POST button which, when clicked, generates a new advertisement. An exchange advertisement is created with the data: ad identifier, source amount and currency unit, required amount and currency unit, locked transaction, advertisement status and public key of the user. A Data Packet that wraps this exchange advertisement is created and sent to the Gateway Nodes with instruction “POST_AD”. An exchange operation requires three transactions: a locked transaction carrying funds from the advertiser to the holding account, a transaction from the user to the advertiser and a transaction from the holding account to the user. A holding account is identified amongst the Gateway Nodes’ accounts. The reason why three transactions are required to exchange funds is as follows: when the user selects an advertised exchange rate and decides to exchange, a transaction carrying the target currency from the user to the recipient can be generated and signed by the user’s private key, but the second transaction carrying the source currency from the recipient to the user cannot be done immediately without the recipient’s private key; after receiving the target currency from the user, the peer may not transfer the source currency to the user if this transfer is left to the peers instead of operating programmatically. So, the source currency is locked in a transaction signed by the peer’s private key at the time of posting the ad and is transferred to the user when the user performs the exchange operation by selecting the corresponding exchange rate.

Refresh—This button, when clicked, refreshes the currency balances in the Check Balance screen after performing any operation such as Transfer or Exchange.

Gateway Node—The Gateway Nodes form the backbone of the Transaction Management Network. As e-Mudra is based on a permissioned consortium blockchain, each Gateway node, along with its assisting miners, will correspond to a stakeholder organization for a specific geographic region. The identities of the Gateway nodes and their assisting miners are known to the organizations governing the application. The Gateway Nodes from different organizations, or from different branches of the same organization, compete with each other to find blocks and receive rewards. Neither the Gateway Nodes nor the miners enjoy complete control over the application. Every Gateway Node has a public key with which it can receive funds and the corresponding private key with which it can sign transactions. Every Gateway Node stores the blockchain, the advertisement list, the list of Unspent Transaction Outputs and the username public key map locally, and can be requested by other Gateway Nodes to provide any of these. When a Gateway Node is launched for the first time, it requests other Gateway Nodes for a copy of the blockchain, the list of advertisements, the list of Unspent Transaction Outputs and the username public key map.
If it is unable to fetch a copy, say because it is the first node that has been launched, it initiates a new blockchain with a genesis block and a genesis transaction.


It also initiates the list of advertisements, the list of Unspent Transaction Outputs and the username public key map. The Gateway Node operates based on a list of instructions; communication between the different Gateway Nodes, Miners, Kiosks and User Wallets is carried out using Data Packet objects that carry an instruction and the data necessary to carry out that instruction at the Gateway Node. Here, we include a list of instructions and their associated jobs.

• ADD_TO_DIRECTORY—When any Gateway Node receives a Data Packet with this instruction, it retrieves the username and public key of the user from the Data Packet and stores them in the username and public key map locally, such that if it receives a request for the public key for any username, it can supply it from this map.
• RETRIEVE_BALANCE—When any Gateway Node receives a Data Packet with this instruction, it retrieves the public key and currency unit from the packet, calculates the corresponding balance for that user for that currency unit and sends it to the requestor.
• RETRIEVE_PUBLIC_KEY—When any Gateway Node receives a Data Packet with this instruction, it retrieves the username from the packet and fetches the corresponding public key from the username and public key map stored locally. It then sends the public key back to the requestor.
• RETRIEVE_WALLET_UTXO—When any Gateway Node receives a Data Packet with this instruction, it retrieves the user’s public key and currency unit from the packet, computes the corresponding Unspent Transaction Outputs for that user for the given currency type and sends them back to the requestor.
• PROCESS_TRANSACTION—When any Gateway Node receives a Data Packet with this instruction, it retrieves the transaction from the packet and adds it to the transaction pool. It also broadcasts the transaction to all its miners.
• VERIFY_BLOCK—When any Gateway Node receives a Data Packet with this instruction, it retrieves the block from the packet and validates it. If the block is valid, the Gateway Node adds it to its local blockchain. If the Gateway Node received the block from one of its miners, it broadcasts it to the other miners and gateway nodes. If the Gateway Node received the block from another Gateway Node, it broadcasts it to its miners.
• REQUEST_FOR_BLOCKCHAIN—When any Gateway Node receives a Data Packet with this instruction, it sends a copy of its local blockchain to the requestor, i.e. a Miner or a Gateway Node.
• REQUEST_FOR_ADVERTISEMENT—When any Gateway Node receives a Data Packet with this instruction, it sends a copy of the list of exchange rate advertisements to the requestor, i.e. another Gateway Node.
• REQUEST_FOR_UTXO—When any Gateway Node receives a Data Packet with this instruction, it sends a copy of the list of Unspent Transaction Outputs to the requestor.
• REQUEST_FOR_USERNAME_PUBLICKEY_MAP—When any Gateway Node receives a Data Packet with this instruction, it sends the username and public key map to the requestor.
• FETCH_ADS—When any Gateway Node receives a Data Packet with this instruction, it retrieves the available currency amount and unit from the packet. It then prepares a list of ads whose required currency amounts are less than or equal to the retrieved currency amount, whose required currency type is the same as the retrieved currency unit and whose advertisement status is ACTIVE, and sends it back to the requestor, say a user wallet.


• POST_AD—When any Gateway Node receives a Data Packet with this instruction, it retrieves the advertisement from the packet and adds it to the list of advertisements maintained by the Gateway Node.
• EXCHANGE_TRANSACTION—When any Gateway Node receives a Data Packet with this instruction, it retrieves the transaction from the packet and forwards it to the miners with instruction “PROCESS_TRANSACTION”. It also retrieves, from the list of ads stored locally, the advertisement whose ad identifier is supplied in the Data Packet. It then adds the locked transaction from the advertisement to the transaction pool and also broadcasts it to the miners with instruction “PROCESS_TRANSACTION”. The locked transaction is not realized until an exchanger exchanges with the advertiser. It then creates the third transaction for the exchange, which carries funds from the holding account to the exchanger, and adds it to the transaction pool. It marks the advertisement status for that ad as EXPIRED. It also broadcasts the third transaction to the miners with instruction “PROCESS_TRANSACTION”.

For deposit, withdraw, exchange and transfer operations, once the transactions are broadcast to the Gateway nodes, they are gathered in a transaction pool. The Gateway nodes collect the transactions from the pool when the number of transactions in the pool exceeds a threshold value and perform mining, i.e. the Gateway nodes compete with each other to solve the cryptographic puzzle for adding a new block and, upon solving it, they broadcast the new block to the other Gateway nodes. Since the Gateway Nodes perform mining only when the number of transactions exceeds the threshold value, there is never an empty block without any transactions. Each Gateway node is assisted by a group of miners in the process of mining. When a Gateway node receives a transaction, it forwards it to its miners, and if any of the miners finds a new block, it broadcasts it to the corresponding Gateway node, which in turn broadcasts it to the other Gateway Nodes as well as the other miners assisting the same Gateway node. If the Gateway node finds a block, it forwards it to its assisting miners as well as the other Gateway nodes in the network. When a Gateway Node receives a new block from another Gateway Node, the former forwards it to its assisting miners. When a Gateway Node or a Miner finds a new block, it is rewarded with 100 Mudras.

Miner—Every Gateway Node is connected to a set of Miners or Mining Nodes. The miners assist the Gateway Nodes in mining new blocks. When a Gateway Node receives a transaction from a kiosk or a user wallet, it broadcasts it to all the miners it is connected to. The Gateway Nodes send a Data Packet containing the transaction with instruction “PROCESS_TRANSACTION”. Each miner adds this transaction to its local transaction pool. When the number of transactions received by any miner surpasses a threshold value, the miner adds a coinbase transaction carrying a reward of 100 Mudras to itself, finds a new block and sends it to its associated Gateway Node. If the Gateway Node finds a new block before any of its miners, it broadcasts its newly mined block to the other Gateway Nodes as well as its miners in a Data Packet with instruction “VERIFY_BLOCK”.
The other Gateway Nodes and the miners then check whether the block is valid and, if it is, add it to their local blockchains. On the other hand, if the Gateway Node is unable to mine a new block and instead receives a new block from any of its miners, it validates this block, adds it to its blockchain and also broadcasts it to the other Gateway Nodes and miners it is connected to.


When a miner is launched for the first time, it has a public key—private key pair generated. The miner queries the associated Gateway Node for a copy of the blockchain and the Unspent Transaction Outputs by sending Data Packets with instructions “REQUEST_FOR_BLOCKCHAIN” and “REQUEST_FOR_UTXO” respectively, and saves the copies locally. The main role of the miners is to find a block faster than the connected Gateway Node and supply the newly mined block to that Gateway Node, as the Gateway Node has functionalities in addition to block mining: maintaining the exchange rate advertisement repository; answering queries such as available balance, public key for a username, unspent transaction outputs for a user and available exchange rates; maintaining a directory of the username—public key map; supplying the blockchain and unspent transaction outputs to its Miners as well as other Gateway Nodes; and supplying the username public key map and advertisement list to other Gateway Nodes. The consensus protocol followed in this prototype is Proof of Work.

Advertisements—For exchanging currency with another user, a user needs to publish an advertisement or select from an existing list of advertisements. An advertisement is broadcast by the User wallet to the Gateway nodes with status Active. Each exchange operation has three associated transactions. The first carries funds from the ad publisher to a holding account and is signed by the ad publisher—this is the locked transaction. The second happens when another user accepts the ad and exchanges money; the locked transaction is executed at this time, and the second transaction carries money from the exchanger to the advertiser, signed by the exchanger. The third transaction carries funds from the holding account to the exchanger and is signed by a Gateway node. When two users complete exchanging money, the corresponding ad is marked with status Expired. The advertisements are stored separately by the Gateway Nodes, which maintain the list of exchange rate advertisements published by peers in the network who are looking for other peers to exchange currencies with them at their preferred exchange rates. Every advertisement is stored as an ExchangeAd object that has the following properties (a sketch of this object follows the list):

• Ad Identifier—the string required to uniquely identify the advertisement.
• Source Currency Amount—the currency amount the peer or advertiser wants to exchange.
• Source Currency Unit—the currency type the peer or advertiser wants to exchange.
• Required Currency Amount—the currency amount the advertiser wants in return for her source currency.
• Required Currency Unit—the currency type the advertiser wants in return for her source currency.
• Advertisement Status—this can be ACTIVE or EXPIRED.
• Ad Publisher—the public key of the advertiser.
• Transaction—the ExchangeAd also needs to store the locked transaction carrying funds from the advertiser to the holding account of a Gateway Node. This transaction is signed by the advertiser; it remains locked as long as her ad remains Active and no one exchanges currency selecting her ad.
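A minimal sketch of such an ExchangeAd object is given below; the field names and types are illustrative assumptions derived from the property list above, not the exact prototype class.

```java
import java.security.PublicKey;

/**
 * Illustrative sketch of the ExchangeAd object (hypothetical field names).
 * It carries the advertised exchange offer together with the locked
 * transaction that moves the source currency to the holding account.
 */
public class ExchangeAd {
    public enum Status { ACTIVE, EXPIRED }

    private final String adIdentifier;            // uniquely identifies the advertisement
    private final double sourceCurrencyAmount;     // amount the advertiser wants to exchange
    private final String sourceCurrencyUnit;       // type the advertiser wants to exchange
    private final double requiredCurrencyAmount;   // amount wanted in return
    private final String requiredCurrencyUnit;     // type wanted in return
    private Status advertisementStatus = Status.ACTIVE;
    private final PublicKey adPublisher;            // public key of the advertiser
    private final Object lockedTransaction;         // signed transaction to the holding account

    public ExchangeAd(String adIdentifier, double sourceCurrencyAmount, String sourceCurrencyUnit,
                      double requiredCurrencyAmount, String requiredCurrencyUnit,
                      PublicKey adPublisher, Object lockedTransaction) {
        this.adIdentifier = adIdentifier;
        this.sourceCurrencyAmount = sourceCurrencyAmount;
        this.sourceCurrencyUnit = sourceCurrencyUnit;
        this.requiredCurrencyAmount = requiredCurrencyAmount;
        this.requiredCurrencyUnit = requiredCurrencyUnit;
        this.adPublisher = adPublisher;
        this.lockedTransaction = lockedTransaction;
    }

    /** Called once another user completes an exchange against this ad. */
    public void markExpired() { this.advertisementStatus = Status.EXPIRED; }

    public Status getAdvertisementStatus() { return advertisementStatus; }
    public String getAdIdentifier() { return adIdentifier; }
}
```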


When a peer advertises an exchange rate, it gets added to the advertisement list as an ExchangeAd object with status ACTIVE. When a user accepts the exchange rate and required amount published in the ad by the peer, the currencies get exchanged and the status of the ExchangeAd object is updated to EXPIRED. As long as the status of the ExchangeAd remains ACTIVE, it is displayed in the available exchange rate ads for the given currency and amount in the peers’ wallets. When the status changes to EXPIRED, it is no longer displayed in the ad list of the peers looking for the currency corresponding to that advertisement.

Internal Currency—Permissioned decentralized applications usually do not have an internal currency. But eMudra has the cryptocurrency “Mudra”, generated through mining, which forms the reward for the Gateway Nodes that win the mining of blocks. It encourages competition between the Gateway Nodes of different participating organizations. Mudra can be exchanged for fiat currencies in the multicurrency wallets of travellers, and the application can be extended to allow exchange between Mudra and other cryptocurrencies transferred to the multicurrency wallets from external applications.
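As a rough illustration of the Proof of Work mining that produces the Mudra rewards mentioned above, the sketch below searches for a nonce whose block hash has a given number of leading zeros and then claims a 100 Mudra coinbase reward. The class names, difficulty value and hashing details are our own assumptions, not the prototype’s actual implementation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

/**
 * Illustrative Proof of Work sketch (hypothetical names): a node varies the
 * nonce until the SHA-256 hash of the block data has the required number of
 * leading zeros, then claims the 100 Mudra coinbase reward described above.
 */
public class ProofOfWorkSketch {

    static final int MINING_REWARD = 100; // Mudras, as stated in the text

    static String sha256(String input) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(input.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) hex.append(String.format("%02x", b & 0xff));
        return hex.toString();
    }

    /** Returns the nonce that satisfies the difficulty target. */
    static long mine(String previousHash, String merkleRoot, int difficulty) throws Exception {
        String target = "0".repeat(difficulty);
        long nonce = 0;
        while (true) {
            String hash = sha256(previousHash + merkleRoot + nonce);
            if (hash.startsWith(target)) return nonce;
            nonce++;
        }
    }

    public static void main(String[] args) throws Exception {
        long nonce = mine("previous-block-hash", "merkle-root-of-pooled-transactions", 4);
        System.out.println("Found nonce " + nonce + "; coinbase reward = " + MINING_REWARD + " Mudra");
    }
}
```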

4 Conclusion

In this paper, we reviewed recent cash LFC exchange systems, highlighting certain limitations in terms of ubiquity, transfer fees, money deposit processes, the list of currencies accepted and, most importantly, exchange rates. We then presented a new cash-based LFC exchange model and proposed an architecture utilizing blockchain technology that could solve or alleviate these challenges by allowing seamless P2P currency exchange at user-decided exchange rates for cash-based LFC. Any system that needs a consortium of stakeholder organizations to participate and compete in block mining, with the identities of nodes and users known to the stakeholder organizations, can use this architecture. However, the success of such a P2P system depends on the number of participants: the more participants there are, the more competitive the exchange rates will be. The smart kiosks holding cash at the entry and exit ports need to be managed by some organization; we envisage this LFC architecture operating as a permissioned blockchain network, effectively managed by a consortium of stakeholders. Shopping with LFC can be implemented in the future. A mobile wallet can be built using technologies such as Android so that the application can be more readily used during transit by the users. The main benefits of the P2P currency exchange system described here lie in the provision to exchange cash-based LFC profitably, with money submission and withdrawal processes made convenient and ubiquitous for the user. Since cash transactions are governed by identity verification, considerable amounts of money can be exchanged without the risk of money laundering.

References 1. Cable, S.: Britons hoarding £3 BILLION in unused foreign currency at home, with just 13% bothering to exchange money after a holiday, 30 Sept 2014. http://www.dailymail.co.uk/ travel/travel_news/article-2774741/Britons-hoarding-3BILLION-unused-foreign-currencyhome-just-13-bothering-exchange-money-holiday.html. Accessed 23 Nov 2015


2. Tunney, D.: International trips on the rise, but air travel dropping. Travel Weekly (17 Aug 2015). http://www.travelweekly.com/ConsumerSurvey2015/International-trips-on-therise-but-air-travel-dropping. Accessed 23 Nov 2015 3. Business Models of the Internet. https://www.9thco.com/insight/6-business-models-of-theinternet. Accessed 26 Jan 2016 4. How to send us coins. cash4coins. http://www.cash4coins.co.uk/how-send-us-coins/. Accessed 30 Oct 2015 5. Lamport, L., Shostak, R., Pease, M.: The byzantine generals problem. ACM Trans. Program. Lang. Syst. 4(3), 382–401 (1982) 6. Sending valuables or money overseas. Royal Mail. http://www.royalmail.com/personal/helpand-support/I-need-advice-about-sending-money-and-jewellery-overseas. Accessed 26 Nov 2015 7. Introducing Nymi Enterprise Edition. nymi. https://www.nymi.com. Accessed 11 Apr 2019 8. What is palm vein authentication? Fujitsu Frontech. https://www.fujitsu.com/jp/group/fro ntech/en/solutions/business-technology/security/palmsecure/what. Accessed 1 Jun 2019 9. Get better exchange rates when transferring money internationally!. CurrencyFair. https:// www.currencyfair.com. Accessed 13 Nov 2016 10. Bye bye bank fees, hello world. Transferwise. https://transferwise.com. Accessed 11 Apr 2019 11. Kantox Tomorrow’s FX today. Kantox. http://kantox.com/en/how-it-works-kantox. Accessed 1 Nov 2015 12. MoneySwap. Moneyswap. https://www.moneyswap.com. Accessed 1 Nov 2015 13. Say hello to more to spend abroad. WeSwap Social Currency. https://www.weswap.com. Accessed 1 Nov 2015 14. Sending money abroad? Midpoint. https://www.midpoint.com/. Accessed 11 Apr 2019 15. Nakamoto, S.B.: A Peer-to-Peer Electronic Cash System. https://bitcoin.org/bitcoin.pdf. Accessed 1 Nov 2015 16. How Digital Money Works. Stellar. https://www.stellar.org/learn/. Accessed 13 Jan 2016 17. About A.I. Coin. AI COIN Artificial intelligence. http://www.ai-coin.org/. Accessed 1 Nov 2015 18. Bhattacharya, R., White, M., Beloff, N.: A Blockchain based Peer-to-Peer Framework for Exchanging Leftover Foreign Currency, pp. 1431–1435. IEEE, Computing Conference London (2017) 19. Amey, K.: British holidaymakers have £663MILLION in leftover foreign currency after holidays abroad... with few converting it back. Mail Online, 25 Aug 2015. http://www.dai lymail.co.uk/travel/travel_news/article-3210077/British-holidaymakers-663MILLION-lef tover-foreign-currency-following-overseas-holidays-bothering-convert-back.html. Accessed 1 Nov 2015 20. Fourex world money exchange. Fourex World Money Exchange. http://www.Fourex.co.uk/. Accessed 1 Nov 2015 21. Campbell, D.: The Byzantine Generals’ Problem. http://www.dugcampbell.com/byzantinegenerals-problem/. Accessed 17 Jan 2016 22. Prusty, N.: Building Blockchain Projects. Packt, Birmingham (2017) 23. Hutchings, S.: Making a meal out of leftovers: the untapped potential of foreign currency collections, 2 Jan 2018. https://www.internationalairportreview.com/article/63610/global-coinsolutions-foreign-currency/. Accessed 30 Nov 2020 24. Alabi, S., White, M., Beloff, N.: Contactless Palm Vein Authentication Security Technique for Better Adoption of E-Commerce in Developing Countries. Science and Information Conference London (2020) 25. Dannen, C.: Introducing Ethereum and Solidity. Apress, Brooklyn (2017)


26. KlickEx Payment without Borders. https://www.klickex.com/default.aspx. Accessed 9 Nov 2015 27. Bhattacharya, R., White, M., Beloff, N.: An exploration of blockchain in social networking applications. In: SAI Computing Conference, 15–16 July 2021. London, United Kingdom (2021) 28. LeftoverCurrency Exchange your old currency for cash. leftovercurrency.com. http://www. leftovercurrency.com/. Accessed 30 Oct 2015 29. Huckle, S., White, M., Bhattacharya, R.: Towards a post-cash society: an application to convert fiat money into a cryptocurrency. First Monday (2017) 30. Daily Mail Reporter: Journey’s end for travellers’ cheques with just one in 12 holiday makers using them. Mail Online, 26 May 2013. http://www.dailymail.co.uk/news/article-2331343/ Journeys-end-travellers-cheques-just-12-holiday-makers-using-them.html. Accessed 23 Nov 2015

A Multi-layered Ontological Approach to Blockchain Technology

Rituparna Bhattacharya, Martin White, and Natalia Beloff

University of Sussex, Falmer, Brighton, UK
{rb308,m.white,n.beloff}@sussex.ac.uk

Abstract. In 2009, blockchain laid the foundation of an era of cryptocurrencies, with Bitcoin playing the role of avant-garde in the series. This phenomenal technology behind Bitcoin has been visualized by many as having far-reaching impacts not only in finance and the economy, but also in other sectors. This decade has witnessed the emergence of different types of blockchain, associated platforms and a plethora of applications without any standardization of the associated concepts. To facilitate a common interpretation of this technology among its users, we present a multi-layered blockchain ontology that can be leveraged for constructing a structured knowledge management and representation system in this domain. We propose a new Ontology Development Life Cycle and a methodology for the design of such complex ontologies of emerging technologies like blockchain. We devised a prototype tool leveraging the proposed methodology that will enable implementing a multi-layered modularized ontology for any domain and designing applications for that domain.

Keywords: Blockchain · Ontology · Semantic Blockchain · Taxonomy

1 Introduction

The disruptive technology behind Bitcoin that has gained much attention during the past few years is Blockchain. Implemented as a distributed ledger system coordinated over trustless peer-to-peer networks, blockchain makes the role of financial intermediaries redundant in decentralized digital transactions between peers. Digital exchange of cryptographic keys establishes the ownership status over assets or currencies, and consensus rules enforced through cryptographic protocols, such as Proof of Work, negate fraudulent transactions, such as double spending [14]. There has been a proliferation of applications built on the principles of blockchain; however, they vary widely in terms of target users, functionalities and/or architecture. Those applications are largely not interoperable. In essence, they lack any standardization or certainty with respect to the concepts associated with their foundation. As Tasca and Tessone [12] and Tasca et al. [13] have pointed out, a deficit of standards leads to risks linked to privacy, security, governance and interoperability, and poses threats to users and market participants. Moreover, with blockchain still being an emerging technology, the chances are that each new application or platform will lead to the formulation of new concepts in this domain.


Having its roots in philosophy, the term ontology means a formal specification of concepts, their properties and relationships within a domain. It provides representation of common semantics within that domain. Ontologies have been used in various fields of Computer Science such as “Artificial Intelligence, the Semantic Web, Systems Engineering, Software Engineering, Biomedical Informatics, Library Science, Enterprise Bookmarking, and Information Architecture” [2]. Since its inception, ontology has gradually emerged as a medium of creating sharable, reusable, extendable and connected knowledge bases. The terms defined and represented by ontologies enable knowledge dissemination and understanding in any domain among its respective users [1] and assist in data communication and exchange among the applications in such domains. Thus, a proper ontological approach may lead to the resolution of the problem mentioned by Tasca and Tessone [12], Tasca et al. [13] in the context of blockchain. In this paper, we propose a comprehensive multi-layered ontology definition framework to represent the concepts and their relationships associated with blockchain technology along with a methodology for designing such complex ontologies. This paper is organized as follows: Sect. 2 includes related research work in this area, Sect. 3 presents a new Ontology Development Life Cycle (ODLC) and the proposed methodology along with the design of a multi-layered blockchain ontology using the methodology. Finally, we conclude in Sect. 4.

2 Related Work Use of ontology provides various benefits such as a common understanding of information structure among people or software agents, reuse of data and domain knowledge, support for explicit domain assumptions, differentiation of domain knowledge from the operational knowledge and domain knowledge analysis [3]. Bhat et al. argue that ontologies being machine interpretable and having the capacity to facilitate reasoning, can offer semantic interoperability between resources, services, databases, and tools [5]. The tradition of deploying ontology in Computer Science is, of course, not in its infancy. Happel and Seedorf have investigated the employment of ontology in the software development life cycle [4]. The application of ontology to the Web led to the notion of the Semantic Web that supports knowledge organization in conceptual spaces based on the meaning, thus helping in the production of enhanced knowledge management systems and human-machine cooperative work [7]. Ontology in the context of Semantic Web consists of “knowledge terms, including the vocabulary, the semantic interconnections, simple rules of inference and logic for some particular topic” and thus involves much stronger semantics than taxonomy. We find research endeavours for forming blockchain taxonomies in the work of Tasca and Tessone [12], Tasca et al. [13]. As Tasca et al. state, it involves “the identification, description, nomenclature, and hierarchical classification of blockchain components”. We envisage the next step as the construction of a comprehensive and detailed ontology with richer semantics including types, properties, relationships, constraints and other relevant aspects of the concepts associated with blockchain.


2.1 Semantic Blockchain Utilization of a distributed database across multiple organizations requires a shared understanding of data across these organizations. Studies related to the employment of ontology in the context of blockchain are limited, but not rare. Further investigation in this area introduced us to the notion of Semantic Blockchain, i.e. a blockchain with a semantic layer enhancing the protocol. Research by Kim et al. describes an endeavour to bring together blockchain community’s concentration on protocol-level blockchain interoperability and the ontology community’s concentration on semantic-level interoperability [9]. It cites the importance of ontology in the background of systems lacking data intermediaries. Kim et al. further depict “an ontology of blockchain as a natural language characterization of blockchain constructs”. These researchers attempted to encode ontologies on the blockchain for domains very core to blockchain functionality such as organizational governance. Kim and Laskowski also proposed the notion of using ontologies for developing blockchain applications, particularly for supply chain provenance [16]. They highlighted that a modelling approach utilizing formal ontologies can assist “in the formal specifications for automated inference and verification in the operation of a blockchain”. Such a modelling approach using formal ontologies can help in creating smart contracts executing on the blockchain. A rigorous approach towards integrating ontology and blockchain can be seen in the Blockchain ONtology with Dynamic Extensibility (BLONDiE) project where an OWL ontology interfacing Bitcoin and Ethereum networks has been developed [17]. It should be able to inform details such as the miner of each block, the height of each block, count of transactions included in a block, confirmation status of a transaction or the total number of coins transferred on a block. The aim of this research was to create “a schema for a query’able knowledge base that stores information from the Bitcoin and Ethereum Blockchains native structure and other related information” and associated “business intelligence in the knowledge repository that runs on powerful semantics to answer queries from the users about the Bitcoin and Ethereum distributed ledgers” [17]. Current Blockchain browsers are mostly based on relational or key-value databases instead of graph databases that are fully compatible with a machine-readable format similar to RDF (Resource Description Framework). They only enable browsing the content of existing Blockchain frameworks like Bitcoin and Ethereum. The Semantic Web of Things faces the problem of non-viable trust management strategies. Ruta et al. [10] explore as a solution a semantic resource/service discovery layer based on a basic blockchain infrastructure having a consensus validation utilizing a Service Oriented Architecture. It uses a semantic blockchain implemented with smart contracts enabling distributed execution and trust for registration, discovery, selection and payment. The Huuzlee platform reveals a state machine-based approach with full Declarative Semantics towards resolving issues with Semantic blockchain having Procedural Semantics [11]. In short, the previous works we have studied addressed the following concepts: • Ontological representations of different domains can be encoded into the blockchain via smart contracts as depicted in [9, 16]. • Ontology can be used to represent fully, structural data from the blockchain [17].


But, encoding ontological representation of any domain into blockchain via smart contracts is not the requirement here. We need to build an ontology to represent different aspects of a blockchain based system comprehensively, not just fetch structural data from the blockchain and such a blockchain ontology should be capable of being used to form a knowledge management system in the blockchain domain and assist in designing different blockchain based applications. 2.2 Layered Blockchain Ontology We consider the use of multi-layered ontologies in both non-blockchain and blockchain domains. An example of the former would be the generic multi-layered ontologies proposed for urban applications by Benslimane et al. [1]. For the latter, i.e. blockchain domains, we consider two previous research works. Study by Kruijff and Weigand: They made an attempt to provide a basic layered ontological approach to blockchain at three levels, Essential, Infological and Datalogical layers following the distinction axiom of Enterprise Ontology [6]. • Datalogical layer: the technological basis including concepts related to blocks, miners, mainchains, sidechains, etc.; the level of data structures and data manipulation is also included. • Infological layer: the description of blockchain as a ‘distributed ledger’ is included as an infological characterization providing abstraction from the elements of the datalogical level. Any transaction is considered as a transfer of some value object. They state that “a ledger consists of accounts (e.g. debit account), and this concept is indeed generic across the majority of blockchain providers that are part of this analysis.” • Essential or Business layer: the entities formed directly or indirectly by communication are included. Commitments are usually established or evaluated by communicative acts where a commitment depicts the activity an actor must perform in any future situation. When two parties concur with respect to a commitment, there is a change in the social reality. With the institutional context in place, a transaction in an infological blockchain carrying some value from one account to another can be considered as a change in this social reality, for instance, transfer of ownership. Changes such as this can be delineated as the essential blockchain transaction. However, this approach does not include extensive validation of the ontology; it needs subsequent validation with applications and mappings to the existing blockchain implementations. Study by Seebacher and Maleshkova: This research aims to capture the properties of existing blockchain based business networks through a union of an ontology that would formalize the concepts and properties of the blockchain network and a layer model that would help describing such network following a model-driven approach [8]. The blockchain layer model is composed of three interconnected layers, the highest layer of abstraction being the business model and the lowest layer of abstraction being the technical implementation, with each layer influencing the next starting from the technical implementation.


• Business model: It describes the business model of a respective business network and facilitates the understanding of different business entities, processes, components and rules in blockchain business networks, including their relationships.
• Network composition: This layer, relying on the Blockchain Business Network Ontology, has three main parts describing the general characteristics of the network, the participants and their roles, and the communication patterns of a business network.
• Technical implementation: It describes the code-based structure and content of a blockchain business network.

So far, we have not found any comprehensive ontology for blockchain that, if implemented and instantiated, would answer any query related to the applicability, working principles, architecture, design, development, operations and administration of any blockchain-driven system, provide an overall standardized view of such a system or lead to an effective distributed knowledge management system in this domain. However, a layering approach can be leveraged to achieve a clear differentiation of the different areas of a domain and to construct a modularized blockchain ontology, as otherwise the ontology will become immensely complex.

3 Our Approach

Blockchain is a highly complicated subject for which a lot of information and a few related ontologies are already available. However, it is still an emerging technology that is rapidly evolving and growing, with a large number of applications and platforms under construction or enhancement. So, the corresponding ODLC should be able to support a complex, extendable, adaptable blockchain ontology. Hence, an incremental agile methodology is required that should also enable reusing existing and forthcoming ontologies in the domain. There are many methodologies available for building ontologies, such as Uschold and King, Grüninger and Fox, Bernaras, METHONTOLOGY and SENSUS [18, 19]. We also find two interesting methodologies in the work of John et al. [20] and Saripalle et al. [21]. The former group followed a hybrid approach towards ontology design by deriving an Incremental and Iterative Agile Methodology underpinned by software engineering process models. It includes a feasibility study, ontology development that follows an iterative approach, and ontology implementation (evaluation). However, it lacks a concrete requirements analysis phase, does not clearly differentiate between the design, development and implementation stages, and does not specify any maintenance phase. The latter group adopted a software engineering-oriented process for the design and development of ontology. Their Hybrid Ontology Design and Development Life Cycle model has nine phases: Problem Analysis Phase, Ontology Integration, Knowledge Acquisition, Specification, Design Phase, Analysis, Implementation, Testing, and Maintenance and Documentation. However, the ontology integration phase should follow knowledge acquisition, as knowledge sources should be identified first. Also, the ontology integration, knowledge acquisition and specification steps are considered as separate phases and not part of the design phase; however, they are steps carried out during the design phase. Validation during the design phase is required and should be considered differently from the evaluation phase after implementation.


Documentation is also required during the design phase, and a separate extension and adaptation phase should be added. Hence, these two methodologies cannot be used in their current form for creating a blockchain ontology.

3.1 ODLC and Methodology

The construction of a blockchain ontology cannot fit into any one specific methodology. Instead, based on the combined study of these methodologies as well as software process models, we propose an ODLC, as illustrated in Fig. 1, and a corresponding methodology for designing, in an organized and modularized fashion, the ontologies of complex domains undergoing rapid evolution, such as that depicted by the current rapid evolution of Blockchain.

Fig. 1. Proposed ontology development life cycle

Our blockchain ODLC and methodology follow five key phases.

Phase 1: Requirements Analysis—During this phase, the purpose of building a new ontology or adapting an existing ontology, and its uses, are determined. The project scope and the limitations are identified. The requirements are gathered and analyzed, and a requirement specification document is created.

Phase 2: Design—During this phase, activities starting from determining knowledge sources, gathering and capturing knowledge, specifying and formalizing the ontology, verifying and validating it, documenting it and supporting its future adaptation are carried out. The activities can be divided into the following sub-phases.

Sub-phase 1: Knowledge Acquisition—This involves identifying domain experts and knowledge sources such as research publications, white papers, articles in blogs, magazines and other websites, books, tutorials and other resources related to the domain.


The ontology is determined as simple or complex.

Sub-phase 2: Ontology Integration—Existing ontologies that may be reused in part or whole are determined.

Sub-phase 3: Ontology Capture and Formalization—The following workflow can be adopted in this phase.

(1) If a new ontology is created:
a. Identify the concepts and properties
b. Identify the attributes and relationships
c. Identify the constraints and rules
d. Remove redundancies and ambiguities and generate precise text definitions for these concepts, attributes and relationships
e. Identify terms to refer to them

If an existing ontology is extended or adapted:
a. Identify the concepts and properties
b. Identify the attributes and relationships among the new concepts and also between the new concepts and those in the existing ontology
c. Identify the constraints and rules
d. Remove redundancies and ambiguities and generate precise text definitions for these concepts, attributes and relationships
e. Identify terms to refer to them

We use a middle-out approach and start with the primary or most important concepts and then consider the secondary or more intricate concepts in both cases.

(2) If the domain is complex:
a. Identify a reasonable number of layers, from the highest level of abstraction to the lowest, based on the different facets of knowledge acquired, and identify modules and groups for the layers, if any, when an ontology is newly created
b. Place the concepts with their attributes and relationships in the correct module and layer
   i. If there is no relevant module, form a new module
c. If there are too many concepts in any module, disassemble them into a set of modules, each focused around a specific sub-area

The level of dependency between concepts should be highest within a module, lowest between modules in different layers and intermediate between modules in the same layer.

If the domain is simple, without layering:


a. Identify the parent module, if required
b. If there is no relevant module and modularization is required, form a new module
c. Place the concepts with their attributes and relationships in the ontology directly, or in the correct module, if any
d. If there are too many concepts in any module or in the ontology, disassemble them into a set of modules, each focused around a specific sub-area

The level of dependency between concepts should be highest within a module and lowest between different modules. The conceptual model of the ontology is formalized using any representation technique, such as a UML class diagram.

Sub-phase 4: Validation—At this stage, the designers, the domain experts and the end users validate the ontology against a frame of reference, which may be a list of competency questions or a real-world platform or application, and decide if it conforms to established design guidelines. If the design pertains to extending or updating an ontology, then regression testing is to be done to see if the new concepts contradict or break the existing ontology.

Sub-phase 5: Documentation—A detailed report, including the requirements analysis, the list of ontologies reused, the ontology specification (including the ontology concepts, its axioms and usage), the validation results with recommendations, if any, and pertinent knowledge source references, should be prepared and maintained with proper versioning corresponding to future extensions and adaptations of the ontology.

Sub-phase 6: Extension and Adaptation—This is where any changes to the conceptual model in the future are taken up, leading to the execution of all phases in the ODLC from Sub-phase 1 to Phase 5. Increment or adaptation of the ontology can happen at regular intervals for an emerging domain that is likely to witness new concepts frequently, or each time a new system in the domain is designed, necessitating the new concepts to be accommodated in the ontology. This sub-phase starts from the moment the other sub-phases of the Design phase are completed. For extending or adapting the ontology, the extension and adaptation of the design of the ontology should be carried out first, and it involves updating the design. As this is an iterative ODLC, this sub-phase supports iterative execution of the design process, and then the other phases, Implementation, Evaluation and Maintenance, are carried out in the iteration. Any change in Design will automatically require the Implementation, Evaluation and Maintenance phases to be followed.

Phase 3: Implementation—In this phase, the concrete implementation of the ontology is accomplished. Separation of implementation choices from the database conceptual model has been considered important. Such separation of implementation options from the conceptual model is also required in the ontological context [6]. Various factors such as usability, performance, interoperability and availability are taken into consideration, and a decision is made on the ontology language and framework to be used for implementation.

Phase 4: Evaluation—In this phase, ontologies are tested in terms of various aspects such as their software environment, usability, performance, interoperability, reuse and availability.


Phase 5: Maintenance—During this phase, the ontology so implemented along with associated documentation has to be maintained for smooth and efficient performance of the system. The methodology is domain, platform and application independent and can be applied to both simple and complex ontologies. Domain experts can be included from Requirements Analysis through to the Evaluation phase. 3.2 Blockchain Application Ontology The continuous evolution of blockchain is likely to generate more knowledge in this complex domain necessitating subsequent modifications of any relevant ontology and hence designing blockchain ontology appears to be a good candidate for the employment of our proposed methodology. In this paper, we limit our scope to only blockchain ontology design. We follow the ODLC and methodology described in Sect. 3.1 for constructing the ontology of blockchain. We describe each phase of the proposed ODLC in the context of building blockchain application ontology next. Requirements Analysis—Since the inception of blockchain, a number of blockchain platforms and multitudinous applications have surfaced, but they lack standardization of related concepts necessary for a shared interpretation of this technology that may be leveraged for creating a structured knowledge management and representation system in this domain. Hence, there is a need for blockchain application ontology. Design: Knowledge Acquisition—Over almost a decade, various studies have been conducted on blockchain technology by researchers from both academia and industry. A plethora of information is available related to blockchain. A treasury of useful knowledge can be accumulated from this already available information, platforms and applications associated with blockchain, if it is properly garnered and mined. We have fetched the concepts related to blockchain technology from various resources such as online research publications, technical reports, white papers and articles in varied contexts such as “research context, experimental context, application context” and then integrated into the blockchain ontology. Such consideration of different contexts while gleaning data for ontologies is discussed in the writings of Bhat et al. [5]. Design: Ontology Integration—We utilized the blockchain taxonomy proposed by Tasca and Tessone [12], Tasca et al. [13] as one of our references for concept identification. It demonstrates the blockchain taxonomy organized as a hierarchical componentbased matrix, however, the relationship(s) between the sub or sub-sub component(s) under one main component with that (those) under another main component is (are) not portrayed. Design: Ontology Capture and Formalization—We analyzed a number of blockchain applications to identify their commonalities, exceptions and novelties in terms of concepts tethered to blockchain. The currently existing blockchain applications are diverse in their functionalities and heterogeneous in their architecture. One of the challenges of constructing any comprehensive blockchain ontology is that unless we follow a modularized approach, the ontology will become immensely complex. We organized our blockchain ontology into five broad layers based on level of abstraction, see Fig. 2, with the Application layer having the highest level of abstraction and the Development Layer corresponding to the lowest level of abstraction. The intermediate ones are the three core layers.


Fig. 2. Blockchain Ontology

Development Layer—This is the lowest layer of abstraction. It depicts the conceptual model related to the development and coding of blockchain-backed systems. The concepts related to the programming of blockchain systems are organized in a single module named Codebase.

• Codebase Module: The Codebase module tries to standardize different aspects of the development of blockchain-based systems. A blockchain platform that supports development of blockchain applications may allow a single programming language, as Bitcoin did in its early years, or multiple languages, as Ethereum or Stellar do. The transparency of the codebase varies between blockchain platforms; for instance, Bitcoin is open source while a private blockchain is likely to be closed source. The design of the codebase may follow a monolithic approach or a polylithic approach; while a blockchain based on a monolithic design is constructed as a single-tier application lacking modularity, a blockchain utilizing a polylithic design is modular and extensible. Smart contracts are associated with a Turing Complete scripting language. Scripting languages can be Turing Complete, Generic Non-Turing Complete, Application-specific Non-Turing Complete and Non-Turing Complete + External Data [12, 13]. A blockchain application may be deployed on a TestNet, MainNet or PrivateNet. This module, therefore, answers queries such as: Is the codebase design monolithic or polylithic? In which environment is the codebase deployed? Is the script language Turing Complete? Is the codebase open source or closed source? Which programming languages are supported by the codebase?
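To illustrate how such a module could be captured in machine-readable form, the sketch below encodes a few Codebase-module concepts and their enumerated values as plain Python classes. The class and attribute names are illustrative assumptions based on the description above; they are not the published ontology or the ODT's actual representation.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List

# Hypothetical encoding of a few Codebase-module concepts; names are illustrative.
class CodebaseDesign(Enum):
    MONOLITHIC = "monolithic"   # single-tier, lacking modularity
    POLYLITHIC = "polylithic"   # modular and extensible

class ScriptLanguageType(Enum):
    TURING_COMPLETE = "Turing Complete"
    GENERIC_NON_TURING_COMPLETE = "Generic Non-Turing Complete"
    APPLICATION_SPECIFIC_NON_TURING_COMPLETE = "Application-specific Non-Turing Complete"
    NON_TURING_COMPLETE_PLUS_EXTERNAL_DATA = "Non-Turing Complete + External Data"

class DeploymentEnvironment(Enum):
    TESTNET = "TestNet"
    MAINNET = "MainNet"
    PRIVATENET = "PrivateNet"

@dataclass
class Codebase:
    open_source: bool
    design: CodebaseDesign
    script_language: ScriptLanguageType
    environment: DeploymentEnvironment
    programming_languages: List[str] = field(default_factory=list)

    def is_turing_complete(self) -> bool:
        """Answers the competency question: is the script language Turing Complete?"""
        return self.script_language is ScriptLanguageType.TURING_COMPLETE

# Example instance loosely describing a public, polylithic codebase.
example = Codebase(
    open_source=True,
    design=CodebaseDesign.POLYLITHIC,
    script_language=ScriptLanguageType.TURING_COMPLETE,
    environment=DeploymentEnvironment.MAINNET,
    programming_languages=["Solidity", "Vyper"],
)
print(example.is_turing_complete())  # True
```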


Core Layers—The core layers represent the concepts underpinning the working principles, architecture and administration of blockchain-driven systems.

Structural Layer—This depicts the conceptual model of the fundamental components that work in harmony to make the system functional. It has two modules, the Block and the Network.

• Block Module: The block structure has two distinct components: the block header and the block content. A block has information related to the count of transactions, block height, block reward, miner's details, block size, version and block hash. The first block mined is called the genesis block. The block header informs the hash of the previous block, which links the blocks; the timestamp of the block creation; the nonce, a 32-bit random number used in computing the hash for the block; the difficulty target, denoting how difficult it is to successfully mine the block; and the Merkle root hash. The Merkle tree related to the block header can be a Binary Merkle Tree or a Patricia Merkle Tree. The block content carries transactions, which include both the coinbase transaction and non-coinbase transactions. A coinbase transaction is usually the first transaction in the block and it carries the reward to the miner. A non-coinbase transaction has inputs that portray which coins are being spent, outputs portraying the receiver of the coins, and technical data specifying the transaction lock time, count of inputs, count of outputs, size and transaction fee. A transaction may have one of two possible confirmation statuses: Confirmed and Unconfirmed. This module, therefore, presents information related to block structure and may answer queries related to the blocks such as: Who mined the block? How much is the block reward? How many transactions does the block have? Is the status of a particular transaction confirmed? What is the transaction fee for a particular transaction? What variant of Merkle Tree is used? And so on. This module will have the capacity to present data extracted from the live system.

• Network Module: A computer connected to the network is called a node. When there is a new transaction, it is broadcast to all nodes in the network, which collect these transactions into a block. Unlike a Partial node, a Full node stores the entire blockchain ledger and enforces all rules: it downloads transactions and applies the consensus rules, and transactions that do not conform to any rule are rejected from getting added to the block. The nodes that compete to find and add a new block by solving a computationally difficult problem, such as finding a hash with n leading zeroes, are said to act as miners. These miners select a random number, the nonce, and then verify whether the hash produced by the cryptographic algorithm using the nonce satisfies the difficulty requirement. They carry on with this process until the difficulty requirement is satisfied. If the transactions in the block are valid, the remaining nodes accept the block. Nodes then continue finding the next block using the hash of this block as the previous hash. Nodes of different blockchains follow different consensus protocols, such as Proof of Work, to reach an agreement on the addition of a block. The consensus finality informs the probability of reaching consensus; this may be non-deterministic or deterministic [12, 13].


Gossiping is the process by which the nodes propagate new blocks to other nodes in the network, or by which information is transmitted from the blockchain to new nodes participating in the network. Gossiping can be Local or Global [12, 13]. A network has rules of message communication; communication can be Synchronous or Asynchronous. The hashrate informs how much computing or hashing power is being contributed to the network by all the nodes; for an individual node, it is the speed at which the node is finding a block. The higher the hashrate, the greater the chance of the node receiving the block reward. The Difficulty Level is the measure of how difficult it is to compute a hash below a given target. The network monitors how long the nodes are taking to solve the cryptographic puzzle. The Block Time is the time taken by the hashing power of the network to find a solution to the cryptographic puzzle, i.e. the time taken to mine a block. If the nodes require much more or much less time than the Block Time to find blocks, the network adjusts the Difficulty Level. The Expected Block Time is the target block time of the network, and the Average Block Time is the average time taken by the nodes to find blocks. So, if the Average Block Time deviates much from the Expected Block Time, the Difficulty Level is adjusted so that the Expected Block Time can be achieved. A blockchain fork happens when there is a split in the blockchain, a modification in the protocol, or multiple blocks of the same block height are present. Forks can be intentional or accidental: while the former modify the rules of a blockchain, the latter happen when multiple miners find a block at the same time. Forks may or may not be compatible with older versions of the software; those software updates which are compatible are designated as soft forks, while the incompatible ones are called hard forks. The Network Module depicts the network structure and its working principles. It answers queries such as: What is the average and expected block time? Which consensus protocol is followed by the system? Is the message communication synchronous? Is consensus finality deterministic or non-deterministic? What is the role of a node? Is gossiping local or global?

Operational Layer—This depicts the conceptual model of the non-functional aspects that are used to judge the operations of such systems. The concepts related to this layer are organized into three different modules—the Extensibility, Security and Privacy, and Performance modules.

• Extensibility Module: The extensibility of a blockchain-based system is depicted through its interoperability and intraoperability. While the former portrays whether a blockchain is capable of exchanging information with external systems other than blockchains, the latter portrays whether a blockchain is capable of exchanging information with other blockchains. The extent of a blockchain's interoperability or intraoperability can be implicit, explicit or none. A blockchain is implicitly interoperable if the smart contracts specifying the conditions under which a particular transaction will happen can be written in a Turing complete blockchain script language [12, 13]. A blockchain is explicitly interoperable if the script is not Turing complete or there are distinct tools facilitating interoperability with the external world. A blockchain has no interoperability if it is unable to interact with external systems. Likewise, a blockchain is implicitly intraoperable if the smart contracts specifying the conditions under which a particular transaction will happen can be written in a Turing complete blockchain script language. A blockchain is explicitly intraoperable if the script is not Turing complete but is designed to infuse intraoperability. A blockchain has no intraoperability if it is unable to interact with other blockchains. Chain linking is the process that connects a blockchain with a sidechain and enhances the extensibility of the blockchain-based system. This module answers queries such as: What is the extent of interoperability or intraoperability of the blockchain?

• Security and Privacy Module: Data in the blockchain may be encrypted using algorithms such as SHA-2, ZK-SNARKS, Keccak-256 or Scrypt. A cipher is a technique used for encryption or decryption of data. There are two variants of cipher: block ciphers encrypt a collection of plaintext symbols as a block, whereas stream ciphers encrypt plaintext symbols one at a time into corresponding ciphertext symbols. Data privacy may be implemented as Add-on, where external techniques such as coin mixing are used to obfuscate data, or Built-in, where data is obfuscated by default. In certain blockchains user identities are anonymous, in some cases user identities are pseudonymous, and in the rest user identities are known. This module answers queries such as: What is the data encryption technique used? What type of user identity is supported? Is privacy built-in or add-on?

• Performance Module: Performance is evaluated by several metrics such as Latency, Throughput, Scalability and Fault Tolerance. Latency is measured by the response time per transaction. Throughput is measured by the number of successful transactions per second. Scalability depends on latency and throughput under increasing numbers of nodes and workloads. Scalability can be improved by solutions such as sharding, segregated witness, off-chain state channels, off-chain computations, Plasma and blockchain rent. Scalability may be limited by factors such as the number of transactions, the number of users, the number of nodes and the Block Confirmation Time, which may be deterministic or stochastic. The level of dependency of scalability on each of the factors—the number of transactions, the number of users, and the number of nodes—may be at most linear, at most quadratic, worse than quadratic or indifferent [12, 13]. Fault Tolerance depends on latency and throughput during failure. This module will inform the performance of the system and represent dynamic data determined from the live system. It answers queries such as: Which solution can improve scalability? What is the dependency level on the number of transactions, number of users and number of nodes that limit scalability? Is the block confirmation time deterministic or stochastic?

Administrative Layer—This depicts the conceptual model of how the system functionalities are administered and governed. This layer has three modules: the Resource module, the Governance module and the Incentive module.

• Resource Module: A blockchain network has resources or native assets, including cryptocurrency. A cryptocurrency has units; for instance, the satoshi is a unit of bitcoin. The native asset is represented as a token that is created by tokenisation. An Initial Coin Offering is akin to funding utilizing cryptocurrencies, where cryptocurrencies are sold as tokens. The supply of blockchain resources is Limited-Deterministic, where the resource supply grows sub-linearly over time and has a well-defined limit; Unlimited-Deterministic, where supply is unlimited; or Pre-mined, where resources are distributed before the system starts execution [12, 13].
Some blockchains have tokenisation, some do not


and some include tokenisation through third-party add-ons. Some blockchains, such as private ones, usually have no native assets; some blockchains have their own convertible currencies, while others have multiple convertible assets. So, this module answers queries such as: Does the network involve tokenisation? What sort of native asset is supported by the system? How is the supply of resources managed by the network?

• Governance Module: Blockchains are of three types: Public, Private and Consortium. They can also be classified as Permissioned and Permissionless blockchains. A public blockchain is a fully decentralized ledger accessible to all nodes from any place; any node can participate in the consensus process, adding blocks and verifying transactions. In a consortium blockchain, which is a partially decentralized ledger, a subset of the nodes participates in the consensus process; the ledger can be read by all the nodes or by a group of participants, or only a limited number of participants have partial read-only access to the ledger. In a private blockchain, access to read the ledger may be public or partially restricted; however, access to write to the ledger is centralized. In a permissionless blockchain, the identities of transaction processors are not restricted, whereas in a permissioned blockchain a subset of nodes with known identities is given the permission to process transactions. User identities are anonymous or pseudonymous in certain blockchains, while in others, particularly the private blockchains, user identities are known to the governing organization, which implements a Know Your Customer (KYC)/Anti Money Laundering (AML) identity verification process [12, 13]. The governance rules underpinning blockchain applications are of two variants: technical rules, constituted by software, protocols, processes, algorithms and other such technical elements, and regulatory rules, constituted by regulatory frameworks, provisions, industry policies and other such elements defined by external governing bodies [12, 13]. The technical rules are associated with any of the following three models:
• Open-source Community Mode, where open communities of developers and validators collaborate and perform updates or other technical changes related to the blockchain.
• Technical Mode, where enterprises provide the technical rules that meet their business goals.
• Alliance Mode, where technology platforms can be shared between companies having common business or technological enhancement requirements.
So, this module answers queries such as: What is the type of the blockchain depending on access: Public, Private or Consortium? Is the blockchain Permissioned or Permissionless? How is user identity managed? What technical and regulatory rules are followed by the system? Do the technical rules follow the open-source community mode, technical mode or alliance mode?


• Incentive Module: Different blockchain platforms have different incentive schemes for the participants. The transaction processors are rewarded in return for their services, such as verification and validation of transactions and the addition of blocks to the blockchain. There can be a lump-sum reward given to the miner who found the block, usually added to the block as a coinbase transaction. In some other blockchain platforms, in addition to the block reward, a reward for appending and validating forked blocks is given to the transaction processors. Users also pay fees to other participants in return for their services, such as validation. Fees can be mandatory, optional or absent, depending on the rules of the blockchain platform. Fees, if present in a blockchain platform, can be fixed or variable. So, this module answers queries such as: How are rewards managed by the blockchain system? Is the fee variable or fixed? Are fees mandatory or optional, if any?

Application Layer—This is the highest layer of abstraction and depicts the conceptual model related to the applicability of blockchain-backed systems, including business usage. It has only one module—the Service module.

• Service Module: This module portrays the blockchain generation, the application category that describes the blockchain-backed application, and the Decentralized Autonomous Systems (DAS) category. There are four generations of blockchain: Blockchain 1.0, which includes cryptocurrency applications; Blockchain 2.0, which includes applications related to economics and financial contracts; Blockchain 3.0, which includes applications beyond currency, contracts or markets, such as culture, health and art; and Blockchain 4.0, which emerged with Multiversum — Relational Blockchain as a Service for a Crypto-Relational Database. The application category depicts the area of the application, while the DAS category informs the corresponding DAS type, provided the application satisfies the criteria of any DAS variant. This module answers queries such as: What is the blockchain generation? What type of application is it? What type of decentralized autonomous system is it?

Design: Validation—We prepared a set of competency questions to check whether the ontology satisfies established design guidelines. We used e-Mudra, a blockchain-based application [15], to further validate the ontology.

Design: Documentation—A detailed report on the ontology has been prepared as per the guidelines provided in the Documentation phase of the ODLC.

Design: Extension and Adaptation—The blockchain ontology can be easily extended and adapted in the future, as frequent changes are expected for a domain as dynamic as blockchain, by executing all steps from Sub-phase 1 to Phase 5 of the ODLC.

Implementation of the Blockchain Application Ontology using a novel Ontology Design Tool (ODT)—We have developed an ODT that leverages the ODLC and methodology proposed in this paper. This web-based application enables users to implement an ontology for any domain by using a form-based approach to create/modify/delete groups, layers, modules and concepts along with their attributes and relations with other concepts. It also enables users to create instances of ontologies, i.e. application designs, for any domain whose ontologies have been implemented with this tool. So, this tool not only assists in implementing a blockchain ontology, it also helps in designing a specific blockchain application using the ontology for the blockchain domain implemented with this tool.


We envision that this tool will emerge as a comprehensive knowledge base of the various concepts associated with any domain, for instance blockchain. Such a knowledge base can be used to create the design of blockchain-based systems; it serves as the information repository and can be thought of as a unified knowledge base for blockchain utilizing ontologies as the building blocks. The knowledge relevant to various aspects of blockchain technology is captured in the ontology, which is organized into five broad layers, each comprising one or more sub-ontologies or modules.

Technologies Used—The following technologies have been used in the construction of the tool:
• Java/J2EE
• Google App Engine
• App Engine Datastore
• HTML/CSS
• JointJS
• JQuery/JQueryUI

ODT Architecture—The tool is implemented as a web application, i.e. the ontology manager, using the Model-View-Controller approach, with the Google App Engine Datastore as the database, see Fig. 3. The datastore holds the master data for the ontology, which comprises the ontology design information, i.e. the groups, layers, modules, concepts and their attributes and relations for any domain, and the application data for instances of the ontology, in other words the applications designed based on the ontologies implemented with the tool.

Fig. 3. Ontology design tool architecture
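A minimal sketch of how the master data described above might be structured is shown below; the class names and fields are assumptions made for illustration and do not reproduce the tool's actual Datastore schema.

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative sketch of the ontology master data (groups/layers/modules/concepts
# with attributes and relations); field names are assumed, not the tool's schema.
@dataclass
class Attribute:
    name: str
    value_type: str          # e.g. "string", "enumeration"

@dataclass
class Relation:
    name: str                # e.g. "followsConsensusProtocol"
    target_concept: str      # name of the related concept

@dataclass
class Concept:
    name: str
    attributes: List[Attribute] = field(default_factory=list)
    relations: List[Relation] = field(default_factory=list)

@dataclass
class Module:
    name: str
    concepts: List[Concept] = field(default_factory=list)

@dataclass
class Layer:
    name: str
    modules: List[Module] = field(default_factory=list)

@dataclass
class Ontology:
    domain: str
    layers: List[Layer] = field(default_factory=list)

# A tiny slice of the blockchain ontology expressed with these entities.
blockchain = Ontology(
    domain="Blockchain",
    layers=[Layer(
        name="Structural Layer",
        modules=[Module(
            name="Network",
            concepts=[Concept(
                name="Node",
                attributes=[Attribute("role", "enumeration")],
                relations=[Relation("followsConsensusProtocol", "ConsensusProtocol")],
            )],
        )],
    )],
)
print(blockchain.layers[0].modules[0].concepts[0].name)  # Node
```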


Description—The tool has two major sections: Ontology Design and Application Design.

Ontology Design—This section can be leveraged to create new ontologies and their elements, edit existing ontologies or their elements, delete existing ontologies or their elements, and view the ontologies graphically. Elements here signify groups, layers, modules, concepts and their attributes and relations. The addition/modification/deletion of elements of the ontology follows a form-based approach. First, to construct a complex ontology, groups are created as appropriate, then layers are added. The modules are added to the layers next, and finally concepts with their associated relations and attributes are added to the modules. If the ontology is simple, without layering, the concepts can be added directly to the ontology. This section has four sub-sections in the tool, as described next.
• Create Ontology/Add Elements—This sub-section enables the user to add new elements to the ontology or create an ontology for a new domain.
• Edit Ontology/Elements—This sub-section enables modifying existing components of the ontology, including merging and splitting of modules.
• Delete Ontology/Elements—This sub-section presents options to delete an ontology, its groups, layers, modules, concepts, relations or attributes of concepts.
• View Ontology—This enables the user to view the ontology created, either as a top-level view or as a detailed view of an individual module.

Top-level View—This enables the user to visualize the modularized structure of the ontology. If any change is made to the modules, layers or groups of the ontology, for example the addition or deletion of a module, this image is automatically updated without any user intervention.

Detailed View—This presents a detailed view of each module with all its concepts and associated relations and attributes expressed as a UML class diagram. If any change is made to the concepts of the ontology, for example the addition of a relation between two concepts or the deletion of an attribute from a concept, this image is automatically updated without user intervention.

Application Design—This section can be leveraged to create instances of the designed ontology, i.e. to create, view and delete application designs for the domain.
• Create Application Design—This enables the user to design an application using the ontologies implemented with this tool. For example, the e-Mudra application can be designed using the blockchain ontology. The application design parameters and their possible values in the form are dynamically populated from the ontology master data; any change in the implemented ontology will change this form automatically without any user intervention.
• Visualize Application Design—This enables the user to view the applications designed using ontologies implemented with this tool.
• Delete Application Design—This enables the user to delete applications designed using ontologies implemented with this tool.


Evaluation—The blockchain ontology implemented with the tool is being tested. The current functionalities of the tool and its capability for reuse are also tested. Usability, performance and interoperability examinations are yet to be carried out.

Maintenance—The blockchain ontology design, the code for the ODT and all associated documentation are maintained for effective performance of the system.

3.3 Observations

The specification for a blockchain can be implemented in a platform- or technology-independent way. We observed that the information and data related to any blockchain system can be categorized as static and dynamic. The static part includes all the information that depicts the design of the overall system. The dynamic part includes live, runtime or experimental data generated from, analyzed from or related to such a system. Some blockchain platforms such as Bitcoin or Ethereum have online block explorers available for users to inspect some of the live data generated from the blockchain. BLONDiE is an ontological approach for representing complete structural data from the blockchain [17]. However, we did not find any tool that represents the static part, or the dynamic part covering data analyzed from or related to such a system, for example live performance statistics. We used our proposed methodology for the ontology design of a complex, extendable ontology such as blockchain. We followed a middle-out approach, as mentioned in our methodology, and used the main concepts, which have associated enumerated values or specializations, to design a blockchain-based system. The dynamic view can be achieved by the Performance, Block and Network modules of the blockchain ontology in the future. A view of the ontology is illustrated in Fig. 4, which shows that the level of dependency among modules is minimal.

Fig. 4. Blockchain main modules


4 Conclusion and Future Work

In this paper we have proposed a new ODLC and formulated a methodology to design extendable and reusable ontologies in an organized and modularized fashion for very complicated domains that are still maturing. The methodology can also be applied to construct simple ontologies. We presented a tool that enables implementing a multi-layered modularized ontology for any domain and designing applications for that domain. We specified a multi-layered blockchain ontology, which has been implemented with the tool. The implemented blockchain ontology will act as a reusable and extendable knowledge management and representation system in the blockchain domain, providing a common or shared understanding of this technology among users in an organized way. It will assist system builders in exploring, evaluating and contrasting various blockchain-based application designs or solutions, and it will assist system designers in identifying requirements and defining a system specification. Our future efforts will involve enhancing the tool so that it emerges as an ontology-based knowledge management and representation platform for blockchain, and adding functionalities to visualize such blockchain-driven application designs using the platform, given that it is already capable of furnishing ontology designs as UML class diagrams and the layered ontology structure graphically.

References

1. Benslimane, D., Leclercq, E., Savonnet, M., Terrasse, M.-N., Yetongnon, K.: On the definition of generic multi-layered ontologies for urban applications. Comput. Environ. Urban Syst. 24(3), 191–214 (2000)
2. Man, D.: Ontologies in computer science. Didactica Math., 43–46 (2013)
3. Blat, J., Ibáñez, J., Navarrete, T.: Introduction to ontologies and tools; some examples. Universitat Pompeu Fabra. http://www.dtic.upf.edu/~jblat/material/doctorat/ontologies.pdf. Accessed 18 Aug 2018
4. Seedorf, S., Happel, H.-J.: Applications of Ontologies in Software Engineering. Karlsruhe Institute of Technology. https://km.aifb.kit.edu/ws/swese2006/final/happel_full.pdf (n.d.). Accessed 19 Aug 2018
5. Bhat, M., Shah, S., Das, P., Kumar, P., Kulkarni, N., Ghaisas, S.S., et al.: PREMAP - knowledge driven design of materials and engineering process. In: ICoRD 2013 International Conference on Research into Design, Chennai, pp. 1315–1329 (2013)
6. Kruijff, J.D., Weigand, H.: Towards a Blockchain Ontology. Semantic Scholar. https://pdfs.semanticscholar.org/0782/c5badb4f407ee0964d07eda9f74a92de3298.pdf (2017). Accessed 18 Aug 2018
7. Unit V: Semantic web Technology, Layered Architecture, RDF and OWL representation. www.srmuniv.ac.in. http://www.srmuniv.ac.in/sites/default/files/files/Semantic%20web%20Technology,%20Layered%20Architecture,%20RDF%20and%20OWL%20representation.pdf. Accessed 18 Aug 2018
8. Seebacher, S., Maleshkova, M.: A model-driven approach for the description of blockchain business networks. In: Proceedings of the 51st Hawaii International Conference on System Sciences, Hawaii, pp. 3487–3496 (2018)


9. Kim, H.M., Laskowski, M., Nan, N.: A First Step in the Co-Evolution of Blockchain and Ontologies: Towards Engineering an Ontology of Governance at the Blockchain Protocol Level. arxiv.org. https://arxiv.org/abs/1801.02027, 6 Jan 2018. Accessed 18 Aug 2018
10. Ruta, M., Scioscia, F., Ieva, S., Capurso, G., Sciascio, E.D.: Semantic blockchain to improve scalability in the internet of things. Open J. Int. Things (OJIOT) 3(1), 46–61 (2017)
11. Wendland, M.V.: Semantic Blockchain - A Review of Sematic Blockchain and Distributed Ledger Technology Approaches (DLT). ResearchGate. https://www.researchgate.net/publication/324706165_Semantic_Blockchain_-_A_Review_of_Sematic_Blockcghain_and_Distributed_Ledger_Technology_Approaches_DLT (2018). Accessed 18 Aug 2018
12. Tasca, P., Tessone, C.J.: Taxonomy of Blockchain Technologies. Principles of Identification and Classification. Cornell University Library. https://arxiv.org/abs/1708.04872, 31 May 2017. Accessed 19 Aug 2018
13. Tasca, P., Thanabalasingham, T., Tessone, C.J.: Ontology of Blockchain Technologies. Principles of Identification and Classification. https://allquantor.at/blockchainbib/pdf/tasca2017ontology.pdf, 31 May 2017. Accessed 19 Aug 2018
14. Deutsche Bank Research: Blockchain – attack is probably the best form of defence (Fintech #2). www.dbresearch.com. https://emergingpayments.org/wp-content/uploads/2017/02/Blockchain-attack-is-the-best-form-of-defence.pdf, 28 Jul 2015. Accessed 18 Aug 2018
15. Bhattacharya, R., White, M., Beloff, N.: A Blockchain based Peer-to-Peer Framework for Exchanging Leftover Foreign Currency. In: Computing Conference London, pp. 1431–1435. IEEE (2017)
16. Kim, H.M., Laskowski, M.: Towards an ontology-driven blockchain design for supply chain provenance. In: Workshop on Information Technology and Systems (WITS), Dublin, Ireland, 15–16 Dec 2016
17. Ugarte R., Héctor E.: A more pragmatic Web 3.0: Linked Blockchain Data. https://hedugaro.github.io/Linked-Blockchain-Data/, 17 Apr 2017. Accessed 5 Sep 2018
18. Slimani, T.: A study investigating knowledge-based engineering methodologies analysis. Int. J. Comput. Appl. 128, 67–91 (2015)
19. López, M.F.: Overview of methodologies for building ontologies. In: Proceedings of the IJCAI-99 Workshop on Ontologies and Problem-Solving Methods (KRR5), Stockholm, Sweden, 2 Aug 1999, pp. 4-1–4-13 (1999)
20. John, S., Shah, N., Smalov, L.: Incremental and iterative agile methodology (IIAM): hybrid approach for ontology design towards semantic web based educational systems development. Int. J. Knowl. Eng. 2(1), 13–19 (2016)
21. Saripalle, R.K., Demurjian, S.A., De la Rosa Algarín, A., Blechner, M.: A software modeling approach to ontology design via extensions to ODM and OWL. Int. J. Semantic Web Inf. Syst. 9(2), 62–97 (2013)

A Two-Way Atomic Exchange Protocol for Peer-to-Peer Data Trading

Zan-Jun Wang1, Ching-Chun Huang2(B), Shih-Wei Liao1, and Zih-shiuan Spin Yuan3

1 National Taiwan University, Taipei City, Taiwan (R.O.C.)
[email protected]
2 National Cheng Kung University, Tainan City, Taiwan (R.O.C.)
[email protected]
3 BiiLabs Co., Ltd., Taipei City, Taiwan

Abstract. Various types of data are generated every day, and the amount of generated data is growing exponentially. People are interested in extracting the value of data, and some valuable data can be viewed as digital products to be traded. For example, with the outbreak of COVID-19, patients' personal health records, electronic medical records and travel history become important and valuable information for epidemic prevention. On the other hand, the license keys for high-priced software such as EDA (Electronic Design Automation) tools also have value and can be considered tradable products. However, trust between the two parties in the trading process becomes an issue: consumers do not want to pay until providers deliver the data, while providers are unwilling to deliver first because they do not trust that consumers will pay after receiving the data. In this paper, we propose a Blockchain-based protocol for data trading with zero-knowledge proofs. To protect the data and maintain their value, a zero-knowledge succinct non-interactive argument of knowledge (zk-SNARK) is included so that the provider can convince the consumer of the correctness and security of the data without revealing the details before receiving the payment. Predefined agreements between both parties in smart contracts are executed automatically. When the data is valid, the provider receives the payment, and in the meantime the consumer is able to obtain the purchased data. Otherwise, the payment is refunded to the consumer immediately if the provider cheats. This approach employs the method of two-way exchange, also known as Delivery versus Payment (DvP) in physical commodity trading, and ensures the rights and benefits of both parties. The whole process is decentralized for the purpose of constructing fair data trading without any trusted third party and ensuring system availability.

Keywords: Blockchain · Zero knowledge · Decentralization · Data trading · Confidentiality · Data privacy


1 Introduction

With the growth of smart phones, smart cities, smart grids, smart vehicles [41] and other new technologies, streams of data are generated by smart sensors rapidly. Douglas [1] defined the 3 Ds of data: Data Volume, Data Velocity, and Data Variety. Data Volume is the size of the data, and Data Velocity is how fast the data stream is being generated. For example, 300 PB of data are stored in Meta's (aka Facebook's) data warehouse infrastructure, Hive [2], and 4 PB of data are generated on Facebook every day. Data Variety means the data can have a large variety of data types, formats and degrees from various sensors, devices and applications.

The pandemic of COVID-19 is ongoing and there are more than 18 million confirmed cases globally [3]. The Minister of Health and Welfare said that the cost to society for each patient is 71,573 USD in Taiwan. To control the outbreak of the coronavirus disease, information on patients' travel history, such as Global Positioning System (GPS) records, transaction records of credit cards or Electronic Toll Collection (ETC) System records for the nationwide freeways, becomes important data for tracing potential secondary cases [4]. Many research teams analyze the electronic medical records and medication records, looking forward to identifying effective treatments against the coronavirus disease [5]. The results of vaccine clinical trials are valuable data for developing coronavirus vaccines to prevent COVID-19 infections [6]. These valuable data can be considered digital commodities which need data markets and fair trading rules.

In another instance, license keys [7] are used to verify that the user is authorized to install the software and cannot be copied without authorization, in order to protect the copyright. License keys can be traded in electronic commerce, where some license keys are sold at a low price. Although some of them are legal and just bought from other countries where the keys are cheaper, consumers still want to verify the keys before they pay. However, providers are not willing to give the keys to consumers first, since they are easy to copy. Also, it is a risk to let a broker check the license keys because the broker may try to steal the keys and sell them. A reliable trading mechanism without a third party is needed for this kind of data.

In physical commodity transactions, the trust between two parties is a problem. A malicious provider may not deliver the product after an honest consumer pays money. Contrarily, if an honest provider delivers the product first, a malicious consumer may receive the goods without paying. A trusted third party plays an important role: first, the consumer pays money to the third party, and the third party does not send the payment to the provider until the consumer receives the product. Some disputes, such as shipping mistakes or unsatisfactory products, may arise, and any dispute between the provider and consumer is settled by the third party; the product may be returned or exchanged and the payment refunded. However, both parties need to trust the third party and a brokerage fee is required. Some studies [8, 9] include reputation systems to evaluate the participants based on their behaviors. Nevertheless, malicious participants can effortlessly create Sybil identities pseudonymously and bypass the reputation system.


It is probable that participants collude with each other and obtain unfair reputations. Furthermore, a centralized third party is a single point of failure: once the third party stops working, the whole system becomes completely unavailable.

Many studies [8, 10, 11] explore DLT-based applications and aim to facilitate more decentralized solutions. Transactions and new blocks are approved by the Proof-of-Work consensus algorithm in the Bitcoin network. Transactions of smart contracts are executed by Blockchain miners, so a single point of failure can be prevented. All transactions between providers and consumers are recorded on the Blockchain [12] and become traceable. Meanwhile, Ethereum supports general-purpose smart contracts to facilitate predefined, automated trading rules [13] transparently and irreversibly.

Next, user privacy and data confidentiality have gained more attention since the General Data Protection Regulation (GDPR) was adopted in 2016 [42]. Therefore, to ensure the confidentiality of data and participant privacy, zero-knowledge proofs (ZKPs) and smart contracts are used as confidentiality-preserving tools, while data encryption is applied to data transfer between providers and consumers. zk-SNARK [14] is a type of zero-knowledge protocol which allows the prover to prove a statement succinctly and non-interactively. To prevent malicious behavior, a logical operation or computation is converted into a proof of at most 192 bytes based on zk-SNARK, as long as the given data and witness satisfy the prover's statement.

Using Blockchain and zero-knowledge proofs, Delivery versus Payment (DvP) is employed and several malicious behaviors can be prevented. The provider cannot take the money without sending the data to the consumer, nor can the provider cheat the consumer with tampered data. For the consumer, he/she cannot deny the purchase as long as the transaction is included in the Blockchain. Also, the consumer cannot learn any information about the data from the zero-knowledge proofs. Providers and consumers can thus conduct fair data trading.

2 Related Work

The data economy emerges with the IoT revolution; thus, data utilization becomes an issue. Big data are considered a kind of digital commodity, which allows data owners and consumers to connect with each other, sharing and further increasing the utility of data [15]. Several studies have explored data markets, which allow data searching, data auction, data exchange and data trading with copyright protection. A decentralized architecture like Blockchain [12] is considered a solution.

Gupta et al. [8] proposed a 3-tier framework which consists of the provider, the consumer and a broker, with Ethereum smart contracts for managing the terms of the agreement without an intermediary. Databox, a protective virtual container, is used to aggregate data, store data, control sharing and perform computations. The data trading process is based on smart contracts, where a register contract is used to maintain a contract lookup table and a data subscription contract manages the subscription list.


Fig. 1. The trading process of a DvP trading protocol. The provider deploys the smart contract first. Then, the consumer asks to purchase the data. According to the consumer's request, the provider generates a zero-knowledge proof to show that the data are valid and that the consumer can get the data if he/she sends the payment. Also, the provider sends the consumer the ciphertext of the purchased data through an off-chain channel. The consumer verifies the result as well as the ciphertext and sends the payment to the smart contract when both are valid. Then, the provider needs to send the valid one-time symmetric key to withdraw the payment. In case 1, when the key is valid, the payment is sent to the provider and the consumer can decrypt the ciphertext with the key. In case 2, when the key is invalid, the payment is refunded. In case 3, when the provider does not publish the key for a given time interval, the consumer can request a refund.


Wang et al. [16] proposed a decentralized and trustless architecture, which consists of a registrar, data providers, data consumers and brokers, to record the transaction process on distributed ledgers. It efficiently enhances the degree of transparency, since all interactions between participants through smart contracts are recorded on the Blockchain. Lin et al. [17] extended [16] to introduce an automated subscription procedure, which is demanded for the sake of data monetization. Its storage, built upon cryptographic message protocols, allows transmitting, accessing and validating data streams over distributed ledgers without authorities, and the digital rights of trading participants are guaranteed. Dai et al. [10] implemented a secure data trading platform based on Ethereum and Intel's Software Guard Extensions (SGX). The seller's data are analyzed with smart contracts which are executed on a blockchain-based data trading ecosystem (SDTE) consisting of SGX-supported nodes. Xiong et al. [18] proposed a data trading model based on smart contracts and machine learning and removed the trusted third party from data trading. A challenge-response mechanism and an arbitration mechanism are proposed to authenticate and authorize the data owner and resolve disputes in data trading. Zhao et al. [9] proposed a decentralized secure pub-sub (SPS) architecture which provides confidentiality and reliability of data, anonymity of subscribers and payment fairness. They use a reputation system to evaluate the behavior of participants. A publisher can be trusted when its reputation value exceeds the threshold value of the reputation system; if the publisher is not fully trusted, the subscriber can post a transaction to punish the publisher by setting its reputation value to zero. They also proposed a data trading protocol [19] which guarantees fairness between providers and consumers with ring signatures, double-authentication-preventing signatures and similarity learning. However, some trusted environment such as Databox, a broker, SDTE, SGX, trusted entities, an arbitration institution (ARB) or a market manager is still needed in the above research.

Chen et al. [11] proposed a decentralized solution for big data exchange which aims to create an ecosystem in which all participants can share data with each other. Blockchain is used to record transaction logs and important documents, and no third party is needed. Data copyright and confidentiality protection are also considered. However, these studies lack a discussion of the fairness of data exchange. Li et al. [20] designed a valid data trading ecosystem based on Blockchain with a decentralized arbitration mechanism for data availability. Nevertheless, an arbitration is needed and is executed by arbitrators of the arbitration committee when invalid data are traded; both the provider and the consumer have to trust the arbitrators who decide the result of the arbitration. Delgado-Segura et al. [26] presented a fair protocol for data trading based on the Bitcoin script language. However, the opcode OP_AND they used is disabled and will make the transactions fail.


Zero-knowledge technology is used in privacy-preserving Blockchains. Maxwell [21] proposed the first Zero-Knowledge Contingent Payment (ZKCP) on the Bitcoin network. Zcash [22] provides privacy protection by hiding sender and recipient identities as well as transaction amounts. Zether [23] and ZETH [24] provide privacy-preserving Blockchains based on Ethereum with smart contracts. Campanelli et al. [25] defined the notion of Zero-Knowledge Contingent Service Payment (ZKCSP) for paying for digital services. In this paper, we propose a fair data trading protocol with data integrity validation and privacy-preserving proofs.

3 Preliminaries

3.1 Ethereum Smart Contract

A smart contract is a protocol deployed on a Blockchain that implements a given agreement expressed in a programming language. Ethereum [13] provides a Turing-complete programming language that allows users to build universal smart contracts and flexible applications. Each smart contract is given a unique address. When a transaction is sent to its address, some computations are executed and the result is agreed upon by participants through the consensus protocol. In our protocol, trading rules are defined in Ethereum smart contracts, which are transparent. Furthermore, the whole trading process is recorded in smart contracts and is therefore also traceable and irreversible. Although transaction fees are charged in this approach, the interests of the two parties can be assured without the centralized third party that is required in off-chain solutions.

3.2 Merkle Tree

A Merkle tree based scheme [27] has been proposed to authenticate the integrity of outsourced data. It is based on the Tree Signature and Tree Authentication scheme with a binary hash tree proposed by Ralph Merkle [28]. Each leaf node contains the hash of a data block and the other nodes contain the hash of their children. Sometimes, a randomly generated salt is used to protect the plaintext from dictionary attacks. Merkle tree verification is efficient because a leaf data block can be verified in logarithmic time and space. In this paper, we apply a Merkle tree to verify the integrity of the selling data. The data can be divided into data blocks which form a Merkle tree. The Merkle tree root is published to the smart contract as a proof of existence. Consumers trust providers by checking the Merkle tree root, as it is difficult to tamper with the content of any single data block while still producing the same Merkle tree root. However, providers are not willing to make the data public before receiving a payment. Zero-knowledge proofs can be used to prove the data integrity without revealing the data, and thereby convince consumers to trust providers.
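A minimal sketch of the idea, assuming SHA-256 hashing and simple concatenation of child hashes (real deployments may add salts and different ordering rules), is given below; it builds a root, produces a logarithmic-size proof for one block, and verifies it.

```python
import hashlib
from typing import List, Tuple

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(blocks: List[bytes]) -> bytes:
    """Build the Merkle root of a list of data blocks (duplicating the last node on odd levels)."""
    level = [_h(b) for b in blocks]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(blocks: List[bytes], index: int) -> List[Tuple[bytes, int]]:
    """Return (sibling_hash, side) pairs along the path; side 0 means the sibling is on the right."""
    level = [_h(b) for b in blocks]
    proof = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        sibling = index ^ 1
        proof.append((level[sibling], 0 if sibling > index else 1))
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify(block: bytes, proof: List[Tuple[bytes, int]], root: bytes) -> bool:
    """Recompute the root from one block and its logarithmic-size proof."""
    acc = _h(block)
    for sibling, side in proof:
        acc = _h(sibling + acc) if side == 1 else _h(acc + sibling)
    return acc == root

blocks = [b"block-0", b"block-1", b"block-2", b"block-3"]
root = merkle_root(blocks)
assert verify(blocks[2], merkle_proof(blocks, 2), root)   # integrity of block 2 confirmed
```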


3.3 zk-SNARK

In a zero-knowledge proof scenario, there is a prover who wants to prove a statement without revealing the details of private data, and verifiers who check the statement and learn nothing else. The prover would like to prove a statement y = F(xprivate, xpublic), and verifiers check the proof with a verification function V(w, xpublic, y), which returns 1 when the proof is valid and 0 when it is invalid, learning nothing else. A witness w, related to the private data xprivate, is used for the proof while xprivate is kept secret.

Zero-knowledge succinct non-interactive arguments of knowledge (zk-SNARK) [14] is a zero-knowledge proof system that transforms a statement into a quadratic arithmetic program (QAP), which can verify any NP statement. A QAP can be automatically translated from a rank-1 constraint system (R1CS), which consists of three matrices A, B and C whose elements lie in a finite field of order p; a solution to the R1CS is a vector s such that C · s = A · s × B · s holds for each constraint row. For example, suppose a prover wants to show that he/she knows a secret number x = 9 which satisfies x^2 + x = 90. The following sequence of constraints can be generated:

w1 = x * x
y = w1 + x        (1)

Then an R1CS can be constructed over the variable mapping [one, x, w1, y], where one represents the number 1:

A = [[0, 1, 0, 0], [0, 1, 1, 0]],  B = [[0, 1, 0, 0], [1, 0, 0, 0]],  C = [[0, 0, 1, 0], [0, 0, 0, 1]]        (2)

The solution s for this R1CS is:

s = [1, 9, 81, 90]        (3)
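The R1CS example can be checked directly. The short sketch below, using the matrices and the solution vector from the example above, confirms that C · s equals the product of A · s and B · s for every constraint row.

```python
# Verify the example R1CS: for each constraint i, (A_i . s) * (B_i . s) == C_i . s.
A = [[0, 1, 0, 0], [0, 1, 1, 0]]
B = [[0, 1, 0, 0], [1, 0, 0, 0]]
C = [[0, 0, 1, 0], [0, 0, 0, 1]]
s = [1, 9, 81, 90]            # [one, x, w1, y] with the secret x = 9

def dot(row, vec):
    return sum(r * v for r, v in zip(row, vec))

for a_row, b_row, c_row in zip(A, B, C):
    assert dot(a_row, s) * dot(b_row, s) == dot(c_row, s)
print("R1CS satisfied")       # both constraints hold, so x = 9 satisfies x^2 + x = 90
```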

Subsequently, it is converted to the QAP form, from which a zero-knowledge proof can be generated automatically by using a library such as libsnark [29]. The scheme is succinct in that the communication complexity and verification time are bounded by a polynomial in the security parameter. Because the proof is non-interactive, a setup phase that generates a common reference string is needed, after which only a single message is sent from the prover to the verifiers and the proof can be verified. Moreover, it is possible to integrate zk-SNARK with Ethereum and verify proofs on smart contracts. In our protocol, a provider proves the data integrity and the consistency of the ciphertext using zero-knowledge proofs. Hence, the provider can protect the data before receiving the payment and the consumer can trust the provider before paying to the smart contract.


4 Trading Protocol

4.1 Participants

There are two types of participants in the trading model. First, there are providers who own large amounts of data and are willing to sell these data to consumers, receiving payments that can be spent on increasing the quantity and accuracy of their data. Second, there are consumers who demand big data, which produce huge value for their services through data mining and machine learning. However, it is a big challenge that most consumers may not have the ability to collect data themselves; they look forward to purchasing the desired data from providers. Moreover, consumers can become providers afterwards. Some consumers may buy raw data, reduce dimensionality [30], detect outliers [31], or process the data for specific purposes; finally, they become providers and sell these processed data, which are more valuable than raw data, at higher prices. As a result, the value of data can be maximized.

4.2 Data Integrity

When the data are generated, the provider divides the data D into data blocks and publishes the Merkle tree root R to the smart contract. If the data are still being generated, the Merkle tree root has to be updated. After the smart contract transaction is included in a block, the more block confirmations the transaction has, the more confidently the data integrity is guaranteed. The probability of a Blockchain fork decreases exponentially as the block number grows [32], and it is hard to tamper with the data forming the same Merkle tree root after the consumer pays for the data. With the help of smart contracts and zero-knowledge proofs, the data cannot be tampered with, or else the payment is refunded. The integrity of data storage and data transfer can thus be assured. Consumers trust the Merkle tree root on the smart contract and the result of the zero-knowledge proof, and a fair data trading can be established.

4.3 Trading Details

Figure 1 shows the trading process. First, the provider deploys the smart contract with the metadata and keeps updating the Merkle tree root as the data are generated. The provider publishes the metadata along with the smart contract address to the Internet or to data markets [15], in which consumers search for desired data and the transaction processes are determined. Also, the data blocks Di to be traded are specified. For example, suppose a provider sells sensor data for monitoring air pollution and a consumer only requires sensor data detected in the daytime, since his/her business service is only open during the day. The provider can prove the data integrity with the Merkle tree verification without handing over the data detected at night, which are not bought by the consumer. As another example, electronic medical records [33] are published to the blockchain as a proof of existence; however, some of them are related to personal information


which should not be revealed to the consumer. The provider specifies which electronic medical records can be traded. By listening to smart contract events, the provider notices that a consumer requests to start trading. The provider then generates a one-time symmetric key k and provides the following proof:

1. MerkleRoot(Di, pathElements, pathIndices) = R
2. hash(k) = Hk
3. Encrypt(k, Di) = Cd
4. hash(Cd) = Hd        (4)

where pathIndices, R, Hk and Hd are public, and pathElements, Di, k and Cd remain private by using a zero-knowledge proof. An element of pathIndices is 0 when the corresponding node is the right child on the Merkle tree path and 1 when it is the left child. pathElements are the elements on the Merkle tree path which are used to verify the Merkle root. R is the root of the Merkle tree formed from the data blocks of the selling dataset, used to prove the data integrity. Hk is the hash value of the symmetric key k, which is used to encrypt the selling data. Cd is the encryption of the selling data Di. Hd is the hash value of the encryption Cd, which can be used to validate that the ciphertext the consumer has is the same as the one in (4). The verification process is done on the smart contract, which is executed by Ethereum miners; the consumer can therefore trust the zero-knowledge proof process as well as the verifier contract. Through an off-chain channel such as transport layer security (TLS), the provider also sends the consumer Cd, which can be validated by computing its hash value and comparing it with Hd:

hash(Cd) = Hd        (5)

One of the properties of hash functions is second-preimage resistance [34]: it is difficult to find another input which has the same hash value. When the hash value of the Cd sent by the provider is the same as Hd in (4), Cd should be the same as the ciphertext in (4). When the result of (4) and Cd are both valid, the consumer is convinced that Cd is the encryption of the data he/she asks for, and the consumer can send tokens to the smart contract as the payment. However, the provider cannot withdraw these tokens until he/she provides the correct symmetric key k, which is verified on the smart contract:

hash(k) = Hk        (6)

Although the symmetric key k is published to the Blockchain, only the consumer who has the ciphertext can obtain the original data. Through the smart contract, the provider convinces the consumer that the consumer can use the symmetric key k to decrypt Cd and obtain the purchased data, whose integrity has been proven in (4). When the transaction to verify k is executed and the result is invalid, the consumer's payment is refunded automatically. If the provider does not provide the correct symmetric key at all, the consumer can request a refund.
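To make the exchange concrete, the following sketch simulates the key off-chain and on-chain checks with AES-GCM encryption and SHA-256 hashes. The function names, the use of the cryptography package, and the simplified contract logic are assumptions for illustration only; the actual protocol additionally verifies the Merkle proof and the zk-SNARK statement (4) on the smart contract, and here the nonce is hashed together with the ciphertext for simplicity.

```python
import hashlib
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM  # assumed dependency: pip install cryptography

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

# --- Provider side: prepare the offer --------------------------------------
data_block = b"air-quality readings, daytime"              # Di, the purchased data block
key = AESGCM.generate_key(bit_length=128)                  # one-time symmetric key k
nonce = os.urandom(12)
ciphertext = AESGCM(key).encrypt(nonce, data_block, None)  # Cd = Encrypt(k, Di)

H_k = sha256(key)                  # published alongside the zero-knowledge proof
H_d = sha256(nonce + ciphertext)   # hash of the ciphertext sent off-chain

# --- Consumer side: check the ciphertext before paying ---------------------
assert sha256(nonce + ciphertext) == H_d        # matches statement (5); safe to send payment

# --- Simplified contract logic: release payment only for the correct key ---
def verify_key_on_contract(revealed_key: bytes) -> str:
    """Stand-in for the on-chain check hash(k) == Hk; decides who gets the escrowed payment."""
    return "pay provider" if sha256(revealed_key) == H_k else "refund consumer"

print(verify_key_on_contract(key))             # pay provider
print(verify_key_on_contract(b"wrong key!!"))  # refund consumer

# --- Consumer side: decrypt after the key is published on-chain ------------
recovered = AESGCM(key).decrypt(nonce, ciphertext, None)
assert recovered == data_block
```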


The provider can receive the payment and the consumer can acquire the tamper-proof data when everything goes well. However, if the provider is malicious, the consumer can get the payment back; conversely, a malicious consumer cannot obtain the data without payment. The trading process with zero-knowledge proofs and smart contracts achieves the DvP property of physical commodity trading. A fair trade between the provider and the consumer can be ensured.

4.4 Potential Attack

It is possible that the consumer is a miner in the Blockchain. The consumer may learn the symmetric key k from the provider's transaction and try to send a transaction to request a refund before the provider's transaction is included in the Blockchain. Since the Blockchain fork probability decreases exponentially [32], a given time interval is set, and the refund will not be executed until the provider has been waited for over this period of time after the consumer sends the payment. The time interval is expressed as a block interval BInterval:

BInterval = BConfirmations + TProvider / TBlock + BConfirmations        (7)

where BConfirmations is the required number of block confirmations for the Payment transaction and the VerifyKey transaction, TProvider is the provider's response time to issue a VerifyKey transaction, and TBlock is the expected block time of the Blockchain; the expected block time in Ethereum is between 10 and 20 s.
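As a worked illustration of (7), assume 12 required confirmations, a provider response time of 10 minutes and an expected block time of 15 s (all example values, not figures from the paper); the waiting window then comes to 64 blocks.

```python
# Example evaluation of BInterval = BConfirmations + TProvider / TBlock + BConfirmations.
# All numbers below are illustrative assumptions.
B_CONFIRMATIONS = 12        # confirmations required for the Payment and VerifyKey transactions
T_PROVIDER = 10 * 60        # provider response time to issue VerifyKey, in seconds
T_BLOCK = 15                # expected Ethereum block time, in seconds

b_interval = B_CONFIRMATIONS + T_PROVIDER / T_BLOCK + B_CONFIRMATIONS
print(b_interval)           # 64.0 blocks: the refund cannot execute before this many blocks pass
```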

4.5 Security Discussion

Security Properties. In the following, we define the required security properties of a data trading protocol.

– Completeness: When both the provider and the consumer are honest and follow the trading rules, the provider can get the payment by selling the correct data and the consumer can obtain the validated data after sending the payment.
– Fairness: A malicious provider cannot get the payment without giving the correct and validated data, and a consumer cannot obtain the data without payment.
– Confidentiality: The data are confidential. The consumer cannot learn the data when verifying the zero-knowledge proof, and no third party can steal the data during data trading.
– Decentralization: Both the provider and the consumer can believe that the trading process is executed in a decentralized manner, and the verification results are agreed upon by the decentralized network instead of an authoritative third party.


Security Analysis. In this section, we analyze the security of the proposed DvP data trading protocol.
– Completeness: we consider that both the provider and the consumer are honest and follow the trading rules. The zero-knowledge proof proves that the ciphertext Cd is the encryption of Di under the symmetric key k. The provider sends the valid Cd to the consumer secretly. The consumer checks the result of the zero-knowledge proof and checks Cd by comparing its hash value with Hd in the validated proof. When both results are correct, the consumer sends the payment to the smart contract. The provider cannot withdraw the payment yet; he/she must first send a valid k, which can be used to decrypt Cd, to the smart contract. k is verified on the smart contract, and the payment is transferred to the provider when k is valid. As a result, the provider gets the payment after the fair data trading, and at the same time the consumer obtains Di by decrypting Cd with k. The protocol therefore satisfies completeness.
– Fairness: a malicious provider may try to withdraw the payment with invalid data or an invalid key, and a malicious consumer may want to steal the data without paying. In the first case, we consider a malicious provider and an honest consumer. The consumer only sends the payment after checking the hash value Hd of the ciphertext Cd against the smart contract, so the provider cannot trick the consumer into paying with an invalid Cd. If, after the consumer sends the payment to the smart contract, the provider tries to publish an invalid k, the smart contract computes its hash value, finds that k is invalid, and the payment is refunded to the consumer. The malicious provider cannot get the payment with invalid data or an invalid key. In the second case, we suppose that the consumer is malicious and the provider is honest. Before the consumer sends the payment, the provider sends the ciphertext Cd to the consumer, who cannot extract the data Di without the symmetric key k. After the payment is sent, the provider publishes k to the smart contract. The consumer obtains k, which can be used to decrypt Cd, and tries to send a refund transaction that conflicts with the provider's transaction. However, the refund transaction fails until the block interval BInterval has elapsed, and it is hard to fork the Blockchain after the provider's transaction has BConfirmations confirmations. The consumer cannot obtain the data without paying, so fair data trading is ensured.
– Confidentiality: the data Di is encrypted with the symmetric key k. Without k, the consumer cannot extract Di before sending the payment. Only when the consumer sends the payment and the provider publishes the key can the consumer decrypt Cd and obtain the data. The details of Di are not revealed when verifiers verify the zero-knowledge proof, and the ciphertext Cd is transferred secretly. As long as the ciphertext is not revealed by the provider or the consumer, other people see only the symmetric key and learn nothing about the trading data. The confidentiality of the data is therefore ensured.


– Decentralization: we suppose that the smart contracts are deployed on a Blockchain whose consensus protocol is robust. The smart contract transactions and the execution results are verified and recorded by every node to ensure integrity, and it is highly unlikely that any single authoritative third party controls the Blockchain or the trading process. The trading protocol is therefore decentralized.

5 Implementation

5.1 Zero-Knowledge Proof

Several zero-knowledge proof applications involve proving knowledge of the preimage of a hash. In our trading model, symmetric encryption and hash functions are used to protect the trading data before the consumer pays and to ensure data integrity. However, standard primitives such as the Secure Hash Algorithms (SHA) and the Advanced Encryption Standard (AES) in Galois/Counter Mode (GCM) are not friendly to zero-knowledge proofs because of the heavy cost of multiplications. Using snarkjs [35], a JavaScript implementation of zk-SNARK schemes, a SHA256 circuit requires 29380 constraints, which is too expensive. MiMC [36] is both a secure block cipher and a secure cryptographic hash function that minimizes the number of multiplications in a finite field for zk-SNARK applications. The MiMC7 block cipher applies a round function several times, where each round consists of a key addition, a round constant addition, and a low-degree power map. However, [36] advises using modes in which the inverse is not needed, since encryption and decryption must be implemented separately and decryption is much more expensive. Thus MiMCFeistel, a variant built on a Feistel network, is adopted for the block cipher and the hash function; a MiMCFeistel block cipher and hash requires only 1983 constraints.

Using snarkjs with NodeJS v10.20.1, the circuit is compiled and the R1CS binary, the WASM code and the symbols file are generated. To generate the proving key and the verification key, the setup phase is executed on the R1CS binary. The witness is calculated from the WASM code, the symbols file and the circuit input. Finally, the zero-knowledge proof is generated from the proving key and the witness. Verifiers can verify the proof with the verification key; moreover, the proof can be verified on the Blockchain with a smart contract. The whole process can be executed automatically.

We evaluate the costs of zk-SNARK circuits for Merkle trees of depths ranging from 4 to 20. Table 1 shows the numbers of data blocks, wires, constraints, private inputs and public inputs for each depth. When the depth of the binary Merkle tree is d, the maximum number of data blocks is 2^d. The number of wires, which connect the gates, is 1324*d + 3303. The number of constraints, which is related to the number of gates and circuit outputs, is 1323*d + 3300; the number of constraints is directly correlated with the proving time. The number of private inputs is d + 2 and the number of public inputs is d + 3, because the lengths of pathElements and pathIndices grow with the depth d.
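The counts in Table 1 follow directly from these formulas. The short Python sketch below (the helper name is ours) reproduces them for any depth:

```python
def circuit_stats(d: int) -> dict:
    """Circuit statistics for a binary Merkle tree of depth d (formulas from the text above)."""
    return {
        "data_blocks": 2 ** d,           # maximum number of leaves
        "wires": 1324 * d + 3303,        # wires connecting the gates
        "constraints": 1323 * d + 3300,  # drives the proving time
        "private_inputs": d + 2,
        "public_inputs": d + 3,          # pathElements/pathIndices grow with d
    }

for depth in (4, 8, 12, 16, 20):
    print(depth, circuit_stats(depth))   # matches the rows of Table 1
```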


Table 1. The statistics of circuits for different depths of the Merkle tree.

Depth of Merkle tree   Leaves/Data blocks   Wires   Constraints   Private inputs   Public inputs
4                      16                   8599    8592          6                7
8                      256                  13895   13884         10               11
12                     4096                 19191   19176         14               15
16                     65536                24487   24468         18               19
20                     1048576              29783   29760         22               23

Fig. 2. The proving time for each depth.

Figure 2 shows the proving time for each depth. When producing a zero-knowledge proof, the execution times of compilation and witness calculation are negligible, while the setup and proof-generation phases are costly. The proving key and the verification key are created in the setup phase; a proof is then generated from the witness and the proving key. The execution times of the setup and proof-generation phases increase with the depth of the Merkle tree. Succinct zk-SNARK proofs can be verified with a short proof and a low verification time, and the verification can also be executed with a smart contract generated from the verification key. Figure 3 shows the gas cost of deploying the verification contracts. Gas is only consumed when the contract is deployed; calling the verification function costs no gas because the state of the smart contract is not changed. Although the cost grows with the depth d, since the length of pathIndices is related to the memory usage, the system is scalable because the cost increases only logarithmically with the number of data blocks. For a Merkle tree of depth 20, there are about 1 million data blocks and the gas consumption is 1971966 gas. In May 2020, the approximate gas price was 17 Gwei [38], where 1 Gwei equals 10^-9 Ether. Once the contract is deployed and the proof is verified, the result is transparent and traceable by everyone.
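At those figures, the one-time deployment fee works out as follows (a back-of-the-envelope Python check using only the gas amount and gas price quoted above):

```python
gas_used = 1_971_966          # verification contract for a depth-20 Merkle tree
gas_price_gwei = 17           # approximate gas price in May 2020 [38]

fee_eth = gas_used * gas_price_gwei * 1e-9   # 1 Gwei = 1e-9 Ether
print(f"{fee_eth:.4f} ETH")                  # ~0.0335 ETH, paid once at deployment
```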


Fig. 3. The transaction cost of the verification contract for each depth.

Fig. 4. An overview of smart contracts.

6 Trading Process and Smart Contracts

The trading process is defined and executed by smart contracts implemented in Solidity [37] and deployed on an Ethereum testnet. Everyone in the network can verify the result, and once a transaction is confirmed it is hard to revoke its execution. Figure 4 gives an overview of the smart contracts. The MiMC library implements the MiMCFeistel function, which can be used by the other smart contracts. The Trading contract defines the trading rules and records the trading process, which has eight states: Initialized, PurchaseReceived, ProofVerified, PaymentReceived, KeyVerified, Finished, Refunded, and Failed. The Verifier contract is used to verify the zero-knowledge proof. The MiMC library is deployed and can be used repeatedly. When the provider wants to sell his/her data, the Trading contract is deployed by the provider and initialized in the Initialized state. Before the consumer purchases the data, UpdateMerkleRoot is called when the provider publishes the Merkle tree root.


The consumer searches for the data in the data market and reaches an agreement with the provider. Then, the consumer calls Purchase, and the Trading contract changes to the PurchaseReceived state. After the data block number is specified, the provider provides the proof of (4) by deploying the Verifier contract and calling VerifyProof. When the result of VerifyProof is valid, the contract state becomes ProofVerified. The provider then sends the ciphertext of the trading data to the consumer privately, and the consumer checks the ciphertext against the record on the smart contract. Next, the consumer calls Payment with tokens equal to the selling price, changing the contract state to PaymentReceived. The provider publishes k by calling VerifyKey. When the key is valid, the Trading contract enters the KeyVerified state, stores the key so that the consumer can decrypt the ciphertext, and sends the payment to the provider automatically; the Trading contract is then Finished. Otherwise, when the key is invalid, the payment is returned to the consumer. If the provider does not continue the process for a period of time, measured by the block number in the Blockchain, the consumer can also call Refund to get the payment back. However, if the consumer is also a Blockchain miner, the consumer may see the broadcast transaction containing the symmetric key before it is executed. To prevent the consumer from peeking at the symmetric key and sending Refund with a higher transaction fee so that the Refund transaction is executed before VerifyKey, causing a loss to the provider, Refund is disabled until the block interval BInterval has elapsed. Moreover, both the provider and the consumer can stop the trading in any state. Even if the Trading contract ends in the Failed state, the consumer does not lose his/her tokens and the provider's data are not disclosed. The provider receives the payment if, and only if, the consumer can obtain the data.
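The Python sketch below models these state transitions. It is an illustrative abstraction of the Solidity Trading contract described above, not its actual code; the refund gate mirrors the BInterval rule of (7).

```python
from enum import Enum, auto

class State(Enum):
    INITIALIZED = auto(); PURCHASE_RECEIVED = auto(); PROOF_VERIFIED = auto()
    PAYMENT_RECEIVED = auto(); KEY_VERIFIED = auto(); FINISHED = auto()
    REFUNDED = auto(); FAILED = auto()

class Trading:
    """Illustrative model of the Trading contract's state machine (not Solidity)."""
    def __init__(self, b_interval: int):
        self.state = State.INITIALIZED
        self.b_interval = b_interval        # refund lock, in blocks
        self.payment_block = None

    def purchase(self):
        assert self.state == State.INITIALIZED
        self.state = State.PURCHASE_RECEIVED

    def verify_proof(self, proof_is_valid: bool):
        assert self.state == State.PURCHASE_RECEIVED
        if proof_is_valid:
            self.state = State.PROOF_VERIFIED

    def payment(self, current_block: int):
        assert self.state == State.PROOF_VERIFIED
        self.payment_block = current_block
        self.state = State.PAYMENT_RECEIVED

    def verify_key(self, key_is_valid: bool):
        assert self.state == State.PAYMENT_RECEIVED
        if key_is_valid:
            self.state = State.KEY_VERIFIED   # key stored for the consumer...
            self.state = State.FINISHED       # ...and payment released to the provider
        else:
            self.state = State.REFUNDED       # invalid key: payment returned

    def refund(self, current_block: int):
        # Refund is disabled until B_Interval blocks have passed since payment.
        assert self.state == State.PAYMENT_RECEIVED
        if current_block - self.payment_block >= self.b_interval:
            self.state = State.REFUNDED
```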

Table 2. The transaction cost for each method.

Contract   Method             Sender     Cost (gas)
MiMC       Constructor        –          3186746
Trading    Constructor        Provider   1782090
Trading    UpdateMerkleRoot   Provider   64215
Trading    Purchase           Consumer   98011
Trading    VerifyProof        Provider   848752
Trading    Payment            Consumer   68383
Trading    VerifyKey          Provider   92014
Trading    Refund             Consumer   35519
Verifier   Constructor        Provider   1486775

Table 2 shows the transaction costs of the contract deployments and of each method when the Merkle tree has a depth of 4. The cost of generating a zk-SNARK proof corresponds to the complexity of the circuit, while the cost of interacting with the Blockchain is deterministic. Although the provider spends time and resources generating the proof and bears a large proportion of the transaction costs in the trading process, the confidentiality of the data is ensured and trusted data are more valuable in the data market.

7 Conclusion

We proposed a Peer-to-Peer data trading protocol employing Delivery versus Payment (DvP) with smart contracts. The provider and the consumer can trade data fairly without trusted third parties, trusted environments, reputation systems or arbitration. Zero-knowledge proofs are used as confidentiality-preserving proofs to maintain the value of the data. With this approach, it is impossible for malicious providers to tamper with the trading data or to withdraw the payment without giving the data to the consumer; on the other hand, malicious consumers cannot obtain the data before paying. As a result, fair trading is established and the rights of both the provider and the consumer are ensured. Although generating zero-knowledge proofs off-chain is expensive, the transaction costs are deterministic, the process can be executed automatically without waiting for responses from third parties, and the provider receives the payment, or the consumer receives the refund, immediately.

In this paper, only the integrity of the data is considered. In some scenarios, the provider does not sell the whole data set. Some kinds of data, such as electronic medical records [33] which reveal personal identity, should be de-identified, or only part of the data is sold; these processes can be verified by zero-knowledge proofs as well. Zero-Knowledge Range Proofs [39], which prove that an integer belongs to a given interval while keeping the integer secret, can be included in data trading for specific data types, so that the provider can guarantee that the content of the trading data meets the conditions the consumer asks for. Data compression [40] is commonly used for sensor data so that fewer bits or less disk space are required for data transmission and storage; the provider can sell the compressed data together with a zero-knowledge proof that ensures the correctness of the result. Zero-knowledge proofs can also ensure the confidentiality of license keys when they are validated and traded. Finally, Ethereum is a pseudonymous mechanism with weak anonymity; secret trading can be established by hiding the details of the transactions if both the provider and the consumer want to remain anonymous [22]. Blockchain and zero-knowledge proofs not only create mutual trust between the provider and the consumer in data trading, but also provide a way to preserve confidentiality and security.

References

1. Douglas, L.: 3D data management: controlling data volume, velocity and variety, 6 February 2001
2. Osman, M.: Wild and Interesting Facebook Statistics and Facts (2020). https://kinsta.com/blog/facebook-statistics/. Accessed 1 Jan 2020


3. World Health Organization. 2020. Coronavirus disease (COVID-2019) situation reports URL: https://www.who.int/emergencies/diseases/novel-coronavirus2019/situation-reports/ 4. Chen, C.M., et al.: Containing COVID-19 among 627,386 persons in contact with the diamond princess cruise ship passengers who disembarked in Taiwan: big data analytics. J. Med. Internet Res. 22(5), e19540 (2020). https://www.jmir.org/2020/ 5/e19540. PMID: 32353827. PMCID: 7202311. https://doi.org/10.2196/19540 5. Xu, X., et al.: Effective treatment of severe COVID-19 patients with tocilizumab. Proc. Natl. Acad. Sci. 117(20), 10970–10975 (2020). https://doi.org/10.1073/pnas. 2005615117 6. Le Thanh, T., Andreadakis, Z., Kumar, A., et al.: The COVID-19 vaccine development landscape. Nat. Rev. Drug Discov. 19(5), 305–306 (2020). https://doi.org/ 10.1038/d41573-020-00073-5 7. Subramanya, S.R., Yi, B.K.: Digital rights management. In: IEEE Potentials, vol. 25, no. 2, pp. 31–34, March-April 2006, https://doi.org/10.1109/MP.2006.1649008 8. Gupta, P., Kanhere, S.S., Jurdak, R.: A Decentralized IoT data marketplace (2019). https://arxiv.org/abs/1906.01799 9. Zhao, Y., Li, Y., Mu, Q., Yang, B., Yu, Y.: Securepub-sub:Blockchain-based fair payment with reputation for reliable cyber physical systems. IEEE Access 6, 12295–12303 (2018) 10. Dai, W., et al.: SDTE: A secure blockchain-based data trading ecosystem. IEEE Trans. Inf. Forensics Secur. 15, 725–737 (2020) 11. Chen, J., Xue, Y.: Bootstrapping a blockchain based ecosystem for big data exchange. In: 2017 IEEE International Congress on Big Data, BigData Congress 2017, Honolulu, HI, USA, 25–30 June 2017, pp. 460–463 (2017) 12. Nakamoto, S.: Bitcoin: a peer-to-peer electronic cash system (2009). https:// bitcoin.org/bitcoin.pdf 13. Ethereum Foundation. A Next-Generation Smart Contract and Decentralized Application Platform (2014). https://github.com/ethereum/wiki/wiki/WhitePaper 14. Bitansky, N., Canetti, R., Chiesa, A., Tromer, E.: From Extractable Collision Resistance to Succinct Non-Interactive Arguments of Knowledge, and Back Again. Cryptology ePrint Archive, Report 2011/443. https://eprint.iacr.org/ 2011/443 (2011) 15. Liang, F., et al.: A survey on big data market: pricing, trading and protection. IEEE Access 6, 15132–15154 (2018) 16. Wang, Z.J., Lin, C.H., Yuan, Y.H., Huang, C.C.: Decentralized data marketplace to enable trusted machine economy. In: 2019 IEEE Eurasia Conference on IOT, Communication and Engineering (ECICE), Yunlin, Taiwan, pp. 246–250 (2019). https://doi.org/10.1109/ECICE47484.2019.8942729 17. Lin, C.H., Huang, C.C., Yuan, Y.H., Yuan, Z.S.: A fully decentralized infrastructure for subscription-based IoT data trading. IEEE Int. Conf. Blockchain 2020, 162–169 (2020). https://doi.org/10.1109/Blockchain50366.2020.00027 18. Xiong, W., Xiong, L.: Smart contract based data trading mode using blockchain and machine learning. IEEE Access 7, 102331–102344 (2019) 19. Zhao, Y., Yu, Y., Li, Y., Han, G., Du, X.: Machine learning based privacypreserving fair data trading in big data market. Inf. Sci. 478, 449–460 (2019) 20. Li, T., Li, D.: A Valid Blockchain-based Data Trading Ecosystem. Cryptology ePrint Archive: Report 2019/1306. https://eprint.iacr.org/2019/1306.pdf, 2019 21. Maxwell, G.: The first successful zero-knowledge contingent pay- ment. Bitcoin Core Blog, https://bitcoincore.org/en/2016/02/26/ zero-knowledge-contingentpayments-announcement/, February 2016


22. Ben-Sasson, E., et al.: Zerocash: decentralized anonymous payments from Bitcoin. In: IEEE Symposium on Security and Privacy (SP), pp. 459–474 (2014) 23. Bunz, B., Agrawal, S., Zamani, M., Boneh, D.: Zether. Towards Privacy in a Smart Contract World, IACR Cryptology ePrint Archive (2019) 24. Rondelet, A., Zajac, M.: ZETH: On Integrating Zerocash on Ethereum. arXiv preprint arXiv:1904.00905 (2019) 25. Campanelli, M., Gennaro, R., Goldfeder, S., Nizzardo, L.: Zero-knowledge contingent payments revisited: attacks and payments for services. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (CCS), pp. 229–243. ACM (2017) 26. Delgado-Segura, S., Perez-Sola, C., Navarro-Arribas, G., Herrera-Joancomarti, J.: A fair protocol for data trading based on bitcoin transactions. IACR Cryptology ePrint Archive 2017, 1018 (2017) 27. Goodrich, M.T., Tamassia, R., Triandopoulos, N.: Super-efficient verification of dynamic outsourced databases. In: Malkin, T. (ed.) CT-RSA 2008. LNCS, vol. 4964, pp. 407–424. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3540-79263-5 26 28. Merkle, R.C.: A certified digital signature. In: Brassard, G. (ed.) CRYPTO 1989. LNCS, vol. 435, pp. 218–238. Springer, New York (1990). https://doi.org/10.1007/ 0-387-34805-0 21 29. SCIPR Lab. libsnark. https://github.com/scipr-lab/libsnark 30. Marjani, M., et al.: Big IoT data analytics: architecture, opportunities, and open research challenges. IEEE Access 5, 5247–5261 (2017) 31. Chandola, V., Banerjee, A., Kumar, V.: Anomaly detection: a survey. ACM Comput. Surv. (CSUR) 41(3), 15 (2009) 32. Decker, C., Wattenhofer, R.: Information propagation in the Bitcoin network. In: Peer-to-Peer Computing, 2013 IEEE Thirteenth International Conference, pp. 1–10 (2013) 33. Liao, S., et al.: DeepLinQ: distributed multi-layer ledgers for privacy-preserving data sharing. In: 2018 IEEE International Conference on Artificial Intelligence and Virtual Reality (AIVR), pp. 173–178 (2018) 34. Rogaway, P., Shrimpton, T.: Cryptographic hash-function basics: definitions, implications, and separations for preimage resistance, second-preimage resistance, and collision resistance. In: Roy, B., Meier, W. (eds.) FSE 2004. LNCS, vol. 3017, pp. 371–388. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-259374 24 35. iden3. snarkjs. https://github.com/iden3/snarkjs 36. Albrecht, M., Grassi, L., Rechberger, C., Roy, A., Tiessen, T.: MiMC: efficient encryption and cryptographic hashing with minimal multiplicative complexity. In: ASIACRYPT, pp. 191–219 (2016) 37. Solidity (2020). https://solidity.readthedocs.io/en/latest/ 38. ETH Gas Station (2020). https://ethgasstation.info 39. Morais, E., Koens, T., Wijk, C.V., Koren, A.: A survey on zero knowledge range proofs and applications. arXiv:1907.06381v1. https://arxiv.org/pdf/1907.06381. pdf. (2019) 40. Lin, J.W., Liao, S.W., Leu, F.Y.: Sensor data compression using bounded error piecewise linear approximation with resolution reduction. Energies 12, 1–20 (2019)


41. Wang, S.H., Lin, C.X., Tu, C.H., Huang, C.C., Juang, J.C.: Autonomous vehicle simulation for asia urban areas with a perspective from education. In: 2020 International Computer Symposium (ICS) (2020). https://doi.org/10.1109/ICS51289. 2020.00095 42. Huang, C.C., Yuan, Z.S.: privacy implication and technical requirements toward GDPR compliance. In: Proceedings of the Future Technologies Conference (FTC) (2019). https://doi.org/10.1007/978-3-030-32523-7 24

Blockchain Industry Implementation and Use Case Framework: A Survey

Gabriela Ziegler
Davenport University, Grand Rapids, USA
[email protected]

Abstract. This paper explains Blockchain Technology (BCT) and its concepts in simple terms and explores blockchain usability outside of currency. The focus of this research paper is to describe blockchain technology, survey business applications, and present frameworks for identifying valid blockchain opportunities. It also discusses blockchain security issues and challenges. Keywords: Blockchain · Use case development · Emerging technology · Business processes · Cybersecurity · Bitcoin

1 Introduction

This paper explains Blockchain Technology (BCT) and its concepts in simple terms, and it describes blockchain usability outside of currency. The focus of this research paper is to describe blockchain technology, survey BCT business applications, and present frameworks for identifying valid blockchain opportunities; it also discusses blockchain security issues and challenges. Blockchain is defined as “a distributed database of records or public ledger of all transactions or digital events that have been executed and shared among participating parties” [11] (p. 1). Blockchain is a group of records linked together, creating an unbreakable chain. The best-known implementations of the framework are Bitcoin and Ethereum; Bitcoin was the first implementation of blockchain and supports the Bitcoin mining function [29]. The basic definition of blockchain is a growing list of records (blocks) that are linked and secured using cryptography; each block contains a cryptographic hash of the previous block. The cryptography makes the data resistant to modification. The result is an open distributed ledger recording transactions between two parties, in which transactions are efficient, verifiable, and permanent. Blockchain uses cryptography to store data in a fully distributed system. The development of Turing-complete programming languages allowed the implementation and execution of programs on the blockchain: smart contracts. These programs opened up a wide array of blockchain-based applications. It is critical to determine whether blockchain is part of the solution: organizations need to assess the value of a technology for solving a business problem using a systematic approach to develop a convincing use case.


The main application of blockchain has been distributed ledgers for cryptocurrencies. Current trends use distributed ledgers to transform business operating models and increase value to customers, for example by creating permanent, public, and transparent ledger systems compiling data on sales. Blockchain is also useful for creating automated escrows and authenticating copyright registrations. IBM developed innovative applications with its ASCAP and PRS application for music distribution tracking, and Kodak is developing a token system for recording photograph copyrights. Accounting firms such as Ernst & Young have already provided cryptocurrency to their Swiss employees, installed Bitcoin ATMs at their offices in Switzerland, and accept bitcoin for their consulting services [4, 7, 8]. Researchers [1, 13, 14, 24, 32, 37, 45, 46] have studied applications or proposed blockchain use cases, challenges, and solutions. More digital currencies have come into use after Bitcoin; today there are over seven hundred different digital currencies that use blockchain technology. Financial exchanges can benefit from BCT by developing decentralized systems, and BCT is used in financial markets to trade stocks on a platform that is not controlled by any single governing body, as opposed to current systems. All these examples support the argument that blockchain is going to disrupt different industries [16, 26, 42]. After all the buzz around blockchain, it is critical to develop use cases to evaluate this important technology and its impact on business. BC business applications are new, and there is no proven framework for understanding the technology and the potential development of use cases [16]. Researchers have used different methods to evaluate use cases, and some of these frameworks are explored in this paper: (1) the value-driven Business Process Management framework [26]; (2) the six rules to assess use cases that [36] developed based on Greenspan (2015); (3) the Disruption Evaluation Framework (DEF) [3]; and (4) a use case framework with three categories: intermediary, data, and process [22]. In addition to developing a framework to assess blockchain use cases, it is important to consider security issues related to blockchain. The security issues included in the discussion are scalability, privacy leakage, selfish mining, traceability, transparency, and counterfeiting [46].

2 Research Objective

The purpose of this research is to review the findings of existing studies and describe research efforts in blockchain technology (BCT) applications for industry, to identify a systematic approach for developing convincing use cases, and to discuss the security issues and challenges of BCT.

3 Research Method

Blockchain Technology is a relatively new topic, and BCT application in industry is an even newer concept; most industry applications outside currency are found in business and finance. Until now, the extent of peer-reviewed publications and analysis has been extremely limited, but recently there has been rising interest in using BCT to solve business problems. The research goal for this paper was to review existing literature related to BCT applications in industry and the frameworks used to develop use cases. The research method used was a structured literature review (SLR).


The SLR method studies peer-reviewed literature to develop insights and critical reflections and to propose future research recommendations [25, 34, 41]. An SLR uses well-defined research questions to guide the study. The author developed two main research questions:

RQ1: Is there a systematic approach to develop BC industry application use cases?
RQ2: Are there BC industry applications?

To answer these research questions the author performed a literature review [10, 12, 23]. The literature review for this research study applied a thematic review of the literature; the themes selected were blockchain, use case development, emerging technology, business processes, cybersecurity, and Bitcoin. The information was found by searching peer-reviewed articles, scholarly texts, and periodicals using online platforms. Since blockchain technology is a new field of study, the study also included information available in white papers and practitioner-oriented sources, such as related forums and blogs. The platforms searched were ACM, IEEE Xplore Digital Library, Google Scholar, ResearchGate, ProQuest, and EBSCOhost. Where possible, limiters were set to retrieve only scholarly material. The main objective was to locate books and journal articles that were available for the research study. The author applied inclusion and exclusion criteria to decide which articles would be part of the paper [10, 12, 23, 41].

The Inclusion Criteria (IC) were:
IC1: the paper was published from 2014 onwards;
IC2: the abstract of the paper or project described a BCT application in industry;
IC3: the paper included at least a basic description of the system.

The Exclusion Criteria (EC) were:
EC1: the paper was too short;
EC2: the paper/article was not accessible or available.

The limitations of this literature review include the subjectivity of deciding whether a basic description of the system was acceptable, and the subjectivity of the paper length required for acceptance. The author believes that the research brought together a critical mass of work describing different industry applications outside currency and several framework applications for developing BCT use cases. The author also reviewed cybersecurity literature to include a discussion of BCT security issues and challenges. Once the literature was accepted, the next step was to complete the literature review; the literature was summarized in abstracts. Table 1 categorizes the literature review by research question.


Table 1. Overview of blockchain use case frameworks and industry case studies by research question

RQ1: Is there a systematic approach to develop BC industry application use cases?
– Value-driven Business Process Management framework: uses 7 categories to analyze new technology: transparency (core value), networking, quality, agility, integration, efficiency, and compliance. (Milani, Garcia & Dumas, 2016)
– General framework to assess technology use cases: general rules to assess use cases: (1) an agreed and shared database, (2) multiple parties need to edit the database, (3) transactions by different writers interact or even depend on each other, (4) the parties who can edit do not trust each other, (5) no trusted third party is used as an intermediary, and (6) the blockchain database can be connected to the real world where needed. (Seppala, 2016)
– Disruption Evaluation Framework: evaluates the use case from several perspectives: business model, technical, legal, regulatory, financial, and operational; the DEF includes a checklist of questions to evaluate use cases. (Batlin, 2016)
– General Use Case Evaluation: consists of three categories: (1) intermediary, exploring the existence and role of intermediaries in the use case, (2) data, assessing the use of data, and (3) process, the potential for automation. (Klein, Prinz & Grather, 2018)
– Action design research: six steps applied in a waterfall: (1) understanding blockchain technology, its characteristics, and applications; (2) getting creative and unbiased in using blockchain technology to approach potential use cases; (3) presenting and discussing existing use cases; (4) getting creative-informed on how the organization can use blockchain; (5) structuring ideas by clustering, prioritizing, and assessing specific use cases; (6) a prototype phase based on the specific use cases from step 5. (Fridgen, Lockl, Radszuwill, Rieger, Schweizer & Urbach, 2018)
– Flowchart: the US DHS developed a flowchart for industry to follow when assessing whether BCT is feasible for implementation. (DHS US in Yaga, Mell, Roby & Scarfone, 2019)

RQ2: Are there BC industry applications?
– BC Playbook: the American Council for Technology and Industry Advisory Council (ACT-IAC) developed a manual including a set of questions to determine if BCT would be a solution. (ACT-IAC in Yaga, Mell, Roby & Scarfone, 2019)
– Logistics use case: categorizing use cases based on attributes of innovation; the implications lie on two dimensions, novelty and coordination effort. Industry: Logistics. (Dobrovnik, Herold, Fürst & Kummer, 2018)
– Citizens Broadband Radio Service: an entrepreneurial opportunity to serve customers better and differently, framed by enablers, limiting factors and challenges caused by the business context. Industry: Mobile Network Operators. (Yrjola, 2016)
– Supply Chain: a blockchain-ready manufacturing supply chain using a distributed ledger. Industry: Manufacture. (Abeyratne & Monfared, 2016)
– The Worldwide Ledger: the blockchain can hold any legal document, from deeds and marriage licenses to educational degrees and birth certificates. Industry: Ledgers. (Tapscott & Tapscott, 2016)
– Blockchain security in cloud computing: provides security through the authentication of peers that share virtual cash, encryption, and the generation of hash values. Industry: Financial Services. (Park & Park, 2017)
– Distributed Ledger technologies/BC: challenges, opportunities, and prospects for standards. Industry: Ledgers. (Deshpande, Steward, Lepetit & Gunasheka, 2017)
– IBM Blockchain and Microsoft Azure Blockchain as a service. Industry: Cloud Computing. (IBM, 2016)
– Loyyal: loyalty and rewards platform in financial services built with blockchain and smart contract technology. Industry: Financial Services. (Loyyal, 2014 & Morabito, 2017)
– Everledger: establishing an electronic ID and digital passports and preventing illegal activities by using blockchain and smart contract technology; introducing transparency into the diamond industry. Industry: Security. (Everledger, 2015 & Morabito, 2017)
– GemHealth: enabling collaboration by allowing sharing and transfer of healthcare data. Industry: Health Care. (GemHealth, 2014 & Morabito, 2017)
– Wave: global trade transactions using blockchain technology. Industry: Financial Services. (Wave, 2014 & Morabito, 2017)
– AlignCommerce: B2B payment process combining blockchain technology with traditional banking transactions and treasury operations. Industry: Financial Services. (AlignCommerce, 2014 & Morabito, 2017)
– Civic: identity management service to protect against identity theft. Industry: Security. (Civic, 2015 & Morabito, 2017)
– ShoCard: digital identity verification card platform using a mobile app on top of the public blockchain data layer. Industry: Security. (ShoCard, 2015 & Morabito, 2017)
– Factom: verifying the integrity of the data used by the business and of data generated from the IoT by using blockchain's distributed ledger and its data security features. Industry: Security. (Factom, 2014 & Morabito, 2017)
– Focusing on industrial practices and use cases; assessing BC applicability in the supply chain. The authors reviewed several studies and projects of BCT application in supply chains. Industry: Supply Chain. (Gonczol, Katsikouli, Herskind & Dragoni, 2019)
– Researched how BCT is used to improve cybersecurity; the authors surveyed several industry projects. Industry: Several. (Taylor, Dargahi, Dehghantanha & Parizi, 2020)
– Presented a comprehensive survey of different BCT applications and use cases, implementing the technology in a secure and trustworthy manner. Industry: Several. (Rawat, Chaudhary & Doku, 2020)
– Presented a summary of BCT use case software projects. Industry: Several. (Zīle & Strazdiņa, 2018)
– AWS, Azure, IBM Cloud, SAP Cloud, HP Helion, Oracle Cloud: analytical performance comparison of Blockchain-as-a-Service (BaaS) platforms. Industry: Cloud Computing. (Onik, Mehedi Hassan & Miraz, 2019)
– Completed research describing several BCT application case studies. Industry: Several. (Treiblmaier, 2019)

4 Brief Overview of Blockchain

Blockchain was designed by Satoshi Nakamoto to be used by Bitcoin. Nakamoto's work was based on Haber and Stornetta's (1991) and Bayer, Haber and Stornetta's (1992) research on cryptographically secure chains of blocks. In 2008, Satoshi Nakamoto conceptualized the first block chain, and it was implemented as the cryptocurrency Bitcoin, which acts as a public ledger for all transactions in the network. Bitcoin solved the double-spending problem without requiring a trusted authority. "Block chain" became "blockchain" in 2016, referring to the new application of the distributed blockchain database [29]. Blockchain is a "distributed ledger" in which all transactions and the parties involved are documented; each block can be considered a page in the ledger. The cryptography makes the data resistant to modification, so the result is an open distributed ledger recording transactions between two parties, in which transactions are efficient, verifiable, and permanent [29, 42]. The chain grows constantly as new blocks are added to the current blockchain. Each transaction is broadcast in the network using cryptographic communication, the transactions are verified using proof-of-work, and a new block is created. Once the new block is added to the blockchain, a new copy of the block is broadcast to the entire network.

Fig. 1. A typical example of BCT distributed ledger technology- without trusted party, Rawat et al., 2020 License CC BY 4.0


The BCT runs in a peer-to-peer network in which there is no trusted third-party verification. Figure 1 shows a simple example of distributed ledger technology [6, 42]. It is important to understand how blockchain works. Each block contains a cryptographic hash of the previous block, a timestamp, and transaction data. A peer-to-peer network manages the blockchain, using a communication protocol for inter-node communication and for validating new blocks. It is also critical to emphasize that, once recorded, data cannot be changed retroactively without changing all subsequent chained blocks [29, 33] (see Fig. 2).

Fig. 2. Typical block with header and transactions in blockchain technology. License CC BY 4.0
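To make this tamper-evidence property concrete, here is a minimal Python sketch of a hash-linked chain of blocks; it is an illustration of the idea only, not Bitcoin's actual data structures.

```python
import hashlib, json, time

def block_hash(block: dict) -> str:
    # Hash of the block contents (previous hash, timestamp, transaction data).
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def add_block(chain: list, transactions: list) -> None:
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"prev_hash": prev, "timestamp": time.time(), "tx": transactions})

def chain_is_valid(chain: list) -> bool:
    # Every block must reference the hash of the block before it.
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

chain: list = []
add_block(chain, ["genesis"])
add_block(chain, ["A pays B 1 coin"])
add_block(chain, ["B pays C 1 coin"])
assert chain_is_valid(chain)

chain[1]["tx"] = ["A pays B 100 coins"]   # retroactive change to an old block...
assert not chain_is_valid(chain)          # ...breaks the link to every later block
```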

There are three distinct types of blockchains: (1) public blockchains (Bitcoin and Ethereum), which have no access restrictions; economic incentives and Proof-of-Work algorithms ensure validation, and everybody worldwide can participate; (2) private blockchains, which are permissioned, with participation by invitation from the network administrators; participants and validators are restricted, and an organization controls the write rights; and (3) consortium blockchains, which are semi-decentralized and permissioned by consortium administrators [46]. As described, Bitcoin was the first implementation of blockchain. Bitcoin is a cryptocurrency and a global payment system: a decentralized digital currency that, unlike other currencies, needs no central bank. The network is peer-to-peer, network nodes verify transactions using cryptography, and transactions are recorded in a public distributed ledger (the blockchain). New bitcoins are created by mining, and these coins can be exchanged for other currencies, services, and products [5].


This peer-to-peer payment network uses a cryptographic protocol: users transmit bitcoins by broadcasting digitally signed messages, and Bitcoin uses cryptocurrency wallet software. In summary, the blockchain is the public database in which transactions are recorded, and mining is a record-keeping service that uses computer processing power. Blockchain mining is the complex process of creating new bitcoins while preventing the same coin from being spent twice. In the next section, the paper discusses Bitcoin mining algorithms [46]. The purpose of Bitcoin mining is to ensure that all participants have a consistent view of the Bitcoin data, the core of the cryptocurrency; its second purpose is to create new bitcoins. The mining function works in a peer-to-peer system without a central database and creates a log of all transactions that is distributed across the network. A particularly important function is to avoid the inconsistency of spending the same bitcoins twice, which is achieved with Proof-of-Work. Each mined block references the previous block, creating an unbroken chain back to the first Bitcoin block [24, 37]. The authors included in their research a basic explanation of how Bitcoin mining works: it uses the double SHA-256 hash function. This hash function takes a chunk of data as input, collecting new transactions into a block, and shrinks it down into a smaller 256-bit block hash value. If the block hash has enough leading zeros, the block is successfully mined and sent to the Bitcoin network. A key point is that Bitcoin's Proof-of-Work is implemented with cryptographic hashing [24, 37]. Figure 3 explains the structure of block hashing [24, 37]. The block header describes the block and the transactions that go into the block: (1) the coinbase transaction and (2) the Bitcoin transactions. The block structure contains the hash of the previous block in the blockchain, ensuring an unbroken sequence, and the Merkle root, a special hash of all transactions in the block that is key to Bitcoin security because it ensures transactions cannot be changed once they are part of the block. It also contains the timestamp of the block and the mining difficulty value in bits. The hardest part is finding a nonce that works: an arbitrary value incremented on each hash attempt to produce a new hash value.

Fig. 3. The structure of blockchain. Liang, 2020. License CC BY 4.0
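The double SHA-256 proof-of-work idea can be sketched in a few lines of Python; the header fields and the difficulty used here are toy assumptions, far easier than Bitcoin's real target.

```python
import hashlib, struct

def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def mine(header_without_nonce: bytes, zero_bytes: int = 2):
    # Increment the nonce until the double SHA-256 of the header has enough
    # leading zero bytes (a toy difficulty; Bitcoin's real target is far lower).
    nonce = 0
    while True:
        digest = double_sha256(header_without_nonce + struct.pack("<I", nonce))
        if digest.startswith(b"\x00" * zero_bytes):
            return nonce, digest
        nonce += 1

# The header prefix stands in for the version, previous block hash, Merkle root,
# timestamp and difficulty bits of a real block header.
nonce, digest = mine(b"prev-hash|merkle-root|timestamp|bits")
print(nonce, digest.hex())
```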

5 Blockchain Innovation and Disruption

Blockchain 1.0 started with the transfer of cryptocurrencies such as Bitcoin; today there are more than seven hundred cryptocurrencies with approximately $26 billion in capital.


Blockchain 2.0 introduced smart contracts, and Ethereum is a typical Blockchain 2.0 system. Smart contracts, enabled by blockchain technology, “are self-executing contracts without extra enforcement. The contractual clauses between nodes are converted into computer programs in a form such as ‘If-Then’ statements. The executable computer programs are then securely stored in the blockchain. When the predefined conditions in smart contract are satisfied, the clauses in smart contracts will be executed autonomously, and the execution will be recorded as an immutable transaction in the blockchain” [39] (p. 128). Technologies based on blockchain have been implemented, such as Ethereum, an open software platform enabling developers to build and deploy decentralized applications, and Hyperledger, an open-source collaborative effort created to advance cross-industry technologies. These platforms provide Turing-complete programming languages that allow the implementation and execution of programs on the blockchain. A programming language is considered Turing complete if it can simulate a single-taped Turing machine; programming languages such as C, C++, C#, Java, Lua, and Python are Turing complete. These programs are known as smart contracts and allow for a wide range of blockchain-based applications. The newly developed platforms have opened the implementation of blockchain-based applications across industries and are highly disruptive [3, 16].
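As a toy illustration of such an “If-Then” clause (written in Python rather than a real smart-contract language, with entirely hypothetical condition names):

```python
# Toy "If-Then" clause: release escrowed funds to the seller once delivery is
# confirmed before the deadline, otherwise refund the buyer after the deadline.
def execute_clause(delivery_confirmed: bool, current_block: int,
                   deadline_block: int, escrow_amount: int) -> dict:
    if delivery_confirmed and current_block <= deadline_block:
        return {"pay_to": "seller", "amount": escrow_amount}
    if current_block > deadline_block:
        return {"pay_to": "buyer", "amount": escrow_amount}
    return {"pay_to": "escrow", "amount": escrow_amount}   # conditions not met yet

print(execute_clause(True, 120, 150, 10))   # -> funds released to the seller
```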

6 Use Case Frameworks

Although blockchain technology is being used across industries, there is uncertainty about the impact of blockchain on individual organizations, and there is no structured approach for developing blockchain use cases. Traceability, anti-fraud, trust management, transparency, and IoT need to be integrated in the BCT use case as a business solution. The literature research on blockchain use cases surfaced different frameworks used across industry. A value-driven Business Process Management framework classifies the value of the technology to business in seven categories: transparency (core value), networking, quality, agility, integration, efficiency, and compliance [26]. Industry can use the six-rule assessment [36] based on Greenspan (2015) to assess use cases: “(1) You need a shared database on which all parties agree upon, (2) Multiple parties need to edit the database, (3) Transactions by different writers interact or even depend on each other, (4) The parties who can edit do not trust each other, (5) You can’t/don’t want to use a trusted third party as an intermediator and (6) You have a way to connect the blockchain database to the real world where needed” (p. 24). The Disruption Evaluation Framework (DEF) [2] evaluates the use case from several perspectives: business model, technical, legal, regulatory, financial, and operational; the DEF includes a checklist of questions to evaluate use cases. Klein, Prinz, and Grather’s [23] framework consists of three categories: (1) intermediary, exploring the existence and role of intermediaries in the use case, with the blockchain functioning as an independent and incorruptible intermediary; (2) data, assessing the use of data; and (3) process, assessing the potential for automation in the use case.


Action design research can also be applied to blockchain use case development [16]. The authors developed six steps applied in a waterfall: (1) understanding blockchain technology, its characteristics, and its applications through traditional methods such as training, workshops, and prototype presentations; (2) getting creative and unbiased in using blockchain technology to approach potential use cases; (3) glancing at the market by presenting and discussing existing use cases; (4) getting creative-informed on how the organization can use blockchain, leading to more specific use cases; (5) structuring ideas by clustering, prioritizing, and assessing specific use cases; and (6) a prototype phase based on the specific use cases from step 5.

Public and private organizations have also designed frameworks to assess the feasibility of BCT implementations. The United States Department of Homeland Security developed a flowchart describing a sequence of requirements to meet when committing to implement BCT, and the American Council for Technology and Industry Advisory Council (ACT-IAC) developed a BCT playbook for industry to follow when assessing BCT feasibility [44]. It is important to use at least a three-criterion assessment [47] to analyze the feasibility of BCT for solving business problems: (1) the BC content needs to be available and dependable in the eyes of the user, (2) applications using BCT need to be secure, and (3) applications using BCT need to be trustworthy. Figure 4 summarizes the BCT frameworks applied to industry:

Fig. 4. Summary of BCT framework industry application

7 Industry’s Use Cases

Blockchain is going to disrupt several different industries [3, 4]. Researchers have described several use case applications and proposed uses in the areas of finance, risk management, social applications, legal applications, the Internet of Things (IoT), cloud computing, and broadcasting services [1, 7, 8, 13, 14, 16, 18–20, 24, 32, 38, 45, 46]. The blockchain revolution can be compared to the spreadsheet and PC revolution of many years ago [42]. Blockchain “…has gained widespread traction and it is attracting investments like no other emerging technology” [26] (p. 1).


After all the buzz around blockchain, it is critical to develop use cases to evaluate this important technology and its impact on business. Blockchain's disruptive nature brings to the forefront the need for a framework that can be replicated to assess use cases and their impact on business: “There is a lack of systematic approach to understand blockchain technology, its potential and the development of viable use cases” [16]. Researchers such as [1, 9, 14, 24, 32, 38, 45, 46] have studied several applications or proposed blockchain use cases, challenges, and solutions. More digital currencies have come into use after Bitcoin; today there are over seven hundred different digital currencies that use blockchain technology. Financial exchanges can benefit from BCT by developing decentralized systems (example use cases are Coinbase, ItBit and Kraken), and financial markets could trade stocks on a platform that is not controlled by any single governing body, as opposed to current systems. Risk management combined with blockchain could be used to analyze investment risk in the Luxembourgish scenario. Blockchain-based smart contracts enable decentralized autonomous organizations (DAOs), making business collaborations possible using BCT, and digital identities can improve security, for example by issuing “private keys” to voters to authenticate the voting process. Blockchain-based smart contracts reduce the amount of human involvement required to create, execute, and enforce a contract, lower costs, and assure the execution and enforcement processes. BCT is used for asset tracking to trace physical assets, allowing a record of ownership to be maintained for each asset (example use cases are Everledger and the Provenance Company). BCT is also used to register and publicize the physical status of land and related rights; in energy saving, by using a digital currency to reward solar energy producers (for example, SolarCoin); in education, by applying BC to the online educational market; and in IoT, by using BCT to integrate things and provide users with diverse services. Blockchain can also help improve privacy in IoT applications (for example, IBM's proof of concept for Autonomous Decentralized Peer-to-Peer Telemetry (ADEPT)). BCT can be combined with cloud computing to enhance the security of cloud environments, and with the Citizens Broadband Radio Service to reduce transaction costs through automation.

There are benefits to adopting BCT on current cloud platforms [31]. The author completed an analytical comparison of Blockchain-as-a-Service (BaaS) platforms. The benefits listed are: (1) adoption is seamless and cost effective; (2) tasks such as node verification, attachment, and deletion, which would otherwise have to be handled during adoption, are taken care of by BaaS; and (3) BaaS BCT is based on current cloud infrastructure, making PaaS, IaaS, SaaS, and other services native to BaaS. The current BaaS platforms on the market are Microsoft Azure Blockchain Workbench (Ether.Camp and BlockApps), providing an Ethereum BC application development environment. Amazon developed a blockchain service based on Hyperledger in two different forms: Amazon Quantum Ledger Database (QLDB) and Amazon Managed Blockchain; additionally, Amazon's AWS provides developers with a wide selection of blockchain frameworks. IBM developed its BaaS using Hyperledger Fabric on the IBM Cloud. Hewlett Packard Enterprise (HPE) introduced its first BaaS, named ‘Mission Critical Distributed Ledger Technology’ (MCDLT). Oracle developed the Oracle Blockchain Cloud Service (OBCS) besides its already established Platform as a Service (PaaS) and Software as a Service (SaaS). SAP introduced both the SAP Cloud Platform Blockchain Service and the SAP HANA Blockchain Service.


SAP HANA connects to the most popular enterprise blockchain platforms [31]. There is an increase in BCT applications in industry, specifically in the supply chain. One study surveyed BCT use cases for the supply chain [17]; the authors reviewed thirty-nine (39) use cases spanning academic solutions and industry projects. In the food sector there is Walmart; in the medicine sector, projects such as PharmaTrace, MediLedger, and Good Distribution; in the shipping industry, TradeLens, a collaboration between IBM and Maersk; and Openport and ShipChain use BCT to implement traceability of the product life cycle. This is critical for preventing food-borne illness outbreaks, preventing exposure to unsafe conditions, and providing transparency along shipping routes. Projects using BCT such as Insurwave, CargoX, Cargo Coin, Skuchain and others are implementing the technology to achieve an “accessible, trusted simple source of truth across different stakeholders” (p. 11862). BCT also improves cybersecurity: projects such as Hyperledger Fabric, a private blockchain, implement BCT to track data management and avoid malicious access [39]. As discussed, BCT has disrupted industry. There are many different BCT applications and use cases implementing the technology in a secure and trustworthy manner [33]; the authors completed a comprehensive survey (see Fig. 5).

Fig. 5. Different blockchain applications and use cases. Rawat et al. 2020 License CC BY 4.0

Another researcher presented a comprehensive survey of use cases using BCT, providing an overview and general idea of software solutions using distributed ledger technology [39]:

Data Management: developing BCT use cases in network infrastructure (Eris, Mastercoin, Chromaway, NXT); content and resource distribution (Swarm); cloud storage (Storj, Maidsafe, PeerNova); identity data management (UniquID, SolidX, OneName, uPort Microsoft, IBM, Shocard); contract management (Ethereum, Mirror); inter-organizational data management (Multichain); system metadata storage (Blockstack); data replication and protection (Securechain); and digital content publishing and selling (Ascribe).

Data Verification: developing BCT use cases in photo and video proofing (Uproov); document notarization (BitCourt, Blocksign, Enigio Time, Stampery); work history verification (APPII); academic verification (Sony Global Education); identity verification and key management (Microsoft, Authentichain, Everpass); product quality verification (Everledger, Verisart, Bitshares, Bitreserve); and proof of origin (Provenance, Tierion, artPlus, Stampery).


Financial: trade finance (Barclays, Santander, BNP Paribas); currency exchange and remittance (Kraken, Bitstamp, Coinbase, BitPesa, Bitso, Coincheck, RobinHood, Huobi); P2P payments (Codius, BitBond, BitnPlay, BTCjam); crowdfunding (Waves, Starbase); insurance (Insurechain); stock share and bond issuing (Chain); central bank money issuing (Sweden); supply chain management (Eaterra, Profeth); and value transfer and lending (Ripple, Monero, Bitcoin, Litecoin, Zcash, etc.).

Other Applications: prediction recording (Augur, Gnosis); social voting systems (Thankscoin); ride sharing (Arcade City, La'Zooz); domain name registration (Namecoin); health care record storage (DNA.bits, Medicare, BitHealth, Medvault); software license validation (IBM); content or product timestamping (Po.et, Nexus Group); lotteries (Lastis, EtherPot); property rights registration (Georgia Land Register, Ascribe, ChromaWay, BitLand); social rating, creation and monitoring (SOMA); voting in elections (European Parliament, BallotChain); marriage registration (Boderless.tech); court proceedings (PrecedentCoin); donations (BitGive); computational power outsourcing (SETI@home, Folding@home); electronic locks (Slock.it); electric energy selling (TransActive Grid); product tracing (Blockverify); gaming (PlayCoin, Deckbound); and reviews and endorsements (TRST.im, Asimov, The World Table).

Case studies have also described BCT applications in industry [40]. The purpose of that research was to develop an approach to “systematically transfer industry experience into research agendas which benefit both theory development and testing as well as design science research. In this paper, I offer guidelines and suggestions on how to design and structure Blockchain case studies to create value for academia and the industry.” (p. 1). The case studies researched are: the impact of theory-based factors on the implementation of various Blockchain technologies, using cases in the energy sector; FabRec, a prototype for a peer-to-peer network of manufacturing nodes; the potential impact of the Blockchain on the pricing model and organizational design of Ryanair as well as the behavior of pilots; the development of a wine supply traceability system; the “Blockchain for Education” platform, which issues, validates and shares certificates; smart contracts in the real estate industry; a seller/buyer reputation-based system in a Blockchain-enabled emission trading application; a Blockchain cloud manufacturing system as a peer-to-peer distributed network platform; grain quality assurance tracking based on a Blockchain business network in Brazil; Ascribe.io, a solution to identify and authenticate ownership of digital property; the Brooklyn Microgrid, a Blockchain-based microgrid energy market without the need for central intermediaries; the disruptive potential of the Blockchain in the record industry; the architecture of Lykke Exchange, a marketplace for the exchange of financial assets; Backfeed, a three-layered system that allows for the production, recording, and actualization of value; the application of the Blockchain to facilitate machine-to-machine (M2M) interactions and establish an M2M electricity market in the chemical industry; the application of the Blockchain to e-residency in Estonia; an Austrian case study in which the Blockchain was used to play the game Go on the façade of a public building; the e-commerce platform of Hainan Airlines; and FHIRChain, applying Blockchain to securely share clinical data [40].


8 Security Issues and Challenges

Research has acknowledged that BCT improves cybersecurity [39]. Internet of Things (IoT) device authentication and end-user device authentication can use BCT to secure the deployment of firmware. It could also help with threat detection and malware prevention. Another improvement in cybersecurity would be to secure data storage and sharing in the cloud. Additionally, BCT improves network security by storing authentication data in a decentralized and robust manner. "Blockchain as a popular ledger technology has the potential to be leveraged in different areas of the cyber space. Blockchain attempts to reduce transaction risks and financial fraud, owing to its characteristics such as decentralization, verifiability, and immutability for ensuring the authenticity, reliability, and integrity of data" [35] (p. 154). This paper discussed the following BCT security issues: (1) scalability: as the number of transactions increases, the blockchain becomes heavy; transaction validation is based on the data storing process, the Bitcoin blockchain can only process about seven transactions per second, and small transactions are delayed because block capacity is small; (2) privacy leakage: to keep the blockchain safe, transactions are made with generated addresses rather than real identities, yet the blockchain cannot guarantee transactional privacy because the values of transactions and balances are publicly visible; (3) selfish mining: the blockchain is vulnerable to colluding selfish miners; (4) traceability of the product life needs to be maintained; (5) BCT is vulnerable if there is a lack of transparency and counterfeit measures, and the major BCT 2.0 vulnerability is criminal contracts; and (6) a lack of compliance and regulation makes the technology vulnerable to cybersecurity attacks [17, 23, 39, 43, 46]. The common security risks between BCT 1.0 and 2.0 are [24, 39, 46]: (1) Fifty-one percent vulnerability: blockchains rely on a distributed consensus mechanism to establish mutual trust. Li et al. (2020) further described that "…if a single miner's hashing power accounts for more than 50% of the total hashing power for the entire blockchain, then 51% attacks may be launched" (p. 7). This 51% vulnerability leaves the blockchain open to the following attacks: (a) reversing transactions and initiating double-spending attacks (the same coins are spent multiple times), (b) excluding and modifying the ordering of transactions, (c) hampering the normal mining operations of other miners, and (d) impeding the confirmation of normal transactions. (2) Private key security (the identity and security credential for transactions) is vulnerable to hacking attacks when the signature process does not produce enough randomness. Once the private key is lost, it cannot be recovered, and it would also be difficult to track the attacker because the blockchain does not use third-party verification. (3) Criminal activity: Bitcoin users can have multiple addresses and identities not related to a real person; cybercrimes that can be committed in a blockchain system include ransomware, underground markets, and money laundering. (4) Double spending: although the blockchain consensus mechanism validates each transaction, consumers may still be able to spend the same cryptocurrency several times.
(5) Consistency: the property that all nodes have the same ledger at the same time. Some argue that BCT provides only eventual consistency, which is considered weak; eventual consistency is a tradeoff between availability and consistency. Blockchain applications need to choose the privacy and security technologies that fit their requirements, and usually a combination of technologies is the answer. "Enigma combines cutting edge cryptographic technique SMPC and hardware privacy technology TEE with blockchains to provide computation over encrypted data at scale" [46] (p. 51).
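To make the 51% vulnerability discussed in this section more concrete, the following toy simulation (an illustration added here, not drawn from any of the surveyed papers; the confirmation depth and trial count are arbitrary assumptions) estimates how often an attacker who secretly mines a competing branch manages to overtake the public chain for different shares of total hash power. Once the secret branch is longer, previously confirmed transactions can be reversed, which is exactly the double-spending scenario described above.

```python
# Toy Monte Carlo sketch of a block race between honest miners and an attacker.
# Each step, the next block is found by the attacker with probability equal to
# its share of hash power; the attacker starts `confirmations` blocks behind.
import random

def attacker_catches_up(attacker_share: float, confirmations: int = 6,
                        max_steps: int = 10_000) -> bool:
    deficit = confirmations
    for _ in range(max_steps):
        if random.random() < attacker_share:
            deficit -= 1          # attacker extends the secret branch
        else:
            deficit += 1          # honest miners extend the public chain
        if deficit <= 0:
            return True           # secret branch overtakes: history can be rewritten
    return False

if __name__ == "__main__":
    for share in (0.10, 0.30, 0.45, 0.51):
        trials = 10_000
        wins = sum(attacker_catches_up(share) for _ in range(trials))
        print(f"hash power {share:.0%}: reorg succeeded in {wins / trials:.1%} of trials")
```

With well under half of the hash power the success rate stays small and drops quickly with the number of confirmations, while around and above 50% the attacker eventually catches up, which is why the 51% threshold is treated as the critical boundary.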

9 Conclusions and Recommendations

This paper researched structured frameworks used to evaluate blockchain use cases. It presented a structured review of the literature on the frameworks used, use cases in different industries, and an overview of blockchain security issues and challenges. It explained Blockchain Technology (BCT) and its concepts in simple terms and described blockchain usability outside of currency. The author concluded that there is neither a systematic approach to understanding blockchain technology nor a structured framework for developing applicable use cases. Researchers and practitioners agree on several security issues and challenges when using blockchain technology: scalability, privacy leakage, and selfish mining. Because blockchain technology is new, the literature review completed in this paper consisted of scholarly, practitioner, and industry papers as well as web blogs; some of the literature is based on non-peer-reviewed articles. However, this paper contributes to the ongoing discussion about blockchain applications and the need to develop a framework to assess usable use cases. The literature review was not exhaustive but could help future researchers to better examine blockchain technology investigations for different applications. It is critical that practitioners can identify potential applications and develop use case scenarios to assess the application of blockchain technology as a business solution, and to identify suitable business opportunities and corresponding starting points. Blockchain technology will be used to substantially disrupt the digital economy. The fundamental and initial concern of any Blockchain application is to ensure and retain trust [27, 28]. Because of rapid changes in technology, and based on this literature review, use case frameworks need to be adjusted depending on the actual application. There are valid use cases for each application that need to be determined carefully. It will be critical for businesses to be able to assess technologies and weigh the benefits and drawbacks of utilizing them. Blockchain technology is still in its infancy, and businesses should handle it as they would any other technological option at their disposal: employ it only when necessary [44]. An important consideration for future research would be to study how BCT adoption differs across industries [2]. The use cases presented in this paper are general but relevant to sectors such as insurance, healthcare, distribution, manufacturing, and IT. Future work is recommended to include additional case studies with practitioners in different sectors in which private blockchains could be considered, specifically for any situation in which two or more organizations need a shared view of reality. It would be helpful to develop workshops with these practitioners, obtaining results that can be used to further improve the
use case framework and to test its applicability for different industries. Additionally, it is recommended to develop a ranking system to calculate an overall score to help in the decision-making process of refining the use case framework. Lastly, the frameworks analyzed in this paper were based on today's blockchain technology and should be updated as the technology advances. Future research can focus on how blockchain can disrupt business and help overcome challenges in different organizational areas (finance, supply chain, AI, etc.) by creating new decentralized organizations, applications, and services. Additionally, researchers should focus on investigating the ways in which blockchain adoption can be promoted [21, 29]. With respect to blockchain security, the author recommends including a study on efficiency to assess security and the potential of the massive amount of information that could be transmitted on an application suitable for blockchain technology. As stated, blockchain as a technology is still very new. It is critical to continue reviewing the use case applications that businesses will put forward in the future, so as to have more sources of information that could help to develop a more systematic approach to building a framework for examining blockchain technology and developing valid use cases. It is equally important to assess the timing of developing such a systematic approach to use case frameworks for implementing blockchain technology: a late development might miss opportunities that the new technology could bring at an earlier stage. One area that this research paper did not address was the combined use of artificial intelligence (AI) and BCT. AI and blockchain have recently emerged as two of the most disruptive technologies. Blockchain technology can automate payment transactions and give decentralized, secure, and trusted access to a shared ledger; AI, on the other hand, provides humanlike intelligence and decision-making capabilities to computers [9, 22, 30]. AI and BCT can help and assist each other, and many organizations are working to promote the IoT and blockchain connection [15]. The literature indicates that blockchain adoption for AI applications is still in its early stages, with numerous research challenges to be addressed and overcome in areas such as privacy, smart contract security, trusted oracles, scalability, consensus protocols, standardization, interoperability, quantum computing resiliency, and governance [9, 22, 30]. In summary, blockchain technology is significantly disrupting a wide spectrum of business processes. As this technology is used more and there is more information to include in further evaluations, experts might consider exploring the opportunities that this technology can bring to business processes rather than replacing existing technologies with a new one. There are significant cultural and technical challenges brought forward with the implementation of blockchain technology to solve business problems. The literature review provided a good overview of the technology, use case frameworks and the security issues, but it was not exhaustive and did not cover all areas of business. The recommendation for future research is to conduct case studies in different industries and propose a structured framework which could be used as a guide for organizations to determine whether blockchain would be a suitable technology solution for a given case.


Much research is still required to develop a systematic framework for building use cases at an industrial scale, with close collaboration between academia, private companies and the public sector (federal and state government). It can be concluded that the trend of investing in blockchain will keep growing to address increasing market demands.

References

1. Abeyratne, S.A., Monfared, R.P.: Blockchain ready manufacturing supply chain using distributed ledger. Int. J. Res. Eng. Technol. 05(09), 1–10 (2016)
2. Akter, S., Michael, K., Uddin, M.R., McCarthy, G., Rahman, M.: Transforming business using digital innovations: the application of AI, blockchain, cloud and data analytics. Ann. Oper. Res. 308(1–2), 7–39 (2020). https://doi.org/10.1007/s10479-020-03620-w
3. Batlin, A.: Crypto 2.0 Musings–Blockchain disruption evaluation [WWW Document]. LinkedIn Pulse. https://www.linkedin.com/pulse/crypto-20-musings-blockchain-disruption-evaluation-alex-batlin/ (2016). Accessed 8 Nov 2018
4. Bayer, D., Haber, S., Stornetta, W.: Improving the efficiency and reliability of digital timestamping. Sequences 2, 329–334 (1992). https://doi.org/10.1007/978-1-4613-9323-8_24. Retrieved 17 Apr 2018
5. Bitcoin: In Wikipedia. https://en.wikipedia.org/wiki/Bitcoin (n.d.). Accessed 17 Apr 2018
6. Blockchain: In Wikipedia. https://en.wikipedia.org/wiki/Blockchain (n.d.). Accessed 17 Apr 2018
7. Buntinx, J.: Future Use Cases for Blockchain Technology: Copyright Registration. Bitcoin.com (2015). Accessed 01 Nov 2018
8. Catalini, C., Gans, J.: Some simple economics of the blockchain. Electr. J. (2016). https://doi.org/10.2139/ssrn.2874598
9. Chattu, V.K.: A review of artificial intelligence, big data, and blockchain technology applications in medicine and global health. Big Data Cognit. Comput. 5(3), 41 (2021)
10. Creswell, J.W.: Educational Research: Planning, Conducting, and Evaluating Quantitative and Qualitative Research. Prentice Hall, Upper Saddle River (2015)
11. Crosby, M., Nachiappan, P., Verma, S., Kalyanaraman, V.: Blockchain technology: beyond Bitcoin. Appl. Innov. Rev. 2, 7–19 (2016)
12. Delattre, M., Ocler, R., Moulette, P., Rymeyko, K.: Singularity of qualitative research: from collecting information to producing results. Tamara J. 7(7.3), 33–50 (2009)
13. Deshpande, A., Steward, K., Lepetit, L., Gunashekar, S.: Distributed Ledger Technologies/Blockchain: Challenges, Opportunities, and Prospects for Standards. British Standards Institution, Overview Report (2017)
14. Dobrovnik, M., Herold, D., Fürst, E., Kummer, S.: Blockchain for and in logistics: what to adopt and where to start. Logistics 2(18) (2018). https://doi.org/10.3390/logistics2030018
15. Durneva, P., Cousins, K., Chen, M.: The current state of research, challenges, and future research directions of blockchain technology in patient care: systematic review. J. Med. Internet Res. 22(7), e18619 (2020). https://doi.org/10.2196/18619
16. Fridgen, G., Lockl, J., Radszuwill, S., Rieger, A., Schweizer, A., Urbach, N.: A solution in search of a problem: a method for the development of blockchain use. In: Proceedings of the 24th Americas Conference on Information Systems, New Orleans (2018)
17. Gonczol, P., Katsikouli, P., Herskind, L., Dragoni, N.: Blockchain implementations and use cases for supply chains–a survey. IEEE Access 8, 11856–11871 (2020). https://doi.org/10.1109/ACCESS.2020.2964880
18. Greenspan, G.: Four genuine blockchain use cases/MultiChain [WWW Document]. http://www.multichain.com/blog/2016/05/four-genuine-blockchain-use-cases/ (2016). Accessed 03 Nov 2018
19. Haber, S., Stornetta, W.: How to timestamp a digital document. J. Cryptol. 3(2), 99–111 (1991). https://doi.org/10.1007/bf00196791. Accessed 01 Nov 2018
20. Hodson, R.: Analyzing Documentary Accounts. Sage Publications, CA. http://www.multichain.com/blog/2015/11/avoiding-pointless-blockchain-project/ (2002). Accessed 03 Nov 2018
21. Salah, K., Rehman, M.H.U., Nizamuddin, N., Al-Fuqaha, A.: Blockchain for AI: review and open research challenges. IEEE Access (2018)
22. Klein, S., Prinz, W., Grather, W.: A use case identification framework and use case canvas for identifying and exploring relevant blockchain opportunities. In: Print, W., Hoschka, P. (eds.) Proceedings of the 1st ERCIM Blockchain Workshop 2018, Reports of the European Society for Socially Embedded Technologies (ISSN 2510-2591) (2018). https://doi.org/10.18420/blockchainin2018_02
23. Leedy, P., Ormrod, J.: Practical Research: Planning and Design. Pearson, Upper Saddle River (2012)
24. Lin, L.C., Liao, T.-C.: A survey of blockchain security issues and challenges. Int. J. Netw. Secur. 9(5), 653–659 (2017). https://doi.org/10.6633/IJNS.201709.19(5).01
25. Massaro, M., Dumay, J., Guthrie, J.: On the shoulders of giants: undertaking a structured literature review in accounting. Account. Audit. Account. J. 29, 767–801 (2016). https://doi.org/10.1108/AAAJ-01-2015-1939
26. Milani, F., Garcia-Bunuelos, L., Dumas, M.: Blockchain and business processes improvement [WWW Document]. https://www.bptrends.com/blockchain-and-business-process-improvement/ (2016). Accessed 03 Nov 2018
27. Miraz, M.H., Ali, M.: Applications of blockchain technology beyond cryptocurrency (2018). arXiv preprint arXiv:1801.03528
28. Morabito, V.: Business Innovation Through Blockchain. Springer International Publishing, Cham (2017)
29. Narayanan, A., Bonneau, J., Felten, E., Miller, A., Goldfeder, S.: Bitcoin and Cryptocurrency Technologies: A Comprehensive Introduction. Princeton University Press, Princeton (2016)
30. Lopes, V., Alexandre, L.A.: An overview of blockchain integration with robotics and artificial intelligence (2018). arXiv preprint arXiv:1810.00329
31. Onik, M.M.H., Miraz, M.H.: Performance analytical comparison of blockchain-as-a-service (BaaS) platforms. In: Miraz, M.H., Excell, P.S., Ware, A., Soomro, S., Ali, M. (eds.) iCETiC 2019. LNICSSITE, vol. 285, pp. 3–18. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-23943-5_1
32. Park, J.H., Park, J.H.: Blockchain security in cloud computing: use cases, challenges, and solutions. Symmetry 9, 164 (2017). https://doi.org/10.3390/sym9080164
33. Rawat, D.B., Chaudhary, V., Doku, R.: Blockchain technology: emerging applications and use cases for secure and trustworthy smart systems. J. Cybersecur. Priv. 1, 4–18 (2020). https://doi.org/10.3390/jcp1010002
34. Risius, M., Spohrer, K.: A blockchain research framework. Bus. Inf. Syst. Eng. 59(6) (2017)
35. Zhang, R., Xue, R., Liu, L.: Security and privacy on blockchain. ACM Comput. Surv. 52(3), Article 51 (July 2019), 34 p. https://doi.org/10.1145/3316481
36. Seppala, J.: The role of trust in understanding the effects of blockchain on business models (2016). https://aaltodoc.aalto.fi/handle/123456789/23302. Accessed 03 Nov 2018
37. Shirrif, K.: Bitcoin mining the hard way: the algorithms, protocols, and bytes (web log article). http://www.righto.com/2014/02/bitcoin-mining-hard-way-algorithms.html (n.d.). Accessed 01 Nov 2018
38. Tapscott, D., Tapscott, A.: Blockchain Revolution: How the Technology Behind Bitcoin Is Changing Money, Business, and the World. Portfolio Penguin (2016)
39. Taylor, P., Dargahi, T., Dehghantanha, A., Parizi, R., Choo, K.: A systematic literature review of blockchain cyber security. Digit. Commun. Netw. 6(2), 147–156 (2020). ISSN 2352-8648. https://doi.org/10.1016/j.dcan.2019.01.005
40. Treiblmaier, H.: Toward more rigorous blockchain research: recommendations for writing blockchain case studies. Front. Blockchain 2 (2019). https://doi.org/10.3389/fbloc.2019.00003
41. Walliman, N.: Research Methods: The Basics. Routledge, London (2010)
42. Watson, L., Mishler, C.: Get ready for blockchain. Strategic Finance, pp. 62–63 (2017)
43. Xiaoqi, L., Peng, J., Ting, C., Xiapu, L., Qiaoyan, W.: A survey on the security of blockchain systems. Future Gener. Comput. Syst. 107, 841–853 (2020). ISSN 0167-739X. https://doi.org/10.1016/j.future.2017.08.020
44. Yaga, D., Mell, P., Roby, N., Scarfone, K.: Blockchain technology overview (2019). arXiv preprint arXiv:1906.11078
45. Yrjola, S.: Citizens broadband radio service spectrum sharing framework–a path to new business opportunity for mobile network operators? In: Proceedings of the Sixth International Conference on Advances in Cognitive Radio, 2018 (2016)
46. Zheng, Z., Xie, S., Dai, H.-N., Cen, X., Wang, H.: Blockchain challenges and opportunities: a survey. Int. J. Web Grid Serv. 14(4), 352–375 (2018)
47. Zīle, K., Strazdiņa, R.: Blockchain use cases and their feasibility. Appl. Comput. Syst. 23(1), 12–20 (2018). https://doi.org/10.2478/acss-2018-0002

A Blockchain Approach for Exchanging Machine Learning Solutions Over Smart Contracts

Aditya Ajgaonkar1(B), Anuj Raghani1, Bhavya Sheth1, Dyuwan Shukla1, Dhiren Patel1, and Sanket Shanbhag2

1 VJTI, Mumbai, India
[email protected]
2 BlockEye, VJTI-TBI, Mumbai, India

Abstract. Blockchain technology enables us to create 'smart' contracts capable of offering a reward in exchange for the services of skilled contributors, who contribute a trained machine learning solution for a particular dataset or specific code packages that aid development in large projects. Leveraging the opportunities presented by this technology, in this paper we present a proposal to deploy a system of smart contracts to facilitate the creation and fulfillment of collaborative agreements. The smart contract automatically validates the solutions by evaluating the submissions in order of their arrival and checking whether the specified quality requirements are met. The most critical advantage would be the impartial and fair evaluation of the work submitted by prospective collaborators, with assurance of fair compensation for their efforts: their payment would not be subject to subjective and manipulable factors, which also incentivizes data contributors to refine the solution.

Keywords: Machine learning · Blockchain · Smart contracts · Ethereum

1 Introduction

Machine Learning and Artificial Intelligence are being adopted in more and more new applications every day. From predicting the value of assets in the financial domain to sophisticated natural language processing in personal assistants, machine learning is used extensively. As the field of machine learning has advanced, the computational resources and skills required have increased correspondingly. Businesses today can certainly enhance the value of their products and services by leveraging these technologies and their data to gain meaningful insights into customer behavior or the overall market sentiment. We are already seeing the promising results this approach has brought to early adopters of the technology. However, it is also observed that developing meaningful machine learning solutions is not a trivial task, as it requires a very specific skill set in data science and programming, which not all businesses possess, because acquiring it drives up operating costs considerably. This limits the extent to which they can improve their products
through technology. Therefore, there is a pressing need to tackle this problem in some form. There have been some recent developments in low-code machine learning APIs, which require little to no prior coding knowledge. However, these have their shortcomings. Firstly, these APIs currently cover only a handful of generalized applications, which may not suit every business's needs. Secondly, these products exist in ecosystems, like the cloud computing services of GCP and AWS, which are still not very well known to the average computer user or business stakeholder. Another solution to this problem is to develop these models with engineers hired on a contractual basis. Platforms for sourcing this talent already exist, but they do not cater to the specific needs that a specialized piece of software like a machine learning model might require. In addition, the ready availability of the relevant skill sets is limited. Another problem with this approach is that we need to assume that the portal hosting buyers and sellers of contractual services is a trusted, centralized party. This may not always be the case, and hence one can argue that Blockchain technology, with all its advantages, can be quite useful here. Finally, even if the buyers and sellers of contractual services find each other, pricing the contract is difficult due to a lack of standardized pricing for machine learning contracts. Blockchain is a distributed ledger technology used to ensure trust and reliability, since data and transactions are committed to the blockchain only after consensus is reached amongst the participants. To ensure that such agreements can take place seamlessly, in a secure and trustworthy manner, we employ it in our proposed system [1, 2]. The foundation for ensuring such arrangements lies in the use of Smart Contracts. Smart Contracts are essentially computer programs stored inside a Blockchain and associated with a particular blockchain address that references the contract software code [3]. Once a smart contract is published, its code cannot be changed, and anyone can interact with it [4]. Our proposed solution, discussed in greater detail in Sect. 3, therefore sets out to solve the following problems: to make machine learning talent widely available, and to develop a decentralized, trustless platform on blockchain that caters specifically to testing and evaluating machine learning models and to secondary tasks like data collection and enhancement, which are at the very core of data science and machine learning. This decentralized platform will be a marketplace where multiple buyers and sellers can negotiate the contract price. Therefore, the prices of machine learning contracts will be a consequence of our proposed solution and could be the first step towards the development of a decentralized market for machine learning contracts. The rest of the paper is organized as follows: Sect. 2 provides the context of current systems and an analysis of their limitations. Section 3 presents the overview of the proposal and the design rationale for the system. Section 4 highlights key features and the architecture. Section 5 reiterates the proposal presented and submits its validation. The paper is finally concluded in Sect. 6, with references at the end.


2 Existing Approaches and Corresponding Shortcomings

There have been several attempts to encourage and increase the participation of skilled developers in machine learning development. At the forefront, the Open Source movement has fostered a climate for learning, sharing and cooperation. However, it operates primarily on a volunteer basis, relying on students or a constrained number of open source developers, and hence misses out on talented participants who are unable to contribute due to the lack of compensation for their efforts. Additionally, companies remain reluctant to adopt the open-source model, since it risks the loss of proprietary technology, opens up the possibility of attacks on their systems, and raises several other concerns that come with providing extensive access to internal code repositories. Hence, the idea remains underutilized, with limited avenues for professional collaboration. A few novel solutions have emerged which attempt to decentralize the machine learning process using various techniques. Google uses cloud infrastructure to distribute the Gboard (Google keyboard) [5] model among phone devices to improve predictive typing. This approach uses a single shared model which is used and developed on users' data; the local updates are summarized, pushed to the cloud and integrated with the single shared model. The basic principles of this approach show that decentralization can be used in a manner that facilitates the development of a single shared ML model. However, this approach supports only a specific type of model and suffers from scalability issues, since training a model requires a substantial number of users. Another solution, by Nvidia, proposes building robust ML algorithms which enable the collaboration of different nodes in model training while preserving data privacy. Nvidia implements a server-client approach [6], where a centralized server works as a manager/facilitator of participating clients. The infrastructure allows developers to share their models and components and to have control over the training process. Partial models are trained on clients, then the partial model weights are shared, and the model is updated based on the weights and the history of contributions (a brief illustrative sketch of this weight-averaging step is given below). This infrastructure supports the usage of different models; however, it depends on a centralized node to aggregate weights and update the shared model, which can cause the entire system to fail if the centralized node fails. Similarly, large-scale approaches, particularly in the domain of machine learning, have now gained traction. By primarily addressing the limitations caused by the lack of powerful hardware, these approaches provide the power of cloud computing to compensate for deficiencies in user infrastructure. This is not suitable, however, particularly when the team lacks experience and knowledge in the domain of machine learning in the first place: such platforms provide powerful means, but still require a seasoned hand at the helm. A contract instead provides the grounds for an agreement with a pool of highly skilled specialists in the domain to utilize their expertise. Proposals have been put forth along the lines of deploying Blockchain Technology for creating contracts that offer a reward in exchange for a trained machine learning model for a particular data set, most notably by Kurtulmus and Daniel [7]. However, this paper intends to greatly build and expand on the groundwork of their proposal, attempting to address certain limitations and incorporating additional opportunities by
widening the scope of the initial proposal. The sections that follow reference the work and proposals laid down in the paper referenced above and analyze its propositions. As mentioned by the authors, the DanKu protocol [7] has multiple potential improvements, some of which we attempt to address in this paper.

1. Compensation Regulation. In the current arrangement of the DanKu protocol, the reward for training the model is decided by the organizer. However, this price may not always be fair considering the amount of computational resources used. Hence, we propose building an intelligent reward regulator into the contract itself that assesses how many resources and how much effort training a model might take, and recommends a range of fair prices to the organizer.
2. Support for Data Sourcing Contracts. The DanKu protocol currently deals only with machine learning contracts. However, this concept can be extrapolated to sourcing data for training models as well. We explore this concept in greater detail in the following sections.
3. High Gas Costs. Handling all the data on the chain itself drives the gas used to very high values. We propose solving this using a combination of decentralized storage [8] and delegating the evaluations to the developer's end to reduce on-chain computation, which has a direct effect on gas cost.
4. Dependence on Ethereum. The DanKu protocol currently works only on the Ethereum blockchain. However, permissioned blockchains like Corda [9] and Hyperledger [10] are also widely used throughout the industry. There is currently no clear path as to how this problem will be overcome.
5. Limitations Due to the Execution Requirement on the Ethereum Virtual Machine and Solidity. The current semantics of the Solidity language also act as a hindrance here, as Solidity has minimal support for offline ML libraries, and function stacks are limited to 16 units deep, as acknowledged in the paper by Kurtulmus and Daniel [7], which may not be enough for performing evaluations on the chain. We propose creating an ecosystem where the solutions are validated on a hashed test dataset at the user's end. This approach is elaborated in the following sections.
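As a point of reference for the decentralized-training approaches recalled at the start of this section (the Gboard and NVIDIA Clara examples), the sketch below illustrates the basic federated weight-averaging step in which clients train locally and a coordinator merges their updates. It is a simplified, hypothetical illustration written for this discussion, not the actual Google or Nvidia implementation, and it assumes a plain linear model trained with NumPy.

```python
# Minimal federated-averaging sketch: clients fit a shared linear model on
# private data and only exchange weights, which are averaged by dataset size.
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One client's local gradient-descent step for y ~ X @ w."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def federated_average(client_weights, client_sizes):
    """Aggregate client models, weighting each by its local dataset size."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for n in (50, 80, 120):                      # three clients with private data
    X = rng.normal(size=(n, 2))
    clients.append((X, X @ true_w + rng.normal(scale=0.1, size=n)))

global_w = np.zeros(2)
for _ in range(20):                          # communication rounds
    updates = [local_update(global_w, X, y) for X, y in clients]
    global_w = federated_average(updates, [len(y) for _, y in clients])
print("learned weights:", global_w)          # approaches true_w without sharing raw data
```

The same sketch also makes the single point of failure visible: if the node running federated_average disappears, no new global model can be produced, which is the centralization drawback noted above.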


Another issue faced by machine learning algorithms is the lack of quality data, which has a significant effect on a model's performance. More data helps machine learning algorithms generalize to unseen data. It is a very difficult task for an organization to collect data, because data is dispersed across various platforms and its quality is not assured. So, there is a need for mechanisms to incentivize data contributors. A primitive mechanism is one where data contributors can earn tokens when other contributors validate their contributions, much like Stack Overflow. This method depends on the willingness of the other contributors to get involved.

3 Proposed Solution

A key goal in the future of development has been to constantly increase accessibility to high-performing technologies and innovations. This in turn is supported by the aim of improving the quality of code and software, as well as making better software and coding practices available more readily and to a larger audience. As a result, the penetration and adoption of newer technologies would be greatly expedited, creating a more robust ecosystem. A straightforward way to try to achieve this is by getting talented individuals, specialists in their fields, to come together to work on challenging problems and create efficient and ingenious solutions. However, this possibility has been greatly limited by significant issues such as the difficulty of finding and teaming up with skilled individuals, a limited pool of readily available such individuals, lack of transparency and accountability, unequal sharing of the prospective gains, and ensuring adequate compensation to the individuals involved, among others. Our proposed solution begins with an 'Organizer' or contract poser who requires a particular solution to their development problem. They create a smart contract on the Ethereum blockchain [11] specifying, in particular:

1. the metric requirements,
2. the submission period,
3. the number of acceptable submissions, and
4. the maximum token reward that can be offered based on the work to be done.
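Purely for illustration, the structure below sketches how the parameters listed above could be captured before being encoded in the actual Solidity contract. The field names, the threshold field and the IPFS content identifiers are hypothetical additions made for the example; they are not prescribed by the proposal.

```python
# Hypothetical sketch of the organizer-defined contract parameters.
from dataclasses import dataclass

@dataclass(frozen=True)
class ContractSpec:
    metric: str                # e.g. "f1_score", evaluated on the hidden test set
    metric_threshold: float    # minimum quality requirement to qualify
    submission_deadline: int   # block number (or UNIX time) closing the window
    max_submissions: int       # number of acceptable submissions
    max_reward_tokens: int     # maximum token reward escrowed by the organizer
    train_data_cid: str        # IPFS identifier of the public training data
    test_data_cid: str         # IPFS identifier of the encrypted test data

spec = ContractSpec(
    metric="f1_score",
    metric_threshold=0.85,
    submission_deadline=17_500_000,
    max_submissions=3,
    max_reward_tokens=500,
    train_data_cid="Qm...train",   # placeholder identifiers, not real CIDs
    test_data_cid="Qm...test",
)
print(spec)
```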

To maintain market regulation and avoid unfair practices, machine learning techniques can be applied to historical transactional data to obtain an optimal ask price range for the organizer, so that it is fair for everyone. This is further elaborated in Sect. 5.1.4. Going forward, the solution diverges based on the nature of the contracts. In the case of Machine Learning Contracts, the dataset provided in the contract for training and testing will be uploaded to a decentralized file-sharing system like IPFS [12]. One of the main benefits of a decentralized file-sharing system is a reduction in gas cost on the main Ethereum network. Another advantage is that if one node on the data-sharing network goes down, the user will still have access to the data, as multiple copies are present across the network. Training data can be downloaded by the user from the decentralized file-sharing network, and the user can train the machine learning model on it. Test data will be kept
encrypted on the file-sharing network so that it is hidden from all developers. This test data will be used to evaluate the developed models; however, it will not be revealed until the competition is over. This prevents users from overfitting their models to the test data, improves the fairness of the contract, and preserves the integrity of the test data. The developer can optionally specify a bid price based on the work done. Before the submission period is over, the user has to evaluate their model using the evaluation function provided. Evaluation is done on the developer's machine itself and the results are shown; these results help the developer assess the solution and improve on it. To submit the model, the developer can call the submission function provided. Submission is done in two steps: the model evaluation metrics are first uploaded to the smart contract (on the Ethereum blockchain), and then the corresponding model is uploaded to the decentralized file-sharing system. Multiple submissions from the same participant can lead to congestion on the networks. To discourage multiple submissions, the user first has to hash the evaluation metrics in a particular format and submit the evaluation metrics, a nonce, and the hash. This submission is verified by the smart contract; if verification is successful, the model is uploaded to the file-sharing system. Once the submission period is over, the best submission is rewarded and the reward is transferred into the user's account, which signifies the completion of the contract. In the case of Data Contribution Contracts, a supportive contract associated with machine learning is also theorized, which incentivizes contributors to add more data to the existing dataset. This helps refine the training dataset for a particular model, which can increase model accuracy. This is premised on the understanding possessed by experts in the field of machine learning and leverages the knowledge that, irrespective of the machine learning algorithm, the impact of the dataset's features and outliers on its performance is significant [13]. Contributions such as suggesting improvements to the dataset used to train the model, contributing additional data points, or identifying and correcting excessive or undesirable correlations in the data linked with the model are immensely valuable. Additionally, to avoid malpractice and to ensure verification, feedback is sought from the organizer. Once the feedback is validated in the contract, rewards can be provided to contributors.
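A minimal sketch of the two-step submission just described is given below: the developer evaluates locally, commits to the evaluation metrics by hashing them together with a nonce, and the contract later re-computes the hash to verify the submission before the model is uploaded. The exact serialization format is not specified in the proposal, so the JSON encoding used here is an assumption.

```python
# Illustrative commit-and-verify step for submissions (assumed JSON serialization).
import hashlib
import json
import secrets

def commit_metrics(metrics: dict):
    """Developer side: hash the evaluation metrics together with a fresh nonce."""
    nonce = secrets.token_hex(16)
    payload = json.dumps({"metrics": metrics, "nonce": nonce}, sort_keys=True)
    return nonce, hashlib.sha256(payload.encode()).hexdigest()

def verify_commitment(metrics: dict, nonce: str, digest: str) -> bool:
    """Contract side: re-compute the hash from the submitted metrics and nonce."""
    payload = json.dumps({"metrics": metrics, "nonce": nonce}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest() == digest

metrics = {"accuracy": 0.93, "precision": 0.91, "recall": 0.90, "f1": 0.905}
nonce, digest = commit_metrics(metrics)
assert verify_commitment(metrics, nonce, digest)                       # accepted
assert not verify_commitment({**metrics, "f1": 0.99}, nonce, digest)   # tampered metrics rejected
```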

4 Functional Overview

At the outset, the flowchart in Fig. 1 summarizes, in a concise format, the propositions for the execution of a machine-learning-focused contract. Building on the work done by Kurtulmus and Daniel [7] and Wang [14], we propose solutions to implement a modified 'DanKu' protocol for machine learning contracts as well as contracts for generalized software applications, such as APIs. The following sections detail the specifics required to elaborate on the nuances of the proposal.

4.1 Specifications of Smart Contracts

ML Contracts. In Machine Learning Contracts, taking inspiration from the DanKu protocol, the organizer uses the random function to generate the indexes of the training
and testing data, which are generated by setting the seed using the ID of the organizer. The test data is kept encrypted and gets unlocked using the hash of the submission and the participant's ID. Every participant receives the data in a random order to keep the process fair. The organizer specifies the metrics on which submissions will be evaluated; for example, in classification tasks these include accuracy, precision, recall, and F1 score [15]. The acceptance time period and the number of submissions allowed are set based on the organizer's flexibility. The organizer is recommended a reward value based on a set of similar historical transactions, which is elaborated in 5.1.4. These REST calls are made using Oraclize, a service that aims to enable smart contracts to access data from other blockchains and the World Wide Web [16]. The incentive mechanism for this is discussed in 5.1.5. Data Sourcing Contracts. The organizers can create a supplementary data contribution contract, where the best model available so far is fine-tuned with additional data points. Contributors get rewarded if their data contribution adds value by improving the performance of the best model, i.e. it tries to fill the gaps in the existing training data. Newly created data can be, for example, new texts, new movie titles, new pictures, etc.
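The seeded index generation mentioned at the start of Sect. 4.1 could look like the sketch below. How the organizer's ID is turned into a numeric seed is not specified in the text, so hashing the ID with SHA-256 is an assumption made here purely for illustration.

```python
# Deterministic train/test index split derived from the organizer's ID.
import hashlib
import random

def split_indexes(organizer_id: str, n_samples: int, test_fraction: float = 0.2):
    seed = int.from_bytes(hashlib.sha256(organizer_id.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    indexes = list(range(n_samples))
    rng.shuffle(indexes)
    cut = int(n_samples * (1 - test_fraction))
    return indexes[:cut], indexes[cut:]

train_idx, test_idx = split_indexes("0xOrganizerAddress", n_samples=1000)
# The same organizer ID always yields the same split, so the contract and every
# participant can re-derive it, while the test rows themselves remain encrypted.
assert split_indexes("0xOrganizerAddress", 1000) == (train_idx, test_idx)
```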

4.2 Evaluation and Submission Mechanism

Given the two-pronged nature of the proposal, it is best to consider the evaluation process for each type of contract individually: machine learning applications and data sourcing. ML Applications. Evaluating a machine learning or artificial intelligence model is relatively straightforward, as concrete metrics for evaluating such models are in place. According to a paper by Hossin and Sulaiman [17], the confusion matrix of a classical classification model can give a complete view of the performance of the model. The test dataset will be encrypted and uploaded onto the decentralized file-sharing system and can only be accessed via the evaluation function provided. The model would normally be run through the evaluation stage of the DanKu protocol; the shortcoming of this method is that the gas price goes through the roof for a data-heavy evaluation, so we propose evaluating the model on the developer's machine itself with the help of the evaluation function provided. This approach reduces the gas cost of evaluating the model on the EVM. It also avoids the limitation of Solidity of not being able to handle floating-point numbers, which are used in certain activation functions like sigmoid; this limitation of the Solidity language can significantly restrict the evaluation of machine learning models. Evaluating the model on the developer's machine further tackles model execution timeouts on the EVM. Another consequence of this new approach is an increase in the degree of decentralization of the system in general. During the submission phase, users can call the submit function on their best solution. The evaluation metrics calculated in the previous evaluation step are hashed using the SHA256 algorithm. This discourages multiple submissions on the network, which in turn ensures that the best solution is submitted by every developer. The evaluation data, nonce and the hash are uploaded onto the smart contract, where they are verified.
If the verification is successful, the solution is encrypted using SHA256 and then uploaded onto the decentralized file-sharing system. This ensures the model is not visible to the organizer or to other developers, keeping the submitted solutions confidential. Once the winner is declared, his or her solution is decrypted and made available to the organizer; simultaneously, the reward is transferred from the escrow account to the winner's account. Data Sourcing Contracts. To encourage people to contribute new data that will help improve the model's performance, we propose the following incentive mechanism. The organizer creates a smart contract and uploads the best solution along with the dataset on which it was trained and tested. Contributors can download the data and apply preliminary data analysis techniques to comprehend it; this analysis can give relevant insights about data quality and how to improve it. Once the contributor finds the existing shortcomings of the dataset and enhances it, the provided model can be retrained on this modified dataset on the developer's machine. The updated model is evaluated on the test data provided by the organizer. If the evaluation metrics are better than the existing benchmark, the contributor can submit the dataset: the evaluation metrics are stored in the smart contract and the modified data is uploaded to the decentralized file storage system. Since contributors view the data from different angles, they may apply different techniques to improve the performance of the solution, so once the submission period is over we can have multiple contributions that surpass the existing benchmark. We therefore propose a proportionate rewarding mechanism. Let us assume the benchmark score is A and we have N submissions with metric scores E1, E2, …, EN. The reward received by the i-th contributor, Ri, will be:

Ri = ((Ei − A) / Σ_{j=1}^{N} (Ej − A)) × R,   (1)

where R is the total reward, as mentioned in the contract. The above mechanism ensures that bad quality data isn't rewarded. The reward mechanism incentivizes more people to contribute, as there is no winner-takes-all scheme; it creates a healthy competitive environment instead.
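A direct reading of Eq. (1) in code is sketched below; the clamp to zero is a safeguard added here for submissions that do not beat the benchmark, which the protocol would normally reject before this step.

```python
# Proportionate reward split of Eq. (1): each contributor's share of the total
# reward R is proportional to their improvement over the benchmark score A.
def distribute_rewards(benchmark, scores, total_reward):
    gains = [max(s - benchmark, 0.0) for s in scores]
    pool = sum(gains)
    if pool == 0:
        return [0.0] * len(scores)       # nobody surpassed the benchmark
    return [total_reward * g / pool for g in gains]

# Example: benchmark F1 of 0.80, three submissions, 100 tokens escrowed.
print(distribute_rewards(0.80, [0.85, 0.90, 0.79], 100.0))
# -> [33.33..., 66.66..., 0.0]
```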

4.3 Machine Learning for Market Regulation

There is a need to carefully model the bid price of the models on offer so that both the buyer and the seller are in a win-win situation [18]. Sellers are freelancers or groups of machine learning researchers who are willing to contribute by providing solutions in return for incentives in the form of tokens, and hence are fairly prone to exploitation. Conversely, an unfair market would result in unsatisfactory participation, and the ideas of collaboration and competition in the market would both be defeated.


Fig. 1. System flowchart for a machine learning contract


These are people who have sound knowledge of ML concepts and experience working with various types of data. Buyers are people with a fundamental knowledge of model metrics; they have a clear picture of what type of solution they want and may not have sufficient computing resources at their end. They are looking to outsource this work for a reasonable reward. The paper by Kurtulmus and Daniel [7] suggests that their DanKu contract protocol would have the consequence of creating a market with a well-defined price for GPU training of machine learning models. However, in the view of the authors, without a robust regulatory mechanism it is certainly difficult to achieve this goal. We are creating a platform where one party's requirement is fulfilled by another's skill set. The problem is that, since there is no third party involved in the contract, there is a possibility of unfair practices where the price of the solution is manipulated by either of the two parties. To prevent that, the parameters based on which the price can be negotiated between the two parties need to be decided. In particular, we propose applying machine learning to the successful historical transactional data and using it to recommend the optimal price to buyers and sellers [19]. The output of this algorithm will be the optimal trade price (OTP). The pricing model will be hosted on the cloud, as shown in the high-level architecture diagram in Fig. 2. The parameters to be considered while approximating the OTP are:

1. Hardware specifications
2. Size of the data
3. Model metrics requirements
4. Deployment and scalability specifications.

The buyers and sellers can use OTP as the base price and further negotiate on it.
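To make the OTP recommendation concrete, the following is a hedged sketch of a pricing model fitted on historical contract data. The feature encoding, the toy historical records and the choice of a random-forest regressor are all assumptions made for illustration; the proposal itself only fixes the four input parameters listed above.

```python
# Sketch of an OTP recommender trained on (toy, purely illustrative) history.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Columns: GPU hours, dataset size (GB), required F1, deployment complexity (1-3)
history_X = np.array([
    [10,  1.0, 0.80, 1],
    [40,  5.0, 0.90, 2],
    [80, 20.0, 0.95, 3],
    [15,  2.0, 0.85, 1],
    [60, 10.0, 0.92, 2],
])
history_price = np.array([120, 450, 900, 180, 640])   # tokens paid in past contracts

otp_model = RandomForestRegressor(n_estimators=200, random_state=0)
otp_model.fit(history_X, history_price)

new_contract = np.array([[50, 8.0, 0.90, 2]])
print(f"recommended optimal trade price: {otp_model.predict(new_contract)[0]:.0f} tokens")
```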


4.4 Architectural Diagram

Fig. 2. Architecture diagram

5 Validation

The proposed solution leverages the core features of blockchain and smart contracts to foster a cooperative and supportive environment for sustained development, ensuring trust between the contract posers and the independent contractors in this exchange. The test dataset is hashed using the SHA256 algorithm and can be unlocked only using the participant's private key. Smart contracts are written in Solidity and deployed on the Ethereum blockchain. We propose using the developer's machine for evaluation, which reduces gas costs and overcomes the limitations of Solidity. Oraclize will be used to handle the REST calls [20]. Advantages of the proposed system:

1. Creates an unbiased marketplace where solutions are validated and exchanged for the benefit of all; it is also an ideal place for people to monetize their technical skill set.
2. Evaluation of the solutions is done on the developer's machine; hence the limitation of the Solidity language in not handling floating-point numbers is not faced, leading to a more efficient evaluation of models. Further, the issue of execution timeouts on the Ethereum chain is eliminated, as the execution is done on a local machine.
3. Tackles the problem of handling large datasets by storing them on decentralized file storage systems to balance the load on the blockchain network.
4. Incentivizes data contributors to improve the solution and eliminates the need for trusted third parties for ensuring contract requirements are satisfied.

6 Conclusion

The system intends to create a free, inclusive and accessible market that will help foster large-scale collaboration between talented people across the world, and hence will incentivize the creation of better machine learning models, better code modules and altogether more refined software projects. At the center of the proposed system lies a core goal: to make AI, machine learning and, in the same vein, general software technologies and modules simpler and more accessible to companies and software agents, and to fetch them quality results. The scope of this proposal can be extended to creating a marketplace for software solutions. For software solutions, the protocol remains the same; in this case, the metrics are the unit test cases, which are the equivalent of machine learning metrics, and a file containing detailed descriptions of the requirements is analogous to the training dataset in a machine learning contract. The proposed solution creates a platform for applying one's skills and domain knowledge in return for a small token. It proposes a decentralized and collaborative platform for the improvement of AI on the blockchain. Handling more complex software solutions and improving the security of on-/off-chain API requests will refine the current prototype. Better-designed smart contracts and new features in Solidity will make the smart contracts more efficient. Future research in decentralized cloud computing will make it feasible to handle complex computations off-chain, and improved encryption methods will improve the security and legitimacy of the decentralized marketplace.

References

1. NIST Blockchain Technology Overview, Draft NISTIR 8202, 23 Jan 2018
2. Blockchain And The Future of the Internet: A Comprehensive...., 23 Feb 2019
3. Pinna, A., Ibba, S., Baralla, G., Tonelli, R., Marchesi, M.: A massive analysis of Ethereum smart contracts empirical study and code metrics. IEEE Access 7, 78194–78213 (2019)
4. Bartoletti, M.: Smart contracts. Front. Blockchain, 04 Jun 2020
5. Hard, A., et al.: Federated learning for mobile keyboard prediction. arXiv 2019, arXiv:1811.03604v2
6. Wen, Y., Li, W., Roth, H., Dogra, P.: Federated Learning Powered by NVIDIA Clara. https://developer.nvidia.com/blog/federated-learning-clara/ (2019)
7. Kurtulmus, A.B., Daniel, K.: Trustless Machine Learning Contracts; Evaluating and Exchanging Machine Learning Models on the Ethereum Blockchain, 27 Feb 2018
8. Wilkinson, S., Lowry, J., Boshevski, T.: Metadisk: a blockchain-based decentralized file storage application. Technical report (2014)
9. Brown, R., Carlyle, J., Grigg, I., Hearn, M.: Corda: An Introduction. https://doi.org/10.13140/RG.2.2.30487.37284
10. Androulaki, E., et al.: Hyperledger Fabric: A Distributed Operating System for Permissioned Blockchains (2018). https://doi.org/10.1145/3190508.3190538
11. Wood, G.: Ethereum: a secure decentralised generalised transaction ledger. Ethereum project yellow paper 151, pp. 1–32 (2014)
12. Benet, J.: IPFS–Content Addressed, Versioned, P2P File System, Jul 2014
13. Acuña, E., Rodríguez, C.: An empirical study of the effect of outliers on the misclassification error rate. Trans. Knowl. Data Eng. (2005)
14. Wang, T.: A Unified Analytical Framework for Trustable Machine Learning and Automation Running with Blockchain, Mar 2019
15. Gu, Q., Zhu, L., Cai, Z.: Evaluation measures of the classification performance of imbalanced datasets. In: Cai, Z., et al. (eds.) ISICA 2009, CCIS 51, pp. 461–471. Springer-Verlag, Berlin, Heidelberg (2009)
16. Maleshkova, M., Pedrinaci, C., Domingue, J.: Investigating web APIs on the World Wide Web. In: 2010 Eighth IEEE European Conference on Web Services, pp. 107–114 (2010)
17. Hossin, M., Sulaiman, M.N.: A review on evaluation metrics for data classification evaluations. Int. J. Data Min. Knowl. Manag. Process 5(2), 1 (2015)
18. Chen, L., Koutris, P., Kumar, A.: Model-based Pricing for Machine Learning in a Data Marketplace, 26 May 2018
19. Harris, J.D., Waggoner, B.: Decentralized and collaborative AI on blockchain. In: 2019 IEEE International Conference on Blockchain (Blockchain), pp. 368–375 (2019). https://doi.org/10.1109/Blockchain.2019.00057
20. Liu, X., Chen, R., Chen, Y.-W., Yuan, S.-M.: Off-chain Data Fetching Architecture for Ethereum Smart Contract, pp. 1–4 (2018). https://doi.org/10.1109/ICCBB.2018.8756348

Untangling the Overlap Between Blockchain and DLTs

Badr Bellaj1,2, Aafaf Ouaddah1(B), Emmanuel Bertin3, Noel Crespi2, and Abdellatif Mezrioui1

1 National Institute of Post and Telecommunication (INPT), Rabat, Morocco
[email protected]
2 Institut polytechnique de Paris, Paris, France
3 Orange Lab, Caen, France

Abstract. The proven ability of Bitcoin and other cryptocurrencies to operate autonomously on the trustless Internet has sparked great interest in the underlying technology - Blockchain. However, porting Blockchain technology outside its initial use case led to the inception of new types of Blockchains adapted to different specifications and with different designs. This unplanned evolution resulted in multiple definitions of what a Blockchain is. The technology has diverged from its baseline (Bitcoin) to the point where some systems marketed as "blockchain" share only a few design concepts with the original Blockchain design. This conceptual divergence, alongside the lack of comprehensive models and standards, has made it difficult for both system designers and decision-makers to clearly understand what a blockchain is or to choose a suitable blockchain solution. To tackle this issue, we propose in this paper "DCEA", a holistic reference model for conceptualizing and analysing blockchains and distributed ledger technologies (DLT) using a layer-wise framework that envisions all these systems as constructed of four layers: the data, consensus, execution and application layers.

Keywords: Blockchain · DLT · Reference model · Blockchain-like · Review

1 Introduction

The emergence of many projects heavily inspired by Bitcoin's blockchain drove the industry to adopt a broader term, "Distributed Ledger Technology" (DLT), when referring to this category. Nevertheless, there is no rigorously defined set of terminologies or commonly accepted reference model delineating the borders of the DLT subcategories. As a result, terms like "blockchain", "DLT" or even "distributed database" have been misunderstood, misused, and misinterpreted, and many projects or enterprises use the word "blockchain" extensively, simply as a marketing term to describe their Blockchain-like products without abiding by a clear standard. Although there are multiple proposals for standardizing blockchain (ISO [6–9], IEEE [3], ITU [10]), there is no recognized standard
for defining blockchain or DLT. In the absence of a referential definition, we observe the growing use of imprecise and inconsistent language and terminology across different projects—where the same term may be used to refer to different things—which usually leads to confusion. To help untangle the underlying concepts and delineate the different categories of DLTs, we define in this paper a reference model capturing a longitudinal and representative view of DLT systems. The proposed model introduces a systematic and holistic approach to conceptualizing and analysing DLTs in general as functioning systems constructed of four key layers: the data, consensus, execution and application layers. The rest of the paper is structured as follows. Section 2 defines a new layer-wise framework serving to normalize and deliberate on the classification and taxonomy of DLTs. Sections 3, 4, 5 and 6 present, respectively, the data, consensus, execution and application layers; in each section we outline the main components and properties of the studied layer as well as its related state of the art. In Sect. 7 we briefly discuss the difference between Blockchain and Blockchain-like systems. Finally, we close with a conclusion in Sect. 8.

2 DCEA Framework

We propose DCEA, a framework that defines a layered heterogeneous stack for DLT systems. From a design perspective, our conceptual framework (DCEA) segregates DLT technologies into four essential and distinct layers: data, consensus, execution and application layers—each one playing a well-defined role in the DLT architecture. The framework consists of the DLTs components and their main properties (Table 1, with logically related functions grouped together. This layering approach is aligned with the DLT’s modular architecture. It will help to provide a better understanding of DLTs and serves as a baseline to build a comparative analogy between different DLT variants. In the following, we introduce the four layers that form the DLT stack. – Data Layer: Represents the data (transactions and states) flowing through the distributed network and stored in the ledger. Data in this layer is represented by entries recorded in the ledger, under consensus and shared amongst the network participants. These records may represent elements defined by the underlying protocols (such as cryptocurrency, or smart contracts), or data received from external environments (such as IoT data). Generally, the data layer covers data stored on the blockchain itself (on-chain storage) as well as data stored in an auxiliary source using a distributed database (off-chain storage). – Consensus layer: Defines the global software-defined ruleset to ensure agreement among all network participants on a unified ledger. Consequently, this layer designates the formal rules that govern the system. – Execution layer: Represents the components responsible for enforcing and executing distributed programs (e.g. smart contracts). Basically, these programs


Table 1. Layers and components of the DCEA framework

Application layer: Integrability; DLT orientation and purpose; Wallet and identity management
Execution layer: Execution environment; Turing-completeness; Determinism; Openness; Interoperability
Consensus layer: Safety; Liveness; Finality; Network model; Failure model; Adversary model; Governance model; Transaction ordering; Conflict resolution
Data layer: Data structure; Data shareability; Data immutability; States storage

or contracts codify a given logic (e.g. a business logic) as a set of instructions for manipulating the states recorded in the ledger.
– Application layer: Represents an abstraction layer that specifies a variety of protocols and APIs provided by the DLT system to enable the building of distributed applications, commonly called DApps. This layer also represents a communication link between the external actors or applications and the code hosted on the DLT ledger.

Based on the above layering, we propose a four-layered taxonomy (Table 1) to categorize DLT systems. The purpose of the taxonomy is to:

– Classify the typical DLT systems proposed in academia and in industry; and to
– Assess the relative strengths and weaknesses of existing systems and identify deficiencies in the current set of DLTs.

At each layer, DLTs adopt different settings for the DCEA properties defined in Table 1. Based on their combinations across the four layers, we can define different DLT classes. For instance, at the data layer we differentiate between DAG-based and chain-based DLTs based on the nature of the underlying data structure; at the consensus layer we differentiate between permissioned and permissionless DLTs based on the identity model of the consensus mechanism; at the execution layer we differentiate between smart-contract-based DLTs and script-based DLTs; and at the application layer, we differentiate between DApp-oriented and cryptocurrency-oriented DLTs.
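To make the four-layer classification concrete, the following minimal Python sketch (our own illustration, not part of the DCEA specification) records a DLT's settings at each layer as a simple data structure; the example profiles follow the classification criteria listed above.

```python
# Illustrative only: a minimal data structure mirroring the DCEA taxonomy.
# The field values are example settings, not normative classifications.
from dataclasses import dataclass

@dataclass
class DCEAProfile:
    name: str
    data_structure: str    # data layer: "chain-of-blocks", "DAG", ...
    identity_model: str    # consensus layer: "permissionless" or "permissioned"
    execution_model: str   # execution layer: "smart-contract" or "script"
    orientation: str       # application layer: "DApp" or "cryptocurrency"

profiles = [
    DCEAProfile("Bitcoin", "chain-of-blocks", "permissionless", "script", "cryptocurrency"),
    DCEAProfile("Ethereum", "chain-of-blocks", "permissionless", "smart-contract", "DApp"),
]

for p in profiles:
    print(f"{p.name}: {p.data_structure}, {p.identity_model}, {p.execution_model}, {p.orientation}")
```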

3 Data Layer

In this section, we lay out the key components, and their characteristics, that constitute the data layer, as introduced in Fig. 1.

3.1 Components and Properties

A DLT’s ledger represents a distributed data store where data is duplicated among multiple nodes by means of data synchronization. In these data stores, the data organization, in its macroscopic structure, varies from one technology to another. Generally, we distinguish between two main models of data structures in the DLT space: the linear chain of blocks and the chain-less models.


Fig. 1. The main components of the data layer with examples

Chained Model

Chain of Blocks: Data in the chain of blocks is organized in elementary units called blocks. Each block is a collection of transactions validated by the network. These units are organized chronologically as a chain of inter-linked blocks, which are tied by tamper-evident hash pointers. Each new block can only be valid if it is built upon an unchangeable body of previous blocks. Blocks are composed of a header and a record of transactions. The block’s header contains meaningful metadata such as a cryptographic reference to the previous block and the current time. This linear linkability ensures data integrity through cryptographic connections between blocks and enables each participant in the network to verify and validate data. Data in a chain of blocks is carried over and stored in the ledger using transactions, therefore we consider a transaction as the most elementary data type. At the block level, the transactions are ordered and hashed into a Merkle tree, with the root hash placed in the corresponding block’s header. This structure guarantees a cryptographically sealed and tamper-proof data vault resistant to any type of data corruption.

Skipchain: The data structure of a skipchain is inspired by skip lists. Skipchain adapts the skip list idea to the chain of blocks by adding links between blocks both forwards and backwards in time. In a skipchain, a block includes not just a single hash link to the previous block, but also an additional hash link to a point farther back in time. Thus, a skipchain can build subsequent layers of linked blocks on top of an original linked list of blocks. Skipchain is very useful when concurrent access to data is required.

Chainless Model. In order to overcome some limitations imposed by the adoption of the chained block structure, certain DLTs have opted for a chain-less model. Instead, they use new data structures for better scalability or security.

DAG: In contrast to using a chain of blocks, some DLTs use a nonlinear structure such as the Directed Acyclic Graph (DAG) to offer better performance. A DAG is a graph that grows in one direction without cycles connecting the other edges (i.e., there is no path from a vertex back to itself). As with a chain of blocks,
a DAG is used as a sorted collection of hierarchically connected transactions where each transaction, and sometimes a block of transactions, is represented by a node (a vertex in the graph) and linked to at least one other node. The DAG is extended chronologically by appending new transactions to the previous nodes. The ledger is thus an ever-growing tree, starting initially from a root node. The acyclic nature of the DAG and its unidirectional evolution enable participants to confirm transactions automatically based on previous transactions. Based on the representation of its nodes, we identify two types of DAGs:

– Transaction-based DAGs, whose nodes represent individual transactions; and
– Block-based DAGs, whose nodes represent blocks of transactions.

Decentralized Database Model: Some DLTs adopt radical changes in their architecture over conventional blockchains, to the point that they resemble a classical distributed database. We consider these solutions as decentralized databases, as they manage data similarly to how conventional databases do, but with a different technology. In fact, unlike a conventional distributed database, where nodes cooperate to maintain a consistent view of data across all systems, a decentralized database is designed to allow multiple parties that may not trust each other to collaborate with their peers to maintain shared records.

Hybrid Data Model: Some DLT projects combine a chain-of-blocks model with a block-less model to manage transactions and states in the network. The hybridization is designed to exploit the advantages of each model to enable better scalability and rapid transaction validation. In this model, the states are generally stored in external dedicated key-value databases and the blocks contain only the transactions affecting the ledger’s states. Using key-value databases makes it easy to directly access the updated value of a state rather than having to calculate it by traversing trees of transactions.

State Management. A key distinguishing factor among various DLTs is how states are managed within the system. Although DLTs serve as distributed ledgers for shared data, in many DLTs data is stored outside the transactional distributed ledger (off-chain/off-ledger) using auxiliary databases. Conventional blockchains, however, tend to always store data on the shared ledger (on-chain/on-ledger). When we analyze how general states (e.g. a user’s balance) are managed in existing DLTs, two models emerge: the UTXO model and the account model. The first is a special set of transactions linking new transactions to old ones, wherein a newly produced transaction (new UTXO) points to one or multiple prior transactions (its inputs), whereas the second is a model where the ledger keeps track of up-to-date global states related to each account.

Data Shareability. All nodes in a DLT network exchange transactions carrying shared data in order to reach consensus, but due to privacy reasons different
visions of data shareability have been adopted. Some systems favor complete shareability of all data, which we consider global shareability, whereas others restrict the perimeter of shareability, including some nodes and excluding others, which we consider restricted shareability.

Data Immutability/Atomicity. There is a common belief that records stored on a DLT (especially a blockchain) are immutable and unalterable. However, that is not necessarily the case, as different DLT systems provide different degrees of immutability depending on the system design. This means that, under some circumstances, nodes can hold inconsistent states, or a confirmed transaction may be reversed. For data immutability, we differentiate between:

– Strong immutability, when the state variables or blockchain entries cannot be mutated or tampered with after their creation; and
– Weak immutability, when the state variables or blockchain entries can be mutated or tampered with after their creation.

It is worth noting that for some strongly immutable systems, the states can be updated without breaking immutability. This is achieved by using tree structures that persistently store both the new and the old values of a given entry.

Data Privacy. Data privacy means securing data from public view. In a shared context like DLTs, data can be private or not private. Privacy is made possible with cryptographic techniques such as zero-knowledge proofs, which enable verifying private data without revealing it in its clear form.
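As an informal illustration of the chained data model and the Merkle-tree commitment described in this subsection, the following Python sketch (deliberately simplified, not any particular DLT's implementation) links blocks by hash pointers and commits each block's transactions through a Merkle root, so that tampering with any past transaction invalidates every later block.

```python
"""Minimal sketch of a hash-linked chain of blocks with a Merkle root."""
import hashlib, json, time

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def merkle_root(tx_hashes):
    """Pairwise-hash the transaction hashes until a single root remains."""
    if not tx_hashes:
        return sha256(b"")
    level = tx_hashes[:]
    while len(level) > 1:
        if len(level) % 2:                 # duplicate the last hash on odd levels
            level.append(level[-1])
        level = [sha256((level[i] + level[i + 1]).encode())
                 for i in range(0, len(level), 2)]
    return level[0]

def make_block(prev_hash, transactions):
    header = {
        "prev_hash": prev_hash,            # hash pointer to the previous block
        "merkle_root": merkle_root([sha256(tx.encode()) for tx in transactions]),
        "timestamp": time.time(),
    }
    return {"header": header,
            "hash": sha256(json.dumps(header, sort_keys=True).encode()),
            "transactions": transactions}

genesis = make_block("0" * 64, ["coinbase"])
block1 = make_block(genesis["hash"], ["alice->bob:5", "bob->carol:2"])
# Tamper-evidence: altering a transaction changes the Merkle root, hence the
# block hash, hence every hash pointer built on top of it.
print(block1["header"]["prev_hash"] == genesis["hash"])
```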

3.2 Data Layer: State of the Art

This subsection presents an overview of DLT projects adopting the different data structures previously outlined by our framework, as well as an evaluation of their properties.

Chained DLTs. Most DLTs follow the linear data chain structure initially defined by Bitcoin. Within this broad category, multiple projects define different inner block structures.

Bitcoin. In Bitcoin and its clones, transactions are assembled in the block’s body and then linked in a Merkle tree. The root of this tree, or the Merkle root, is a hash representing an indirect hash of all the hashed pairs of transactions in the tree and is included in the block header, thereby enabling transaction verification. In addition to the Merkle root, the block header also contains other important information, including the timestamp and the previous block’s hash. Moreover, Bitcoin adopts the UTXO model to track the system states (wallet balances). The UTXO set is stored off-chain in an auxiliary database.
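The UTXO bookkeeping described above can be illustrated with a small, deliberately simplified sketch (no scripts, signatures or network are modelled): a transaction consumes existing unspent outputs and creates new ones.

```python
"""Simplified illustration of the UTXO model: the ledger state is a set of
unspent outputs; a transaction consumes some of them and creates new ones."""

utxo_set = {("genesis-tx", 0): ("alice", 50)}    # (txid, index) -> (owner, amount)

def apply_transaction(txid, inputs, outputs):
    """inputs: list of (txid, index) to spend; outputs: list of (owner, amount)."""
    spent = sum(utxo_set[i][1] for i in inputs)          # KeyError if missing/spent
    if spent < sum(amount for _, amount in outputs):
        raise ValueError("outputs exceed inputs")
    for i in inputs:
        del utxo_set[i]                                  # consume the spent outputs
    for idx, out in enumerate(outputs):
        utxo_set[(txid, idx)] = out                      # add the newly created outputs

# Alice pays Bob 20 and sends 30 back to herself as change.
apply_transaction("tx1", [("genesis-tx", 0)], [("bob", 20), ("alice", 30)])
print(utxo_set)
```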


Ethereum. The block structure is more complex in Ethereum than in Bitcoin, and the system’s state tracking is different. In fact, the block’s header comprises more metadata and its body contains multiple types of data, namely transactions, receipts and system states. Each of these data types is organized into a Merkle tree, or a Patricia tree (radix tree) in the case of the state tree. The state tree is an important component of the Ethereum ledger, as it is used to implement the account model, whereby each account is linked to its related states (account balances, smart contract states, etc.). Any node can parse the tree and get the updated state without any overhead calculation. The state tree grows each time a change occurs in a state: it grows by adding new nodes (stored in the new block), containing the new states, which point to the nodes (stored in the previous block) containing the old value of the same state. To enforce immutability, Ethereum keeps the tree’s root hash in the block header.

Skipchain: Chainiac. Nikitin et al. [17] introduced Chainiac to solve offline transaction verification problems (enabling nodes to check whether a transaction has been committed to a blockchain without having a full copy of the ledger). The Chainiac solution was to add traversability forward in time using a skipchain, where the backpointers in Chainiac are cryptographic hashes, whereas the forward-pointers are collective signatures. With long-distance forward links and collective signatures, a client or node can efficiently verify a transaction anywhere in time.

Chainless DLTs

DAG-Based Chains. The idea of using DAGs as the underlying data structure has encountered great interest from the DLT designers of multiple projects, including Byteball, DagCoin, IOTA, Nano, Phantom and Hedera. Some studies have tried to introduce DAGs into conventional blockchain DLTs; for instance, the GHOST protocol [12] proposes a modification of the Bitcoin protocol by making the main ledger a tree instead of a blockchain. Such a modification reduces confirmation times and improves the overall security of the network.

Decentralized Databases: Corda (R3). In the Corda network, each node maintains a local database called a “vault” that stores time-stamped data. Each vault holds many different versions (current and historic) of data in the form of state objects. A vault does not store transactions; instead it stores the output states relevant to a party (the state’s participants). The transactions are stored in the “NODE1 TRANSACTIONS” table in the node’s database. Alongside, Corda adopts a UTXO model to store state data, which means a transaction consumes current states and may or may not produce new states.

Hybrid DLTs: Hyperledger Fabric. Hyperledger Fabric combines the usage of a chain of blocks, storing only the validated transactions, with the usage of a classical key-value database storing the system’s states (transaction outcomes). In the Fabric chain, the block structure resembles the structure of a block in a conventional chain but with an additional part: the block metadata. This additional section contains a timestamp, as well as the certificate, public key and signature
of the block writer. The block header is straightforward and the transactions are ordered in the block body without Merkleization.

BigchainDB. BigchainDB [15] was introduced as a blockchain database. It aims to combine the key characteristics of “traditional” NoSQL databases (MongoDB) and the key benefits of traditional blockchains. BigchainDB server nodes utilize two distributed databases: a transaction set (or “backlog”) holding incoming transactions, and a chain of blocks storing validated transactions (creations or transfers). Each transaction represents an immutable asset (represented as a JSON document in MongoDB).

Data Shareability. Most DLTs operating as global cryptocurrency platforms adopt by design a global shareability of transactions. In fact, networks such as Bitcoin, Ethereum and many others operate in relay mode, where nodes relay transactions to each other, thereby propagating them to the entire network without restrictions. In other DLTs, such as Hashgraph, senders deliver their transactions to a set of selected nodes that are responsible for including them into their DAG and sharing them with others by gossiping. On the other hand, DLTs constructed for business purposes, such as Corda or Hyperledger Fabric, impose restricted shareability of transactions, as privacy is an important requirement in such contexts. In Corda, for instance, each node maintains a separate database of data that is only relevant to it. As a result, each peer sees only a subset of the ledger, and no peer is aware of the ledger in its entirety. Fabric restricts data shareability to a subset of the ledger by using the concept of channels [5]. A channel is a private sub-network between two or more specific network members. Each transaction on the network is executed on a channel, where only authenticated and authorized parties are able to transact on that channel. Therefore, the network ends up with a different ledger for each channel. Similarly, Quorum, an Ethereum-based distributed ledger protocol with transaction/contract privacy, enables sending private transactions between multiple parties in the network by the use of constellations.
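For contrast with the UTXO sketch given earlier, the following toy example illustrates the account model used by Ethereum-like ledgers, where the system keeps an up-to-date balance (and a nonce) per account; it is an illustration only, not Ethereum's actual state-tree implementation.

```python
"""Counterpart to the UTXO sketch: a minimal account model where the ledger
tracks an up-to-date balance per account instead of a set of unspent outputs."""

state = {"alice": 50, "bob": 0}          # account -> balance (global state)
nonces = {"alice": 0, "bob": 0}          # per-account transaction counter

def transfer(sender, recipient, amount, nonce):
    if nonce != nonces[sender]:
        raise ValueError("bad nonce (replay protection)")
    if state.get(sender, 0) < amount:
        raise ValueError("insufficient balance")
    state[sender] -= amount
    state[recipient] = state.get(recipient, 0) + amount
    nonces[sender] += 1                  # the updated global state is tracked directly

transfer("alice", "bob", 20, nonce=0)
print(state)   # {'alice': 30, 'bob': 20}
```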

4 Consensus Layer

DLTs have renewed the interest in the design of new distributed consensus protocols. In fact, a myriad of consensus algorithms for DLTs have been proposed in the literature, presenting different properties and functionalities. In this section, we present the properties and features we consider as part of the DCEA framework for studying and differentiating between the protocols.

4.1 Components and Properties

Basic Properties. The concepts of safety and liveness were introduced initially by Lamport in 1977 and have been well adopted in the distributed computing community. All consensus algorithms provide these properties under different assumptions about synchrony, the adversary model, etc.


Safety. In the context of DLT networks, safety represents the guarantee that correct nodes will not validate conflicting outputs (or make conflicting decisions) at the same time (e.g. chain forks).

Liveness. A consensus protocol guarantees liveness if requests (transactions) from correct clients are eventually processed.

Finality. In the DLT setting, we define the finality property as the affirmation and guarantee that a transaction considered final by the system is irreversible. Finality as a property can be divided into two types:

– Probabilistic finality, where the probability that a validated transaction will not be reverted increases with time after the transaction is recorded onto the ledger.
– Absolute finality, where a transaction is considered finalized once it is validated by the honest majority.

Network Models. In both the traditional distributed systems literature and DLT consensus protocols, we consider the message-passing model in which nodes exchange messages over the network, under differing assumptions of network synchrony. We adopt in this survey the following taxonomy defined by [7].

– Synchronous, where we assume the existence of a known upper bound on message delay. That means messages are always delivered within some time after being sent.
– Partially synchronous, where we assume there is some known Global Stabilization Time (GST), after which the messages sent are received by their recipients within some fixed time bound. Before the GST, messages may be delayed arbitrarily.
– Asynchronous, where messages sent by parties are eventually delivered. They may be arbitrarily delayed and no bound is assumed on the delay of messages to be delivered.

Failure Models. Different failure models have been considered in the literature; we list hereafter two major types.

– Fail-stop failures (also known as benign or crash faults), where nodes go offline because of a hardware or software crash.
– Byzantine faults: This category of faults was introduced and characterized by Leslie Lamport in the Byzantine Generals Problem to represent nodes behaving arbitrarily due to software bugs or a malicious compromise. A Byzantine node may take arbitrary actions, provide ambivalent responses or intentionally mislead other nodes by sending sequences of messages that can defeat properties of the consensus protocol.

We consider, therefore, a protocol as fault tolerant if it can gracefully continue operating without interruption in the presence of failing nodes.
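These fault-tolerance notions can be made tangible with a small calculation based on the classical Byzantine fault tolerance bound n >= 3f + 1 used by many partially synchronous BFT protocols; the bound and the 2f + 1 quorum are standard distributed-computing knowledge rather than results of this paper.

```python
"""Back-of-the-envelope check of the classical BFT bound n >= 3f + 1."""

def max_byzantine_faults(n: int) -> int:
    """Largest f such that n >= 3f + 1 still holds."""
    return (n - 1) // 3

def quorum_size(n: int) -> int:
    """Typical quorum used by PBFT-style protocols: 2f + 1 matching replies."""
    return 2 * max_byzantine_faults(n) + 1

for n in (4, 7, 10, 100):
    print(f"n={n}: tolerates f={max_byzantine_faults(n)}, quorum={quorum_size(n)}")
```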


Adversary Models. Under the assumption of a message-passing model, the adversary is able to learn the messages exchanged and to corrupt different parts of the network. We distinguish between the following three adversary models:

– The Threshold Adversary Model: This model is the most common adversary assumption used in the traditional distributed computing literature, which assumes that the Byzantine adversary can corrupt up to any f nodes among a fixed set of n nodes. Under this model, the network usually has a closed membership requiring permission to join. The consensus protocol should be able to operate correctly and reach consensus in the presence of Byzantine nodes as long as their number does not exceed a given threshold.
– Computational Threshold Adversary: A model introduced by Bitcoin, where the control of the adversary over the network is bounded by computational power (requiring concrete computational resources) instead of the number of nodes he can control. In this model, the membership is typically open to multiple parties and the bounding computation is a brute-force calculation.
– Stake Threshold Adversary [1]: In this model, the adversary’s control is bounded by his proportion of a finite financial resource. In networks managing cryptocurrencies, the underlying protocol can ensure consensus based on cryptocurrency deposits, thus the adversary is bounded by the share of cryptocurrency he owns. In addition, in these protocols punishment rules (e.g. stake slashing) can be put in place to deter bad behaviour.

Adversary Modes. Consensus protocols assume the existence of different types of adversaries based on their ability and the time they need to corrupt a node.

– Static adversary: A Byzantine user who is able to corrupt a certain number of network nodes ahead of time and exercise complete control over them. However, he is not able to change which nodes he has corrupted or to corrupt new nodes over time.
– Adaptive adversary: A Byzantine user who has the ability to control nodes and dynamically change, depending on the circumstances, the nodes under his control to gain more power.
– Mildly adaptive adversary: A Byzantine user who can only corrupt nodes based on past messages, or his anticipations, and cannot alter messages already sent. Moreover, the adversary may mildly corrupt groups, but this corruption takes longer than the activity period of the group.
– Strongly adaptive adversary: A Byzantine user who can learn of all messages sent by honest parties and, based on their content, can decide whether or not to corrupt a party by altering its message or delaying message delivery.

Identity Model. Protocols manage node membership differently, but in general two opposite approaches are adopted:

– Permissionless, where the membership is open and any node can join the network and validate new entries.
– Permissioned, where the membership is closed and only a restricted set of approved members is able to validate new entries.

In the DLT setting, the identity model is commonly bound to the openness of the network, being private, public or consortium.

Governance Model. The governance model refers to the process of decision-making adopted by a DLT network to decide on the protocol rules and their upgrades. Since the governance of the system boils down to a social concept, we find it appropriate to identify some of the possible governance models from a social perspective:

– Anarchic, where protocol upgrade proposals are approved by every participant in the network. Each participant chooses to accept or reject a given proposal, thus leading to potential splits in the network.
– Democratic, where participants vote on new rules and protocol upgrade proposals and, at the end, all participants have to follow the decision of the majority, even those who voted against it.
– Oligarchic, where new rules and protocol upgrades are proposed and approved by a group of participants.

As most DLTs move governance and related issues “on-chain” or “off-chain”, we also consider the differentiation between built-in (or on-chain) governance, where the decision-making process in the network is defined as part of the underlying consensus protocol, and external (or off-chain) governance, where the decision-making process is based on procedures performed independently without involving the DLT mechanisms.

Transactions Ordering. Whether for a linear or a non-linear DLT (e.g. DAGs), the stored transactions should be ordered chronologically to avoid fraud and inconsistencies. Different approaches have been introduced by consensus protocols to provide reliable and fair transaction ordering. Usually, in DLTs the ordering is an integral part of the consensus mechanism, but in some cases it can be decoupled from the execution and validation of transactions. Ordering is an important property with direct impacts on the security and the usage of a DLT, hence the need to evaluate this feature separately.

Conflict Resolution Model. In some DLT networks, conflicting temporary versions of the ledger (known as forks) can coexist for different reasons (e.g. network latency, parallel validation of blocks, etc.). To converge toward a canonical ledger or chain, networks and consensus mechanisms adopt different rules. The most notable rule is defined by the Bitcoin protocol as the “longest chain rule”, whereby in the presence of conflicting orders the network converges to one order following the longest chain (the chain with the largest accumulated PoW in the case of PoW-based systems) and discards the rest. The longest chain rule is adopted by different protocols and each may adopt a different cumulative parameter (witness votes, endorsements, etc.).
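A minimal sketch of the longest chain rule described above, assuming each block carries an accumulated-work value (the per-block representation is ours, purely for illustration):

```python
"""Sketch of the 'longest chain' fork choice: when forks coexist, converge on
the branch with the largest accumulated weight (e.g. proof-of-work)."""

def select_canonical_chain(forks):
    """forks: list of chains, each chain a list of blocks with a 'work' field."""
    return max(forks, key=lambda chain: sum(block["work"] for block in chain))

fork_a = [{"id": "A1", "work": 10}, {"id": "A2", "work": 10}]
fork_b = [{"id": "B1", "work": 10}, {"id": "B2", "work": 10}, {"id": "B3", "work": 10}]
print([b["id"] for b in select_canonical_chain([fork_a, fork_b])])  # fork_b wins
```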

4.2 Consensus Layer: State of the Art

In this subsection, we present multiple consensus mechanisms and their properties. Although it is out of the scope of this paper to present a detailed taxonomy of the existing protocols (Fig. 2), we group all the reviewed protocols into six categories. This protocol categorization serves as a basis to categorize the DLTs.

BFT Consensus Family (PBFT-Like). This family refers to the classical consensus mechanisms introduced in the traditional distributed computing literature and their recent variants. The BFT family is easily recognized by its properties: all-to-all voting rounds, the identity of the nodes in the network is known, and the number of participants is limited. Due to the large number of protocols belonging to this family, we limit our review in this paper to the most used algorithms in the DLT context, namely PBFT, RAFT, IBFT, DBFT, PoA (Aura, Clique), HoneyBadgerBFT and HotStuff.

Nakamoto Consensus Family. We consider that the Nakamoto consensus family represents protocols using a chain-of-blocks data structure and adopting the longest chain fork choice rule (or a variant like GHOST [21]) to ensure safety, along with economic incentives. These protocols were introduced primarily to enable secure currency transfer over the internet. Conversely to PBFT, they are conceptually simple and tolerate important corruptions up to n/2. Besides, they are known for being permissionless (open enrollment): they do not require node authentication and allow nodes to arbitrarily join or leave the network. We review hereafter some of the most discussed protocols in this category, namely PoW, memory-bound PoW and Bitcoin-NG.

Proof of Stake and Its Variants. Proof-of-Stake (PoS) was first proposed as an alternative to the costly PoW for use in PPCoin [13]. Instead of a hash-calculation competition between validators, participants who wish to join the validator board and forge the new block have to lock a certain amount of coins into the network as a financial stake. Thus, the chances for a node to be selected as the next validator depend on the size of its stake. Different implementations of PoS exist. We present here some of the typical representatives, including Ethereum PoS, DPoS (EOS), Ouroboros and its variants, and Snow White.

Fig. 2. Taxonomy of consensus protocols
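The stake-proportional selection idea behind PoS can be illustrated with the following toy sketch; real protocols rely on verifiable randomness, epochs and slashing, none of which is modelled here, and the stake values are invented.

```python
"""Toy illustration of PoS: the chance of being chosen as the next validator
is proportional to the locked stake."""
import random

stakes = {"validator-a": 100, "validator-b": 300, "validator-c": 600}

def pick_validator(stakes, rng=random):
    validators, weights = zip(*stakes.items())
    return rng.choices(validators, weights=weights, k=1)[0]

counts = {v: 0 for v in stakes}
for _ in range(10_000):
    counts[pick_validator(stakes)] += 1
print(counts)   # roughly proportional to 100 : 300 : 600
```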

Hybrid Protocols. This family represents protocols which attempt to combine the advantages of PoW, PoS and other established protocols to provide better performance.


DAG-Based Protocols. IOTA uses a hybrid consensus protocol, combining PoW at the entry level with a custom transaction validation algorithm. IOTA relies on PoW to protect the network against spamming, as the transactions are fee-less.

Avalanche. Avalanche is a recent leaderless Byzantine fault tolerant protocol built on a metastable mechanism via network subsampling: a node repeatedly takes a uniform random sample from the network, sends queries repeatedly in multiple rounds and collects responses.

Federated BFT. Ripple was the first implementation of a federated Byzantine agreement system (FBAS for short), which was later extended by the Stellar protocol. FBA revisits the BFT setting by providing an open membership service based on a trust model. In fact, FBA protocols depart from the concept that each node interacts only with a limited group of its trusted peers: the unique node list (UNL) in Ripple and the quorum slice in Stellar. Thus, unlike traditional BFT protocols, the federated Byzantine agreement (FBA) does not require a global and unanimous agreement among the network participants.
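The repeated network subsampling that Avalanche's metastable mechanism is built on can be sketched in a highly simplified form; the parameters (k, alpha, rounds) and the binary-preference model below are illustrative and do not reflect the actual protocol's values or data structures.

```python
"""Very simplified sketch of metastable consensus via repeated subsampling."""
import random

def avalanche_round(node_prefs, node, k=10, alpha=0.7):
    sample = random.sample([n for n in node_prefs if n != node], k)
    votes = sum(node_prefs[peer] for peer in sample)   # preferences are 0/1
    if votes >= alpha * k:
        node_prefs[node] = 1                           # flip toward the majority
    elif votes <= (1 - alpha) * k:
        node_prefs[node] = 0

# 100 nodes start with a slight bias toward preference 1.
prefs = {i: (1 if random.random() < 0.55 else 0) for i in range(100)}
for _ in range(30):                  # repeated rounds drive the network toward
    for node in prefs:               # one metastable outcome
        avalanche_round(prefs, node)
print(sum(prefs.values()), "of", len(prefs), "nodes prefer value 1")
```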

5 Execution Layer

In this section, as illustrated in Fig. 3, we identify the fundamental components of the execution layer and their properties. Then we present the execution components widely adopted in the state of the art.

5.1 Components and Properties

In a DLT system, business logic agreed to by counterparties can be codified using a set of instructions and embedded into the ledger in a specific format. The execution of this ruleset is enforced by the distributed consensus mechanism. Generally, we distinguish between two main models for rules codification: smart contracts and built-in scripts (the scripting model).

Execution Environment

Smart Contract Model. In this model, clauses between counterparties are codified as a stateful, self-executed program. Typically, this program (known as a smart contract) is implemented either in a dedicated language or using an existing programming language such as Java or C++. The smart contract execution is handled by a dedicated environment such as a virtual machine or a compiler, which processes the instructions defined in the triggering transaction, returns an output and often results in updating states. Commonly, smart contracts live and execute on the DLT as independent entities with reserved states. Although they are qualified as ‘smart’, they are neither autonomous programs, as they need external triggering transactions, nor contracts in a legal sense.
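The essence of the smart contract model, a deterministic program replayed identically by every validating node, can be sketched as follows; Python stands in for a contract language here and nothing in this sketch corresponds to a specific platform's API.

```python
"""Sketch of a smart contract as a replicated, deterministic state machine."""

class TokenContract:
    """A toy token contract: the state is a balance table, the methods are the ruleset."""
    def __init__(self, issuer, supply):
        self.balances = {issuer: supply}

    def transfer(self, sender, recipient, amount):
        if self.balances.get(sender, 0) < amount:
            raise ValueError("insufficient balance")
        self.balances[sender] -= amount
        self.balances[recipient] = self.balances.get(recipient, 0) + amount

def replay(transactions):
    """Every node executes the same triggering transactions in the same order."""
    contract = TokenContract("issuer", 1000)
    for tx in transactions:
        contract.transfer(**tx)
    return contract.balances

txs = [{"sender": "issuer", "recipient": "alice", "amount": 100},
       {"sender": "alice", "recipient": "bob", "amount": 40}]
# Two independent "nodes" replaying the same transactions end in identical state.
assert replay(txs) == replay(txs)
print(replay(txs))
```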


Scripting Model. Unlike the smart contract model, the scripting model enables codifying the desired logic using only a usage-oriented and predefined set of rules defined by the protocol, which limits the possible scenarios to implement. The idea behind this limitation is to avoid security problems and reduce the complexity of the system. Typically, the scripting model is implemented in DLTs that focus on securing the manipulation of built-in assets rather than providing a platform for running universal programs.

Turing Completeness. Generally speaking, a given environment or programming language is said to be Turing-complete if it is computationally equivalent to a Turing machine. That is, a Turing-complete smart contract language or environment is capable of performing any possible calculation using a finite amount of resources. Some DLTs are capable of supporting a Turing-complete execution environment, which provides their users with the flexibility to define complex smart contracts, whereas other DLTs provide non-Turing-complete execution environments, which suffer from some inherent limitations.

Determinism. Determinism is an essential characteristic of the execution environment in DLT systems. Since a distributed program (e.g. a smart contract) is executed across multiple nodes, deterministic behaviour is needed to yield coherent and identical outputs and obviate discrepancies in the network. In order to ensure determinism, DLTs have to handle non-deterministic operations (e.g. floating-point arithmetic, random number generation, etc.) either by disabling these features or by enabling them in a controlled environment.

Runtime Openness. In most DLTs, the execution environment or runtime is by design an isolated component without connections to external networks (e.g. the Internet). However, in many scenarios the need to access information from outside the DLT manifests as a necessity. Thus, to allow such a feature, different design choices were introduced, which can be classified into three approaches:

– Isolated: where interactions between the smart contract execution environment and external environments are not allowed.
– Oracle-based: where interactions with external environments are managed by members of the network who are called oracles. An oracle refers to a third party or a decentralized data feed service that provides external data to the network.
– Open: where the execution layer is able to connect to external environments.

Interoperability. DLT networks are currently by design siloed and isolated from each other. Interoperability, which we consider as the ability to exchange data, assets or transactions between different DLTs, is a complex operation that requires passing transactions between them in a trustless manner, without the intervention of third parties. Interoperability is a highly desired property, thus multiple solutions were developed to enable interoperability between existing DLTs. These solutions can be categorized into the following groups:
– Sidechain: a blockchain running in parallel with another chain (known as the main chain) that allows transferring data (cryptocurrency) from the main chain to itself.
– Multichain: a network of interconnected chains, upon which other chains can be built. In a multichain, one major ledger rules all the sub-ledgers.
– Interoperability protocols: protocols and means (e.g. smart contracts) added to the original DLT to enable interoperability with other DLTs.
– Interoperable DLT: a DLT designed with the goal of enabling interoperability between other DLTs.

5.2 Execution Layer: State of the Art

In this section, we provide an overview of the most widely used execution environments implemented in industry and the literature, with a discussion of their properties.

Fig. 3. The main components of the execution layer

Execution Environments

Ethereum Virtual Machine. In Ethereum, a smart contract is a computer program written in a high-level language (e.g. Solidity, LLL, Viper, Bamboo, etc.) and compiled into low-level machine bytecode using an Ethereum compiler. This bytecode is stored in a dedicated account (and therefore has an address) in the blockchain. Then, it is loaded and run reliably in a stack-based virtual machine called the Ethereum Virtual Machine (EVM for short) by each validating node when it is invoked. To enable the execution of the bytecode and state updates, the EVM operates as a stack-based virtual machine. It uses a 256-bit register stack from which the most recent 16 items can be accessed or manipulated at once. The stack has a maximum size of 1024 possible entries of 256-bit words. The EVM has a volatile memory operating as a word-addressed byte array, where each byte is assigned its own memory address. The EVM also has a persistent storage space, which is a word-addressable word array. The EVM storage is a key-value mapping of 2^256 slots of 32 bytes each. Unlike the memory, which is volatile, storage is non-volatile and is maintained as part of the system state. The EVM is a sandboxed runtime and a completely isolated environment. That is, every smart contract running inside the EVM has no access to the network, file system, or other processes running on the computer hosting the EVM. The EVM is a security-oriented virtual machine, designed to permit the execution of unsafe code. Thus, to prevent denial-of-service (DoS) attacks, the EVM adopts the gas system, whereby every computation of a program must be paid for upfront in a dedicated unit called gas, as defined by the protocol. If the provided amount of gas does not cover the cost of execution, the transaction fails. Assuming enough memory and gas, the EVM can be considered a Turing-complete machine, as it can perform all sorts of calculations.

Bitcoin Scripting. Bitcoin uses a simple stack-based machine to execute Bitcoin scripts. A Bitcoin script is written using a basic Forth-like language. It consists of a sequence of instructions (opcodes), loaded into a stack and executed sequentially. The script is run from left to right using a push-pop stack. A script is valid if the top stack item is true (non-zero) at the end of its execution. Bitcoin scripting is intentionally not Turing-complete, with no loops. Moreover, the execution time is bounded by the length of the script (maximum 10 kilobytes long after the instruction pointer). This limitation prevents denial-of-service attacks on nodes validating the blocks. Bitcoin scripting is considered a limited and complex language for writing smart contracts; to overcome this limitation, multiple projects have been introduced, such as Ivy [11], Simplicity and BitML, which are high-level languages with richer features that compile into Bitcoin scripts. Also, [4] introduced BALZaC, a high-level language based on a formal model, and Miniscript was proposed recently as a language for writing (a subset of) Bitcoin scripts in a structured way, enabling analysis, composition, generic signing and other features. In addition, Rootstock (RSK) [16] was proposed as a smart-contract platform that integrates an Ethereum-compatible virtual machine with Bitcoin.

Interoperability. The inability of siloed DLTs to communicate with one another has been a major hindrance to the development of the blockchain space, therefore different proposals have aimed to solve this problem. In this subsection we present the most important approaches implemented at the execution layer to solve it.

Sidechains. Multiple sidechains have been proposed in the DLT ecosystem. Rootstock is a sidechain of Bitcoin equipped with RVM, a built-in Ethereum-compatible virtual machine. The Rootstock chain is connected to the Bitcoin (BTC) blockchain via a two-way peg enabling transfers from BTC to SBTC (Rootstock’s built-in currency) and vice versa using Bitcoin scripts, whereby users lock up their BTC in a special address and get an equivalent amount of RBTC on the sidechain. Similarly, Counterparty is another sidechain of Bitcoin where coins to be transferred
are burned or locked by sending them to an unspendable address and generating the equivalent in the Counterparty chain. Drivechain is another proposal for transferring BTC between the Bitcoin blockchain and sidechains. Unlike most DLTs where the sidechain is a separate project, Cardano has introduced the Cardano KMZ sidechain as part of its ecosystem. Cardano KMZ is a protocol which serves for moving assets from its two-layer CSL to the CCL (Cardano Computation Layer), or to other blockchains that support the Cardano KMZ protocol. Another sidechain-based project is Plasma [19]. It aims at creating hierarchical trees of sidechains (or child blockchains) using a combination of smart contracts running on the root chain (Ethereum). The idea is to build connected and interoperable chains operated by individuals or a group of validators rather than by the entire underlying network. Thus, Plasma helps scale Ethereum by moving transactions toward the sidechains. Currently, Plasma is actively developed and used by projects such as OmiseGo, which aims to build a peer-to-peer decentralized exchange, and Loom, which provides the tools needed to build high-performance DApps while operating over the Ethereum network.

Determinism. To deal with the non-determinism issue, three general approaches are adopted. The first approach is to guarantee determinism by design. For instance, in Ethereum the EVM does not support, by design, any non-deterministic operations (e.g. floating point, randomness, etc.). Nevertheless, due to the importance of randomness, the RANDAO [20] project has been proposed as an RNG (random number generator) for Ethereum based on an economically secure coin toss protocol. The idea behind it is to build a DAO (decentralized autonomous organisation) for registering random data on the blockchain. The second approach, adopted by other projects such as MultiChain, Corda, or Stratis, which use existing runtime environments, ensures determinism by adapting these environments to force deterministic processing. For instance, MultiChain uses Google’s V8 engine with sources of non-determinism disabled. Similarly, Corda uses a custom-built JVM sandbox, and Stratis limits the capabilities of C# and the .NET core libraries that can be used. The third approach (determinism by endorsement), introduced by Hyperledger Fabric, ensures determinism differently: the endorsement policy specifies the endorsing nodes that simulate the transactions and execute the chaincode. In the case where endorsing peers diverge with different outputs, the endorsement policy fails and the results will not be committed into the ledger.
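To make the scripting model of Sect. 5.2 more concrete, here is a toy push-pop stack machine in the spirit of Bitcoin Script; the opcodes are invented for illustration and are not Bitcoin's actual opcode set.

```python
"""Toy stack machine: opcodes run left to right on a push-pop stack, there are
no loops, and the script succeeds if the top of the stack is truthy at the end."""

def run_script(script):
    stack = []
    for op in script:
        if op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "EQUAL":
            b, a = stack.pop(), stack.pop()
            stack.append(1 if a == b else 0)
        else:                      # anything else is treated as a data push
            stack.append(op)
    return bool(stack) and bool(stack[-1])

# "Unlocking" data (2, 3) followed by a "locking" condition (the sum must equal 5).
print(run_script([2, 3, "ADD", 5, "EQUAL"]))   # True
print(run_script([2, 2, "ADD", 5, "EQUAL"]))   # False
```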

5.3 Environment Openness

Most DLTs rely on oracles to read data from external sources. Simply put, an oracle is a smart contract maintained by an operator that is able to interact with the outside world. Several data feeds are deployed today for smart contract systems such as Ethereum. Examples include Town Crier [24] and Oraclize. The latter
relies on the reputation of the service provider, whereas the former is based on the concept of an enclave hardware root of trust [8]. Other oracles, such as Gnosis and Augur [18], leverage prediction markets. MakerDao, which is a decentralized lending facility built on the Ethereum blockchain, utilizes a multi-tiered approach to feed reliable price data for its assets without sacrificing decentralization. For instance, the Medianizer [14], used as a MakerDao oracle to provide accurate prices for Ethereum, collects data from 14 independent price feeds. Similar to the MakerDAO system, ChainLink [22] aggregates data feeds from many sources. Conversely to most DLTs, Fabric’s chaincode is able to interact with external sources such as an online HTTP or REST API. In the case where every endorser gets a different answer from the called API, the endorsement policy will fail and therefore no transaction will take place. Other DLTs, like Aeternity [2], incorporate an oracle into the blockchain consensus mechanism, removing the need for a third party.
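The oracle pattern described above can be sketched as an authenticated data feed: the consuming logic accepts only data carrying a valid tag from the designated feed operator. An HMAC with a shared secret stands in here for the digital signatures, reputation systems or trusted hardware that real oracle services rely on; the key and payload are hypothetical.

```python
"""Simplified oracle pattern: the contract only accepts authenticated feed data."""
import hmac, hashlib, json

ORACLE_KEY = b"shared-secret-of-the-oracle-operator"   # hypothetical

def oracle_publish(payload: dict) -> dict:
    """The off-chain oracle tags the data it feeds to the ledger."""
    body = json.dumps(payload, sort_keys=True).encode()
    tag = hmac.new(ORACLE_KEY, body, hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}

def contract_consume(message: dict) -> dict:
    """On-chain logic rejects data that is not authenticated by the oracle."""
    body = json.dumps(message["payload"], sort_keys=True).encode()
    expected = hmac.new(ORACLE_KEY, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, message["tag"]):
        raise ValueError("untrusted data feed")
    return message["payload"]

price_update = oracle_publish({"pair": "ETH/USD", "price": 1834.5})
print(contract_consume(price_update))
```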

6 Application Layer

In this section, we briefly introduce the components and properties we define for the application layer and review the related solutions, as illustrated in Fig. 4.

6.1 Components and Properties

Integrability. As a new technology, which is often perceived as hard to adopt, DLT systems try to offer a better user experience by providing the necessary tools (APIs, frameworks, protocols) to enable better integrability with existing technologies and systems (e.g. Web, mobile). The integrability of a DLT can be considered a qualitative property, thus it is possible to deduce a “level of integrability”. That is, we establish a small integrability scale from “high” to “low”.

Fig. 4. The main components of the application layer with examples


DApp Orientation and DLT’s Purpose. Decentralized software applications (or DApps for short) are software applications whose server and client tiers are decentralized and operate autonomously using a DLT. We consider a DLT as DApp-oriented if it focuses on offering the necessary tools for building and maintaining decentralized applications, using different protocols and APIs.

Wallets are an important component of the application layer. Generally, they manage the user’s cryptographic identities. Wallets are responsible for all cryptographic operations related to the creation or storage of the user’s keys or digital certificates as well as the management of transactions.

6.2 Application Layer: State of the Art

Due to the vastness of the different approaches and tools provided by different DLTs at the application layer, we overview only the application layer of a few notable DLTs.

Integrability. DLTs generally introduce a layer of integration between external entities and their data and execution layers. DLTs like Ethereum, NEO or EOS, among others, have a rich toolset and integration tooling. Ethereum offers a robust and lightweight JSON-RPC API with good support for the JavaScript language. It provides Web3.js, an official feature-rich JavaScript library for interacting with Ethereum-compatible nodes over JSON-RPC. Further, for better integration into legacy systems, the Camel-web3j connector provides an easy way to use the capabilities offered by web3j from the Apache Camel DSL. In addition, Infura provides online access for external actors to communicate with the Ethereum chain, through Metamask, dropping the need to run an Ethereum node or client and making the DApp easier for the end user. Similarly, EOS presents a wide set of tools and features easing its integration and interaction with external systems. In fact, EOS provides multiple APIs such as the EOSIO RPC API and its implementations in different languages (EosJs, Py Eos, Scala Eos wrapper, Eos Java, etc.). These tools enable developers to interact with EOS using the most used programming platforms. B2B-targeting DLTs such as Hyperledger Fabric or the Corda platform tackle the integrability issue by providing rich integration SDKs. For instance, Fabric provides a RESTful API server which uses the Fabric SDK as a library to communicate with the DLT network. The Fabric SDK currently supports the Node.js and Java languages. Other technologies like Bitcoin or its variants (Litecoin, Dogecoin, etc.) enable less integrability, as they were not designed to integrate or communicate with other systems. Bitcoin provides basic RPC features along with unofficial implementations of its protocol in different languages, such as BitcoinJ, limited Python implementations (e.g. pybtc), and others.
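As a small illustration of the JSON-RPC integration path mentioned above, the snippet below calls the standard eth_blockNumber method of an Ethereum node; the endpoint URL is a placeholder for a local node or a hosted provider.

```python
"""Minimal JSON-RPC call to an Ethereum node (eth_blockNumber)."""
import json
import urllib.request

ENDPOINT = "http://localhost:8545"      # placeholder node endpoint

def eth_block_number(endpoint=ENDPOINT):
    request = urllib.request.Request(
        endpoint,
        data=json.dumps({"jsonrpc": "2.0", "method": "eth_blockNumber",
                         "params": [], "id": 1}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        result = json.loads(response.read())["result"]
    return int(result, 16)              # the node returns a hex-encoded number

if __name__ == "__main__":
    print("latest block:", eth_block_number())
```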


DApp Orientation and DLT Purpose. Bitcoin and similar projects (e.g. Zcash, Litecoin) were created with the purpose of serving as mere secure digital cash networks. Thus, they are considered cryptocurrency-oriented. Other DLTs propose, alongside the cryptocurrency, other types of P2P value transfer. In the case of storage-oriented DLTs such as Sia Network, Storj, FileIo or IPFS, the network manages data storage alongside a cryptocurrency. Similarly, service-oriented DLTs propose services consuming the inherent token, such as Steemit, which runs a social network, or Namecoin, which aims to provide a decentralized DNS. On the other hand, various DLTs are DApp-oriented and allow developers to build generic applications. For example, Ethereum, EOS, Stellar, TRON and many others propose a more flexible development environment for building DApps with built-in tokens. For more information about the current blockchain DApps landscape we refer to this study [23].

7 The Distinction Between Blockchain and Blockchain-Like Systems

When deconstructing DLT systems using the DCEA framework and evaluating the differences between the two high-level taxa, blockchain and blockchain-like, we observe that they share many common characteristics as well as distinguishing properties (Table 2). In a zoomed-out view, we consider that a system is not a blockchain and belongs to the blockchain-like category if it displays at least two of the following traits. First, lack of good decentralization: multiple DLTs sacrifice decentralization or weaken it for different reasons, such as better scalability, straightforward and seamless governance, or because they are intended to be deployed in contexts that do not require decentralization. Second, a blockchain-like system tolerates data tampering and provides weak immutability, either for states or transactions. Third, the data structure does not rely on chained blocks of transactions to store data. In Table 2, we summarize the distinctive settings of each category to enable the separation between the two categories of DLTs. However, the two categories are not disjoint but overlap, often considerably: we can find a blockchain-like system that exhibits all blockchain properties except one or two. Moreover, in our distinction, we do not rely on the operational settings (being public or not, or being permissioned or permissionless) for the separation between blockchain and blockchain-like, because a project deployed in public and permissionless settings (e.g. Ethereum) can equally be deployed in private and permissioned settings and vice versa.
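The "at least two traits" rule stated above can be expressed directly as a small helper; the field names are ours, and the values for any concrete system would have to be filled in by the analyst.

```python
"""Sketch of the blockchain vs. blockchain-like separation rule."""

def classify(dlt):
    traits = [
        not dlt["decentralized"],                  # lack of good decentralization
        dlt["immutability"] == "weak",             # tolerates data tampering
        dlt["data_structure"] != "chain-of-blocks",
    ]
    return "blockchain-like" if sum(traits) >= 2 else "blockchain"

print(classify({"decentralized": True, "immutability": "strong",
                "data_structure": "chain-of-blocks"}))   # blockchain
print(classify({"decentralized": False, "immutability": "weak",
                "data_structure": "DAG"}))               # blockchain-like
```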


Table 2. Settings of blockchain and blockchain-like in the DCEA framework

Data layer
– Data structure: Blockchain: chain of blocks / Blockchain-like: chainless model
– Shareability: Blockchain: global / Blockchain-like: restricted by design
– States management: Blockchain: on-chain / Blockchain-like: off-chain
– Immutability: Blockchain: strong / Blockchain-like: weak

Consensus layer
– Consensus identity model (membership): Blockchain: permissionless / Blockchain-like: permissioned
– Governance: Blockchain: democratic, oligarchic / Blockchain-like: dictatorship, oligarchic
– Data ordering: Blockchain: decentralized and open / Blockchain-like: centralized or reserved
– Conflict resolution: Blockchain: longest chain/no forks / Blockchain-like: longest chain/no forks

Execution layer
– Turing completeness: Blockchain: Turing/non-Turing complete / Blockchain-like: Turing and non-Turing complete
– Openness: Blockchain: closed/oracle-based / Blockchain-like: open/oracle-based
– Interoperability: Blockchain: non-interoperable/interoperable / Blockchain-like: non-interoperable/interoperable
– Determinism: Blockchain: deterministic / Blockchain-like: non-deterministic
– Execution environment and rules enforcement: Blockchain: VM, script runtimes / Blockchain-like: VM, script runtimes

Application layer
– Integrability: Blockchain: high, medium, low / Blockchain-like: high, medium, low
– DApp orientation: Blockchain: DApps, cryptocurrency / Blockchain-like: DApps/cryptocurrency
– Wallet management: Blockchain: built-in / Blockchain-like: built-in or external

8 Conclusion

In this paper, we have proposed a comprehensive and referential framework to ease the understanding and the investigation of the different approaches adopted by different DLTs at four layers: the data, consensus, execution and application layers. We have defined a stack of DLT components and their main properties
after analysing the design choices adopted by a large spectrum of existing DLT solutions. The layer-wise approach adopted by DCEA is aligned with the DLT’s modular architecture and will help to provide a better and modular understanding of DLTs to decision-makers, who can then make granular decisions at each layer to construct the best solution. Moreover, DCEA will serve as a baseline to build a comparative analysis between different DLT variants. In future work, we aim to apply this referential framework to classify existing DLTs into two broad taxa: blockchain and blockchain-like systems.

References

1. Abraham, I., Malkhi, D.: The blockchain consensus layer and BFT. Bulletin of the EATCS 3(123) (2017)
2. Aeternity: æternity - a blockchain for scalable, secure and decentralized æpps
3. IEEE Standards Association: IEEE blockchain standards
4. Atzei, N., Bartoletti, M., Lande, S., Yoshida, N., Zunino, R.: Developing secure Bitcoin contracts with BitML. In: ESEC/FSE 2019 - Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 1124–1128. Association for Computing Machinery, New York (2019)
5. Brakeville, S., Bhargav, P.: Blockchain basics: glossary and use cases (2016)
6. Dawson, E.N., Taylor, A., Chen, Y.: ISO/TC 307 Blockchain and distributed ledger technologies
7. Dwork, C., Lynch, N., Stockmeyer, L.: Consensus in the presence of partial synchrony. J. ACM 35(2), 288–323 (1988)
8. GlobalPlatform Inc.: GlobalPlatform Security Task Force: Root of Trust Definitions and Requirements. Technical report (2017)
9. ISO/TR: ISO/TR 23455:2019 Blockchain and distributed ledger technologies - Overview of and interactions between smart contracts in blockchain and distributed ledger technology systems
10. ITU: Focus Group on Application of Distributed Ledger Technology
11. IVY: GitHub - ivy-lang/ivy-bitcoin: a high-level language and IDE for writing Bitcoin smart contracts
12. Kiayias, A., Panagiotakos, G.: On trees, chains and fast transactions in the blockchain. In: Lange, T., Dunkelman, O. (eds.) LATINCRYPT 2017. LNCS, vol. 11368, pp. 327–351. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-25283-0_18
13. King, S., Nadal, S.: PPCoin: peer-to-peer crypto-currency with proof-of-stake. Self-published paper, 19 August 2012
14. Maker: Maker - Feeds price feed oracles
15. McConaghy, T., et al.: BigchainDB: a scalable blockchain database. White paper, BigchainDB (2016)
16. Nova Mining: Rootstock (RSK): smart contracts on Bitcoin. Medium (2018)
17. Nikitin, K., et al.: CHAINIAC: proactive software-update transparency via collectively signed skipchains and verified builds. In: 26th USENIX Security Symposium (USENIX Security 17), pp. 1271–1287 (2017)
18. Peterson, J., Krug, J., Zoltu, M., Williams, A.K., Alexander, S.: Augur: a decentralized oracle and prediction market platform. Technical report (2018)
19. Poon, J., Buterin, V.: Plasma: scalable autonomous smart contracts. White paper, pp. 1–47 (2017)
20. Randao: GitHub - randao/randao: RANDAO: a DAO working as RNG of Ethereum
21. Sompolinsky, Y., Zohar, A.: Secure high-rate transaction processing in Bitcoin. In: Böhme, R., Okamoto, T. (eds.) FC 2015. LNCS, vol. 8975, pp. 507–527. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-47854-7_32
22. Tschorsch, F., Scheuermann, B.: Bitcoin and beyond: a technical survey on decentralized digital currencies. IEEE Commun. Surv. (2015)
23. Wu, K.: An empirical study of blockchain-based decentralized applications. arXiv preprint arXiv:1902.04969 (2019)
24. Zhang, F., Cecchetti, E., Croman, K., Juels, A., Shi, E.: Town Crier: an authenticated data feed for smart contracts, pp. 270–282, 24–28 October 2016 (2016)

A Secure Data Controller System Based on IPFS and Blockchain

Saad Alshihri(B) and Sooyong Park

Department of Computer Science and Engineering, Sogang University, Seoul 04107, South Korea
[email protected], [email protected]

Abstract. Blockchain is mainly used to store the amount of coins transferred and the information of the sender/recipient in text format. Treating such simple information, computing and comparing hashes of the blocks, and maintaining their integrity already burden the system, which causes ‘the limit of block capacity’: the difficulty of putting a large amount of data in one block of the blockchain. This paper presents a solution to the block capacity problem by efficiently distributing and encrypting files using the IPFS program, with which a large amount of data is recorded outside the Blockchain. Keywords: Blockchain · Access control · Decentralized storage · Data privacy

1 Introduction

Blockchain, one of the key technologies of Industry 4.0, is a distributed ledger technology (DLT) that allows all participants to control, record and store transaction information over a P2P trust network without a central authority. Blockchain DLT, also known as the second internet revolution, will most likely be used as an infrastructure technology for the fourth industry, such as artificial intelligence, IoT, big data and cloud computing. Blockchain is known as the core technology of Bitcoin, and it is not only being researched for integration in various fields such as finance, economics and logistics, but is also considered potentially capable of changing the entire ecosystem of the industry. In spite of such great potential, Blockchain is still limited by data storage space, the amount of data a block can store. This paper describes the InterPlanetary File System (IPFS), a P2P distributed storage system that can record data outside the Blockchain, while retaining the advantages of Blockchain authentication, recording and storing information about transactions on the Blockchain and distributing large amounts of data in a highly efficient manner.

2 Related Work

Blockchain is a data loss prevention technology based on distributed computing, where data is stored in a small area of memory called a block, which is protected from random
changes and can be searched by anyone [1]. A block consists of a hash that identifies the block, a header that contains information about the block, and a body that contains information about the transactions. Each block header contains six pieces of information (version, hash of the previous block, Merkle hash, time, bits, nonce). The hash value of the previous block refers to the hash value of the block created just before the current block (see Fig. 1).

Fig. 1. Blockchain records list, blocks connected using encryption

Blocks are verified through their linkage to the previous block: each block's hash computation includes the hash of its predecessor, so tampering with any block breaks the chain.
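As a minimal illustration of this linkage (a generic sketch, not the chapter's implementation), each block header can embed the hash of its predecessor, and the chain is verified by recomputing hashes:

```python
import hashlib
import json

def block_hash(header: dict) -> str:
    # Hash a canonical JSON encoding of the header fields
    encoded = json.dumps(header, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()

def make_block(prev_hash: str, body: str, height: int) -> dict:
    header = {"version": 1, "prev_hash": prev_hash, "height": height,
              "merkle": hashlib.sha256(body.encode()).hexdigest()}
    return {"header": header, "body": body, "hash": block_hash(header)}

def verify_chain(chain: list) -> bool:
    # Each block must reference the recomputed hash of the previous block's header
    for prev, curr in zip(chain, chain[1:]):
        if curr["header"]["prev_hash"] != block_hash(prev["header"]):
            return False
        if curr["hash"] != block_hash(curr["header"]):
            return False
    return True

genesis = make_block("0" * 64, "genesis", 0)
b1 = make_block(genesis["hash"], "tx: A->B 5", 1)
b2 = make_block(b1["hash"], "tx: B->C 2", 2)
print(verify_chain([genesis, b1, b2]))   # True
# Tampering with b1's body changes its header hash, so b2 no longer links to it.
b1["body"] = "tx: A->B 500"
b1["header"]["merkle"] = hashlib.sha256(b1["body"].encode()).hexdigest()
b1["hash"] = block_hash(b1["header"])
print(verify_chain([genesis, b1, b2]))   # False
```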

3 IPFS
The InterPlanetary File System (IPFS) is a hypermedia protocol that works with files and identifiers, and a distributed file system that connects all computer devices through the same file system [2]. The IPFS network divides a file into blocks and stores them, with content-based hashes acting as the address through which files can be accessed and downloaded. IPFS can be a solution to the block capacity limitation, an existing limitation in Blockchain

Fig. 2. IPFS saved files access through hash value stored in Blockchain


DLT. Large file data is distributed and stored in IPFS nodes, and only the IPFS file hashes are stored in the Blockchain. A file can be accessed, and its hash uploaded to the Blockchain network, using content-based hashes (see Fig. 2, which shows the process for accessing files stored in IPFS through hashes stored in the Blockchain). IPFS distributes and manages hash tables directly using distributed hash tables (DHTs), which can handle nodes with large data volumes while avoiding load concentration [3]. BitTorrent, a P2P file transfer protocol that distributes and stores files on the Internet and downloads them from multiple locations simultaneously to speed up transfer, is a representative technology that has been extended to include DHTs [4]. IPFS delivers large files quickly and efficiently via the BitSwap protocol, which is inspired by BitTorrent and implemented by exchanging blocks with peers. In addition, IPFS stores all data in the network in a Merkle DAG structure, a combination of a Merkle tree and a directed acyclic graph (DAG) in which each node can store data. Since the Merkle tree is a binary tree in which hash functions are applied in series, the hash value of the root changes if the data is manipulated at all. With IPFS, the content itself serves as the address for the data exchanged over the P2P network. Due to the structure of the Merkle DAG, integrity can be verified with cryptographic checksums to ensure that the data has not been altered (Fig. 3).

Fig. 3. IPFS timestamp batches storage and ordering
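The division of labour described here, bulky data kept off-chain under content-based addresses with only the hashes anchored on the chain, can be sketched as follows (a simplified stand-in: a dictionary plays the role of the IPFS node and a list the role of the ledger; it is not the authors' implementation):

```python
import hashlib

off_chain_store = {}   # stand-in for IPFS: content-addressed block storage
ledger = []            # stand-in for the Blockchain: holds only small records

def ipfs_add(data: bytes) -> str:
    """Store data off-chain and return its content-based address (hash)."""
    cid = hashlib.sha256(data).hexdigest()
    off_chain_store[cid] = data
    return cid

def anchor_on_chain(cid: str, owner: str) -> None:
    """Record only the small content identifier on the ledger."""
    ledger.append({"owner": owner, "cid": cid})

def ipfs_get(cid: str) -> bytes:
    """Retrieve data by its hash and verify integrity against the address."""
    data = off_chain_store[cid]
    assert hashlib.sha256(data).hexdigest() == cid, "data was altered off-chain"
    return data

large_file = b"..." * 100_000          # payload that would not fit in a block
cid = ipfs_add(large_file)
anchor_on_chain(cid, owner="alice")
restored = ipfs_get(ledger[-1]["cid"])
print(len(restored), cid[:16])
```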


4 Materials and Methods
Bitcoin has proven that trade can take place without intermediaries. However, many challenges remained, and data protection was one of the main issues [5]. Financial experts predicted that storing transaction data for digital assets on the Blockchain could be risky if data protection is not guaranteed. The control problem of Blockchain is that transactions must be disclosed transparently and approved by all participants. Although many techniques have been investigated, another limitation is that the reliability of the assets exchanged on the Blockchain largely depends on the reliability of the verifiers [6]. In this study, we propose a technique that addresses the verification problem using IPFS: it encrypts users' assets to effectively hide the exact transaction amount while still allowing each user to verify transactions. This technique improves privacy by encrypting the account balance and allowing anyone to verify the transparency of transactions. However, it still poses a risk of data misuse, because although the exact data is hidden, an approximate range can be derived from it. Therefore, the random number security algorithm should be improved to thoroughly hide the data (Fig. 4).

Fig. 4. The IPFS-based architecture for collaborative SDN network

5 Conclusion
This paper introduces a method to overcome the block capacity problem by combining the Blockchain with the IPFS structure. IPFS stores large encrypted files and produces hash values for them, while the Blockchain stores only these hashes. Large stored files are then read by comparing the hash values stored in the Blockchain with the values stored in IPFS. However, data imported from outside may itself contain errors. Therefore, the reliability of the data itself, which may be affected by factors such as omission or manipulation, has yet to be addressed.


Acknowledgements. This work has been financially supported by the Information and Communication Technology Promotion Center, funded by the South Korean government (Ministry of Science, Technology and Information) in 2018 (No. 2022-2017-0-01628, Human resource training in information and communication technology).

References
1. Cohen, B.: Incentives build robustness in BitTorrent. In: Workshop on Economics of Peer-to-Peer Systems, vol. 6, pp. 68–72 (2003)
2. Huang, H., Lin, J., Zheng, B., Zheng, Z., Bian, J.: When blockchain meets distributed file systems: an overview, challenges and open issues. IEEE Access 8, 50574–50586 (2020)
3. Pham, V.-D., et al.: B-Box - a decentralized storage system using IPFS, attribute-based encryption and blockchain. In: 2020 RIVF International Conference on Computing and Communication Technologies (RIVF), pp. 1–6 (2020)
4. Routray, S., Ganiga, R.: Secure storage of electronic medical records (EMR) on interplanetary file system (IPFS) using cloud storage and blockchain ecosystem. In: 2021 Fourth International Conference on Electrical, Computer and Communication Technologies (ICECCT), pp. 1–9 (2021)
5. Tiwari, A., Batra, U.: IPFS enabled blockchain for smart cities. Int. J. Inf. Technol. 13(1), 201–211 (2020). https://doi.org/10.1007/s41870-020-00568-9
6. Hussien, H.M., Yasin, S.M., Udzir, N.I., Ninggal, M.: Blockchain-based access control scheme for secure shared personal health records over decentralised storage. Sensors (Basel, Switzerland) 21(7), 2462 (2021). https://doi.org/10.3390/s21072462

Key Exchange Protocol Based on the Matrix Power Function Defined Over IM16
Aleksejus Mihalkovich(B), Eligijus Sakalauskas, and Matas Levinskas
Department of Applied Mathematics, Kaunas University of Technology, Kaunas, Lithuania
[email protected]

Abstract. In this paper we propose a key exchange protocol (KEP) based on the so-called matrix power function (MPF) defined over a non-commuting platform group. In general, it is not possible to construct a KEP using a non-commuting platform group directly. We therefore propose special templates for the public parameters, which allow us to construct a KEP relying on the basic properties of our MPF. The security analysis is based on the decisional Diffie-Hellman (DDH) attack game. We prove that the distribution of the entries of the public session parameter matrices and the shared key matrix asymptotically approaches the uniform distribution at an exponential rate. Hence the proposed KEP is secure under the DDH assumption, which implies that it is also not vulnerable to the computational Diffie-Hellman (CDH) attack. We present evidence of CDH security by numerical simulation of a linearization attack and show that it is infeasible.

Keywords: Non-commutative cryptography · Matrix power function · Key exchange protocol

1 Introduction

Nowadays the usage of non-commuting algebraic structures to establish cryptographic primitives such as key exchange or asymmetric encryption is considered a promising trend in cryptography. Some of the first key exchange protocols that used non-commuting algebraic structures as a platform were described in [2] and [6]. However, in [20] the authors showed that the Ko-Lee key exchange protocol can be attacked by switching from the conjugacy search problem to the double coset problem, which is easier to solve. The mentioned protocols are examples of cryptographic primitives loosely based on the generalized discrete logarithm problem (DLP) [7]. This generalization was presented in [5] and was later applied to generalize such classic protocols as the Diffie-Hellman key exchange and the El-Gamal encryption scheme to the realm of non-commuting cryptography.


The idea of raising a matrix to some integer power was also used to propose new protocols in non-commuting cryptography. This approach led to the proposals published in [1] and [21]. However, these protocols still relied on a variation of the DLP defined over matrix groups. Furthermore, this problem was proven to be reducible to some extension of the regular field IF_q. Our research in non-commutative cryptography relies on the properties of the so-called matrix power function (MPF), which was first mentioned in [19]. The idea behind the definition of MPF is somewhat similar to regular matrix multiplication. Let us assume that the square m × m matrix W is defined over some multiplicative (semi)group S, whereas the square m × m matrices X and Y are defined over some ring of scalars IR. We denote the semigroups of square m × m matrices defined over the algebraic structures S and IR by Mat_m(S) and Mat_m(IR), respectively.

Definition 1. MPF is a mapping Mat_m(IR) × Mat_m(S) × Mat_m(IR) → Mat_m(S), denoted as ${}^{\mathbf{X}}\mathbf{W}^{\mathbf{Y}} = \mathbf{E}$, where the elements of matrix E = {e_ij} are calculated as follows:

$$e_{ij} = \prod_{k=1}^{m} \prod_{l=1}^{m} w_{kl}^{\,x_{ik} y_{lj}}.$$
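To make the definition concrete, here is a minimal sketch of the MPF computation, assuming (for illustration only) the commuting platform Z_p^* with exponents reduced modulo p − 1; the paper itself works over the non-commuting group IM16:

```python
# Matrix power function E = ^X W ^Y over the toy platform Z_p^* (illustration only).
p = 11                      # platform: multiplicative group modulo a prime
q = p - 1                   # power ring: integers modulo the group order

def mpf(X, W, Y):
    m = len(W)
    E = [[1] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            e = 1
            for k in range(m):
                for l in range(m):
                    # each base entry w_kl is raised to x_ik * y_lj; results are multiplied
                    e = (e * pow(W[k][l], (X[i][k] * Y[l][j]) % q, p)) % p
            E[i][j] = e
    return E

W = [[2, 3], [5, 7]]
X = [[1, 2], [3, 4]]
Y = [[2, 0], [1, 3]]
print(mpf(X, W, Y))
```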

Keeping with the definitions established in [11], we call the multiplicative (semi)group S a platform (semi)group and the numerical ring IR a power ring. Furthermore, we refer to W as a base matrix and to X, Y as power matrices. Over the years several cryptographic protocols based on this function have been proposed in [11,12,15,19]. However, these protocols used commutative algebraic structures as a platform. Consequently, the protocols described in [11,19] were attacked using linear algebra in [8]. The authors of the latter paper showed that the system of matrix equations used to find the private key of Alice (or Bob) can be transformed into a system of linear equations, and hence the mentioned protocols can be broken in polynomial time. Partly due to the described attack, improvements to the initial protocol presented in [11] were made in [16] and [17]. Furthermore, an analysis of an improved version of the so-called matrix power asymmetric cipher (MPAC) was performed, focusing on defeating the linear algebra attack and generating secure public parameters [10,17]. The authors also analyzed the performance of MPAC in [12,16,17], showing that it is possible to execute this protocol in embedded systems. Recently, the focus of research in this area has leaned towards using non-commutative algebraic structures to define the platform semigroup. The papers [14] and [18] showed that this direction is promising. In this paper we recall one of the possible non-commuting groups to be used as a platform for MPF. Previous investigation of the properties of the MPF defined over the group IM16, presented in [9], has shown that we can construct cryptographic primitives using this group as a platform.


Hence in this paper we present a key exchange protocol based on the MPF defined over IM16 and prove its validity. We also perform algebraic cryptanalysis of the presented protocol, focusing on private key recovery.

2 Preliminaries

Let us first recall the definition of the non-commuting group IM16 as presented in [4]:

$$\mathbb{M}_{16} = \left\langle a, b \mid a^{8} = e,\; b^{2} = e,\; b a b^{-1} = a^{5} \right\rangle. \qquad (1)$$

The index of the defined group indicates that there are exactly 16 distinct elements in it. One of these is the neutral element e. Note also that the generators a and b do not commute: in fact a^5 b = ba and b a^5 = ab. This follows directly from the definition of IM16 since b^{-1} = b. Hence this group is not isomorphic to the commuting group ZZ_8 × ZZ_2. In fact it was shown in [4] that IM16 is one of the seven non-commuting groups of size 16 which are not isomorphic to any other group. It is important to note that each element of the form a^{k_1} b can be represented by an element of the form b a^{k_2}, depending on the parity of the power k_1. For more clarity, we present a table of equivalent elements of IM16 in Table 1.

Table 1. Equivalent elements in IM16

| ab   | a^2 b | a^3 b | a^4 b | a^5 b | a^6 b | a^7 b |
| ba^5 | ba^2  | ba^7  | ba^4  | ba    | ba^6  | ba^3  |

For the key exchange protocol to function properly we have to define the normal form of the elements of IM16. We pick the latter form b a^k and consider it normal. Of course, our results stay valid if we switch to the opposite form of the elements. Let us formulate as propositions the basic operations with the elements of IM16. Note that we are going to use powers α, α_1, α_2 ∈ {0, 1} and k, k_1, k_2 ∈ {0, 1, 2, ..., 7}. We also keep in mind that powers of the generator a are reduced modulo 8 and powers of the generator b are reduced modulo 2.

Proposition 1. Given two elements w_1 = b^{α_1} a^{k_1} and w_2 = b^{α_2} a^{k_2}, their product is calculated in the following way:

$$w_1 \cdot w_2 = \begin{cases} b^{\alpha_1+\alpha_2} a^{k_1+k_2}, & \text{if } k_1 \text{ is even};\\ b^{\alpha_1} a^{k_1+k_2}, & \text{if } k_1 \text{ is odd and } \alpha_2 = 0;\\ b^{\alpha_1+1} a^{k_1+k_2+4}, & \text{if } k_1 \text{ is odd and } \alpha_2 = 1. \end{cases} \qquad (2)$$


Proof. The case of α_2 = 0 is trivial and hence we omit it. In the case of α_2 = 1 we consider the middle term a^{k_1} b. Due to the definition of the group IM16 we have:

$$a^{2l} b = b a^{2l}; \qquad (3)$$
$$a^{2l+1} b = b a^{2l+5}. \qquad (4)$$

By applying one of the equalities (3) or (4) we obtain one of the cases defined by (2), depending on the parity of k_1.

Proposition 2. Given an element w = b^{α} a^{k}, its n-th power is calculated in the following way:

$$w^{n} = \begin{cases} a^{kn}, & \text{if } \alpha = 0;\\ b^{n} a^{kn}, & \text{if } \alpha = 1 \text{ and } k \text{ is even};\\ b^{n} a^{kn + 4\lfloor n/2 \rfloor}, & \text{if } \alpha = 1 \text{ and } k \text{ is odd}, \end{cases} \qquad (5)$$

where \lfloor n/2 \rfloor stands for the integer part of n/2.

2 wk = bak = bak bak = bbak+4 ak = a2k+4 ; 2

4

w4 = bak = a2k+4 = a4k , since all the powers of the generator a are reduced modulo 8. Hence we can see, that an extra summand of 4 appears when rising to either second or sixth power. For odd values of n we can now use the squaring algorithm together with the second case of formula (2) to obtain the extra summand of 4 when rising to the third or seventh power. Let us now consider the function f (n) = 4 n2 mod 8. Table 2. Values of Function f (n) n

0 1 2 3 4 5 6 7

f (n) 0 0 4 4 0 0 4 4

We clearly see from Table 2, that the function f (n) indicates if the extra summand of 4 appears or not. This proves validity of formula (5). Corollary 1. Given an element w = bα ak its inverse is calculated in the following way: ⎧ a−k , if α = 0; ⎨ −1 −k (6) w = ba , if α = 1 and k is even; ⎩ 4−k ba , if α = 1 and k is odd.

KEP Based on MPF Over IM16

515

Proof. Recall, that −1 ≡ 7 mod 8. Hence, formula (6) is a special case of formula (5) when n = 7. The group IM16 has the following trivial cyclic subgroup of order 8:

a = e, a, a2 , a3 , a4 , a5 , a6 , a7 .

(7)

Furthermore, by considering the elements of M16 we can derive the following cyclic subgroup of order 8:

(8) ba = e, ba, a2 , ba3 , a4 , ba5 , a6 , ba7 Using the group IM16 as a platform group for structure of a base matrix W. ⎛ baω11 aω12 · · · bα1c aω1c ⎜ aω21 aω22 · · · bα2c aω2c ⎜ ⎜ ··· ··· ··· ··· ⎜ ω ωi2 αic ωic i1 a · · · b a a W=⎜ ⎜ ⎜ ··· · · · · · · · · · ⎜ ω ⎝a (m−1)1 · · · · · · ··· baωm1 aωm2 · · · bαmc aωmc

MPF we define the following ⎞ · · · baω1m · · · aω2m ⎟ ⎟ ··· ··· ⎟ ⎟ · · · aωim ⎟ ⎟ ··· ··· ⎟ ⎟ · · · aω(m−1)m ⎠ · · · baωmm

(9)

Further in this paper we will specify the entries of this matrix in greater detail to be used for key exchange.

3

Properties of MPF

Previously in our research we presented several cryptographic primitives based on the MPF problem having some similarity with classical discrete logarithm problem (DLP) and defined in the following way: Definition 2. Given base matrix W and an MPF value matrix E find matrices X and Y in the following equation X

WY = E.

(10)

We can see, that this problem is based on the following property of MPF:

X Y X Y W , (11) W = i.e. the order of actions in (10) does not matter. This can be easily shown for any commuting platform semigroup, which is used to define entries of matrices W and E. This result is also valid for the so-called modified medial semigroup. We have previously used this semigroup to construct a key exchange protocol. We also showed in [14], that obtaining a private key from Alice’s (or Bob’s) public key is an NP-complete problem. However, the identity (11) in general does not hold in the case of noncommuting platform semigroup as is in our case. Hence we can define the following functions:

516

A. Mihalkovich et al.

Definition 3. If the actions in MPF are performed from left to right, then we call this function the left-to-right MPF (LRMPF), i.e. ELR =

X

W

Y

.

(12)

Definition 4. If the actions in MPF are performed from right to left, then we call this function the right-to-left MPF (RLMPF), i.e.

ERL = X WY . (13) It is clear, that in the case of commuting platform semigroup ELR = ERL = E due to the property (11) as was in our previous research. Furthermore, MPF defined over a commuting platform semigroup has the following properties:

U X W = UX W; (14)

Y V W = WYV . (15) The latter identities do not hold in general if the platform semigroup is noncommuting. Hence the key exchange protocol, defined previously in [19] cannot be executed between Alice and Bob due to failure of properties (11), (14) and (15) in case of non-commuting platform semigroup. However, we can use some facts from our previous research to establish a working key exchange protocol between Alice and Bob. To achieve this goal, we have previously defined templates in our paper [9]. The presented templates are based on the fact, that MPF properties hold in case of commuting entries of base matrix W. More precisely we consider columns of matrix W in case of LRMPF and rows of matrix W if RLMPF is used. In this paper we focus on LRMPF and hence consider the columns of matrix W. Of course, all of our results obtained in this paper also hold for RLMPF if we consider the rows of matrix W.

4

Key Exchange Protocol

We start by slightly modifying the matrix W in the following way: ⎞ ⎛ 2ω +1 ω ba 11 a 12 · · · bα1c aω1c · · · ba2ω1m +1 ⎜ a2ω21 aω22 · · · bα2c aω2c · · · a2ω2m ⎟ ⎟ ⎜ ⎟ ⎜ ··· ··· ··· ··· ··· ··· ⎟ ⎜ 2ω ωi2 αic ωic 2ωim ⎟ i1 ⎜ a ··· b a ··· a W=⎜ a ⎟ ⎟ ⎜ ··· ··· ··· ··· ··· ··· ⎟ ⎜ 2ω ω(m−1)m ⎠ ⎝a (m−1)1 · · · · · · ··· ··· a ba2ωm1 +1 aωm2 · · · bαmc aωmc · · · ba2ωmm +1

(16)

Note, that entries of each column of matrix W, aside from three columns, commute. These are the first and the last and the c-th one. Clearly, in general entries of matrix W do not commute. Also note, that we fixed the parity of

KEP Based on MPF Over IM16

517

powers of generators a and b in the first and last columns of the matrix W whereas the parity of other entries of this matrix is irrelevant. We use the following template for the left side matrices of (12). This template was previously defined in [9] for the LRMPF case. In that paper matrix W had the so-called corner form, i.e. a column of non-commuting entries was not considered. Template 1. Choose matrix X in (12) so that ∀i = 1, 2, . . . , m : xi1 + xim ≡ 0 mod 2. By applying the defined template to matrix X of (12) we obtain an intermediate result H = X W. The entries of matrix H in all the columns aside from the c-th one are various powers of the generator a. We present the proof of this fact while proving the validity of our protocol. However, the c-th column may contain any element of group IM16 . Hence we define an extra template for the right side matrices of (12) to eliminate the generator b in the public session parameter in KEP, since A = HY . Template 2. Choose matrix Y in (12) so that ∀j = 1, 2, . . . , c − 1, c + 1, . . . m : ycj ≡ 0 mod 4 and ycc ≡ 2 mod 4. Obviously, matrices, satisfying either of the defined templates, are singular modulo 2 and hence non-invertible modulo 8. Proof of this fact is trivial and follows directly from the basic properties of the determinant and modular arithmetic. Let us now assume that Alice and Bob desire to agree on a common key. Publicly known parameters are the following: – Square base m × m matrix W defined over IM16 and having the structure (16); – Square power m × m matrices L and R defined over ZZ 8 satisfying Templates 1 and 2 respectively. Alice performs the following actions to generate private and public data: 1. She chooses at random a vector α  = (α11 , . . ., α1m , α21 , . . ., α2m ) of 2m coefficients and uses it to calculate two matrices as polynomials of L and R respectively: X = α11 L + α12 L2 + . . . + α1m Lm ; Y = α21 R + α22 R2 + . . . + α2m Rm . Here it is important to note, that matrix Y has to satisfy Template 2. If it does not, then an extra term 2I can be added to the polynomial value. 2. She then uses the obtained values of X and Y to calculate matrix A in the following way: Y

A = XW

518

A. Mihalkovich et al.

Upon completing these steps Alice acquires her protocol data: private key  and her public session parameter P uKA = A. Alternatively the P rKA = α pair (X, Y) can be kept as a private key for faster execution. As usual, Alice publishes her public session parameter online. Bob performs actions similar to Alice’s to obtain his data:  = (β11 , . . ., β1m , β21 , . . ., β2m ) and uses it to 1. He chooses at random a vector β calculate two matrices as polynomials of L and R respectively: U = β11 L + β12 L2 + . . . + β1m Lm ; V = β21 R + β22 R2 + . . . + β2m Rm . Note, that matrix V has to satisfy Template 2. An extra term 2I can be added to the polynomial value if it does not. 2. He uses the obtained values of U and V to calculate matrices B in the following way: V

B = UW .  and his public session parameter Bob now has his private key P rKB = β P uKB = B, which is published online. Alice can use Bob’s public session parameter to obtain the following result: KA =

X Y B

(17)

Similarly, Bob can use Alice’s public session parameter to obtain a matrix KB =

U V A

(18)

Since Alice and Bob have two pairs of commuting matrices, i.e. XU = UX; YV = VY, they have agreed on a common key K = KA = KB . The proof of the validity of the presented key exchange protocol relies on the following facts: Fact 1. The polynomial structure of private matrices X and U preserves the validity of Template 1. Fact 2. The polynomial structure of private matrices Y and V preserves the validity of Template 2 in the following way: – If α21 ≡ 0 mod 2 (or β21 ≡ 0 mod 2), then no extra terms are needed; – If α21 ≡ 1 mod 2 (or β21 ≡ 1 mod 2), then an extra term 2I needs to be added.

KEP Based on MPF Over IM16

519

These facts follow directly from the definitions of basic actions with matrices (matrix sum and product, multiplication by a scalar) and modular arithmetic modulo 8, and hence their proof is omitted. We now prove the following fact for the intermediate result H = X W, where matrix X satisfies Template 1: Fact 3. The entries of matrix H in all the columns aside from the c-th one are various powers of the generator a. Proof. This fact comes from the equal parity of entries of the power matrix. More precisely, if, according to Template 1, for some value of index i the entries xi1 and xim are even, then generator b is eliminated due to the third case of (5). Otherwise, if for some value of index i the entries xi1 and xim are odd, then the generator b is eliminated due to the structure (16) of matrix W and the third   case of (2). Hence only the c-th column may contain any element of IM16 . The following fact regarding Template 2 is trivial since entries of the c-th column are even: Fact 4. The entries of matrix A = HY , where matrix Y satisfies Template 2 are various powers of the generator a. Corollary 2. Due to the defined templates and the latter two facts the consequent actions performed during the protocol execution are no different from regular MPF and hence we have: 

X

W

Y  V

=



X

W

V  Y

.

(19)

Fact 5. If the power matrices Y and V satisfy Template 2, then the following identity holds: 

Y  V X  U Y  V U X W = W . (20) Proof. Due to Fact 3 both matrices X and U eliminate generator b leaving only the c-th column to deal with. Note, that if yic or vic is a multiple of 4, then no extra summand of 4 appears when raising to that power according to Table 2. Also due to Template 2 ycc ≡ vcc ≡ 2 mod 4 and hence summand of 4 is added two times canceling each other out modulo 8. Remark 1. The latter fact is invalid if matrix V is not used, i.e. 

Y  X  U Y  U X = . W W It is important to note that the two defined templates are the main reasons that the protocol is valid since they grant us identity (20). Furthermore, due to Facts 3 and 4, all the entries of the LRMPF value matrix are presented in their normal form.

520

A. Mihalkovich et al.

Note, however, that in general MPF properties (14) and (15) do not hold for the considered structure (16) of the base matrix W. Upon executing the proposed key exchange protocol Alice and Bob obtain a shared key since

Y  

V Y = KA = X B = X U W =

 

U X

W

Y V

=

U V A = KB ,

which is true due to identities (19) and (20).

5

Resistance of the Proposed KEP Against Decisional Attack

Let us now consider the distribution of entries of the LRMPF value matrix, i.e. we focus on the expression (24) and consider the probability P r[aij = as0 ], where s0 is some fixed power n of generator a. Define s(n) = i=1 xi yi . We start by proving the following lemma: Lemma 1. Let xi ∈ ZZ 2 and yj ∈ ZZ 2 , where i = 1, 2, . . . , n, j = 1, 2, . . . , n, be uniformly distributed random variables. The limit lim P (s(n) = s0 ) = 12 , n→+∞

where s0 ∈ ZZ 2 is a fixed value.

Proof. We prove this result by applying induction with respect to n. Since for all natural values of i and j we have P r[xi = 0] = P r[xi = 1] = 12 and P r[yj = 0] = P r[yj = 1] = 12 we determine the values of probabilities of P r[s(n) = s0 ] after n iterations. Furthermore, we also inspect their deviations from 12 . The following is true if n = 1: P r[s(1) = 0] = P r[x1 y1 = 0] = P r[s(1) = 1] = P r[x1 y1 = 1] =

3 4 1 4

= =

1 2 1 2

+ −

1 22 ; 1 22 .

1 . We can see that the deviation term σ(1) = 212 = 21+1 We now calculate the next iteration, i.e. for n = 2 we have:

P r[s(2) = 0] = P r[s(2) = 1] =

5 8 1 4

= =

1 2 1 2

+ −

1 23 ; 1 23 .

1 Hence the deviation term is σ(2) = 213 = 22+1 . Let us now assume that for n = k the following holds:

P r[s(k) = 0] = P r[s(k) = 1] =

3 4 1 4

= =

1 2 1 2

+ −

1 ; 2k+1 1 . 2k+1

Due to the obvious identity P r[s(k + 1) = s0 ] = P r[s(1) = 0] · P r[s(k) = s0 ]

KEP Based on MPF Over IM16

521

+P r[s(1) = 1] · P r[s(k) = s0 − 1] we get the following result by calculating probabilities for n = k + 1: P r[s(k + 1) = 0] = ( 12 + P r[s(k + 1) = 1] = ( 12 +

1 ) 2k+1 1 ) 2k+1

· ·

3 4 1 4

+ ( 12 − + ( 12 −

1 ) 2k+1 1 ) 2k+1

· ·

1 4 3 4

= =

1 2 1 2

+ −

1 2k+2 1 2k+2

1 We can see that the deviation σ(k + 1) = 2(k+1)+1 and hence induction assumption holds for all natural values of n. By calculating the limit when n tends to positive infinity we get:   1 1 1 ± n+1 = . lim P r[s(n) = s0 ] = lim n→+∞ n→+∞ 2 2 2

Hence the result is proven. Using this lemma as a basis we can show that the following results are true: Lemma 2. Let xi ∈ ZZ 4 and yj ∈ ZZ 4 , where i = 1, 2, . . . , n, j = 1, 2, . . . , n, be uniformly distributed random variables. The limit lim P r[s(n) = s0 ] = 14 , where s0 ∈ ZZ 4 is a fixed value.

n→+∞

Lemma 3. Let xi ∈ ZZ 8 and yj ∈ ZZ 8 , where i = 1, 2, . . . , n, j = 1, 2, . . . , n, be uniformly distributed random variables. The limit lim P r[s(n) = s0 ] = 18 , where s0 ∈ ZZ 8 is a fixed value.

n→+∞

To shorten this paper we omit the proofs of these lemmas. Using Lemmas 1–3 we prove the main proposition: Proposition 3. Let L and R be two randomly generated matrices satisfying Templates 1 and 2 and let X and Y be two matrices calculated as polynomials of L and R while also satisfying these templates. Also, let W be a randomly generated matrix having the structure (16). Entries of matrices L, R, W are chosen uniformly from subsets of possible values and coefficients of polynomials are distributed uniformly in ZZ 8 . For any entry aij of the LRMPF value matrix A in expression (24) the limit lim P r[aij = aω0 ] = 18 . m→+∞

Remark 2. Based on the templates defined above we have: – For entries of matrix L: with the exception of the last column lij ∈ 0, 7 and hence P r[lij = l0 ] = 18 , where l0 ∈ 0, 7 is fixed. Entries of the last column satisfy Template 1 and hence P r[lim = l0 |li1 ≡ l00 mod 2] = 14 , where l0 is fixed and is in the set [0, 2, 4, 6] if l00 = 0 and is in the set [1, 3, 5, 7] if l00 = 1; – For entries of matrix R: with the exception of the c-th row rij ∈ 0, 7 and hence P r[rij = r0 ] = 18 , where r0 ∈ 0, 7 is fixed. Entries of the c-th row satisfy Template 2 and hence for j = c probability P r[rcj = r0 ] = 12 , where r0 ∈ [0, 4] and P r[rcc = r0 ] = 12 , where r0 ∈ [2, 6]. Remark 3. Based on structure (16) of matrix W we have:

522

A. Mihalkovich et al.

– For w11 , w 1m , wm1 , wmm the value is chosen uniformly from the subset  entries ba, ba3 , ba5 , ba7 , e.g. P r[w11 = baω0 ] = 14 , where ω0 ∈ [1, 3, 5, 7]; 1 – For entries of the c-th column P r[wic = w0 ] = 16 , where w0 ∈ IM16 ; 1 ω0 – For all other entries P [w11 = a ] = 8 , where ω0 ∈ 0, 7. Remark 4. We keep in mind that Lemmas 1-3 were proven using limits. For now we ignore this fact since otherwise we would have to apply limits to calculations of every probability in our proof. Proof. Relying on the polynomial structure of private matrices X, Y, U, V, Facts 1, 2 and Lemmas 1-3 we claim that the entries of these matrices satisfy statements presented in Remark 2. We omit the proof of this fact. Let us consider the intermediate result H = X W . We turn our attention to the simplest case first, i.e. we consider the columns of W containing only distinct powers of generator a. Hence we focus on the following expression: hij =

m 

(aωkj )

xik

,

k=1

where j = 1, c, m. Since, according to Template 1, we have xi1 ≡ xim mod 2, we can rewrite the latter expression in the following way:   

ω1j +ωmj xi1 m−1 x x −x hij = a (aωkj ) ik (aωmj ) im i1 . k=2

However, addition modulo 8 is a balanced function (any value of the sum modulo 8 of two elements of ZZ 8 is equally possible) and hence P (ω1j + ωmj = ω0 ) = 18 for any fixed value ω0 ∈ ZZ 8 . Denoting ω  = ω1j + ωmj we obtain the following expression distributed uniformly in the subset a defined by (7) based on Lemma 3:     xi1 m−1   ω ωkj xik (a ) . H = a k=2

Let us now denote the difference xim − xi1 = 2x . Since the power 2x ωmj is even modulo 8 let us denote 



P r[a2x ωmj = a0 ] = p0 ; P r[a2x ωmj = a2 ] = p2 ;   P r[a2x ωmj = a4 ] = p4 ; P r[a2x ωmj = a6 ] = p6 . Then the probability h0

P r[hij = a ] =

3 



P r[a2x ωmj = a2k ]P r[h = ah0 −2k ]

k=0

p0 + p 2 + p 4 + p 6 1 = , 8 8 where h0 ∈ ZZ 8 is a fixed value. =

KEP Based on MPF Over IM16

523

We now consider the first and the last columns of matrix W. The entries of matrix H in these columns are calculated as follows:   



2ω1j +1 xi1 m−1

2ωmj +1 xim x ik a2ωkj hij = ba , ba k=2

where j = 1 or m. Exploring this expression on Lemma 2 we get a    and relying uniform distribution in the subset a2 = e, a2 , a4 , a6 for the middle product: h =

m−1 



a2ωkj

xik

.

k=2

Due to Template 1 generator b is eliminated. Furthermore, since xi1 ≡ xim mod 2 and the power in h is always even, extra summand of 4 appears with probability 1 2 depending on the parity of xi1 . It is also important to note, that the power in on the balance of the modular addition we hij is always even and hence relying  get a uniform distribution in a2 for all entries of H in the considered columns, since P r[hij = a2h0 ] = P r[xi1 ≡ 0 mod 2]P r[w = a2h0 ] +P r[xi1 ≡ 1 mod 2]P r[w = a2h0 −4 ] = 12 · 14 + 12 · 14 = 14 ,

xi1  2ω +1 xim where w = ba2ω1j +1 h ba mj and h0 ∈ ZZ 4 is a fixed value. Lastly, exploring the c-th column of matrix W we note that generators a and b can be considered separately and hence due to Lemmas 1 and 3 and relying on the balance of the modular addition we claim that entries of the c-th column of matrix H are distributed uniformly in IM16 . It is clear that in general entries of matrix H have distinct distributions. We now move on to consider the distribution of the LRMPF result matrix A = HY . Since generator b is gone entries of this matrix are commuting. We then split the product m  y hikkj aij = k=1

into two parts

m 



a =

k=1,k=c y

y

hikkj ;

y

km a = hi11j hiccj hyim .



Due to Lemma 3 a is a uniformly distributed in a random variable. The power in a is always even and hence we get a uniform distribution in a for all entries aij since P r[aij = aω0 ] =

3  k=0

P r[a = aω0 −2k ]P r[a

524

A. Mihalkovich et al.

1 p0 + p 2 + p 4 + p 6 = , 8 8 where ω0 ∈ ZZ 8 is a fixed value and = a2k ] =

P r[a = a0 ] = p0 ; P r[a = a2 ] = p2 ; P r[a = a4 ] = p4 ; P r[a = a6 ] = p6 . This ends our proof. Let us now consider the deviation from the asymptotic uniform distribution of a single entry of the LRMPF value matrix. Note, that the significant parts of the deviations denoted by σ0 (n) in Lemmas 1–3 are as follows: σ0 (n) = 2−(n+log2 μ) ,

(21)

where μ is the modulo (2, 4, or 8). Furthermore, the significant part of deviation for any modulo μ we consider is obtained by dividing the term 2−n by that modulo. Based on this observation we claim that the deviation of a single entry of the LRMPF value matrix is expressed in the following way: σ(m) = σ0 (m) ± O(4−m ), where σ0 (m) is defined by (21) for any value of square matrix order m. Due to the obtained result, we define the following decisional game for our KEP. Decisional Diffie-Hellman Attack Game for LRMPF KEP Let L, R, W be public parameters of our KEP and let α  = (α11 , . . . , α2m ) and  β = (β11 , . . . , β2m ) be two secret vectors of coefficients used to calculate private Y

V

and U W be public session keys of both parties X, Y, U and V. Let X W  

Y V . parameters of KEP. Assume that the shared key is K = U X W For a given polynomial-time adversary A and a challenger we define the following experiment: 1. The challenger chooses at random bit rbit ← [0, 1].   2. Challenger chooses at random two vectors of coefficients α   = (α11 , . . . , α2m )     = (β , . . . , β ). He uses these coefficients to compute matrices X , and β 11 2m Y , U and V . Furthermore, he uses the obtained matrices to calculate public   Y V   U X W . session parameters A , B and the shared key K1 =   

V Y He also calculates K0 = U X W and grants adversary with the data (A, B, Krbit ). 3. Adversary A outputs a bit rbit ∈ [0, 1].

KEP Based on MPF Over IM16

525

The outcome of the experiment is defined to be 1 if rbit = rbit and 0 otherwise. Denote Erbit an event that for an adversary A the outcome of the defined experiment is 1. His advantage in solving DDH problem for LRMPF KEP is DDHadv[A, LRM P F ] = |P r[E0 ] − P r[E1 ]|. We say that the DDH assumption holds for LRMPF if for all efficient adversaries A the quantity DDHadv[A, LRM P F ] is negligible when security parameter m > m0 . Since DDHadv[A, LRM P F ] depends on m, then DDHadv[A, LRM P F ] = |P r[E0 ] − P r[E1 ]| < negl(m).

(22)

Since the asymptotic uniform distribution of entries of matrices A, B and K was proven using limits with respect to m, the logical question is the size of this parameter. To clarify this moment we present the results of our experiments. We generated 500 instances of the presented KEP and kept track of frequencies of each power from 0 to 7 of generator a. To derive the frequency of each power we divided the total number of the appearance of the certain power by the 500m2 (total number of observations). The expected value of each frequency was 0.125 = 18 , which we marked with a dotted line. We considered both public matrices A and B and the shared key K. The results are presented in Figs. 1, 2 and 3, where we used the 8-th power to represent 0-th power (since 8 ≡ 0 mod 8) for more convenience.

Fig. 1. Distribution of single entry of alice public session parameter A

Fig. 2. Distribution of single entry of bob public session parameter B

526

A. Mihalkovich et al.

Fig. 3. Distribution of single entry of the shared key K

We can see, that the difference from the uniform distribution becomes less noticeable as m increases, which illustrated the validity of Proposition 3. However, according to the obtained results, the value m0 in the defined game has to be at least no less than 16 to achieve the desired distribution. A larger value of m0 can be considered if required. In this paper we set m0 = 16. Relying on Proposition 3 and the performed experiments we claim that no effective adversaries A to gain an advantage in solving DDH problem for LRMPF exist when m > 16. Furthermore, due to the significant part of the deviation from the uniform distribution defined by (21) for a single entry, the negligible probability in (22) can be evaluated as: negl(m) = 2−m

2

(m+3)

,

(23)

since there are m2 entries in the LRMPF value matrix. Resistance against DDH attack implies resistance to computational DiffieHellman (CDH) attack [3]. We illustrate this postulate by investigating resistance against linearization attack using digital simulation.

6

Resistance of the Proposed KEP Against Computational Attack

Throughout this section we assume that an adversary possesses at least one of the protocol parties (Alice’s or Bob’s or both) public session parameter i.e. matrix A or B. The goal of the attacker is to recover (or at the very least find a collision) the private key of either party of the proposed KEP. Without loss of generality we focus on Alice’s public session parameter A and wish to recover secret matrices X and Y. In this case, the purpose of an adversary is to solve the following system of equations with respect to X and Y given that the rest of the information is publicly known: A=

X

W

Y

,

(24)

where matrices X and Y are L and R, in a linear span

of powers of matrices i.e. X ∈ Span L, L2 , . . . Lm and Y ∈ Span I, R, R2 , . . . Rm and matrices L

KEP Based on MPF Over IM16

527

and R satisfy predefined Templates 1 and 2 respectively. It is important to note that an adversary cannot ignore or modify these templates since otherwise the protocol falls apart due to failure of properties (19) and (20) as presented in the examples above. Hence he has to play by the rules described in the presented templates. Note, that reduction of powers of generator a is performed modulo 8. Hence relying on the definitions of basic operations in IM16 given by (2), (5) and (6) we consider the following matrix equation: S ≡ (XTY + 4Z) mod 8,

(25)

where an adversary has to determine the values of unknown matrices X, Y and Z given that the rest of the information is publicly known. Here linear terms are used to take into consideration the non-commutativity of IM16 . Matrices T and S consist of powers of generator a in matrices W and A, respectively. Equation (25) can clearly be simplified modulo 2. For this reason, we focus on the following reduced matrix equation: S ≡ XTY mod 2,

(26)

where matrices X and Y are as described above for Eq. (24) whereas matrices T and S are defined over the field ZZ 2 . Note, that due to Templates 1 and 2 matrices X and Y are singular modulo 2. This comes from the requirement of Template 1, which means, that xi1 = xim in the field ZZ 2 and hence matrices X and U contain a pair of identical columns. Furthermore, the requirement of Template 2 means that a zero row is present in matrices Y and V when reduced modulo 2. As stated previously, any usage of invertible matrices disrupts the execution of the protocol, which was previously shown in the examples above. Due to this result matrices X and Y are singular modulo 2 and hence no inverse matrices X−1 and Y−1 modulo 8 can be found. This makes it impossible to linearize the non-linear system (26) by multiplying each equation of it by an inverse matrix, i.e. the following transformations (27) and (28) cannot be computed: X−1 S ≡ TY mod 2;

(27)

SY−1 ≡ XT mod 2.

(28)

Since transformations (27) and (28) are not possible, we have to consider Eq. (26) as is. Keeping in mind the polynomial structure of private matrices X and Y we rewrite the latter equation as follows: S = (α11 L + . . . + α1m Lm ) T (α21 R + . . . + α2m Rm ) .

(29)

We can now expand each of these expressions to get explicit sums of matrices. As a result of these transformations we obtain a system of multivariate quadratic (MQ) equations with respect to unknowns α11 , . . . , α1m , α21 , . . . , α2m .

528

A. Mihalkovich et al.

This problem was shown to be hard if the coefficients of these equations are chosen at random [13]. Despite the fact, that this is not the case, we think, that, at the very least, it is evidence that the Eq. (26) is hard to solve. In this paper we consider the linearization technique of solving such equations. The essence of this approach is the introduction of new temporary variables γk = α1i α2j , where i = 1, 2, . . . , m, j = 1, 2, . . . , m and k = (i − 1)m + j. Hence there are m2 new unknowns γk , where k = 1, 2, . . . , m2 . Using these unknowns the Eq. (29) is transformed to the following form: Nγ = s,

(30)

where N is an m × m linearization matrix obtained from coefficients of MQ equations and s is a column matrix obtained from matrix S by vertically binding columns, i.e. T

s = s11 · · · s1m s21 · · · s2m · · · smm . 2

2

Equation (30) can be solved using Gauss elimination thus finding values of the vector of temporary unknowns γ . Due to this fact recovery of the initial vector α  is a matter of time. The main remaining question is the defect of the linearization matrix N, which determines the number of free variables. An important fact to note is that none of the matrices L, R and T have a full rank of m due to Templates 1 and 2 and the structure (16) of matrix W which, of course, defines the structure of T. Through experiments we were able to determine, that the maximum possible value of the rank of matrix N is (m − 1)2 . Though the proof of this fact is an open question so far, relying on this result we claim that the effectiveness of linearization technique is no better than brute force approach, i.e. total scan of possible values of coefficients α11 , . . . , α1m , α21 , . . . , α2m . The only case, when the linearization method has an advantage over total scan is the maximum rank of N. However, this is fairly simply avoidable. For more clarity let us remark on the performed experiments. We randomly generated a triplet of public parameters i.e. matrices L, R, W which satisfy the predefined above conditions, i.e. Templates 1 and 2 and (16). Using MATLAB 2016a software we calculated all powers of matrices L and R up to the m-th one and hence obtained linearization matrix N. We used the standard MATLAB function gfrank() to calculate the rank of matrix N. Experiments were performed with values of m from 5 to 16 and 1000 times for each value. As an example, we present results obtained with m = 16: – The maximum value of rank(N) was 225 = 152 . This value was calculated 61 times out of total 1000 experiments; – The minimum value of rank(N) was 27; – In total there were 110 distinct values of rank(N) obtained; – The rounded average value of rank(N) was 189 indicating that the average tends to a maximum value rather than to a minimum value. Due to results obtained in this section and performed digital simulations, we define the following computational attack game:

KEP Based on MPF Over IM16

529

Computational Diffie-Hellman Attack Game for LRMPF KEP Let L, R, W be valid public parameters of the KEP presented in Sect. 4 chosen  = (β11 , . . . , β2m ) be two secret vectors at random. Let α  = (α11 , . . . , α2m ) and β of coefficients chosen at random used to calculate private keys of both parties

Y

V X, Y, U and V. Let A = X W and B = U W be public session parameters  

Y V of KEP. Assume that the shared key is K = U X W . For a given polynomial-time adversary A we define the following experiment: 1. A acquires public session parameters of both parties A and B. 2. Using computations A obtains a vector of coefficients δ11 , . . . , δ2m and calculates matrices: Ψ = δ11 L + δ12 L2 + . . . + δ1m Lm ; Υ = δ21 R + δ22 R2 + . . . + δ2m Rm . 3. The adversary A computes matrices K1 = Ψ AΥ and K2 = Ψ BΥ . The outcome of the experiment is defined to be 1 if K1 = K or K2 = K. Otherwise the outcome is defined to be 0. Denote by E [A, B, K] an event that for an adversary A the outcome of the defined experiment is 1. His advantage in solving the CDH problem for LRMPF KEP is CDHadv[A, LRM P F ] = P r [E [A, B, K] = 1] . We say that the CDH assumption holds for LRMPF if for all efficient adversaries A the quantity CDHadv[A, LRM P F ] is negligible when security parameter m > m0 . Since CDHadv[A, LRM P F ] depends on m, then CDHadv[A, LRM P F ] = P r [E [A, B, K] = 1] < negl(m). Relying on the results of the latter two sections and the result by Boneh and Shoup presented in [3] we claim that no effective adversaries A to gain an advantage in solving the CDH problem for LRMPF exist.

7

Conclusions

In this paper we used a non-commutative group IM16 as a platform of MPF. Investigation of its basic properties showed, that in general MPF is not associative in this case, and hence the order of actions has to be taken into the consideration. Due to this fact, we defined the left-to-right MPF and right-toleft MPF functions which together with specially selected templates allow us to construct KEP in this non-commuting case. Taking into account that secret pairs of power matrices (X, Y) and (U, V) must be commuting we chose them in polynomial form and proved that this structure of the private key matrices can be used to preserve the validity of the templates.

530

A. Mihalkovich et al.

Since templates of power matrices are singular the additional security is guaranteed. Adversary ignoring templates will cause the failure of execution of the presented KEP. The security of the proposed protocol is based on the decisional DiffieHellman (DDH) assumption adopted to our MPF. We have also proven that the distribution of a single entry of the public session parameter as well as the shared key matrices tends to uniform in a subset a . Furthermore, using experiments graphically presented in Figs. 1 - 3 we demonstrated that the uniform distribution of the single entry is achieved with a fairly small deviation when m > 16. A polynomial-time adversary has a negligible probability to win the DDH security game when m > 16. The asymptotic probability is defined by (23). This implies that our protocol is not vulnerable to computation DiffieHellman (CDH) attack as well. To demonstrate that we presented the evidence of CDH security by numerical simulation of linearization attack and showed that it is infeasible. Since power matrices have to satisfy specified templates, then the linearization approach to break the presented protocol is no more efficient than the total scan.

References ´ 1. Alvarez, R., Tortosa, L., Vicent, J., Zamora, A.: A Non-abelian group based on block upper triangular matrices with cryptographic applications. In: Bras-Amor´ os, M., Høholdt, T. (eds.) AAECC 2009. LNCS, vol. 5527, pp. 117–126. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-02181-7 13 2. Anshel, I., Anshel, M., Goldfeld, D.: An algebraic method for public-key cryptography. Math. Res. Lett. 6(3), 287–291 (1999) 3. Boneh, D., Shoup, V.: A graduate course in applied cryptography. 2020. Version 0.5 (2020) 4. Grundman, H., Smith, T.: Automatic realizability of galois groups of order 16. Proc. Am. Math. Soc. 124(9), 2631–2640 (1996) 5. Klingler, L.C., Magliveras, S.S., Richman, F., Sramka, M.: Discrete logarithms for finite groups. Computing 85(1–2), 3 (2009) 6. Ko, K.H., Lee, S.J., Cheon, J.H., Han, J.W., Kang, J., Park, C.: New public-key cryptosystem using braid groups. In: Bellare, M. (ed.) CRYPTO 2000. LNCS, vol. 1880, pp. 166–183. Springer, Heidelberg (2000). https://doi.org/10.1007/3540-44598-6 10 7. Lanel, G., Jinasena T., Welihinda B.: A survey of public-key cryptography over non-abelian groups. IJCSNS 21(4), 289 (2021) 8. Liu, J., Zhang, H., Jia, J.: A linear algebra attack on the non-commuting cryptography class based on matrix power function. In: Chen, K., Lin, D., Yung, M. (eds.) Inscrypt 2016. LNCS, vol. 10143, pp. 343–354. Springer, Cham (2017). https:// doi.org/10.1007/978-3-319-54705-3 21 9. Mihalkovich, A.: On the associativity property of mpf over m16

KEP Based on MPF Over IM16

531

10. Mihalkovich, A., Levinskas, M.: Investigation of matrix power asymmetric cipher resistant to linear algebra attack. In: Damaˇseviˇcius, R., Vasiljevien˙e, G. (eds.) ICIST 2019. CCIS, vol. 1078, pp. 197–208. Springer, Cham (2019). https://doi. org/10.1007/978-3-030-30275-7 16 11. Mihalkovich, A., Sakalauskas, E.: Asymmetric cipher based on MPF and its security parameters evaluation. Proc. Lithuanian Math. Soc. Ser. A 53, 72–77 (2012) 12. Mihalkovich, A., Sakalauskas, E., Venckauskas, A.: New asymmetric cipher based on matrix power function and its implementation in microprocessors efficiency investigation. Elektronika ir Elektrotechnika 19(10), 119–122 (2013) 13. Patarin, J., Goubin, L.: Trapdoor one-way permutations and multivariate polynomials. In: Han, Y., Okamoto, T., Qing, S. (eds.) ICICS 1997. LNCS, vol. 1334, pp. 356–368. Springer, Heidelberg (1997). https://doi.org/10.1007/BFb0028491 14. Sakalauskas, E.: Enhanced matrix power function for cryptographic primitive construction. Symmetry 10(2), 43 (2018) 15. Sakalauskas, E., Luksys, K.: Matrix power function and its application to block cipher s-box construction. Int. J. Inn. Comp., Inf. Contr. 8(4), 2655–2664 (2012) 16. Sakalauskas, E., Mihalkovich, A.: New asymmetric cipher of non-commuting cryptography class based on matrix power function. Informatica 25(2), 283–298 (2014) 17. Sakalauskas, E., Mihalkovich, A.: Improved asymmetric cipher based on matrix power function resistant to linear algebra attack. Informatica 28(3), 517–524 (2017) 18. Sakalauskas, E., Mihalkovich, A.: MPF problem over modified medial semigroup is np-complete. Symmetry 10(11), 571 (2018) 19. Sakalauskas, E., Listopadskis, N., Tvarijonas, P.: Key agreement protocol (KAP) based on matrix power function (2008) 20. Shpilrain, V., Ushakov, A.: The conjugacy search problem in public key cryptography: unnecessary and insufficient. Appl. Algebra Eng. Commun. Comput. 17(3–4), 285–289 (2006) 21. Stickel, E.: A new public-key cryptosystem in non abelian groups. Proceedings of the Thirteenth International Conference on Information Systems Development. Vilnius Technika: 70–80, Vilnius (2004)

Design and Analysis of Pre-formed ReRAM-Based PUF Taylor Wilson1(B) , Bertrand Cambou1 , Brit Riggs1 , Ian Burke1 , Julie Heynssens1 , and Sung-Hyun Jo2 1 Northern Arizona University, Flagstaff, AZ 86011, USA

[email protected]

2 Crossbar Inc., Santa Clara, CA, USA

https://in.nau.edu/cybersecurity

Abstract. We present a Resistive Random Access Memory based Physical Unclonable Function design that gives near-ideal characteristics with high reliability when operating in extreme temperature conditions. By injecting the cells with electric currents, the resistances are much lower than they are in the pristine state and significantly vary cell-to-cell. This property can be exploited to design cryptographic key generators and create quasi-infinite possible digital fingerprints for the same array. The physical unclonable functions operate at low power, in a range that does not disturb the cells; unlike what is done by forming permanently conductive filaments, and the SET/RESET program/erase processes, this design does not modify permanently the resistance of each cell. The novelty of this architecture is to exploit the physical properties of this memory technology by forming gentle ephemeral conductive paths. We evaluate the proposed device’s performance by various stress tests on 1 kb–180 nm ReRAM Technology. Keywords: Physical unclonable functions · Resistive Random Access Memory · Quasi-infinite digital fingerprints · Hardware security · Low power

1 Introduction Resistive Random Access Memory (ReRAM) is a strong candidate for leading nonvolatile memory solutions due to its simple fabrications of a metal-insulator-metal (MIM) structure, low power consumption, high scalability, and offers CMOS integration [1]. These features are desirable for next-generation computing but operating the technology at very low power, where the memristors (memory + resistor) are not retaining information, becomes unclonable, and makes for an attractive Physical Unclonable Function (PUF) design [2–8]. PUFs are low-cost security primitives used for authentication and secret key generation and can often be described as digital fingerprints. PUF architectures leverage the uncontrolled process variations to derive “secrets” that are unique to the chip and the underlying mechanism is based on Challenge-Response Pairs (CRPs). A challenge is an input or set of instructions given from the server to the client’s device containing the PUF. A response is an output generated from a specific challenge and © The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 K. Arai (Ed.): SAI 2022, LNNS 508, pp. 532–549, 2022. https://doi.org/10.1007/978-3-031-10467-1_33

Design and Analysis of Pre-formed ReRAM-Based PUF

533

CRPs must match to authenticate a client’s device. An overview of the PUF design follows; By sourcing small electric currents to the ReRAM technology, on the scale of nano-Amperes, a ReRAM cell creates ephemeral conductive paths that return unique resistances [9]. The cell to cell resistance variations exhibits strong non-uniformity and cells repeatedly queried to the same injected currents give almost the same resistances over a period. Here, operating at ultra-low currents, the electric fields are too weak to push ions into the dielectric to create permanent filaments which typically occur around 2–4 V. Our PUF design operates in a voltage potential range of 100–700 mV and no obvious damage to the cells has been found if deduced by analyzing the data. For memory applications and PUFs based on resistive switching, deterioration within the dielectric has been observed after a certain number of resistive switching cycles and this makes the technology unreliable [10–14]. Due to cell deterioration, a ReRAM-based PUF built on resistive switching may have their response drift thus producing an incorrect digital fingerprint over time. This unwanted effect makes the architecture more expensive both in terms of power and latency to address and correct the errors by implementing error-correcting schemes such as error correction codes (ECC) [15] or response-based cryptography (RBC) [16]. In this paper, we introduce a new methodology for ReRAM-based PUFs and evaluate its performance by various stress tests. The organization of the paper follows. Section 2 provides background information to understand the basis of the work. Starting with PUFs, which can be viewed as digital fingerprints to authenticate a client device and generate secret keys within, without assigning keys in digital memory. Next, the ReRAM technology that is used to build the PUFs, which is a three-layer cell with the middle layer being composed of an insulating material that restricts the flow of current, the output is resistance measured in Ohms (). Section 3 presents the experimental setup that describes the test chip and device layout, followed by the electrical equipment used to characterize the devices, then explains the implementation of the proposed PUF design. Section 4 gives PUF analysis. PUF analysis includes quantizing how the responses vary; at different temperature conditions, through consecutive current sweeps, for longer read periods of current injections, and against each other. Section 5 covers conduction mechanisms, or electron transport models, through dielectrics. The first model covered is Poole-Frenkel (P-F) Emissions, next is Ohmic Conduction (O-C), then Space Charge Limited Conduction (SCLC), and the last model covered is the Butler-Volmer (B-V) Equation. Section 6 concludes the paper.

2 Background Information 2.1 Physical Unclonable Functions In hardware security modules, traditional security primitives store secret digital keys in nonvolatile memory like FLASH or EEPROM (Electrically Erasable Programmable Read-Only Memory) with additional layers of protection making the security bulky and expensive [17, 18], whereas PUF configurations do not require the need to store

534

T. Wilson et al.

secret keys in nonvolatile memory at all. PUF architectures exploit intrinsic manufacturing process variations to derive "secrets" that are unique to the chip and are used as a source to securely authenticate integrated circuits (ICs). Even with full knowledge of the chip's design, PUFs leverage this uncontrolled variability so that no two chips of the same process are identical [19]. In a secure environment, an initial step referred to as "enrollment" is executed to characterize the PUF and generate CRPs. Here, the client's PUF receives a challenge, or set(s) of instructions, given from the server to characterize the PUF. The client generates responses from the challenge(s), and this information, usually in the form of a binary stream, is downloaded into server memory. The stored CRPs, with known PUF addresses, can be referenced for future authentication and key generation cycles between server and client. Due to repetitive use, aging, and changing environments that introduce response variation such as drift, responses can vary over time and introduce potential bit errors during key generation. This necessitates error correction codes (ECC) or response-based cryptography (RBC), which increases latency and power usage. Therefore, it is important to understand these effects and to minimize them for a reliable ReRAM-based PUF.

2.2 Resistive Random Access Memory Technology

In metal ion-based ReRAM, also known as electrochemical metallization memory (ECM), a single cell is composed of three layers: (1) an active top electrode (TE), usually Copper (Cu), Silver (Ag), or Aluminum (Al), that is susceptible to releasing positive ions (cations) such as Cu2+, Ag+, or Al3+ depending on the active electrode material; (2) an insulating material such as a metal oxide in which the electrons are tightly bound; and (3) a passive bottom electrode (BE), usually Tungsten (W), Titanium Nitride (TiN), or Platinum (Pt) [20]. This three-layer stack is illustrated in Fig. 1(a); its thickness is several nanometers.

Fig. 1. (a) Cross-sectional view of a basic ReRAM cell in a crossbar array. (b) SET and RESET operations.

When the memristor is subjected to a positive applied voltage, metal cations migrate into the insulating material, and this phenomenon creates a metal
conductive filament (CF). For memory applications, creating a CF abruptly drops the resistance from a high resistive state (HRS) to a low resistive state (LRS), and this mechanism can encode binary logic ('1'/'0'). Typically, the HRS is on the order of 10^4 Ω or greater, whereas the LRS usually falls below 10^3 Ω. A transition from HRS to LRS is referred to as the SET operation, and the reverse is the RESET operation, illustrated in Fig. 1(b). Currently, one of the issues ReRAM faces is switching endurance. During the switching transition, a soft breakdown of the dielectric occurs, allowing the cations to migrate. When the CF is removed by an applied negative voltage, residual metal atoms remain in the insulating material, which brings the HRS range closer to the LRS range, shrinking the read margin and making it difficult to distinguish the two states. A non-intrusive approach can be taken instead by injecting small currents, whose low electric fields do not influence cation migration. A current sweep from 100 nA to 8 µA in 30 nA increments is shown in Fig. 2 for nine devices. As the current increases and the resistances decrease, the cells show durability down into the 200 kΩ range, and when retested at the same currents they give back the same resistances. A case of cell damage would be viewed as resistances abruptly dropping further, below 10 kΩ, and not returning to their initial reads when retested at the same currents.

Fig. 2. 100 nA–8 uA current sweeps applied to nine devices in 30 nA increments.
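A minimal Python sketch of the current sweep and the damage criterion described above (resistances dropping below ~10 kΩ and not recovering on a retest); the read_resistance call is a hypothetical stand-in for the measurement hardware, not part of the original setup.

```python
import numpy as np

# 100 nA to 8 uA in 30 nA increments, as in Fig. 2
SWEEP_CURRENTS = np.arange(100e-9, 8e-6 + 1e-12, 30e-9)

def read_resistance(cell_id, current_a):
    """Hypothetical instrument call returning resistance (Ohms) at a forced current."""
    raise NotImplementedError("replace with the actual source-measure-unit driver")

def cell_damaged(cell_id, threshold_ohm=10e3, rtol=0.05):
    """Sweep the cell twice; flag damage if any reading drops below the
    threshold and the retest does not return to the initial read."""
    first = np.array([read_resistance(cell_id, i) for i in SWEEP_CURRENTS])
    retest = np.array([read_resistance(cell_id, i) for i in SWEEP_CURRENTS])
    dropped = first < threshold_ohm
    not_recovered = ~np.isclose(retest, first, rtol=rtol)
    return bool(np.any(dropped & not_recovered))
```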


Our research team hypothesizes that sweeping higher currents through the devices, which equates to higher voltages across the ReRAM cell, leads to electric fields large enough to form non-negligible conducting filaments in the switching medium, which is essentially a SET process. The SET process abruptly drops the pristine resistances to values below 1 MΩ when read at 100 nA, with relatively large variations.

Fig. 3. (a) Overview of the test chip. (b) Wafer-level view.


3.2 Device Layout

On the test chip, to prevent "sneak paths", i.e., unwanted current flow through surrounding cells that are not being addressed, the devices were arranged as shown in Fig. 4a. There are 20 probe pads per row, with 18 pads holding individual devices and the remaining two pads serving as ground terminals. To measure a given device, a read current was applied to the corresponding top electrode and the nearest ground pad was activated to complete the circuit. For example, the first device on the left would be grounded to the ground pad on the left. When a particular device is being read, all other devices are left floating as open circuits, so only one device is addressed during testing.

Fig. 4. (a) Row structure of individual devices. (b) Cross-sectional view of ReRAM cell by TEM analysis.

Figure 4b shows a cross-sectional view, obtained by TEM analysis, of the ReRAM technology used in this study; the stack composition from TE to BE is Al/AlOx/W. The TE and switching layer (SL) materials were deposited by atomic layer deposition (ALD), and the BE by sputtering. TE Cap 1 is composed of TiN, which serves to prevent inhomogeneous material (other than Al), such as the wiring metal and probe contact, from entering the SL when applying electrical tests to the devices. A sidewall passivation layer (AlOx) was also implemented to improve device stability by protecting the outer edges of the TE and SL. The TE and switching layer materials, Al/AlOx, were selected to provide
additional protection against invasive attacks, as Al cations cannot be readily identified if introduced into the SL.

3.3 Electrical Equipment

Keysight's B1500A Semiconductor Analyzer with a High Resolution Source Measure Unit (HRSMU) card was used to execute electrical tests on the ReRAM devices. A probe card (tip size: 1.5 mils, tip length: 7 mils) was interfaced to the HRSMU with Triax BNC connectors. Within a probe station, the wafer sits in a chamber on a grounded vacuum chuck, which is also elevated from the earth for precise probe measurements. For temperature testing at 0 °C, using the Model C1000 Cooling Module, the chamber is sealed off from ambient temperatures and the chuck is cooled by a coolant system. To prevent condensation build-up on the wafer surface, the enclosed chamber cycles compressed nitrogen through the system. The compressed nitrogen gives a low dew point, preventing water build-up on the surface and protecting against accidental shorts between the probes. For testing at 80 °C, using the Model H1001 Heat Control Module, the chuck serves as a heating element and is raised to high temperatures. Both temperature conditions can be adjusted by the Micromanipulator temperature controller. Tests at 23 °C are conducted at ambient temperature.

3.4 Pre-formed ReRAM-Based PUF Design

The design of the ReRAM-based PUF is unconventional, as it operates in a pristine (unformed) state with no additional changes or extra hardware required of the ReRAM technology. Any attempt to read the cells in the conventional way will destroy the PUFs, yet measurements show that, while working in the pristine state at low power, the cells prove robust if measured correctly. The conventional approach when working with ReRAM for memory and existing PUF applications is to source voltages to program the cells; however, sourcing small voltages at low power does not reproduce the same responses over time as sourcing a small current through the cells. When applying consecutive voltage sweeps to the devices, the responses deviate more from their initial values than when applying current sweeps. Therefore, the PUF is built by sourcing defined currents through the cells, with varying voltage potentials. Table 1 shows experimental data for forcing a 110 nA current into four random cells. The cells output unique resistances: due to process defects, each cell is naturally different from the others.

Table 1. Device outputs subject to 110 nA injections.

Cell #   Forced current (nA)   Resistance (MΩ)
Cell 1   110                   1.995
Cell 2   110                   1.616
Cell 3   110                   0.939
Cell 4   110                   1.689


To keep the computing power at a minimum and reduce drifting effects, we define the operating region from 50 to 800 nA with a voltage compliance of 1.8 V. Experimental tests show that operating under 800 nA gives the best stability for the PUFs. For the PUF protocol, current injections, i.e., applied read currents, are taken in 50 nA increments with a pulse width of 100 µs per current, which generates 16 different responses for a single memristor. The approximate maximum energy per reading is 56 pJ/bit. Due to the cell's stochastic nature at such low currents, the cell's resistance cannot be readily predicted, as shown in Fig. 5. For example, cell #112 at 50 nA is ranked fourth from the top but ends with the second-highest resistance at 800 nA. Another example is cell #35: at 50 nA it is ranked second from the bottom, but it ends at the fourth rank from the bottom by 800 nA. The devices shown in Fig. 5 are representative of the PUF technology.

Fig. 5. Resistance variations over increasing currents for several ReRAM devices (tested at room temperature).

During enrollment, a one-time full characterization of the ReRAM array will be executed. In this case, all devices will be read at 16 different injected currents, for a total of 50 read-cycles per current, and the median of the 50 responses will be downloaded into server memory. In the actual cryptographic protocol, the challenge from the server to the client PUF is to find the random addresses of the cells it was instructed to address and generate the responses for a given current. Key generation methodologies and their algorithms are outside the scope of this paper.
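As a rough illustration of the enrollment and authentication flow described above, the following Python sketch generates the 16 challenge currents (50–800 nA in 50 nA steps), stores the median of 50 reads per current for each addressed cell, and compares a later response against the enrolled value within a tolerance. The read_resistance function and the ±2% tolerance are hypothetical stand-ins, not specifics from the paper.

```python
import numpy as np

CHALLENGE_CURRENTS = np.arange(50e-9, 800e-9 + 1e-12, 50e-9)  # 16 read currents (A)
READS_PER_CURRENT = 50

def read_resistance(cell_address, current_a):
    """Hypothetical hardware call: force current_a through the addressed cell
    and return the measured resistance in Ohms."""
    raise NotImplementedError("replace with the source-measure-unit driver")

def enroll(cell_addresses):
    """One-time enrollment: median of 50 reads per challenge current per cell.
    The resulting table is what would be downloaded into server memory."""
    return {addr: {i: float(np.median([read_resistance(addr, i)
                                       for _ in range(READS_PER_CURRENT)]))
                   for i in CHALLENGE_CURRENTS}
            for addr in cell_addresses}

def authenticate(enrolled, addr, current_a, measured_ohm, tolerance=0.02):
    """Accept the response if it falls within a (hypothetical) +/-2% band
    around the enrolled median for that cell and current."""
    expected = enrolled[addr][current_a]
    return abs(measured_ohm - expected) / expected <= tolerance
```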


4 PUF Analysis

4.1 Temperature Testing

The cells were also tested under 0 °C and 80 °C temperature conditions; cells representing the population are shown in Fig. 6(a) and (b), respectively. Over the increasing currents, the cells show excellent durability when operating in extreme conditions, with no signs of cell damage. Recall that cell damage has occurred if the resistances abruptly drop below the 10 kΩ range, acting as a SET process, and do not return to their initial high resistances. A factor to account for is that the devices' resistances drift from their initial state when tested at different temperatures. This effect is seen in Fig. 6(c) for one device: notice that the resistance is lower at higher temperatures and vice

Fig. 6. (a)–(b) Devices tested under 0 °C and 80 °C temperature conditions, respectively. (c) One device's resistance as an example of resistances drifting differently at different temperatures (thermal effect). (d) 25 responses, tested at 100 nA, drifting nearly the same for three different temperatures.


versa for colder temperatures. The reason for this effect is that more energy is introduced into the system as the temperature increases: the atoms become more excited, allowing current to flow more easily, whereas decreasing the temperature has the opposite effect for insulating materials. Figure 6(d) shows a small population of memristors appearing to drift the same at 100 nA for three different temperatures. Here, each device was read at three different temperatures, and the expected outcome is that all responses stay true to the thermal effect. This is important because, if all cells drift the same way, the bit error will be low when the PUF operates in different temperature conditions. Future work will be to understand the bit error rates (BERs) for key generation across different temperatures.

4.2 Reliability

Responses need to be stable over multiple read cycles. To explore PUF reliability, the population's intra-cell variation is computed at three temperature conditions, 0 °C, 23 °C, and 80 °C, shown in Fig. 7(a)–(c), respectively. Intra-cell variation defines how a single cell's responses vary when repeatedly queried with the same challenge, in this case 50 consecutive current injection cycles at 100 nA. Ideally, the variation is expected to be 0%, but due to systematic noise and other parasitic effects some variation will be introduced. Equation (1) is used to calculate a cell's intra-cell variation percentage

Fig. 7. (a)–(c) Intra-cell variation for 1,106 responses, tested at 100 nA, in 0 °C (blue), 23 °C (green), and 80 °C (red) conditions. For each temperature, the average intra-cell variation percentage for the population is given by a line.


and was applied to 1,106 cells individually. Referring to Fig. 7, the cells show high reliability, as the majority of the responses reside below the 1% threshold at all given temperatures.

\text{intra-cell variation}(\%) = \frac{\mathrm{STDEV}(r_{50})}{\mathrm{median}(r_{50})} \times 100\% \quad (1)
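Equation (1) is straightforward to compute; the short Python sketch below applies it to an array of 50 repeated reads of a single cell (the sample values are synthetic and purely illustrative).

```python
import numpy as np

def intra_cell_variation(reads):
    """Equation (1): standard deviation over median of repeated reads at a
    fixed current, expressed as a percentage."""
    r = np.asarray(reads, dtype=float)
    return np.std(r, ddof=1) / np.median(r) * 100.0

# Illustrative example: 50 synthetic reads of one cell at 100 nA (Ohms)
rng = np.random.default_rng(0)
reads = rng.normal(loc=1.8e6, scale=5e3, size=50)
print(f"intra-cell variation: {intra_cell_variation(reads):.3f} %")
```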

4.3 Stability

Random devices across the wafer were tested at 100 and 800 nA current injections for 10 million read-cycles, taken at 100 µs intervals, for an approximate run time of 17 min per device. Figure 8 shows resistance samples extracted in 10x increments plotted on a double-logarithmic scale. The devices prove to be robust over extended read periods, as there appears to be no significant response drift or cell damage. This property is highly desirable as it improves the reliability of the PUF.

Fig. 8. Devices tested for stability at 100 nA (blue) and 800 nA (light blue) current injections for 10 million read-cycles.
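A minimal sketch of how such a long read sequence could be sampled in decade (10x) increments for a double-logarithmic plot; the read_once callable is a hypothetical placeholder for the instrument interface.

```python
import matplotlib.pyplot as plt

TOTAL_READS = 10_000_000
# Decade (10x) sample points: 1, 10, 100, ..., 10_000_000
SAMPLE_POINTS = {10 ** k for k in range(8)}

def run_stability_test(read_once):
    """read_once() is a hypothetical callable returning one resistance value
    (Ohms); only the decade-indexed samples are kept for plotting."""
    samples = {}
    for cycle in range(1, TOTAL_READS + 1):
        r = read_once()
        if cycle in SAMPLE_POINTS:
            samples[cycle] = r
    return samples

def plot_samples(samples, label):
    cycles, resistances = zip(*sorted(samples.items()))
    plt.loglog(cycles, resistances, 'o-', label=label)
    plt.xlabel('read cycle')
    plt.ylabel('resistance (Ohm)')
    plt.legend()
```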


4.4 Randomness

A response mapping of a PUF, tested at room temperature with 100 nA current injections, can be seen in Fig. 9(a). The responses range from 1 to 3 MΩ and show excellent cell-to-cell variation. For key generation applications, the median of the population's resistance can be calculated (1.83 MΩ for the PUF shown in Fig. 9a) and serve as a binary threshold, where the subpopulation above the median is defined as "1" and the subpopulation below as "0", resulting in a ~50% chance of selecting either a 1 or a 0 bit. Responses near the binary threshold can potentially introduce bit errors by flipping from their initial state to the opposite state. To minimize this effect, a third state called the ternary state ("X") can be implemented, illustrated as a pair of red lines acting as a boundary in Fig. 9(a). This state can be defined as a small percentage range below and above the median (e.g., ±2%), and the cells within this range are masked, i.e., not used for key generation cycles. Larger ternary bounds result in lower bit error rates, because fewer erratic cells near the binary threshold are selected. Therefore, only cells away from the median are used for key generation cycles, giving highly reliable secret keys and reducing the need for ECC. Recall from Sect. 3.4 that the cells' responses from 100 to 800 nA current injections cannot be readily predicted, as they do not drift the same way; consequently, each PUF mapping at an increased current injection is independent of the others, giving a high degree of randomness.

Fig. 9. (a) Scatter plot of 1024 responses subject to 100 nA current injections for PUF 1. (b) A comparison of two different PUFs and their responses, also subject to 100 nA current injections.
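A minimal sketch of the binary/ternary quantization described above, assuming a ±2% ternary band around the population median; the resistance population is synthetic and purely illustrative.

```python
import numpy as np

def quantize_responses(resistances, ternary_band=0.02):
    """Map each response to '1' (above the median), '0' (below the median), or
    'X' (within +/- ternary_band of the median; masked, not used for keys)."""
    r = np.asarray(resistances, dtype=float)
    med = np.median(r)
    bits = np.where(r > med, '1', '0')
    in_band = (r >= med * (1 - ternary_band)) & (r <= med * (1 + ternary_band))
    bits[in_band] = 'X'
    return bits, med

# Illustrative synthetic population of 1024 responses (Ohms)
rng = np.random.default_rng(1)
population = rng.uniform(1.0e6, 3.0e6, size=1024)
bits, median = quantize_responses(population)
print(f"median = {median / 1e6:.2f} MOhm, masked cells = {(bits == 'X').sum()}")
```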


A comparison of two different PUFs, or dies, from the same process can be seen in Fig. 9(b) under 100 nA current injections. The responses exhibit high entropy. Notice that the population median of PUF 2 is slightly lower than that of PUF 1, giving a newly defined range of 1 and 0 bits for PUF 2's devices. This generates a completely new digital fingerprint.

5 Conduction Mechanisms

Several models of conduction mechanisms through dielectrics that describe electron transport have been proposed in the literature. These conduction mechanism models can simulate experimental data by fitting the data on transformation plots and solving for unknown parameters. We consider four common models in this study: Poole-Frenkel (P-F) Emissions, Ohmic Conduction (O-C), and Space Charge Limited Conduction (SCLC) are fitted, and the Butler-Volmer (B-V) Equation is simulated. P-F, O-C, and SCLC are classified as bulk-limited conduction mechanisms, where conduction depends on the bulk properties and electrons can be emitted from traps by thermal activation. B-V models ion conduction; although ion migration is suppressed in the PUF design, the experimental data gives a decent fit against the model and it is therefore considered in this study. Exploring theoretical aspects of the PUFs is important, as it increases our understanding of how the PUF design works and also enables an optimized approach to the design. By determining the conduction mechanism for the PUFs, the corresponding model(s) can be used to implement a "secret model" that emulates the PUF and its CRP behavior, otherwise known as a model-based PUF [19]. Implementing this approach drastically reduces the overall processing time and power consumption of the PUF design. The P-F and B-V models are briefly covered for the model-based PUF concept.

5.1 Poole-Frenkel (P-F) Emissions

P-F emission [21, 22] involves the thermal excitation of electrons, which may be emitted from traps into the conduction band of the dielectric under an applied electric field. The P-F current density equation is:

J = q \mu N_C E \exp\!\left[\frac{-q\left(\phi_T - \sqrt{qE/(\pi \varepsilon_i \varepsilon_0)}\right)}{kT}\right] \quad (2)

For a fixed device size and material composition, most of the parameters in (2) are constants. The dependence of the electrical resistance on voltage will then follow an exponential curve described by:

R = R_0 \exp\!\left(\frac{m_0 \sqrt{V}}{T}\right) \quad (3)

where the parameters R0, m0, V, and T are extrapolated by model prediction algorithms. If P-F is the dominant conduction mechanism, then a transformation plot of the log of
resistances versus the square root of voltages will be linear. The experimental data, shown in Fig. 10, display linearity for several devices from 0.4 V and above with an R² value of 0.99, thus supporting P-F emission. The P-F model defined in (3) can provide a compact way of storing the PUF information on a server. Since each cell's resistance over increasing currents is independent of the other cells and does not drift the same way, the parameters in (3) will also be independent for each cell. Equation (3) predicts how the cell's voltage and resistance will change as the applied current and temperature change. Therefore, this information can be downloaded into server memory to develop a model-based PUF.

Fig. 10. Transformation plot of the natural log of resistances versus the square root of voltages.
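The linearity check described above can be reproduced with an ordinary least-squares fit of ln(R) against sqrt(V); the sketch below assumes paired arrays of measured voltages and resistances, and the 0.4 V cutoff follows the region where linearity was observed.

```python
import numpy as np

def poole_frenkel_fit(voltages_v, resistances_ohm, v_min=0.4):
    """Fit ln(R) = a + b*sqrt(V) over the region V >= v_min (Eq. (3) at fixed
    temperature reduces to a line in sqrt(V)). Returns the slope b, the
    intercept a, and the R^2 goodness of fit."""
    v = np.asarray(voltages_v, dtype=float)
    r = np.asarray(resistances_ohm, dtype=float)
    mask = v >= v_min
    x, y = np.sqrt(v[mask]), np.log(r[mask])
    b, a = np.polyfit(x, y, 1)            # slope, intercept
    y_hat = a + b * x
    r_squared = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)
    return b, a, r_squared
```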

5.2 Ohmic Conduction (O-C)

O-C is based on the premise that thermal activation may cause a small number of carriers to become mobile in the conduction band, and holes in the valence band [21]. This conduction mechanism occurs at relatively low voltages, and a linear relationship exists between the current and voltage (I ∝ V), otherwise known as Ohm's Law, where J = σE; J is the current density, σ is the conductivity, and E is the electric field. A double-logarithmic transformation plot of the I-V data can be used to identify the linearity between the two variables; this plot is shown in Fig. 11. The experimental data from 0.02 V to 0.31 V appear to be linear with an approximate slope of 1, thus supporting O-C; however, the linearity begins to stray as the voltage increases. The two vertical black lines give the PUF's operating voltage range. Here, O-C partially fits the range, but the fit degrades with increasing voltage.


Fig. 11. Double logarithmic transformation plot for I–V experimental data.

5.3 Space Charge Limited Conduction (SCLC)

SCLC [23–25] occurs when all traps are filled: charge builds up within the dielectric and the voltage across the cell is no longer constant, moving away from the Ohmic (linear) region into Region 2, where the current is proportional to the square of the voltage (I ∝ V²). Furthermore, as the traps saturate, the Fermi level moves closer to the conduction band, allowing the number of free electrons to greatly increase and thus ramping up the current. This region is also referred to as Child's Law [21, 25]. SCLC can be identified on a double-logarithmic plot, much like the O-C region; it is an extension of the O-C region but occurs on a quadratic scale. The plot for SCLC can be seen in Fig. 12. Reference [23] reports an Ohmic region below 0.35 V, which roughly follows the PUF's experimental data; the data then transition into Region 2 (Child's Law) for voltages greater than 0.35 V before entering a Region 3 (not shown). Child's Law was fitted against the data, and it appears that it does not fit well at relatively low voltages; thus SCLC is not the supported conduction mechanism for the PUF.


Fig. 12. Ohmic conduction and SCLC (Child’s Law) plotted against experimental data (black dots).

5.4 Simulated Butler-Volmer Equation

The experimental data of the PUF's I-V curves closely follow the B-V curve [26], illustrated in Fig. 13. The simulated B-V equation can be expressed as (4). Model prediction and parameter extrapolation for I0, Ao, and Bo can be executed for each device, and the parameters will be independent for each device, allowing the development of a model-based PUF. However, the model appears to fit poorly at the tail ends of the data.

I = I_0 \left[\exp\!\left(\frac{A_o V}{kT}\right) - \exp\!\left(\frac{-B_o V}{kT}\right)\right] \quad (4)

Fig. 13. Experimental data (circles) fitted against Butler-Volmer equation (dashes)
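For illustration, a short Python sketch evaluating the simulated B-V form of Eq. (4); I0, Ao, and Bo are free parameters to be extrapolated per device, and the numerical values and unit conventions used here are hypothetical, not taken from the paper.

```python
import numpy as np

K_BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant in eV/K

def butler_volmer_current(v, i0, a0, b0, temperature_k=296.0):
    """Equation (4): I = I0 * (exp(Ao*V / kT) - exp(-Bo*V / kT)).
    Ao and Bo are treated as fit parameters chosen so Ao*V/kT is
    dimensionless (a modeling assumption for this sketch)."""
    kt = K_BOLTZMANN_EV * temperature_k
    return i0 * (np.exp(a0 * v / kt) - np.exp(-b0 * v / kt))

# Hypothetical evaluation over the PUF's operating voltage range
voltages = np.linspace(0.1, 0.7, 61)
currents = butler_volmer_current(voltages, i0=1e-9, a0=0.02, b0=0.02)
```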


6 Conclusion and Future Work

In this paper, we introduce a new methodology for ReRAM-based PUFs. The design exploits the cell-to-cell resistance variations in the pre-forming range and demonstrates high reliability while operating under different temperature conditions. The PUF operates in a current range from 50 to 800 nA and gives near-ideal PUF characteristics: randomness when generating responses, with quasi-infinite possible digital fingerprints; reliability, measured through intra-cell variation, with the majority of the cell population below 1% at 0 °C, 23 °C, and 80 °C (ideally 0%); and no significant drift over 10M read-cycles at 100 and 800 nA current injections. However, due to the small population size of only 1,106 devices reported in this study, the results cannot yet be considered statistically conclusive; thousands more devices need to be tested to solidify our analysis. Also discussed were four common models used to describe electron transport in dielectrics. After transformation analysis, P-F emission appears to be the dominant conduction mechanism for the ReRAM PUF, with an R² value of 0.99 (goodness of fit). Our future work involves a variety of new studies, such as understanding why resistances abruptly drop to the ~5 kΩ range for weak cells subjected to higher current sweeps. This could be due to hot electrons forming a permanent conduction path in the dielectric material, but further analysis is needed to confirm this effect. The experimental data presented in this study will be extended to quantifying the bit error rates (BERs) for 256-bit key generation at different temperatures. Future work also includes implementing a model-based PUF using the P-F emission model, an approach that will reduce the overall processing time and power consumption without exposing any of the PUF design's unique features to an attacker. Lastly, in addition to electrically characterizing more devices, the devices will be read at military-grade operating temperatures (i.e., −20 °C to 125 °C) to further explore robustness.

References

1. Zahoor, F., Azni Zulkifli, T.Z., Khanday, F.A.: Resistive random access memory (RRAM): an overview of materials, switching mechanism, performance, multilevel cell (MLC) storage, modeling, and applications. Nanoscale Res. Lett. 15, 90 (2020)
2. Zhu, Y., Cambou, B., Hely, D., Assiri, S.: Extended protocol using keyless encryption based on memristors. In: Arai, K., Kapoor, S., Bhatia, R. (eds.) SAI 2020. AISC, vol. 1230, pp. 494–510. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-52243-8_36
3. Gao, Y., Ranasinghe, D.C.: R^3PUF: a highly reliable memristive device based reconfigurable PUF. arXiv.org, 24 February 2017
4. Govindaraj, R., Ghosh, S., Katkoori, S.: Design, analysis and application of embedded resistive RAM based strong arbiter PUF. IEEE Trans. Dependable Secure Comput. 17(6), 1232–1242 (2020)
5. Liu, R., Wu, H., Pang, Y., Qian, H., Yu, S.: Experimental characterization of physical unclonable function based on 1 kb resistive random access memory arrays. IEEE Electron. Dev. Lett. 36(12), 1380–1383 (2015)
6. Chen, A.: Reconfigurable physical unclonable function based on probabilistic switching of RRAM. Institution of Engineering and Technology, 1 April 2015


7. Chen, A.: Utilizing the variability of resistive random access memory to implement reconfigurable physical unclonable functions. IEEE Electron. Dev. Lett. 36(2), 138–140 (2015)
8. Zhang, L., Fong, X., Chang, C.-H., Kong, Z.H., Roy, K.: Feasibility study of emerging non-volatile memory based physical unclonable functions. NUS, 3 July 2019
9. Cambou, B., Chipana, R., Habib, B.: PUF with dissolvable conductive paths. Patent application US201761541005P, August 2017
10. Chen, A., Lin, M.: Variability of resistive switching memories and its impact on crossbar array performance. In: 2011 International Reliability Physics Symposium, pp. MY.7.1–MY.7.4 (2011)
11. Lin, C.-L., Lin, T.-Y.: Superior unipolar resistive switching in stacked ZrOx/ZrO2/ZrOx structure. AIP Publishing (2016)
12. Yang, J.J., et al.: High switching endurance in TaOx memristive devices. AIP Publishing (2010)
13. Fantini, A., et al.: Intrinsic switching variability in HfO2 RRAM. In: 2013 5th IEEE International Memory Workshop, pp. 30–33 (2013)
14. Prakash, A., Jana, D., Maikap, S.: TaOx-based resistive switching memories: prospective and challenges. Nanoscale Res. Lett. 8 (2013)
15. Yu, M., Devadas, S.: Secure and robust error correction for physical unclonable functions. IEEE Des. Test Comput. 27(1), 48–65 (2010)
16. Cambou, B., Philabaum, C., Booher, D., Telesca, D.A.: Response-based cryptographic methods with ternary physical unclonable functions. In: Arai, K., Bhatia, R. (eds.) Advances in Information and Communication. FICC 2019. Lecture Notes in Networks and Systems, vol. 70. Springer, Cham. https://doi.org/10.1007/978-3-030-12385-7_55
17. Bar-El, H.: Security implications of hardware vs. software cryptographic modules (2002)
18. Attridge, J.: An overview of hardware security modules. SANS Institute, 05 August 2002
19. Herder, C., Yu, M., Koushanfar, F., Devadas, S.: Physical unclonable functions and applications: a tutorial. Proc. IEEE 102(8), 1126–1141 (2014)
20. Yang, L., Kuegeler, C., Szot, K., Ruediger, A., Waser, R.: The influence of copper top electrodes on the resistive switching effect in TiO2 thin films studied by conductive atomic force microscopy. AIP Publishing, 6 July 2009
21. Chiu, F.-C.: A review on conduction mechanisms in dielectric films. Adv. Mater. Sci. Eng. (2014)
22. Schulman, A., Lanosa, L.F., Acha, C.: Poole-Frenkel effect and variable-range hopping conduction in metal/YBCO resistive switching devices. AIP Publishing, 28 July 2015
23. Fu, Y.J., et al.: Bipolar resistive switching behavior of La0.5Sr0.5CoO3− films for nonvolatile memory applications. AIP Publishing, 2 July 2014
24. Kim, S., Jeong, H., Choi, S., Choi, Y.-K.: Comprehensive modeling of resistive switching in the Al/TiOx/TiO2/Al heterostructure based on space-charge-limited conduction. Appl. Phys. Lett. 97, 033508 (2010)
25. Lim, E.W., Ismail, R.: Conduction mechanism of valence change resistive switching memory: a survey. MDPI, 9 Sept 2015
26. Menzel, S., Tappertzhofen, S., Waser, R., Valov, I.: Switching kinetics of electrochemical metallization memory cells. Phys. Chem. Chem. Phys. (2013)

A Graph Theoretical Methodology for Network Intrusion Fingerprinting and Attack Attribution

Chuck Easttom(B)

Georgetown University, Washington, DC, USA
[email protected]

Abstract. Currently the field of network forensics lacks a methodology for attack fingerprinting. Such a methodology would enhance attack attribution. Currently, attack attribution is often quite subjective. The current research provides a mathematically rigorous procedure for creating fingerprints of network intrusions. These fingerprints can be compared to the fingerprints of known cyber-attacks, to provide a mathematically robust method for attack attribution. Keywords: Network fingerprinting · Graph theory · Network forensics · Cyber-attack · Cyber forensics

1 Introduction

In any forensic investigation, one goal is to attribute the attack to some suspect. In traditional forensics, this has involved several techniques. For many years, crime scene forensics has benefited from the use of fingerprint analysis [23, 24]. With fingerprinting in traditional crime scene investigations, specific points, or minutiae, in the fingerprint are analyzed. A particular threshold of matching points is required to indicate a match. There is a need for a comparable technique in network forensics. As with traditional crime scene forensics, network forensics is interested in attributing the attack. This is particularly important in nation-state attacks. Currently, attack attribution is subjective. A fingerprinting methodology that is mathematically rigorous would facilitate more accurate attack attribution. The goal of the current study is to provide a robust methodology that can be applied to network intrusions to analyze those intrusions. This will provide a mathematical methodology for fingerprinting network attacks and using that fingerprint for attack attribution. This current study expands on previous work by the author [2–4, 25, 34–36].

2 Literature Review

The literature review begins with a brief review of graph theory itself. Then the existing literature applying graph theory is explored. The methodology developed in this paper focuses on spectral and algebraic graph theory. Algebraic graph theory
creates matrix representations of graphs and then applies linear algebra to those matrices [12]. This brings the power of linear algebra to bear on analyzing graphs. Graph theory originated with the work of the celebrated mathematician Leonhard Euler, specifically his publication of the paper Seven Bridges of Königsberg in 1736 [1]. The Seven Bridges of Königsberg is now one of the fundamental problems in graph theory, taught in many undergraduate graph theory textbooks and university courses. That original publication established the essential elements of graph theory. At its core, graph theory provides a mechanism for describing the elements in any system and how those elements are connected to each other. Graph theory depicts various entities, defined as vertices, and the connections between those vertices as edges. Put formally: a finite graph G(V, E) is a pair (V, E), where V is a finite set and E is a binary relation on V [2]. This elementary definition ignores how the vertices are connected and merely describes that they are connected; this gap will be addressed later in this paper. If the edges have a direction, they are called arcs [8]. A graph with arcs is considered a digraph (i.e., a directional graph). In addition to the vertices and edges are incidence functions. The incidence function addresses the previously identified gap of how the vertices are connected: it defines the specific mechanism by which the vertices are connected. The specific objects that the vertices represent are irrelevant to the mathematics of graph theory; in fact, pure mathematics analyses graphs without any specific application. In clearer and more mathematically rigorous terms, a graph is described by the formula shown in Eq. 1:

G = (V, E, \psi) \quad (1)

The preceding formula simply states in mathematical terms what was described in the previous paragraph: a graph (G) is a set consisting of the vertices (V), the edges (E), and the incidence functions (ψ) connecting the edges to the vertices [2]. This provides a fundamental description of what a graph is. Graphs are often described using particular matrices. One such matrix is the incidence matrix, which has a row for each vertex and a column for each edge [26]. This allows one to readily see how the edges and vertices are connected. Incidence matrices can provide insight into the spread of malware through a network. The incidence matrix is calculated as shown in Eq. 2:

I_G[v, e] = \begin{cases} 0 & \text{if } v \text{ is not an endpoint of } e \\ 1 & \text{if } v \text{ is an endpoint of } e \\ 2 & \text{if } e \text{ is a self-loop at } v \end{cases} \quad (2)

When creating the incidence matrix, vertices are listed on the left, with edges on the top. Therefore, it is first necessary to add designations for the edges to a graph. This is shown in Fig. 1.


Fig. 1. Adding edge designations.

The incidence matrix for the graph in Fig. 1 is shown in Fig. 2.

Fig. 2. Incidence matrix
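Following Eq. (2), a small Python sketch that builds an incidence matrix from explicit vertex and edge lists; the example graph is hypothetical, not the one in Fig. 1, and the self-loop value of 2 follows the reconstruction of Eq. (2) above.

```python
import numpy as np

def incidence_matrix(vertices, edges):
    """Rows are vertices, columns are edges (Eq. 2): 0 if the vertex is not an
    endpoint of the edge, 1 if it is, and 2 if the edge is a self-loop at it."""
    index = {v: i for i, v in enumerate(vertices)}
    m = np.zeros((len(vertices), len(edges)), dtype=int)
    for j, (u, w) in enumerate(edges):
        if u == w:                      # self-loop
            m[index[u], j] = 2
        else:
            m[index[u], j] = 1
            m[index[w], j] = 1
    return m

# Hypothetical example: four network nodes, four edges including one self-loop
V = ['a', 'b', 'c', 'd']
E = [('a', 'b'), ('b', 'c'), ('c', 'a'), ('d', 'd')]
print(incidence_matrix(V, E))
```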

A second type of matrix used to describe a graph is the adjacency matrix. The adjacency matrix represents whether pairs of vertices are adjacent to each other. The adjacency matrix for an undirected graph is defined in Eq. 3:

A_G[u, v] = \begin{cases} 1 & \text{if } u \text{ and } v \text{ are adjacent} \\ 0 & \text{if not} \end{cases} \quad (3)

The adjacency matrix for the digraph can be calculated the same way, or it can be calculated in a similar fashion to the incidence matrix. To avoid confusion, this current research will compute the adjacency matrix as shown in Eq. 3. The adjacency matrix from Fig. 1 is shown in Fig. 3.

Fig. 3. Adjacency matrix

In addition to the previously discussed incidence and adjacency matrices, the degree matrix is of interest. The degree matrix addresses how many connections each vertex has, including self-connections (loops). The degree matrix of the graph in Fig. 1 is provided in Fig. 4.


Fig. 4. Degree matrix

It should be noted that when working with directed graphs, it is common to separately compute the in-degree matrix and the out-degree matrix. These matrices are elementary first steps in analyzing a given graph. However, they do provide information relevant to a network forensic investigation. Understanding the connections into and out of a given network node can help map out the path of any network attack. This is useful in identifying the machines that should be subjected to further analysis. While the degree matrix is a common artifact in graph theory, there has been no substantial application of it to identifying important nodes in network incidents. This current research fills that gap in the literature. Another important matrix in graph theory is the Laplacian matrix. The Laplacian matrix is most often defined as the degree matrix minus the adjacency matrix [13] and is sometimes referred to as the Kirchhoff matrix or discrete Laplacian. The Laplacian matrix provides substantial information about a graph. Using Kirchhoff's theorem, it can be utilized to calculate the number of spanning trees in a graph. The Laplacian matrix for the graph in Fig. 1 would thus be calculated as shown in Fig. 5.

Fig. 5. Laplacian matrix
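A short sketch using the networkx library that computes the adjacency, degree, and Laplacian matrices for a small hypothetical graph, confirms that the Laplacian equals the degree matrix minus the adjacency matrix, and prints the Laplacian eigenvalues used in the spectral discussion that follows.

```python
import networkx as nx
import numpy as np

# Hypothetical undirected graph of four nodes involved in an incident
G = nx.Graph([('a', 'b'), ('b', 'c'), ('c', 'a'), ('c', 'd')])
nodes = sorted(G.nodes())

A = nx.to_numpy_array(G, nodelist=nodes)               # adjacency matrix (Eq. 3)
D = np.diag([G.degree(n) for n in nodes])              # degree matrix
L = nx.laplacian_matrix(G, nodelist=nodes).toarray()   # Laplacian matrix

assert np.array_equal(L, D - A)                        # Laplacian = degree - adjacency

# Laplacian spectrum (eigenvalues), relevant to the spectral graph theory below
print(np.round(np.linalg.eigvalsh(L.astype(float)), 3))
```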

There are variations of the Laplacian matrix, such as the deformed Laplacian, signless Laplacian, and symmetric normalized Laplacian [7]. These alternatives will not be included in the methodology proposed and explored in this paper; only properties of the standard Laplacian matrix will be explored. When dealing with directed graphs, one can use either the in-degree or out-degree matrix. The degree matrix demonstrates how connected specific vertices are. In a network forensic investigation, this identifies key elements in the incident being investigated. One problem in network forensics is determining which machines to image and analyze further; focusing on the vertices with the highest degree can address this problem. Group theory is also sometimes integrated into algebraic graph theory [7]. This is useful in understanding graph families based on their symmetry, or lack thereof. Frucht's theorem states that every group can be represented as the automorphism group of a connected
graph [18]. This provides a means of using graphs to analyze groups. Given that the network nodes being analyzed in network forensics can form groups, this provides a means for analyzing those groups. The tools of algebraic graph theory provide a methodology for mathematically modeling network intrusions. Once the network involved in an incident, or at least a subsection of that network, has been rendered as a graph, algebraic graph theory can be used to mathematically analyze and model relationships within the network and the behavior of nodes. This provides a mathematical rigor to network forensics that is currently lacking. Spectral graph theory is a subset of algebraic graph theory. Spectral graph theory focuses on the eigenvalues, eigenvectors, and characteristic polynomials of the matrices associated with a graph [1]. Eigenvalues are a special set of scalars associated with a matrix, sometimes also known as characteristic roots, characteristic values, proper values, or latent roots. As an example, consider a column vector v, an n × n matrix A, and some scalar λ. If it is true that Av = λv, then v is an eigenvector of the matrix A and λ is an eigenvalue of the matrix A. Spectral graph theory uses eigenvalues to derive information about the matrix in question. The spectrum of a matrix is its set of eigenvalues, eigenvectors, and the characteristic polynomial; this is normally computed for the adjacency matrix, but in some situations the Laplacian matrix can be used [30]. Spectral graph theory compares the spectra of various graph matrices and provides a powerful tool for analyzing graphs [4]. The Laplacian matrix discussed above is often used in spectral graph theory. One of the more critical properties of a Laplacian matrix is its spectral gap [30]. A spectral gap is the difference between the moduli of the two largest eigenvalues of the matrix. The spectral gap is used in conjunction with the Cheeger inequalities of a graph [10]. Cheeger inequalities relate the eigenvalue gap to the Cheeger constant of the graph. That relationship is expressed in Eq. 4:

2h(G) \ge \lambda \ge \frac{h^2(G)}{2\, d_{\max}(G)} \quad (4)

In Eq. 4, dmax(G) is the maximum degree in G and λ is the spectral gap of the Laplacian matrix of the graph. The value h(G) is the Cheeger constant [12]. The Cheeger constant is a quantitative measure of whether a graph has a bottleneck [16]. It is sometimes referred to as the Cheeger number and is often applied to analyzing computer networks, which makes the Cheeger constant also relevant to network forensics. The formula for the Cheeger constant is somewhat more complex than the Cheeger inequality. Consider a graph G with a vertex set V(G) and an edge set E(G). For any set of vertices A such that A ⊆ V(G), the symbol ∂ is used to denote the set of all edges that go from a vertex in set A to a vertex outside of A. In general, the Cheeger constant is positive only if G is a connected graph. If the Cheeger constant is small and positive, that denotes some bottleneck in the graph [8]. It should be noted that most definitions of both the Laplacian matrix and the Cheeger constant define these in reference to an undirected graph [26], and most textbooks generally only discuss the Cheeger constant and the Laplacian matrix in the context of an undirected graph [12]. This could lead to the conclusion that there is no relevance to digraphs.


However, there is existing work from as early as 2005 applying both the Cheeger constant and the Laplacian to directed graphs [1, 2, 8]. The Cheeger constant of a graph is also relevant to analyzing computer networks and network intrusions, as it is a measure of whether or not a graph has a bottleneck. Given a graph G with n vertices, the Cheeger constant is defined as shown in Eq. 5:

h(G) = \min_{S \subseteq V(G),\; 0 < |S| \le n/2} \frac{|\partial(S)|}{|S|} \quad (5)

Two graphs Ga and Gb must, at a minimum, have the same number of vertices, |Va| = |Vb|, in order for these two graphs to be isomorphic [12]. Graph similarity is also related to spectral graph theory, described above. When comparing graphs using spectral graph theory, another important issue is that a graph G is said to be determined by its spectrum if any other graph with the same spectrum as G is isomorphic to G [12]. Put another way, if graphs G and H have the same spectrum, they are said to be determined by that spectrum. If graphs G and H are not isomorphic but do have the same spectrum, they are said to be cospectral mates [7]. Graph edit distance (GED) is another mechanism for comparing two graphs [30]. This technique is related to the string edit distance used to compare two strings. The essence of graph edit distance is how many alterations need to be made to a graph G1 to make it isomorphic to graph G2. This is described mathematically in Eq. 6:

GED(G_1, G_2) = \min_{(e_1, \dots, e_n) \in \mathcal{P}(G_1, G_2)} \sum_{i=1}^{n} C(e_i) \quad (6)

In Eq. 6, the e represents edits, which may be any change, and C(ei) is the cost of the change, be it to an edge or a vertex. The essence of the formula is simply to total the number of changes needed to alter graph G2 such that it is isomorphic to G1 [30]. Graph edits can be vertex or edge edits and involve either insertion, deletion, or substitution. This method does provide a metric of the degree of similarity between two graphs, but it does not take incidence functions into account. These concepts in graph theory are important for understanding the methodologies described in this current research. Fundamental graph theory concepts, as well as more advanced algebraic aspects of graph theory, will be utilized to model and examine network intrusions. The use of graph similarities in network forensics has been very limited, and network incident fingerprinting has been non-existent. The current research addresses that gap. An area of graph theory that has not been rigorously applied to network intrusions is spectral graph theory. This sub-domain of graph theory is focused on eigenvalues, eigenvectors, and characteristic polynomials of graph matrices. This often focuses on adjacency matrices or Laplacian matrices but can include degree matrices and incidence matrices. One important aspect of spectral graph theory for comparing systems is the area of cospectral graphs. Two graphs are considered cospectral if the adjacency matrices of the graphs have equal multisets of eigenvalues. This is sometimes termed isospectral. This element of spectral graph theory has not been previously applied to network incident fingerprinting. The current research addresses that gap. Chromatic graph theory can also be useful in analyzing graphs. This aspect of graph theory is focused on the chromatic number of a graph. The chromatic number of a graph G is defined as the smallest number of colors needed to color the vertex set V such that no adjacent vertices have the same color [7]. This is often denoted as x(G). Given that loops clearly cannot be colored in this fashion, they are typically not considered in chromatic graph theory. Another part of chromatic graph theory is the chromatic polynomial. The chromatic polynomial counts the number of ways a graph can be colored using no more than
k colors. Understanding the chromatic number of a graph and its chromatic polynomial gives indications as to how interconnected the graph is. For example, a high chromatic number would indicate that many vertices are connected to each other, requiring more colors to prevent adjacent vertices from having the same color. When applying chromatic graph theory to a digital forensics analysis it can be useful to employ Brooks' theorem, which describes a relationship between the maximum degree of a graph and its chromatic number [32]. According to Brooks' theorem, in a connected graph in which every vertex has at most X neighbors, the vertices can be colored with only X colors, except for two cases, complete graphs and cycle graphs of odd length, which require X + 1 colors. The current research also proposes an expansion of existing graph theory. Customarily, graph theory does not address the concept of partial isomorphisms. There is certainly established literature regarding isomorphisms of subgraphs [11, 17]; however, graph theory typically does not address the issue of a percentage of isomorphism between two graphs. The use of partial isomorphisms was developed as part of the current research, and a nascent version was described previously in an IEEE paper [3]. When utilizing graph theory in network forensics, a partial isomorphism is of interest. For example, the graphs of two separate network incidents are unlikely to be isomorphic. However, if a partial isomorphism is found between the two graphs, that would indicate some relation between the two incidents. A partial isomorphism is defined as two graphs that have isomorphic subgraphs. Unlike previous work with isomorphic subgraphs, however, the percentage of isomorphism of the two complete graphs is considered. The degree of isomorphism was first introduced in 2020 [3]; the formula described in this research is an enhancement of that previously published work. The degree of isomorphism is defined as the percentage to which two graphs are isomorphic, expressed as a percentage rather than an integer value. Calculating the degree of isomorphism between two graphs requires a simple formula: the fractions of identical vertices, edges, and incidence functions are summed, and that sum is divided by three, yielding the percentage of isomorphism between the two graphs. To put this in a more mathematically rigorous format, given G1 = (V1, E1, ψ1) and G2 = (V2, E2, ψ2), Eq. 7 demonstrates how to calculate the degree of isomorphism between the two graphs:

I_d = \frac{\left(\dfrac{\sum_{i=1}^{n} [G_1(v) = G_2(v)]}{n}\right) + \left(\dfrac{\sum_{i=1}^{n} [G_1(E) = G_2(E)]}{n}\right) + \left(\dfrac{\sum_{i=1}^{n} [G_1(\psi) = G_2(\psi)]}{n}\right)}{3} \quad (7)

In Eq. 7, Id represents the degree of isomorphism. The formula is relatively simple: the number of identical vertices in each graph is divided by the total pairs of vertices being compared, and the same calculation is done for edges and incidence functions. These three values are totaled and divided by three to provide a percentage of isomorphism. This calculation is another example of why it is advantageous to graph only those nodes (vertices) that are affected in some way by the network incident being forensically examined. Given the diversity in network structures and topologies, if all nodes in a network were part of the graph, then no two attacks would ever appear to have a substantial similarity. This method is an expansion of one published previously [3].
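A sketch of how Eq. (7), and for comparison the graph edit distance of Eq. (6), could be computed for small labeled incident graphs. The node labels, the edge encoding, and the choice of the larger set size as the denominator for each fraction are illustrative assumptions, not specifics from the paper.

```python
import networkx as nx

def degree_of_isomorphism(v1, e1, psi1, v2, e2, psi2):
    """Eq. (7) for labeled graphs: the fractions of matching vertices, edges,
    and incidence-function entries, averaged over the three. Each fraction
    uses the larger of the two set sizes as its denominator (an assumption
    about what 'total pairs compared' means)."""
    def fraction(a, b):
        a, b = set(a), set(b)
        denom = max(len(a), len(b))
        return len(a & b) / denom if denom else 1.0
    return (fraction(v1, v2) + fraction(e1, e2) + fraction(psi1, psi2)) / 3.0

# Hypothetical labeled incident graphs; edges are unordered pairs, and the
# incidence functions are encoded here as (edge, protocol) pairs.
V1, V2 = {'a', 'b', 'c'}, {'a', 'b', 'd'}
E1, E2 = {frozenset('ab'), frozenset('bc')}, {frozenset('ab'), frozenset('bd')}
PSI1 = {(frozenset('ab'), 'smb'), (frozenset('bc'), 'http')}
PSI2 = {(frozenset('ab'), 'smb'), (frozenset('bd'), 'http')}
print("degree of isomorphism:", degree_of_isomorphism(V1, E1, PSI1, V2, E2, PSI2))

# Graph edit distance (Eq. 6) via networkx, unit cost per edit by default
G1 = nx.Graph([('a', 'b'), ('b', 'c'), ('c', 'a')])   # triangle
G2 = nx.Graph([('a', 'b'), ('b', 'c')])               # path
print("graph edit distance:", nx.graph_edit_distance(G1, G2))  # one edge deletion
```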


Another measure of graph similarity is provided by the Randic index of a graph [30]. The Randic index, sometimes referred to as the connectivity index, is a measure of the degrees of the vertices in the graph [31]. The formula is shown in Eq. 8:

R(G) = \sum_{(v, w) \in E(G)} \frac{1}{\sqrt{\delta(v)\,\delta(w)}} \quad (8)

In Eq. 8, δ(v) and δ(w) denote the degrees of the vertices v and w, respectively. When comparing graph similarities, one can utilize induced subgraphs. Rather than comparing the entire graph of a network for similarities such as homomorphisms, isomorphisms, and partial isomorphisms, one can compare induced subgraphs from the affected networks. This allows the forensic examiner to focus only on the affected areas of the network. In a study by Barrère et al. [21], the authors applied graph theory to the process of pictorially modeling a network attack. Their study did not incorporate any aspects of algebraic or spectral graph theory; the authors simply depicted the attack visually using a graph, without utilizing the many analytical tools found in graph theory. The limited use of graph theory in the Barrère, Steiner, Mohsen, and Lupu [21] study illustrates both the need for graph theory to be utilized in network forensics and the considerable gap in said application. Palmer, Gelfand, and Campbell [15] published a study promoting the utilization of graph theory for network forensics. Their study only utilized the most basic elements of graph theory to pictorially represent an incident. Much like the Barrère, Steiner, Mohsen, and Lupu [21] study, Palmer, Gelfand, and Campbell did not explore any of the many analytical tools found in graph theory, such as spectral graph theory, chromatic graph theory, or in fact any aspect of algebraic graph theory. The Palmer, Gelfand, and Campbell study does illustrate the possibilities of integrating graph theory into network forensics, and it also demonstrated a general awareness of graph theory in the network forensics community. However, it likewise demonstrates the substantial gaps in applying graph theory to network forensics. Those gaps are addressed by the current research. Takahashi, Xiao, and Meng [20] published a study that utilized graph theory to analyze data from network and server logs. In that study, the authors utilized graphs to describe the flow of data, labeling their method "virtual flow-net". This study does provide some benefit to the network forensic examiner; however, it did not utilize any aspects of algebraic graph theory. As will be shown in this paper, algebraic graph theory provides a rich set of mathematical tools for analyzing network intrusions and incidents. The studies by Palmer, Gelfand, and Campbell [15], Takahashi, Xiao, and Meng [20], and Barrère et al. [21] are emblematic of the manner in which graph theory has previously been applied to network forensics. A number of studies have been published employing graph theory for digital or network forensics, but all of them share a common theme of applying only a very limited aspect of graph theory. These studies clearly illustrate the need for a more expansive methodology for applying graph theory to network forensics. Milling, Caramanis, Mannor, and Shakkottai [22] applied some of the mathematical tools of graph theory to analyzing computer virus spread through a network. The authors
employed Erdős–Rényi graphs to analyze the computer virus propagation. This approach focuses on all graphs with a fixed vertex set and a fixed number of edges [2]. This methodology does apply algebraic graph theory, notably the use of Erdős–Rényi graphs; however, it is a limited application of it. The study illustrates the possibilities of applying graph theory to network forensics, but it also demonstrates the substantial gaps in the current literature. The dissertation of Wei Wang [19] was the most rigorous application of graph theory to network forensics prior to the current methodology. Wang utilized graph theory to classify network evidence in order to present a consistent map of the relevant network traffic, enhancing network forensics by applying basic graph theory [19]. There were pioneering facets to the Wang [19] study that surpassed not only previous studies, but most studies done after Wang. The most obvious feature of the Wang study was that it utilized more of graph theory than preceding studies had. As one example, Wang applied eigenvector centrality metrics to evaluate the importance of nodes in the evidence graph and made limited use of the Laplacian spectrum. However, there were aspects of graph theory that Wang did not utilize, including the Cheeger constant, the Lovász number, and homomorphisms. Wang also never investigated network fingerprinting via graph theory, nor by any other method, and did not examine the change in the graph spectrum over the span of a given network intrusion event. Furthermore, Wang's work did not provide a framework for network forensics. While the Wang study was innovative at its time, it left additional work to be done. The current study seeks to fill those gaps. Easttom and Adda [4] used graph theory to perform network intrusion fingerprinting. The current paper expands upon that study and adds additional experimental evidence to validate the methodology.

3 Methodology

The current paper provides a step-by-step process for applying graph theory to any digital forensics investigation. As will be shown as the paper progresses, some aspects of graph theory may be more applicable to one type of investigation than another. For example, some techniques might be illuminating when applied to a virus outbreak, but not as useful when analyzing insiders exfiltrating data.

3.1 Initial Steps

The first step is to represent the nodes in the attack as vertices in a graph. Connections between those nodes are then represented as edges. If the communication between nodes is one way, the edges should be directed (i.e., arcs). Weighting those edges will be covered later in this paper. The next step is to determine measures of centrality. The term eccentricity, in the context of graph theory, denotes how far a vertex is from the farthest vertex in the graph. The radius is the minimum of the vertex eccentricities and is typically denoted as rad(G). The diameter is the maximum of the vertex eccentricities
and is usually denoted as diam(G). These two metrics give an indication of the centrality of the graph. It may also be useful to document the trails and paths that exist within the graph model of the cyber breach. A trail is defined as a walk through the graph with no repeated edges; a path is a specialized case of a trail with no repeated vertices. The lengths of the trails and paths give an indication of the maximal distance between vertices. The length of a path in a weighted graph is the sum of the edge weights of the path. Eulerian trails can also provide information on the spread and impact of a breach. An Eulerian trail is a trail that contains every edge of the graph G. Clearly, understanding the length of the Eulerian trails in graph G provides additional insight into the nature of the device(s) impacted by the breach modeled by graph G. Particularly in malware investigations, one might wish to calculate the shortest path between infected nodes. The Bellman–Ford algorithm computes shortest paths from a single source vertex to all of the other vertices in a weighted digraph [33]. Weighted digraphs are what one will encounter most often in forensics. There are other algorithms that can accomplish similar goals, such as Dijkstra's algorithm. The next step is to examine similarities between the attack being modeled and similar attacks. This allows one to determine whether the attacks were likely executed by the same threat actor or using similar attack modalities. This step involves analyzing graph isomorphism. Even when two separate cyber-attacks are executed by the same threat actor using the same attack vector, the attack graphs may not be isomorphic; the differences in the attack target alone could prevent a true isomorphism. However, the first step in the methodology is to determine whether there is an isomorphism. It is likely that there will not be one, unless only subgraphs are compared. If there is no isomorphism, then determine whether there is a strong or weak homomorphism [5]. This is most effectively done if the graph covers only the affected region of the networks, rather than the entire networks. If the entire networks are graphed, then the differences in the networks themselves may prevent accurate fingerprinting. Once isomorphisms have been explored, the next step is to examine similarities found in induced subgraphs. In general terms, an induced subgraph is formed from a subset of the vertices of another graph together with the edges connecting the vertices in that subset. Put more formally, if G = (V, E) and S ⊂ V, then the graph H whose vertex set is S and which includes all of the edges that have both endpoints in S is an induced subgraph of G [2]. Induced subgraphs are useful when analyzing network intrusions. If one considers the affected devices as an induced subgraph of the network graph, it is then a straightforward process to compare it with the induced subgraph from another attack to analyze similarities in attack vectors, targets, and other aspects of the incident. There is existing work on comparing induced subgraphs, notably the maximum common subgraph (MCS), but this technique has not previously been applied to network forensics. It is also useful to analyze the neighborhood of any vertex that has been identified as relevant to the attack in question. A neighborhood of vertex v in a graph G is the subgraph of G induced by all vertices adjacent to v.
It should be relatively obvious that multiple induced subgraphs will be found by studying the neighborhood of vertices of interest in a given attack.
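
To make these initial steps concrete, the following sketch (Python with the networkx library, which the study itself does not claim to have used) builds a small illustrative attack graph and computes the measures just described; the host names, edges, and weights are assumptions for illustration only.

import networkx as nx

# Directed, weighted graph of an assumed incident: edges point in the direction
# of observed communication; weights could encode, for example, bytes transferred.
G = nx.DiGraph()
G.add_weighted_edges_from([
    ("workstation1", "fileserver", 3),
    ("workstation1", "workstation2", 1),
    ("workstation2", "fileserver", 2),
    ("fileserver", "backup", 4),
])

# Eccentricity, radius and diameter are computed on the undirected view so that
# every pair of vertices is mutually reachable.
U = G.to_undirected()
print("eccentricities:", nx.eccentricity(U))
print("radius:", nx.radius(U), "diameter:", nx.diameter(U))

# Shortest weighted paths from a suspected patient-zero vertex via Bellman-Ford.
print(nx.single_source_bellman_ford_path_length(G, "workstation1", weight="weight"))

# Comparing this incident with a previously documented one: test isomorphism
# first, then fall back to induced subgraphs of the affected region.
H = nx.DiGraph(G)                                  # stand-in for a second attack graph
print("isomorphic:", nx.is_isomorphic(G, H))
affected = {"workstation1", "workstation2", "fileserver"}
induced = G.subgraph(affected)                     # induced subgraph of the affected region
neighborhood = U.subgraph(set(nx.all_neighbors(G, "fileserver")))
print("induced edges:", list(induced.edges()))
print("neighborhood of fileserver:", list(neighborhood.nodes()))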


The procedure described in this current study begins with creating a graph model of the two attacks: the current attack and some previously documented attack to which one wishes to compare it. The graphs are then analyzed to see if they are isomorphic or egamorphic. Either would be a strong indicator of identical threat actors using the same threat vector on similar target networks. Certainly, isomorphic graphs demonstrate that the exact same attack was used, with an identical threat vector, on a substantially similar network topology. That could only occur with an identical attacker. Egamorphic graphs are not entirely identical, as isomorphic graphs are, but have enough overlap to clearly point to the same threat actor.

Assuming that the graphs are neither isomorphic nor egamorphic, the next portion of the analysis is to identify induced subgraphs that are homomorphic (even weakly homomorphic) and analyze those. Particular attention should be paid to the neighborhood induced subgraphs of key vertices in both attacks. Of particular interest would be the state wherein the neighborhood induced subgraph of G is a covering graph of a neighborhood induced subgraph of H. A covering graph of H is given by a covering map from the vertex set of G to the vertex set of H; more formally, a covering map is a surjection that acts as a local isomorphism. It is even more relevant to comparing two attacks if the graphs are multigraphs. A covering graph of two induced sub-multigraphs is a strong indicator of identical threat actors and attack vectors.

Each induced subgraph that represents a homomorphism would be a point of match between the two attacks. This provides a method that is analogous to fingerprints. In fingerprint analysis, points of similarity are identified; the more matching points between fingerprints, the stronger the identification is considered to be. The strength of the relationship in the induced subgraphs would be weighted, such that an isomorphism is weighted more than a weak homomorphism. A proposed weighting is shown in Table 1.

Table 1. Relationship weighting

Weight | Relationship
3      | Isomorphism
2      | Strong homomorphism
1      | Weak homomorphism/egamorphism

It should be apparent that the weighting is relative to the total size of the graphs. If one has two graphs, each of only four vertices, three of which form an isomorphic subgraph, that yields a very strong match. However, the same three vertices from each graph forming an isomorphic subgraph, when the entirety of each graph is 100 vertices, is not a very strong relationship. Therefore, the degree of similarity is calculated as the weight divided by the total number of vertices. This is shown in Eq. 9.

θ = w / Vt   (9)
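
As a purely illustrative check of Eq. 9 (the definitions of w and Vt follow in the next paragraph: w is the weight of the matched subgraphs counted in both graphs, Vt the combined order), the two cases mentioned above can be computed directly; the numbers come from that example, not from the study's experiments.

def similarity(match_weight, order_g1, order_g2):
    w = 2 * match_weight          # matched subgraph weight, counted once per graph
    vt = order_g1 + order_g2      # combined order of the two graphs
    return w / vt

# An isomorphic 3-vertex subgraph (weight 3 per Table 1) shared by two 4-vertex graphs:
print(similarity(3, 4, 4))        # 0.75 -- a very strong match
# The same shared subgraph inside two 100-vertex graphs:
print(similarity(3, 100, 100))    # 0.03 -- a weak relationship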

The degree of matching is represented by the theta symbol θ. The w represents the total weight assigned to the matched subgraphs, multiplied by 2. This is done because each original graph contains an induced subgraph that is being compared to an induced subgraph in the other complete graph. Vt is the total number of vertices in the two graphs; this is normally described as the combined order of the two graphs. This produces a number between 0 and 1, quantifying the similarity. Given the fact that networks can be very different, one would expect relatively low values for θ. Even values above 0.25 would be considered strong matches. Network attack fingerprinting can also be facilitated by computing partial isomorphisms. The technique for calculating the partial isomorphism was described previously in this paper. Partial isomorphism is similar in function to matching fingerprints using a point system.

3.2 Integrating Information Theory into Graph Theory

Other measures of similarity that were discussed previously in this paper can also be utilized in graph fingerprinting. As one example, the graph edit distance (GED) is easily calculated and provides a measure of similarity. Comparing graph spectra can provide yet another metric for the similarity of two graphs. Utilizing multiple modalities helps establish the validity of the methodology being used: when multiple techniques provide substantially similar results, that is a strong indicator that the techniques are valid.

Integrating information theory into the incidence functions of graphs was previously published as part of the current study [3]. The goal of incorporating information theory with graph theory is to provide a more robust modeling tool. In the current study, the focus is on applications in computer networks, including computer network intrusions. However, this process can be applied to any system that involves information. The issue is to expand on what is currently considered regarding incidence functions. All too often the incidence function is simply discussed as connecting two vertices via an edge or arc [6, 9, 14]. Expanding that definition to include the information flowing between two vertices adds detail to graphs that model any information-related network. When considering two network nodes, the connection between them is an information flow, and examining that flow is critical to fully understanding the network itself. The methods posited in this section are not mutually exclusive. Rather, they are a set of tools that can be utilized as needed in the specific situation. The various information-theoretical formulas described in this section can be applied individually, or all of them may be applied. Which formulas, and how many, to apply is at the discretion of the forensic analyst. In the experimental section of this paper, the Hartley, Shannon, min-entropy, and Rényi entropy measures will be utilized. The most elementary aspect of information theory to include would be to calculate the Shannon entropy. The formula for calculating Shannon entropy is shown in Eq. 10.

H(x) = − Σ_{i=1}^{n} p(x_i) log2 p(x_i)   (10)

In Eq. 10, H indicates the Shannon entropy of some variable x. The outcomes of x (x_1, …, x_n) occur with probabilities denoted by p(x_1), …, p(x_n). While log base 2 is often assumed,

it is expressly stated in Eq. 10 to avoid confusion. The information entropy provides a metric of the information in a given message [29]. When applied to the incidence function of a vertex in a given graph, it provides a metric denoting the flow of information between two vertices [4]. Another measure of information entropy is the Rényi entropy [27]. The Rényi entropy generalizes four separate entropy formulas: the Hartley entropy, the Shannon entropy, the collision entropy, and the min-entropy [28], as depicted in Eq. 11.

H_α(X) = (1/(1−α)) log Σ_{i=1}^{n} p_i^α   (11)
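
The symbols in Eq. 11 are defined in the next paragraph. As a small illustration of attaching these measures to an edge's incidence function, the sketch below (Python, with an assumed histogram of per-flow message counts) computes the Shannon entropy of Eq. 10 and the Rényi entropy of Eq. 11, from which the Hartley, collision, and min-entropy values follow as special cases of α.

import math

def shannon_entropy(probs):
    # Eq. 10: H(x) = -sum p(x_i) * log2 p(x_i)
    return -sum(p * math.log2(p) for p in probs if p > 0)

def renyi_entropy(probs, alpha):
    # Eq. 11; the limit alpha -> 1 recovers the Shannon entropy.
    if alpha == 1:
        return shannon_entropy(probs)
    if math.isinf(alpha):
        return -math.log2(max(probs))                 # min-entropy
    return math.log2(sum(p ** alpha for p in probs)) / (1 - alpha)

counts = [40, 25, 20, 10, 5]                          # assumed per-flow message counts on one edge
probs = [c / sum(counts) for c in counts]

print("Shannon:", shannon_entropy(probs))
print("Hartley (alpha = 0):", renyi_entropy(probs, 0))
print("Collision (alpha = 2):", renyi_entropy(probs, 2))
print("Min-entropy (alpha -> infinity):", renyi_entropy(probs, float("inf")))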

In Eq. 11, X denotes a discrete random variable with possible outcomes (1, 2, …, n) occurring with probabilities p_i for i = 1 to n. Unless otherwise stated, the logarithm is base 2, and the order α satisfies 0 < α < 1 [27]. The aforementioned collision entropy is simply the Rényi entropy in the special case that α = 2, and not using logarithm base 2. There are certainly more formulas in information theory that could be applied to the incidence function of graphs. The goal of this current study is not to provide an exhaustive study of information theory. Rather, the goal is to provide a general framework for the application of graph theory to digital forensics, including applying information theory to incidence functions.

3.3 Integrating Spectral Graph Theory

Integrating spectral graph theory into network forensics was previously published as part of this current study [4]. The methodology being described is to calculate the spectra of the adjacency matrix of a computer virus spreading through a network. It is possible to calculate either the graph spectrum or the Laplace spectrum of a graph, as was discussed in the review of literature. Both can provide important information about the spread of a virus across a computer network. The focus of the current study is to use the adjacency matrix to calculate the spectrum of the graph as the virus spreads. The eigenvalues of the adjacency matrix are calculated at points in the virus spread. This provides insight into how the virus spread. Such information can be useful in network forensics or in intrusion detection systems. Simply examining the adjacency matrix can provide insight into the nature of the virus spread. The adjacency matrix can be used to demonstrate how a virus spreads across a network. Furthermore, changes in the adjacency matrix can quantify how rapidly the virus spreads. However, the current methodology will examine the spectrum of those adjacency matrices to derive even more information. The process is designed to add additional information to the analysis of a graph of a network intrusion. In order for a forensic analyst to understand what the spectrum of a graph means for the investigation, it is important to understand what a matrix is. While a matrix may appear to be simply an array of numbers, it is actually a linear transform. An eigenvector is scaled by its corresponding eigenvalue when the matrix is applied to it. The formula is shown in Eq. 12.

M(v) = λv   (12)


There are many factors about a graph that are immediately clear from the graph spectrum. One such item is that the eigenvector associated with the second smallest eigenvalue can be used to partition the graph into clusters, via spectral clustering. Spectral clustering is a technique for dimensionality reduction. This is not an issue with the graphs being used in this experiment; however, if the forensic examiner wishes to apply various data analytics techniques to graphs of very large networks, spectral clustering can be an important tool. Another fact that can easily be derived from the graph's spectrum is that if there are multiple instances of the same eigenvalue, this indicates the presence of multiple nodes with similar connectivity. Thus, repeated eigenvalues provide insight into the connectivity of the graph in question. Yet another easily determined fact based on the graph's spectrum is that the second smallest eigenvalue is a measure of the expansion of the graph. Given a random walk through a graph, the higher this second smallest eigenvalue is, the faster the walk converges to a stationary distribution. The largest eigenvalue is also sometimes referred to as the dominant eigenvalue. There are two facts that can be derived merely by identifying the largest eigenvalue of the graph matrix. The index of a network represented by a graph is the largest eigenvalue of the matrix; the network index is also sometimes referred to as the spectral radius of the graph. When examining an undirected graph of a network, the eigenvector corresponding to the largest eigenvalue is the principal eigenvector of that network.

The spectrum of the adjacency matrix can also provide information regarding the similarity of attacks. Two graphs are considered cospectral if the eigenvalues of their adjacency matrices are the same. This fact makes the eigenvalues yet another method for fingerprinting network attacks for attack attribution. It should be noted that cospectral graphs are not necessarily isomorphic; however, isomorphic graphs will be cospectral. Thus, calculating the spectrum of a graph also provides another modality for testing for isomorphism. When two graphs are cospectral but not isomorphic, they are said to be cospectral mates. Calculating the spectrum of a graph also allows one to apply the Hoffman–Delsarte inequality. Given a regular graph G (i.e., all vertices have the same degree) with smallest eigenvalue λmin, the independence number can be bounded with Eq. 13.

α(G) ≤ n / (1 − k/λmin)   (13)

In Eq. 13, k is the degree of the vertices, n is the number of vertices, and λmin is the smallest eigenvalue. This inequality bounds the independence number of the graph. The independence number is an indicator of how independent the nodes in the graph are. The Perron–Frobenius theorem dictates that the non-negative eigenvector associated with the largest eigenvalue provides a centrality measure for each node/vertex in question. The Perron–Frobenius theorem concerns real square matrices with positive elements and asserts that such matrices have a unique largest real eigenvalue and that the corresponding eigenvector can be chosen to have strictly positive values. The difference between the largest eigenvalue and the second largest eigenvalue of a d-regular graph defines the spectral gap. There are a number of results related to the spectral gap; however, those are beyond the current study and are mentioned only as possible future work.
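
A brief sketch of these spectral quantities, for an assumed 4-cycle adjacency matrix rather than any graph from the study, might look as follows (Python with numpy).

import numpy as np

A = np.array([
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
])   # a 4-cycle: 2-regular, so the Hoffman bound below is applicable

eigenvalues, eigenvectors = np.linalg.eigh(A)         # symmetric matrix, real spectrum
spectral_radius = max(abs(eigenvalues))               # the index / spectral radius of the network
principal = eigenvectors[:, np.argmax(eigenvalues)]   # principal eigenvector (centrality)
spectral_gap = np.sort(eigenvalues)[-1] - np.sort(eigenvalues)[-2]
print("spectrum:", np.round(eigenvalues, 4))
print("spectral radius:", spectral_radius, "spectral gap:", spectral_gap)

# Cospectrality check between two attack graphs: identical sorted spectra.
B = A.copy()
print("cospectral:", np.allclose(np.sort(np.linalg.eigvalsh(A)), np.sort(np.linalg.eigvalsh(B))))

# Hoffman bound on the independence number (Eq. 13), valid for k-regular graphs.
k = A.sum(axis=1)[0]
n = A.shape[0]
lam_min = eigenvalues.min()
print("independence number bound:", n / (1 - k / lam_min))   # 2.0 for the 4-cycle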


There is additional information that can be derived from a graph's spectrum, but that information is applicable only to specific graph applications, such as chemical graph theory. Chemical graph theory is outside the scope of this current study. However, an example of such applications is the Estrada index, which is derived from a graph's eigenvalues. The Estrada index is used in modeling protein folding.

3.4 Experimental Results

The virus began at node A. The time of the infection of node A is denoted as T = 0. Every 5 min the various devices were checked to determine whether a given node was yet infected. The infected nodes are shown at specific times in Table 2.

Table 2. Virus infection spread

Node | T = 5    | T = 10   | T = 15
A    | Infected | Infected | Infected
B    | Infected | Infected | Infected
C    | Infected | Infected | Infected
D    |          | Infected | Infected
E    | Infected | Infected | Infected
F    |          | Infected | Infected
G    |          |          |
H    |          |          | Infected

The first phase of the application of graph theory is to calculate measures of central tendency for the network at the end of the infection. Thus, the graph was drawn with only the infected nodes at t = 15. That is shown in Fig. 6. Note that the graph in Fig. 6 is an undirected graph. This analysis could also be done viewing the graph as a directed graph, showing how the virus moved from machine to machine. That was intentionally not done. The goal of these experiments is to match real-world incident conditions as closely as possible. In a real-world incident, the forensic analyst would not be able to determine the exact path of virus spread without extensive forensic examination of every infected machine. One of the benefits of applying graph theory to network forensics is to aid the forensic examiner in determining which machines should be imaged and analyzed. In the graph in Fig. 6, no node has an eccentricity of more than 2. As an example, the distance from F to H is 2. The distance from C to H is also 2. As a final example, the distance from F to B is 2. Therefore, this graph has multiple centers; in fact, all the nodes are centers. Consequently, in this case the determination of graph centers does not provide additional insight into the forensic analysis. The radius of this graph is 2 and the diameter is 3. When the radius and diameter are very close, they provide less analytical information.


Fig. 6. Graph of infected nodes
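
The calculations above can be reproduced with a few lines of Python and networkx; note that the edge list below is only an assumed stand-in (the actual topology is that of Fig. 6, which is not reproduced here), so the printed values will not necessarily match the figures quoted in the text. The global clustering coefficient discussed in the next paragraph is obtained the same way.

import networkx as nx

# Assumed edge list standing in for the graph of infected nodes in Fig. 6.
infected = nx.Graph([
    ("A", "B"), ("A", "C"), ("A", "E"), ("B", "C"), ("B", "E"),
    ("C", "E"), ("B", "D"), ("B", "F"), ("D", "F"), ("B", "H"),
])

print("eccentricities:", nx.eccentricity(infected))
print("radius:", nx.radius(infected))
print("diameter:", nx.diameter(infected))
print("centers:", nx.center(infected))

# Global clustering coefficient: closed triplets divided by all triplets.
print("global clustering coefficient:", nx.transitivity(infected))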

It can also be advantageous to calculate the global clustering coefficient for the graph. This was defined previously as the number of closed triplets divided by the number of all triplets. In the case of the graph at T = 15, all nodes except H are part of closed triplets, giving a clustering coefficient of 1. That indicates a high degree of clustering. This information is useful in analyzing the spread of the virus. It is also possible, simply using the data from Table 2, to find forensically interesting information. At t = 5, nodes B, C, and E were all infected. B and C are Windows 10 machines, whereas E is an Ubuntu Linux machine. This information indicates that the virus in question spreads equally rapidly to both Windows and Linux computers. At t = 10 only two more nodes were infected, D and F, which are both Windows machines. Then at T = 15 only one additional node was infected, H. H is an Android phone. Node G, a Windows Server 2016 machine, was not infected. This provides information that a forensic examiner can utilize. It is likely that the additional security in Windows Server 2016 prevented the virus from reaching that machine. It is also likely that the Android phone being infected last was simply due to random chance. This brings us to phase 2, the application of spectral graph theory. The adjacency matrices were generated, and each matrix's spectrum was calculated, for t = 5, t = 10, and t = 15. Adjacency, in this application, is defined as adjacency between infected nodes. These adjacency matrices for the three times are shown in Figs. 7, 8, and 9.

Fig. 7. Adjacency matrix at T = 5


Fig. 8. Adjacency matrix at T = 10

Fig. 9. Adjacency matrix at T = 15

Spectral graph theory begins with the calculation of the eigenvalues of the adjacency matrix. To ensure the calculations were not in error, three separate online eigenvalue calculators were used:

https://comnuan.com/cmnn01002/cmnn01002.php
https://matrixcalc.org/en/vectors.html
https://www.arndt-bruenner.de/mathe/scripts/engl_eigenwert2.htm

The eigenvalues at the times T = 5, T = 10, and T = 15 are shown in Table 3.

Table 3. Eigenvalues of virus spread

T = 5 | T = 10 | T = 15
−1    | −1     | 0
3     | 4      | 4.2015
−1    | −1     | 0.5451
−1    | −1     | −1
0     | −1     | −1.7466
0     | 0      | −1
0     | 0      | −1
0     | 0      | 0


The associated eigenvalues and eigenvectors are given in Table 4, which lists each eigenvalue followed by its eigenvector, for T = 5, T = 10, and T = 15 in turn.

Table 4. Eigenvalues and eigenvectors of the virus adjacency matrix

T = 5

λ1 = −1

−0.8660, 0.2887, 0.2887, 0.0000, 0.2887, 0.0000, 0.0000, 0.0000

λ2 = 3

0.5000, 0.5000, 0.5000, 0.0000, 0.5000, 0.0000, 0.0000, 0.0000

λ3 = −1

0.2170, −0.8628, 0.3229, 0.0000,0.3229, 0.0000, 0.0000,0.0000

λ4 = −1

0.0000, 0.0000, −0.7071, 0.0000, 0.7071, 0.0000, 0.0000, 0.0000

λ5 = 0

0.0000, 0.0000, 0.0000, 1.0000, 0.0000, 0.0000, 0.0000, 0.0000

λ6 = 0

0.0000, 0.0000, 0.0000, 0.0000, 0.0000,1.0000, 0.0000, 0.0000

λ7 = 0

0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 1.0000, 0.0000

λ8 = 0

0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,1.0000

T = 10

λ1 = −1

−0.8944, 0.2236, 0.2236, 0.2236, 0.2236, 0.0000, 0.0000, 0.0000

λ2 = 4

0.4472, 0.4472, 0.4472, 0.4472, 0.4472, 0.0000, 0.0000, 0.0000

λ3 = −1

−0.1952, −0.4392, 0.8620, −0.1139, −0.1139, 0.0000, 0.0000, 0.0000

λ4 = −1

−0.1543, 0.8102, 0.1243, −0.3901, −0.3901, 0.0000, 0.0000, 0.0000

λ5 = −1

−0.0932, −0.2097, −0.2097, −0.3650, 0.8776, 0.0000, 0.0000, 0.0000

λ6 = 0

0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 1.0000, 0.0000, 0.0000

λ7 = 0

0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 1.0000, 0.0000

λ8 = 0

0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 1.0000

T = 15

λ1 = 0

0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 1.0000

λ2 = 4.2015

0.4584, 0.4164, 0.4164, 0.4584, 0.4164, 0.2182, 0.0000, 0.1091

λ3 = 0.5451

−0.2020, 0.2777, 0.2777, −0.2020, 0.2777, −0.7413, 0.0000, −0.3707

λ4 = −1

0.0000, −0.8165, 0.4082, 0.0000, 0.4082, −0.0000, 0.0000, −0.0000

λ5 = −1.7466

0.4717, −0.2518, −0.2518, 0.4717, −0.2518, −0.5402, 0.0000, −0.2701

λ6 = −1

−0.0000, 0.2265, −0.7926, −0.0000, 0.5661, 0.0000, 0.0000, 0.0000

λ7 = −1

0.5774, 0.0000, 0.0000, −0.5774, 0.0000, 0.0000, 0.0000, −0.5774

λ8 = 0

0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,1.0000, 0.0000

Previously in this paper, the application of spectral graph theory was described in detail. This description included specific data that can be derived from the spectrum of a graph. As explained, multiple instances of the same eigenvalue indicate the presence of multiple nodes with similar connectivity. Observing the final adjacency matrix, that of T = 15, one can see there are 3 eigenvalues of −1. Noting the nodes with similar connectivity, the forensic analyst can focus on those nodes to determine their role in the expansion of the virus. When selecting specific machines for forensic imaging, these nodes are logical choices, due to their similar connectivity.
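
A sketch of this selection step is shown below (Python with numpy). The adjacency matrix is an assumed stand-in for the T = 15 matrix of Fig. 9, so the exact numbers differ; the point is the procedure: flag repeated eigenvalues, then rank nodes by their component in the principal eigenvector.

import numpy as np

nodes = ["A", "B", "C", "D", "E", "F", "G", "H"]
A = np.zeros((8, 8), dtype=int)
for i, j in [(0, 1), (0, 2), (0, 4), (1, 2), (1, 4), (2, 4),
             (1, 3), (1, 5), (3, 5), (1, 7)]:       # assumed adjacencies among infected nodes
    A[i, j] = A[j, i] = 1

eigenvalues, eigenvectors = np.linalg.eigh(A)

# Repeated eigenvalues indicate groups of nodes with similar connectivity.
values, counts = np.unique(np.round(eigenvalues, 4), return_counts=True)
print("repeated eigenvalues:", values[counts > 1])

# Principal eigenvector: larger components indicate more central nodes, which
# are natural candidates for forensic imaging.
principal = eigenvectors[:, np.argmax(eigenvalues)]
ranking = sorted(zip(nodes, np.round(np.abs(principal), 4)), key=lambda item: -item[1])
print("imaging priority (most central first):", ranking)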


Recall that the Perron–Frobenius theorem dictates that the non-negative eigenvector associated with the largest eigenvalue provides a centrality measure for the graph in question. The largest eigenvalue of this graph was 4.2015; per the Perron–Frobenius theorem, this value is the spectral radius of the graph. The largest eigenvalue is also bounded above by the maximum degree of any vertex in the graph. The eigenvalue associated with node 2 (B) is the largest eigenvalue at times T = 5, T = 10, and T = 15. This reinforces the centrality of this node.

3.5 Virus Experiment 2

One of the most basic tenets of science is to repeat an experiment. Therefore, the virus infection experiment was repeated in order to validate the results. Given that the first experiment utilized a random determination of whether the virus-bearing email would be opened or not, it is expected that a second run of the experiment would not yield precisely the same spread pattern. The virus again was introduced at node A, and the spread was measured at T = 5 (5 min), T = 10 (10 min) and T = 15 (15 min). The results are shown in Table 5.

Table 5. Virus infection spread experiment 2

Node | T = 5    | T = 10   | T = 15
A    | Infected | Infected | Infected
B    | Infected | Infected | Infected
D    | Infected | Infected | Infected
E    | Infected | Infected | Infected
C    |          |          | Infected
F    |          | Infected | Infected
G    | Infected | Infected | Infected
H    |          |          | Infected

As with virus experiment 1, the first phase of the application of graph theory is to calculate measures of central tendency for the network at the end of the infection. Thus, the graph was drawn with only the infected nodes at t = 15. That is shown in Fig. 10. In this instance, at T = 15 all nodes were infected. In the graph shown in Fig. 10, the distance from node H to node G is 3, giving an eccentricity of 3. However, there are still multiple centers, as there were in the first experiment. Nodes A, B, C, D, E, and F all have eccentricities of 2, and each is a center of the graph.


Fig. 10. Experiment 2 infected nodes

The adjacency matrices for experiment 2 at t = 5, t = 10, and t = 15 are shown in Figs. 11, 12, and 13.

Fig. 11. Experiment 2 adjacency matrix at T = 5

Fig. 12. Experiment 2 adjacency matrix at T = 10

As with experiment 1, eigenvalues and eigenvectors were calculated using three separate online calculators to ensure accuracy of the measurements. The three online calculators are the same as used in experiment 1:


Fig. 13. Experiment 2 adjacency matrix at T = 15

https://comnuan.com/cmnn01002/cmnn01002.php
https://matrixcalc.org/en/vectors.html
https://www.arndt-bruenner.de/mathe/scripts/engl_eigenwert2.htm

The eigenvalues at the times T = 5, T = 10, and T = 15 are shown in Table 6.

Table 6. Eigenvalues experiment 2

  | T = 5   | T = 10              | T = 15
A | 3.1642  | 3.4664              | 0.0000
B | 0.2271  | 0.0341 + 0.4322 * i | 4.5479
C | −1.0000 | 0.0341 − 0.4322 * i | 1.0000
D | −1.3914 | −1.5347             | −1.6816
E | −1.0000 | −1.0000             | −0.1947
F | 0.0000  | −1.0000             | −1.0000
G | 0.0000  | 0.0000              | −1.0000
H | 0.0000  | 0.0000              | 0.6716

Note that two of the eigenvalues are complex numbers. This is not uncommon. Calculating the eigenvalues of matrices frequently yields complex results. The eigenvectors along with the eigenvalues are shown in Table 7.

Table 7. Eigenvalues and eigenvectors for experiment 2 (each eigenvalue is followed by its eigenvector, for T = 5, T = 10, and T = 15 in turn)

T = 5

λ1 = 3.1642

0.5125, 0.5125, 0.0000, 0.4736, 0.4736, 0.0000, 0.1620, 0.0000

λ2 = 0.2271

−0.1696, −0.1696, 0.0000, 0.4388, 0.4388, 0.0000, −0.7466, 0.0000

λ3 = −1.0000

−0.8165, −0.0000, 0.0000, 0.4082, 0.4082, 0.0000, 0.0000, 0.0000

λ4 = −1.3914

0.5054, 0.5054, 0.0000, −0.4227, −0.4227, 0.0000, −0.3632, 0.0000

λ5 = −1.0000

0.0000, −0.0000, 0.0000, −0.7071, 0.7071, 0.0000, −0.0000, 0.0000

λ6 = 0

0.0000, 0.0000, 1.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000

λ7 = 0

0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 1.0000, 0.0000, 0.0000

λ8 = 0

0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,1.0000

T = 10

λ1 = 3.4664

0.5187, 0.4497, 0.0000, 0.4897, 0.4206, 0.3084, 0.1297, 0.0000

λ2 = 0.0341 + 0.4322*i

0.2731 – 0.0566 * i, −0.1636 + 0.1259 * i, 0.0000 + 0.0000 * i, −0.0781 – 0.2957 * i, −0.5148 – 0.1132 * i, 0.5305 + 0.0000 * i, 0.2598 + 0.3991 * i, 0.0000 + 0.0000 * i

λ3 = 0.0341 – 0.4322 * i

0.2731 + 0.0566 * i, −0.1636 – 0.1259 * i, 0.0000 + 0.0000 * i, −0.0781 + 0.2957* i, −0.5148 + 0.1132 * i, 0.5305 + 0.0000 * i, 0.2598 – 0.3991 * i, 0.0000 + 0.0000 * i

λ4 = −1.5347

−0.1724, −0.6220, 0.0000, 0.5856, 0.1360, −0.2404, 0.4053, 0.0000

λ5 = −1.0000

0.7071, 0.0000, 0.0000, −0.0000, −0.7071, −0.0000, −0.0000, 0.0000

λ6 = −1.0000

−0.7071, 0.0000, 0.0000, −0.0000, 0.7071, −0.0000, −0.0000, 0.0000

λ7 = −1.0000

0.0000, 0.0000, 1.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000

λ8 = 0.0000

0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 1.0000

T = 15

λ1 = 0.0000

0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 1.0000

λ2 = 4.5479

−0.4502, −0.4116, −0.4718, −0.4254, −0.3868, −0.2143, −0.1376, −0.0990

λ3 = 1.0000

−0.2582, 0.0000, 0.5164, 0.0000, 0.2582, −0.5164, −0.5164, −0.2582

λ4 = −1.6816

−0.2203, −0.5860, 0.0579, 0.5084, 0.1427, −0.2492, 0.4967, 0.1310

λ5 = −0.1947

0.1254, −0.2493, −0.3527, 0.4601, 0.0853, 0.3018, −0.2695, −0.6442

λ6 = −1.0000

−0.5774, 0.0000, −0.0000, −0.0000, 0.5774, −0.0000, −0.0000, 0.5774

λ7 = −1.0000

0.5774, 0.0000, −0.0000, −0.0000, −0.5774, −0.0000, −0.0000, −0.5774

λ8 = 0.6716

−0.3576, 0.2811, −0.2957, −0.0340, 0.6047, −0.2098, −0.1063, 0.5324

As was discussed previously, multiple instances of the same eigenvalue indicate the presence of multiple nodes with similar connectivity. Observing the final adjacency matrix, that of T = 15, there are two nodes with an eigenvalue of −1. Also recall that the Perron–Frobenius theorem states that the non-negative eigenvector associated with the largest eigenvalue provides a centrality measure for the node/vertex in question. At T = 15 the largest eigenvalue is associated with node B (λ2 = 4.5479). It is interesting to note that at T = 5 and T = 10, node A had the highest eigenvalue. This indicates some shift in centrality. At T = 15, two additional nodes were infected, nodes C and H. This suggests an examination of the logs for nodes A and B. Examining those logs, it was found that node A was the primary source of infection at T = 5. Nodes A and B were both infecting other nodes at time T = 10; then at T = 15, node B infected both nodes C and H. The graph shows node B not directly


connected to node H. However, the virus infection is via email; therefore, there need not be a direct connection. Recall that the largest eigenvalue also denotes the spectral radius for the graph. From the data collected from the graph spectrum, it is clear that the forensic analyst would be advised to image and carefully examine nodes A and B. There is a need to reduce the set of machines that are imaged and individually analyzed. These two experiments, along with the previously published experiments [3, 4], demonstrate that spectral graph theory provides a mathematically sound method of determining which machines to image.

3.6 Comparing the Virus Experiments

Both experiments involved the same threat vector (in fact, the same script virus) and the same mode of spread, namely email. Using the same attack and the same attack vector denotes a level of similarity that indicates the same threat actor may be responsible for both attacks. Consequently, comparing the two attacks using the methods described previously in this paper is warranted. The first method applied was the graph edit distance (GED). This methodology determines how many changes are required to cause graph G2 to become isomorphic to graph G1. Comparing the graph in Fig. 6 with the graph in Fig. 10, it is apparent that the only changes required are the addition of one vertex (G) and two edges. This produces a GED = 3, which is quite low. The GED indicates that the attacks may have the same threat actor as their source. Next, the maximum common subgraph (MCS) method was applied. By simply removing node G with its two edges, all remaining vertices and edges form a common subgraph. As was discussed previously, the MCS problem can frequently take substantially longer to solve. However, the small size of the graph of infected nodes makes it more tractable in this experimental study. In this case the MCS was a rather large portion of the two graphs G1 and G2. That MCS is shown in Fig. 14.

Fig. 14. Maximum common subgraph of virus experiments
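
The comparison can be scripted; the sketch below (Python with networkx) uses two small illustrative graphs that mimic the relationship described above (the second graph equals the first plus one vertex and two edges), since the exact edge lists of Figs. 6 and 10 are not reproduced here.

import networkx as nx

G1 = nx.Graph([("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("B", "H")])
G2 = nx.Graph(G1)
G2.add_edges_from([("G", "A"), ("G", "B")])      # one extra vertex and two extra edges

# Exact graph edit distance; feasible here because the graphs are small.
print("graph edit distance:", nx.graph_edit_distance(G1, G2))   # 3 for this pair

# A common subgraph obtained by removing the extra vertex and checking isomorphism.
H = G2.copy()
H.remove_node("G")
print("remaining graphs isomorphic:", nx.is_isomorphic(G1, H))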


The next mechanism used to test the similarity of the two virus outbreaks is the partial isomorphism described previously in this paper. In this experimental study the incidence function is the sending of the virus via email, and in this experiment both graphs have the same incidence functions for all edges. The calculation reduces to what is shown in Eq. 14.

Id = ((7/8) + (12/15) + (12/15)) / 3   (14)
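
The arithmetic of Eq. 14 can be checked directly; reading the three terms as the shared-vertex, shared-edge, and shared-incidence-function ratios is an interpretation based on the surrounding discussion (the general formula was introduced earlier in the paper and is not restated here).

shared_vertices, total_vertices = 7, 8
shared_edges, total_edges = 12, 15
shared_incidence, total_incidence = 12, 15      # all incidence functions match (email spread)

Id = (shared_vertices / total_vertices
      + shared_edges / total_edges
      + shared_incidence / total_incidence) / 3
print(round(Id, 4))                             # 0.825, i.e. 82.5%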

The degree of isomorphism is 0.825, or 82.5%. This is a high value. The degree of isomorphism indicates that the two attacks may be attributable to the same threat actor. The maximum common subgraph and the graph edit distance also support this conclusion. Having three separate mathematical measures that all indicate a high degree of similarity in the two attacks makes the probability of the same threat actor being the cause of both attacks substantial. Given the parameters of this controlled experiment, the same threat actor was indeed responsible for both attacks. It can also be useful to compare the two graphs' spectra to determine if they are cospectral mates. The two spectra are shown in Table 8.

Table 8. Comparing graph spectra

First experiment | Second experiment
0       | 0
−1.7466 | −1.6816
−1      | −1
−1      | −1
−1      | −0.1947
0       | 0.6716
0.5451  | 1
4.2015  | 4.5479

From Table 8 it is clear that the two spectra are not identical. However, they are quite close. Three of the eigenvalues are identical and three others are very close. This again supports the conclusion that there is a connection between these graphs. While the two modeling experiments indicate that the methodology is sound, it is important to test the negative case. That means checking the case where a different virus is introduced via a different machine, and the virus spread is again analyzed with graph theory. For this experiment a virus made with the Black Host Virus Maker tool was used. That tool is shown in Fig. 15. This tool can be downloaded for free from http://www.blackhost.xyz. It was set to display a popup window stating "infected", then open Explorer and attempt to copy itself to shared folders on the network using a batch file. The file was placed in the C:\shared folder of the B node, a Windows 10 virtual machine.


Fig. 15. Black host virus maker

As this experiment is meant as a validation step for the previous two experiments, only the final outcome at T = 15 is analyzed. The infection at T = 15 is shown in Table 9. Table 9. Experiment 3 final infection Node

T = 15

A B

Infected

C D

Infected

E F G H

Infected


The subgraph containing only infected vertices is shown in Fig. 16.

Fig. 16. Experiment 3 subgraph

The adjacency matrix for the entire graph, showing only the infected nodes, is given in Fig. 17.

Fig. 17. Adjacency matrix for virus experiment 3

The eigenvalues for the final time, T = 15, are given in Table 10.

Table 10. Eigenvalues for virus experiment 3

  | T = 15
A | 1.0000
B | −1.0000
C | 0.0000
D | 0.0000
E | 0.0000
F | 0.0000
G | 0.0000
H | 0.0000


Given the small number of infected vertices, the graph spectrum is not as informative as it was in the earlier instances. However, the main focus of this third experiment is to compare the graphs. Therefore, the graph in experiment 3 was compared with the graph in experiment 2. Again, the first method applied was the graph edit distance (GED). Five vertices and 13 edges must be added to the graph of experiment 3 in order for it to be isomorphic to the graph of experiment 2. Conversely, one could remove 5 vertices and 13 edges from the graph of experiment 2 to make it isomorphic with the graph of experiment 3. In either case, this results in a GED = 18. The GED between the graphs of experiments 1 and 2 was 3. Next, the percentage-of-isomorphism calculation was applied. The values for this formula are shown in Eq. 15.

Id = ((3/8) + (2/15) + (0/15)) / 3   (15)

Note that there were no incidence functions in common. In experiments 1 and 2 the virus spread via email; in experiment 3 the virus spread via copying to network drives. The Id value is 0.1694, or 16.94%. This value is quite low. Given the parameters of this controlled experiment, it is known that experiment 3 used a different virus, which was introduced at a different vertex and spread via a different method. Two methods of fingerprinting demonstrated that this virus breach was substantially different from the first two. The study of similarity can be extended to the spectra of the three graphs. This is shown in Table 11.

Table 11. Comparing graph spectra

Experiment 1 | Experiment 2 | Experiment 3
0       | 0       | 1
−1.7466 | −1.6816 | −1
−1      | −1      | 0
−1      | −1      | 0
−1      | −0.1947 | 0
0       | 0.6716  | 0
0.5451  | 1       | 0
4.2015  | 4.5479  | 0

It is clear from examining the graph spectra that while experiments 1 and 2 have substantial similarities in their spectra, experiment 3 is quite divergent from the other two. Thus, graph spectral analysis further confirms that the first two experiments have a strong correlation, while the third is quite dissimilar.


3.7 Virus Experiment Summary

The three virus modeling experiments described in this current paper serve to validate the methodology presented. Furthermore, sub-parts of the current methodology have been previously subjected to peer review [3, 4, 34–36]. Both of these facts demonstrate that the methodology is sound and can be utilized effectively in digital forensics investigations.

4 Conclusion

The current study is the culmination of several years of research. Various preliminary phases of this methodology have been published in several previous papers [4, 5, 25, 34–36]. The current paper integrates the previous nascent elements and adds additional graph theory techniques. The methodology described in this paper provides an effective tool for modeling digital forensics investigations and for providing fingerprints of network intrusions. Such fingerprints can be compared to other intrusions to aid in attack attribution. While the current study has focused on digital forensics, there is no reason that this methodology cannot be applied to other types of forensic analysis. Graph theory itself has been previously applied to a diverse set of modeling needs. Applying the current technique to fields as diverse as modeling dynamics in social groups or modeling victim selection in serial crimes is certainly possible.

References 1. Bondy, J.A., Murty, U.S.R.: Graph Theory with Applications, vol. 290. Macmillan, London (1976) 2. Deo, N.: Graph Theory with Applications to Engineering and Computer Science. Courier Dover Publications (2017) 3. Easttom, C.: On the application of algebraic graph theory to modeling network intrusions. In: IEEE 10th Annual Computing and Communication Conference, pp. 0424–0430 (2020) 4. Easttom, C., Adda, M.: The creation of network intrusion fingerprints by graph (2020) 5. Easttom, C. Adda, M.: The creation of network intrusion fingerprints by graph homomorphism. WSEAS Trans. Inform. Sci. Appl. 17. https://doi.org/10.37394/23209.2020. 17.15 6. Goldreich, O.: Flexible models for testing graph properties. Comput. Comp./Prop. Test. 352– 362 (2020) 7. Godsil, C., Royle, G.F.: Algebraic graph theory. Springer Science & Business Media, New York (2013) 8. Gross, J., Yellen, J., Zhang, P.: Handbook of Graph Theory. CRC Press, New York (2013) 9. Han, L., Liu, G., Yang, X., Han, B.: A computational synthesis approach of mechanical conceptual design based on graph theory and polynomial operation. Chin. J. Mech. Eng. 33(1), 2 (2020) 10. Hartsfield, N., Ringel, G.: Pearls in Graph Theory: A Comprehensive Introduction. Courier Corporation (2013) 11. Hoffmann, R., McCreesh, C., Reilly, C.: Between subgraph isomorphism and maximum common subgraph. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 31, no. 1 (2017)


12. Knauer, U., Knauer, K.: Algebraic Graph Theory: Morphisms, Monoids and Matrices. Walter de Gruyter Press, Berlin (2019) 13. Kulkarni, S.J.: Graph theory: applications to chemical engineering and chemistry. Galore Int. J. Appl. Sci. Human. 1(2), 17–20 (2017) 14. Marzuki, C.C.: Total irregularity strength of m-copies of rhombus graph. J. Phys.: Conf. Ser. 1116(2), 022023 (2018) 15. Palmer, I., Gelfand, B., Campbell, R.: Exploring digital evidence with graph theory. In: 2017 ADFSL Conference on Digital Forensics, Security, and Law (2017) 16. Qiao, Z., Koolen, J.H., Markowsky, G.: On the Cheeger constant for distance-regular graphs. J. Combinat. Theory, Ser. A 173, 105227 (2020) 17. Samsi, S., et al.: Static graph challenge: subgraph isomorphism. In: 2017 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1–6, September 2017 18. Sporns, O.: Graph theory methods: applications in brain networks. Dial. Clin. Neurosci. 20(2), 111 (2018) 19. Wang, W.: A graph-oriented approach for network forensic analysis. Dissertation, Iowa State (2010). https://lib.dr.iastate.edu/cgi/viewcontent.cgi?article=2722&context=etd. Acccessed 4 Aug 2021 20. Takahashi, D., Xiao, Y, Meng, K.: Creating user-relationship-graph in use of flow-net and log files for computer and network accountability and forensics. In: 2010-MILCOM 2010 Military Communications Conference (2010) 21. Barrère, M., Steiner, R.V., Mohsen, R., Lupu, E.C.: Tracking the bad guys: an efficient forensic methodology to trace multi-step attacks using core attack graphs. In: 2017 13th International Conference on Network and Service Management (CNSM), pp. 1–7 (2017) 22. Milling, C., Caramanis, C., Mannor, S., Shakkottai, S.: Network forensics: random infection vs spreading epidemic. ACM SIGMETRICS Perform. Eval. Rev. 40(1), 223–234 (2012) 23. Valsesia, D., Coluccia, G., Bianchi, T., Magli, E.: Compressed fingerprint matching and camera identification via random projections. IEEE Trans. Inf. Forensics Secur. 10(7), 1472– 1485 (2015) 24. Lee, W., Cho, S., Choi, H., Kim, J.: Partial fingerprint matching using minutiae and ridge shape features for small fingerprint scanners. Exp. Syst. Appl.: Int. J. 87(C), 183–198 (2017) 25. Easttom, C.: A systematic framework for network forensics based on graph theory. University of Portsmouth (2021) 26. Thulasiraman, K., Arumugam, S., Nishizeki, T., Brandstädt, A.: Handbook of Graph Theory, Combinatorial Optimization, and Algorithms. Taylor & Francis (2016) 27. Linke, N.M., Johri, S., Figgatt, C., Landsman, K.A., Matsuura, A.Y., Monroe, C.: Measuring the Rényi entropy of a two-site Fermi-Hubbard model on a trapped ion quantum computer. Phys. Rev. 98(5), 052334 (2018) 28. Hayashi, M. (2017). Quantum Information Theory. Graduate Texts in Physics, no. 2. Springer Press, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-49725-8 29. Mishra, S., Ayyub, B.M.: Shannon entropy for quantifying uncertainty and risk in economic disparity. Risk Anal. 39(10), 2160–2181 (2019) 30. Dehmer, M., Emmert-Streib, F. (eds.): Quantitative Graph Theory: Mathematical Foundations and Applications. CRC Press, New York (2014) 31. De Meo, P., Messina, F., Rosaci, D., Sarné, G.M., Vasilakos, A.V.: Estimating graph robustness through the Randic index. IEEE Trans. Cybern. 48(11), 3232–3242 (2017) 32. Kim, S.J., Ozeki, K.: A note on a Brooks’ type theorem for DP-coloring. J. Graph Theory 91(2), 148–161 (2019) 33. 
Parimala, M., Broumi, S., Prakash, K., Topal, S.: Bellman-Ford algorithm for solving shortest path problem of a network under picture fuzzy environment. Comp. Intell. Syst. 7(5), 2373– 2381 (2021)


34. Easttom, C.: A systems approach to indicators of compromise utilizing graph theory. In: 2018 IEEE International Symposium on Technologies for Homeland Security (2018) 35. Easttom, C.: How to model digital forensics investigations with graph theory. Digit. Forensics Mag. 37 (2018) 36. Easttom, C.: On the application of algebraic graph theory to modeling network intrusions. 2020 IEEE 10th Annual Computing and Communication Conference (2020)

Using Memory Forensics to Investigate Security and Privacy of Facebook

Ahmad Ghafarian(B) and Deniz Keskin

University of North Georgia, Dahlonega, GA 30597, USA
{ahmad.ghafarian,DKESK9340}@ung.edu

Abstract. Facebook has changed the way people live and communicate with each other. Remnants of Facebook activities and personal information are serious privacy and security concerns. The Facebook usage information includes traces of criminal activities such as child abuse, cyberbullying, politically related posts, and other offensive activities. The personal information includes usernames, passwords, email addresses, telephone numbers, etc. In this paper, we use memory forensics to extract the data remnants from the use of Facebook via a web browser on a Windows operating system machine. The results of our experiment demonstrate that we can acquire both personal data and Facebook usage activity data from the memory of the device. In addition, we show that memory forensics helps forensic investigators identify offenders and report their actions to law enforcement officials. For this experiment, we use various memory forensics acquisition and analysis software. For the acquisition part, we use free software called Magnet RAM Capture, which is easy to use and captures the memory of the device. For the analysis part, we use Bulk Extractor, which is also free software. This tool has several scanners; each scanner analyzes a specific type of artifact, such as email addresses, for example. To carry out the experiment, we created several anonymous Facebook accounts to hide the identity of the researchers. Keywords: Volatile · Security · Forensics · Facebook · Memory forensics

1 Introduction

Facebook is an online social media networking service that is now part of the company Meta Platforms. As of 2021 Facebook has about three billion users around the world, which makes it the most popular platform, Umair et al. [1]. Users can create profiles, upload photos, join a group, and start new groups. Facebook also has many components, including the Timeline, Status, and News Feed, which are part of the user's profile. In addition, users can communicate via Messenger, approve content by using the like button, and let a user's friends post information on their Wall. Moreover, the tags feature allows people to identify themselves and others in images that can be seen by other Facebook friends. Due to the significant amount of personal information that is being exchanged, the privacy and security of Facebook users remains an ongoing problem. To address this issue, Facebook implemented privacy controls in which users could control what content appeared in the news feed.


The privacy and security of online social media networking have also been studied through the computer forensics process, and in particular memory forensics. Recent studies show that memory forensics is the most effective approach to computer forensics. The important aspects of memory forensics include tool selection, accuracy, and automation of memory analysis. Mobile forensics of Facebook is even more complicated, as there are significant variations of hardware and software. First, mobile devices have several different types of memory, like non-volatile flash memory, very fast yet volatile random-access memory, or read-only memory that contains the basic operating system of the device. The second challenge is that mobile phone manufacturers often include security features in their operating systems that prevent access by common memory forensics tools unless the phone is rooted to allow access to memory capture tools, Yusoff et al. [2]. Part of Facebook's popularity can be attributed to its availability on multiple platforms, such as Apple and Android devices, in the form of a mobile app. Facebook can also be accessed by any web browser regardless of its operating system, as it is designed to work on any platform that has access to the internet. Memory forensics of Facebook on each of these platforms has its own challenges. First, the memory organization varies from one operating system to another. Second, the existing memory forensics tools are designed for a specific operating system, and thus special care is needed to select the right tool. For example, after some initial evaluation, we found that Fireeye's Memoryze [3], which is a memory analysis software, does not work on Windows 10, 64-bit. Third, the tool must be admissible in a court of law and have no error rate. This paper evaluates several memory forensics software tools and selects the most reliable open-source software. Subsequently, we use the selected tools to experimentally perform memory forensics to evaluate the security and privacy of Facebook users. Our approach is unique and to the best of our knowledge has not been implemented previously. The organization of the paper is described below. Background is presented in Sect. 2. In Sect. 3, the platform for the experiment and the selected software are presented. Section 4 discusses the details of the experiment. The conclusion is presented in Sect. 5, followed by the future direction of the research in Sect. 6.

2 Background

From a computer forensics perspective, data mining on social media networking sites such as Facebook could provide useful results. The use of data mining to study the security and privacy of Facebook has been addressed by researchers from different perspectives. In one study, Ali et al. [4] suggest that the social media companies usually collect users' data, which may be misused. In another theoretical study, Krishnamurthy [5] suggests that the social media companies have the most knowledge about users' activity data. Therefore, they should implement mechanisms to secure the data. On the other hand, users themselves should be careful about the amount and the type of information they share in their online activities, Buccafurri et al. [6]. For example, Fire et al. [7] present software that can help users protect their privacy while they are online. In general, data stored in RAM (random access memory) is only as secure as the physical device itself. From a digital forensics perspective, information mining on social media networking sites such as Facebook could provide useful results. Limited results


from memory forensics of social media networking are reported by Majeed and Saleem [8]. Singh and Sharman [9] have reported the results of their unique approach to memory forensics of social media networking. In their work, the authors performed hibernation file forensics and memory forensics on social media networking. Subsequently, they compared the results from both experiments. They conclude that if the results are the same, this verifies the accuracy of memory forensics. In another experiment, Yang et al. [10] reported their successful experiment with Windows Instant Messaging in retrieving Facebook users' activities from memory. They suggested that memory forensics of social media networking is very useful in the study of the privacy and security of social media networking as well as in any other forensics investigation. Since the software used to carry out memory forensics plays an important role in forensics investigation, a study of the tools is also relevant to this research. Bachler [11] evaluated the two classes of tools that currently exist for memory forensics, i.e., open-source and proprietary. They suggest that open-source software has an advantage over proprietary software because any change to the tool is immediately visible to the users. In contrast, the companies who hold the licenses for proprietary tools keep any changes secret. Memory forensics of mobile devices is even more complicated. This is because there are various types of mobile devices, and each mobile device has a different type of RAM. Moreover, some mobile service providers do not allow access to their devices unless they are rooted, Venkateswara and Chakravarthy [12]. Another problem with mobile device forensics is that the tools are not interoperable. Junaid et al. [13] reported on a tool they created to address this problem. However, we are not aware of any popular use of their tool. Some reports on the application of memory forensics can also be found in Chang [14] and Chang and Yen [15]. In our study, we use validated memory acquisition software tools as well as techniques to retrieve remnants of users' metadata from Facebook accounts. In this process, we investigate various methods of data analysis using binary data captured by a memory capture tool as well as extracting bulk data sets to create a clear chain of events and information. Based on the artifacts we were able to extract from the device, we make some recommendations on how users can mitigate exposure of their data and on how users should use social networking to keep their information private.

3 Scope and Methodology

The goal of this experiment was to find out what kind of Facebook activity artifacts can be captured from the volatile memory of a target machine. To prepare for the experiment, the following actions were taken.

• Define the platform.
• Select the tools for the experiment.
• Identify the target data.
• Define the scenarios.
• Lay out the experiment methodology.


3.1 Platform

Our experiment was carried out on virtual machines. We installed VMware Workstation Player 15 [16] on a laptop, which was used exclusively for this research. Then we created three Windows 10 (version 10.0.17134) virtual machines and networked them together. Facebook was accessed via the Firefox 70.0.0 web browser.

3.2 Tools

Tool selection is very important in memory forensics. RAM forensics tools are divided into two categories, i.e., "memory capture" and "memory analysis" tools. After careful evaluation of several software tools in both categories, we decided to use Magnet RAM Capture [17] for memory acquisition and Bulk Extractor 1.6.0 [18] for memory analysis in our experiment. Figure 1 shows the user interface of Bulk Extractor when it is initially launched.

Fig. 1. Graphical user interface of bulk extractor.

3.3 Target Data

The target data for the experiment is defined as the data we will be looking for during the forensic analysis of Facebook. The target data is listed in Table 1.


Table 1. Target data

Data number | Facebook accounts data
1  | Name
2  | Password
3  | Email address
4  | Phone number
5  | Uploaded photos
6  | Facebook likes
7  | Reactions
8  | Post comments
9  | Set location
10 | Friends
11 | Personal messages
12 | URL of the uploading photos
13 | Events

3.4 Scenario

To carry out the experiment, we created two anonymous Facebook accounts (hereafter called testers), one on each of the virtual machines. To perform the Facebook activities in a structured manner, we needed to create a scenario, as described in Table 2.

Table 2. Facebook scenario

Name | Action
Feed | User 1 should post this feed on his wall: "I am trying to FIND you". User 2 should reply "glad to see you FIND"
Comment | User 2 should post a feed on his wall: "the weather is cold here". User 1 then should make the comment "yes, it is"
Message | User 1 sends a private message to User 2 titled "meetings" with body text "I have been looking for you everywhere"
Chat | User 1 posts the chat "where is the best swimming pool?". User 2 responds "in 34th street, 1st floor"
Friend Search | User 1 looks for his friend, User 2
Events | User 1 sets an event called "Birthday Party"
Photos | User 1 posts a photo in his profile and User 2 likes it
Location | User 1 creates a post using Facebook's check-in feature


3.5 Methodology

In addition to Magnet RAM Capture and Bulk Extractor, we also used the following tools.

• Firefox web browser
• HxD Hex Editor (v2.3.0.0)
• Bulk Extractor (1.6.0-Dev)
• Foremost

Each scanner of Bulk Extractor produces a text file, e.g., email.txt, ccn.txt, pii.txt, etc. After running the relevant scanners, we searched the resulting files with the Notepad++ text editor. For binary search, the logic was to create a forensic profile for user 1 using known information. To accomplish this task, common strings were used to gather forensic data; the phone number was used to start the search for information. This string search revealed the user's password and every stored instance of the user's phone number. From here we were able to obtain every detail regarding the suspect's Facebook accounts.
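
A small helper along the lines of this workflow (not the authors' actual tooling) could automate the string search over the Bulk Extractor feature files; the output directory, file names, and search patterns below are assumptions for illustration.

import re
from pathlib import Path

FEATURE_DIR = Path("bulk_extractor_output")          # assumed output directory
FEATURE_FILES = ["email.txt", "pii.txt", "domain.txt", "json.txt"]
PATTERNS = [
    re.compile(r"770-555-01\d{2}"),                  # placeholder phone-number pattern
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),          # generic email-address pattern
]

def search_features(directory, filenames, patterns):
    # Scan each feature file line by line and collect lines matching any pattern.
    hits = []
    for name in filenames:
        path = directory / name
        if not path.exists():
            continue
        for line_no, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            if any(p.search(line) for p in patterns):
                hits.append((name, line_no, line.strip()))
    return hits

for filename, line_no, line in search_features(FEATURE_DIR, FEATURE_FILES, PATTERNS):
    print(f"{filename}:{line_no}: {line}")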

4 Experiment Results

We performed two sets of experiments: one with the default Facebook privacy settings and a second one with the Facebook privacy settings applied. Table 3 shows the Bulk Extractor scanners we used.

Table 3. Bulk extractor scanners we applied

Nu | Scanner | Featured files    | Extracted data
1  | Accts   | PII.txt           | Telephone numbers and extra data
2  | GPS     | GPS.txt           | Geographic location and extra data
3  | Email   | Email.txt         | Email addresses
4  | Domain  | Domain.txt        | Domain-related information
5  | Jpeg    | Jpeg_carved.txt   | All JPG images
6  | JSON    | JSON.txt          | JSON format of the downloaded files
7  | Exif    | Jpeg.txt          | Extracted JPEG files


4.1 First Set of Experiment

In the first set of experiments, we used the Firefox web browser to log in to the Facebook accounts with the default privacy and security settings. Subsequently, we performed the scenario that was described above. Immediately afterwards, we captured the RAM and used Bulk Extractor to analyze the captured memory file. The results of the analysis are discussed below.

4.1.1 User's Personal Information

We ran Bulk Extractor, used the captured memory file as input, and executed the scanners one at a time. The results were a set of featured files, as shown in Table 3. The contents of the featured files are text files. We searched these text files using the target data of Table 1 as search keys. We were able to retrieve users' personal information, as shown in Figs. 2, 3, 4, and 5.

Fig. 2. User 1’s username.


Fig. 3. User 2’s username.

Fig. 4. The Email address of user 1 that was retrieved from Email.txt


Fig. 5. Phone number of user 2.

4.1.2 Facebook Activity Data

In this part, we searched the same captured RAM, but we were specifically looking for Facebook activity information. Figures 6, 7, 8, 9, 10, 11, 12, and 13 show the retrieved information.

Fig. 6. User2’s personal feed post. (Feed & Comment)


Fig. 7. User1’s comment on user2’s post (Comment)

Fig. 8. Location of the user 1 extracted from the binary editor.

Fig. 9. The friend list of the user 1 extracted using bulk extractor (url_facebook-address.txt).


Fig. 10. The personal messages between user1 and user2 obtained from JSON.txt.

Fig. 11. The personal messages between user1 and user2 obtained from bulk extractor (JSON.txt).


Fig. 12. The URL of user 2’s profile picture.

Fig. 13. Forensic artifacts of user1 creating an event called “Birthday Party.”

A summary of the first set of experiments is shown in Table 4.

Table 4. Summary of Facebook RAM forensics

Evidence | Obtained data
Username | ✓
Password | ✓
Usernames | ✓
Email address | ✓
Phone numbers | X
Photos | ✓
Likes | ✓
Post comments | ✓
Location | ✓
Personal messages | ✓
URL data | ✓
Events | ✓


4.2 Second Set of Experiments

For this set of experiments, we first defined a new scenario, shown in Table 5.

Table 5. The second set of experiment scenarios

Name | Action
Feed | User 1 should post this feed on his wall: "I am trying to FIND you". User 2 should reply: "glad to see you FIND"
Comment | User 2 should post a feed on his wall: "the weather is cold here". User 1 should then make the comment: "yes, it is"
Message | User 1 should send a private message to User 2 titled "meetings" with body text "I have been looking for you everywhere"
Chat | User 1 posts the chat "where is the best swimming pool?". User 2 responds "in 34th street, 1st floor"
Friend Search | User 1 looks for his friend, User 2
Events | User 1 sets an event called "Birthday Party"
Photos | User 1 posts a photo in his profile and User 2 likes it
Location | User 1 creates a post using Facebook's check-in feature

Subsequently, we applied the Facebook privacy settings as recommended by Facebook [19]. We then logged into Facebook and performed the scenario described in Table 5, followed by the same memory forensics process as in the first set of experiments. We obtained results identical to those of the first set of experiments: even with the Facebook privacy settings applied, the users' personal information as well as the Facebook activity data were retrieved. In summary, from a memory forensics perspective, the Facebook privacy settings have no effect.

4.3 Third Set of Experiments

For this set of experiments, we repeated the Facebook volatile memory forensics with the Firefox private browser, using the same scenario outlined in Table 5 above.

4.4 User Personal Information

Figures 14, 15, 16, and 17 show the retrieved user personal data.


Fig. 14. User 1’s usernames.

Fig. 15. User 2’s username.


Fig. 16. User 2’s password

Fig. 17. User 1’s Email address

4.5 Facebook Activity Information

Figures 18, 19, 20, 21, 22, 23, and 24 show the retrieved Facebook activities.


Fig. 18. User2’s personal feed post

Fig. 19. User1’s comment on user2’s post

Table 6 shows a comparison of using Facebook with the Firefox browser in regular mode versus private mode.


Fig. 20. Friend list of the user 1

Fig. 21. User 1’s feed for user 2.


Fig. 22. User 2’s response feed to user 1.

Fig. 23. The URL of user 2’s profile picture.


Fig. 24. User1 creating an event called “Birthday Party.”

Table 6. Comparison of using private browser vs regular browser

Evidence | Firefox regular browser | Firefox private browser
Username | ✓ | ✓
Password | ✓ | ✓
Real usernames | ✓ | ✓
Email address | ✓ | ✓
Phone numbers | X | X
Photos | ✓ | ✓
Likes | ✓ | ✓
Post comments | ✓ | ✓
Location | ✓ | ✓
Personal messages | ✓ | ✓
URL data | ✓ | ✓
Events | ✓ | ✓

✓ means we retrieved the information. X means we did not retrieve the information.

5 Conclusions

We used memory forensics to evaluate the privacy and security of using Facebook via a web browser on a laptop running Windows 10. The three experiments that we performed are summarized below. The first memory forensics experiment was performed with the default Facebook privacy settings. The results demonstrate that the user's usernames, password, name, personal messages, and the URLs of the user's photos are all retrievable. Additionally, user phone numbers, the users' friend lists, user comments, location details, created events, and friends' email addresses are all retrievable. The second memory forensics experiment was performed with the Facebook privacy settings turned on. The results were identical to those of the first experiment: we were able to retrieve the same information as in the first set of experiments. In the third memory forensics experiment, we used a Firefox private browsing window, turned on the Facebook privacy settings, and repeated the memory forensics process. Even in this case, we retrieved all the information that we had retrieved in the first and second experiments. In summary, the Facebook privacy settings and the use of private browsing have no effect on the user's privacy and security: with the proper tools and techniques, memory forensics can retrieve all of the Facebook personal information as well as the Facebook activity information. The reason is that browsing in Firefox private mode does not change the outcome, because the information still resides in RAM until the browser is closed. Firefox's private browsing keeps the browser history private and deletes temporary cookies after each session, but it does not protect user information once it has been loaded into volatile memory.

6 Future Research

Areas for future research include: using a different web browser to access Facebook; performing memory forensics when mobile devices are used to access Facebook; repeating the memory forensics experiments for other social media networks such as WhatsApp and Instagram; extending the Facebook activities to include lists, moments, promotion mode, etc.; and performing a root-cause analysis of the observed results.

References
1. Umair, A., Nanda, P., He, X.: Online social network information forensics: a survey on use of various tools and determining how cautious Facebook users are? In: Trustcom/BigDataSE/ICESS 2017, pp. 1139–1144 (2017)
2. Yusoff, M.N., Dehghantanha, A., Mahmod, R.: Forensic investigation of social media and instant messaging services in Firefox OS: Facebook, Twitter, Google+, Telegram, OpenWapp, and Line as case studies. In: Contemporary Digital Forensic Investigations of Cloud and Mobile Applications, pp. 41–62. Elsevier (2016)
3. Memoryze: memory analysis tool from FireEye: Redline. https://www.fireeye.com/services/freeware/redline.html
4. Ali, S., Islam, N., Rauf, A., Din, I.U., Guizani, M., Rodrigues, J.P.C.: Privacy and security issues in online social networks. Future Internet 10(12), 1–12 (2018)
5. Krishnamurthy, B.: Privacy and social media networks: can colorless green ideas sleep furiously. http://research.microsoft.com/pubs/64346/dwork.pdf
6. Buccafurri, F., Lax, G., Nicolazzo, S., Nocera, A.: Comparing Twitter and Facebook user behavior: privacy and other aspects. Comput. Hum. Behav. 52, 87–95 (2015)
7. Fire, M., Kagan, D., Elishar, A., Elovici, Y.: Social privacy protector - protecting users' privacy in social networks (2012)
8. Majeed, A., Saleem, S.: Forensics analysis of social media apps in Windows 10. NUST J. Eng. Sci. 10(1), 37–45 (2017)
9. Singh, A., Sharma, P., Sharma, S.: A novel memory forensics technique for Windows 10. J. Netw. Inform. Sec. 4(2), 1–10 (2016)
10. Yang, T.Y., Dehghantanha, A., Choo, K.K., Muda, Z.: Windows instant messaging app forensics: Facebook and Skype as case studies. PLoS One 11(3), 1–29 (2016)
11. Bachler, M.: An analysis of smartphones using open source tools versus the proprietary tool Cellebrite UFED Touch. https://www.marshall.edu/forensics/files/BACHLER_MARCIE_Research-Paper_Aug-5.pdf
12. Venkateswara, R.V., Chakravarthy, A.S.N.: Survey on Android forensic tools and methodologies. Int. J. Comp. Appl. 154(8) (2016)
13. Junaid, M., Tewari, J.P., Kumar, R., Vaish, A.: Proposed methodology for smart phone forensic tool. Asian J. Comp. Sci. Technol. 4(2), 1–5 (2015)
14. Chang, M.S.: Digital forensic investigation of Facebook on Windows 10. Int. J. Innov. Sci. Eng. Technol. 3(9), 1–7 (2016)
15. Chang, M.S., Yen, C.P.: Facebook social network forensics on Windows 10. Int. J. Innov. Sci. Eng. Technol. 3(9), 55–60 (2016)
16. VMware Workstation Player 15. https://www.vmware.com
17. Magnet RAM Capture: Acquiring memory with Magnet RAM Capture. https://www.magnetforensics.com/blog/acquiring-memory-with-magnet-ram-capture/
18. Garfinkel, S.L.: Digital media triage with bulk data analysis and bulk_extractor. Comput. Secur. 32, 56–72 (2013)
19. Facebook privacy settings. https://www.experian.com/blogs/ask-experian/how-to-manage-your-privacy-settings-on-social-media

Hash Based Encryption Schemes Using Physically Unclonable Functions

Dina Ghanai Miandaob(B), Duane Booher, Bertrand Cambou, and Sareh Assiri

Northern Arizona University, Flagstaff, AZ 86011, USA
{dg856,dinaghanai,duane.booher,Bertrand.Cambou,sa2363}@nau.edu

Abstract. In recent years, public key cryptography has seen revolutionary development in cryptosystems. Algorithms such as RSA, ECC, and DSA are commonly used for public key cryptography in a variety of applications. The drawback of these protocols is their vulnerability to quantum computing: they rely on the computational hardness of factorization, and Peter Shor proved that quantum computing can break the public key cryptosystems that rely on it. Hash-based cryptography has been a subject of debate for the past few decades, since it is known to be resistant to quantum computer attacks, and there have been advances in this field. However, the available methods concentrate on digital signature schemes; none of them can encrypt and decrypt plain text. This paper describes methods that can be used to encrypt and decrypt plain text using hash functions.

Keywords: Hash functions · Cryptography · Public key cryptography · Physically unclonable functions

1 Introduction

Hash-based cryptography schemes are mainly restricted to digital signatures, which means they can only be used to sign a document and verify the signature. Digital signatures are an important part of secure systems; they are used to authenticate IoT devices, blockchains, financial transactions, etc. [16,17]. The security of a hash-based digital signature relies solely on the security of the underlying hash function. Hence, if a hash function used in a scheme becomes insecure, it can be replaced by a secure alternative. However, there are no known encryption and decryption methods that use hash-based cryptography. In this paper we introduce several methods that combine different features of hash-based digital signature schemes to encrypt and decrypt plain text. Most hash-based cryptography methods need to exchange a very long public key through an unprotected channel, which is one of their downsides because cryptographic key management is a complex procedure; using the Ternary Addressable Public Key Infrastructure we can eliminate the need to share those public keys and keep them private [15,24]. We used Physically Unclonable Functions (PUFs) to generate both the private and the hashed keys. Using the KVL protocol we can generate thousands of keys from PUFs, which is useful for the hash-based encryption methods [5]. This article is structured into five sections. Section 2 presents the preliminaries, describing the properties required of a cryptographic hash function and various digital signature methods such as Lamport, Winternitz and HORS; later in that section we describe PUFs and how they are used in TA-PKI to generate cryptographic keys. Section 3 presents our proposed methodology for generic encryption methods: first a multiple-hashing method inspired by the Winternitz digital signature, then multiple hashing with random ordering, which combines the Winternitz and HORS methodologies, and finally a combination of Lamport, HORS and Winternitz to encrypt and decrypt plain text. Section 4 presents the results of implementing these three models, with descriptions and graphs comparing them. Section 5 presents the conclusion and future work.

2 Preliminaries

In this section we explain some of the tools used in our method, such as cryptographic hash functions, Physically Unclonable Functions, and the Ternary Addressable Public Key Infrastructure. Furthermore, we explain the three hash-based digital signatures used in our protocols.

2.1 Cryptographic Hash Functions

Hash functions are extremely useful and appear in almost all information security applications. A hash function is a mathematical function that converts an input of arbitrary length into a compressed value of fixed length. The values returned by a hash function are called the message digest or simply the hash value. Hash functions must also be computationally efficient [26]. To be an effective cryptographic tool, a hash function should possess the following properties:

– Pre-image resistance: the output of a cryptographic hash function must not reveal any information about the input. That is, if a hash function h produced a message digest m, it should be difficult to find any input value x that hashes to m. This property protects against an attacker who only has a hash value and is trying to find the input.
– Second pre-image resistance: given an input and its hash, it should be hard to find a different input with the same hash. This property protects against an attacker who has an input value and its hash and wants to substitute a different value for the original input.
– Collision resistance: it must be practically impossible to find two different inputs that produce the same output; that is, for a hash function h, it is hard to find any two different inputs x and y such that h(x) = h(y). This property makes it difficult for an attacker to find two input values with the same hash. Moreover, if a hash function is collision-resistant, it is also second pre-image resistant.
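As a quick illustration of the fixed-length output these properties build on, the following minimal Python snippet (an illustrative choice of language; this paper's implementations are in C/C++) hashes inputs of different lengths with SHA-256:

```python
import hashlib

# A SHA-256 digest is always 32 bytes, whatever the input length.
for msg in (b"short", b"a much longer input message" * 100):
    print(len(hashlib.sha256(msg).digest()), "bytes")

# Inputs differing by a single bit ('0' vs '1') yield unrelated digests.
print(hashlib.sha256(b"input-0").hexdigest())
print(hashlib.sha256(b"input-1").hexdigest())
```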

2.2 Lamport Digital Signature

The history of digital signature schemes goes back to 1979, when Lamport first introduced a digital signature scheme based on a one-way function, i.e., a hash function [1,10,18]. We discussed the pre-image resistance property of hash functions above; the Lamport digital signature relies on this feature in particular. The protocol works as follows. The signer first randomly generates n pairs of random numbers, each number being n bits in size, for a total of 2 · n · n bits. This forms the private key, which should be kept in a secure space. To sign a message, the signer hashes the message to an n-bit digest, and then for each bit in the message digest she picks the corresponding number from her private key. For example, if the first bit in the message digest is "0", she chooses the first number in the first pair, while if it is "1" she chooses the second number in the first pair. The second bit of the message digest defines which number to choose from the second pair of the private key, and so on. These n · n bits of data are the signer's signature. The public key is generated by hashing all the private key pairs. The public key, along with the signature, is sent to the verifier. The verifier first hashes the message to get the 256 bits of the message digest. Similar to how the signer chose her private key values, the verifier chooses each public key value based on each message digest bit; this gives him 256 hashes. Then he takes the signature values and hashes them. If all 256 hashes chosen from the public key match the 256 hashed signature values, the signature is valid. Even a one-bit mismatch will reject the signature (Table 1).


Table 1. Lamport digital signature

Key Generation
Input: parameters n and m
Generate two separate sets of random n-bit strings:
  sk^0 = sk_0^0, sk_1^0, ..., sk_{n-1}^0
  sk^1 = sk_0^1, sk_1^1, ..., sk_{n-1}^1
Let h_i^j = Hash(sk_i^j) for j ∈ {0, 1} and 0 ≤ i ≤ n − 1
  pk^0 = h_0^0, h_1^0, ..., h_{n-1}^0
  pk^1 = h_0^1, h_1^1, ..., h_{n-1}^1
Output: PK = (pk^0, pk^1) and SK = (sk^0, sk^1)

Signing
Input: message M and secret key SK = (sk^0, sk^1)
Let h = Hash(M); split h into n bits b_0, b_1, ..., b_{n-1}
If b_i = 0, δ_i = sk_i^0; if b_i = 1, δ_i = sk_i^1
Output: δ = (δ_0, δ_1, ..., δ_{n-1})

Verifying
Input: message M, signature δ = (δ_0, δ_1, ..., δ_{n-1}) and public key PK = (pk^0, pk^1)
Let h = Hash(M); split h into n bits b_0, b_1, ..., b_{n-1}
Let v_i = Hash(δ_i)
Output: "accept" if for each i, 0 ≤ i ≤ n − 1, v_i = h_i^0 when b_i = 0 and v_i = h_i^1 when b_i = 1; "reject" otherwise
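A minimal sketch of the scheme in Table 1, using SHA-256 so that n = 256, is shown below. It is illustrative only: the keys are drawn from a pseudorandom generator rather than from a PUF, and it is not the implementation used later in this paper.

```python
# Minimal Lamport one-time signature sketch using SHA-256 (n = 256).
import hashlib
import secrets

N = 256  # one key pair per message-digest bit

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def keygen():
    # sk[0][i] signs bit value 0 at position i, sk[1][i] signs bit value 1.
    sk = [[secrets.token_bytes(32) for _ in range(N)] for _ in range(2)]
    pk = [[H(x) for x in row] for row in sk]
    return sk, pk

def bits(digest: bytes):
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(N)]

def sign(message: bytes, sk):
    return [sk[b][i] for i, b in enumerate(bits(H(message)))]

def verify(message: bytes, sig, pk) -> bool:
    return all(H(sig[i]) == pk[b][i] for i, b in enumerate(bits(H(message))))

sk, pk = keygen()
sig = sign(b"hello", sk)
assert verify(b"hello", sig, pk) and not verify(b"tampered", sig, pk)
```

Note that each key pair can sign only one message, since the signature reveals half of the private key values.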

2.3 Winternitz

Winternitz uses a parameter w that determines how the message digest is split into substrings to be signed [6,11,22,23]. Since fewer sections of the message are signed, the sizes of the public key, private key and signature are significantly reduced. First, the scheme samples m random strings of n bits each as private keys, then hashes each of them 256 times to compute the public keys. For signing, it hashes the message, divides the message digest into m substrings of w bits each, and interprets each substring as an integer N; the corresponding secret key is then hashed 256 − N times. For verification, the receiver hashes the message, divides the message digest into m substrings of w bits, interprets each substring as an integer N, and hashes each signature element N times. If all the hashed signature elements match the corresponding public keys, the signature is verified; otherwise it is rejected (Table 2). The downside of Winternitz is that it is not time efficient: signing and verification take a long time.

Table 2. Winternitz digital signature

Key Generation
Input: parameters n, w and m = n/w
Generate m random n-bit strings sk_0, sk_1, ..., sk_{m-1}
Let h_i = Hash^256(sk_i) for 0 ≤ i < m
Output: PK = (h_0, h_1, ..., h_{m-1}) and SK = (sk_0, sk_1, ..., sk_{m-1})

Signing
Input: message M and secret key SK = (sk_0, sk_1, ..., sk_{m-1})
Let h = Hash(M); split h into m substrings b_0, b_1, ..., b_{m-1} of length w bits each
Interpret each b_i as an integer N_i
δ_i = Hash^(256−N_i)(sk_i)
Output: δ = (δ_0, δ_1, ..., δ_{m-1})

Verification
Input: message M, signature δ = (δ_0, δ_1, ..., δ_{m-1}) and public key PK = (h_0, h_1, ..., h_{m-1})
Let h = Hash(M); split h into m substrings b_0, b_1, ..., b_{m-1} of length w bits each; interpret each b_i as an integer N_i
Output: "accept" if for each i, 0 ≤ i ≤ m − 1, Hash^(N_i)(δ_i) = h_i; "reject" otherwise
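The following sketch instantiates the scheme in Table 2 with w = 8, so a SHA-256 digest yields m = 32 byte-sized chunks. It is a simplified illustration: the checksum used in full Winternitz/WOTS constructions is omitted, and the keys are random bytes rather than PUF-derived.

```python
# Minimal Winternitz-style one-time signature sketch with w = 8 (byte chunks).
import hashlib
import secrets

W = 8                 # bits per chunk
M = 256 // W          # number of chunks in a SHA-256 digest

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def hash_chain(x: bytes, times: int) -> bytes:
    for _ in range(times):
        x = H(x)
    return x

def keygen():
    sk = [secrets.token_bytes(32) for _ in range(M)]
    pk = [hash_chain(x, 256) for x in sk]            # h_i = Hash^256(sk_i)
    return sk, pk

def chunks(message: bytes):
    return list(H(message))                          # each byte is an integer N_i in 0..255

def sign(message: bytes, sk):
    return [hash_chain(sk[i], 256 - n) for i, n in enumerate(chunks(message))]

def verify(message: bytes, sig, pk) -> bool:
    # Hash^(N_i)(δ_i) must land exactly on the public key h_i.
    return all(hash_chain(sig[i], n) == pk[i] for i, n in enumerate(chunks(message)))

sk, pk = keygen()
sig = sign(b"hello", sk)
assert verify(b"hello", sig, pk)
```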

2.4 HORS Digital Signature

Unlike Lamport and Winternitz, HORS (Hash to Obtain Random Subset) is a few-time signature scheme [25,27,29], which means the randomly generated secret keys can be used a few times but not many. A parameter r defines how many times a subset of keys can be used to sign messages; increasing r decreases the security level. As the name suggests, HORS randomly chooses which subset of secret keys to use to sign a message. At the beginning, a much larger number of secret keys than r is generated; let t be the number of secret keys generated. The public key is computed by hashing each secret key. For signing, the message is hashed to produce a fixed-length message digest, which is split into k substrings of log2(t) bits each; each substring is interpreted as an integer, and these integers select the secret keys that form the signature. To verify, the receiver hashes the message, splits the digest into k substrings of log2(t) bits, and interprets each substring as an integer. The signature is accepted if the hash of each signature element equals the public key element at the corresponding index. Table 3 is a snapshot from paper [27].

Table 3. HORS digital signature [27]

Key Generation
Input: parameters l, k and t
Generate t random l-bit strings sk_0, sk_1, ..., sk_{t-1}
Let h_i = Hash(sk_i) for 0 ≤ i < t
Output: PK = (k, h_0, h_1, ..., h_{t-1}) and SK = (k, sk_0, sk_1, ..., sk_{t-1})

Signing
Input: message M and secret key SK = (k, sk_0, sk_1, ..., sk_{t-1})
Let h = Hash(M); split h into k substrings b_0, b_1, ..., b_{k-1} of length log2(t) bits each
Interpret each b_i as an integer H_0, H_1, ..., H_{k-1}
Output: δ = (sk_{H_0}, sk_{H_1}, ..., sk_{H_{k-1}})

Verifying
Input: message M, signature δ = (δ_0, δ_1, ..., δ_{k-1}) and public key PK = (h_0, h_1, ..., h_{t-1})
Let h = Hash(M); split h into k substrings b_0, b_1, ..., b_{k-1} of length log2(t) bits each; interpret each b_i as an integer H_0, H_1, ..., H_{k-1}
Output: "accept" if for each i, 0 ≤ i ≤ k − 1, Hash(δ_i) = h_{H_i}; "reject" otherwise
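A small sketch of HORS as in Table 3, with t = 256 and k = 32 so that each byte of a SHA-256 digest selects one secret key, is given below; again the keys are random bytes for illustration rather than PUF outputs.

```python
# Minimal HORS (Hash to Obtain Random Subset) sketch with t = 256, k = 32.
import hashlib
import secrets

T = 256   # number of secret keys
K = 32    # number of digest chunks (SHA-256 gives 32 bytes = 32 eight-bit chunks)

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def keygen():
    sk = [secrets.token_bytes(32) for _ in range(T)]
    pk = [H(x) for x in sk]
    return sk, pk

def indices(message: bytes):
    return list(H(message))            # each byte selects one of the t = 256 keys

def sign(message: bytes, sk):
    return [sk[i] for i in indices(message)]

def verify(message: bytes, sig, pk) -> bool:
    idx = indices(message)
    return len(sig) == K and all(H(sig[j]) == pk[idx[j]] for j in range(K))

sk, pk = keygen()
sig = sign(b"hello", sk)
assert verify(b"hello", sig, pk)
```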

2.5 PUFs

Physically Unclonable Functions (PUFs) are the equivalent of human biometrics for a physical device [20,21]. PUFs are hard to clone, hard to predict, and difficult to replicate, yet they have repeatable behaviour. A PUF exploits nanoscale device parametric variations to create unclonable measurements of physical objects. Their ability to generate and store secret information makes them a good candidate for security systems [3]. As the name shows, a PUF operates as a function, not in a mathematical but in a physical way: when a PUF is queried with a certain input, it generates a unique output [19]. Since a PUF measures a physical characteristic of a system and generates an output, its functionality is better described in engineering terms than in mathematical ones. PUFs are easy to evaluate, meaning that evaluating the function takes a short amount of time, and hard to characterize, meaning that with a limited number of challenge-response pairs an attacker can only extract an insignificant amount of information about the response to a randomly chosen challenge [7,12]. We used SRAM PUFs to generate the private keys for encryption and decryption. An SRAM cell is a random access memory cell consisting of two cross-coupled inverters and two access transistors. SRAM is volatile memory, which means data is lost when the power is off [28]. SRAM PUFs were introduced by [14]. They are built on the observation that every SRAM cell has its own preferred state at power-up: some cells prefer zero, some prefer one, and some have no preference. This random distribution of the three types of cell reveals a physical fingerprint of the circuit. The bit cell's power-on state is used as the PUF response, and the address in the memory array is used as the challenge. When the circuit is powered off, both inverters are low; when it is powered on, the unstable state is randomly skewed into one of the stable states, 0 or 1, due to the difference in strength of the cross-coupled inverters in each SRAM bit cell. As a result, each bit cell powers up to a random state. During the enrollment phase, the entire SRAM is read, and the whole table is securely stored on the server [2,4].
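The toy simulation below illustrates the enrollment idea: the "SRAM" is read several times at power-up and each cell is classified as a stable 0, a stable 1, or a fuzzy cell ('X'). The cell model (fixed per-cell bias probabilities) is an assumption made purely for illustration; real enrollment reads a physical SRAM array.

```python
# Toy simulation of SRAM PUF enrollment: repeated power-up reads classify each
# cell as '0', '1', or fuzzy 'X'. The bias model is an illustrative assumption.
import random

N_CELLS = 1024
N_READS = 9
random.seed(1)

# Probability that a cell powers up as 1: most cells are strongly biased,
# a minority have no clear preference.
bias = [random.choices([0.02, 0.98, 0.5], weights=[45, 45, 10])[0] for _ in range(N_CELLS)]

def power_up_read():
    return [1 if random.random() < p else 0 for p in bias]

def enroll():
    reads = [power_up_read() for _ in range(N_READS)]
    table = []
    for cell in range(N_CELLS):
        ones = sum(r[cell] for r in reads)
        if ones == 0:
            table.append('0')
        elif ones == N_READS:
            table.append('1')
        else:
            table.append('X')          # inconsistent cell -> ternary fuzzy state
    return table

table = enroll()
print("fuzzy cells:", table.count('X'), "of", N_CELLS)
```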

2.6 Ternary Addressable Public Key Infrastructure and Keys with Variable Length

The Ternary Addressable Public Key Infrastructure (TA-PKI) scheme enhances the crypto table with associated public and private keys. Using a Physically Unclonable Function (PUF), a 1024-bit public key is shared between the server and the client, resulting in the generation of the same corresponding 256-bit private key on both sides. In this use case, the client has the unique PUF device that the server has previously characterized. A key principle of the TA-PKI system is the classification mask made up of ternary states, where 1 and 0 mark the corresponding PUF bits to select and to ignore, respectively, and an 'X' fuzzy value marks bits to exclude because of inconsistent bit states in the PUF. One concern regarding the privacy of PUFs is the unique identifier: when users have a PUF, which acts as a fixed hardware fingerprint, they might feel it is trackable [13]. By giving the owner of the PUF the ability to control it with multiple parameters, the PUF can have multiple personalities, and the problem mentioned above can be fixed. These parameters include random numbers, passwords, XOR functions, and cryptographic hash functions. Using the output of the hash function (the message digest) to navigate through the PUF helps the user generate multiple facets of the PUF and prevents an adversary from tracking it. TA-PKI utilizes these parameters to generate public/private key pairs secretly [8,9,15] (Fig. 1).

Fig. 1. TA-PKI

The Keys with Variable Length (KVL) utilize the TA-PKI public key to generate much longer private keys through an outer looping interaction process. There are several distinct steps in each internal KVL private-key loop. First, the client presents a password that is XORed with the first 512 bits of the public key. Second, that output is the input to a 512-bit hash digest. Third, the hash output is turned into an array of indices into the PUF table. For each sub-key process, the second half of the public key acts as a mask to select the PUF private-key bits and exclude the PUF fuzzy bits. The KVL key is accumulated by the use of up to 81 unique PUF index functions. A typical KVL private key has a length of 21 kbytes [5].
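A rough sketch of one KVL sub-key step, following the description above, is given below under simplifying assumptions: SHA-512 stands in for the 512-bit hash, the PUF is modelled as a list of bits with a ternary characterization mask, and the mapping from digest bytes to PUF addresses is illustrative rather than the authors' actual indexing.

```python
# Sketch of a single KVL sub-key derivation step (simplified, not the actual protocol).
import hashlib

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def kvl_subkey_bits(password: bytes, public_key: bytes, puf_bits, ternary_mask):
    # public_key: 128 bytes (1024 bits); puf_bits: 0/1 per cell;
    # ternary_mask: '0'/'1'/'X' per cell (same length as puf_bits).
    first_half = public_key[:64]
    seed = xor_bytes(password.ljust(64, b"\0")[:64], first_half)  # password XOR first 512 bits
    digest = hashlib.sha512(seed).digest()                        # 512-bit message digest
    addresses = [b % len(puf_bits) for b in digest]               # toy mapping: digest -> PUF addresses
    # '1' selects the PUF bit, '0' ignores it, 'X' excludes a fuzzy cell.
    return [puf_bits[a] for a in addresses if ternary_mask[a] == "1"]
```

Repeating this step with up to 81 distinct index functions and concatenating the selected bits would accumulate a long KVL private key, in the spirit of the process described above.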

3 Method

In this section, we introduce our three methods, which combine various hash-based cryptography protocols to produce generic encryption methods.

3.1 Generic Encryption with Multiple Hashing

Suppose a plain text of 256 bits is to be encrypted by the server. First, the plain text is divided into blocks of 8 bits (1 byte). Each block can be converted to a decimal value ranging from 0 to 255, giving 32 decimal numbers. The server generates 32 keys (since the plain text is 32 bytes) from the image of the PUF; these are denoted X_i. In the next step, each key is hashed 256 − N_i times to create the ciphertext, and the combination of all the hashed private keys is the ciphertext. As illustrated in Fig. 2a, the ciphertext is calculated by hashing the private keys multiple times, with the plain text defining the number of hashes to perform: C_f = H^(255−N_f)(X_f). The receiver generates the same 32 private keys from the PUF and hashes each key 256 times to obtain the final hashed key, Y_i = H^256(X_i). Next, the receiver takes each 256-bit block of the ciphertext and hashes it, comparing it to the corresponding hashed key Y_i = H^256(X_i) after each hashing step. If it does not match, it keeps hashing until the match is found (Fig. 2b). The number of hashes applied to the cipher block until the match is found represents N_i; converting this number to binary form gives back the plain text (Fig. 2).
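A compact sketch of this first model is shown below. Because the text uses 256 − N_i while the formula uses 255 − N_f, the sketch fixes one self-consistent convention, C_i = H^(256−N_i)(X_i), so that the number of hashes needed to reach Y_i = H^256(X_i) equals the plaintext byte N_i; the per-block keys are random bytes standing in for the PUF image.

```python
# Sketch of Model 1: generic encryption with multiple hashing (illustrative only).
import hashlib
import secrets

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def hash_times(x: bytes, times: int) -> bytes:
    for _ in range(times):
        x = H(x)
    return x

def encrypt(plaintext: bytes, keys):
    # One key per plaintext byte; byte value N_i selects how far to walk the hash chain.
    return [hash_times(keys[i], 256 - n) for i, n in enumerate(plaintext)]

def decrypt(ciphertext, keys) -> bytes:
    targets = [hash_times(x, 256) for x in keys]          # Y_i = H^256(X_i)
    recovered = bytearray()
    for i, block in enumerate(ciphertext):
        n, cur = 0, block
        while cur != targets[i]:                          # keep hashing until it matches Y_i
            cur, n = H(cur), n + 1
        recovered.append(n)                               # the hash count is the plaintext byte
    return bytes(recovered)

msg = b"hello world, 32 bytes of text!!!"                 # 32-byte example message
keys = [secrets.token_bytes(32) for _ in range(len(msg))]
assert decrypt(encrypt(msg, keys), keys) == msg
```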

3.2 Generic Encryption Combining Multiple Hashing and Random Ordering

This scheme is very similar to the previous one, but this time 256 private keys are generated. Instead of hashing the key at the corresponding position to create a ciphertext block, the scheme randomly chooses which key to hash, as calculated in Fig. 3a. The key located at L_f is hashed 256 − N_f times for each ciphertext block: C_f = H^(255−N_f)(X_{L_f}).

Fig. 2. Generic encryption using multiple hashing: (a) encryption; (b) decryption


Fig. 3. Generic encryption using multiple hashing and random ordering: (a) encryption; (b) decryption

The decryption part is slightly different: this time, after each level of hashing, the result is compared to all the possible hashed keys. When a value matches, the number of hashes indicates N_f and the location of the matching key indicates L_f. This protocol is a combination of Winternitz and HORS: Winternitz is used for the multiple hashing of the key, and HORS for the random ordering (Fig. 3).
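The sketch below illustrates this decryption search: each two-byte plaintext block (L, N) is encrypted as C = H^(256−N)(X_L) over a pool of 256 keys, and the receiver recovers both L and N by hashing C and comparing against all 256 targets Y_j = H^256(X_j) after every hash. The block layout, the 256 − N convention, and the random key pool are simplifying assumptions of this sketch.

```python
# Sketch of Model 2: multiple hashing with random ordering (illustrative only).
import hashlib
import secrets

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def hash_times(x: bytes, times: int) -> bytes:
    for _ in range(times):
        x = H(x)
    return x

POOL = [secrets.token_bytes(32) for _ in range(256)]             # X_0 .. X_255
TARGETS = {hash_times(x, 256): j for j, x in enumerate(POOL)}    # Y_j -> key location j

def encrypt_block(loc: int, n: int) -> bytes:
    return hash_times(POOL[loc], 256 - n)

def decrypt_block(block: bytes):
    cur, hashes = block, 0
    while cur not in TARGETS:          # compare against every hashed key after each hash
        cur, hashes = H(cur), hashes + 1
    return TARGETS[cur], hashes        # recovered (L, N)

assert decrypt_block(encrypt_block(17, 200)) == (17, 200)
```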

Generic Hash Based Encryption Combining Multiple Hashing, Random Ordering and Random Positions. This protocol uses two-dimensional keys, which means that for each key location there are 256 keys available: X_{i,j}, i, j ∈ [0, 255]. Hence, 256 × 256 keys need to be generated by both parties. For encryption, if we assume that the message is 768 bits, it can be divided into 32 blocks of 24 bits each, and each 24-bit block can be divided into 3 blocks of 8 bits, each representing a decimal value from 0 to 255. According to Fig. 4a, the ciphertext is calculated as C_f = H^(255−N_f)(X_{L_f, P_f}). For decryption, the same procedure as in the second protocol is followed, but this time each comparison is made against 256 × 256 keys instead of 256, which significantly reduces the speed of decryption (Fig. 4b).

Fig. 4. Generic encryption using multiple hashing, random ordering and random position: (a) encryption; (b) decryption

4 Results

We tested the performance of each model by measuring the elapsed time to generate keys, encrypt, and decrypt on both sides for different message lengths. The SRAM used in this experiment to generate the private keys is a Cypress Semiconductor Corp CY62256NL-70PXC with an Arduino ATMEGA2560 micro-controller and an Intel(R) Core(TM) i5-7500 CPU. All code was written in C/C++. For the sake of simplicity, throughout this section we denote Generic Encryption with Multiple Hashing as Model 1, Generic Encryption Combining Multiple Hashing and Random Ordering as Model 2, and Generic Encryption Combining Multiple Hashing, Random Ordering and Random Placing as Model 3.

Fig. 5. Time it takes to generate private keys on the server side for various message lengths (Model 1, Model 2 and Model 3)

Figure 5 shows how long it takes the three models to generate private keys on the server side. Since the server uses the image of the PUF rather than the actual hardware PUF, it can generate keys relatively fast. As illustrated in Fig. 5, Model 3 is slower than the other two models. This is because Model 3 generates 65536 keys, whereas Model 2 generates only 256 keys regardless of the message length, and Model 1 generates the same number of keys as the message length, which in this experiment only goes up to 1000 keys. Hence, Model 3 takes longer to generate keys.

Fig. 6. Time it takes to encrypt plain text of various lengths on the server side (Model 1, Model 2 and Model 3)

After key generation, the server side needs to encrypt the message. As Fig. 6 shows, Model 3 performs most efficiently in the encryption test. The reason is that Model 3 splits the message into blocks of three bytes and then hashes the appropriate key to generate the ciphertext, while Model 2 splits the plain text into blocks of two bytes, which means it needs to hash more keys than Model 3. The least efficient model here is Model 1, which needs to hash a key for every byte of the plain text.


Fig. 7. Time it takes to generate private keys on the client side for various message lengths (Model 1, Model 2 and Model 3)

Fig. 8. Time it takes to decrypt plain text of various lengths on the server side (Model 1 and Model 2)

The above-mentioned tests were done on the server side. To analyze the performance of the client side, we measured the elapsed time to regenerate the same private keys from the PUF and to decrypt the ciphertext. As Fig. 7 shows, Models 1 and 2 take about the same time to generate keys from the PUF, while Model 3 is significantly slower. As explained for the server-side key generation, this is because of the huge number of keys required by Model 3; moreover, these keys are generated from the PUF, which incurs delays when reading from the hardware. That is why it takes 25 s on average to generate 65536 keys. Unfortunately, because of limited computational resources, we were not able to test the decryption of various message lengths for Model 3: the search process to find the match makes this model extremely slow and impractical for average computational resources. Meanwhile, as expected, Model 1 became slower than Model 2 as the message length increased, because Model 1 needs to search for more matches than Model 2, which makes it inefficient for longer messages (see Fig. 8).

5 Conclusion

Hash-based cryptography plays an important role in cryptosystems. By combining various methods from this field with PUFs, we can build generic encryption methods to secure transactions. Based on our measurements, Model 2 appears to be the most efficient. For short messages the first model performs fast and efficiently; however, its efficiency drops significantly as the message length increases. In future work, the performance of Model 3 can be improved by using CAM (Content Addressable Memory) to reduce latency significantly, since CAM technology finds a match without searching the entire address table.

References
1. Alzubi, J.A.: Blockchain-based Lamport Merkle digital signature: authentication tool in IoT healthcare. Comput. Commun. 170, 200–208 (2021)
2. Assiri, S., Cambou, B.: Homomorphic password manager using multiple-hash with PUF. In: Arai, K. (ed.) FICC 2021. AISC, vol. 1363, pp. 772–792. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-73100-7_55
3. Assiri, S., Cambou, B., Booher, D.D., Miandoab, D.G., Mohammadinodoushan, M.: Key exchange using ternary system to enhance security. In: 2019 IEEE 9th Annual Computing and Communication Workshop and Conference (CCWC), pp. 0488–0492. IEEE (2019)
4. Assiri, S., Cambou, B., Booher, D.D., Mohammadinodoushan, M.: Software implementation of a SRAM PUF-based password manager. In: Arai, K., Kapoor, S., Bhatia, R. (eds.) SAI 2020. AISC, vol. 1230, pp. 361–379. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-52243-8_26
5. Booher, D.D., Cambou, B., Carlson, A.H., Philabaum, C.: Dynamic key generation for polymorphic encryption. In: 2019 IEEE 9th Annual Computing and Communication Workshop and Conference (CCWC), pp. 0482–0487. IEEE (2019)
6. Buchmann, J., Dahmen, E., Ereth, S., Hülsing, A., Rückert, M.: On the security of the Winternitz one-time signature scheme. In: Nitaj, A., Pointcheval, D. (eds.) AFRICACRYPT 2011. LNCS, vol. 6737, pp. 363–378. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-21969-6_23
7. Cambou, B., Telesca, D.: Ternary computing to strengthen information assurance. Development of ternary state based public key exchange. In: IEEE SAI 2018, Computing Conference (2018)
8. Cambou, B., Flikkema, P.G., Palmer, J., Telesca, D., Philabaum, C.: Can ternary computing improve information assurance? Cryptography 2(1), 6 (2018)
9. Cambou, B., et al.: Post quantum cryptographic keys generated with physical unclonable functions. Appl. Sci. 11(6), 2801 (2021)
10. Chang, M.-H., Yeh, Y.-S.: Improving Lamport one-time signature scheme. Appl. Math. Comput. 167(1), 118–124 (2005)
11. Dods, C., Smart, N.P., Stam, M.: Hash based digital signature schemes. In: Smart, N.P. (ed.) Cryptography and Coding 2005. LNCS, vol. 3796, pp. 96–115. Springer, Heidelberg (2005). https://doi.org/10.1007/11586821_8
12. Gassend, B.: Physical random functions (2003)
13. Gassend, B., Clarke, D., Van Dijk, M., Devadas, S.: Controlled physical random functions. In: 18th Annual Computer Security Applications Conference 2002, Proceedings, pp. 149–160. IEEE (2002)
14. Guajardo, J., Kumar, S.S., Schrijen, G.-J., Tuyls, P.: FPGA intrinsic PUFs and their use for IP protection. In: Paillier, P., Verbauwhede, I. (eds.) CHES 2007. LNCS, vol. 4727, pp. 63–80. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-74735-2_5
15. Habib, B., Cambou, B., Booher, D., Philabaum, C.: Public key exchange scheme that is addressable (PKA). In: 2017 IEEE Conference on Communications and Network Security (CNS), pp. 392–393. IEEE (2017)
16. Keshavarz, M., Anwar, M.: Towards improving privacy control for smart homes: a privacy decision framework. In: 2018 16th Annual Conference on Privacy, Security and Trust (PST), pp. 1–3. IEEE (2018)
17. Keshavarz, M., Shamsoshoara, A., Afghah, F., Ashdown, J.: A real-time framework for trust monitoring in a network of unmanned aerial vehicles. In: IEEE INFOCOM 2020 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), pp. 677–682. IEEE (2020)
18. Lamport, L.: Constructing digital signatures from a one-way function. Technical report, Citeseer (1979)
19. Lim, D., Lee, J.W., Gassend, B., Suh, G.E., Van Dijk, M., Devadas, S.: Extracting secret keys from integrated circuits. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 13(10), 1200–1205 (2005)
20. Maes, R.: Physically Unclonable Functions: Constructions, Properties and Applications. Springer, Heidelberg (2013)
21. Maes, R., Verbauwhede, I.: Physically unclonable functions: a study on the state of the art and future research directions. In: Sadeghi, A.R., Naccache, D. (eds.) Towards Hardware-Intrinsic Security. ISC, pp. 3–37. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14452-3_1
22. Merkle, R.C.: A digital signature based on a conventional encryption function. In: Pomerance, C. (ed.) CRYPTO 1987. LNCS, vol. 293, pp. 369–378. Springer, Heidelberg (1988). https://doi.org/10.1007/3-540-48184-2_32
23. Merkle, R.C.: A certified digital signature. In: Brassard, G. (ed.) CRYPTO 1989. LNCS, vol. 435, pp. 218–238. Springer, New York (1990). https://doi.org/10.1007/0-387-34805-0_21
24. Miandoab, D.G., Assiri, S., Mihaljevic, J., Cambou, B.: Statistical analysis of ReRAM-PUF based keyless encryption protocol against frequency analysis attack. arXiv preprint arXiv:2109.11075 (2021)
25. Perrig, A.: The BiBa one-time signature and broadcast authentication protocol. In: Proceedings of the 8th ACM Conference on Computer and Communications Security, pp. 28–37 (2001)
26. Preneel, B.: Cryptographic hash functions. Eur. Trans. Telecommun. 5(4), 431–448 (1994)
27. Reyzin, L., Reyzin, N.: Better than BiBa: short one-time signatures with fast signing and verifying. In: Batten, L., Seberry, J. (eds.) ACISP 2002. LNCS, vol. 2384, pp. 144–153. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45450-0_11
28. van der Leest, V., van der Sluis, E., Schrijen, G.-J., Tuyls, P., Handschuh, H.: Efficient implementation of true random number generator based on SRAM PUFs. In: Naccache, D. (ed.) Cryptography and Security: From Theory to Applications. LNCS, vol. 6805, pp. 300–318. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-28368-0_20
29. Zhu, L.H., Cao, Y.D., Wang, D.: Digital signature of multicast streams secure against adaptive chosen message attack. Comput. Secur. 23(3), 229–240 (2004)

Cyber-Safety Awareness: Assisting Schools in Implementation Guidelines

E. Kritzinger1(B) and G. Lautenbach2

1 University of South Africa, Pretoria, South Africa

[email protected]

2 University of Johannesburg, Johannesburg, South Africa

[email protected]

Abstract. Technology is growing at a rapid rate and is becoming part of our daily lives. Information and Communication Technology (ICT) has become an integrated part of who we are, what we do, and how we do it. With the constant increase in ICT devices, access to cyberspace, and online connectivity, we as cyber users are becoming dependent on cyberspace. Cyberspace is transforming the way we socialise, perform work activities and find information. The advantages of cyberspace are enormous and improve all aspects of education, government and industry. However, with the advantages comes a wide spectrum of disadvantages, namely cyberattacks (threats and risks). Cyber users are vulnerable to cyberattacks (for example, theft and ransomware) and can become cyber victims if they do not have the necessary cyber-safety knowledge and skills. One group of cyber users seen as easy cyber victims is school learners. School learners use cyberspace for education, socialising, communication and information gathering. School learners are exposed to cyber threats, including cyberbullying, inappropriate material and social distancing, as their cyber-safety knowledge and skills have not been properly established. School learners require cyber-safety awareness, skills and knowledge to become safe cyber users. Parents and schools must ensure that school learners obtain this awareness, knowledge and these skills. However, many parents, teachers and schools do not know how to establish a cyber-safety environment for their learners, and schools currently lack a proper cyber-safety awareness culture to ensure that learners use cyberspace safely. This research focuses on providing schools with easy-to-follow cyber-safety awareness guidelines on how to create and implement a cyber-safety culture. The proposed guidelines will assist schools to understand the needs of learners and to provide awareness, knowledge and skills regarding cyber-safety actions within cyberspace.

Keywords: Awareness · Cyberspace · Cyber users · Cyber safety · School learners

1 Introduction

Cyberspace is defined by Ottis and Lorents as "a time-dependent set of interconnected information systems and the human users that interact with these systems" [1].


Cyberspace is an online world that connects virtual components, networks and cyber users [2]. A cyber user is defined as a user who interacts with cyberspace and can be of any age, race, religion or geographic location. Cyberspace has no boundaries, laws or governing body to protect cyber users. It is therefore critical that users of cyberspace understand the possible cyber risks and threats and are able to protect themselves within cyberspace so as not to become cyber victims [3]. One group of cyber users, school learners, is extremely exposed within cyberspace and can easily become cyber victims [4, 5]. Many schools are currently not promoting cyber-safety awareness and do not have the required initiatives to establish and grow a cyber-safety culture [6, 7]. Teachers and school governing bodies are not well equipped to address cyber-related incidents in schools [8]. Schools therefore need additional assistance on how to establish and grow a cyber-safety culture [9]. The aim of this study is to provide schools (for this research, South African schools) with guidelines to assist them to create, grow and cultivate a cyber-safety awareness culture in the school. This research identifies cyber-safety awareness building blocks and an implementation process to guide schools on the important cyber-safety awareness issues that must be put in place to ensure that school learners understand cyber safety and can protect themselves and their information in cyberspace. This research is part of a Master's dissertation study in Education at the University of Johannesburg [10].

2 Cyber-Safety Awareness Building Blocks

The main aim of this research is to identify global cyber-safety topics/issues that are relevant to creating and growing a cyber-safety awareness culture. These topics/issues are the building blocks of this research. The authors take note that there is a wide range of definitions for the term building blocks. Cyber-safety awareness building blocks (as defined by this research) are the basic cyber-safety awareness topics that are relevant for cyber-safety awareness and can form part of a larger implementation strategy to improve cyber-safety awareness amongst cyber users [10]. The building blocks used in this literature analysis were obtained from national documents in South Africa, which included the National Cybersecurity Policy Framework for South Africa (NCPF) and the National Cybercrimes Bill [11, 12], as well as from international best practice documents, which included NIST, the King Report and the ISO27002 standard [13, 14]. The combined list of cyber-safety building blocks obtained from these documents is [10]:

• identify the knowledge and skills to be conveyed (training)
• identify the behaviour that needs to be changed (awareness)
• design user-specific material
• use the multi-user approach (include all users)
• assess current knowledge, skills and awareness
• address funding issues
• obtain buy-in from management
• identify delivery methods
• continue the improvement of initiatives
• ensure monitoring compliance (including assessment)
• ensure evaluation and feedback
• generate reports
• identify success indicators
• integrate cyber-security risks into management
• allocate the responsibility in relation to cyber-security risks
• create a cyber-security plan that includes technical tools
• create a culture among users
• monitor cyber incidents
• make provision for business resilience and continuity
• revise cyber-security policies
• create awareness programs
• pay regular attention to awareness issues
• create and maintain an internal awareness portal
• do regular assessments
• use an incentive approach
• read the responsibility of all new users
• train – in-depth approach to policies and procedures
• use different methods of presentation (workshops or online)

Once these building blocks have been identified, it is important to have an implementation strategy that proposes a plan of action for how (and in which order) they must be implemented. From the scoping literature, a number of processes were also identified. These processes (relevant to cyber safety) cover focus areas including management structure, identifying relevant role players, resource analysis, policies, assessment, reporting, and monitoring of the cyber-safety environment. The next section describes the implementation strategy used for this research. There are a number of educational and information-technology-related models that can be utilised to create an implementation strategy. This research combines two models (one with an educational foundation and one with a technological foundation) to form a single implementation strategy for the cyber-safety awareness building blocks [10]. The first model is the ADDIE educational model, whose phases include analysis, design, development, implementation and evaluation. The second model is more focused on the design principles of technology and includes empathise, define, ideate, prototype and test. The two models were combined to create a four-phase approach that will be used to group the building blocks. The four phases used for this research are:

• Phase 1: Analysis (including empathise, define and ideate)
• Phase 2: Design (including prototype)
• Phase 3: Implementation (including testing)
• Phase 4: Evaluation

The proposed four phases, identified building blocks and the required cyber safety processes will form part of the proposed theoretical framework included in Sect. 4.


3 Methodology

The research study used an interpretivist approach. According to Saunders, Levis and Thornhill [15], the foundation of the interpretivist approach is to develop new methods to define and understand the social environment and its connection to the world around us. The research was divided into four phases. Phase 1 consisted of a critical literature review, which points out the extent to which the research engaged with the relevant literature. According to Grant and Booth [16], a critical literature review is structured using a narrative report in conjunction with a theoretical conclusion. The literature obtained in the review (a non-empirical study) was used to propose a theoretical cyber-safety awareness framework. The theoretical framework was tested for validity and importance through an empirical qualitative data-gathering approach [17]. The data collection was done with expert reviewers from the educational as well as the ICT sectors. According to Nielsen [18], three to five expert reviewers are enough for this qualitative study. The expert reviewers were identified according to the following criteria:

• The first criterion was years of ICT experience; the research proposes a minimum of five years.
• The second criterion was years of education experience; the research proposes a minimum of five years.
• The last criterion was the knowledge level of ICT in the education environment (university degree in ICT or Education).

Ethical clearance was obtained from the host institution to conduct the research. The validation of the research was done through an expert reviewer process. The feedback of the expert reviewers was analysed and used to propose cyber-safety awareness guidelines for schools.

4 Theoretical Framework

The cyber-safety awareness theoretical framework consists of the cyber-safety awareness building blocks identified in Sect. 2 of this paper. Each of the building blocks is linked to one or more of the four phases proposed in Sect. 2: Phase 1 covers analysis, Phase 2 focuses on the design of material, Phase 3 covers implementation and Phase 4 evaluation, and each phase is assigned one or more of the identified building blocks. Table 1 depicts the scoping literature review results, where the cyber-safety building blocks are assigned to one or more of the pre-identified phases. Table 2 depicts the different building blocks grouped under the proposed phases and represents the proposed theoretical framework for this research. The next section focuses on the testing and validation of the proposed theoretical framework by five independent expert reviewers.


Table 1. Cyber-safety building blocks per phase (scoping literature review) [10]

Building blocks (each marked against one or more of Phases 1–4):
Ensure policies include cybersecurity training and awareness
Role players are informed of policies for compliance
All role players are trained regarding their responsibilities
Design of awareness program must be based on the user
Awareness programs - relevance to the user's environment
Identify cyber needs
Identify the knowledge and skills to be conveyed (training)
Identify the behaviour that needs to be changed (awareness)
Design user-specific material
Use the multi-user approach (include all users)
Assess current knowledge, skills and awareness
Funding issues must be addressed
Obtain buy-in from management
Delivery methods must be identified
Continued improvement of initiatives
Ensure monitoring compliance (including assessment)
Evaluation and feedback
Generate reports
Identify success indicators
Integrate cyber-security risk into management
Allocate responsibility in relation to cyber-security risk
Create a cyber-security plan that includes technical tools
Create a culture among users
Monitor cyber incidents
Make provision for business resilience and continuity
Revise cyber-security policies
Create awareness programs
Pay regular attention to awareness issues
Create and maintain an internal awareness portal
Do regular assessments
Use an incentive approach
Read the responsibility of all new users
Train – in-depth approach to policies and procedures
Use different methods of presentation (workshops or online)

(The resulting assignment of building blocks to phases is given per phase in Table 2.)

Table 2. Cyber-safety building blocks per phase (scoping literature review) [10]

Number | Building blocks for Phase 1
1.1 | Design of awareness program must be based on users
1.2 | Ensure policies include compliance with POPI
1.3 | Identify cyber needs of cyber users
1.4 | Obtain buy-in from management
1.5 | Awareness programs must have relevance to user's environment
1.6 | Integration of cyber-security risk into management
1.7 | Allocation of responsibility in relation to cyber-security risk
1.8 | Provision for business resilience and continuity
1.9 | Apply incentive approach

Number | Building blocks for Phase 2
2.1 | Policies include cybersecurity training and awareness
2.2 | Awareness programs must have relevance to user's environment
2.3 | Identify the knowledge and skills to be conveyed (training)
2.4 | Identify the behaviour that needs to be changed (awareness)
2.5 | Design user-specific material
2.6 | Multi-user approach (include all users)
2.7 | Creating a cyber-security plan that includes technical tools
2.8 | Regular attention to awareness issues
2.9 | Create and maintain internal awareness portal
2.10 | Incorporate incentive approach

Number | Building blocks for Phase 3
3.1 | Role players are informed of policies for compliance
3.2 | All role players are trained regarding their responsibilities
3.3 | Multi-user approach (include all users)
3.4 | Assessment of current knowledge, skills and awareness
3.5 | Delivery methods must be identified
3.6 | Identify success indicators

Number | Building blocks for Phase 4
4.1 | Assessment of current knowledge, skills and awareness
4.2 | Assessment of current awareness
4.3 | Ensure monitoring compliance
4.4 | Evaluation and feedback
4.5 | Generate reports
4.6 | Identify success indicators
4.7 | Monitoring of cyber incidents
4.8 | Revision of cyber-security policies
4.9 | Regular attention to awareness issues
4.10 | Create and maintain internal awareness portal
4.11 | Regular assessments
4.12 | Incentive approach

5 Data Gathering and Findings

The expert reviewers were requested to evaluate the rigour and validity of the proposed theoretical framework. According to Nielsen (2004), three to five expert reviewers are sufficient for empirical studies; the study approached five expert reviewers to obtain feedback on the research [18]. Each reviewer was requested to evaluate and validate each phase and building block on a five-point rating scale, from Not important (1) to Very important (5). The expert reviewers' feedback was analysed per phase. Figures 1, 2, 3 and 4 depict the overall score per building block (within each phase) for all the reviewers. All the building blocks proposed in the theoretical framework were validated (they scored more than 4 on the rating scale). The feedback from the expert reviewers indicated that all the building blocks are important and should be added to the final proposed cyber-safety awareness framework (guidelines). The reviewers also indicated that the four-phase approach is a valid step-based approach to implementing the building blocks. The reviewers made one suggestion: to further divide each phase into smaller implementation sections (the term "processes" was proposed) to help users of the guidelines create a cyber-safety culture. The next section utilises the feedback of the reviewers to propose the cyber-safety awareness guidelines.


Fig. 1. Average score for all reviewers for phase 1 [10]

Fig. 2. Average score for all reviewers for phase 2 [10]

Fig. 3. Average score for all reviewers for phase 3 [10]


Fig. 4. Average score for all reviewers for phase 4 [10]

6 Proposed Framework

The framework proposed in this research details a process schools can adopt to implement cyber-safety awareness policies, plans and interventions. The first step of the proposed framework further divides each phase (depicted in Table 1) into more detailed processes (as suggested by the reviewers). Each of the four phases (Analysis, Design, Implementation and Evaluation) is linked to a number of processes to assist with the implementation of the cyber-safety awareness policies. The proposed processes for each phase include [10]:

• Phase 1: Analysis
  • Process 1.1: Management structure
  • Process 1.2: Identify role players
  • Process 1.3: Needs analysis and prior knowledge analysis
  • Process 1.4: Resource analysis
• Phase 2: Design
  • Process 2.1: Policies
  • Process 2.2: Material development
  • Process 2.3: Additional contribution
• Phase 3: Implementation
  • Process 3.1: Informing
  • Process 3.2: Participation
  • Process 3.3: Assessment
• Phase 4: Evaluation
  • Process 4.1: Monitoring
  • Process 4.2: Measuring
  • Process 4.3: Evaluation
  • Process 4.4: Adaption
  • Process 4.5: Reporting

The second step of the proposed framework links the proposed building blocks (as depicted in Table 2) to the different processes. Each building block has been assigned to a process to create the proposed cyber-safety awareness guidelines. The linking of building blocks to processes is depicted in Table 3.

Table 3. Proposed cyber-safety awareness guidelines [10]

Phase 1: Analysis
1.1 Management structure: Obtain buy-in from management; Integration of cyber security risk into management; Allocation of responsibility in relation to cyber security risk; Provision for business resilience and continuity
1.2 Identify role players: The design of the awareness program must be based on users
1.3 Needs analysis and prior knowledge: Ensure policies include compliance with privacy policies; Awareness programs must have relevance to the user's environment
1.4 Resource analysis: Analyse and assign resources

Phase 2: Design
2.1 Policies: Ensure policies include cyber-security training and awareness; Creating a cyber-security plan that includes technical tools
2.2 Material development: Awareness programs must have relevance to the user's environment; Identify the knowledge and skills to be conveyed (training); Identify the behaviour that must be changed; Design user-specific material; Multi-user approach (include all users)
2.3 Additional contribution: Create awareness programs; Create a culture among learners; Pay regular attention to awareness issues; Create and maintain an internal awareness portal; Use an incentive approach

Phase 3: Implementation
3.1 Informing: Role players are informed of policies for compliance; Delivery methods must be identified; All new users read and acknowledge their responsibilities
3.2 Participation: All role players are trained regarding responsibilities; Multi-user approach (include all users); Incentive approach; Training; Different methods of presentation
3.3 Assessment: Assessment of current knowledge and skills; Identify success indicators

Phase 4: Evaluation
4.1 Monitoring: Continued improvement of initiatives; Ensure monitoring compliance; Monitor cyber incidents; Monitor internal awareness portal
4.2 Measuring: Assessment of current knowledge and skills; Pay regular attention to awareness issues; Maintain internal awareness portal; Regular assessments
4.3 Evaluation: Continued improvement of initiatives; Evaluation and feedback; Identify success indicators
4.4 Adaption: Create a culture among users; Revise cyber security policies
4.5 Reporting: Generate reports


The guidelines (a combination of phases, processes and building blocks) depicted in Table 3 can be used by school authorities and educational experts as a "road map" or "step-by-step" guide to implement cyber-safety awareness. The correct implementation of cyber-safety awareness will grow the cyber-safety awareness culture within the school environment, which is beneficial to all cyber role players (teachers, the school, the learners and parents). Creating a "healthy" cyber-safety awareness environment within a school is not a "one-size-fits-all" approach, and a number of influencing factors must be taken into consideration. The proposed framework provides schools with a guideline approach on how to identify the influencing factors and incorporate the diverse factors into a cohesive development plan that the school can follow to ensure all the cyber-safety needs of the role players are met. The main goal and aim, to enhance and grow a cyber-safety awareness culture, is unique for each school.

7 Conclusion

This research investigated cyber-safety awareness building blocks that can assist schools to create, implement and grow a cyber-safety culture. The research identified building blocks from national and international documents and combined the building blocks with a phased implementation strategy. The building blocks and phases were linked to propose a theoretical framework. The validity of the theoretical framework was tested by five expert reviewers. The feedback of the reviewers indicated the importance of each building block. The theoretical framework was adapted to include processes, as requested by the reviewers, to form the proposed cyber-safety awareness guidelines. The proposed guidelines will assist schools to establish and grow a cyber-safety culture for their learners. Disclaimer from author: This research is part of a Master's study (MEd dissertation) at the University of Johannesburg.

References
1. Ottis, R., Lorents, P.: Cyberspace: definition and implications. In: International Conference on Information Warfare and Security, pp. 267–295 (2010)
2. Grobler, M., Jansen Van Vuuren, J., Zaaiman, J.: Preparing South Africa for cyber crime and cyber defense. Syst. Cybern. Inform. 11(7), 32–41 (2013)
3. DePaolis, K., Williford, A.: The nature and prevalence of cyber victimization among elementary school children. Child Youth Care Forum 44(3), 377–393 (2014). https://doi.org/10.1007/s10566-014-9292-8
4. Burton, P., Mutongwizo, T.: Inescapable violence: cyber bullying and electronic violence against young people in South Africa. Cent. Justice Crime Prev. 8, 1–12 (2009)
5. Furnell, S., Bryant, P., Phippen, A.D.: Assessing the security perceptions of personal Internet users. Comput. Secur. 26(5), 410–417 (2007)
6. Von Solms, S., Von Solms, R.: Towards cyber safety education in primary schools in Africa. In: Proceedings of the Eighth International Symposium on Human Aspects of Information Security & Assurance (HAISA 2014), pp. 185–197 (2014)


7. Kortjan, N., von Solms, R.: Fostering a cyber-security culture: a case of South Africa. In: 14th Annual Conference on World Wide Web Applications, pp. 4–13 (2012)
8. Govender, I., Skea, B.: Teachers' understanding of e-safety: an exploratory case in KZN South Africa. Electron. J. Inf. Syst. Dev. Countries 70(5), 1–17 (2015)
9. Grobler, M., Flowerday, S., Von Solms, R., Venter, H.: Cyber awareness initiatives in South Africa: a national perspective. In: Proceedings of the First IFIP TC9/TC11 South African, pp. 32–41 (2011)
10. Kritzinger, E.: Cybersafety guidelines to prepare South African schools for the 4th industrial revolution. MEd Dissertation, University of Johannesburg, Johannesburg (2020)
11. South African Government: The National CyberSecurity Policy Framework (NCPF) (2015)
12. South African Government: The Cybercrimes and Cybersecurity Bill (2016). https://cybercrime.org.za/docs/Cybercrimes_and_Cybersecurity_Bill_2015.pdf
13. NIST: Framework for improving critical infrastructure cybersecurity. Federal Agency, United States of America (2018)
14. The Institute of Directors South Africa: Draft King IV Report on Corporate Governance for South Africa 2016. Johannesburg, South Africa (2016)
15. Saunders, M., Lewis, P., Thornhill, A.: Research Methods for Business Students, 8th edn. Pearson, UK (2019)
16. Grant, M., Booth, A.: A typology of reviews: an analysis of 14 review types and associated methodologies. Health Inf. Libr. J. 26(2), 91–108 (2009)
17. Mouton, J.: How to succeed in your master's and doctoral studies: a South African guide and resource book. Van Schaik Publishers, Hatfield, South Africa (2001)
18. Nielsen, J.: The need for web design standards (2004). www.useit.com/alertbox/20040913.html. Accessed 02 Mar 2016

Reducing Exposure to Hateful Speech Online
Jack Bowker1 and Jacques Ophoff1,2(B)
1 Abertay University, Dundee, UK
[email protected]
2 University of Cape Town, Cape Town, South Africa

Abstract. It has been observed that regular exposure to hateful content online can reduce levels of empathy in individuals, as well as affect the mental health of targeted groups. Research shows that a significant number of young people fall victim to hateful speech online. Unfortunately, such content is often poorly controlled by online platforms, leaving users to mitigate the problem by themselves. It’s possible that Machine Learning and browser extensions could be used to identify hateful content and assist users in reducing their exposure to hate speech online. A proof-of-concept extension was developed for the Google Chrome web browser, using both a local word blocker and a cloud-based model, to explore how effective browser extensions could be in identifying and managing exposure to hateful speech online. The extension was evaluated by 124 participants regarding the usability and functionality of the extension, to gauge the feasibility of this approach. Users responded positively on the usability of the extension, as well as giving feedback regarding where the proof-of-concept could be improved. The research demonstrates the potential for a browser extension aimed at average users to reduce individuals’ exposure to hateful speech online, using both word blocking and cloud-based Machine Learning techniques. Keywords: Hate speech · Machine learning · Browser extension

1 Introduction ‘Hate speech’ is a form of targeted abuse, aimed towards an individual or group with the intent to offend or threaten [1, 2]. In online spaces, hate speech has long been an issue due to the anonymity and perceived lack of consequences of online speech, along with the ability for hateful groups to easily congregate. It has been observed that exposure to hateful speech and sentiment towards marginalized groups can reduce levels of empathy in the wider population, with a Finnish survey reporting 67% of respondents aged between 15–18 years old had been exposed to hateful speech online, and 21% having fallen victim to such material [3]. The effect of hate speech on members of marginalized groups online is self-evident, as “From the perspective of members of the targeted groups, hate speech ‘sets out to make the establishment and upholding of their dignity’ much harder” [4]. Several different techniques have been attempted to reduce the amount of hateful speech online, with Google funding the ‘Jigsaw’ research wing, focused on fighting © The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 K. Arai (Ed.): SAI 2022, LNNS 508, pp. 630–645, 2022. https://doi.org/10.1007/978-3-031-10467-1_38


online toxicity using Machine Learning (ML). This process involves training a computer algorithm to learn and improve over time. The research involved gathering large datasets of comments and categorizing toxic sentiment, offering the ability to automatically block hateful comments from sites such as The New York Times for their comment section [5]. Even though most major social media companies have the data, resources, and experience to reduce the hate on their platforms using advanced techniques such as ML, the issue is politically sensitive, with the companies cautious to ban accounts or hide posts after accusations of political censorship [6]. In March 2019, CEO Mark Zuckerberg detailed Facebook’s shift towards a more ‘privacy-focused’ platform, motivated by the public response to the Cambridge Analytica privacy scandal and spread of offensive content [7]. This move also reduces the scope of their responsibility to moderate content on the platform, with end-to-end encryption and private groups making moderation more difficult. These reasons are part of the reason why work is being done to give individuals control over what they see online, with Google’s Jigsaw developing the experimental ‘Tune’ browser extension, allowing users to determine the intensity of speech they’re exposed to. The aim of this research is to investigate how a browser extension can be used to assist users in reducing their exposure to hate speech online, by researching existing solutions and surveying members of the public on a proof-of-concept prototype solution. Extensions can be an effective place to do this detection as they’re relatively accessible to users, usually installed as an add-on through an online store to extend the functionality of the browser, as well as being positioned in the web browser where most social interactions on PCs are carried out. Extensions have been used to give users control over their exposure to hate speech, with open-source solutions such as the Negator tool [8] using a locally trained Natural Language Processing (NLP) model, a type of ML that can be used to detect hateful sentiment by considering the context of the comment and comparing it to the comments it has seen and been trained on in the past. This research is primarily focused on the acceptability and usability of such an approach. This paper proceeds as follows. Section 2 presents a literature review into definitions of hate speech, as well as existing technologies available to manage hate speech. Machine learning, and how it can be used to detect sentiment, is also explored. Section 3 describes the development of a proof-of-concept browser extension and server. Section 4 describes the design of an evaluation survey that was conducted with participants from the public, as well as the results of the survey. Section 5 discusses and reflects on the design and technical aspects of the proof-of-concept extension. Lastly, Sect. 6 summarizes the research contribution and discusses limitations and suggestions for future work.

2 Background This section will review how browser extensions as well as different technologies such as ML can assist in accurately detecting instances of hate speech online. It will also discuss how this effectiveness can be measured, how to notify users of this speech, and issues regarding building datasets. This area of research has been of significant interest recently due to the rapid improvement of ML and language recognition, as well


as public discussion about censorship and how much or little social media companies should censor content on the platforms. 2.1 Defining Hate Speech The term ‘hate speech’ does not have a universal definition, and the scope of the term can depend on which definition is used. The majority of developed democratic countries have laws defining and restricting the use of it to differing extents [9], with the United Kingdom defining Hate Speech as an expression of hatred towards someone on account of that person’s “colour, race, disability, nationality, ethnic or national origin, religion, gender identity, or sexual orientation”. Additionally, “Any communication which is threatening or abusive, and is intended to harass, alarm, or distress someone” is illegal as of the 1994 Criminal Justice and Public Order Act [2]. Although the United States doesn’t have hate speech written into law due to Supreme Court rulings around the First Amendment [10], most major social media platforms define hate speech in similar terms as the United Kingdom. Facebook, the largest social media platform globally, defines hate speech as a direct attack on someone’s protected characteristics, including “race, ethnicity, national origin, religious affiliation, sexual orientation, sex, gender or gender identity, or serious disabilities or diseases” [1]. It has been found that regular exposure to this content can be harmful to the individual or groups that are directly targeted. For example, the effects of anti-Semitic and homophobic hate speech can cause heightened stress, anxiety, depression, and desensitization [11]. This conclusion is supported by a study run by the Economist Intelligence Unit which finds that 65% of women experience hate speech online [12]. Unfortunately, it is difficult to define a profile for cyberhate targets, which can provide a starting point to identify at-risk individuals [13]. Therefore, any technology to manage hate speech needs to be broadly accessible and usable. 2.2 Technology to Manage Hate Speech As online hate speech has moved to the forefront of public discussion, in part thanks to public campaigning and the growing unanimity of social media, tools have been developed to try and mitigate this. Research has been carried out in using techniques such as keyword detection, account blacklisting, and ML based approaches. There are numerous different proposed methods to alert the user of hateful content, ranging from blocking the post from view completely to just giving the user a notice that the account has a history of hateful conduct. The Negator tool [8] makes use of NLP with an Aspect-based Sentiment Analysis (ABSA) model. The extension uses server-side processing to detect the intensity of the speech from 0 to 100%, taking into consideration the topic and who/what the speech is aimed at, then categorizing the speech into topics including “Abuse, Personal Attacks, Cyberbullying, Sexual Advances, Bigotry, Criminal Activity” and “Death Threats”, and will block the post with a visual indicator notifying the user which category the speech falls under if the intensity is more than 60%. The tool takes a harsh stance on hiding hateful content and entirely hides posts that meet its own criteria with an interface


notifying the user that the content has been blocked, giving the category the post falls into, with an option to view the post anyway. The Shinigami Eyes tool [14] takes a different approach in notifying users of harmful content. The extension focuses on transphobic social media accounts and websites, with users submitting links as being ‘anti-trans’ or ‘trans-friendly’ that are manually reviewed before being added to the list, implemented using a bloom filter. Shinigami Eyes uses colour coded warnings marking hyperlinks and profiles as red (anti-trans) or green (transfriendly). This approach is less harsh in that it still shows users the offending user’s posts but can give extra context that might increase the likelihood of users not interacting with ‘anti-trans’ accounts. Users may submit reports via the extension interface by right clicking on a link to a profile or website. Modha et al. [15] proposes a browser extension that implements the TRAC dataset, which classifies training data into sentiment categories of ‘overtly aggressive’, ‘covertly aggressive’, and ‘non-aggressive’, showing users the levels of each category embedded in the web page. The extension uses a similar method of colour coding as the Shinigami Eyes extension, with posts that pass a high threshold of confidence displayed completely in red, with medium displayed in yellow and non-hateful posts in green. Along with this, the levels detected by the model are shown directly above the comment as a number. The decision was justified in the paper by the distrust that the public has regarding algorithms and ML, partly due to the fact they are seen as a black box. 2.3 Machine Learning Client-Side and Cloud Models. As ML has gained popularity in business applications, commercial services are now common where users can rent computing power and take advantage of mature pre-existing models. Robust libraries and frameworks exist for a multitude of languages for building a model locally, with Python offering libraries such as scikit-learn, mlxtend and Shogun that can be configured with most ML algorithms involving transformation, decision trees, classification and more [16]. Datasets. A significant factor in training an accurate ML model is to ensure there is a well annotated dataset that is relevant to the topic being worked on. In the area of linguistics, the term ‘corpus’ refers to a collection of texts, especially regarding a particular subject. In the field of NLP, a corpus similarly refers to a structured set of text or data, commonly used for training a model. However, classifying hate speech has proven to be a challenge, principally due to the loose definition of the term. Ross et al. [17] discusses this in the context of training a dataset from the online discourse surrounding the European Refugee Crisis, with data gathered from 13,766 Tweets including a selection of offensive hashtags. The tweets were categorized by annotators who were also given Twitter’s official definition of hate speech, but the rate of agreement was still low with a Krippendorff’s alpha of (α) = 0.38 (with 0 being complete disagreement and 1 being complete agreement). They conclude that there needs to be a stronger and more consistent definition of hate speech. Additionally, when annotating datasets, finding common characteristics of content users find hateful will be useful in building a more automated detection model.
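The inter-annotator agreement problem described above can be quantified with standard chance-corrected measures. The paper cites Krippendorff's alpha; as a lightweight stand-in for exactly two annotators, Cohen's kappa from scikit-learn gives a similar chance-corrected reading. The sketch below uses invented labels, not the refugee-crisis Tweets:

```python
# Chance-corrected agreement between two hypothetical annotators labelling
# the same comments as hateful (1) or not (0). Cohen's kappa is used here as
# a simple two-annotator stand-in for Krippendorff's alpha.
from sklearn.metrics import cohen_kappa_score

annotator_a = [1, 0, 1, 1, 0, 0, 1, 0]
annotator_b = [1, 0, 0, 1, 0, 1, 1, 0]

kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.2f}")  # 0 = chance-level, 1 = perfect agreement
```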


MacAvaney et al. [18] focuses more broadly on the challenges and possible solutions of hate speech detection. An issue brought up is the data usage and distribution policies of major social media companies, who want to restrict users scraping their platforms for various reasons, including legitimate user privacy concerns. This creates an issue because Facebook, the biggest social platform and the location of a significant amount of online discourse, does not allow scraping of content. Therefore, a useful stream of data is restricted – for example, discussion in comments underneath a controversial news article. A related issue is the disproportionate language representation in hate speech datasets, with English making up the vast majority due to being the go-to language for online discussion. Assessing Accuracy. There are various methods to assess the accuracy of natural language models, and the appropriate measurement can vary depending on what the NLP model is intended to be used for. A reliable method available with certain datasets is making use of 'Dev set' data. This was implemented as part of the First Workshop on Trolling, Aggression and Cyberbullying (TRAC-1), where 130 teams helped annotate 15,000 Facebook comments and posts [19]. This large data pool meant a second non-overlapping corpus could be created, which came from similar sources to the intended training data, making it a good testing environment. Once test data is acquired, a popular metric to describe the performance of ML models is a confusion matrix [20].
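As a small, self-contained illustration of the confusion-matrix evaluation mentioned above (the labels below are invented, not drawn from the TRAC data):

```python
# Illustrative evaluation of a binary hate-speech classifier using a
# confusion matrix; the true/predicted labels are made up for the example.
from sklearn.metrics import confusion_matrix, accuracy_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]   # 1 = hateful, 0 = not hateful
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 1]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"Accuracy: {accuracy_score(y_true, y_pred):.2f}")
```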

2.4 Summary The literature shows the potential applications a browser extension could have in the real world, with different methods effective at helping users control their exposure to hateful content and be informed of who they are interacting with. The papers reviewed include detailed discussion of the different natural language models for detection but conclude that each method has its drawbacks and can all perform well with high quality training data. The research also provided valuable insights into datasets, highlighting the value of properly sanitized and annotated data to help train and test language models.

3 Prototype Development This section describes the development of a proof-of-concept browser extension and a Python application (nicknamed HateBlocker), with the browser extension used to scan webpages for elements, and a Python server to receive these elements and process them, sending back the results. The browser extension was developed for the Google Chrome browser. This decision was made as Chrome is currently the market leader in desktop web browsers. As browser extensions are developed primarily using cross platform web technologies like JavaScript, it would be trivial to port over to a non-Chromium browser such as Mozilla Firefox. An iterative development approach was adopted. Originally the browser extension was designed to deal with both detecting and processing webpage data. However, during the development process it was found that this increases the complexity of the code


significantly, also making it more difficult to swap out the processing algorithm at a later stage. To allow future changes to the processing algorithm, the decision was made to split the application into two distinct parts, as shown in Fig. 1: a Chrome web extension parses the webpage elements and sends and receives this data, while a separate Python-based server retrieves the data, processes it, and sends it back to the web extension.

Fig. 1. Flow diagram of prototype

3.1 Text Detection For the browser extension to detect and block hateful content on webpages, it needs to be able to read elements on the page, as well as filter which elements would be relevant to the search. To inject scripts into webpages, content_scripts must be declared through manifest.json. This file is required for the extension to run, as it holds the extension’s metadata and other configuration. After fetching the relevant elements, the innerText of these elements was added to an array. This removed parts of the element object that were unnecessary for processing the text. The technique used for fetching relevant elements depends on which site the HateBlocker is being aimed at. For the proof-of-concept the classic interface of the popular Reddit social media platform was chosen (old.reddit.com). Reddit is a decentralized platform relying on voluntary moderators to manage communities. However, it’s been widely reported that some communities don’t strictly enforce Reddit’s code of conduct and leave hateful content on the site [21], which makes it applicable to this research. The classic interface was chosen as it doesn’t rely extensively on JavaScript to draw interface elements, and therefore the Document Object Model (DOM) is relatively static and easy to parse. This method could easily be adapted to forums and imageboards such as 4chan. However, a different method would be needed for modern social media platforms such as Twitter, Facebook, and the most recent design of Reddit, as they use the popular ReactJS library. This library increases the complexity of detecting elements on a webpage because it virtualizes the DOM and only periodically updates the real version.


3.2 API The development of the project was split into two parts, with the extension sending webpage elements via POST request as JSON data, a standardized format for serializing data, in real time to a remote server. This is done asynchronously using the fetch API available in JavaScript ES6. This data is then received by a Python Flask [22] application located at app.py. Flask was chosen as it’s a lightweight web framework that allowed for a POST endpoint to be configured in a few lines of code. As shown in Fig. 2, two methods were added to the server, to receive POST requests and send this data to the analyser, and to send a response if a GET request is mistakenly sent to the server.

Fig. 2. Flask endpoints
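Since the figure itself is not reproduced here, the following is a minimal sketch of what such a Flask application might look like; the /process route name, the JSON field names, and the analyse() helper are illustrative assumptions, not the authors' exact app.py.

```python
# Minimal sketch of the two Flask endpoints described above.
from flask import Flask, jsonify, request

app = Flask(__name__)

def analyse(sentences):
    # Placeholder analyser: returns {sentence: True/False}. A real analyser
    # would apply the wordlist or a sentiment model (see Sect. 3.3).
    return {s: "blockedword" in s.lower() for s in sentences}

@app.route("/process", methods=["POST"])
def process():
    # Receive the page elements posted by the extension and return verdicts.
    sentences = request.get_json().get("elements", [])
    return jsonify(analyse(sentences))

@app.route("/process", methods=["GET"])
def wrong_method():
    # A GET request is not expected here; respond with a short hint instead.
    return jsonify({"error": "send page elements as a POST request"}), 405

if __name__ == "__main__":
    app.run(port=5000)
```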

Once data was passed into the Flask application and processed by the analyser, it was returned to the extension via the POST response, formatted as JSON data, with an extra property of either True, meaning the text was detected to be hateful, or False, meaning the text did not trigger the analyser. Figure 3 illustrates how the endpoints communicate with the extension. This API-based method of communication between the extension and server makes it easy for future work to be carried out in implementing different extensions or servers.

Fig. 3. API flow visualization

3.3 Text Processing Text processing functionality was developed with the goal of being interchangeable, allowing for different hate speech detection techniques and algorithms to be added in


the future. A local version of text processing was implemented in JavaScript in the Chrome extension earlier in the development stage. Different avenues were explored in this area, such as using a server-side wordlist and sentiment analysis using Google’s Cloud Language and Perspective APIs. Local Wordlist. A local wordlist was the first step in detecting overtly hateful text online. This method misses out on the context in which the term was used - for example, if the word is offensive only in the context of a particular discussion. Although relatively easy to implement technically, a wordlist had to be found or generated for the Python application to use. Hatebase [23] provides a comprehensive and regularly updated list of hateful words in use in multiple languages globally and was considered for use in the application. The site requires use of an API to access their dataset which could considerably slow down the runtime of the application. An alternative dataset is made available by Carnegie Mellon University [24] listing 1,300 + English terms that could be found offensive. This dataset was downloaded as a TXT file and converted to CSV using Microsoft Excel, before being used in the prototype. Cloud based Sentiment Analysis. After implementing word detection using a local Python server, cloud-based sentiment analysis was integrated into the application as a demo of the extensibility of a server-based approach. This was done using Google’s Perspective AI. This is a limited access API developed by Jigsaw, a research unit within Google and Google’s Counter Abuse Technology teams, to enable better conversations online by creating ML models targeted at online toxicity and harassment [25]. As shown in Fig. 4, the JSON response was parsed to get the toxicity score for the paragraph, measured between 0 (no toxicity) and 1 (extremely toxic), with any value over 0.5 returning True and resulting in the element being marked as hateful.

Fig. 4. Perspective API Implementation
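To make the two processing paths concrete, here is a hedged sketch of an analyser with a local wordlist check and an optional Perspective API call. The file name, the threshold handling, and the request/response fields follow Perspective's documented comments:analyze endpoint, but they should be treated as assumptions rather than the authors' implementation.

```python
# Sketch of an interchangeable analyser: a local wordlist check plus an
# optional cloud call to the Perspective API. File name, key handling and
# field names are illustrative assumptions.
import csv
import requests

PERSPECTIVE_URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
                   "comments:analyze")

def load_wordlist(path="offensive_words.csv"):
    with open(path, newline="", encoding="utf-8") as f:
        return {row[0].strip().lower() for row in csv.reader(f) if row}

def wordlist_check(text, wordlist):
    # Context-unaware check: flag the element if any listed term appears.
    return any(term in text.lower() for term in wordlist)

def perspective_check(text, api_key, threshold=0.5):
    # Toxicity score between 0 and 1; anything above the threshold is flagged.
    body = {"comment": {"text": text}, "requestedAttributes": {"TOXICITY": {}}}
    resp = requests.post(PERSPECTIVE_URL, params={"key": api_key}, json=body)
    resp.raise_for_status()
    score = resp.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]
    return score > threshold

# Example usage with the local path only:
# wordlist = load_wordlist()
# print(wordlist_check("some page element text", wordlist))
```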

3.4 Notifying the User Due to how elements are sent to a server and returned, limitations were found regarding how the elements could be modified to notify the user that they had been blocked. Initially, elements were checked to see if they included the blocked word, and if they did, the innerHTML was amended to remove the phrase and replace it with "Post blocked by HateBlocker". The appearance of blocked elements is illustrated in Fig. 5.


It uses a minimal notification, blending in with the page background. This method also allowed for users to view the content if they wished by selecting the dropdown (as shown in the bottom section of the figure).

Fig. 5. Appearance of blocked elements

As well as notifying the user by covering offending elements, the extension’s popup was configured to show the number of instances on a given page. This was done by adding a temporary counter to Chrome’s local storage API after instances are returned, which is fetched when the pop-up is opened. This result was cleared on page change, as well as when the extension was disabled using the pop-up interface.

4 Evaluation A user acceptance survey was conducted to gather opinions on the proposed hate speech blocker browser extension. Due to COVID-19 restrictions, participants were unable to test the browser extension in person. Although options were considered with regards to allowing users to install the extension on their personal machines, the decision was made that this would be too awkward for novice users as well as being a security risk. Due to the complicated nature of walking participants through installing a browser extension manually, it was decided that a video would be created to simulate what the browser extension’s experience was like. This involved recording an in-development version of the extension on Reddit, a site the extension was intended for. The demonstration recording involved using the extension in a way that was as easy as possible to understand, which involved planning which areas of the extension to show and when. As shown in Fig. 6, a linear diagram was used to assist in visualizing the order to carry out the demonstration.

Fig. 6. Demonstration Video Visualization


Once the video was recorded, it was edited to slow down the clip and make it easier to follow. Annotations were added to explain what was happening. The video was uploaded to YouTube, from where it was embedded into the survey. Participants were recruited using a combination of social media and email. A survey link was posted publicly to personal social media platforms (Facebook, Twitter, Instagram, and LinkedIn) and sharing was encouraged. The link was also sent out via email to personal contacts. Prior to data collection the survey was reviewed and approved by the School of Design and Informatics Ethics Committee at Abertay University. 4.1 Survey Items Quantitative Items. This included demographic questions, as well as questions to assess the perceived usability of the extension. The System Usability Scale was used to assess participants' subjective rating of the usability of the extension. This scale was chosen as it's an established metric, proven to be highly robust and versatile [26]. It consists of 10 standard questions aimed at measuring the effectiveness, efficiency, and satisfaction of a system/piece of software, scored on a Likert scale from Strongly Disagree to Strongly Agree. These questions were adapted for the browser extension and are shown in Table 1. One additional Likert scale question was added, presenting the assertion that "I'm comfortable with my amount of exposure to hateful content online". This question was added as a factor to determine how useful the extension would be for the user.

Table 1. System usability scale questions adapted for survey

Q1: I think I would use this extension frequently
Q2: The extension looked unnecessarily complex
Q3: The extension looked easy to use
Q4: I think I'd need the support of a technical person to be able to use this extension
Q5: I found the functionality of this extension well integrated with the site
Q6: The extension was designed in an aesthetically pleasing way
Q7: I would imagine that most people would learn to use this extension very quickly
Q8: The extension looked very cumbersome to use
Q9: I think I would feel very confident using the extension
Q10: I would need to learn a lot of things before I could get going with this extension

Qualitative Feedback. Three open-ended questions were added to gather more detailed perceptions. Firstly, “Is there any other ways the browser extension could be improved, based on the video viewed?” This question was used as general feedback for participants to voice their opinion and suggestions on how the extension itself should be executed. After this, the question “Which websites would you see this extension being useful


on?” was asked to allow the user to add site suggestions they would find it useful on. Finally, the question “If uncomfortable using this extension, what would make you more comfortable using an extension to limit your exposure to hate speech?” was asked. This question is aimed at any participants that aren’t comfortable with the concept of such an extension and gives an opportunity to explain what their reasons are.

4.2 Survey Results In total 124 participants completed the survey. Slightly over half of the participants (50.8%) were aged 18–24. Most participants identified as either male (44.4%) or female (42.7%). The most common highest level of education was university (50%), followed by higher/further education (30%) and a post-graduate degree (9.7%). Participants were asked which social media platforms they use, to determine which platforms would be most useful to extend functionality to in the future. The major platforms identified were Facebook (76.6%), Instagram (71.8%), Twitter (71.8%), Snapchat (47.6%), Reddit (41.1%), and TikTok (27.4%). Concerning the statement "I'm comfortable with my amount of exposure to hateful content online" (hereafter referred to as 'comfort with exposure') the participants generally held a neutral position (M = 2.99, SD = 1.334). Spearman's correlation was computed to assess the relationship between comfort with exposure and the demographic variables. There is a significant correlation between gender and comfort with exposure, with participants identifying as female and non-binary being less comfortable (r = −.38, 95% BCa CI [−.523, −.199], p < .001). This relationship is illustrated by the histogram in Fig. 7. This result supports prior research findings [12] and helps to identify at-risk individuals. No significant correlation was found between comfort with exposure and age or level of education.

Fig. 7. Relationship between Comfort with Exposure and Gender
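For readers who want to reproduce this kind of check on their own data, a minimal sketch of a Spearman correlation between an ordinal comfort rating and a coded gender variable is shown below; the responses and the numeric coding of gender are made up for illustration and are not the survey data.

```python
# Illustrative only: made-up responses, not the survey data reported above.
from scipy.stats import spearmanr

# 1-5 Likert responses to "I'm comfortable with my amount of exposure..."
comfort = [4, 2, 5, 1, 3, 2, 4, 1, 3, 2]
# Hypothetical coding of gender for a rank correlation (0 = male,
# 1 = female/non-binary); the coding scheme is an assumption.
gender = [0, 1, 0, 1, 0, 1, 0, 1, 1, 1]

rho, p_value = spearmanr(comfort, gender)
print(f"Spearman's rho = {rho:.2f}, p = {p_value:.3f}")
```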

The responses were analyzed to see which areas of the System Usability Scale stood out as strengths and weaknesses. The final score was found to be 77.3 out of 100, resulting in an above average score. Previous research has shown that scoring upwards of


71.4 would rank the result as ‘good’, with results of over 85.5 ranked as ‘excellent’ [27]. Figure 8 shows the total responses for each question, keeping in mind that odd numbered questions are phrased positively (meaning Strongly Agree indicates better usability), and even numbered questions phrased negatively (and therefore Strongly Disagree indicating better usability).

Fig. 8. Stacked bar graph showing system usability scale responses

Normalized scores for the questions are shown on the vertical axis, which takes account of the different phrasing and scores the questions on how positive the results were.
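For reference, the standard SUS scoring rule summarised above can be computed as follows; the sketch assumes 1–5 Likert responses in question order Q1–Q10 and uses made-up responses rather than the study's data.

```python
# Standard SUS scoring: odd items contribute (score - 1), even items (5 - score);
# the sum is multiplied by 2.5 to give a 0-100 scale. Responses are illustrative.
def sus_score(responses):
    """responses: list of ten 1-5 Likert answers in question order Q1-Q10."""
    assert len(responses) == 10
    total = 0
    for i, r in enumerate(responses, start=1):
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5

participants = [
    [4, 2, 5, 1, 4, 4, 5, 2, 4, 2],   # hypothetical participant 1
    [5, 1, 4, 1, 4, 3, 4, 2, 3, 2],   # hypothetical participant 2
]
scores = [sus_score(p) for p in participants]
print(f"Mean SUS score: {sum(scores) / len(scores):.1f}")
```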

5 Discussion 5.1 Design A high priority when designing the browser extension was ensuring it was accessible to users of all levels of technical proficiency. This was done by making the purpose of the extension clear through making the interface as simple as possible as well as providing materials to assist users in understanding how the extension works via a demonstration video. The extension was developed through multiple iterations, with the first version demonstrated to survey participants in a Demonstration Video. The majority of the feedback regarding this version involved the fact that the text was too intrusive, as well as users being able to “fill in the blanks” of the blocked word since the rest of the sentence was visible, with over 30 responses mentioning this. For example: “Perhaps find a way to block out sentences after the blocked word as you will know the word you have blocked out but an Excellent first step.” This feedback was addressed with the final version of the extension, as shown in Fig. 5. In the initial design of the extension, it was proposed to show the user the specific type of hate speech the extension was hiding, but due to the method of word detection used, the extension could only tell whether an element was hateful or not, and not the


specific reason it was. Therefore, the number of instances on the page was displayed to inform the user. This functionality could be implemented in the future by using a local ML model. The decision was made not to include a local ML model for the proof-of-concept because the Perspective API was perceived as more accurate and easier to implement, but after testing it was found to be unusably slow when used with complex webpages. Sentiment analysis could be integrated to achieve a similar level of responsiveness as word detection if implemented locally, while also opening the possibility of giving context-dependent warnings. Survey participants praised the overall ease of use of the extension. Q2, Q3, and Q4 related to the level of knowledge required to use it, with Q4 ("I think I'd need the support of a technical person to be able to use this extension") being the highest scoring answer, with an average response of 1.2 and the majority of participants choosing Strongly Disagree. This is a sign that users understand the purpose and usage of the extension; participants left responses such as: "It looks clean and easy to use as is." 5.2 Technical Implementation A core objective of this research was to develop a browser extension that would be able to detect and block hateful content on webpages. The extension was successful in this regard in that it effectively detected relevant elements of the target page, sending these using an API to communicate with the server that processed these results. It was also used to inform users of the status of the extension, and the number of instances of hate speech on any page. A common point of feedback found in survey results regarding the technical functionality of the extension had to do with integrating a 'wordlist' or 'machine learning'. For example, "I think the proposed method of adding words to a block list or using machine learning would be a good method to block words however I would like to see a user manageable list as some people have different qualifiers as what is hate to them." This functionality (a wordlist and cloud-based sentiment analysis powered by the Perspective API) was implemented in the final version of the extension. A targeted approach was used when building the detection functionality of this extension, and this significantly simplified the process of detecting elements on the chosen site, where the Paragraph tag was used. A downside of this method of element detection was that a lot of irrelevant elements were captured and processed, such as page headings and navigation/side bars. This didn't cause issues when carrying out local word detection, as the API traffic stayed on the local machine; however, if this server was moved to a remote location, it's possible the network connection would be a bottleneck. Along with this extension, a server was implemented to carry out the processing function of the browser extension, enabling the two systems to exist independently. This would allow the browser extension to be modified and configured with different detection models, and the server to be used with any application that wishes to integrate with the HTTP API. Python was the language of choice for this server, due both to the simplicity of configuring an API using Flask and the wide range of libraries available for text processing and Machine Learning.
The extension was built with an intent to make the process of integrating new detection algorithms simple, with a standard JSON input format of sentence:true/false and each method contained within a function


inside the analyser. The server-side approach was successful in that it integrated well with the extension and performed well when working with a local word detection model. Problems were experienced, however, when working with the remote Perspective API: because every paragraph element of the page was individually processed, the rate limit of 60 requests per minute was often reached before all the elements on the page had been classified. Although the same rate limits were not experienced with the similar Google Cloud Language API, that API is less catered towards toxicity and, similarly to the Perspective API, the time spent waiting for a response made these methods unusably slow on websites that were more complex.
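One common way to work within a per-minute quota like the one described above is to space requests out on the client side. The sketch below is a minimal illustration: the 60-requests-per-minute figure comes from the text, while the class, its name, and the batching strategy are assumptions.

```python
# Simple client-side throttle: ensure no more than `limit_per_minute`
# requests are issued in any rolling 60-second window.
import time
from collections import deque

class Throttle:
    def __init__(self, limit_per_minute=60):
        self.limit = limit_per_minute
        self.sent = deque()          # timestamps of recent requests

    def wait(self):
        now = time.monotonic()
        # Drop timestamps older than 60 seconds.
        while self.sent and now - self.sent[0] > 60:
            self.sent.popleft()
        if len(self.sent) >= self.limit:
            time.sleep(60 - (now - self.sent[0]))
        self.sent.append(time.monotonic())

# Hypothetical usage with the analyser sketch from Sect. 3.3:
# throttle = Throttle(60)
# for paragraph in paragraphs:
#     throttle.wait()
#     perspective_check(paragraph, api_key)
```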

6 Conclusion The aim of this research was to investigate how a browser extension can be used to reduce exposure to hate speech online. It found, through investigation of existing literature, that NLP can play a valuable role in hate speech detection due to its ability to recognize the context in speech, and that browser extensions can be an effective method of managing what users are exposed to online. Through the development and evaluation of a proof-of-concept extension, it was found that users were receptive to this method of reducing their exposure to hate speech online. This research was mainly focused on the acceptability and usability of the approach for users. It was found that the extension was highly usable, with an overall System Usability Score of 77.3 (good). In addition, survey results suggest that the public are open to using browser extensions or similar solutions to reduce their exposure to hate speech online, with most of the negative feedback related to the fact that the demonstration was based on an early version of the extension. Due to COVID-19 restrictions, the extension could only be demonstrated to participants using a video recording of its functionality. Although it is believed that this still presents a sufficient basis for evaluation, additional in-person evaluations would be beneficial. It was found that sentiment analysis and NLP, in general, could greatly assist in reducing the amount of hateful speech online, especially when online platforms integrate it to deal with toxic accounts directly. Browser extensions, however, have the potential to play an important role for individuals that wish to curate their online experience separately from what online platforms deem as acceptable. The survey results indicate that gender could be an important indicator of at-risk individuals, with those identifying as female and non-binary being significantly less comfortable with the amount of exposure to hateful content online. Future research could investigate the impact on these groups in more detail, and what they would prioritize in a technical solution. The extension was designed to be simple for the end user, whilst maintaining a level of interoperability and extensibility to add functionality in the future. For this purpose, it was designed as two parts, and using a standard API allows for the server or extension to be modified without having to make major changes to the protocol. While the proof-of-concept only targeted the classic interface of Reddit, the survey results showed this to be one of the top five platforms in use and thus relevant in the context of this research. The survey results confirmed several other platforms for future targeting.


A limitation present in the current version of the extension was the choice made between having a less accurate and context unaware word detection model that processes quickly, or a more accurate context aware sentiment analysis model that hinders performance. This limitation could be addressed by using a local ML model. However, a larger challenge would be bringing this technology to increasingly popular mobile platforms where content injection is impossible in most instances. How best to manage these constraints presents an opportunity for further research.
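To give a sense of what the suggested local ML model could look like, the sketch below trains a tiny TF-IDF plus logistic regression classifier. The training sentences are invented placeholders; a real deployment would need a properly annotated corpus such as those discussed in Sect. 2.3 and a held-out evaluation set.

```python
# Minimal local text classifier: TF-IDF features with logistic regression.
# The toy training data is invented; a real model would be trained on an
# annotated corpus and evaluated with a confusion matrix (see Sect. 2.3).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "I hate you and everyone like you",      # toy hateful example
    "have a wonderful day everyone",          # toy benign example
    "people like you should disappear",       # toy hateful example
    "this article was really interesting",    # toy benign example
]
train_labels = [1, 0, 1, 0]                   # 1 = hateful, 0 = not hateful

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)

# Predictions on unseen text (illustrative; a toy model like this will be
# unreliable outside its tiny vocabulary).
print(model.predict(["what a lovely photo", "I hate people like you"]))
```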

References
1. Facebook: Hate speech | Transparency Centre. https://transparency.fb.com/en-gb/policies/community-standards/hate-speech/. Accessed 15 Nov 2021
2. The Crown Prosecution Service: Hate crime | The Crown Prosecution Service. https://www.cps.gov.uk/crime-info/hate-crime. Accessed 15 Nov 2021
3. Oksanen, A., Hawdon, J., Holkeri, E., Näsi, M., Räsänen, P.: Exposure to online hate among young social media users. In: Soul of Society: A Focus on the Lives of Children & Youth, pp. 253–273. Emerald Group Publishing Limited (2014). https://doi.org/10.1108/S1537-466120140000018021
4. Barendt, E.: What is the harm of hate speech? Ethical Theory Moral Pract. 22(3), 539–553 (2019). https://doi.org/10.1007/s10677-019-10002-0
5. Google: Toxicity. https://jigsaw.google.com/the-current/toxicity/. Accessed 15 Nov 2021
6. Gogarty, K., Silva, S.: A new study finds that Facebook is not censoring conservatives despite their repeated attacks. https://www.mediamatters.org/facebook/new-study-finds-facebook-not-censoring-conservatives-despite-their-repeated-attacks. Accessed 15 Nov 2021
7. BBC News: Zuckerberg outlines plan for "privacy-focused" Facebook (2019). https://www.bbc.com/news/world-us-canada-47477677
8. Jain, S., Kamthania, D.: Hate Speech Detector: Negator. Social Science Research Network, Rochester, NY (2020). https://doi.org/10.2139/ssrn.3563563
9. Howard, J.W.: Free speech and hate speech. Annu. Rev. Polit. Sci. 22, 93–109 (2019). https://doi.org/10.1146/annurev-polisci-051517-012343
10. Harvard Law Review: Matal v. Tam Leading Case: 137 S. Ct. 1744. https://harvardlawreview.org/2017/11/matal-v-tam/. Accessed 15 Nov 2021
11. Leets, L.: Experiencing hate speech: perceptions and responses to anti-semitism and antigay speech. J. Soc. Issues 58, 341–361 (2002). https://doi.org/10.1111/1540-4560.00264
12. The Economist: Measuring the prevalence of online violence against women. https://onlineviolencewomen.eiu.com/. Accessed 15 Nov 2021
13. Manqola, T., Garbutt, M., Ophoff, J.: Cyberhate: Profiling of Potential Targets. CONF-IRM 2020 Proceedings (2020)
14. Shinigami Eyes: An extension that highlights trans-friendly and anti-trans social network pages. https://shinigami-eyes.github.io/. Accessed 15 Nov 2021
15. Modha, S., Majumder, P., Mandl, T., Mandalia, C.: Detecting and visualizing hate speech in social media: a cyber watchdog for surveillance. Expert Syst. Appl. 161, 113725 (2020). https://doi.org/10.1016/j.eswa.2020.113725
16. Stančin, I., Jović, A.: An overview and comparison of free Python libraries for data mining and big data analysis. In: 2019 42nd International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), pp. 977–982 (2019). https://doi.org/10.23919/MIPRO.2019.8757088


17. Roß, B., Rist, M., Carbonell, G., Cabrera, B., Kurowsky, N., Wojatzki, M.: Measuring the reliability of hate speech annotations: the case of the European refugee crisis. Presented at the NLP4CMC III: 3rd Workshop on Natural Language Processing for Computer-Mediated Communication, 22 September 2016 (2016). https://doi.org/10.17185/duepublico/42132
18. MacAvaney, S., Yao, H.-R., Yang, E., Russell, K., Goharian, N., Frieder, O.: Hate speech detection: challenges and solutions. PLoS ONE 14, e0221152 (2019). https://doi.org/10.1371/journal.pone.0221152
19. Kumar, R., Ojha, A.Kr., Malmasi, S., Zampieri, M.: Benchmarking aggression identification in social media. In: Proceedings of the First Workshop on Trolling, Aggression and Cyberbullying (TRAC-2018), pp. 1–11. Association for Computational Linguistics, Santa Fe, New Mexico, USA (2018)
20. Malmasi, S., Zampieri, M.: Detecting hate speech in social media. arXiv:1712.06427 [cs] (2017)
21. Newcomer, E., Bloomberg: Racism is rampant on Reddit, and its editors are in open revolt. https://fortune.com/2020/06/20/racism-reddit-editors-in-revolt/. Accessed 24 Jan 2022
22. Flask: Welcome to Flask — Flask Documentation (1.1.x). https://flask.palletsprojects.com/en/1.1.x/. Accessed 15 Nov 2021
23. Hatebase: Hatebase. https://hatebase.org/. Accessed 15 Nov 2021
24. von Ahn, L.: Offensive/Profane Word List. https://www.cs.cmu.edu/~biglou/resources/. Accessed 15 Nov 2021
25. Perspective: About the API – FAQs. https://support.perspectiveapi.com/s/about-the-api-faqs. Accessed 15 Nov 2021
26. Bangor, A., Kortum, P.T., Miller, J.T.: An empirical evaluation of the system usability scale. Int. J. Human-Computer Interaction 24, 574–594 (2008). https://doi.org/10.1080/10447310802205776
27. Bangor, A., Kortum, P., Miller, J.: Determining what individual SUS scores mean: adding an adjective rating scale. J. Usability Studies 4, 114–123 (2009)

Hybrid-AI Blockchain Supported Protection Framework for Smart Grids
S Sai Ganesh(B), S Surya Siddharthan, Balaji Rajaguru Rajakumar, S Neelavathy Pari, Jayashree Padmanabhan, and Vishnu Priya
MIT Campus, Anna University, Chennai, India
[email protected]

Abstract. In the digital era, the smart grid is a vital critical digital infrastructure of the nation that has to be secured against cyber threats in its power and communication infrastructures. Huge data gathering and bi-directional information flows open up the potential for compromising data security and confidentiality. The cyber security requirements of data availability and data integrity are addressed in the proposed work by a deep learning security framework that mitigates False Data Injection Attacks (FDIA), which affect data integrity, and Distributed Denial of Service (DDoS) attacks, which are threats to data availability. The proposed Hybrid-AI Blockchain supported Protection Framework (HABPF) utilizes a hybrid of Recurrent Neural Networks (RNN) and LeNet5-based Convolutional Neural Networks (CNN) to protect the communication infrastructure of the smart grid. The proposed system also leverages blockchain to store all grid data to add a layer of security against data tampering. The extensive performance and security evaluation of the HABPF framework through the IEEE 14 bus system reveals that the proposed framework is highly competent, with 96% accuracy in detecting attacks compared to other related works.

Keywords: Smart grid · FDI attack · DoS attack · Deep neural network · Cyber physical system · Blockchain

1 Introduction

The latest advancements in the Industrial Internet of Things (IIoT) and communication technology have enabled Cyber-Physical Systems (CPS) as core components in critical infrastructure systems. The smart grid is one such CPS that augments the electric grid with sensing and metering capabilities. Smart grids support more efficient transmission of electricity compared to traditional grids. Smart grids, being highly interconnected and distributed widely over a large area, are susceptible to various attacks. Security standards like NIST (National Institute of Standards and Technology) and NIS (Networks & Information Systems) have notified that the four key cyber essentials to be addressed for the smart grid are device-level security, cryptographic key management, networking security and system-level security.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
K. Arai (Ed.): SAI 2022, LNNS 508, pp. 646–659, 2022. https://doi.org/10.1007/978-3-031-10467-1_39


FDI attack and Denial of Service (DoS) are the most concerned cyber-attacks in the smart grid network [17]. During the state estimation phase, adversaries can use the FDIA to change the energy supply and demand figures, leading to additional costs or more devastating hazards in the smart grids. In [11], a novel methodology to defend against FDIA through a sparse optimization method is proposed. Furthermore, the authors of [13] have proposed a Kalman filter-based methodology to detect FDIA. Machine learning techniques are commonly used to extract necessary information and develop insights into the smart grid [6,10]. Machine learning models such as Deep Belief Network (DBN) [9] and Support Vector Machine (SVM) [18] are proposed for the detection of FDIA through the Bad Data Detection (BDD) mechanism. The BDD method is not suitable as it ignores the complexity of smart grids and ignores those attacks that are unobservable. In [21], a Deep Neural Network (DNN) based technique is proposed to detect normal and abnormal measurements and wavelet transformation. Further, a Recurrent Neural Network (RNN) to detect the attacks in the smart grid using the temporal characteristics of the measurements is proposed in [12,19]. Another similar kind of attack affecting smart grids is the Covert DataIntegrity Attack (CDIA). A methodology is proposed in [3] to remove the false data generated in the CDIA. This methodology involves reconstructing the measurement data using a denoising autoencoder. Further, it is shown that denoising autoencoders provide a better result than autoencoders. Further, the authors of [4] proposed an unsupervised learning-based algorithm to detect CDIA in power systems. The authors utilized an isolation forest to detect CDIA effectively. The second serious cyber attack of concern, due to its high impact on the availability of the services supported by the smart grid, is the Denial of Service (DoS) attack. A malicious entity can utilize the DoS attack to disrupt the service to the users. The authors of [5] proposed a Multiple Kernel Learning for Dimensionality Reduction algorithm (MKLDR) for DDoS attack detection in the smart grid. The authors utilized a multilevel autoencoder to detect DDoS attacks in the smart grids with an accuracy greater than 90%. Furthermore, [2] proposed a DDoS attack classification/detection using supervised ML techniques such as random forests, K nearest neighbors, and SVM. An essential operation in power grids is the state estimation operation. The state estimation operation utilizes the measurement data from the Supervisory Control And Data Acquisition (SCADA) as input. It generates an output that contains the estimates of the current system states. Weighted Least Square (WLS) is the traditional method to estimate the system states. The measurement data contains parameters such as the actual power injection, reactive power injection, real power flows, and reactive power flows. The system state vector includes the values of voltage magnitude and phase angles at all the buses. State estimation aims to obtain actual states by maximizing the likelihood of estimated states which is equivalent to minimizing the residuals between the estimated and actual measurements.


A method for state estimation through the Least Absolute Value (LAV) estimator was proposed in [7]. However, in [7], only Phasor Measurement Unit (PMU) data is utilized for state estimation. Due to the coexistence of SCADA and PMUs, [8,22] proposed utilizing data from both SCADA and PMUs with weighted LAV estimators to estimate the system state efficiently. Alternatively, [20] proposed a divide-and-conquer approach to improve the calculation efficiency: the power system is divided into multiple areas, and the estimation is carried out across these areas with PMU-based linear estimation.

The accuracy of detecting FDIA and DDoS attacks in the existing work is not satisfactory. Furthermore, most of the existing literature focuses on the detection of FDI attacks rather than their mitigation. Also, many works deal only with static states, whereas the power grid and its systems are dynamic. These attacks can lead to infrastructure failure, blackouts, energy theft, customer privacy breaches, and can endanger the safety of operating personnel. The power distribution is constantly changing due to weather, customer behavior, and other events; because of these random changes in the behavior of the power system, the physics behind the state estimation is difficult to capture. Motivated by these challenges, a data-driven deep learning security model that can address them is proposed. Further, there is a lack of data source authentication in existing systems. Since blockchain supports immutable, verifiable and tamper-resistant transactions, it can be used for secure storage. In addition, blockchain technology can provide greater availability and fault tolerance because the data is distributed across multiple nodes. Each block in the blockchain stores the previous block's cryptographic hash, a timestamp and the data related to the transaction. This hash chaining makes the blockchain immutable, as the modification of a single block would require the modification of every subsequent block. These advantages of blockchain can be integrated with the smart grid to provide secure storage for all the data in it [14,16].

The main contributions of this work are as follows:
– A highly accurate machine learning based approach is introduced to perform state estimation instead of the traditional WLS method.
– An LSTM is used to exploit the spatial and temporal correlations of the system state variables in the historical data, learning the FDIA characteristics of the injected attack vector and thereby making FDIA observable.
– The work proposes a computationally fast, data-driven CNN that detects attacks in milliseconds, enabling real-time attack detection.
– The work leverages blockchain to provide secure storage and nodal data flow authentication in smart grids.

The rest of this paper is organized as follows. Section 2 presents a brief background on state estimation, bad data detection, and FDI attacks. Section 3 describes the proposed HABPF framework for FDI attack mitigation and DDoS attack detection. Simulation results of the HABPF framework are analysed in Sect. 4. Section 5 concludes the paper and outlines future research directions.
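To make the hash-chaining property discussed above concrete, the following minimal Python sketch links blocks by storing each predecessor's hash. The field names and the SHA-256 choice are illustrative assumptions, not the HABPF/Hyperledger implementation.

```python
import hashlib, json, time

def make_block(data, prev_hash):
    # Each block stores the previous block's hash, a timestamp and the payload.
    block = {"prev_hash": prev_hash, "timestamp": time.time(), "data": data}
    block["hash"] = hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()
    return block

def chain_is_valid(chain):
    # Recomputing the hashes exposes any tampering with an earlier block.
    for prev, curr in zip(chain, chain[1:]):
        body = {k: prev[k] for k in ("prev_hash", "timestamp", "data")}
        recomputed = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if curr["prev_hash"] != recomputed:
            return False
    return True

genesis = make_block({"reading_kwh": 0.0}, prev_hash="0" * 64)
b1 = make_block({"reading_kwh": 1.7}, prev_hash=genesis["hash"])
chain = [genesis, b1]
print(chain_is_valid(chain))         # True
genesis["data"]["reading_kwh"] = 99  # tamper with an earlier block
print(chain_is_valid(chain))         # False
```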

2 Background

2.1 State Estimation

The smart grid uses a Supervisory Control and Data Acquisition (SCADA) system to accurately collect and monitor real-time data from the lines and buses. The SCADA system then transmits the data to the control center. The measured data contains details about the active power flow on lines and the active power injection at buses. The state variables of the state vector are the voltage magnitudes and phase angles. The following equation represents the relationship between the system state vector $\beta \in \mathbb{R}^n$ and the measurement vector $\alpha \in \mathbb{R}^m$:

$$\alpha = J\beta + \gamma \quad (1)$$

where $J$ is the Jacobian matrix and $\gamma$ denotes the measurement error. Various methods estimate the system state vector; the commonly used Weighted Least Squares (WLS) estimate is

$$\beta = (J^T D J)^{-1} J^T D \alpha \quad (2)$$

where $D$ is a diagonal matrix whose elements are the reciprocals of the variances of the meter errors.
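A minimal numpy sketch of the WLS estimate in Eq. (2); the Jacobian and measurement values below are toy numbers for illustration only, not taken from the paper.

```python
import numpy as np

def wls_state_estimation(J, alpha, meas_var):
    """Weighted least squares: beta = (J^T D J)^{-1} J^T D alpha,
    with D diagonal and D_ii = 1 / variance of meter i."""
    D = np.diag(1.0 / meas_var)
    return np.linalg.solve(J.T @ D @ J, J.T @ D @ alpha)

# Toy example: 4 measurements, 2 state variables.
J = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, -1.0], [0.5, 0.5]])
beta_true = np.array([1.02, -0.05])
alpha = J @ beta_true + 0.01 * np.random.randn(4)   # noisy measurements
print(wls_state_estimation(J, alpha, meas_var=np.full(4, 1e-4)))
```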

2.2 Bad Data Detection

The data from state estimation may be corrupted by various factors, such as system errors (e.g., noise in the communication medium or sensor faults) or bad actors (e.g., cyber attacks). Traditional bad data detection compares the $L_2$ norm of the measurement residual $\gamma$ with a threshold $\tau$; the data is flagged as bad when the $L_2$ norm of the residual exceeds the threshold.

2.3 False Data Injection Attack

In an FDIA, the attacker may introduce an attack vector $z$ into the measurement vector $\alpha$, resulting in the corrupted measurements

$$\alpha_z = \alpha + z \quad (3)$$

The estimated system state vector then changes into

$$\beta_z = \beta + \kappa \quad (4)$$

where $\kappa \in \mathbb{R}^n$ is the change made to the estimated system state variables. If the attack vector satisfies $z = J\kappa$, the $L_2$ norm of the residual of the attacked measurement data is

$$\|\alpha_z - J\beta_z\| = \|\alpha + z - J(\beta + \kappa)\| \quad (5)$$

$$\|\alpha_z - J\beta_z\| = \|\alpha - J\beta + z - J\kappa\| \quad (6)$$

The $L_2$ norm of the residual of the attacked measurement data is therefore the same as that of the original measurement data. If the residual of the original measurement data passes Bad Data Detection, the residual of the attacked measurement data also passes it. Thus, the attacker can easily launch an FDIA on the power system.
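The following numpy sketch illustrates Eqs. (3)–(6): a structured attack vector $z = J\kappa$ leaves the residual norm used by bad data detection unchanged. All values are toy numbers chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
J = rng.normal(size=(6, 3))                     # measurement Jacobian
beta = rng.normal(size=3)                       # true state
alpha = J @ beta + 0.01 * rng.normal(size=6)    # noisy measurements

kappa = np.array([0.3, -0.2, 0.1])              # attacker's desired state shift
z = J @ kappa                                   # structured attack vector
alpha_z = alpha + z                             # corrupted measurements

def estimate(J, alpha):
    # Least-squares state estimate used only for this illustration.
    return np.linalg.lstsq(J, alpha, rcond=None)[0]

r_clean = np.linalg.norm(alpha - J @ estimate(J, alpha))
r_attack = np.linalg.norm(alpha_z - J @ estimate(J, alpha_z))
print(r_clean, r_attack)   # residual norms are (numerically) identical
```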

Fig. 1. HABPF architecture: power plants, wind farms and the central power station feed the grid through smart transformers to houses, offices, commercial buildings and a micro grid, with the control centre protected by HABPF framework based protection.

2.4 Distributed Denial of Service

Distributed Denial of Service (DDoS) is a distributed, targeted attack that can bring parts of the grid, or the whole grid, down. It floods the central servers with traffic from many places worldwide, making the power grid unusable. DDoS attacks can lead to catastrophic events in which a couple of streets or even entire cities lose power.

3 Proposed Scheme

The proposed HABPF framework augments the smart grid with AI-enabled attack mitigation and blockchain-based integrity and authentication mechanisms. A permissioned blockchain network is used to interconnect the various entities that are part of the augmented smart grid. The proposed framework is defined as

$$S \triangleq \{E_{PM}, E_{ST}, E_{CC}, E_{CPS}\} \quad (7)$$

The HABPF-enabled smart grid consists of the following components:


1. Power Meters ($E_{PM}$): These smart meters collect the electricity consumption data and transmit it to the nearest $E_{CC}$. They also store the average consumption data of the last $\phi$ minutes in the permissioned ledger.
2. Smart Transformers ($E_{ST}$): These transformers are nodes in the blockchain that transform and distribute power in the grid and share usage data with their sub-nodes. They act as normal transformers augmented with blockchain-based nodal data flow authentication.
3. Control Centre ($E_{CC}$): The control centre monitors all the data while protecting it from different types of security attacks using HABPF-based protection.
4. Central Power Station ($E_{CPS}$): The central power station transfers power to the grid from the different power generation plants, such as wind farms or hydroelectric plants.

The HABPF framework monitors the data before it reaches the control centre, verifying that it has not been affected by FDI or DDoS attacks. If the system suspects an attack, it reconstructs the original data from the corrupted data. The overall HABPF framework is shown in Fig. 1.

3.1 Data Integrity and Authentication

A permissioned blockchain network is used as an overlay network to connect the different components of the smart grid as part of data and device authentication. The record of the $i$th $E_{PM}$ in the permissioned ledger is denoted as $R^i_{PM}$:

$$R^i_{PM} = \{ID^i, A^i, V^i, D^i, \theta^i, sig[\,]^i\} \quad (8)$$

1. $ID^i$: Node ID.
2. $A^i$: Amplitude, calculated from the average current and voltage.
3. $V^i$: AC voltage.
4. $D^i$: Record duration for the average.
5. $\theta^i$: AC phase angle.
6. $sig[\,]^i$: Path of the data transfer; contains all the nodes on the path from $E^i_{PM}$ to $E_{CC}$.
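A minimal sketch of such a ledger record as a Python dataclass; the types, values and the hashed-key strings are illustrative assumptions rather than the actual HABPF chaincode.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class PowerMeterRecord:
    node_id: str                 # ID^i
    amplitude: float             # A^i, from average current and voltage
    voltage: float               # V^i, AC voltage
    duration_min: int            # D^i, averaging window (phi minutes)
    phase_angle_deg: float       # theta^i, AC phase angle
    sig: Dict[str, str] = field(default_factory=dict)  # node -> hashed key on path to E_CC

record = PowerMeterRecord("PM-017", amplitude=6.2, voltage=229.8,
                          duration_min=15, phase_angle_deg=12.4,
                          sig={"ST-03": "hash-of-key-st03", "ST-01": "hash-of-key-st01"})
```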

The permissions available to each $E^i_{PM}$ allow it to update the values of $A^i$, $V^i$, $D^i$ and $\theta^i$. Each $E^j_{ST}$ in the data transmission path from $E^i_{PM}$ to $E_{CC}$ adds its stored key to the signature hashmap. Permission to directly view the values in the signature hashmap is granted only to $E_{CC}$; the other nodes of the permissioned ledger network can view the signature values only in their hashed form.

It is inefficient to frequently update the data stored in the blockchain as records, as this requires a high amount of computational power. The proposed HABPF framework overcomes this problem by abstracting the blockchain network from the underlying network. The $E_{PM}$ continues to transmit the minute-by-minute changes in power consumption directly and not over the blockchain; this data is collected and stored in the data repository of $E_{CC}$. The record $R^i_{PM}$ is changed only once every $\phi$ minutes, and the values updated are the averages over that period. The averages of the phase angle $\theta$ are calculated as averages of its complex number representation. The complex number $(a + bi)$ representation is derived as shown below:

$$a = \cos\left(\frac{\theta \pi}{180}\right) \quad (9)$$

$$b = \sin\left(\frac{\theta \pi}{180}\right) \quad (10)$$
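A short numpy sketch of why Eqs. (9)–(10) are used: averaging phase angles through their complex representations avoids the wrap-around problem near ±180 degrees. The angle values are illustrative.

```python
import numpy as np

def average_phase_deg(angles_deg):
    # Average a + bi = cos(theta) + i*sin(theta), then convert back to degrees.
    rad = np.deg2rad(np.asarray(angles_deg))
    mean_vec = np.mean(np.cos(rad)) + 1j * np.mean(np.sin(rad))
    return np.rad2deg(np.angle(mean_vec))

print(average_phase_deg([170, -170]))   # 180.0, not the misleading 0.0
```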

Similarly, $R^i_{ST}$ denotes the record of the $i$th $E_{ST}$; it has the same structure as $R^i_{PM}$ and functions in the same way. The availability of these node values along the path allows multi-value authentication of the data flowing as a stream from the $E_{PM}$ to $E_{CC}$. A Randomized Power Path Authentication algorithm is utilized by the $E_{CC}$ for authenticating the power consumption inflow at the different nodes. The authentication process is shown in Algorithm 1. The Find_Nonzero_Keys() function fetches the list of nodes on the path from the hashmap sig[]; the Fetch_X_for_duration_stream(M, N) function fetches the average of the streamed value of X for the Mth node over the duration N.
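The Python sketch below loosely mirrors the idea of Algorithm 1 under simplifying assumptions: the ledger is a plain dictionary, the stream fetcher is passed in as a callable, and the helper names and tolerance are hypothetical, not taken from the paper.

```python
import cmath
import random

def complex_phase(deg):
    # Complex representation of a phase angle, as in Eqs. (9)-(10).
    return cmath.exp(1j * cmath.pi * deg / 180.0)

def authenticate_pm(ledger, meter_id, fetch_stream_avg, tol=0.05):
    # Find_Nonzero_Keys: nodes that signed the path of this meter's record.
    path = [node for node, key in ledger[meter_id]["sig"].items() if key]
    checks = []
    for vnode in random.sample(path, k=min(2, len(path))):   # two random verification nodes
        kids = ledger[vnode]["under"]                         # meters below this transformer
        agg_a = sum(ledger[c]["A"] for c in kids)
        agg_th = sum(complex_phase(ledger[c]["theta"]) for c in kids) / len(kids)
        checks.append(abs(agg_a - ledger[vnode]["A"]) < tol and
                      abs(agg_th - complex_phase(ledger[vnode]["theta"])) < tol)
    if any(checks):
        # Fetch_*_for_duration_stream: compare raw stream averages with the ledger record.
        stream_a, stream_th = fetch_stream_avg(meter_id, ledger[meter_id]["D"])
        if (abs(stream_a - ledger[meter_id]["A"]) < tol and
                abs(stream_th - ledger[meter_id]["theta"]) < tol):
            return "random authentication successful"
    return f"discard data of {meter_id} for duration {ledger[meter_id]['D']}"
```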

3.2 FDIA Mitigation

FDIA detection and mitigation involves two critical steps: state estimation and attack mitigation. The algorithm for FDIA mitigation is shown in Algorithm 1.

1. State estimation: State estimation is critical in power grid operations. It receives raw measurements from the SCADA system and generates critical inputs for other applications that require reliable estimates of the current system states. The AC power flow model susceptible to AC false data injection attacks is defined by the following set of equations, where the system state is defined by the bus voltages $V_i$ and phase angles $\theta_i$, and the nonlinear relationship between the real/reactive power injection at bus $i$ ($P_i$/$Q_i$) and the real/reactive power flow between any two connected buses ($P_{ij}$/$Q_{ij}$) is given by

$$P_i = V_i \sum_{j \in N_i} V_j \left(G_{ij}\cos\theta_{ij} + B_{ij}\sin\theta_{ij}\right) \quad (11)$$

$$Q_i = V_i \sum_{j \in N_i} V_j \left(G_{ij}\sin\theta_{ij} + B_{ij}\cos\theta_{ij}\right) \quad (12)$$

$$P_{ij} = V_i^2 (g_{si} + g_{ij}) - V_i V_j \left(g_{ij}\cos\theta_{ij} + b_{ij}\sin\theta_{ij}\right) \quad (13)$$

Algorithm 1. Randomized Power Path Authentication

Input: ID^j
Output: Authentication of R^j_PM
1:  procedure Auth_PM(j)
2:    Path = Find_Nonzero_Keys(R^j_PM(sig[]))
3:    Verification_node_x = Random(Path)
4:    Seed_Random()
5:    Verification_node_y = Random(Path)
6:    for E_node in under(Verification_node_x) do
7:      avg_A_x = avg_A_x + R_node(A)
8:      avg_θ_x = Complex_add(avg_θ_x, R_node(θ))
9:    checkvals_x = compare(avg_A_x, avg_θ_x, R_Verification_node_x)
10:   for E_node in under(Verification_node_y) do
11:     avg_A_y = avg_A_y + R_node(A)
12:     avg_θ_y = Complex_add(avg_θ_y, R_node(θ))
13:   checkvals_y = compare(avg_A_y, avg_θ_y, R_Verification_node_y)
14:   if checkvals_x == True or checkvals_y == True then
15:     stream_A = Fetch_A_for_duration_stream(j, R^j_PM(D))
16:     stream_θ = Fetch_θ_for_duration_stream(j, R^j_PM(D))
17:     nodever = compare(stream_A, stream_θ, R^j_PM)
18:     if nodever == True then
19:       return Random authentication successful
20:   return Discard data of E^j_PM for the time R^j_PM(D)

Fig. 2. HABPF flow for FDIA mitigation: real-time load data and a MATPOWER simulation feed the SCADA system; after data preprocessing, state variables are estimated with the LSTM, noise is removed with the CNN, the measurement data is reconstructed, and good measurement data is output.


Table 1. Parameters for the CNN model utilized for FDIA mitigation

Layer                 Output shape     Param.
Conv2D                (None, 5, 1, 8)  16
Activation            (None, 5, 1, 8)  0
Conv2D                (None, 5, 1, 8)  72
Batch Normalisation   (None, 5, 1, 8)  32
Activation            (None, 5, 1, 8)  0
Conv2D                (None, 5, 1, 8)  72
Batch Normalisation   (None, 5, 1, 8)  32
Activation            (None, 5, 1, 8)  0
Conv2D                (None, 5, 1, 8)  72
Batch Normalisation   (None, 5, 1, 8)  32
Activation            (None, 5, 1, 8)  0
Conv2D                (None, 5, 1, 1)  9
Batch Normalisation   (None, 5, 1, 1)  4
Activation            (None, 5, 1, 1)  0
Max Pooling2D         (None, 5, 1, 1)  0
Flatten               (None, 5)        0
Dense                 (None, 128)      768
Batch Normalisation   (None, 128)      512
Activation            (None, 128)      0
Dropout               (None, 128)      0
Dense                 (None, 100)      12,900
Batch Normalisation   (None, 100)      400
Activation            (None, 100)      0
Dense                 (None, 2)        202
Activation            (None, 2)        0

$$Q_{ij} = V_i^2 (b_{si} + b_{ij}) - V_i V_j \left(g_{ij}\sin\theta_{ij} + b_{ij}\cos\theta_{ij}\right) \quad (14)$$

where $P_i$ and $Q_i$ are the real and reactive power injections at bus $i$, $P_{ij}$ and $Q_{ij}$ are the real and reactive power flows from bus $i$ to bus $j$, and $V_i$ and $\theta_i$ are the voltage magnitude and phase angle at bus $i$. We use an LSTM neural network as an encoder to map the nonlinear relationship between measurements and states, with a sigmoid activation function.

2. FDIA mitigation: Bad Data Detection (BDD) is a phase in the smart grid that checks the measurement data from the SCADA system. An attacker can fool the BDD mechanism and inject false data into the power system. To mitigate this, the CNN model described in Table 1 is used, with ReLU as the activation function for the hidden layers.


The input to the CNN contains the voltage magnitudes and phase angles of all buses, calculated from the LSTM state estimation in the previous step, and the output is the model's prediction of whether the data has been tampered with. Since this is a binary classification task, the sigmoid activation function works well for the output layer.
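The sketch below reproduces the layer and parameter structure of Table 1 in Keras. The 1x1 kernels and the (5, 1, 1) input shape are inferred from the parameter counts, and the dropout rate is an assumption; this is an illustrative reconstruction, not the authors' released code.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_fdia_cnn(input_shape=(5, 1, 1)):
    m = models.Sequential()
    m.add(layers.Conv2D(8, (1, 1), padding="same", input_shape=input_shape))
    m.add(layers.Activation("relu"))
    for _ in range(3):                       # three Conv-BN-ReLU blocks, as in Table 1
        m.add(layers.Conv2D(8, (1, 1), padding="same"))
        m.add(layers.BatchNormalization())
        m.add(layers.Activation("relu"))
    m.add(layers.Conv2D(1, (1, 1), padding="same"))
    m.add(layers.BatchNormalization())
    m.add(layers.Activation("relu"))
    m.add(layers.MaxPooling2D(pool_size=(1, 1)))
    m.add(layers.Flatten())                  # -> (None, 5)
    m.add(layers.Dense(128))
    m.add(layers.BatchNormalization())
    m.add(layers.Activation("relu"))
    m.add(layers.Dropout(0.5))               # assumed rate
    m.add(layers.Dense(100))
    m.add(layers.BatchNormalization())
    m.add(layers.Activation("relu"))
    m.add(layers.Dense(2))
    m.add(layers.Activation("sigmoid"))      # tampered / not tampered
    return m

build_fdia_cnn().summary()   # parameter counts match Table 1
```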

3.3 DDoS Attack Detection

The workflow for the proposed DDoS attack detection is shown in Fig. 2. In a DDoS attack, the adversary introduces attack traffic towards the primary service source and blocks the regular service. LeNet-5, a traditional CNN architecture shown in Fig. 3, is used for DDoS classification from the labeled network traffic dataset. LeNet-5 has six layers (two convolutional layers, two subsampling layers, and two fully connected layers). The UNSW-NB15 dataset [15] is used for classifying the attack data. The proposed model is trained and tested on two datasets, one with 10,000 samples and another with 700,000 samples, to compare the performance of the proposed HABPF model with [5].
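A minimal Keras sketch of a LeNet-5-style classifier for tabular network-traffic features; the input dimension, reshaping, kernel sizes, activations and training settings are assumptions for illustration and not the exact configuration used in the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_lenet5(n_features=196, n_classes=2):
    # Traffic features are reshaped into a square "image" so the classic
    # conv/pool/conv/pool/FC/FC LeNet-5 layout can be applied.
    side = int(n_features ** 0.5)
    return models.Sequential([
        layers.Input(shape=(n_features,)),
        layers.Reshape((side, side, 1)),
        layers.Conv2D(6, (5, 5), padding="same", activation="tanh"),
        layers.AveragePooling2D(pool_size=(2, 2)),     # subsampling
        layers.Conv2D(16, (5, 5), activation="tanh"),
        layers.AveragePooling2D(pool_size=(2, 2)),     # subsampling
        layers.Flatten(),
        layers.Dense(120, activation="tanh"),           # fully connected
        layers.Dense(84, activation="tanh"),            # fully connected
        layers.Dense(n_classes, activation="softmax"),  # attack / benign
    ])

model = build_lenet5()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```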

Fig. 3. LeNet-5 architecture: input, alternating convolution and pooling layers, followed by a full connection stage.

4 Results and Analysis

4.1 Simulation Settings

The measurement data is calculated with a simulator from the real-time load data acquired from [1]. In a real deployment, the measurement data would be available from the SCADA system of the smart grid; here, the real-time load data is simulated and the measurement vector is generated using MATPOWER, an open-source, extensible MATLAB package for simulating power grids. Parameters such as generator values and the topology of each node can also be generated. The state vector consists of the bus voltage magnitudes and angles generated from the IEEE 14-bus system. Two months of real-time load data are acquired from NYISO [1]. The load data is collected at a five-minute interval, 17,858 samples are considered, and the data is normalized to the range [0, 1] using per-day values. The normalized data is then split into training and testing data sets, each covering one month: the training dataset has 8810 samples and the testing dataset has 9048 samples.
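A small pandas sketch of the per-day [0, 1] normalization and month-wise train/test split described above; the column names and file path are hypothetical.

```python
import pandas as pd

# Hypothetical 5-minute NYISO load series with 'timestamp' and 'load_mw' columns.
df = pd.read_csv("nyiso_load_5min.csv", parse_dates=["timestamp"])

# Normalize to [0, 1] within each day, as described in the simulation settings.
day = df["timestamp"].dt.date
lo = df.groupby(day)["load_mw"].transform("min")
hi = df.groupby(day)["load_mw"].transform("max")
df["load_norm"] = (df["load_mw"] - lo) / (hi - lo)

# Month-wise split: first month for training, second month for testing.
first_month = df["timestamp"].dt.to_period("M").min()
train = df[df["timestamp"].dt.to_period("M") == first_month]
test = df[df["timestamp"].dt.to_period("M") != first_month]
print(len(train), len(test))
```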

4.2 Performance Analysis

The data from the smart grid is stored and maintained in Hyperledger Fabric, an open-source, enterprise-grade permissioned blockchain framework that uses the RAFT consensus protocol. RAFT follows a leader-follower approach, which is highly efficient for a high frequency of transactions. Alternatively, HABPF could have used a non-permissioned PoW (Proof of Work) blockchain network, but this would have led to increased transaction times and less access control over smart contracts.

Table 2. Transaction time in the HABPF blockchain

Transactions   Time (ms)
5              928
10             1850
50             9280
100            18571
150            27857
400            74285
750            139285
1000           185000

Table 2 shows that the transaction execution time increases approximately linearly with the number of transactions. The time depends on the Proof of Authority (PoA) consensus algorithm utilized by the blockchain framework.
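As a quick check of the linear trend, the following numpy snippet fits a line to the values in Table 2; the fit itself is illustrative and not part of the paper's evaluation.

```python
import numpy as np

transactions = np.array([5, 10, 50, 100, 150, 400, 750, 1000])
time_ms = np.array([928, 1850, 9280, 18571, 27857, 74285, 139285, 185000])

slope, intercept = np.polyfit(transactions, time_ms, deg=1)
print(f"~{slope:.0f} ms per transaction, intercept ~{intercept:.0f} ms")
# The per-transaction cost is roughly constant (about 185 ms), consistent
# with linear growth of the total execution time.
```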

4.3 Accuracy Analysis

The LSTM model for state estimation is compared with the traditional WLS method and achieves higher accuracy. Table 3 shows that the mean errors of the voltage magnitude and angle, as well as the standard deviation of the voltage angle, are considerably lower for the LSTM than for the WLS method, while the standard deviation of the voltage magnitude is slightly larger than for WLS. The RMSE values of the estimated voltage magnitude and phase angle for the WLS and LSTM models are compared in Fig. 4.

Table 3. Evaluation of methods for state estimation

Method   Mean V   Mean θ   Standard deviation V   Standard deviation θ
WLS      0.0014   0.0027   0.00066                0.0017
LSTM     0.0012   0.0017   0.00095                0.0007


Fig. 4. Comparison of root mean square error during state estimation

The CNN model for FDIA mitigation in HABPF has an accuracy of 90%, as shown in Fig. 5(B), while Fig. 5(A) shows the accuracy of the CNN model for DDoS attack detection. Table 4 shows that the CNN model achieves higher accuracy than the existing systems. For a small dataset with 10,000 samples, the existing multilevel autoencoders [5] achieve an average accuracy of 95%, whereas the proposed LeNet-5 model achieves 96.5%, a slight improvement. However, for the dataset with 700,000 samples, the existing system [5] achieves an average accuracy of 90%, while the proposed model achieves a considerably better accuracy of 99%.

Table 4. Accuracy comparison of DDoS classifiers

Model                                     Accuracy
Multilevel encoders [20]                  92
SVM [21]                                  92
KNN [21]                                  95
Proposed CNN model (for small dataset)    96.5
Random forest [21]                        96.66
Proposed CNN model (for large dataset)    99


Fig. 5. Training statistics of HABPF CNN models

5 Conclusion

This work proposed an attack mitigation system comprising a simulation component, a state estimation module, an FDI attack mitigation module, and a DDoS detection module. In contrast to previous works, which mostly rely on the traditional WLS method for state estimation, the proposed work performs state estimation using an LSTM network model, which shows better results. For detecting DDoS attacks in the smart grid, the proposed work uses a CNN model to achieve increased accuracy. Further, the proposed framework utilizes a blockchain-based randomized power path authentication algorithm to infer the validity of the power consumption data received at the control center. The experiments indicate that the proposed system performs well compared to existing works: although the existing multilevel autoencoder-based system shows promising results for smaller datasets, its accuracy decreases for larger datasets, whereas the proposed system achieves better accuracy on larger datasets. Future work could involve developing lightweight blockchains suitable for smart grids to improve the performance of the whole system.

References

1. Energy and market operational data. NYISO Load Data (2020)
2. Aamir, M., Zaidi, S.M.A.: Clustering based semi-supervised machine learning for DDoS attack classification 33, 436–446 (2021)
3. Ahmed, S., Lee, Y., Hyun, S.-H., Koo, I.: Mitigating the impacts of covert cyber attacks in smart grids via reconstruction of measurement data utilizing deep denoising autoencoders 12(16), 3091 (2019)
4. Ahmed, S., Lee, Y., Hyun, S.-H., Koo, I.: Unsupervised machine learning-based detection of covert data integrity assault in smart grid networks utilizing isolation forest 14(10), 2765–2777 (2019)
5. Ali, S., Li, Y.: Learning multilevel auto-encoders for DDoS attack detection in smart grid network 7, 108647–108659 (2019)


6. Cui, M., Khodayar, M., Chen, C., Wang, X., Zhang, Y., Khodayar, M.E.: Deep learning-based time-varying parameter identification for system-wide load modeling 10(6), 6102–6114 (2019)
7. Göl, M., Abur, A.: LAV based robust state estimation for systems measured by PMUs 5(4), 1808–1814 (2014)
8. Göl, M., Abur, A.: A hybrid state estimator for systems with limited number of PMUs 30(3), 1511–1517 (2015)
9. He, Y., Mendis, G.J., Wei, J.: Real-time detection of false data injection attacks in smart grid: a deep learning-based intelligent mechanism 8(5), 2505–2516 (2017)
10. Lin, Y., Wang, J.: Probabilistic deep autoencoder for power system measurement outlier detection and reconstruction 11, 1796–1798 (2020)
11. Liu, L., Esmalifalak, M., Ding, Q., Emesih, V.A., Han, Z.: Detecting false data injection attacks on power grid by sparse optimization 5, 612–621 (2014)
12. Lore, K.G., Akintayo, A., Sarkar, S.: LLNet: a deep autoencoder approach to natural low-light image enhancement. Patt. Recogn. 61, 650–662 (2017)
13. Manandhar, K., Cao, X., Fei, H., Liu, Y.: Detection of faults and attacks including false data injection attack in smart grid using Kalman filter. IEEE Trans. Control Network Syst. 1(4), 370–379 (2014)
14. Mengelkamp, E., Notheisen, B., Beer, C., Dauer, D., Weinhardt, C.: A blockchain-based smart grid: towards sustainable local energy markets. Comput. Sci. - Res. Dev. 207–214 (2017). https://doi.org/10.1007/s00450-017-0360-9
15. Moustafa, N., Slay, J.: UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In: 2015 Military Communications and Information Systems Conference (MilCIS), pp. 1–6 (2015)
16. Mylrea, M., Gourisetti, S.N.G.: Blockchain for smart grid resilience: exchanging distributed energy at speed, scale and security, pp. 18–23 (2017)
17. Nguyen, T., Wang, S., Alhazmi, M., Nazemi, M., Estebsari, A., Dehghanian, P.: Electric power grid resilience to cyber adversaries: state of the art. IEEE Access 8, 87592–87608 (2020)
18. Shi, H., Xie, L., Peng, L.: Detection of false data injection attacks in smart grid based on a new dimensionality-reduction method 91, 107058 (2021)
19. Tian, C., Fei, L., Zheng, W., Yong, X., Zuo, W., Lin, C.-W.: Deep learning on image denoising: an overview 131, 251–275 (2020)
20. Xu, C., Abur, A.: Robust linear state estimation using multi-level power system models with different partitions, pp. 1–5 (2017)
21. Yu, J.J.Q., Hou, Y., Li, V.O.K.: Online false data injection attack detection with wavelet transform and deep neural networks 14, 3271–3280 (2018)
22. Zhao, J., Wang, S., Mili, L., Amidan, B., Huang, R., Huang, Z.: A robust state estimation framework considering measurement correlations and imperfect synchronization 33, 4604–4613 (2018)

Noise-Augmented Privacy-Preserving Empirical Risk Minimization with Dual-Purpose Regularizer and Privacy Budget Retrieval and Recycling

Yinan Li and Fang Liu(B)

University of Notre Dame, Notre Dame, IN 46556, USA
[email protected]

Abstract. We propose Noise-Augmented Privacy-Preserving Empirical Risk Minimization (NAPP-ERM), which solves ERM with differential privacy (DP) guarantees. Existing privacy-preserving ERM approaches may be subject to over-regularization through the employment of an $l_2$ term, used to achieve strong convexity, on top of the target regularization. NAPP-ERM improves over the current approaches and mitigates over-regularization by iteratively realizing the target regularization through appropriately designed noisy augmented data and delivering strong convexity via a single, adaptively weighted, dual-purpose $l_2$ regularizer. When the target regularization is for variable selection, we propose a new regularizer that achieves privacy and sparsity guarantees simultaneously. Finally, we propose a strategy to retrieve the privacy budget when the strong convexity requirement is met; the retrieved budget can be returned to users, so that DP is guaranteed at a lower privacy cost than originally planned, or recycled in the ERM optimization procedure to reduce the magnitude of the injected DP noise and improve the utility of DP-ERM. From an implementation perspective, NAPP-ERM can be achieved by optimizing a non-perturbed objective function given noise-augmented data and can thus leverage existing tools for non-private ERM optimization. We illustrate through extensive experiments the mitigation of over-regularization and the privacy budget retrieval achieved by NAPP-ERM in variable selection and outcome prediction.

Keywords: Differential privacy · Dual-purpose · Empirical risk minimization · Noise augmentation · Over-regularization · Privacy budget retrieval and recycle · Utility analysis

1 Introduction

1.1 Background

Empirical risk minimization (ERM) is a principle in statistical learning. Through ERM, we can measure the performance of a family of learning algorithms based


on a set of observed training data, empirically, without knowing the true distribution of the data, and derive theoretical bounds on the performance. ERM is routinely applied in a wide range of learning problems such as regression, classification, and clustering. In recent years, with the increasing popularity of privacy-preserving machine learning that satisfies formal privacy guarantees such as differential privacy (DP) [7], the topic of privacy-preserving ERM has also been investigated. Generally speaking, differentially private empirical risk minimization (DP-ERM) can be realized by perturbing the output (estimation or prediction), the objective function (input), or iteratively during optimization. For output perturbation, randomization mechanisms need to be applied every time a new output is released; for iterative algorithmic perturbation, each iteration incurs a privacy loss, so careful planning and implementation of privacy accounting methods to minimize the overall privacy loss is critical. In this paper, we focus on differentially private perturbation of objective functions. Once an objective function is perturbed, the subsequent optimization does not incur additional privacy loss, and all outputs generated from the optimization are also differentially private per the immunity-to-post-processing property of DP.

1.2 Related Work

Examples of output perturbation in DP-ERM include classification via logistic regression that satisfies $\epsilon$-DP [5] and variable selection in lasso-regularized linear regression that satisfies $(\epsilon, \delta)$-DP and also achieves near-optimal bounds on the excess risk under weaker assumptions than previous work [17]. A functional mechanism that perturbs the coefficients of a polynomial representation of the original loss function is proposed in [25], which loosens the requirement that predictors be normalized just to satisfy certain assumptions. Privacy-preserving ERM with strongly convex regularizers for classification problems was first examined in [6] with $\epsilon$-DP. The framework was subsequently extended to admit convex regularization in general, variable selection and outcome prediction included, with $(\epsilon, \delta)$-DP in high-dimensional settings [14]. The privacy-preserving kernelized learning proposed in [6] is extended to more general RKHS settings, with a dimensionless bound on the excess risk of kernel functions, in [11]. Dimension-independent expected excess risk bounds for $l_2$-regularized generalized linear models (GLMs) are established in [12], and the worst-case excess risk bound is further improved in [13]. Another line of work for privacy-preserving machine learning in general, ERM included, is iterative algorithmic perturbation [1–3,9,15,18,20–22,24]. A fast stochastic gradient descent algorithm is developed in [2] that improves the asymptotic excess risk bound for Lipschitz loss functions and a bounded optimization domain. In addition to the theoretical and algorithmic developments, DP-ERM has been applied to online learning [10,19] and to GWAS databases in the setting of elastic-net regularized logistic regression [23], among others.

1.3 Our Contributions

Despite the extensive research on objective function perturbation with DP in ERM, there is still room for improvement. First, to ensure DP, the current framework requires strong convexity of the perturbed objective function, which is often achieved by including an extra $l_2$ term on the parameters in addition to the target regularization term and the DP term. This may lead to over-regularization, pushing parameter estimates further away from those obtained in the non-private setting, on top of the deviation due to the DP term. The extra $l_2$ term may also lead to over-protection of privacy, as it introduces additional perturbation to the objective function on top of the DP term; in other words, the actual privacy cost can be smaller than the privacy budget pre-set by the DP term. Second, when the DP noise term is large, it can dwarf the target regularizer, especially when the sample size is relatively small, and the outputs of DP-ERM can deviate significantly from those obtained in the non-private setting.

We aim at overcoming these limitations of the current DP-ERM framework. Toward that end, we propose a dual-purpose regularizer that realizes the target regularization (differentiable or not) and strong convexity simultaneously through a single, adaptively weighted $l_2$ regularizer. We name the procedure Noise-Augmented Privacy-Preserving ERM (NAPP-ERM). NAPP-ERM can be achieved through optimization of a non-perturbed objective function constructed from noise-augmented observed data. When the aim of ERM is variable selection, we propose a new noise term to perturb the objective function and achieve privacy guarantees and sparsity in parameter estimation simultaneously. Finally, we propose a strategy to retrieve the privacy budget when the strong convexity requirement is met. The retrieved privacy budget can be returned to users, so that the DP of ERM is guaranteed at a lower privacy cost than originally planned, or it can be recycled in the ERM optimization procedure so as to reduce the injected DP noise and improve the utility of DP-ERM at the pre-set privacy cost. From an implementation perspective, since NAPP-ERM can be achieved by optimizing a non-perturbed objective function given noise-augmented data, we can leverage existing software or tools for non-private ERM optimization.

2 Preliminaries

Definition 1 ($(\epsilon, \delta)$-DP [7,8]). Let $D, D'$ be two data sets that differ by only one entry. A randomized algorithm $\mathcal{R}$ is $(\epsilon, \delta)$-DP if, for all such pairs $(D, D')$ and all result subsets $S$ of query $q$,

$$\Pr(\mathcal{R}(q, D) \in S) \le e^{\epsilon}\, \Pr(\mathcal{R}(q, D') \in S) + \delta$$

holds. $\epsilon > 0$ and $\delta \ge 0$ are the pre-specified privacy loss parameters. When $\delta = 0$, $(\epsilon, \delta)$-DP reduces to $\epsilon$ (pure) DP. A smaller $\epsilon$ or $\delta$, fixing the other, leads to more privacy protection for the individuals in the data.

Denote the observed data by $D = \{d_1, \ldots, d_n\}$, where $d_i = (x_i, y_i)$ is an i.i.d. sample from the underlying distribution $\mathcal{D}$ with a fixed domain $\mathcal{T}$, $x_i$ is a


p-dimensional feature/predictor vector and $y_i$ is the outcome. We consider the following ERM problem:

$$\hat{\theta} = \arg\min_{\theta \in \Theta} J(\theta|D) = \arg\min_{\theta \in \Theta} n^{-1}\left(\sum_{i=1}^{n} l(\theta|d_i) + \Lambda R(\theta)\right) = \arg\min_{\theta \in \Theta} n^{-1}\left(l(\theta|D) + \Lambda R(\theta)\right), \quad (1)$$

where $\Theta \subseteq \mathbb{R}^p$ is a closed convex set, the loss function $l(\theta|D)$ and the regularizer $R(\theta)$ are both convex in $\theta$, and $\Lambda \ge 0$ is a tuning parameter. The goal of DP-ERM is to conduct privacy-preserving parameter estimation for the optimization problem in Eq. (1). The DP-ERM framework with objective function perturbation considered in [6] and [14] requires strong convexity of the objective function $J(\theta|D)$:

$$J^{priv}(\theta|D) = n^{-1}\left(l(\theta|D) + \Lambda R(\theta) + \mathbf{b}^T\theta + \Lambda_0\|\theta\|_2^2\right), \quad (2)$$

where $\Lambda_0 \ge \zeta_3/\epsilon$ and $\zeta_3$ is an upper bound such that $\|\nabla^2 l(\theta|d_i)\|_2 < \zeta_3$. Other regularity conditions besides the strong convexity are listed below in Assumption 1.

Assumption 1. a) $\Theta$ is a closed convex set in an orthant of $\mathbb{R}^p$; b) the loss function $l(\theta|D)$ and the regularizer $R(\theta)$ are both convex in $\theta$; c) $l(\theta|D) = \sum_{i=1}^{n} l(\theta|d_i)$ has continuous gradient $\nabla l(\theta|D)$ and Hessian $\nabla^2 l(\theta|D)$; d) for all $d_i \in D$ and $\theta$, $\|x_i\|_2 \le \zeta_1$, $\|\nabla l(\theta|d_i)\|_2 < \zeta_2$, and $\max\{\mathrm{eigen}(\nabla^2 l(\theta|d_i))\} = \|\nabla^2 l(\theta|d_i)\|_2 < \zeta_3$.

Compared to Eq. (1), Eq. (2) has an additional term $\mathbf{b}^T\theta$ to ensure DP, where $\mathbf{b}$ is a noise term drawn from either a spherical Laplace or a multivariate Gaussian distribution, as well as an additional $\|\theta\|_2^2$ term to guarantee the strong convexity of $J^{priv}(\theta|D)$. The regularizer $R(\theta)$ in Eq. (2) can be non-differentiable, such as the lasso or elastic net. Under Assumption 1, the privacy-preserving parameter estimate $\hat{\theta}^{priv}$ that minimizes the private loss function in Eq. (2), that is, $\hat{\theta}^{priv} = \arg\min_{\theta \in \Theta} J^{priv}(\theta|D)$, satisfies $\epsilon$-DP [14].

As briefly discussed in Sect. 1, there are several limitations to the existing DP-ERM framework in Eq. (2). The extra $l_2$ term that brings strong convexity to Eq. (2) can lead to additional regularization of the parameters on top of the target regularizer $R(\theta)$; the additional $l_2$ term may also be associated with over-protection from a privacy perspective, as it introduces another source of perturbation to the objective function on top of the formal DP term $\mathbf{b}^T\theta$. In other words, the actual privacy loss may be smaller than the pre-set privacy budget with Eq. (2). Lastly, if $\mathbf{b}^T\theta$ is large, it can dwarf $\Lambda R(\theta)$, especially when the sample size is relatively small. As a result, $\hat{\theta}^{priv}$ may deviate significantly from its non-private counterpart. In the case of variable selection with a sparsity-promoting $R(\theta)$, it is possible that no entries in $\hat{\theta}^{priv}$ are zeros. On the

Footnotes. (1) A function $f(\theta)$ is $2\Lambda_0$-strongly convex if $f(\alpha\theta_1 + (1-\alpha)\theta_2) \le \alpha f(\theta_1) + (1-\alpha) f(\theta_2) - \Lambda_0\, \alpha(1-\alpha)\|\theta_1 - \theta_2\|_2^2$ for all $\alpha \in (0, 1)$ and $\theta_1, \theta_2$ in the domain of $f$. (2) $\nabla^2 l(\theta|d_i)$ is of rank 1 with one sample $d_i$; therefore its maximum eigenvalue equals its $l_2$ norm.



other hand, we need the bT θ term to provide formal privacy guarantee, neither can we simply get rid of Λ0 θ22 due to the strong convexity requirement (even when the target regularizer ΛR(θ) itself is strongly convex, such as l2 , as Λ alone might not be able to meet the required level for strong convexity).

3 Noise-Augmented Privacy-Preserving ERM

We propose a new DP-ERM framework, Noise-Augmented Privacy-Preserving (NAPP) ERM, to resolve the above-listed limitations with the current DP-ERM framework. We design and generate noisy data and attach them to observed data D to achieve the target (convex) regularization, strong convexity, and privacy guarantees simultaneously. Since NAPP-ERM uses a single dual-purpose weighted iterative regularizer to achieve the target regularization and delivers the required strong convexity simultaneously, we can eliminate the need for an ad-hoc term just to bring strong convexity to the objective function and thus mitigate over-regularization of the current DP-ERM framework. NAPP-ERM still guarantees privacy through the DP term bT θ, but the magnitude of b can be reduced through a privacy budget retrieval and recycling scheme. We also propose a new type of DP term that targets specifically variable selection with guaranteed sparsity and privacy protection. 3.1

Noise Augmentation Scheme

NAPP-ERM estimates θ with DP and realizes the target regularization by iteratively solving an unregularized ERM problem given the combined observed and augmented noisy data. Table 1 depicts a data augmentation schematic in iteration t of the iterative NAPP-ERM procedure. Table 1. Noise augmentation in iteration t of NAPP-ERM y1 x11 · · · x1p eij = e˜ij + e∗j for i = 1, . . . , ne and .. .. . · · · .. j = 1, . . . , p is the augmented noise data . . (t) yn xn1 · · · xnp in iteration t. e˜ij realizes the target (t) (t) augmented ey1 e11 · · · e1p regularization upon convergence .. .. . and e∗j achieves DP. The value of ey,i noise . . · · · .. (t) (t) eyne ene 1 · · · ene p depends on the outcome type† . For example, we may set ey,i ≡ 0 for i = 1, . . . , n for linear regression; ey,i ≡ 1 for Poisson regression; ey,i = 0 for i = 1, . . . , ne /2 and ey,i = 1 for i = ne /2 + 1, . . . , ne for logistic regression. observed



(t)

(t)

NAP-ERM with Dual-Purpose Regularizer and Privacy Budget Retrieval

665

The augmented data e(t) in iteration t composes two components: DP noise e∗ and regularization noise e˜(t) , where e∗i = (e∗i1 , . . . , e∗ip ) = (ne l |η =0 )−1 b, where b = (b1 , . . . , bp ),    ∝ exp −(ζ1 ζ2 )−1 (r)b2 for -DP and f (b) . = N (0, 2(r)−2 ζ12 ζ22 (r−log(δ)) Ip ) for (δ, )-DP

(3) (4)

e∗ guarantees privacy and is fixed throughout the iterations. e˜(t) yields the target regularization upon convergence and changes with iteration. p in Eq. (3) is the dimensionality of the predictor x and l is the first derivative of the onedata-point loss function with regard to η = xθ (i.e., the linear predictor); ζ1 and ζ2 in Eq. (4) are defined in Assumption 1, r ∈ (0, 1) is the portion of total privacy budget  associated with f (b) directly, which is often set at 1/2 [6,14] (more detail is provided in Sect. 3.6 and around Eq. (11)). b in Eq. (3) is divided into ne portions because the total amount of DP noise is b while there are ne noise terms so each noise term receives 1/ne of b. We refer to f (b) in Eq. (4) that yields -DP as the spherical Laplace distribution.3 The variance of the target regularization noise e˜ij (t) for j = 1, . . . , p is adapˆ ∗(t−1) during iterations and tive to estimate θ  (t−1) , Λ)) for i = 1, . . . , ne /2 (t) ∼ N (0, V(θ e˜ij , (5) (t) = −˜ ei−ne /2 for i = ne /2 + 1, . . . , ne where Λ contains tuning parameters. V(θ, Λ) is designed in such a way so that (t) ˜i = (˜ ei1 , . . . , e˜ij ) leads to the target regularization while providing the required e ˜.j across i = 1, . . . , ne strong convexity quantified by Λ0 . By design, the sum e is 0 for each j so that the linear term of the Taylor expansion of the noiseaugmented loss function is used to realize DP and its quadratic term realizes the ˜ over i would be very close target regularization. If ne is very large, the sum of e to 0 anyway, though not exactly 0 guaranteed by Eq. (5). 3.2

Noise Augmented Privacy-Preserving ERM

Assume the loss function of the original ERM problem in Eq. (1) has non-zero, finite, and continuous gradient and Hessian in the neighborhood of η = xθ = 04 . The optimization problem in iteration t of the NAPP-ERM procedure is  n  ne   ∗(t) (t) (t)priv (t) −1 ˆ = arg min J (θ|D, e ) = arg min n l(θ|di )+ l(θ|e ) , (6) θ θ ∈Θ

p

θ ∈Θ

i=1

i=1

i

where ei for i = 1, . . . , ne is the augmented noise in Table 1. 3

4

Sampling from f (b) can be achieved by sampling a norm of b from a gamma distribution and its direction of a uniform distribution [4]. The regularity condition η = 0 is satisfied in common ERM problems such as l2 loss, GLMs, and SVMs with the smoothed Huber loss (refer to Sect. 6 for more discussion).

666

Y. Li and F. Liu

Proposition 1. The optimization problem in NAPP-ERM in Eq. (6) is secondorder equivalent to

ˆ ∗(t) = arg min n−1 n l(θ|di )+ p bj θj + R(t) (θ) (7) θ i=1 j=1 θ ∈Θ

as ne → ∞, where R(t) (θ) = 2−1 ne l |η =0

p



j=1 V

(t)  e˜ij θj2 .

The proof is provided in the supplementary materials. R(t) (θ) is the regularizer in iteration t that realizes strong convexity and helps achieve the target regularization upon convergence. For different target regularizers, V(θ, Λ) takes different forms. Some examples on V(θ, Λ) are given in Table 2. In the examples, the variance at t = 1 is independent of θ, meaning that NAPP-ERM always starts with l2 regularization to obtain initial parameter estimates and set the stage for noise generation in subsequent iterations. If the target R(θ) is ridge, it can be realized at t = 1, along with strong convexity, by choosing the maximum of (Λ, Λ0 ) as the tuning parameter as long as ne is large. For bridge and elastic net regularization, V(θ, Λ) depends on the most updated θ estimate after t > 1. Table 2. Some examples of V(θj , Λ) in Eq. (5) Regularization t = 1 ridge bridge l2−γ elastic net

t≥2

2(ne l |η =0 )−1 max{Λ, Λ0 }

2(ne l |η =0 )−1 max{Λ, Λ0 }   (t−1) −γ  −1 2(ne l |η =0 ) Λ0 2(ne l |η =0 )−1 max Λ|θˆj | , Λ0   (t−1) −1 2(ne l |η =0 )−1 max{Λκ, Λ0 } 2(ne l |η =0 )−1 max Λ|θˆj | +Λκ, Λ0

γ ∈ [0, 2); κ ∈ (0, 1) l is the 2nd-order derivative of then one-data-point loss function with respect to η = θ T x.

Claim 2. The NAPP-ERM procedure can be applied to solve the existing DPERM problem in Eq. (2). Rather than using the maximum of Λ0 and the term that involves Λ as shown in Table 2, the sum of the two are used except when R(θ) is ridge. Specifically, for ridge, V(θj , Λ) = 2(ne l |η =0 )−1 max{Λ, Λ0 } as in Table 2; for bridge and elastic net regularizations, V(θj , Λ) = 2(ne l |η =0 )−1 Λ0  ∗(t−1) −γ  and 2(ne l |η =0 )−1 (Λκ + Λ0 ) at t = 1; 2(ne l |η =0 )−1 Λ|θˆj | + Λ0 and  ∗(t−1) −1  2(ne l |η =0 )−1 Λ|θˆ | +Λκ+Λ0 for t ≥ 2, respectively . j

3.3

Dual-Purpose Regularization and Mitigation of OverRegularization (MOOR) Through Iterative Weighted l2

Proposition 1 suggests that the regularization in each iteration is a weighted  (t)  for θj2 . If the weight is large l2 regularization with weight 2−1 ne l |η =0 V e˜j enough to also achieve 2Λ0 strong convexity, there will be no need for an additional ad-hoc l2 term for strong convexity guarantees as adopted by the current

NAP-ERM with Dual-Purpose Regularizer and Privacy Budget Retrieval

667

DP-ERM practice (Eq. (2)). Therefore, the NAPP-ERM framework offers an opportunity to mitigate over-regularization in the current DP-ERM framework and improve the utility of the private estimates of θ. Proposition 3 (Dual-purpose regularization). NAPP-ERM guarantees  (t)  strongly convexity with modulus ne l |η =0 min V e˜ij for the DP-ERM problem j=1,...,p

in each iteration t, while achieving the target regularization upon convergence.

EN current DP-ERM NAP-ERM

DP-ERM

strong convexity modulus 5 10

−0.5

NAP-ERM

−1

θ 0

1

2

0.0 0.5

1

EN current DP-ERM x NAP-ERM

θ

0

0 −2

θ 0.0

−2

−1

0

θ −1

−0.5

1

0

strong convexity modulus 5 10

lasso current x

−1

1

0.5

x

2

1

bridge (l0.5) current DP ERM NAP-ERM

θ

0

0.5

strong convexity modulus 5 15 10

0

15

−0.5

15

0.0

0.5 0

θ −1

bridge (l0.5) current DP-ERM x NAP-ERM

0.5

1.0 R(θ) 1.5

x

0.5

1

R(θ) 1.5

NAP-ERM

2 .

lasso current DP-ERM x

1.0 R(θ) 1.5

2.0

2

The proof of Proposition 3 is straightforward per the formulation of V (˜ e) which uses the larger modulus of the two l2 terms (Λ0 and that leads to the target regularization upon convergence). Figure 1 illustrates the strong convexity guarantees and the realized regularization through NAPP-ERM, together with the existing DP-ERM framework, when the target regularizer is lasso, elastic net, and l0.5 . Both the realized regularizations of NAPP-ERM and the existing DP-ERM deviate from their targets, but the former approximates the targets significantly better than, especially when θ is in the neighborhood of 0.

−2

−1

0

1

2

Fig. 1. Realized regularizer R(θ) (top) and the corresponding modulus of strong convexity (bottom) when the target regularizer (dotted black lines) is lasso, elastic net (EN) and l0.5 , respectively. Solid red and dashed orange lines are the analytical realized regularization and modulus for NAPP-ERM and existing DP-ERM frameworks, respectively; blue crosses represent the empirically realized regularization and modulus at ne = 104 through NAPP-ERM.

668

3.4

Y. Li and F. Liu

Computational Algorithm

The algorithmic steps for solving the NAPP-ERM problem is given in Algorithm 1. Algorithm 1 may stop based on several criteria, similar to other noise augmentation approaches [16]. For example, we may eyeball the trace plot of l(θˆ(t) |D, e(t) ) over a period of iterations to see whether it has stabilized, or calculate the percentage change in l(θˆ(t) |D, e(t) ) between two consecutive iterations and see if it is below a pre-specified threshold. input : Observed data D = {d1 , . . . , dn } and loss function l(θ|D) that satisfy Assumption 1; number of iterations T ; size of injected noisy data ne , overall privacy budget  (and δ if (, δ)-DP); portion r ∈ (0, 1) of the overall budget  allocated to bounding ratio r1 in Eq. (11); tuning parameter Λ for the target regularization. ˆ ∗ Set Λ0 ≥ ζ3 /(2(1 − r)) output: Private estimate θ Draw b per Eq. (4) and calculate e∗i in Eq. (3) for i = 1, . . . , ne ; t ← 1 and convergence ← 0; while convergence = 0 and t < T do (t) ∗ ˜(t) ˜(t) Draw noises e per Eq. (5) and obtain ei = e i i + ei for i = 1, . . . , ne ; (t) (t) (t) ˆ Augment D with e1:ne; solve θ = arg min Jp (θ|D, e ) in Eq. (6); θ ∈Θ

convergence ← 1 if the algorithm converges; t←t+1 end Algorithm 1: NAPP-ERM Regarding the choice of ne in Algorithm 1, since NAPP-ERM realizes the target regularization and guarantees DP based on the second order Taylor expansion and −1/2 ), better approximation will be achieved if ne is the higher-order terms are of O(ne set at a large value. In the experiments in Sect. 5, we used ne = 104 . Regarding the specification of Λ0 , it needs to be ≥ ζ3 /(2(1 − r)) to guarantee the required strong convexity and bound the Jacobian ratio r2 in Eq. (11) for DP guarantees (Sect. 3.6). 1 − r is the assigned portion out of the total budget  to bound r2 . While any values of r ∈ (0, 1) can be specified, we recommend r = 1/2 (used in [6, 14]) as Λ0 = ζ3 / might offer the best trade-off between the amount of DP noise and guarantees of strong convexity. Λ is a user-specified hyperparameter for the target regularization; we recommend using the two strategies in [6], both of which apply to the NAPP-ERM algorithm. ˆ from Eq. (6) on the augmented data (D, e(t) ) in iteration t is basically Solving θ solving an unregularized and unperturbed objective function l given (D, e(t) ); there are many existing tools and software for minimizing l in various types of regression. For example, for GLMs, the loss function is the negative log-likelihood. With a large ne so that n + ne > p, and the augmented data (D, e) can be fed to any software that can run a regular GLM. (e.g. the glm function in R; tfp.glm.ExponentialFamily in TensorFlow). This is the same idea as the PANDA technique [16] for regularizing undirected graphic models.

NAP-ERM with Dual-Purpose Regularizer and Privacy Budget Retrieval

3.5

669

NAPP-ERM for Variable Selection

Though the current DP-ERM formulation in Eq. (2) can accommodate variable selection by employing a sparsity regularizer R(θ), the DP noise term bT θ may trump the variable selection goal, resulting in non-sparsity, especially when  or n is relatively small. To improve on the current approach for differentially private variable selection through the ERM framework in general, we propose a new type of DP noise term that guarantees to lead to some level of sparsity.   ˆ priv = arg min n−1 l(θ|D) + ΛR(θ) + Λ0 θ22 + bT |θ| , (8) θ θ ∈Θ

Compared to the DP-ERM problem in Eq. (2), the DP term in Eq. (8) is formulated as bT |θ| instead of bT θ, which can be regarded as a random version of the lasso where the “turning parameter” bj is randomly sampled and differs by |θj |. To avoid negative bj , rather than sampling directly from the spherical Laplace or Gaussian distribution from Eq. (4), the left truncated spherical Laplace distribution or the left Gaussian distribution can be used, defined below.   (9) -DP: f (b) ∝ I(b > c) exp −(ζ1 ζ2 )−1 (r)b2 (, δ)-DP: f (b) ∼ I(b > c)

N (0, 2(r)−2 ζ12 ζ22 (− log(δ)+r) Ip , Pr(bj > c ∀j = 1, . . . , p)

(10)

where I(b > c) is a indicator function that bj > c ∀j = 1, . . . , p. The truncation point c ≤ 0 can be set the same for all j = 1, . . . , p, especially after taking into account the privacy considerations. When c = 0, referred to as NAPP-VS+ hereafter, the distribution of b becomes the half spherical or half Gaussian distribution, and the DP term bT |θ| in Eq. (8) becomes a “weighted” lasso term, which will always lead ˆ priv . When c < 0, a subset of θ will be subject to the weighted lasso to sparsity in θ regularization, but the formation of this subset is completely random. The data augmentation scheme for NAPP-VS is the same as for the general NAPPERM and the computational steps are the same as Algorithm 1 except for theadjust- (t−1) ment for the sign of DP noise e∗ in each iteration of t ≥ 2; i.e., e∗j ← e∗j sgn θˆj for j = 1, . . . , p. The adjustment of the sign does not constitute a threat for privacy because the sign is determined by the parameter estimate from the previous iteration, (t) which already satisfies DP (Theorem 4 in Sect. 3.6). Also noted is that the sign of θˆj (t) will eventually stabilize if it is non-zero. If θˆj fluctuates around 0 after convergence, its final estimate will be set at 0.

3.6

Guarantees of DP in NAPP-ERM

Before we present the formal result on DP satisfaction by NAPP-ERM, it should be noted that NAPP-ERM guarantees DP through objective function perturbation with one-time DP noise injection. Though the NAPP-ERM is realized through an iterative procedure, it is only for leveraging existing tools for solving non-regularized problems to achieve the target regularization effect. In other words, the NAPP-ERM algorithm ˆ ∗ rather only queries the original observed data once to output one final estimate θ than querying the data multiple times to output multiple statistics or multiple versions ˆ ∗. Therefore, we only need to show the per-iteration privacy guarantees of sanitized θ and there is no need to perform privacy accounting over iterations.

670

Y. Li and F. Liu

Theorem 4 (DP guarantees). Under Assumption 1, the NAPP-ERM procedure in Algorithm 1 and the NAPP-ERM procedure for variable selection satisfy DP. The detailed proof on -DP and (, δ)-DP guarantees is provided in the supplementary  Briefly, The key step in the proof is to bound the ratio  (t)  materials. ˆ (t) |D by e for two data sets D and D differing by 1. This is achieved ˆ |D /f θ f θ by bounding ratios r1 and r2 in Eq. (11), separately  (t)  ˆ |D ˆ (t) |D)) ˆ (t) |D ))| f θ fb (b−1 (θ | det(Jb (θ × = r1 r2 .  =  (t) (t) ˆ |D ˆ |D )) ˆ (t) |D))| f θ fb (b−1 (θ | det(Jb (θ

(11)

r1 relates directly to the amount of DP noise b, but the Jacobian ratio r2 is not. On the other hand, r2 still costs privacy per Eq. (11) due to the transformation from ˆ (t) via the change of variable technique. variable b to θ A by-product of the proof of (, δ)-DP is a lower bound on σ 2 for the Gaussian distributions in Eqs. (4) and (10) as given in Corollary 5. The proof is provided in the supplementary materials. Corollary 5. The lower bound on the variance σ 2 of the Gaussian noise b that leads to (, δ)-DP is    −2 log(δ) +  + −2 log(δ) , (12) σ ≥ −1 ζ1 ζ2  −1 which reduces to σ ≥ 2 ζ1 ζ2 −2 log(δ) +  if  is small compared to −2 log(δ).

3.7

Privacy Budget Retrieval

When establishing the privacy guarantees in Theorem 4, although only r1 determines the amount of the DP noises b, r2 has to be bounded due to the variable transforˆ consuming a portion of the total budget . This section explores mation from b to θ, whether it is possible to cut back on the spending of the budget on bounding r2 and re-allocate it to r1 so to reduce the inject the level of DP noise and achieve better ˆ (t) , while still maintaining the overall budget at . utility for the estimated private θ Proposition 6 presents the formal results privacy budget retrieval; the proof is provided in the supplementary materials. Proposition 6 (privacy budget retrieval via NAPP-ERM). Let (1 − r) be the privacy budget allocated to bounding Jacobian ratio r2 in Eq. (11), and T0 be the iteration when the NAPP-ERM algorithm converges. The retrievable budget out of (1 − r) upon convergence is     (t) −1 , (13) Δ = min 0, (1 − r) 1 − Λ0 ne l |η =0 V(1)

NAP-ERM with Dual-Purpose Regularizer and Privacy Budget Retrieval

671

  (t) (t) where V(1) = min V e˜ij . As long as Δ > 0, we can retrieve some privacy budget j=1,...,p

originally allocated to r2 upon convergence. The retrieved budget Δ can be returned to the user, or recycled back to the NAPP algorithm for a re-run with an updated distribution of DP noise b given the re-allocated budget r+Δ , the sum of the originally allocated budget to ratio r1 and the retrieved budget. Equation (13) suggests that budget can only be retrieved when the required strong convexity is automatically fulfilled by the target regularization via the dual-purpose (t) weighted l2 regularization in NAPP-ERM; that is, Λ0 < ne l |η =0 V(1) . The retrieved privacy budget Δ can be used in two ways. First, it can be returned to users so that the actual privacy cost is  − Δ lower than the original planned costs , meaning that the released results enjoy a higher level of privacy. Second, it can be re-allocated to bounding r1 in Eq. (11) that directly relates to the scale of DP noise b so that less DP ˆ ∗ can be achieved. Specifically, the noise is injected and higher utility of the private θ distributions of b are updated to   ∝ exp −(r + Δ )(ζ1 ζ2 )−1 b2 for -DP (14) f (b) = N (0, 2(r + Δ )−2 ζ12 ζ22 (− log(δ1 ) + r + Δ ) Ip ) for(δ, )-DP ˜ is sampled is also updated with a new and the Gaussian distribution from which e variance term V(˜ eij ). For example, when the target regularization is bridge and elastic net,   (T ) (T ) V(˜ eij ) = 2(ne l |η =0 )−1 max Λ|θˆj 0 |−γ , Λ|θˆj (τ0) |−γ , Λ0 ,   (T ) (T ) V(˜ eij ) = 2(ne l |η =0 )−1 max Λ|θˆj 0 |−1 +Λκ, Λ|θˆj (T00 ) |−1 +Λκ, Λ0 , (15)  (T )  respectively, where j (T0 )  arg minj V e˜ij 0 . In other words, the updated variance of e˜ij ensures the strong convexity by choosing the largest modulus out of the following three: (T ) that associated with the weighted l2 term based on the parameter estimate θˆj 0 , the (T ) (T ) “old” modulus 2Λ0 , and the new modulus 2Λ|θˆ 0 |−1 (bridge) or 2Λ|θˆ 0 |−1 +2Λκ j (T0 )

j (T0 )

(elastic net). Algorithm 2 lists the steps that incorporate the privacy budget retrieval in Algorithm 1 and the NAPP-ERM variable selection procedure. Algorithm 2 may be applied multiple times if the user chooses to “recycle”. Specifically, after a round of budget retrieval and recycling and a re-run of the NAPP algorithm, a new set of parameter ˆ ∗ will be obtained upon convergence. Further budget retrieval is possible estimates θ through another re-run as long as there is a retrievable budget, and each re-run will allocate more budget to the sampling of DP noise. Though the DP noise will keep decreasing through the recycling, each additional round of budget retrieval/recycling also means that the required strong convexity is getting stronger each time until eventually stabilizing. Our empirical results suggest significant retrieval often occurs with

672

Y. Li and F. Liu

one round of retrieval, and the amount of retrieved budget in later rounds is often minimal.

Algorithm 2: Privacy Budget Retrieval through NAPP-ERM
  input : choice A: return or recycle the retrieved budget?
  output: $\Delta\epsilon$ and parameter estimate $\hat{\boldsymbol\theta}^*$ if A = "return"; parameter estimate $\hat{\boldsymbol\theta}^*$ if A = "recycle".
  Run the NAPP algorithm (Algorithm 1) and calculate the retrievable privacy budget $\Delta\epsilon$ per Eq. (13) given the parameter estimate $\hat{\boldsymbol\theta}^{(T_0)}$ upon convergence at iteration $T_0$;
  if A = "recycle" then
    Let $\Delta\epsilon_{cum} \leftarrow 0$;
    while $\Delta\epsilon > 0$ do
      $\Delta\epsilon_{cum} \leftarrow \Delta\epsilon_{cum} + \Delta\epsilon$;
      Rescale $\mathbf{e}^* \leftarrow \mathbf{e}^*(\epsilon r)/(\epsilon r + \Delta\epsilon_{cum})$;
      Let $\Lambda_0^{old} \leftarrow \Lambda_0$;
      Draw $\tilde{\mathbf{e}}$ from the Gaussian distribution with the updated $\Lambda_0$;
      Run Algorithm 1 with the updated augmented noisy data $\mathbf{e}^* + \tilde{\mathbf{e}}$ till convergence; denote the iteration at convergence by $T^{(0)}$;
      $\Lambda_0 \leftarrow n_e\, l''|_{\eta=0}\, V_{(1)}^{(T^{(0)})}$;
      Calculate the retrievable privacy budget $\Delta\epsilon$ given the updated $\hat{\boldsymbol\theta}^{(T^{(0)})}$, from $\epsilon(1-r) - \Delta\epsilon_{cum}$ and $1 - \max\{\Lambda_0^{old}, \Lambda_0\}\big(n_e\, l''|_{\eta=0}\, V_{(1)}^{(T^{(0)})}\big)^{-1}$;
    end
  end
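A minimal Python sketch of the retrieve-and-recycle control flow in Algorithm 2 is given below. The NAPP solver (Algorithm 1) and the budget-retrieval formula of Eq. (13) are stubbed out as hypothetical placeholders, and the cap at ε(1 − r) − Δε_cum is an assumption for illustration; this is not the authors' implementation.

```python
def recycle_privacy_budget(run_napp, retrievable_budget, eps, r, max_rounds=10):
    """Repeatedly re-run a NAPP solver, accumulating retrieved budget Delta-eps.

    run_napp(budget) -> parameter estimate (placeholder for Algorithm 1);
    retrievable_budget(theta_hat, budget) -> retrievable Delta-eps (placeholder for Eq. (13)).
    """
    delta_cum = 0.0
    theta_hat = run_napp(eps * r)                       # initial run with budget eps * r
    delta = retrievable_budget(theta_hat, eps * r)
    for _ in range(max_rounds):
        if delta <= 0:
            break
        delta_cum += delta
        budget = eps * r + delta_cum                    # re-allocate retrieved budget to the DP noise
        theta_hat = run_napp(budget)
        delta = min(eps * (1 - r) - delta_cum,          # assumed cap: cannot exceed what was set aside
                    retrievable_budget(theta_hat, budget))
    return theta_hat, delta_cum


# Toy stand-ins: each re-run "uses up" half of the remaining retrievable budget.
state = {"left": 0.3}

def run_napp(budget):                 # hypothetical solver stub
    return [0.0]

def retrievable_budget(theta, budget):  # hypothetical Eq. (13) stub
    state["left"] *= 0.5
    return state["left"]

print(recycle_privacy_budget(run_napp, retrievable_budget, eps=1.0, r=0.5))
```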

3.8 Summary on NAPP-ERM

We end Sect. 3 by presenting Fig. 2, which summarizes the ideas behind NAPP-ERM, compares it with the existing DP-ERM framework, and illustrates how and why NAPP-ERM works with the dual-purpose iterative weighted l2 regularization term and when privacy budget can be retrieved.

4

Utility Analysis

In this section, we consider the utility of the estimated $\hat{\boldsymbol\theta}$ via NAPP-ERM in two aspects: excess risk bound and sample complexity. WLOG, we demonstrate the utility of NAPP-ERM without privacy budget retrieval. The steps for deriving the theoretical bounds and sample complexity when there is budget retrieval are similar to those given below, though the final mathematical results will be different. To start, we first define several types of loss functions (Table 3). The noise-augmented but non-private loss $J_p^{(t)}(\boldsymbol\theta|D)$ can be regarded as a special case of $J_p^{(t)\mathrm{priv}}(\boldsymbol\theta|D)$ with $\mathbf{b}=0$ and $\mathbf{e}^*=0$, and $R^{(t)}(\boldsymbol\theta)$ is expected to converge to $\max\{\Lambda R(\boldsymbol\theta),\, \Lambda_0\|\boldsymbol\theta\|_2^2\}$ through noise augmentation as $n_e \to \infty$. When strong convexity with modulus $2\Lambda_0$ is simultaneously realized by the target regularization, $R^{(t)}(\boldsymbol\theta)$ converges to $\Lambda R(\boldsymbol\theta)$.
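To make the loss definitions in Table 3 concrete, the following minimal numpy sketch evaluates the regularized empirical loss $J(\boldsymbol\theta|D)$ and the NAPP empirical loss $J_p^{(t)\mathrm{priv}}(\boldsymbol\theta|D)$ for a logistic-regression loss. The data, the DP-noise vector b, and the weights standing in for the iteration-t term $R^{(t)}$ are hypothetical placeholders, not the authors' implementation.

```python
import numpy as np

def logistic_loss(theta, X, y):
    """Average negative log-likelihood l(theta | d_i) for y in {0, 1}."""
    eta = X @ theta
    return np.mean(np.log1p(np.exp(eta)) - y * eta)

def J(theta, X, y, Lam, R):
    """Regularized empirical loss J(theta|D) = n^-1 sum_i l + Lam * R(theta)."""
    return logistic_loss(theta, X, y) + Lam * R(theta)

def J_napp_priv(theta, X, y, b, w):
    """NAPP empirical loss: n^-1 sum_i l + sum_j b_j theta_j + R^(t)(theta),
    with R^(t) taken here as a weighted l2 term with hypothetical weights w."""
    return logistic_loss(theta, X, y) + b @ theta + np.sum(w * theta**2)

# Toy illustration with made-up data and noise (not from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = (rng.uniform(size=100) < 0.5).astype(float)
theta = np.zeros(5)
b = rng.laplace(scale=0.1, size=5)   # stands in for the DP noise vector b
w = np.full(5, 0.05)                 # stands in for the iteration-t weights
print(J(theta, X, y, Lam=0.1, R=lambda t: np.sum(np.abs(t))),
      J_napp_priv(theta, X, y, b, w))
```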


Fig. 2. Dual-purpose regularization and privacy budget retrieval in NAPP-ERM

Table 3. Loss functions and minimizers
  Expected loss $L(\boldsymbol\theta)$: $E_d\big[n^{-1}\sum_{i=1}^n l(\boldsymbol\theta|d_i)\big]$; minimizer $\boldsymbol\theta_0$.
  Regularized empirical loss $J(\boldsymbol\theta|D)$: $n^{-1}\sum_{i=1}^n l(\boldsymbol\theta|d_i) + \Lambda R(\boldsymbol\theta)$; minimizer $\hat{\boldsymbol\theta}$.
  Noise-augmented empirical loss in iteration t, $J_p^{(t)}(\boldsymbol\theta|D)$: $n^{-1}\sum_{i=1}^n l(\boldsymbol\theta|d_i) + R^{(t)}(\boldsymbol\theta)$; minimizer $\hat{\boldsymbol\theta}^{(t)}$.
  Expected NA empirical loss in iteration t, $\bar J_p^{(t)}(\boldsymbol\theta)$: $E_d\big[J_p^{(t)}(\boldsymbol\theta|D)\big]$; minimizer $\bar{\boldsymbol\theta}_p^{(t)}$.
  NAPP empirical loss in iteration t, $J_p^{(t)\mathrm{priv}}(\boldsymbol\theta|D)$: $n^{-1}\sum_{i=1}^n l(\boldsymbol\theta|d_i) + \sum_{j=1}^p b_j\theta_j + R^{(t)}(\boldsymbol\theta)$; minimizer $\hat{\boldsymbol\theta}^{(t)*}$.

4.1 Excess Risk Bound

The excess risk in this context is the expected difference between $J(\hat{\boldsymbol\theta}^{(t)*}|D)$ and $J(\hat{\boldsymbol\theta}|D)$ over the distribution of the DP noise $\mathbf{b}$, where $\hat{\boldsymbol\theta}^{(t)*}$ is the private estimate of $\boldsymbol\theta$ in the t-th iteration of the NAPP-ERM algorithm and $\hat{\boldsymbol\theta}$ is the non-private minimizer of $J(\boldsymbol\theta|D)$. Before we present the main results in Theorem 8, we first derive an upper bound for $J(\hat{\boldsymbol\theta}^{(t)*}|D) - J(\hat{\boldsymbol\theta}|D)$ in Lemma 7, based on which we will bound the expected difference.

Lemma 7 (empirical risk bound). Under Assumption 1, for $\hat{\boldsymbol\theta}^{(t)*}$ obtained in iteration t of the NAPP-ERM algorithm,
$$J\big(\hat{\boldsymbol\theta}^{(t)*}|D\big) - J\big(\hat{\boldsymbol\theta}|D\big) \le n^{-1}\|\mathbf{b}\|_2^2\,\big(n_e\, l''|_{\eta=0}\, V_{(1)}^{(t)}\big)^{-1} + R^{(t)}(\hat{\boldsymbol\theta}) - R(\hat{\boldsymbol\theta}).$$
The proof of Lemma 7 is provided in the supplementary materials. The lemma suggests that the upper bound of the empirical risk at iteration t decreases at a rate of $O(n^{-1})$ and is proportional to the squared $l_2$ norm of the DP noise $\mathbf{b}$ and to the difference between $R^{(t)}(\hat{\boldsymbol\theta})$ and $R(\hat{\boldsymbol\theta})$. Based on the results in Lemma 7, we obtain Theorem 8, the proof of which is provided in the supplementary materials.


Theorem 8 (excess risk bound). For $\hat{\boldsymbol\theta}^{(t)*}$ obtained in iteration t of the NAPP-ERM algorithm, with probability $\ge 1-\pi$,
$$E_{\mathbf{b}}\big[J\big(\hat{\boldsymbol\theta}^{(t)*}|D\big) - J\big(\hat{\boldsymbol\theta}|D\big)\big] = \begin{cases} O\big(B_1(\hat{\boldsymbol\theta}, n, p, \Lambda_0, \zeta_1, \zeta_2, \epsilon, \pi)\big) & \text{for } \epsilon\text{-DP} \\ O\big(B_2(\hat{\boldsymbol\theta}, n, p, \Lambda_0, \zeta_1, \zeta_2, \epsilon, \delta, \pi)\big) & \text{for } (\epsilon,\delta)\text{-DP,} \end{cases}$$
where
$$B_1(\cdot) = \big(p\,\zeta_1\zeta_2(\epsilon r)^{-1}\log(p\pi^{-1})\big)^2\, n^{-1}\big(n_e\, l''|_{\eta=0}\, V_{(1)}^{(t)}\big)^{-1} + R^{(t)}(\hat{\boldsymbol\theta}) - R(\hat{\boldsymbol\theta}) = O\big(n^{-1}p^2\log(p)\,\epsilon^{-2}\big); \qquad (16)$$
$$B_2(\cdot) = 4p\,\zeta_1^2\zeta_2^2(\epsilon r)^{-2}\big(\epsilon r + \log(2\delta^{-1})\big)\log(\pi^{-1})\, n^{-1}\big(n_e\, l''|_{\eta=0}\, V_{(1)}^{(t)}\big)^{-1} + R^{(t)}(\hat{\boldsymbol\theta}) - R(\hat{\boldsymbol\theta}) = O\big(n^{-1}p\big(\epsilon^{-1} + \epsilon^{-2}\log(\delta^{-1})\big)\big). \qquad (17)$$

The bounds B1 and B2 in Eqs. (16) and (17) are tighter than the bounds given by Theorem 26 in [14]. This can be easily seen. First, the first terms in B1 and B2 in Eqs. (16) and (17) are no larger than the first terms of the bounds in [14]. Second, the second terms in B1 and B2 are smaller than the second terms of the bounds in [14]; that is, $R^{(t)}(\hat{\boldsymbol\theta}) - R(\hat{\boldsymbol\theta}) < \Lambda_0\|\hat{\boldsymbol\theta}\|_2^2$ (if strong convexity is realized by the target regularization in NAPP-ERM, $R^{(t)}(\hat{\boldsymbol\theta}) - R(\hat{\boldsymbol\theta}) \to 0$ as $n_e \to \infty$ with the MOOR effect of NAPP-ERM). Taken together, the NAPP-ERM excess risk bounds B1 and B2 are tighter than the bounds given in [14].

4.2

Sample Complexity

The sample complexity is the training data size n needed to bound an excess risk. In our setting, the excess risk is the difference between $L(\hat{\boldsymbol\theta}^{(t)*})$ and $L(\boldsymbol\theta_0)$, the ideal loss evaluated at the private parameter estimate from NAPP-ERM vs. that at $\boldsymbol\theta_0$ (Table 3). Prior to the main results in Theorem 10, we first present Lemma 9, on which Theorem 10 is based.

Lemma 9. There exists $C'$ such that, with probability at least $1-\pi'$ for all $\pi' \in (0,1)$,
$$\bar J_p^{(t)}\big(\hat{\boldsymbol\theta}^{(t)*}\big) - \bar J_p^{(t)}\big(\hat{\boldsymbol\theta}^{(t)}\big) \le 2\big[J_p^{(t)}\big(\hat{\boldsymbol\theta}^{(t)*}|D\big) - J_p^{(t)}\big(\hat{\boldsymbol\theta}^{(t)}|D\big)\big] - C'\log(\pi')\big(2n_e\, l''|_{\eta=0}\, V_{(1)}^{(t)}\big)^{-1}.$$

Theorem 10 (sample complexity of NAPP-ERM). For any given $\epsilon_g > 0$, when the training sample size
$$n > \epsilon_g^{-1}\Big[R(\boldsymbol\theta_0) + C\big(n_e\, l''|_{\eta=0}\, V_{(1)}^{(t)}\big)^{-1} + C'\log(\pi')\big(2n_e\, l''|_{\eta=0}\, V_{(1)}^{(t)}\big)^{-1}\Big], \qquad (18)$$
then
$$\Pr\big(L(\hat{\boldsymbol\theta}^{(t)*}) \le L(\boldsymbol\theta_0) + \epsilon_g\big) \ge 1 - \pi' - \pi, \qquad (19)$$
where $C = \big(p\,\zeta_1\zeta_2(\epsilon r)^{-1}\log(p\pi^{-1})\big)^2$ for $\epsilon$-DP and $C = 4p\,\zeta_1^2\zeta_2^2(\epsilon r)^{-2}\big(\epsilon r + \log(2/\delta)\big)\log(\pi^{-1})$ for $(\epsilon,\delta)$-DP, $\pi'$ is defined in Lemma 9, and $\pi$ is defined in the same way as in Theorem 8. The proofs of Lemma 9 and Theorem 10 are provided in the supplementary materials. Since [14] does not perform a sample complexity analysis, we compare the NAPP sample complexity in Eq. (18) with that in [6], which focuses on $\epsilon$-DP and $R(\boldsymbol\theta) = 2^{-1}\Lambda\|\boldsymbol\theta\|_2^2$. When $R(\boldsymbol\theta)$ is the $l_2$ regularizer, the DP-ERM framework in [6] adds an extra $l_2$ term only if needed, and the sample complexity in [6] is the same as Eq. (18). The supplementary materials provide more detail.


5


Experiments

We run several experiments to demonstrate the improvement of NAPP-ERM over the current DP-ERM framework in linear, Poisson, and logistic regressions with the lasso regularizer on both simulated and real data. For simulated data, we vary the training set size n, the privacy loss (ε, δ), and the tuning parameter Λ. 500 repeats were run in each simulation scenario. Due to space limitations, we present the results for a subset of the regression settings in the main text; the rest are provided in the supplementary materials.

5.1

Mitigation of Over-Regularization (MOOR)

The goal of this experiment is to demonstrate the MOOR effect of NAPP with the dual-purpose regularization with a single iterative weighted l2 term. Though the strong convexity requirement is a result of the privacy guarantees, the MOOR effect is independent of privacy. WLOG, we set the DP noise b = 0 in Eqs. (2) and (7), examine the MOOR effect in the non-private setting, and compare it with the regular lasso regression (without the requirement for strong convexity). Figure 3 plots the l2 distance between the regular lasso estimates and those obtained via the NAPP-ERM algorithm (by setting b and e* to 0) with MOOR vs. without MOOR. In the linear and logistic regressions, the increase in accuracy (smaller l2 distance) with MOOR is significant for all the examined Λ0 and Λ values. There is also improvement in the Poisson regression, though it is not as obvious as in the other two because of the large ζ3 and Λ0 values, leaving little room for NAPP-ERM to execute its MOOR power.

Fig. 3. l2 distance of θ estimates obtained via non-private noise-augmented ERM with MOOR vs. without MOOR from the original lasso estimates (panels: linear, logistic, and Poisson regression)

5.2

Private Variable Selection and Outcome Prediction

We compare five approaches (listed below) in private variable selection and prediction with the lasso regularization, with the regular non-private lasso as the baseline.
– NAPP-VS+ without MOOR in lasso regression
– NAPP-VS without MOOR in lasso regression
– existing DP-ERM in lasso regression (without MOOR)
– NAPP-VS+ with MOOR in lasso regression
– NAPP-VS with MOOR in lasso regression
c = 0 in Eqs. (9) and (10) for NAPP-VS+ and c → −∞ for NAPP-VS. For variable selection, we define "positive" as the correct selection of a covariate associated with a nonzero θ and plot the ROC curves (a minimal sketch of this computation is given below). For outcome prediction, we examine the prediction mean squared error (MSE) on independently simulated testing data (n = 10,000) in linear and Poisson regressions, and the misclassification rate in logistic regression. The results on private variable selection and private prediction are illustrated in Figs. 4 and 5, respectively.
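As a concrete illustration of the selection ROC evaluation described above (a "positive" being the correct selection of a covariate with a nonzero θ), the following sketch computes the true- and false-positive rates of a selected support against the true nonzero coefficients; the inputs are hypothetical and this is not the authors' evaluation code.

```python
import numpy as np

def selection_rates(theta_hat, theta_true, tol=1e-8):
    """TPR/FPR of variable selection: a 'positive' is a selected covariate
    (|theta_hat_j| > tol) whose true coefficient is nonzero."""
    selected = np.abs(theta_hat) > tol
    nonzero = np.abs(theta_true) > 0
    tpr = selected[nonzero].mean() if nonzero.any() else np.nan
    fpr = selected[~nonzero].mean() if (~nonzero).any() else np.nan
    return tpr, fpr

# Toy example: 10 covariates, the first 3 truly nonzero, a hypothetical estimate.
theta_true = np.array([1.5, -2.0, 0.8] + [0.0] * 7)
theta_hat = np.array([1.2, 0.0, 0.5, 0.3] + [0.0] * 6)
print(selection_rates(theta_hat, theta_true))  # -> (0.666..., 0.142...)
```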

Fig. 4. Variable selection ROC curves by varying Λ at different n and ε (panels: linear regression at n = 500, logistic and Poisson regression at n = 1000; ε = 0.4, 0.6, 0.8, 1)

We also applied NAPP-ERM and NAPP-VS to the Adult data (downloaded from https://archive.ics.uci.edu/ml/datasets/Adult) and compared their prediction performance against the existing DP-ERM approach in lasso-regularized logistic regression. The results are presented in Fig. 6. NAPP-VS+ (the red lines) performs the best in general with the smallest misclassification rate, followed by NAPP-VS. At  = 1, the misclassification rate via NAPP-VS+ is around 15.5%, very close to the non-private misclassification rate 15.2%. The MOOR effect (the solid lines) is not obvious in most cases except for NAPP-VS+ at Λ = 1.

Fig. 5. Testing prediction error (MSE in linear and Poisson regression and misclassification rate in logistic regression) for different Λ, n, and ε (panels: linear regression at n = 500, Poisson and logistic regression at n = 1000; ε = 0.4, 0.6, 0.8, 1)

Fig. 6. Misclassification rate in the testing data of the Adult data experiment (n = 30,162; panels for Λ = 0.1, 0.4, and 1)

Fig. 7. Percentage of retrieved privacy budget out of that originally allocated to bounding the Jacobian ratio (panels: linear regression at n = 500, Poisson and logistic regression at n = 1000; ε = 0.4, 0.6, 0.8, 1)

5.3

Privacy Budget Retrieval and Recycling

We quantify the retrievable privacy budget via NAPP out of the portion originally allocated to the Jacobian ratio r2 in Eq. (11), which is 1/2 of the total ε in all the experiment settings. The results are presented in Fig. 7. We then recycled the retrieved privacy budget back to the NAPP-ERM procedure to reduce the magnitude of the injected DP noise. Figure 8 shows the improvement in utility in linear regression; we expect similar findings in other types of regression.

5.4

Experiment Result Summary

The main conclusions from the experiments are summarized as follows. First, the MOOR effect brought by the dual-purpose weighted l2 regularization in NAPP-ERM can be very effective in maintaining the utility of the privacy-preserving results, compared to using two separate terms in the objective function to achieve the target regularization and the strong convexity requirement, as employed in the existing DP-ERM framework. Second, our new formulation of the DP-ERM problem for variable selection, with the new DP noise term that functions as a "random" lasso term, guarantees sparsity in variable selection compared to the current DP-ERM problem formulation for variable selection. Third, our proposed privacy budget retrieval scheme works as expected. If recycled back to the privacy-preserving learning procedure, the retrieved privacy budget can help to improve the utility of the private results.

Fig. 8. Variable selection ROC curves and outcome prediction MSE on testing data in linear regression with lasso via NAPP-ERM with vs. without recycled privacy budget (BR in the legends stands for budget recycling; n = 500, ε = 0.4, 0.6, 0.8, 1)

6

Discussion and Conclusion

NAPP-ERM utilizes iterative weighted l2 regularization to realize the target regularization upon convergence and to cover the strong-convexity requirement for the DP guarantees. NAPP-ERM can accommodate a variety of target regularizers through noise augmentation [16] (including non-convex regularization such as SCAD and l0) that the current DP-ERM framework cannot, as the latter requires the target regularizer to be convex. Our experiments focus on GLMs, where the loss function l(θ|D) is the negative log-likelihood. NAPP-ERM also admits other types of loss functions as long as the assumptions listed in Assumption 1 are satisfied, including the l2 loss and the smoothed hinge loss (l(t) = 0 if t > 1; (1 − t)²/2 if 0 < t ≤ 1; 1/2 − t if t ≤ 0) employed by SVMs, where t = yxθ and y = ±1 is the observed binary outcome. Since the loss is 0 if t > 1 and is linear in t if t ≤ 0, we can leverage the loss at t ∈ (0, 1] to achieve both the target regularization and the privacy guarantees. The noise augmentation scheme for SVMs is defined in a similar manner as in the logistic regression, but a large ne should be used so that t ∈ (0, 1] for a given Λ.
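For reference, a minimal Python sketch of the smoothed hinge loss defined above, with margin t = y·xθ; this only illustrates the piecewise formula and is not the authors' implementation.

```python
import numpy as np

def smoothed_hinge(t):
    """Smoothed hinge loss: 0 if t > 1, (1 - t)^2 / 2 if 0 < t <= 1, 1/2 - t if t <= 0."""
    t = np.asarray(t, dtype=float)
    return np.where(t > 1, 0.0,
           np.where(t > 0, 0.5 * (1.0 - t) ** 2, 0.5 - t))

def svm_margin(theta, x, y):
    """Margin t = y * (x @ theta) for a binary outcome y in {-1, +1}."""
    return y * (x @ theta)

# Quick check of the three branches.
print(smoothed_hinge([-0.5, 0.5, 2.0]))  # -> [1.0, 0.125, 0.0]
```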

Supplementary Materials. The supplementary materials are available at https://arxiv.org/abs/2110.08676.
Acknowledgment. The work was supported by NSF award #1717417.


References
1. Abadi, M., et al.: Deep learning with differential privacy. In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 308–318 (2016)
2. Bassily, R., Smith, A., Thakurta, A.: Private empirical risk minimization: efficient algorithms and tight error bounds. In: IEEE Annual Symposium on Foundations of Computer Science (2014)
3. Bu, Z., Dong, J., Long, Q., Su, W.J.: Deep learning with Gaussian differential privacy. Harvard Data Sci. Rev. 2020(23) (2020)
4. Chaudhuri, K., Monteleoni, C.: Privacy-preserving logistic regression. In: Advances in Neural Information Processing Systems, vol. 21 (2008)
5. Chaudhuri, K., Monteleoni, C.: Privacy-preserving logistic regression. In: Advances in Neural Information Processing Systems, pp. 289–296 (2009)
6. Chaudhuri, K., Monteleoni, C., Sarwate, A.D.: Differentially private empirical risk minimization. J. Mach. Learn. Res. 12, 1069–1109 (2011)
7. Dwork, C., Kenthapadi, K., McSherry, F., Mironov, I., Naor, M.: Our data, ourselves: privacy via distributed noise generation. In: Vaudenay, S. (ed.) EUROCRYPT 2006. LNCS, vol. 4004, pp. 486–503. Springer, Heidelberg (2006). https://doi.org/10.1007/11761679_29
8. Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating noise to sensitivity in private data analysis. In: Halevi, S., Rabin, T. (eds.) TCC 2006. LNCS, vol. 3876, pp. 265–284. Springer, Heidelberg (2006). https://doi.org/10.1007/11681878_14
9. Feldman, V., Mironov, I., Talwar, K., Thakurta, A.: Privacy amplification by iteration. arXiv:1808.06651v1 (2018)
10. Jain, P., Kothari, P., Thakurta, A.: Differentially private online learning. In: COLT, vol. 24, pp. 1–34 (2012)
11. Jain, P., Thakurta, A.: Differentially private learning with kernels. In: Proceedings of the 30th International Conference on Machine Learning, vol. 28 (2013)
12. Jain, P., Thakurta, A.: (Near) dimension independent risk bounds for differentially private learning. In: Proceedings of the 31st International Conference on Machine Learning, vol. 32 (2014)
13. Kasiviswanathan, S.P., Jin, H.: Efficient private empirical risk minimization for high-dimensional learning. In: International Conference on Machine Learning, pp. 488–497 (2016)
14. Kifer, D., Smith, A., Thakurta, A.: Private convex empirical risk minimization and high-dimensional regression. J. Mach. Learn. Res. Worksh. Conf. Proc. 23, 25.1–25.40 (2012)
15. Lee, J., Kifer, D.: Concentrated differentially private gradient descent with adaptive per-iteration privacy budget. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2018, pp. 1656–1665 (2018)
16. Li, Y., Liu, F.: Panda: adaptive noisy data augmentation for regularization of undirected graphical models. In: Proceedings of 2021 IEEE International Conference on Data Science and Advanced Analytics (DSAA 2021), 6–9 October 2021 (2021)
17. Smith, A., Thakurta, A.: Differentially private model selection via stability arguments and the robustness of the lasso. J. Mach. Learn. Res. Worksh. Conf. Proc. 30, 1–32 (2013)
18. Talwar, K., Thakurta, A., Zhang, L.: Nearly optimal private lasso. In: NIPS 2015, Proceedings of the 28th International Conference on Neural Information Processing Systems, vol. 2, pp. 3025–3033 (2015)


19. Thakurta, A., Smith, A.: (Nearly) optimal algorithms for private online learning in full-information and bandit settings. In: Advances in Neural Information Processing Systems, pp. 2733–2741 (2013)
20. Wang, D., Ye, M., Xu, J.: Differentially private empirical risk minimization revisited: faster and more general. In: Advances in Neural Information Processing Systems, pp. 2719–2728. Curran Associates, Inc. (2017)
21. Wang, Y.-X., Fienberg, S., Smola, A.: Privacy for free: posterior sampling and stochastic gradient Monte Carlo. In: International Conference on Machine Learning, pp. 2493–2502 (2015)
22. Williams, O., McSherry, F.: Probabilistic inference and differential privacy. In: Proceedings of the 23rd International Conference on Neural Information Processing Systems, pp. 2451–2459. Curran Associates, Inc. (2010)
23. Yu, F., Rybar, M., Uhler, C., Fienberg, S.E.: Differentially-private logistic regression for detecting multiple-SNP association in GWAS databases. In: International Conference on Privacy in Statistical Databases, pp. 170–184 (2014)
24. Zhang, J., Zheng, K., Mou, W., Wang, L.: Efficient private ERM for smooth objectives. In: Proceedings of the 26th International Joint Conference on Artificial Intelligence, pp. 3922–3928. AAAI Press (2017)
25. Zhang, J., Zhang, Z., Xiao, X., Yang, Y., Winslett, M.: Functional mechanism: regression analysis under differential privacy. Proc. VLDB Endow. 5, 1364–1375 (2012)

Data Security Awareness and Proper Handling of ICT Equipment of Employees: An Assessment

Dorothy M. Ayuyang(B)
Cagayan State University, Gonzaga, Cagayan, Philippines
[email protected], [email protected]

Abstract. Technology has been a part of the daily activities of every employee as they perform their respective duties in the office. How they handle this technology largely affects their daily performance in executing their tasks. This study investigated the level of awareness of employees on data security and their level of competency in handling ICT equipment. A structured questionnaire adapted from the National ICT Competency Standards was administered to 46 employees as respondents of the study. Frequencies, percentages, ranks, means, and standard deviations were used in describing the profile variables of the respondents. A five-point Likert scale was used in describing the level of competency of the respondents in the proper handling of ICT equipment and their awareness of data security. Bivariate correlation analysis using Pearson r, Spearman rho, and point-biserial tests was employed, whichever was appropriate for each pair of variables. All inferential questions were tested at the 0.05 level of significance. Findings show that the significant correlates of the competency level on the proper handling of ICT equipment of employees are age, years in service, availability of ICT facilities, number of hours in using ICT, number of hours in using the internet, and availability of devices for the internet. Also, their awareness of data security was found to be significantly related to their handling of ICT equipment. Therefore, it is suggested that the organization should focus on their facility exposure and their awareness of data security to increase competency in handling ICT equipment.

Keywords: ICT competency · Data security awareness · Proper handling of ICT

Abstract. Technology has been a part of the daily activities of every employee as they perform their respective duties in the office. How they handle this technology affects mainly their daily performance in executing their daily task. This study investigated the level of awareness of employees on data security and their level of competency in handling ICT equipment. A structured questionnaire adapted from the National ICT Competency Standards was administered to 46 employees as respondents of the study. Frequencies, percentages, ranks, means, and standard deviation was used in describing the profile variables of the respondents. A five-point Likert scale was used in describing the level of competency of the respondents in the proper handling of ICT equipment and their awareness of data security. Bivariate correlation analysis using Pearson r, Spearman rho, and pointbiserial tests were employed whichever is appropriate for each pair of variables. All inferential questions were tested at 0.05 level of significance. Findings show that the significant correlates of the competency level on the proper handling of ICT equipment of employees are age, years in service, availability of ICT facilities, number of hours in using ICT, number of hours in using the internet, and availability of devices for internet. Also, their awareness of data security was found to be significant in handling ICT equipment. Therefore, it is suggested that the organization should focus on their facility exposure, and their awareness of data security to increase competency in handling ICT equipment. Keywords: ICT competency · Data security awareness · Proper handling of ICT

1 Introduction

With the advancement of technology today, users do not only need to know how to operate it. They should also have enough knowledge on how to protect their data and their machines from malware, intruders, hackers, and malicious attacks which may threaten their computer systems. There have been many studies on data security awareness from different perspectives. On the employees' side, awareness of their company's information security policy and procedures increases their competency to manage cybersecurity tasks compared with those who are not aware of their company's cybersecurity policies [1]. Security culture plays an important mediating role between organizational culture and Information Security Awareness (ISA), which suggests


that organizations should focus on security culture rather than organizational culture to improve ISA [2]. The general public's unawareness of, or failure to address, data security is one reason why employees aren't convinced about their company's vulnerability to cybercrime [3].

Innovations and applications related to ICT have become major drivers of enhanced organizational performance, and the proper usage of ICT equipment leads to a higher level of growth and sustained competitive advantage in an organization [4]. The effect of ICT implementation on employees' jobs had a significant positive influence on employees' job characteristics; however, employees were less satisfied with their job and had lower performance following the ICT implementation due to four contextual forces: environmental barriers, learning difficulty, culture shock, and employee valuation [5]. Some studies revealed that the most important factor required to support an ICT organization's performance is a human resource management system with knowledge management and employee ICT competency [6]. On the contrary, some studies reveal that technostress is one variable that affects employees' use of ICT in their work tasks; technostress affects employees' satisfaction with the use of ICT and their intentions to extend the use of ICT [7].

ICT competencies are increasingly important for most employers, regardless of role, and employees must be open to updating their skills continuously. It is noted that in most job profiles there is a generation problem, as older employees are typically less familiar with ICT and more reluctant to use computers at the beginning. Research studies reveal that the top five competencies in organizations were lifelong learning, personal attitude, teamwork, dependability, and IT foundations, among others [8]. The dimensions of Digital Competency (DC) at work are composed of a specific combination of knowledge, skills, abilities, and other characteristics that are needed to perform at today's digital workplaces. In some cases, handling of hardware requires basic knowledge about the devices and hardware available and the cognitive ability to operate them. Furthermore, the skills to handle the hardware appropriately and the motivation to do so are basic requirements for this competency [9]. Another relevant finding on acquiring ICT competency is how intensively an employee uses ICT; this is linked to higher mastery and greater willingness of the employee to acquire ICT competencies [10].

It is then noted that ICT competency among workers is equally important along with their awareness of how to deal with data so as to properly address communication exchange through the use of ICT. Trainings, which are a tremendous way to upgrade workers today on the continuous advancement of technology, are given utmost priority among organizations so that they remain competitive in the current trends of society. This includes not only how to properly handle ICT equipment but also improving awareness of data security, as this is an advantageous attribute to safeguard the organization from cyber-attacks and a powerful mechanism for mitigating information security risk [11].

This research study then aimed to assess the employees' level of competency in the proper handling of ICT equipment and their level of awareness on data security. This assessment will be a basis for what level of competency the employee needs to acquire to be able to perform effectively and efficiently in the office.


As employees perform their duties every day, they are exposed to the different ICT equipment available in the office. Using this equipment to its maximum potential is sometimes a problem, especially if they were not equipped with enough knowledge on how to properly handle it. Specifically, the study sought to answer the following questions: What is the profile of the respondents with respect to age, sex, civil status, educational attainment, present plantilla position, designation, and years in service? What is the ICT exposure of the respondents in terms of available ICT facilities in the office, number of hours of using ICT facilities per day, ownership of ICT equipment, and internet access? What is the level of awareness of the respondents on data security? What is the level of competency of the employees in the proper handling of ICT equipment? And what is the relationship between the level of competency of the respondents on the proper handling of ICT equipment and their profile variables, facility exposure to ICT equipment, and their level of awareness of data security?

1.1 Research Framework

Fig. 1. Paradigm of the study showing the relationship between the independent and dependent variables. Independent variables: profile (age, sex, highest educational attainment, position, designation) and facility exposure to technology (available ICT facilities in the office, number of hours in using ICT equipment, number of owned ICT equipment, number of hours in connecting to the internet in the office, available devices used in connecting to the internet). Dependent variables: level of competency on the proper handling of ICT equipment and data security awareness.


As illustrated in Fig. 1, the paradigm developed for this study depicts the relationship between the independent and dependent variables: the characteristics of the respondents, their exposure to technology, and their awareness of data security may influence their competency in the proper handling of ICT equipment.

2 Methods

A questionnaire was the principal instrument used to assess the level of competency of 46 employees of Cagayan State University, Gonzaga campus, composed of faculty and administrative personnel who are assigned to do office and administrative work. It has three parts, namely: the profile of the respondents and their exposure to ICT facilities as part 1; the level of competency in the proper handling of ICT equipment as part 2; and the level of awareness on data security as part 3. The following Likert scale, adopted from the CICT NICS survey, was used to assess the respondents' level of competency in handling ICT equipment: Fully Mastered = 5, Mastered = 4, Nearly Mastered = 3, Partly Mastered = 2, and Not Mastered = 1. For the respondents' level of awareness on data security, the scale Strongly Aware = 5, Aware = 4, Somewhat Aware = 3, Unaware = 2, and Strongly Unaware = 1 was used. The study used frequencies, percentages, ranks, means, and standard deviations in describing the profile of the respondents. To interpret the responses, the following statistical limits were used: 4.20–5.00 = fully mastered, strongly aware; 3.40–4.19 = mastered, aware; 2.60–3.39 = nearly mastered, somewhat aware; 1.80–2.59 = partly mastered, unaware; and 1.00–1.79 = not mastered, strongly unaware. Bivariate correlation analysis using Pearson r, Spearman rho, and point-biserial tests was employed, whichever was appropriate for each pair of variables. All inferential questions were tested at the 0.05 level of significance.
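As an illustration of the bivariate correlation analysis described above, the following sketch computes Pearson r, Spearman rho, and point-biserial coefficients with SciPy on hypothetical data; the values are made up and are not the study's data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 46  # same sample size as the study; the data itself is hypothetical

age = rng.integers(22, 65, size=n)                    # continuous profile variable
competency = rng.uniform(1, 5, size=n)                # weighted Likert mean per respondent
sex = rng.integers(0, 2, size=n)                      # dichotomous variable (0/1)
hours_ict = rng.integers(0, 9, size=n)                # ordinal: hours of ICT use per day

r, p_r = stats.pearsonr(age, competency)              # continuous vs. continuous
rho, p_rho = stats.spearmanr(hours_ict, competency)   # ordinal vs. continuous
rpb, p_rpb = stats.pointbiserialr(sex, competency)    # dichotomous vs. continuous

for name, coef, p in [("Pearson r", r, p_r), ("Spearman rho", rho, p_rho),
                      ("Point-biserial", rpb, p_rpb)]:
    verdict = "significant at 0.05" if p < 0.05 else "not significant"
    print(f"{name}: {coef:.3f} ({verdict})")
```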

3 Results and Discussions

3.1 Profile of the Respondents

Table 1. Profile of the respondents
  Age: mean 40.11 years old (SD 11.84)
  Sex: Female 32, Male 14
  Civil status: Married 34, Single 12
  Highest educational attainment: Doctorate degree 9, Master's degree 22, Bachelor's degree 15
  Position rank: Faculty 33, Administrative Personnel 13
  Designation: With designation 23, Without designation 23
  Length of service: mean 12.52 years (SD 12.03)

The respondents are at an average age of 40 years old, mostly female, married, with a master's degree, mostly faculty in rank, and with an average length of service of 12 years. As to their designation, half of them are with designation and half are without designation, as shown in Table 1.

3.2 Facility Exposure of the Respondents to ICT Equipment

Table 2. Facility exposure of respondents to ICT equipment
  Available ICT facilities: Printer 41, Desktop computers 40, Mobile cellular phone 37, Laptop/netbooks/tablets 36, Scanner 28, Photocopier 20
  Number of hours in using ICT equipment per day: mean 4.20 h (SD 2.43)
  Number of owned ICT equipment: mean 1.98 (SD 1.14)
  Availability of internet access in the office: Yes 25, No 21
  Number of hours in using the internet per day: mean 3.8 h (SD 2.7)
  Devices used in connecting to the internet: Handheld mobile phone 33, Portable computers 25, Desktop computers 12, Tablet computers 2, Don't use any 1


As shown in Table 2, it is evident that most of the offices have printers available (rank 1), followed by desktop computers; some of the offices also have mobile cellular phones, laptops, scanners, and photocopiers. This is an indication that offices in the university are equipped with ICT facilities which are being used daily by the respondents. On the other hand, the average exposure time of the respondents in using ICT facilities per day is 4.20 h, with a standard deviation of 2.43. This implies that most of their tasks are executed using ICT equipment. As to the respondents' ownership of ICT equipment, all of them own at least one piece of ICT equipment, as indicated by the mean of 1.98 and standard deviation of 1.14. This implies that the respondents have experience in using ICT equipment, whether their own or the office's. Connectivity in the office is another milestone of ICT nowadays. As shown in Table 2, almost half of the respondents reported having internet access in the office, which indicates that online facilities are also available to them as part of their basic tasks. As to their number of hours in using the internet, the respondents connect to the internet for an average of 3.8 h daily, with a standard deviation of 2.7 h. This implies that the respondents use the internet for some of their office work, with most of them using their handheld mobile phone or smartphone along with their portable computers (laptops), and some also using the desktop computers available in the office.

3.3 Technology Competency Level of Employees in Proper Handling of ICT Equipment

Table 3. Technology competency level of employees in proper handling of ICT equipment (weighted mean, descriptive value)
  1. Demonstrate knowledge and skills in basic computer operation and other information devices including basic troubleshooting and maintenance: 3.63, Mastered
  2. Use appropriate office and teaching productivity tools: 3.73, Mastered
  3. Understand and effectively use the Internet and network applications and resources: 3.32, Nearly Mastered
  4. Demonstrate knowledge and skills in information and data management: 3.28, Nearly Mastered
  Overall weighted mean: 3.49, Mastered

Table 3 shows the overall weighted mean of the respondents on the four standards of proper handling of ICT equipment. The findings show that the respondents have a competency level of "mastered" in the standards of demonstrating knowledge and skills in basic computer operation and other information devices, including basic troubleshooting and maintenance, and of using appropriate office productivity tools, as shown by their weighted means of 3.63 and 3.73, respectively. However, in the areas of using the Internet and network applications and resources, and of demonstrating skills and knowledge in information and data management, their competency level is "nearly mastered", which means that they can do the function occasionally but need further practice to be confident.

3.4 Respondents' Level of Data Security Awareness

Table 4. Level of data security awareness of employees (weighted mean, descriptive value)
  1. Information Security is an important part of my work: 4.61, Strongly aware
  2. What I do on my computer affect other people: 3.76, Aware
  3. I know what constitutes acceptable use of my computer: 4.26, Strongly aware
  4. Password is important to secure my computer: 4.73, Strongly aware
  5. I will share my login and password in case I am absent from work: 2.72, Slightly aware
  6. Saving my files to my computer's hard drive is more secure than to my flash drives: 3.87, Aware
  7. The use of Facebook messenger is an appropriate method for sending information containing names and contact details to another office: 3.41, Aware
  8. Handling documents that contain personal information are subject to internal procedures and policies to protect the confidentiality: 4.5, Strongly aware
  9. I take the information home and work on it with my home computer: 3.63, Aware
  10. I often access shared drives, files, applications, and emails remotely: 2.93, Slightly aware
  11. I can play a significant role in protecting my computer and the information stored in it: 4.28, Strongly aware
  12. There is nothing on my work computer or in my immediate workspace that would be of any interest or value to others: 3.0, Slightly aware
  Overall weighted mean: 3.81, Aware

It is reflected in the results shown in Table 4 that the respondents are aware of securing the data they deal with daily in the office. As evident in the table, the importance of a password in protecting the computer was rated the highest, with a mean of 4.73, which means that the respondents are strongly aware of securing the files and data stored in their computers. However, sharing their login and password in cases where they are absent was rated the lowest, with a mean of 2.72, which means that the respondents are only slightly aware of the risks of sharing their passwords and logins with other employees.

3.5 Relationship Between Respondents' Competency Level on the Proper Handling of ICT Equipment and Their Profile Variables

Table 5. Result of correlation analysis between the employees' competency level on the proper handling of ICT equipment and their profile (correlation coefficient, probability, statistical inference)
  Profile
    Age: -0.0483, 0.000, Significant
    Civil status: -0.229, 0.488, Not significant
    Highest educational attainment: 0.206, 0.333, Not significant
    Position/rank: -0.142, 0.319, Not significant
    Years in service: -0.045, 0.000, Significant
  Technology facility exposure
    Availability of ICT facilities: 0.201, 0.037, Significant
    Number of hours using ICT: 0.137, 0.027, Significant
    Number of owned ICT equipment: 0.055, 0.683, Not significant
    Number of hours using the internet: 0.123, 0.023, Significant

Table 5 presents the results of the analysis made to ascertain whether or not a significant relationship exists between the respondents' level of competency on the proper handling of ICT equipment and their profile variables. The statistical tests show that the competency level of employees is significantly related (p < 0.05) to their age and years in service, and likewise to their facility exposure in terms of the availability of ICT facilities, number of hours in using ICT equipment, number of hours in using the internet, and availability of devices for the internet. It is very interesting to note that younger employees with fewer years in service have higher competency levels on the proper handling of ICT equipment. Furthermore, an increase in their facility exposure also leads to a higher competency level. These findings are confirmed by some studies, such as on the generation gap in technology adoption, where the younger generation was found to be far more savvy in terms of technology use as compared to senior generations [12]. Also, there is a generation problem, as the older employees are typically less familiar with ICT and more reluctant to use computers at the beginning. Some studies also confirmed the findings on the facility exposure of employees to ICT facilities in the office, in that the innovative use of ICT resources, as well as their adoption, can generate better organizational performance [4].

3.6 Relationship Between Respondents' Competency Level on the Proper Handling of ICT Equipment and Their Level of Awareness on Data Security

Table 6. Result of correlation analysis between the employees' competency level on the proper handling of ICT equipment and their level of awareness on data security (correlation coefficient, probability, statistical inference)
  Level of awareness on data security: 0.979, 0.007, Significant

Another significant finding of this study is the relationship between the level of competency on the proper handling of ICT equipment and awareness of data security. As shown in Table 6, the higher the level of awareness of data security, the higher the competency level in handling ICT equipment. This reveals that employees' awareness of how to secure their data increases their competency level. The result of this study is confirmed by other research findings, in which awareness of the company's information security policy and procedures increased employees' competency to manage cybersecurity tasks compared with those who were not aware of their company's cybersecurity policies [1].

4 Conclusions and Recommendations

Findings show that the skill of the respondents is categorized as "mastered", meaning they can use the functions regularly and confidently, in the areas of basic computer operation and other information devices, including basic troubleshooting, and in how they use appropriate productivity tools. However, in the areas of how they understand and effectively use the internet and network applications and resources, and how they demonstrate knowledge and skills in information and data management, their skill is categorized as "nearly mastered", which means that they can do the function occasionally but need further practice to be confident. For data security awareness, respondents were only slightly aware in how they handle information, such as securing their passwords and logins and passing information from their office to others, which means that they are less conscious of the risks of sharing information with others.


Also, the researcher concludes based on the results evaluated that the significant correlates of the competency level on the proper handling of ICT equipment of employees are age, years in service, availability of ICT facilities, number of hours in using ICT, number of hours in using internet and availability of devices for internet. Furthermore, their awareness of data security was found to be significant in handling ICT equipment. Therefore, it is suggested that the organization should focus on their facility exposure and their awareness of data security to increase competency in handling ICT equipment. The organization may also consider improving their ICT infrastructure and its internet connectivity for this is needed in the current trend to exchange information nowadays. Acknowledgment. The funding support provided by the Cagayan State University headed by the university president Dr. Urdujah G. Alvarado and the Campus Executive Officer of Gonzaga campus headed by Dr. Froilan A. Pacris Jr., is gratefully acknowledged. The research expertise of the Research, Development, and Extension (RDE) department of Cagayan State University is deeply appreciated. The inputs of the research enthusiast in the campus are highly recognized during the in-house reviews conducted in the university. I am also grateful to all the respondents who participated in the said study.

References
1. Li, L., He, W., Xu, L., Ash, I., Anwar, M., Yuan, X.: Investigating the impact of cybersecurity policy awareness on employees' cybersecurity behavior. Int. J. Inf. Manage. 45, 13–24 (2019). https://doi.org/10.1016/j.ijinfomgt.2018.10.017
2. Wiley, A., McCormac, A., Calic, D.: More than the individual: examining the relationship between culture and Information Security Awareness. Comput. Secur. 88 (2020). https://doi.org/10.1016/j.cose.2019.101640
3. Kemper, G.: Improving employees' cyber security awareness. Comput. Fraud Secur. 2019(8), 11–14 (2019). https://doi.org/10.1016/S1361-3723(19)30085-5
4. Yunis, M., Tarhini, A., Kassar, A.: The role of ICT and innovation in enhancing organizational performance: the catalysing effect of corporate entrepreneurship. J. Bus. Res. 88(December), 344–356 (2018). https://doi.org/10.1016/j.jbusres.2017.12.030
5. Venkatesh, V., Bala, H., Sykes, T.A.: Impacts of information and communication technology implementations on employees' jobs in service organizations in India: a multi-method longitudinal field study. Prod. Oper. Manag. 19(5), 591–613 (2010). https://doi.org/10.1111/j.1937-5956.2010.01148.x
6. Kiatsuranon, K., Suwunnamek, O.: Determinants of Thai information and communication technology organization performance: a structural equation model analysis. Kasetsart J. Soc. Sci. 40(1), 113–120 (2019). https://doi.org/10.1016/j.kjss.2017.08.004
7. Fuglseth, A.M., Sørebø, Ø.: The effects of technostress within the context of employee use of ICT. Comput. Human Behav. 40, 161–170 (2014). https://doi.org/10.1016/j.chb.2014.07.040
8. Siddoo, V., Sawattawee, J., Janchai, W., Thinnukool, O.: An exploratory study of digital workforce competency in Thailand. Heliyon 5(5), e01723 (2019). https://doi.org/10.1016/j.heliyon.2019.e01723
9. Oberländer, M., Beinicke, A., Bipp, T.: Digital competencies: a review of the literature and applications in the workplace. Comput. Educ. 146 (2020). https://doi.org/10.1016/j.compedu.2019.103752
10. Tijdens, K., Steijn, B.: The determinants of ICT competencies among employees. New Technol. Work Employ. 20(1), 60–73 (2005). https://doi.org/10.1111/j.1468-005X.2005.00144.x
11. Eminağaoğlu, M., Uçar, E., Eren, Ş.: The positive outcomes of information security awareness training in companies - a case study. Inf. Secur. Tech. Rep. 14(4), 223–229 (2009). https://doi.org/10.1016/j.istr.2010.05.002
12. Marcial, D.E., De Rama, P.A.: ICT competency level of teacher education professionals in the Central Visayas Region, 3(5) (2015)

Developing a Webpage Phishing Attack Detection Tool

Abdulrahman Almutairi and Abdullah I. Alshoshan(B)
Qassim University, Qassim, Saudi Arabia
{411107270,ashoshan}@qu.edu.sa

Abstract. Phishing is an identity-theft fraud strategy in which users receive false website links from fraudulent addresses that appear to belong to legitimate, real companies, in order to steal the receiver's personal information. The proposed tool helps users check received URLs against a pre-set whitelist to determine whether they are legitimate or not. In addition, the tool can test any URL against the database, and the results show that the tool achieved a 100% true positive rate and a 100% true negative rate. Furthermore, the tool classifies any URL that is neither in the database nor detected as phishing against the URLs in the database as unknown. However, the tool needs to be developed further in terms of unknown URLs to be more reliable.

Keywords: Phishing · Spoofing detection · Webpages · URLs · Feature extraction · X.509 · Public key · TLS certification · Favicon

Abstract. Phishing is an identity theft fraud strategy in which users obtain false website links from fraudulent addresses that tend to belong to legitimate and actual companies in order to steal the receiver’s personal information. The proposed tool helps users to check the received URLs using the pre-set white-list to determine whether it is legitimate or not. In addition, the tool can test any URL with the database and the results shows that the tool achieved 100% true positive rate and 100% true negative rate. Furthermore, the tool classifies any URL that neither in the database nor is phishing to the URLs in the database as unknown. However, the tool needs to be developed in terms of unknown URLs to be more reliable. Keywords: Phishing · Spoofing detection · Webpages · URLs · Feature extraction · X.509 · Public key · TLS certification · Favicon

1 Introduction

Phishing attacks are intended to mislead people and collect sensitive information [1], such as usernames, passwords, credit card numbers, and IDs, from a victim [2]. To deter phishing attacks, numerous countermeasures have been proposed. However, these solutions have not achieved the expected decrease in phishing attacks [3]. Many organizations and research groups have embraced the user-training approach in order to enhance human knowledge of cyber-security by raising awareness [4, 5]. However, users tend to forget some of the skills and details related to security awareness, making preservation of the acquired knowledge a challenge for this approach [6, 7]. In this form of attack, the phisher uses a number of strategies to trick the user. These methods have been grouped into the following categories [8]:
• Social engineering, which includes all the phishers' methods and scenarios for developing a credible context.
• Imitation, which entails creating fake websites that appear to be legitimate.
• E-mail spoofing, which allows a phisher to spoof an email's source address.
• URL hiding, in which phishers hide the URL to which a user is redirected.



According to the 2021 Anti-Phishing Working Group (APWG) Phishing Activity Trends Report [9], APWG saw 260,642 phishing attacks in July 2021, the highest monthly total in APWG's reporting history. The number of phishing attacks has doubled since early 2020. The software-as-a-service and webmail sector was the most frequently victimized by phishing in the third quarter, with 29.1% of all attacks. Attacks against financial institutions and payment providers continued to be numerous, and together accounted for 34.9% of all attacks. Phishing against cryptocurrency targets – cryptocurrency exchanges and wallet providers – settled at 5.6% of attacks. The number of brands being attacked rose during 2021, from just over 400 per month to more than 700 in September. Attackers use e-mails, SMS messages, and even phone calls to reach people. The main aim is to persuade people to reveal personal details, for example through fake web forms or other data-entry methods that send the information to databases used by cyber criminals to hurt the victim, such as by using their money or information in illegal activities. The fundamental issue with phishing attacks is that they attempt to "fish" people using social engineering, and, in the case of fraudulent webpages, they thrive on people with little to no knowledge of computer systems, giving the impression that they are entering information on trustworthy websites. Despite the extensive research into phishing detection, no single set of features has been identified as the most effective in detecting phishing. Furthermore, there is a need to keep improving the accuracy of detection techniques; therefore, this study aims to provide a new method by using a combination of favicons and public keys to build a tool to detect phishing websites. The following sections provide an overview of the related work in the field of phishing detection, outline the research methodology used, and present the system design and the implementation of the proposed work along with the results and evaluation.

2 Webpage Phishing Detection Techniques

Many webpage phishing-detection techniques exist in the literature, including the following.

2.1 Search Engine Based
Search engine-based techniques extract features from websites such as text, images, and URLs, then use search engines to look up those features and collect the results. When searching for a normal website, the assumption is that it will be among the top search results and have a higher index than phishing webpages, which are only active for a short period of time [10].

2.2 Visual Similarity Based
To detect phishing, this technique makes use of the visual similarity between webpages. A suspect website is compared to the authentic website it resembles based on visual characteristics; if the two do not belong to the same domain, the website is marked as a phishing website [11].


2.3 Blacklist and Whitelist Based
This technique utilizes a whitelist database of legitimate websites and a blacklist database of suspicious websites. The blacklist is collected from phishing websites reported by users or by other third parties who perform phishing detection using other techniques [10].

2.4 Heuristics and Machine Learning Based
These techniques extract a set of features from normal or abnormal websites, such as text, images, or URL-specific information. A set of heuristics is used, and the thresholds or rules derived from the learning algorithms are used to detect anomalies [12].

2.5 Proactive Phishing URL Detection Based
This scheme helps to detect possible phishing URLs by generating different sequence URLs from existing authentic URLs and assessing whether they exist and are involved in phishing-related web activities [13].

2.6 DNS Based
DNS is used to validate a phishing website's IP address. For example, DNS will check whether the IP address used by the phishing website is on the list of authentic website IPs; if not, the website is flagged as phishing. DNS can be used in a variety of ways, depending on the needs of the user [14].

3 Related Work

In recent years, a number of studies have looked at the topic of phishing, each taking a different approach to the problem. A model that uses an auto-updated whitelist of legitimate sites to alert users when a URL is not included in the whitelist is suggested in [15]. The authors check the validity of a website using a domain and IP address matching module and examine the features of the hyperlinks in the source code. The model achieved an 86.02% true positive rate and a 1.48% false negative rate. Other researchers have created an anti-phishing technique based on a weighted URL token scheme, as in [16]. They collected identity keywords from a query webpage and used them as the page's signature, which was then fed into a search engine to find the target domain name. They performed the tests using regular datasets, achieving 99.20% true positives and 92.20% true negatives.


In addition, some researchers have used the image-based approach, another type of anti-phishing method based on the analysis of website images such as the favicon. Image-based approaches have received attention because they can overcome the weaknesses of text-based anti-phishing techniques [17]. These approaches rely on image-processing tools (e.g., ImgSeek [18]) and OCR technology [19]. Different phishing websites that target the same legitimate website will look the same [18]. The appearance includes textual and graphic elements, such as the favicon, and since the favicon is a visual representation of the website, it can be used to identify phishing sites. However, some websites do not contain a favicon, so other features should also be used. A favicon-based phishing detection method is proposed in [20]; it uses additional features such as the second-level domain, the path in the URL, and the title and snippet. The proposed method achieved a 96.93% true positive rate and a 4.13% false negative rate. In [21], around 10,000 valid certificates from phishing websites were analyzed and compared with 40,000 certificates collected from legitimate sites. The analysis showed that, using only the information contained in a website's certificate, it is difficult to distinguish between phishing and benign websites.

4 The Proposed Approach

A review of the features used in webpage phishing detection was presented in Sect. 3. Building on it, the proposed tool helps users check received URLs against a white-list database. If a URL matches a legitimate entry, it belongs to a legitimate webpage; if it matches as phishing, it belongs to a phishing webpage that imitates a legitimate URL in the white-list database; if it does not match at all, the user must take care before submitting any information. The proposed work begins with the development of a phishing detection tool written in Python and utilizing a number of Python libraries. The approach is divided into two major phases: the first phase collects the public key and favicon of each URL in a legitimate URL dataset and stores them in a whitelist database; the second phase develops a matcher to test the tool. The results are then evaluated using a confusion matrix. Figure 1 presents the system design of the tool.

4.1 The First Phase
The first phase contains five functions, each depending on the previous one. Initially, a text file of all URLs to be added to the whitelist database should be prepared.


Fig. 1. System design.

1st Function reads the dataset line by line from a text file, stores the URLs in a variable list called (all_URLs), and finally passes the list to the next function, as shown in Fig. 2.

Fig. 2. 1st function.
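As a rough illustration of this reading step (not the authors' exact code, which is shown in Fig. 2), the function could look like the following sketch; the file name is an assumption:

```python
def read_urls(path="legitimate_dataset.txt"):
    """Read one URL per line from a text file and return them as a list."""
    with open(path, "r", encoding="utf-8") as handle:
        # Strip whitespace and skip empty lines, following the
        # "one URL per line" convention described in the paper.
        all_urls = [line.strip() for line in handle if line.strip()]
    return all_urls
```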

2nd Function checks whether each URL is shortened. It makes use of the history attribute of the response object in the requests library, which returns a list of response objects holding the redirect history of an HTTP request. Since URL redirection status codes are in the range 300–308, this function


runs in a loop until it obtains the final response carrying a success code, and then stores the final URL in a variable list called (checked_URLs), see Fig. 3.

Fig. 3. 2nd function.
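A possible sketch of this redirect-resolution step, using the requests library (the authors' actual implementation appears in Fig. 3; this is only an approximation):

```python
import requests

def resolve_final_urls(all_urls):
    """Follow HTTP redirects (status codes 300-308) and return the final URLs."""
    checked_urls = []
    for url in all_urls:
        if not url.startswith(("http://", "https://")):
            url = "https://" + url  # shortened links are often written without a scheme
        response = requests.get(url, timeout=10, allow_redirects=True)
        # response.history holds the intermediate redirect responses,
        # while response.url is the address that finally answered with a success code.
        checked_urls.append(response.url)
    return checked_urls
```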

3rd Function consists of two sub-functions. The first is responsible for making amendments to each URL in the variable (checked_urls) so that it is readable by the 4th function, and it stores the results in a variable list called (final_urls), as in Fig. 4. The second sub-function retrieves each URL's favicon, hashes it using SHA-256 to obtain a unique value identifying the favicon, and stores the hashes in a variable list called (hash_values), as in Fig. 5.

Fig. 4. 3rd function-1.
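A minimal sketch of the favicon-hashing idea (the conventional /favicon.ico location is an assumption; the authors' code in Fig. 5 may locate the icon differently, e.g. from the page's HTML):

```python
import hashlib
from urllib.parse import urljoin

import requests

def favicon_hash(url):
    """Download a site's favicon and return its SHA-256 digest as a hex string."""
    favicon_url = urljoin(url, "/favicon.ico")
    response = requests.get(favicon_url, timeout=10)
    response.raise_for_status()
    return hashlib.sha256(response.content).hexdigest()
```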


4th Function consists of three sub-functions that ultimately produce the hexadecimal value of the webpage's public key, which is a unique identifier to be used in the whitelist database. The first sub-function uses the variable (final_url), which has been formatted to be readable by this function. It opens an SSL connection to the hostname on port 443 to obtain the TLS context of the webpage and retrieves the PEM file, which contains the certificate information. It then uses the function (OpenSSL.crypto.load_certificate) to obtain the X.509 object that contains the public key and its exponent. The second sub-function checks whether the key parameters are (x and y) or (n and e), which depends on whether the certificate uses ECC or RSA. Finally, the third sub-function converts the integer public key to hexadecimal and stores the values in a variable list called (Public_key), as in Fig. 6. The final step of this phase is to store Public_key, Hash_values and final_urls in a MySQL database called (White_Database). Public_key is set as the primary key because it is unique to a website's identity, and the other variables are stored to be used in the 2nd phase, as shown in Fig. 7.

Fig. 5. 3rd function-2.


Fig. 6. 4th function 1&2.

Fig. 7. Data insertion.
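The insertion into the whitelist database could then be sketched as follows (table and column names, as well as the connection credentials, are assumptions rather than the authors' schema):

```python
import mysql.connector

def store_whitelist_record(public_key, favicon_hash, final_url):
    """Insert one record into the White_Database whitelist table."""
    connection = mysql.connector.connect(
        host="localhost", user="root", password="secret", database="White_Database"
    )
    cursor = connection.cursor()
    cursor.execute(
        "INSERT INTO whitelist (public_key, favicon_hash, url) VALUES (%s, %s, %s)",
        (public_key, favicon_hash, final_url),
    )
    connection.commit()
    cursor.close()
    connection.close()
```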


Fig. 8. 1st sub-phase-1.

4.2 The Second Phase
The second phase uses the same functions as the previous phase, excluding the storing step, and has three sub-phases that build a precise matcher.

1st Sub-Phase: After all needed features have been extracted, a looped query is issued to find matching Public_key values; each match returns the corresponding Hash_values and webpage URL, and a counter records the number of detected matches. If there is a match, the returned webpage URL values are stored in a variable list called (Legitimate_url) and the returned legitimate favicon values are stored in a variable list called (Legitmate_fav) for later use, as in Fig. 8. The next step is to remove the legitimate URLs from (Final_urls) and store the remainder in a variable list called (URLs_without_legitimate),

Fig. 9. 1st sub-phase-2.


as in Fig. 9, and to remove the legitimate favicon hash values from (Hash_values), storing the remainder in a variable list called (Fav_re_check) to be checked in the next sub-phase, as in Fig. 10.

Fig. 10. 1st sub-phase-3.

2nd Sub-Phase: Another looped query is sent to the database using the variable (Fav_re_check); it returns the names of the matched phishing websites, which are stored in a variable list called (Matched_phish). At this step, a counter identifies a process number that is used as a value locator in the variable list (URLs_without_legitimate): if an entry is matched as phishing, the value of the phishing URL is retrieved and stored in a variable list called (Phishing_urls), to be linked with the matched phishing target. Furthermore, another counter counts the matched phishing URLs, as shown in Fig. 11.

Fig. 11. 2nd sub-phase.


3rd Sub-Phase: In the last sub-phase, a looped function counts all URLs that appear in neither the legitimate matches nor the variable (Phishing_urls); these are identified as unknown URLs and stored in a variable list called (Unknown_urls), as in Fig. 12.

Fig. 12. 3rd sub-phase.
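Taken together, the three sub-phases implement a three-way classification. A simplified sketch of that logic, operating on already-extracted features held in memory rather than on live database queries (all names are illustrative):

```python
def classify_urls(tested, whitelist):
    """Classify each tested URL as legitimate, phishing, or unknown.

    tested    -- list of dicts with keys 'url', 'public_key', 'favicon_hash'
    whitelist -- dict mapping public_key -> {'url': ..., 'favicon_hash': ...}
    """
    legitimate, phishing, unknown = [], [], []
    known_favicons = {e["favicon_hash"]: e["url"] for e in whitelist.values()}
    for item in tested:
        if item["public_key"] in whitelist:
            # Public key matches a whitelisted identity: legitimate page.
            legitimate.append(item["url"])
        elif item["favicon_hash"] in known_favicons:
            # Favicon matches a whitelisted site but the public key does not:
            # the page imitates that site, so it is flagged as phishing.
            phishing.append((item["url"], known_favicons[item["favicon_hash"]]))
        else:
            # Neither feature matches anything in the whitelist.
            unknown.append(item["url"])
    return legitimate, phishing, unknown
```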

5 Implementation and Testing

In this section, the experiment is presented and performed using real phishing methods, along with the results and evaluation. The following subsections describe how the datasets were chosen, which tools were used in each step, and the experiment results.

5.1 Dataset
The experiment needs two datasets: one for building the whitelist database of legitimate URLs and one for testing. To perform the experiment, the datasets must meet the following conditions: (1) the webpage is online, (2) the web server runs HTTPS, and (3) the webpage contains a favicon.

Legitimate Dataset: Several legitimate webpages were chosen from the Statista website based on their popularity and importance. Table 1 presents the dataset of legitimate webpages.

Table 1. The most popular websites according to Statista.

No.  Identity   URL
1    Google     https://accounts.google.com/
7    Microsoft  https://login.live.com/
4    Twitter    https://twitter.com/
5    Amazon     https://www.amazon.com/
3    Facebook   https://www.facebook.com/
2    Instagram  https://www.instagram.com/
6    Reddit     https://www.reddit.com/
8    YouTube    https://www.youtube.com/

As mentioned, the tool can deal with shortened URLs, so some of the above URLs were shortened using the Shorturl online tool. Table 2 shows the final dataset of legitimate URLs and its characteristics, where SH indicates a shortened URL. The next step is to copy all of the above URLs into a text file attached to the developed tool as the "legitimate Dataset", separating them with new lines so that they are readable.

Table 2. Legitimate URLs dataset.

No.  Identity   URL                         SH
1    Google     shorturl.at/csPUY           SH
2    Microsoft  shorturl.at/hlrxR           SH
3    Twitter    shorturl.at/hmvLP           SH
4    Amazon     https://www.amazon.com/
5    Facebook   https://www.facebook.com/
6    Instagram  shorturl.at/gvCN7           SH
7    Reddit     https://www.reddit.com/
8    YouTube    https://www.youtube.com/

Testing Dataset: To perform the testing, 4 of the legitimate URLs (numbers 1, 2, 3 and 5, see Table 3) were chosen to be phished by cloning them from their source code. In addition, 1 URL (number 9) was added to the dataset at random to test how the tool deals with unknown URLs. Moreover, to bring the cloned pages online, Ngrok, a well-known tool in this field, was used. Ngrok provides secure tunnels that connect local servers behind NATs and firewalls to the public internet.


All of the above was done using a virtual Linux machine running Ubuntu with Ngrok installed. The experiment uses the paid version of Ngrok because the free one provides only 2 tunnels, whereas 4 tunnels are needed to tunnel the 4 phished webpages. The outcome of this process is the 4 tunnels listed in Table 3, where H refers to HTTPS, F refers to Favicon and SH refers to Shortened. Two of the phished URLs are shortened whereas the other two are not, in order to try the possibility of receiving shortened URLs. The next step is to copy all of the above URLs into a text file attached to the developed tool as the "Testing Dataset", separating them with new lines so that they are readable. Furthermore, one new URL, not included in the legitimate dataset, has been added to test the tool.

5.2 Storing the White-List Database
To store the white-list database, which holds the outcome of the first phase, a MySQL server is used. A database has been created with one table that contains the public keys, the favicon hash values and the link of each webpage. The public key and the favicon are set as the primary key of the table to prevent redundancy and to avoid discarding webpages that use the same public key but different favicons.

5.3 Experiment Results
In this part, the experiment results are presented for each phase of the phishing detection tool, beginning with the first phase and then the second phase; finally, the results are evaluated using the confusion matrix.

1st Phase: After running the phishing detection tool on the legitimate dataset to build the white-list database, the results show that 8 records were added to the database, i.e. 100% of the total legitimate URLs dataset, as in Fig. 13.

Fig. 13. 1st Phase results.

2nd Phase: After running the detection tool on the testing dataset, the results show that 100% of the phishing URLs were detected as phishing, representing 44.44% of the total testing dataset. In addition, 100% of the legitimate URLs were detected as legitimate, representing another 44.44% of the total testing dataset.


Finally, the results show that 100% of the unknown URLs were classified as unknown, representing 11.11% of the total testing dataset, as in Fig. 14.

Table 3. Testing URLs dataset.

#  Identity   URL                               H    F    SH   URL Classification
1  Google     shorturl.at/csPUY                 Yes  Yes  Yes  Phishing
2  Microsoft  shorturl.at/hlrxR                 Yes  Yes  Yes  Phishing
3  Twitter    https://d930-37-107-927.ngrok.io  Yes  Yes       Phishing
4  Amazon     https://www.amazon.com/           Yes  Yes       Legitimate
5  Facebook   https://3721-37-107-927.ngrok.io  Yes  Yes       Phishing
6  Instagram  shorturl.at/gvCN7                 Yes  Yes  Yes  Legitimate
7  Reddit     https://www.reddit.com/           Yes  Yes       Legitimate
8  YouTube    https://www.youtube.com/          Yes  Yes       Legitimate
9  Wikipedia  https://www.wikipedia.org/        Yes  Yes       Unknown


Fig. 14. 2nd phase results.

In addition, the phishing detection tool reports, for each detected phishing URL, the legitimate URL that it imitates.

5.4 Results Evaluation
Accuracy is the rate of correct predictions made by the tool. Precision and recall, on the other hand, are two evaluation metrics that are calculated from a confusion matrix, as shown in Table 4, and computed according to Eqs. (1), (2) and (3).

Precision = TP / (TP + FP)    (1)

Recall = TP / (TP + FN)    (2)

Accuracy = (TP + TN) / (TP + FP + TN + FN)    (3)

where the true positive (TP) count is the number of phishing URLs correctly detected as phishing, the false negative (FN) count is the number of phishing URLs detected as legitimate, the false positive (FP) count is the number of legitimate URLs detected as phishing, and the true negative (TN) count is the number of legitimate URLs correctly detected as legitimate.

Table 4. Confusion matrix.

                    Classified phishing   Classified legitimate
Actual phishing     TP                    FN
Actual legitimate   FP                    TN

The results show that the phishing detection tool achieved 100% precision, recall and accuracy, as shown in Table 5.

Table 5. Confusion matrix results.

Tool                     Precision (%)  Recall (%)  Accuracy (%)
Phishing detection tool  100            100         100
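A small illustrative computation of these metrics in Python, using the counts from the test run in Sect. 5.3 (4 phishing and 4 legitimate URLs, all classified correctly; the unknown URL is left outside the binary confusion matrix):

```python
def evaluate(tp, fp, tn, fn):
    """Compute precision, recall and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return precision, recall, accuracy

print(evaluate(tp=4, fp=0, tn=4, fn=0))  # -> (1.0, 1.0, 1.0), i.e. 100% for all three
```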

6 Conclusion and Future Work

There are many techniques for detecting phishing URLs, but limitations exist, such as low accuracy, content so similar to legitimate webpages that it cannot be detected, and low detection rates. In this paper, a webpage phishing detection tool has been developed to help users check received URLs against a white-list database based on two main features: the public key and the favicon. The tool consists of two phases: the first extracts the features and stores them in a MySQL database, while the second tests the tool. Both phases include some extra functions, such as dealing with shortened URLs. Finally, the tool's accuracy was evaluated using a confusion matrix. For the first phase, 100 legitimate URLs were used to extract features; 10 of them were shortened and the tool dealt with them successfully, storing them all in the database. For the second phase, 100 URLs were chosen as follows:


• 46 URLs chosen from the legitimate URLs dataset.
• 4 URLs that are phishing.
• 50 URLs chosen randomly and unknown to the tool.

The results of detection were as follows:

• 46 URLs were detected as legitimate.
• 4 URLs were detected as phishing.
• 50 URLs were classified as unknown.

To keep up with the constant development of new techniques by phishers, phishing detection tools must be improved. As a result, we recommend creating a new automated phase to deal with unknown URLs in order to improve the tool's detection functionality.

References

1. Leite, C., Gondim, J.J., Barreto, P.S., Alchieri, E.A.: Waste flooding: a phishing retaliation tool. In: 2019 IEEE 18th International Symposium on Network Computing and Applications (NCA), pp. 1–8, September 2019
2. Park, G., Rayz, J.: Ontological detection of phishing emails. In: 2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 2858–2863. IEEE, October 2018
3. Lötter, A., Futcher, L.: A framework to assist email users in the identification of phishing attacks. Inf. Comput. Secur. (2015)
4. Qabajeh, I., Thabtah, F., Chiclana, F.: A recent review of conventional vs. automated cybersecurity anti-phishing techniques. Comput. Sci. Rev. 29, 44–55 (2018)
5. Alsharnouby, M., Alaca, F., Chiasson, S.: Why phishing still works: user strategies for combating phishing attacks. Int. J. Hum. Comput. Stud. 82, 69–82 (2015)
6. Ghafir, I., et al.: Security threats to critical infrastructure: the human factor. J. Supercomput. 74(10), 4986–5002 (2018). https://doi.org/10.1007/s11227-018-2337-2
7. Khonji, M., Iraqi, Y., Jones, A.: Mitigation of spear phishing attacks: a content-based authorship identification framework. In: 2011 International Conference for Internet Technology and Secured Transactions, pp. 416–421. IEEE, December 2011
8. Bergholz, A., Chang, J.H., Paass, G., Reichartz, F., Strobel, S.: Improved phishing detection using model-based features. In: CEAS, August 2008
9. Phishing activity trends reports (2020). https://apwg.org/trendsreports/. Accessed 15 Mar 2021
10. Varshney, G., Misra, M., Atrey, P.K.: A survey and classification of web phishing detection schemes. Secur. Commun. Netw. 9(18), 6266–6284 (2016)
11. Jain, A.K., Gupta, B.B.: Towards detection of phishing websites on client-side using machine learning based approach. Telecommun. Syst. 68(4), 687–700 (2017). https://doi.org/10.1007/s11235-017-0414-0
12. Jain, A.K., Gupta, B.B.: Phishing detection: analysis of visual similarity based approaches. Secur. Commun. Netw. (2017)
13. Moghimi, M., Varjani, A.Y.: New rule-based phishing detection method. Expert Syst. Appl. 53, 231–242 (2016)
14. Bin, S., Qiaoyan, W., Xiaoying, L.: A DNS based anti-phishing approach. In: 2010 Second International Conference on Networks Security, Wireless Communications and Trusted Computing, vol. 2, pp. 262–265. IEEE, April 2010
15. Jain, A.K., Gupta, B.B.: A novel approach to protect against phishing attacks at client side using auto-updated white-list. EURASIP J. Inf. Secur. 2016(1), 1–11 (2016). https://doi.org/10.1186/s13635-016-0034-3
16. Tan, C.L., Chiew, K.L.: Phishing webpage detection using weighted URL tokens for identity keywords retrieval. In: 9th International Conference on Robotic, Vision, Signal Processing and Power Applications, pp. 133–139. Springer, Singapore (2017)
17. Herzberg, A., Jbara, A.: Security and identification indicators for browsers against spoofing and phishing attacks. ACM Trans. Internet Technol. (TOIT) 8(4), 1–36 (2008)
18. Hara, M., Yamada, A., Miyake, Y.: Visual similarity-based phishing detection without victim site information. In: 2009 IEEE Symposium on Computational Intelligence in Cyber Security, pp. 30–36. IEEE, March 2009
19. Dunlop, M., Groat, S., Shelly, D.: Goldphish: using images for content-based phishing analysis. In: 2010 Fifth International Conference on Internet Monitoring and Protection, pp. 123–128. IEEE, May 2010
20. Chiew, K.L., Choo, J.S.F., Sze, S.N., Yong, K.S.: Leverage website favicon to detect phishing websites. Secur. Commun. Netw. (2018)
21. Drury, V., Meyer, U.: Certified phishing: taking a look at public key certificates of phishing websites. In: 15th Symposium on Usable Privacy and Security (SOUPS 2019), pp. 211–223. USENIX Association, Berkeley, CA, USA, August 2019

An Evaluation Model Supporting IT Outsourcing Decision for Organizations

Alessandro Annarelli(B), Lavinia Foscolo Fonticoli, Fabio Nonino, and Giulia Palombi

Department of Computer, Control, and Management Engineering, Sapienza University of Rome, Via Ariosto 25, 00185 Roma, Italy
[email protected]

Abstract. The aim of this paper is to carry out a study on Information Technology outsourcing. Starting from a literature review, we selected 46 papers to identify the main features of the subject. The main topics identified were: IT Governance, Market, Determinants, Benefits and Disadvantages, Outsourcing methods and Security. These six aspects were examined in depth, allowing us, on the basis of the existing literature, to obtain a complete theoretical framework of the topic. Subsequently, we condensed these findings into an evaluation model to support the IT outsourcing decision. This model makes it possible to assess the level of maturity of IT outsourcing, both when the company in question is already outsourcing and when the decision to outsource has yet to be made.

Keywords: IT outsourcing · ITO · Evaluation model

1 Introduction

IT outsourcing is a worldwide, constantly evolving phenomenon which nowadays represents a crucial reality for business in both the private and public sectors. It represents a governance decision whereby, following a partial or total transfer, an external provider offers specific resources to support and maintain the client's IT infrastructure [1–3]. The term IT outsourcing was first used in 1989, when Eastman Kodak outsourced its IT function [4]. The fact that a large company such as Kodak took such an innovative step led many other companies to take the same decision and start Information Technology Outsourcing. The importance of IT outsourcing has grown throughout the years, though it has attracted only limited attention from scholars and its key role has been underestimated [5]. Moreover, information security concerns in IT outsourcing have grown [6]. On one hand IT outsourcing can be more economically sustainable, while on the other it might represent another vulnerability for the organization and for its willingness to maintain cyber-resilient systems [7] while investing in the right cybersecurity measures [8].

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 K. Arai (Ed.): SAI 2022, LNNS 508, pp. 710–734, 2022. https://doi.org/10.1007/978-3-031-10467-1_43


Therefore, the purpose of this research is to focus attention on this relevant and still-growing phenomenon and to create an evaluation model that allows IT outsourcing to be better understood using evidence from the literature. The paper is structured as follows: in Sect. 2 we present the objective of this work and the criteria used in the search for academic papers; in Sect. 3 we introduce what emerges from the literature, which is then further explored in Sect. 4. Finally, Sect. 5 presents the evaluation model, followed by conclusions and future research directions.

2 Research Aim and Methodology

The purpose of this research is to develop an evaluation model related to the choice of IT outsourcing within the corporate context. The methodology used is literature analysis, aimed at identifying the main drivers through which a company can evaluate IT outsourcing. The search for academic articles was carried out on Scopus: we decided to use this online database because, compared to other sources, it is an optimal resource for academic literature search and covers a wide range of subjects and journals [9, 10]. The initial search was conducted with the following keywords in the title or abstract fields: (((information OR cyber) AND (security OR technolog* OR system*)) OR (IT) OR (IS) OR (IT/IS) OR (cybersecurity)) AND (outsourcing). This first step produced a total of 5,559 papers, published in English in the subject area of Business, Management and Accounting. We then conducted a selection of the papers, based on title, abstract and full-text analysis, radically reducing the number of contributions to 46. All selected articles are listed in Table 1. These papers have been analysed in detail in order to identify key recurring topics in the IT outsourcing literature. Results of the analysis are presented in the next section.

Table 1. Selected articles.

Author(s), Year | Title | Ref
Martinsons 1993 | Outsourcing information systems: A strategic partnership with risks | [11]
Shepherd 1999 | Outsourcing IT in a changing world | [12]
Udo 2000 | Using analytic hierarchy process to analyze the information technology outsourcing decision | [13]
Barthélemy 2001 | The hidden costs of IT outsourcing | [14]
Khalfan 2004 | Information security considerations in IS/IT outsourcing projects: a descriptive case study of two sectors | [1]
Rohde 2004 | IS/IT outsourcing practices of small- and medium-sized manufacturers | [15]
Barthélemy and Geyer 2005 | An empirical investigation of IT outsourcing vs. quasi-outsourcing in France and Germany | [16]
Erber and Sayed-Ahmed 2005 | Offshore outsourcing. A global shift in the present IT Industry | [17]
Dutta and Roy 2005 | Offshore outsourcing: A dynamic causal model of counteracting forces | [18]
Florin et al. 2005 | Information technology outsourcing and organizational restructuring: An explanation of their effects on firm value | [19]
Yoon and Im 2005 | An evaluation system for IT outsourcing customer satisfaction using the analytic hierarchy process | [20]
Dhar and Balakrishnan 2006 | Risks, benefits, and challenges in global IT outsourcing: Perspectives and practices | [21]
Gonzalez et al. 2006 | Information systems outsourcing: A literature analysis | [22]
Gottschalk and Solli-Sæther 2006 | Maturity model for IT outsourcing relationships | [23]
Bi 2007 | Managing the risks of IT outsourcing | [24]
Foogooa 2008 | IS outsourcing - A strategic perspective | [25]
Gonzalez et al. 2008 | A descriptive analysis of IT outsourcing configuration | [26]
Nakatsu and Iacovou 2008 | A comparative study of important risk factors involved in offshore and domestic outsourcing of software development projects: A two-panel Delphi study | [27]
Lacity et al. 2009 | A review of the IT outsourcing literature: insights for practice | [28]
Lacity et al. 2011 | Beyond Transaction Cost Economics: towards an endogenous theory of information technology outsourcing | [29]
Qu and Pinsonneault 2011 | Country environments and the adoption of IT outsourcing | [30]
Sven and Björn 2011 | IS/ICT Outsourcing Decision Project in a Large Public Organization: A Case Study | [31]
Chang and Gurbaxani 2012 | Information technology outsourcing, knowledge transfer, and firm productivity: An empirical analysis | [32]
Chang et al. 2012 | An analysis of IT/IS outsourcing provider selection for small- and medium-sized enterprises in Taiwan | [33]
Dhar 2012 | From outsourcing to cloud computing: evolution of IT services | [34]
Järveläinen 2012 | Information security and business continuity management in interorganizational IT relationships | [35]
Nassimbeni et al. 2012 | Security risks in service offshoring and outsourcing | [36]
Swar et al. 2012 | Public sectors' perception on critical relationship factors in IS/IT outsourcing: Analysis of the literature and a Delphi examination | [37]
Deng et al. 2013 | An empirical study on the source of vendors' relational performance in offshore information systems outsourcing | [38]
Han et al. 2013 | Complementarity between client and vendor IT capabilities: An empirical investigation in IT outsourcing projects | [39]
Yigitbasioglu et al. 2013 | Cloud computing: How does it differ from IT outsourcing and what are the implications for practice and research? | [40]
Bachlechner et al. 2014 | Security and compliance challenges in complex IT outsourcing arrangements: a multi-stakeholder perspective | [41]
Galoş and Sìrbu 2015 | Cloud computing the new paradigm of IT outsourcing | [42]
Liang et al. 2016 | IT outsourcing research from 1992 to 2013: a literature review based on main path analysis | [43]
Liu and Yuliani 2016 | Differences between Clients' and Vendors' Perceptions of IT Outsourcing Risks: Project Partnering as the Mitigation Approach | [44]
Chang et al. 2017 | Information technology outsourcing: Asset transfer and the role of contract | [45]
Dhillon et al. 2017 | Information security concerns in IT outsourcing: identifying (in)congruence between clients and vendors | [6]
Liu et al. 2017 | IT centralization, security outsourcing, and cybersecurity breaches: evidence from the U.S. higher education | [46]
Almutairi and Riddle 2018 | State of the art of IT outsourcing and future needs for managing its security risks | [47]
Das and Grover 2018 | Biased decisions on IT outsourcing: how vendor selection adds value | [48]
Puspitasari et al. 2018 | Analysis of success level and supporting factors of IT outsourcing implementation: a case study at PT Bank Bukopin Tbk | [49]
Ensslin et al. 2020 | Management support model for information technology outsourcing | [2]
Karimi-Alaghehband and Rivard 2020 | IT outsourcing success: a dynamic capability-based model | [50]
Marco-Simó and Pastor-Collado 2020 | IT outsourcing in the public sector: a descriptive framework from a literature review | [51]
Kranz 2021 | Strategic innovation in IT outsourcing: Exploring the differential and interaction effects of contractual and relational governance mechanisms | [52]
Prawesh et al. 2021 | Industry Norms as Predictors of IT Outsourcing Behaviors | [53]

3 Findings

The analysis of the selected articles highlights six main topics through which to study IT outsourcing and subsequently structure the evaluation model. The topics identified are the following:

• Market. The actors within the IT outsourcing market, i.e., customers, suppliers and their relationship.
• IT governance. IT management processes and structures, the related standards to be complied with and the challenges to be faced.
• Security. The security of information and the problems that may affect it when outsourcing takes place.
• Benefits and disadvantages. The costs and benefits of outsourcing and, therefore, the risks that may occur.
• Outsourcing methods. Outsourcing can be carried out in different ways depending on the intentions and characteristics of the client company.


• Determinants. The factors that influence the choice of outsourcing and the companies that are most likely to make this choice.

Figure 1 shows the frequency with which these topics are present in the selected articles.

Fig. 1. Topics emerged and number of related papers

4 Discussion

The six topics identified are analysed below.

4.1 Market
The outsourcing of corporate IT and related information security can be represented as a market structure in which suppliers offer IT products and services to a heterogeneous group of customers. Looking at the current market landscape, we can see a wide and varied possible customer base, with suppliers ready to offer appropriate products and services. Potential customers who decide to outsource IT services and related security can be large industrial companies, small and medium-sized enterprises, residential customers, non-profit organisations, public sector companies and government agencies [20]. Of these, small and medium-sized enterprises and public sector companies are of particular interest.


SMEs are more vulnerable than large companies due to their size and lower availability of human, technical and financial resources. They are more exposed to risks related to growth, interest rates, commodity prices, the supply and distribution chain, employee management, e-business and technology. Regarding the public sector, [51] carried out a literature review to analyse public organisations and their differences from those in the private sector. The following distinctive characteristics are evident in public sector activities and processes: priority given to non-economic aspects, political motivation prevailing over technical or economic aspects, a silo mentality, bureaucratic procedures, difficulties in personnel management, external and internal relationships, the complexity and specificity of IT, and a lack of experience in IT [51].

4.2 IT Governance
The term IT governance refers to the set of processes, structures and relational mechanisms that enable the alignment of IT and other corporate resources with the value creation related to IT investments [6]. Today, the scene is changing, with more and more decisions being taken outside the company as a result of outsourcing policies. It is therefore necessary to identify the correct control relationship between the internal side, where the centralisation or decentralisation of departments is assessed, and the external side, where suppliers must be dealt with [39]. The corporate strategy preceding the decision to outsource IT does not consist exclusively of the desire to achieve cost reductions; other motivations, such as improving IT security and obtaining business benefits, also emerge in the literature [28]. The main challenges that characterise outsourcing are: cloud auditing, management of heterogeneous services, coordination of the parties involved, management of relationships between customers and suppliers, localisation and transfer of data, and management of security awareness [41].

4.3 Security
The security problems that can arise with IT outsourcing contracts depend strongly on the characteristics of the client company and especially on those of the supplier. In fact, two main issues addressed when outsourcing are the security of data during its transmission and the security measures of the suppliers [1]. Security problems should be studied by assessing three dimensions:

• Organisational, relating to the procedures and security requirements to be met;
• Legal, considering the legislative framework of the client company and the supplier;
• Technical, analysing the company's infrastructures and tools that are fundamental for protecting data and internal knowledge [36].

In order to ensure information security during the IT outsourcing process, it is important that the supplier is competent, complies with the client's requirements, complies with applicable regulations and controls the management of confidential information.


Attention must therefore be paid to the customer-supplier relationship and the behaviour of both parties. The authors of [6] conducted a Delphi study to identify the main security problems that can arise in IT outsourcing. Three main problems emerged: the competence of suppliers, compliance with policies and regulations, and trust in controls and information protection.

4.4 Benefits and Disadvantages
Outsourcing corporate IT and related security presents unique challenges. Benefits and disadvantages or, better, risks are not easy to define due to the lack of prior experience, and thus the impossibility of relying on the past, and due to the dynamic nature of the IT world and its attacks [47]. The authors of [47] analysed the main benefits that a company can obtain as a result of the decision to outsource: reducing costs, focusing on core business activities, improving flexibility, improving quality, reducing routine activities, and reducing obsolescence risk. The benefits of outsourcing are subject to a careful analysis of possible risks. The main risks identified by [47] are: security risks, breach of contract, loss of technological expertise, an inadequately qualified supplier, and hidden costs.

4.5 Outsourcing Methods
There are many outsourceable activities, and it is possible to choose between different options and arrangements. The most interesting trends are cloud computing, offshoring and quasi-outsourcing. The term cloud computing refers to the distribution of computing services including servers, storage resources, databases and software, which are paid for based on customer demand and usage. The spread of cloud computing has greatly impacted the world of IT outsourcing by changing the paradigm of accessing and paying for services and by giving new opportunities to IT service providers. An alternative to traditional outsourcing is offshore outsourcing, which involves contracting with an IT service provider outside the client's national borders, with implications for privacy, ownership, security and regulatory enforcement [36]. Quasi-outsourcing is another way of outsourcing IT services: instead of outsourcing to an external provider, a subsidiary is created to which the management of the company's IT is transferred. This subsidiary is then managed independently of the parent company in a similar way to an external provider, but with more control over the outsourced activities.

4.6 Determinants
The IT outsourcing process results from a careful business analysis that leads to the decision to keep information management in-house or to contract with an external provider. The main needs that lead to the decision to outsource corporate IT are: to carry out financial restructuring, to reduce or stabilize costs, to overcome cultural and organizational problems, to access globally available expertise, and to focus on core business activities [12].


Some of the factors mentioned are also determinants of the success of an outsourcing process, as reported in the literature review by [28]. Success is defined by looking at the organisation, the IT function and the project, and its determinants are categorised into relationships concerning the IT outsourcing decision, contractual governance and relational governance.

5 Evaluation Model Supporting IT Outsourcing

The evaluation model supporting the IT outsourcing decision allows IT outsourcing to be studied within a corporate context. It is a checklist of 39 check elements divided into the six topics presented above. The check elements are aimed at carrying out a company analysis in relation to an outsourcing process to be carried out or already underway, focusing on the strategic aspects that lead the company to undertake useful actions to obtain superior results. Specifically, when the outsourcing process has yet to be initiated, the controls are aimed at assessing the suitability of the company: whether it is ready to outsource in such a way as to derive maximum benefit from the process, or whether it must first pay attention to certain aspects in which it is lacking before proceeding. It may also emerge that the company is better off continuing with its current management system and avoiding outsourcing. The second case concerns companies that already outsource their IT and wish to make an assessment highlighting the areas where they are most lacking, in order to make corrections and obtain greater benefits. Within the model, checks are differentiated according to whether they are intended for the former case, the latter, or both. Before submitting the checklist to the company, it is necessary to check the applicability of each control according to the type of firm, but also according to the sector or the specific characteristics of the company in question. The checklist is filled in by entering one of two answers: "YES" or "NO". "YES" expresses a positive judgement regarding company outsourcing: it means that what is expressed by the checklist item has already been implemented, or has been considered in the company analysis and is compliant. "NO" means that, although the control is applicable and of interest to the company, it has not yet been implemented or the necessary analyses have not been carried out. Where a specific control consists of several bullet points, a YES/NO analysis is carried out for each point, and the value that appears most often is automatically assigned as the evaluation of the control. At the end of the compilation of the checklist, the final result offered by the model is presented: taking into account the results of the checklist, a percentage value is calculated which indicates the company's status in relation to outsourcing. It should be noted, however, that this count does not include the topic Outsourcing methods. In fact, the checks in this section do not discriminate against outsourcing but advise on whether it can be done profitably through different, unconventional or innovative forms such as cloud computing, offshoring or quasi-outsourcing.
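As an illustration of how the final percentage might be computed (the paper gives no explicit formula; this sketch assumes a simple share of YES evaluations, with multi-point controls resolved by majority vote and the Outsourcing methods topic excluded, as described above):

```python
def control_value(points):
    """Resolve a multi-point control: the answer that appears most often wins."""
    yes = sum(1 for answer in points if answer.upper() == "YES")
    return "YES" if yes > len(points) - yes else "NO"  # tie-breaking rule is an assumption

def outsourcing_score(checklist):
    """checklist maps each topic to a list of controls, each control being a list
    of YES/NO answers (one per bullet point). Returns the percentage of applicable
    controls evaluated YES, excluding the 'Outsourcing methods' topic."""
    evaluations = [
        control_value(points)
        for topic, controls in checklist.items()
        if topic != "Outsourcing methods"
        for points in controls
    ]
    return 100 * evaluations.count("YES") / len(evaluations)

# Example usage with a toy checklist:
score = outsourcing_score({
    "Market": [["YES"], ["YES", "NO", "YES"]],
    "Security": [["NO"]],
    "Outsourcing methods": [["YES"]],   # excluded from the count
})
print(f"{score:.1f}% of applicable controls satisfied")  # -> 66.7%
```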


Tables 2, 3, 4, 5, 6 and 7 show the controls in the model for each topic. The Type column shows the letter A when the control is aimed at a company that needs to evaluate an outsourcing process already in place, and the letter B when the company wants to evaluate whether to outsource or not.

Table 2. Market controls.

ID | Type | Name | Description
M1 | A-B | Capacity [28] | Assess the following business capabilities: a) Technical/methodological capacity; b) Human resources management capacity; c) Supplier management skills; d) Contract negotiation skills; e) Leadership skills; f) Change management skills; g) Transition management skills; h) Customer management skills
M2 | A-B | Services (a) [15, 33] | Analyse the following outsourcing services often implemented within SMEs and assess their possible adoption: a) Systems planning, integration, inspection and management; b) Internet management; c) Software development, testing and maintenance; d) Hardware operation and maintenance; e) Facilities management; f) Internet, consulting and support services; g) Database establishment and data registration; h) Information management; i) Promotion of training
M3 | A | Effectiveness (a) [33] | Assess the effectiveness of outsourcing through the following factors: a) Supplier capability; b) Quality and level of service; c) Presence of partnership contracts; d) IT needs and capabilities of the customer; e) Characteristics of the organisation; f) Communication methods; g) Processes and procedures for testing and quality assurance
M4 | B | Motivations (b) [31, 51] | Evaluate the following reasons that may lead to the choice to outsource: a) Economic; b) Organisational; c) Political
M5 | B | Management (b) [51] | In managing an IT outsourcing process, the involvement of management is ensured
M6 | B | Contracts (b) [37, 51] | Pay attention to contract management; do not bundle IT requirements of different departments in one contract
M7 | B | Suppliers (b) [51] | Extreme care must be taken in the search, selection and management of suppliers
M8 | A-B | Staff (b) [51] | A team of professionals must be in place to monitor and supervise the service provided by the outsourcer and assess the quality and relevance of the results to the contract
M9 | A-B | Risks (b) [1] | Assess the company's position with regard to the following risks: a) Data security/confidentiality issues; b) Ability to operate or manage new systems; c) Loss of key employees; d) Hidden costs; e) Management issues and inadequate planning
M10 | A-B | Dynamic capabilities [50] | Assess the ability of the company to manage, reconfigure and adapt its resources to gain competitive advantage
M11 | A-B | Management skills [50] | Assess the ability the company has in: a) Contract management; b) Relationship management
M12 | B | Intermediate skills [50] | Evaluate the following past achievements of the company: a) Successful reconfiguration of IT services; b) Successful delivery of IT services
M13 | A | Supplier assessment [28] | Evaluate whether suppliers have identified, acquired, developed and deployed the right resources necessary to fulfil customer requirements
M14 | A | Strategic benefits [49] | Assess whether the relationship with the supplier(s) has led to the following strategic benefits: a) Ability to focus on core business; b) Increased flexibility; c) Improved quality of IT services; d) Reduction of routine activities to be managed; e) Possibility to have alternatives to internal staff; f) Improvement in the management of technology and human resources
M15 | A | Technological benefits [49] | Assess whether the relationship with the supplier(s) has led to the following technological benefits: a) Reduced risk of technical obsolescence; b) Facilitated access to technology; c) Standardised IT environment; d) Well functioning IT environment
M16 | A | Economic benefits [49] | Assess whether the relationship with the supplier(s) has led to the following economic benefits: a) Savings in personnel costs; b) Technology cost savings; c) Improvement in finance and flexibility; d) Improved control over expenditure
M17 | A | Social benefits [49] | Assess whether the relationship with the supplier(s) has led to the following social benefits: a) Improved availability of services; b) Improved user satisfaction
M18 | A-B | Environment [38] | Assess whether there is mutual understanding of the customer and supplier environments
M19 | B | Expertise [38] | Assess skills in relation to software development, cost management and project planning and control through established processes and methodologies

(a) Control is for SMEs only. (b) Control is for public sector companies only.

Table 3. IT governance controls.

ID | Type | Name | Description
G1 | A | ISO [41] | Verify that the controls contained in ISO 27001 are implemented
G2 | B | Announcement [28] | Assess the strategy to be adopted when announcing the conclusion of the outsourcing agreement to the markets
G3 | B | Decision-making process (MCDA) [2] | Evaluate the implementation of the MCDA model in the decision-making process
G4 | B | Outsourcing process [47] | Follow the following steps in the outsourcing process: a) Planning; b) Implementation; c) Operational management; d) Closure
G5 | B | Recommendations (part 1) [34] | During the outsourcing process check that the following points are fulfilled: a) Collaboration on both sides to share technical knowledge, manpower and intellectual resources; b) Formulation of project specifications by the supplier; c) Checking the possibility that outsourcing gives access to products with higher added value; d) Pay attention to the understanding of compliance requirements; e) Maintaining certain processes in-house; f) Establishment of a knowledge transfer procedure; g) Implementing a safety procedure for risk reduction during knowledge transfer; h) Introduction of an information security plan to safeguard data; i) Development of strategic partnerships
G6 | B | Recommendations (part 2) [47] | During the outsourcing process check that the following points are fulfilled: a) Aligning security management with project management; b) Management of security requirements; c) Management of security risks; d) Improving usability; e) Supporting flexibility
G7 | A | Business impact [41] | Assess the business impact of the following situations: a) Cloud auditing; b) Management of heterogeneous services; c) Coordination of parties involved; d) Management of relations between customers and suppliers; e) Data localisation and transfer; f) Security mismanagement; g) Integration of multiple suppliers; h) Subcontracting management

Table 4. Security controls.

ID | Type | Name | Description
S1 | A-B | Security issues [36] | Evaluate security problems according to the following dimensions: a) Organisational; b) Legal; c) Technical
S2 | A-B | FMEA [36] | Evaluate the possibility of using the FMEA to develop a comprehensive and all-encompassing framework for assessing safety risks and the causes of safety errors in outsourcing projects
S3 | B | Problems in the outsourcing process [6] | Pay attention to the following issues during the outsourcing process: a) Supplier's ability to comply with the client's security policies, standards, and processes; b) Audit of outsourced IT operations; c) Audit of the supplier's staffing process; d) Clarity of data management; e) Completeness of analysis of outsourcing decisions; f) Congruence between supplier's and customer's culture; g) Difficulty in monitoring conflict of interest; h) Dissipation of supplier knowledge; i) Diversity of jurisdictions and laws; j) Inability of an outsourcing provider to have in-depth knowledge of the client's business processes; k) Financial feasibility of information technology outsourcing; l) Governance ethics of the outsourcing provider's environment; m) Inability to change information security requirements; n) Inability to leverage business knowledge on follow-on projects; o) Inability to develop new skills; p) Competence in information security of the supplier; q) Credibility of the outsourcing provider's information security; r) Legal and judicial framework of supplier outsourcing; s) Legal and regulatory compliance; t) Quality of the supplier's personnel; u) Right balance of access; v) Technical complexity of the customer's IT operations; w) Technological maturity of the supplier's environment; x) Supplier billing transparency; y) Confidence that the supplier applies appropriate security controls; z) Confidence that the supplier will not misuse the customer's proprietary information and knowledge

Table 5. Benefits and disadvantages controls.

ID | Type | Name | Description
V1 | A-B | Benefits [47] | Verify the possibility of achieving or actual achievement of the following benefits: a) Cost reduction; b) Concentration on core business activities; c) Improved flexibility; d) Quality improvement; e) Reduction of routine activities; f) Reduction of obsolescence risk
V2 | A-B | Risks [21, 24, 28, 44] | Pay attention to the following sources of risk: a) Impact on internal IT staff; b) Misrepresentation by suppliers; c) Breach of contract by supplier; d) Cultural differences between customer and supplier; e) Difficulties in remote management of different teams; f) Excessive transaction costs; g) Hidden costs; h) Inability to manage the relationship with suppliers; i) Inflexible contracts; j) Violation of intellectual property rights; k) Lack of trust; l) Loss of autonomy and control over decisions; m) Loss of control over data; n) Loss of control over supplier; o) Loss of internal capacity; p) No overall cost savings; q) Unpatriotic perception (for offshore outsourcing); r) Provider with low capacity, financial stability, cultural fit; s) Security/privacy breach; t) Turnover or shortage of supplier's employees; u) Inexperienced supplier's employees; v) Supplier's employees with poor communication skills; w) Supplier ceases activity; x) Supplier has too much power over customer; y) Failure of transition; z) Treatment of customer as undifferentiated commodity; a1) Uncontrollable contract growth; b1) Supplier lock-in with high switching costs

Table 6. Outsourcing methods controls.

ID | Type | Name | Description
E1 | B | Reasons [34, 40, 42] | Analyse the following motivations that may lead to the choice of cloud computing: a) Economic; b) Maintenance; c) Demand; e) Non-core IT operations
E2 | B | Challenges [34] | Assess the following challenges that arise when adopting cloud computing: a) Privacy and security; b) Maturity and performance; c) Compliance and data sovereignty; d) Lack of standards
E3 | B | Perspectives [17, 18, 28, 36, 43] | Assess the possibility of offshoring taking into account the following perspectives: a) Costs; b) Cultural distance; c) Innovation; d) Knowledge transfer
E4 | B | Risks [27] | Assess the following risks resulting from offshoring: a) Lack of top management commitment; b) Poorly communicated requirements; c) Language barriers in project communication; d) Inadequate user involvement; e) Lack of project management know-how by the client; f) Failure to manage end-user expectations; g) Lack of commercial know-how on the part of the offshore team; h) Poor exchange controls; i) Lack of required technical know-how on the part of the offshore team; j) Lack of consideration of all costs; k) Telecommunications and infrastructure problems; l) Vendor feasibility; m) Difficulty in ongoing support and maintenance; n) Low visibility of the project process; o) Transnational cultural differences; p) High turnover of the vendor's employees; q) Constraints due to time zone differences; r) Lack of continuous face-to-face interactions between team members; s) Threats to the security of information resources; t) Negative impact on employee morale; u) Unfamiliarity with international and foreign contract law; v) Differences in development methodology/processes; w) Political instability in offshore destinations; x) Negative impact on image of client organisation; y) Currency fluctuations
E5 | B | Drivers [16] | In the choice of quasi-outsourcing assess: a) presence of large industrial groups; b) co-management systems; c) powerful trade union groups

Table 7. Determinants

DETERMINANTS
ID | Type | Name | Description
D1 | B | Needs [28] | Check whether the company outsources when the following needs arise: a) Reduce costs b) Focus on core capabilities c) Access to expertise d) Improving business/process performance e) Technical reasons f) Flexibility g) Political reasons h) Catalyst for change i) Commercial exploitation l) Scalability m) Access to global markets n) Alignment of IS with business strategy o) Cost predictability p) Workforce reduction q) Need to generate liquidity r) Rapid delivery s) Innovation
D2 | A | Measures [28] | To measure the success of outsourcing assess: a) the metrics that are used to define financial performance b) defects and improvements in information management c) time, costs, quality
D3 | A | Determinants of success [28] | Evaluate the following points to analyse outsourcing and its success: a) IT outsourcing decision b) Contractual governance c) Relational governance

6 Conclusions and Future Research

The research work presented aims at creating an evaluation model for IT outsourcing based on the academic literature. The search for academic articles led to the selection of 46 contributions, which identified six main topics through which IT outsourcing can be studied: Market, IT governance, Security, Outsourcing methods, Benefits and disadvantages, and Determinants. The study of the market, i.e. of the players involved (customers and suppliers), highlighted the particularities of small and medium-sized enterprises and of public sector organisations; IT governance, and how it has changed with the advent of outsourcing, was also studied in depth. Information security must be guaranteed throughout the IT outsourcing process. The continuous development and increasing importance of IT has led to the emergence of new frontiers such as cloud computing, offshoring and quasi-outsourcing. Finally, outsourcing is a process and as such has costs, benefits and risks, which have been duly analysed.

The theoretical evidence has led to the creation of an evaluation model that guides and helps companies in the evaluation of IT outsourcing, both when they want to analyse whether to carry out an outsourcing process and when one has already been carried out. A checklist of 39 check elements has been created, each of which may or may not be applicable depending on the purpose of the analysis and the type of company. Following compilation, a percentage assessment of the company's compliance with the controls carried out is obtained.


The research presented can be deepened from the point of view of both the literature and the model. The 46 selected articles give a sufficiently complete picture of IT outsourcing and its characteristics, but expanding the number of articles would allow the topic to be explored further. For example, IT outsourcing could be analysed in relation to other types of companies, in addition to SMEs and the public sector, or the outsourcing model could be explored in more detail, especially in relation to the most innovative solutions that may emerge in the next few years. As regards the model, it could first be tested on a sample of companies to determine its effectiveness. With a positive result, it could become a tool that is as easy to apply as it is useful at company level. In addition, since this is a basic model in which the applicability of each control to the company using it must be verified, it could be customised by sector or type of company and made available automatically before the checklist is compiled. In conclusion, the analysis carried out shows that IT outsourcing is a complex and delicate subject. Although in many cases this corporate function is considered non-core and can therefore be outsourced, it has an impact on all the others, as IT now involves all departments in an organisation. For this reason, it requires an in-depth study, tailored to the company in question, to achieve the best possible result.

Acknowledgments. This work is supported by the fund “Progetto di Eccellenza” of the Department of Computer, Control and Management Engineering “Antonio Ruberti” of Sapienza University of Rome. The department has been designated by the Italian Ministry of Education (MIUR) as a “Department of Excellence” for advanced training programs in the field of cybersecurity.

References 1. Khalfan, A.M.: Information security considerations in IS/IT outsourcing projects: a descriptive case study of two sectors. Int. J. Inf. Manage. 24(1), 29–42 (2004). https://doi.org/10. 1016/j.ijinfomgt.2003.12.001 2. Ensslin, L., Mussi, C.C., Dutra, A., Ensslin, S.R., Demetrio, S.N.: Management support model for information technology outsourcing. J. Glob. Inf. Manag. 28(3), 123–147 (2020).https:// doi.org/10.4018/JGIM.2020070107 3. Annarelli, A., Colabianchi, S., Nonino, F., Palombi, G.: The effectiveness of outsourcing cybersecurity practices: a study of the Italian context. In: Proceedings of the Future Technologies Conference (FTC), K. Arai, Ed. Springer, Cham, pp. 17–31 (2022) 4. Androsova, I., Simonenko, E.: Strategic possibilities of IT-outsourcing in the organizations activities in the context of globalization. SHS Web Conf. 92, 05001 (2021).https://doi.org/ 10.1051/shsconf/20219205001 5. Annarelli, A., Clemente, S., Nonino, F., Palombi, G.: Effectiveness and adoption of NIST managerial practices for cyber resilience in Italy. In: Intelligent Computing. Lecture Notes in Networks and Systems, K. Arai, Ed. Springer, Cham, pp. 818–832https://doi.org/10.1007/ 978-3-030-80129-8_55 6. Dhillon, G., Syed, R., de Sá-Soares, F.: Information security concerns in IT outsourcing: Identifying (in) congruence between clients and vendors. Inf. Manag. 54(4), 452–464 (2017).https://doi.org/10.1016/j.im.2016.10.002 7. Annarelli, A., Nonino, F., Palombi, G.: Understanding the management of cyber resilient systems. Comput. Ind. Eng. 149, 106829 (2020). https://doi.org/10.1016/j.cie.2020.106829


8. Armenia, S., Angelini, M., Nonino, F., Palombi, G., Schlitzer, M.F.: A dynamic simulation approach to support the evaluation of cyber risks and security investments in SMEs. Decis. Support Syst. 147, 113580 (2021). https://doi.org/10.1016/j.dss.2021.113580 9. Tukker, A.: Product services for a resource-efficient and circular economy - a review. J. Clean. Prod. 97, 76–91 (2015).https://doi.org/10.1016/j.jclepro.2013.11.049 10. Annarelli, A., Battistella, C., Nonino, F.: Product service system: a conceptual framework from a systematic review. J. Clean. Prod. 139, 1011–1032 (2016).https://doi.org/10.1016/j.jcl epro.2016.08.061 11. Martinsons, M.G.: Outsourcing information systems: a strategic partnership with risks. Long Range Plann. 26(3), 18–25 (1993).https://doi.org/10.1016/0024-6301(93)90003-X 12. Shepherd, A.: Outsourcing IT in a changing world. Eur. Manag. J. 17(1), 64–84 (1999). https:// doi.org/10.1016/S0263-2373(98)00064-4 13. Udo, G.G.: Using analytic hierarchy process to analyze the information technology outsourcing decision. Ind. Manag. Data Syst. 100(9), 421–429 (2000).https://doi.org/10.1108/026355 70010358348 14. Barthélemy, J.: The hidden costs of IT outsourcing. MIT Sloan Manag. Rev. 42(3), 60–69 (2001) 15. Rohde, F.H.: IS/IT outsourcing practices of small- and medium-sized manufacturers. Int. J. Account. Inf. Syst. 5(4), 429–451 (2004).https://doi.org/10.1016/j.accinf.2004.04.006 16. Barthélemy, J., Geyer, D.: An empirical investigation of IT outsourcing versus quasioutsourcing in France and Germany. Inf. Manag. 42(4), 533–542 (2005).https://doi.org/10. 1016/j.im.2004.02.005 17. Erber, G., Sayed-Ahmed, A.: Offshore outsourcing. a global shift in the present IT Industry. Intereconomics. 40(2), 100–112 (2005) 18. Dutta, A., Roy, R.: Offshore outsourcing: a dynamic causal model of counteracting forces. J. Manag. Inf. Syst. 22(2), 15–35 (2005).https://doi.org/10.1080/07421222.2005.11045850 19. Florin, J., Bradford, M., Pagach, D.: Information technology outsourcing and organizational restructuring: an explanation of their effects on firm value. J. High Technol. Manag. Res. 16(2), 241–253 (2005).https://doi.org/10.1016/j.hitech.2005.10.007 20. Yoon, Y., Im, K.S.: An evaluation system for IT outsourcing customer satisfaction using the analytic hierarchy process. J. Glob. Inf. Manag. 13(4), 55–76 (2005).https://doi.org/10.4018/ jgim.2005100103 21. Dhar, S., Balakrishnan, B.: Risks, benefits, and challenges in global IT outsourcing. J. Glob. Inf. Manag. 14(3), 59–89 (2006).https://doi.org/10.4018/jgim.2006070104 22. Gonzalez, R., Gasco, J., Llopis, J.: Information systems outsourcing: a literature analysis. Inf. Manag. 43(7), 821–834 (2006).https://doi.org/10.1016/j.im.2006.07.002 23. Gottschalk, P., Solli-Sæther, H.: Maturity model for IT outsourcing relationships. Ind. Manag. Data Syst. 106(2), 200–212 (2006).https://doi.org/10.1108/02635570610649853 24. Bi, L.: Managing the risks of IT outsourcing. J. Corp. Account. Financ. 18(5), 65–69 (2007). https://doi.org/10.1002/jcaf.20325 25. Foogooa, R.: IS outsourcing – a strategic perspective. Bus. Process Manag. J. 14(6), 858–864 (2008).https://doi.org/10.1108/14637150810916035 26. Gonzalez, R., Gasco, J., Llopis, J.: A descriptive analysis of IT outsourcing configuration. Int. J. Manag. Enterp. Dev. 5(6), 656 (2008).https://doi.org/10.1504/IJMED.2008.021188 27. 
Nakatsu, R.T., Iacovou, C.L.: A comparative study of important risk factors involved in offshore and domestic outsourcing of software development projects: a two-panel Delphi study. Inf. Manag. 46(1), 57–68 (2009).https://doi.org/10.1016/j.im.2008.11.005 28. Lacity, M.C., Khan, S.A., Willcocks, L.P.: A review of the IT outsourcing literature: Insights for practice. J. Strateg. Inf. Syst. 18(3), 130–146 (2009). https://doi.org/10.1016/j.jsis.2009. 06.002


29. Lacity, M.C., Willcocks, L.P., Khan, S.: Beyond transaction cost economics: towards an endogenous theory of information technology outsourcing. J. Strateg. Inf. Syst. 20(2), 139– 157 (2011).https://doi.org/10.1016/j.jsis.2011.04.002 30. Qu, W.G., Pinsonneault, A.: Country environments and the adoption of IT outsourcing. J. Glob. Inf. Manag. 19(1), 30–50 (2011).https://doi.org/10.4018/jgim.2011010102 31. Carlsson, S., Johansson, B.: IS/ICT outsourcing decision project in a large public organization: a case study. J. Decis. Syst. 20(2), 137–162 (2011).https://doi.org/10.3166/jds.20.137-162 32. Chang, Y.B., Gurbaxani, V.: Information technology outsourcing, knowledge transfer, and firm productivity: an empirical analysis. MIS Q. 36(4), 1043 (2012).https://doi.org/10.2307/ 41703497 33. Chang, S.-I., Yen, D.C., Ng, C.S.-P., Chang, W.-T.: An analysis of IT/IS outsourcing provider selection for small- and medium-sized enterprises in Taiwan. Inf. Manag. 49(5), 199–209 (2012).https://doi.org/10.1016/j.im.2012.03.001 34. Dhar, S.: From outsourcing to Cloud computing: evolution of IT services. Manag. Res. Rev. 35(8), 664–675 (2012).https://doi.org/10.1108/01409171211247677 35. Järveläinen, J.: Information security and business continuity management in interorganizational IT relationships. Inf. Manag. Comput. Secur. 20(5), 332–349 (2012). https://doi.org/ 10.1108/09685221211286511 36. Nassimbeni, G., Sartor, M., Dus, D.: Security risks in service offshoring and outsourcing. Ind. Manag. Data Syst. 112(3), 405–440 (2012).https://doi.org/10.1108/02635571211210059 37. Swar, B., Moon, J., Khan, G.F.: Public sectors’ perception on critical relationship factors in IS/IT outsourcing: analysis of the literature and a Delphi examination. Int. J. Serv. Technol. Manag. 17(1), 1 (2012).https://doi.org/10.1504/IJSTM.2012.048032 38. Deng, C.-P., Mao, J.-Y., Wang, G.-S.: An empirical study on the source of vendors’ relational performance in offshore information systems outsourcing. Int. J. Inf. Manage. 33(1), 10–19 (2013).https://doi.org/10.1016/j.ijinfomgt.2012.04.004 39. Han, H.-S., Lee, J.-N., Chun, J.U., Seo, Y.-W.: Complementarity between client and vendor IT capabilities: an empirical investigation in IT outsourcing projects. Decis. Support Syst. 55(3), 777–791 (2013).https://doi.org/10.1016/j.dss.2013.03.003 40. Yigitbasioglu, O.M., Mackenzie, K., Low, R.: Cloud Computing: how does it differ from IT outsourcing and what are the implications for practice and research? Int. J. Digit. Account. Res. 13 (2013). https://doi.org/10.4192/1577-8517-v13_4 41. Bachlechner, D., Thalmann, S., Maier, R.: Security and compliance challenges in complex IT outsourcing arrangements: a multi-stakeholder perspective. Comput. Secur. 40, 38–59 (2014).https://doi.org/10.1016/j.cose.2013.11.002 42. Galo¸s, M., Sìrbu, J.: Cloud computing the new paradigm of IT outsourcing. Qual. Access Success, 16, 180–187 (2015) 43. Liang, H., Wang, J.-J., Xue, Y., Cui, X.: IT outsourcing research from 1992 to 2013: a literature review based on main path analysis. Inf. Manag. 53(2), 227–251 (2016).https://doi.org/10. 1016/j.im.2015.10.001 44. Liu, J.Y.-C., Yuliani, A.R.: Differences between clients’ and vendors’ perceptions of IT outsourcing risks: project partnering as the mitigation approach. Proj. Manag. J. 47(1), 45–58 (2016).https://doi.org/10.1002/pmj.21559 45. Chang, Y.B., Gurbaxani, V., Ravindran, K.: Information technology outsourcing: asset transfer and the role of contract. MIS Q. 
41(3), 959–973 (2017).https://doi.org/10.25300/MISQ/2017/ 41.3.13 46. Liu, C., Huang, P., Lucas, H.: IT centralization, security outsourcing, and cybersecurity breaches: evidence from the U.S. higher education. In: ICIS Proceedings (2017) 47. Almutairi, M., Riddle, S.: State of the art of IT outsourcing and future needs for managing its security risks. In: 2018 International Conference on Information Management and Processing (ICIMP), pp. 42–48 (2018). https://doi.org/10.1109/ICIMP1.2018.8325839


48. Das, A., Grover, D.: Biased decisions on IT outsourcing: how vendor selection adds value. J. Bus. Strategy, 39(5), 31–40 (2018).https://doi.org/10.1108/JBS-03-2018-0039 49. Puspitasari, R., Yudhoatmojo, S.B., Hapsari, I.C., Hidayanto, A.N.: Analysis of success level and supporting factors of IT outsourcing implementation: a case study at PT Bank Bukopin Tbk. In: 2018 International Conference on Computing, Engineering, and Design (ICCED), pp. 75–80 (2018). https://doi.org/10.1109/ICCED.2018.00024 50. Karimi-Alaghehband, F., Rivard, S.: IT outsourcing success: a dynamic capability-based model. J. Strateg. Inf. Syst. 29(10), 101599 (2020). https://doi.org/10.1016/j.jsis.2020.101599 51. Marco-Simó, J.M., Pastor-Collado, J.A.: IT outsourcing in the public sector: a descriptive framework from a literature review. J. Glob. Inf. Technol. Manag. 23(1), 25–52 (2020).https:// doi.org/10.1080/1097198X.2019.1701357 52. Kranz, J.: Strategic innovation in IT outsourcing: exploring the differential and interaction effects of contractual and relational governance mechanisms. J. Strateg. Inf. Syst. 30(1), 101656 (2021). https://doi.org/10.1016/j.jsis.2021.101656 53. Prawesh, S., Chari, K., Agrawal, M.: Industry norms as predictors of IT outsourcing behaviors. Int. J. Inf. Manage. 56, 102242 (2021). https://doi.org/10.1016/j.ijinfomgt.2020.102242

Immunizing Files Against Ransomware with Koalafied Immunity William Hutton(B) Richland, WA 99352, USA [email protected] https://www.linkedin.com/in/william-hutton-03665393/

Abstract. Without backups, victims of ransomware are often forced to pay a ransom to recover critical files. Some ransomware only encrypts the first 256 K or 1 MB of a file. Could the first 1 MB of a file simply be appended to the end of the file and then used to restore files encrypted by these types of ransomware? This paper describes such an approach, called “Koalafied Immunity”, and experiments conducted to evaluate its effectiveness. The disruptive impact of Koalafied Immunity is also examined.

Keywords: Computer security · Computer crime · Information security

1 Introduction

Recorded Future is a private cyber security company that interviewed a member of the BlackMatter ransomware group [1]. This interview is a subject of discussion during episode 830 of Steve Gibson’s “Security Now” podcast on August 3, 2021 [2]. While discussing the performance of various ransomware programs, it was revealed that the speed of the LockBit 2.0 ransomware is achieved with the heuristic of only encrypting the first 256 K of a file. Likewise, the BlackMatter ransomware only encrypts the first 1 MB of each file. Until this revelation, it had always appeared there were only three ways to recover from a ransomware attack:
1. Restore files from backups.
2. Pay the ransom.
3. Recreate the files from scratch.
With this new information that the entire file is not encrypted, could there be a fourth way to defeat ransomware? What if each file could contain a backup copy of its first 1 MB at the end of the file? This copy could be used to restore the first 1 MB of a file if it became encrypted by ransomware or otherwise corrupted. This


would result in qualified immunity against ransomware that does not encrypt the entire file. Hence the name, Koalafied Immunity. A key benefit of Koalafied Immunity is that other than scheduling it to run periodically, it requires no further user actions. Small to medium sized organizations are the perfect ransomware victims because they often lack mature information technology and cyber security procedures, processes, and controls to deter, detect, or mitigate a ransomware attack. Small to medium sized organizations are also likely to feel tremendous external pressures to quickly recover and have the necessary resources to pay a ransom (e.g. insurance). Koalafied Immunity automatically restores the availability and integrity of protected files momentarily compromised by ransomware with no user intervention. To determine the validity of this approach, a proof of concept was implemented in Python which is available at https://github.com/gar0u/ koalafiedimmunity. The rest of this paper carefully considers the Koalafied Immunity algorithm and its impact on mitigating ransomware.

2 Koalafied Immunity

The simplest explanation of Koalafied Immunity is that it appends a backup copy of the first 1 MB of a file to the end of the file with no noticeable change for the user of the file. This approach is expected to work because most operating systems abstract physical storage devices into logical blocks for efficiency. For example, Microsoft Windows operating systems implement the NTFS (New Technology File System) format for reading and writing files to external storage devices like traditional spinning hard disk drives and newer solid state drives. The logical block size of NTFS is 4,096 bytes; for simplicity, round that to 4 K. To write a 5 K file to disk, the operating system uses two logical blocks totalling 8 K, because one logical block is not large enough to hold the 5 K of actual data. The remaining 3 K, which are wasted, are known as “slack space”. It may be helpful to think of a file’s data as a linked list: each logical unit of storage points to the next logical unit. This should allow additional information to be written at the end of the file without negatively affecting the functionality of the file. The last item in the list terminates and does not know about the backup copy of the first 1 MB of the file that follows it. The file system knows it is there, but does not distinguish one logical block from another, because the file was modified using OS calls via Python. Begin with a naive algorithm for the sake of simplicity, then add to it to fully develop Koalafied Immunity. The first edge case is how to handle files that are smaller than 1 MB. Algorithm 1 pads files smaller than 1 MB with zero bytes (i.e. 0x00) until the file is 1 MB in size. Then the first 1 MB of the file is appended to the end of the file. This makes the file immune to ransomware that only encrypts the beginning of a file and not the entire file. Ideally, Koalafied Immunity would run automatically at some set time interval to ensure newly created or recently modified files are quickly protected.


if file < 1 MB then
    while file < 1 MB do
        Pad file with 0x00
    end
end
M = first 1 MB of file
file = file + M

Algorithm 1: Handle Files Smaller than 1 MB

This results in ever increasing file sizes, as additional copies of the first 1 MB of the file are appended to the end of the file each time the program is run. To solve this issue, in Algorithm 2 a fail safe string, e.g., “8K310279Q4A6J”, is appended to the end of the file.

FAILSAFE = “8K310279Q4A6J”
if file < 1 MB then
    while file < 1 MB do
        Pad file with 0x00
    end
end
M = first 1 MB of file
file = file + M + FAILSAFE

Algorithm 2: Check for Fail Safe String

One must be careful not to overwrite the backup copy of the first 1 MB of the file if a previously protected file is encrypted as the result of a ransomware attack. To overcome this problem, the end of the file is checked for the FAILSAFE string, and then the first and last 1 MB of the file are compared. If they differ, instead of copying the first 1 MB, the backup copy of the first 1 MB at the end of the file is restored, resulting in Algorithm 3.

2.1 What if Ransomware Encrypts the Entire File?

Cyber attack and defense, especially detection, are co-evolutionary. Defenders use previous attacks to look for indicators of compromise, in a way fighting the last war rather than the next one. Meanwhile, attackers are aware of current security policies and of the technical security controls used to enforce those policies, and they develop new tools, tactics, and procedures to evade the defenders' policies and controls. Taking a lesson from physical conflict, it is understood that adversaries tend to operate in one of two modes: stealth, to avoid detection, or speed, to accomplish their goal before an adequate response can be mustered to deny them that goal. Once detected, the adversary has no choice but to switch from stealth to speed.

FAILSAFE = “8K310279Q4A6J”
if file < 1 MB then
    while file < 1 MB do
        Pad file with 0x00
    end
end
if (EOF contains FAILSAFE) and (first 1 MB != last 1 MB) then
    # Restore file
    first 1 MB = last 1 MB - FAILSAFE
else
    # Protect file
    M = first 1 MB of file
    file = file + M + FAILSAFE
end

Algorithm 3: Automatically Restore Ransomed Files Otherwise Protect File
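For illustration, a minimal Python sketch of Algorithms 1-3 is given below. It assumes 1 MB means 2^20 bytes and reuses the fail safe string shown above; the published proof of concept may differ in details such as I/O strategy and error handling.

import os

ONE_MB = 1024 * 1024
FAILSAFE = b"8K310279Q4A6J"  # marker from the text above; an implementation may choose another value

def protect_or_restore(path):
    """Apply Algorithm 3 to one file: restore it if it was ransomed,
    otherwise append a backup copy of its first 1 MB plus the fail safe string."""
    with open(path, "r+b") as f:
        data = f.read()
        # Algorithm 1: pad files smaller than 1 MB with zero bytes.
        if len(data) < ONE_MB:
            data += b"\x00" * (ONE_MB - len(data))
            f.seek(0)
            f.write(data)
        if data.endswith(FAILSAFE):
            backup = data[-(ONE_MB + len(FAILSAFE)):-len(FAILSAFE)]
            if data[:ONE_MB] != backup:
                # Restore: the head was overwritten, so copy the backup back.
                f.seek(0)
                f.write(backup)
        else:
            # Protect: append the first 1 MB and the fail safe string.
            f.seek(0, os.SEEK_END)
            f.write(data[:ONE_MB] + FAILSAFE)

if __name__ == "__main__":
    import sys
    for name in os.listdir(sys.argv[1]):
        full = os.path.join(sys.argv[1], name)
        if os.path.isfile(full):
            protect_or_restore(full)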

Given the ransomware heuristic of only encrypting the beginning of a file, it would seem the adversaries have already evaluated the question of stealth vs. speed and have chosen speed. Their goal is to encrypt as many files as possible before they are detected. Using the Cyber Kill Chain [3] to model a ransomware attack, and assuming detection does not occur until the final step (i.e. actions on objectives), detection could be facilitated through the use of any number of existing technical security controls, such as:

– Anomaly detection (e.g., increased CPU usage due to the encryption algorithm)
– File modification (e.g., bytes changed due to encryption, the flastmod attribute, etc.)

Of course, with good cyber hygiene and continuous monitoring, one could also detect the attack at any of the previous six steps of the Cyber Kill Chain and take actions to prevent the attacker from accomplishing their goal of encrypting files and soliciting a ransom. Thus, attackers must resort to speed as their preferred tactic, and it is unlikely that ransomware will evolve to encrypt an entire file just to negate the benefits of Koalafied Immunity. Even if Koalafied Immunity forces attackers to encrypt entire files, it still provides the benefit of added delay during the ransomware attack, in the increased time it takes to fully encrypt each file. The resulting delay limits the damage done by ransomware, assuming the same probability of detection using the methods mentioned above.

3 Experiment Design

This paper explains a simple proof of concept based on two hypotheses. The first hypothesis is that all ransomware only encrypts the beginning of a file. The second hypothesis is that appending additional data to the end of the file has no negative impact on the function of the file.


The first hypothesis is addressed in the Future Work section below. The second hypothesis was evaluated by populating a test directory with many different types of files (see Table 1), running Koalafied Immunity as described in Algorithm 3 to transform each file, and then accessing each file with its intended application as a normal user to confirm that the file still functions as expected. Multiple file types of various sizes were used to evaluate the second hypothesis. After applying Koalafied Immunity to all the files in a test directory, a second Python program was run that walked the test directory, simulating the encryption function of ransomware by overwriting the first 256 K or 1 MB of each file (with equal probability) with randomly generated data. Each file in the test directory was again accessed using its associated application; as expected, each file was now unusable. Lastly, Koalafied Immunity was run a second time. This time, the algorithm detects that the fail safe string is present and that the first 1 MB and last 1 MB of the file differ; therefore, the backup copy of the first 1 MB of the file is used to restore the file. Walking the test directory a third time confirms that the files again work normally and the damage from the simulated ransomware is repaired. If Koalafied Immunity were scheduled to run every 60 s, a normal user would never even know they had temporarily been the victim of ransomware.
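The ransomware-simulation program is described above but not listed; a minimal sketch consistent with that description, with the target directory taken from the command line, might look as follows.

import os
import random
import sys

SIZES = [256 * 1024, 1024 * 1024]  # overwrite the first 256 K or 1 MB with equal probability

def simulate_ransomware(directory):
    """Overwrite the beginning of every file with random data, mimicking
    ransomware that only encrypts the head of each file."""
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            n = min(random.choice(SIZES), os.path.getsize(path))
            with open(path, "r+b") as f:
                f.write(os.urandom(n))

if __name__ == "__main__":
    simulate_ransomware(sys.argv[1])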

4 Results

There was no observable impact for most files, even small ones that required padding with null bytes to increase their size to 1 MB. Microsoft applications detected file corruption and offered to attempt to recover the file; recovered files functioned normally. Understandably, the only file type that was negatively impacted by appending the first 1 MB to the end of the file was ASCII text, where the data is visibly duplicated: “Hello, world!\n” appears as “Hello, world!\nHello, world!\n” with 986 null bytes (i.e. 0x00) between each instance of “Hello, world!\n”.

5 Future Work

A cross-sectional study (i.e. an analysis of more ransomware at a specific time without manipulating the study environment) is needed to understand how common the tactic of not encrypting a whole file is, and thereby to quantify the protection provided by Koalafied Immunity. As previously stated, assume attackers use the tactic of speed to do as much damage as possible in the shortest amount of time before they are detected. This should maximize the probability of payment, which is the value function of the ransomware for the attacker. The emerging threat of exfiltration, along with the usual denial of availability, is an indication that payment is the primary goal of ransomware. More robust testing with additional file types and various file sizes should also be performed to validate the second hypothesis that appending data to the end of the file does not break normal functionality. Testing on common operating systems (e.g., Microsoft Windows, Apple OSX, and Linux) should also be performed.

Table 1. File functionality after applying Koalafied Immunity

File type                         Extension  Small   Large     Result
Adobe Portable Document Format    pdf                3.2 MB    Pass
Adobe Portable Document Format    pdf                11.3 MB   Pass
Adobe Portable Document Format    pdf                36.1 MB   Pass
Apple Quicktime                   mov                62 MB     Pass
ASCII Text                        txt                3.5 MB    FAIL
ISO 9660                          iso                15 MB     Pass
Microsoft PowerPoint              pptx               1.4 MB    Pass*
Microsoft Word                    docx               2.1 MB    Pass*
MPEG-4 Part 14                    m4v                141.5 MB  Pass
PKWARE archive                    zip                2.6 MB    Pass
PKWARE archive                    zip                16 MB     Pass
Portable Network Graphic          png                1.5 MB    Pass
Adobe Portable Document Format    pdf        584 K             Pass
ASCII Text                        txt        6 B               FAIL
ASCII Text                        txt        12 B              FAIL
ASCII Text                        txt        19 B              FAIL
Joint Photographic Expert Group   jpg        3 K               Pass
Joint Photographic Expert Group   jpg        242 K             Pass
Microsoft PowerPoint              pptx       168 K             Pass*
Microsoft Word                    docx       15 K              Pass*
Tape Archive                      tar        4 K               Pass
Tape Archive/GNU ZIP Archive      tar.gz     205 B             Pass

Finally, the current Python code is only a proof of concept. Software engineering best practices should be used to develop production code suitable for public use. An easy way to include or exclude directories is needed; currently, a single directory is specified via the command line. Lastly, user documentation is needed to instruct users how to schedule Koalafied Immunity. ...and maybe a cute logo, perhaps a koala bear holding a key?

6 Conclusion

Koalafied Immunity appears to be a valid mitigation to certain types of ransomware that only encrypt the beginning of a file. Additional research is needed to know how widespread this heuristic is in ransomware. Additional testing is necessary to understand the user impact of applying Koalafied Immunity to other


types of user files. The elegance of the solution, in that no backup-and-restore intervention is required to recover from ransomware and the protection is immediate and transparent to the user, seems to justify the slight increase in file size.

References

1. BlackMatter Ransomware Linux Variant Sample. https://www.tutorialjinni.com/blackmatter-ransomware-download-linux.html
2. Gibson, S.: “Security Now: The BlackMatter Interview”. Episode 830 transcript, 3 August 2021. https://www.grc.com/securitynow.htm
3. Lockheed Martin: “The Cyber Kill Chain”. https://www.lockheedmartin.com/en-us/capabilities/cyber/cyber-kill-chain.html

Measuring the Resolution Resiliency of Second-Level Domain Name Lanlan Pan1(B) , Ruonan Qiu1 , Anyu Wang1 , Minghui Yang1 , Yong Chen2 , and Anlei Hu2 1 Guangdong OPPO Mobile Telecommunications Corp. Ltd., Guangdong 518000, China

[email protected] 2 China Internet Network Information Center, Beijing 100190, China

Abstract. DNS resiliency concerns are growing. It is important to investigate the factors that can cause DNS resolution failure, including DDoS attacks, BGP operations, DNS server operations, domain name registrar operations, etc. We discuss metrics to measure the resolution resiliency of second-level domain names and perform the resiliency measurement on some well-known internet services. We also describe a stub resolver cache proposal to mitigate domain resolution failures and defend against DDoS attacks.

Keywords: DNS · Domain · Resiliency · Resolver · BGP · DDoS

1 Introduction

The domain name system (DNS) is a critical internet service that offers domain name resolution for internet services. DNS resolution failure can result in disruptions at prominent internet services such as Facebook, Twitter, and Weibo. However, many factors can cause DNS resolution failure, including DDoS attacks, BGP operations, DNS server operations, domain name registrar phishing, etc. Therefore, it is important to measure DNS resiliency in order to serve internet services better. The remainder of this paper is organized as follows. In Sect. 2, we introduce the DNS ecosystem. In Sect. 3, we discuss some typical DNS resolution failure cases. In Sect. 4, we give a brief overview of existing DNS resiliency measurement work. In Sect. 5, we describe in detail the metrics used to measure the resolution resiliency of a second-level domain name (SLD) and show our experiment. In Sect. 6, we describe a stub resolver cache proposal to improve DNS resolution resiliency on terminal devices. Finally, in Sect. 7, we conclude the paper.


2 DNS Ecosystem

As Fig. 1 shows, we take facebook.com as an example and briefly describe the components involved in DNS name registration below.

Fig. 1. DNS name registration (IANA and Root server (.), domain name registry and “com” TLD authoritative server, domain name registrar, service provider (Facebook) and “facebook.com” SLD authoritative server).

Domain Name Registration. Table 1 shows the registration information of the SLD “facebook.com”.

1. The service provider (Facebook) registers a second-level domain (SLD), “facebook.com”, and configures the NS record of “facebook.com” at the domain name registrar.
2. The domain name registrar submits the SLD’s information to the domain name registry.
3. The domain name registry synchronizes the SLD’s NS record to the authoritative server of the top-level domain (TLD) “com”.
4. The domain name registry submits the TLD’s information to IANA.
5. IANA synchronizes the TLD’s NS record to the Root server.

SLD Zone Data Configuration. The service provider (Facebook) configures the “facebook.com” zone data on its SLD authoritative servers ([a-d].ns.facebook.com). The zone data contains the “facebook.com” domain name’s resource records (RR) [1], such as NS, MX, CNAME, A, and AAAA.

Domain Name Resolution. Figure 2 shows the domain query path of “www.facebook.com”.

1. The user opens the Facebook application on a terminal device, such as a PC or mobile phone. The Facebook application sends the A/AAAA record domain query for “www.facebook.com” to the stub resolver of the terminal device.

Table 1. SLD’s Whois information (facebook.com).

Whois information of facebook.com:
Domain Name: FACEBOOK.COM
Registry Domain ID: 2320948_DOMAIN_COM-VRSN
Registrar WHOIS Server: whois.registrarsafe.com
Registrar URL: http://www.registrarsafe.com
Updated Date: 2021-10-18T18:07:40Z
Creation Date: 1997-03-29T05:00:00Z
Registry Expiry Date: 2031-03-30T04:00:00Z
Registrar: RegistrarSafe, LLC
Registrar IANA ID: 3237
Registrar Abuse Contact Email: [email protected]
Registrar Abuse Contact Phone: +1-650-308-7004
Domain Status: clientDeleteProhibited https://icann.org/epp#clientDeleteProhibited
Domain Status: clientTransferProhibited https://icann.org/epp#clientTransferProhibited
Domain Status: clientUpdateProhibited https://icann.org/epp#clientUpdateProhibited
Domain Status: serverDeleteProhibited https://icann.org/epp#serverDeleteProhibited
Domain Status: serverTransferProhibited https://icann.org/epp#serverTransferProhibited
Domain Status: serverUpdateProhibited https://icann.org/epp#serverUpdateProhibited
Name Server: A.NS.FACEBOOK.COM
Name Server: B.NS.FACEBOOK.COM
Name Server: C.NS.FACEBOOK.COM
Name Server: D.NS.FACEBOOK.COM
DNSSEC: unsigned
URL of the ICANN Whois Inaccuracy Complaint Form: https://www.icann.org/wicf/

Fig. 2. Domain name resolution (steps (1)–(7) among the Facebook application and stub resolver on the terminal device, the recursive resolver (home router, ISP DNS, public DNS), the Root server ([a-m].root-servers.net), the “com” TLD authoritative server ([a-m].gtld-servers.net), and the “facebook.com” SLD authoritative server ([a-d].ns.facebook.com)).


2. The stub resolver sends the recursive domain query for “www.facebook.com” to the recursive resolver.
3. The recursive resolver sends the iterative domain query for “www.facebook.com” to a Root server ([a-m].root-servers.net) [2] and gets the “com” TLD authoritative server list ([a-m].gtld-servers.net).
4. The recursive resolver sends the iterative domain query for “www.facebook.com” to a “com” TLD authoritative server ([a-m].gtld-servers.net) [3] and gets the SLD authoritative servers ([a-d].ns.facebook.com).
5. The recursive resolver sends the iterative domain query for “www.facebook.com” to a “facebook.com” SLD authoritative server and gets the A/AAAA record of “www.facebook.com”.
6. The recursive resolver returns the A/AAAA record of “www.facebook.com” to the stub resolver.
7. The stub resolver returns the A/AAAA record of “www.facebook.com” to the Facebook application.
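For illustration, the stub-resolver side of this query path (steps 2, 6 and 7) can be exercised with the third-party dnspython package; the sketch below simply asks the system's recursive resolver for the A/AAAA records of “www.facebook.com”.

import dns.resolver  # third-party package "dnspython"

def stub_query(qname):
    """Ask the configured recursive resolver for A and AAAA records,
    as a stub resolver on a terminal device would."""
    answers = {}
    for rdtype in ("A", "AAAA"):
        try:
            answer = dns.resolver.resolve(qname, rdtype)
            answers[rdtype] = [(rdata.address, answer.rrset.ttl) for rdata in answer]
        except (dns.resolver.NoAnswer, dns.resolver.NXDOMAIN):
            answers[rdtype] = []
    return answers

if __name__ == "__main__":
    for rdtype, records in stub_query("www.facebook.com").items():
        for address, ttl in records:
            print(rdtype, address, "ttl=%d" % ttl)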

3 DNS Resolution Failure

As Table 2 shows, many factors can cause DNS resolution failure, including false BGP operations, DDoS attacks, false DNS operations, tampering with an SLD’s NS record, etc.

Table 2. DNS failure.

Date        Failure case            Failure factor           Failure DNS component
2021-10-04  Facebook outage [4]     False BGP operation      SLD authoritative server
2016-10-21  DYN DDoS [5, 6]         DDoS attack              SLD authoritative server
2013-08-25  CN TLD DDoS [7, 8]      DDoS attack              TLD authoritative server
2011-05-30  www.qq.com outage [9]   False DNS operation      SLD authoritative server
2010-01-12  Baidu NS hijack [10]    Tamper SLD’s NS record   Domain name registrar
2009-05-18  Baofeng DDoS [11]       DDoS attack              ISP recursive resolver

October 2021 Facebook Outage. The “facebook.com” SLD authoritative servers automatically disable their BGP advertisements when they detect an unhealthy network connection to the data centers [4]. The problem is that, after the BGP advertisements were disabled, the rest of the internet could not reach the “facebook.com” SLD authoritative servers to make DNS queries. Due to this DNS failure, users could not visit “www.facebook.com” on their terminal devices, even though Facebook’s data centers themselves could still work.

October 2016 DYN DDoS. Numerous Internet of Things (IoT) devices had been infected with the Mirai malware [5, 6]. The attacker used the Mirai botnet to launch a DDoS attack against DYN, which serves as the SLD authoritative service for many prominent internet services such as Twitter, GitHub, and Netflix. The attack caused these prominent services to be unavailable to numerous users in Europe and North America.


August 2013 CN TLD DDoS. This was an attack on a TLD authoritative server that impacted some major CN internet services, such as weibo.cn [7, 8].

May 2011 www.qq.com Outage. A false DNS operation on the “qq.com” SLD authoritative server caused a NODATA error for “www.qq.com” for some hours [9].

Jan 2010 Baidu NS Hijack. The “baidu.com” SLD’s NS record was tampered with at the domain name registrar. The domain name registrar submitted the tampered NS record to the domain name registry, which synchronized it to the “com” TLD authoritative server. Numerous recursive resolvers then got the tampered NS record from the “com” TLD authoritative server [10].

May 2009 Baofeng DDoS. The Baofeng application’s SLD authoritative server went down; however, numerous Baofeng applications kept sending huge volumes of DNS queries to ISP recursive resolvers. Eventually the ISP recursive resolvers suffered an outage, which impacted numerous users in South China [11].

4 Related Work

There are many proposals to measure DNS resiliency and address DNS resolution failures. However, existing proposals can hardly assure the resolution resiliency of an SLD on the terminal device. We believe that the terminal device can provide more help in dealing with DNS resolution failures on SLDs.

4.1 Measure DNS Resiliency

Most existing resiliency measurement technologies focus on the health of the global DNS ecosystem and lack discussion of the resolution resiliency of SLDs.

DNS Resiliency Measurement. Casalicchio et al. [12] describe the Measuring Naming System (MeNSa), a framework designed to provide a formal methodology, metrics and tools for evaluating DNS health. Kröhnke et al. [13] provided a generic method to carry out DNS resilience analysis and identified bottlenecks and single points of failure that should be mitigated in order to improve resilience. Jiang et al. [14] propose a graph-based model to comprehensively analyze zone dependency in DNS.

Authoritative DNS Diversity Measurement. Akamai uses anycast combined with globally diverse physical name server locations, such as redundant network links, collocation with ISPs throughout the world, and robust peering arrangements [15]. Bates et al. [16] examine changes in domains’ tendency to “diversify” their pool of nameservers, which could supply redundancy and resilience in the event of an attack or service outage affecting one provider. Sommese et al. [17] characterized anycast adoption in the authoritative DNS infrastructure for TLDs and SLDs and found high adoption of anycast as a resilience mechanism, reaching 97% for TLDs and 62% for SLDs.


4.2 Address DNS Resolution Failures

Most existing resolution failure mitigation technologies focus on the configuration of the recursive resolver and the authoritative server, and on improvements to the resolution architecture. However, there is a lack of discussion on resolution resiliency improvement on the terminal device.

Recursive Resolver Cache Proposal. RFC 8767 [18] defines a method (serve-stale) for recursive resolvers to use stale DNS data to avoid outages when authoritative nameservers cannot be reached to refresh expired data, making the DNS more resilient to DoS attacks. RFC 8806 [19] recommends that recursive resolvers run a local copy of the root zone.

Authoritative Zone Data Configuration Proposal. Moura et al. [20] examined DNS TTLs and provide recommendations on TTL choice for different situations and on where TTLs must be configured.

Stub Resolver Distributing DNS Queries Proposal. Hounsel et al. [22] presented a refactored DNS resolver architecture that allows for de-centralized name resolution, preserving the benefits of encrypted DNS while satisfying other desirable properties, including performance and privacy.

5 Resolution Resiliency Measurement

Existing research discusses the resolution resiliency measurement of the global DNS system, especially the Root system. In this section, we define some metrics to measure the resolution resiliency of an SLD.

5.1 Domain Status Code Metrics

As Table 3 shows, the Extensible Provisioning Protocol (EPP) [23] defines status codes that prevent unauthorized changes to a second-level domain name (SLD). Obviously, the domain status code lock can help defend against incidents like the Jan 2010 Baidu NS hijack [10].

Table 3. Domain status code metrics.

Metric                     Description
clientDeleteProhibited     Prevents domain from being deleted without registrar credit
clientTransferProhibited   Prevents domain from being transferred without registrar credit
clientUpdateProhibited     Prevents domain from being updated without registrar credit
serverDeleteProhibited     Prevents domain from being deleted without registry credit
serverTransferProhibited   Prevents domain from being transferred without registry credit
serverUpdateProhibited     Prevents domain from being updated without registry credit
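These status codes can be read from public Whois data. The sketch below is a minimal illustration that shells out to a whois command-line client and scrapes the “Domain Status:” lines; field labels can vary between registries, so treat the parsing as an assumption rather than a complete checker.

import subprocess

def domain_status_codes(domain):
    """Return the EPP status codes published in the Whois record of an SLD."""
    out = subprocess.run(["whois", domain], capture_output=True, text=True, timeout=30)
    codes = set()
    for line in out.stdout.splitlines():
        if line.strip().startswith("Domain Status:"):
            # e.g. "Domain Status: clientDeleteProhibited https://icann.org/epp#..."
            parts = line.split("Domain Status:", 1)[1].split()
            if parts:
                codes.add(parts[0])
    return codes

if __name__ == "__main__":
    print(domain_status_codes("facebook.com"))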


5.2 NS Diversity Metrics

Considering the DNS resolution incidents [4, 5, 7, 9], resolution resiliency depends heavily on NS diversity and anycast deployment. As Table 4 shows, to avoid a single point of failure, the domain owner should deploy its NS in a distributed way. For example, facebook.com has 4 NS ([a-d].ns.facebook.com); however, the IP addresses of these NS servers belong to a single AS (AS32934, Facebook, Inc.). When AS32934 made a mistake in its BGP operation, global recursive resolvers could not resolve any subdomain of facebook.com from these NS servers [4]. The NS diversity of the Root service is much better: there are 13 root NS operated by 12 independent root server operators, with different IP addresses belonging to different AS numbers, and each root NS IP is anycast. There were 1469 root NS instances deployed across different countries as of 10/24/2021 [2].

Table 4. NS diversity metrics.

Metric                Description
NS count              The count of domain’s NS
NS IP count           The IP count of domain’s NS
NS IP AS count        The IP AS count of domain’s NS
NS IP country count   The IP country count of domain’s NS
NS IP anycast count   The count of domain’s anycast NS IP
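A minimal sketch for collecting the first two metrics with dnspython is shown below; mapping each NS IP to its AS number, country, and anycast status requires an external data source (for example a BGP/ASN or geolocation database) and is left out here.

import dns.resolver  # third-party package "dnspython"

def ns_diversity(sld):
    """Collect NS names and NS IP addresses for an SLD (NS count, NS IP count)."""
    ns_names = [str(r.target) for r in dns.resolver.resolve(sld, "NS")]
    ns_ips = set()
    for name in ns_names:
        for rdtype in ("A", "AAAA"):
            try:
                ns_ips.update(r.address for r in dns.resolver.resolve(name, rdtype))
            except (dns.resolver.NoAnswer, dns.resolver.NXDOMAIN):
                pass
    return {"NS count": len(ns_names), "NS IP count": len(ns_ips), "NS IPs": sorted(ns_ips)}

if __name__ == "__main__":
    print(ns_diversity("facebook.com"))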

5.3 TTL Configuration Metrics

TTL configuration is critical to a domain name’s resource records (RR) [20]. Table 5 shows six types of TTL, belonging to the authoritative service, the email service, and the web service. NS is the key role for the authoritative service, MX is the guide for the email service, and the “www” subdomain serves the default webpage of the SLD.

Table 5. TTL configuration metrics.

Metric           Description
NS TTL           The maximum TTL of domain’s NS RR
NS IP TTL        The maximum A/AAAA TTL of domain’s NS RR
MX TTL           The maximum TTL of domain’s MX RR
MX IP TTL        The maximum A/AAAA TTL of domain’s MX RR
www CNAME TTL    The minimum TTL of “www” subdomain’s CNAME RR
www IP TTL       The minimum TTL of “www” subdomain’s A/AAAA RR


NS. Because of the hierarchical design of DNS, the TTL configuration of NS and NS IP is critical for domain resolution resiliency. The TTL should not be configured too short for the record on the TLD or in the SLD’s own zone data. We recommend that the maximum TTL of NS should not be less than 86400 s, that the maximum TTL of NS IP should not be less than 3600 s for the SLD, and that the maximum TTL of NS IP should not be less than 600 s for the subdomains of the SLD.

Table 6. The TTL configuration of qq.com NS at 10/24/2021.

Authoritative server    NS              TTL type        TTL configuration  Response section  Description
[a-m].gtld-servers.net  ns[1-4].qq.com  NS TTL          172800             AUTHORITY         “qq.com” NS record on “com” TLD
[a-m].gtld-servers.net  ns[1-4].qq.com  NS IP A TTL     172800             ADDITIONAL        “qq.com” NS IP glue record on “com” TLD
[a-m].gtld-servers.net  ns[1,2].qq.com  NS IP AAAA TTL  172800             ADDITIONAL        “qq.com” NS IP glue record on “com” TLD
ns[1-4].qq.com          ns[1-4].qq.com  NS TTL          86400              AUTHORITY         Zone data on “qq.com” SLD
ns[1-4].qq.com          ns1.qq.com      NS IP A TTL     3600               ANSWER            Zone data on “qq.com” SLD
ns[1-4].qq.com          ns[2-4].qq.com  NS IP A TTL     600                ANSWER            Zone data on “qq.com” SLD
ns[1-4].qq.com          ns[1,2].qq.com  NS IP AAAA TTL  600                ANSWER            Zone data on “qq.com” SLD

750

L. Pan et al.

“com” TLD in every 3600 s. “amazon.com” made similar configuration on the TTL of NS. MX MX record specifies the mail server of SLD. Since the email address is commonly used as the register account name, the TTL configuration of MX and MX IP is important for many internet services. For example, the MX of google.com is crucial to Gmail service, and Gmail service is one of critical information infrastructure for numerous users. We recommend that the maximum TTL of MX should not less than 3600 s, and the maximum TTL of MX IP should not less than 600 s. “www” subdomain CDN always wants to change “www” subdomain IP rapidly. To lighten the query burden of global recursive resolvers, we recommend that the minimal TTL should not less than 60 s. 5.4 Measure Resolution Resiliency To measure resolution resiliency, we define a risk threshold for each resolution resiliency metric. As Table 7 shows for each domain query, we count the risks on its DNS resolution path as the resiliency measurement. As Table 8 shows for each second-level domain, we count the resiliency measurements based on the critical qtype and www subdomain. 5.5 Experiment Our experiment code can be found in [25]. Table 9 shows our measurement on some famous internet service: • • • •

“twitter.com” doesn’t configure clientDeleteProhibited and clientUpdateProhibited. “twitter.com” and “weibo.cn” make a better NS IP AS deployment. “weibo.cn” doesn’t make anycast NS, and NS servers are only deployed in China. “qq.com” and “amazon.com” configure different TTL for NS or NS IP, which may cause TTL overwrite of NS on recursive resolvers. • “google.com” has a better MX configuration. • As Table 10 shows, Akamai edge CDN service configures very short IP TTL (20 s) for “www.qq.com” and “www.amazon.com”. Note that the measurement may be different from different probe node, if the SLD’s authoritative server with Geo-Location based traffic management [24].

Measuring the Resolution Resiliency of Second-Level Domain Name

751

Table 7. Measure resolution resiliency. Method: measurement = measure_resolution_resiliency(qd, qt) qd: domain name to query qt: RR type to query measurement = 0 foreach domain name d on the resolution path of do foreach metric m in [NS Diversity Metrics, TTL Configuration Metrics] do risk = 0 skip if find duplicate check on metric . check the value of metric , set risk = 1 if risk. measurement = measurement + risk end end if qd is second-level domain foreach metric m in [Domain Status Code Metrics] do risk = 0 skip if find duplicate check on metric . check the value of metric , set risk = 1 if risk. measurement = measurement + risk end endif return measurement

6 Resolution Resiliency Improvement As mentioned in Sect. 5, we can make some resolution resiliency improvement, including authoritative DNS diversity, recursive resolver cache, zone data configuration, and distribution DNS queries, etc. However, there are few discussions about the resiliency improvement of stub resolver on terminal device, such as PC, Mobile Phone.

752

L. Pan et al. Table 8. Measure SLD resolution resiliency.

Method: measurement = measure_SLD_resolution_resiliency(qsld) qsld: second-level domain name to query. qsld_www: the www subdomain of qsld. measurement = 0 foreach qt in [ NS, MX ] do measurement = measurement + measure_resolution_resiliency(qsld, qt) end foreach qt in [ A, AAAA ] do measurement = measurement + measure_resolution_resiliency(qsld_www, qt) end return measurement

Table 9. Resolution resiliency measurement experiment. Class

Metric

facebook.com qq.com twitter.com amazon.com google.com weibo.cn

Domain status

clientDeleteProhibited

0

0

1

0

0

clientTransferProhibited

0

0

0

0

0

0

clientUpdateProhibited

0

0

1

0

0

0

serverDeleteProhibited

0

NS diversity

0

0

0

0

0

serverTransferProhibited 0

0

0

0

0

0

serverUpdateProhibited

0

0

0

0

0

0

clientDeleteProhibited

0

0

0

0

0

0

NS count

0

0

0

0

0

0

NS IP count

0

0

0

0

0

0

NS IP AS count

2

2

0

5

1

0

NS IP country count

0

0

0

0

0

1

NS IP anycast count

0

0

0

0

0

1

0

0

0

2

0

0

TTL NS TTL configuration NS IP TTL

ALL

0

0

1

0

0

0

0

MX TTL

0

0

1

1

1

1

MX IP TTL

1

0

1

1

1

1

www CNAME TTL

0

0

0

0

0

0

www IP TTL

0

1

0

1

0

0

ALL

3

4

4

10

3

4


Table 10. www IP TTL.

QDOMAIN          DNS response
                 OWNER                              TTL    CLASS  TYPE   RDATA
www.qq.com       www.qq.com                         300    IN     CNAME  news.qq.com.edgekey.net
                 news.qq.com.edgekey.net            21600  IN     CNAME  e6156.dscf.akamaiedge.net
                 e6156.dscf.akamaiedge.net          20     IN     AAAA   2600:1406:3f:396::180c
                 e6156.dscf.akamaiedge.net          20     IN     AAAA   2600:1406:3f:390::180c
www.amazon.com   www.amazon.com                     1800   IN     CNAME  tp.47cf2c8c9-frontier.amazon.com
                 tp.47cf2c8c9-frontier.amazon.com   60     IN     CNAME  www.amazon.com.edgekey.net
                 www.amazon.com.edgekey.net         86400  IN     CNAME  e15316.a.akamaiedge.net
                 e15316.a.akamaiedge.net            20     IN     A      23.52.177.96

Fig. 3. Stub resolver cache proposal for terminal device (on each DNS request from an application, the stub resolver sends the query to the recursive resolver; if a response is received, it is added to the stale data pool (resetting stale_ttl and updating ttl, rr, and the current timestamp on duplicates) and returned to the application; if no response is received, the stale data pool is searched and a cached response is returned if found, otherwise the query fails).

We propose a stub resolver cache for the terminal device as follows:

• Similar to RFC 8767 [18], the stub resolver caches DNS responses in a stale data pool. Set an expiry timer for each stale record, called stale_ttl; we can set stale_ttl = 3 days by default. The original TTL of the DNS response is preserved.
• As Fig. 3 shows, for each DNS request <query domain: qd, query type: qt>:
  a) If the stub resolver gets a DNS response <qd, qt, ttl, rr> from the recursive resolver, it adds <qd, qt, ttl, rr, stale_ttl, ts> into the stale data pool. If a duplicate <qd, qt> exists, it resets stale_ttl and updates <ttl, rr, ts>. ts is set to the current timestamp.
  b) Otherwise, the stub resolver searches for the record <qd, qt> in the stale data pool instead.


• The stub resolver should periodically check each DNS response in the stale data pool; if current_timestamp > ts + stale_ttl, the record is deleted from the stale data pool.

Compared to distributing DNS queries [22], the stub resolver cache proposal is simpler and more efficient:

• The terminal device’s operating system can offer a button to switch the stale data pool on or off; the user opts in to enable stub resolver cache support.
• The stub resolver just maintains a stale data pool for caching, with no other complex recursive resolver selection policies.
• The user does not need to choose how DNS queries are resolved or to implement those choices for all devices on their network.

As a useful supplement to the recursive resolver cache [18], the stub resolver cache proposal can significantly improve domain resolution resiliency on the terminal device:

• It can address all the DNS failure issues described in Sect. 3.
• It lets the application keep visiting a working internet server even when the SLD authoritative server is down or the recursive resolver is out of order.
• It distributes the cache burden to the terminal device and mitigates the random-subdomain DDoS risk to recursive resolvers that support stale data caching.
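A minimal sketch of this stale data pool, using dnspython for the upstream query and an in-memory dictionary as the pool, is shown below; a real stub resolver would persist the pool and hook into the operating system's resolution path.

import time
import dns.resolver  # third-party package "dnspython"

STALE_TTL = 3 * 24 * 3600  # 3 days by default

class StubResolverCache:
    def __init__(self):
        self.pool = {}  # (qd, qt) -> {"ttl": ..., "rr": [...], "stale_ttl": ..., "ts": ...}

    def resolve(self, qd, qt):
        try:
            answer = dns.resolver.resolve(qd, qt, lifetime=3)
            rr = [r.to_text() for r in answer]
            # Add or refresh the entry: reset stale_ttl, update ttl, rr and timestamp.
            self.pool[(qd, qt)] = {"ttl": answer.rrset.ttl, "rr": rr,
                                   "stale_ttl": STALE_TTL, "ts": time.time()}
            return rr
        except Exception:
            entry = self.pool.get((qd, qt))
            if entry is not None:
                return entry["rr"]   # serve stale data when the upstream fails
            raise                    # query fail: no upstream answer and no stale data

    def expire(self):
        """Periodically drop entries older than ts + stale_ttl."""
        now = time.time()
        for key in [k for k, v in self.pool.items() if now > v["ts"] + v["stale_ttl"]]:
            del self.pool[key]

if __name__ == "__main__":
    cache = StubResolverCache()
    print(cache.resolve("www.facebook.com", "A"))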

7 Conclusion

In this paper we studied DNS resolution failure issues and discussed metrics to measure the resolution resiliency of a second-level domain name, covering domain status codes, NS diversity and TTL configuration. We focus on the resolution configuration and do not address DNS software configuration. We performed the resolution resiliency measurement on some famous internet service domain names and proposed a stub resolver cache method to improve resolution resiliency. We believe the stub resolver cache on the terminal device can be very helpful in defending against DNS resolution failure, and we plan to deploy it on terminal devices in the future.

References

1. Mockapetris, P.V.: RFC1034: domain names-concepts and facilities (1987)
2. root-servers.org, https://root-servers.org/
3. Delegation Record for .COM, https://www.iana.org/domains/root/db/com.html
4. More details about the October 4 outage, https://engineering.fb.com/2021/10/05/networking-traffic/outage-details/
5. Understanding the Dyn DDoS Attack, https://www.ncta.com/chart/understanding-the-dyn-ddos-attack
6. Hackers release source code for a powerful DDoS app called Mirai, https://techcrunch.com/2016/10/10/hackers-release-source-code-for-a-powerful-ddos-app-called-mirai/
7. China hit by massive DDoS attack causing the Internet inaccessibility for hours, https://thehackernews.com/2013/08/china-hit-by-massive-ddos-attack_27.html


8. Pan, L., Yuchi, X., Hu, A.: Evaluating quickly method for DDoS attack on authority DNS service. Appl. Res. Comput. (11), 3456–3459 (2015)
9. www.qq.com not found, http://news.sohu.com/20110530/n308903429.shtml
10. Baidu sues registrar over DNS records hack, https://www.theregister.com/2010/01/20/baidu_dns_hack_lawsuit
11. DNS Attack Downs Internet in Parts of China, https://www.pcworld.com/article/529187/article-6778.html
12. Casalicchio, E., Caselli, M., Coletta, A., Di Blasi, S., Fovino, I.N.: Measuring name system health. In: International Conference on Critical Infrastructure Protection, pp. 155–169. Springer, Berlin, Heidelberg, March 2012
13. Kröhnke, L., Jansen, J., Vranken, H.: Resilience of the domain name system: a case study of the .nl-domain. Comput. Netw. 139, 136–150 (2018)
14. Jiang, J., Zhang, J., Duan, H., Li, K., Liu, W.: Analysis and measurement of zone dependency in the domain name system. In: 2018 IEEE International Conference on Communications (ICC), pp. 1–7. IEEE, May 2018
15. Deployment Diversity for DNS Resiliency, https://www.akamai.com/blog/security/deployment-diversity-for-dns-resiliency
16. Bates, S., Bowers, J., Greenstein, S., Weinstock, J., Xu, Y., Zittrain, J.: Evidence of decreasing internet entropy: the lack of redundancy in DNS resolution by major websites and services (No. w24317). National Bureau of Economic Research (2018)
17. Sommese, R., et al.: Characterization of anycast adoption in the DNS authoritative infrastructure. In: Network Traffic Measurement and Analysis Conference (TMA 2021), September 2021
18. Kumari, W., Sood, P., Lawrence, D.: RFC 8767 - Serving Stale Data to Improve DNS Resiliency (2020)
19. Kumari, W., Hoffman, P.: Running a Root Server Local to a Resolver. IETF RFC 8806 (2020)
20. Moura, G.C., Heidemann, J., Schmidt, R.D.O., Hardaker, W.: Cache me if you can: effects of DNS Time-to-Live. In: Proceedings of the Internet Measurement Conference, pp. 101–115, October 2019
21. Arends, R., Austein, R., Larson, M., Massey, D., Rose, S.W.: Protocol modifications for the DNS security extensions. RFC 4035 (2005)
22. Hounsel, A., Schmitt, P., Borgolte, K., Feamster, N.: Encryption without centralization: distributing DNS queries across recursive resolvers. In: Proceedings of the Applied Networking Research Workshop, pp. 62–68, July 2021
23. EPP, https://www.rfc-editor.org/info/std69
24. Use DNS policy for geo-location traffic management with primary servers, https://docs.microsoft.com/en-us/windows-server/networking/dns/deploy/primary-geo-location
25. Domain Resolution Resiliency, https://github.com/abbypan/domain_resolution_resiliency

An Advanced Algorithm for Email Classification by Using SMTP Code

Woo Young Park, Sang Hyun Kim, Duy-Son Vu, Chang Han Song, Hee Soo Jung, and Hyeon Jo

RealSecu, Busan 48059, Republic of Korea
[email protected]

Abstract. Email is an essential communication tool for modern people and offers a variety of functions. After the outbreak of COVID-19, the importance of email grew further as non-face-to-face work increased. However, with the spread and dissemination of email, cybercrime that abuses email has also increased. The number of cases in which email users are defrauded or harmed by attackers impersonating public institutions such as the National Police Agency, the Prosecutor's Office, or the WHO keeps growing. This study proposes an advanced email classification algorithm that uses the SMTP response code to strengthen the level of email security. The proposed system is located on the side of the recipient's email server and operates upon receipt of an email. When an email is received, it automatically verifies whether the domain of the email sender is properly registered in DNS. Thereafter, MX, SPF, and PTR records are extracted and combined to determine the state of the sending server. When additional verification is required, the proposed algorithm automatically opens a communication session to the sender to request an SMTP response code. The proposed algorithm was applied to two organizations and succeeded in classifying received emails into various categories. This study contributes to the literature on email classification by presenting new ideas for the process of sender verification.

Keywords: Email classification · Email sender verification · SMTP code · Email security

1 Introduction
Email provides essential and important functions in the information and communications technology (ICT) environment. It has become an integral part of our lives and a means for successful communication on the internet and in cyberspace [1]. Users share information, exchange data, and report on their daily lives through email. With the expansion of smart devices and mobile environments, users can use email anytime, anywhere. As the use and market of email have grown, cybercrime that abuses email has also increased. A typical example is impersonating a public institution to steal sensitive information from victims or seek financial gain. Several cases of emails sent by impersonating organizations such as the National Police Agency, the Prosecutor's Office, and the National Tax Service have been detected. Recently, spear phishing has
emerged as an important issue. After selecting a specific target and studying its circumstances, hackers send a disguised email as if they were a customer or a related person [2]. Since the outbreak of COVID-19, there have also been cases of attackers maliciously approaching users by impersonating the WHO (World Health Organization) [3]. Numerous studies and efforts have been made by the government, industry, and academia to solve the social problems caused by email abuse. The government has established and implemented guidelines such as cyber crisis response simulation training and malicious email simulation training [4]. Several cybersecurity companies in the industry have launched solutions to identify and classify email. In academia, a great body of studies has proposed and verified the performance of email identification, classification, and processing algorithms. However, despite these efforts and studies, the damage caused by malicious emails has not been completely resolved so far. This study contributes to the literature on email security in that it systematically verifies both the email sender and the communication information to improve the completeness of email processing. Existing solutions focus only on the email body, attachments, and links, based on previously collected information. These techniques become vulnerable as hackers invent new attack methods. Approaches that perform pattern recognition based on previously transmitted email information have difficulty responding to new attack patterns. On the other hand, the algorithm proposed in this study can provide robust security because it identifies suspicious senders even when new attack patterns appear. To determine whether the sender of an email is normal, this study checks whether the domain of the email sender is properly registered in the domain name system (DNS). Thereafter, the state and type of the sending server are determined by combining the mail exchange (MX), sender policy framework (SPF), and pointer (PTR: reverse domain) information. If additional verification is required, a communication session is established with the outgoing server of the email sender. Then, a simple mail transfer protocol (SMTP) response code is requested. The purpose of this study is to classify email senders and emails by reinterpreting the various SMTP response codes.

2 Research Background
2.1 Email Environment
A domain is a unique address in the addressing system used on the Internet. Through a domain, various services (web, email, etc.) can be operated. Thus, a domain cannot be registered in duplicate. In addition, individuals or institutions can register and manage a domain by obtaining proper certification and paying an annual fee. When users register a domain, they provide information such as the domain name, initial registration date, owner, etc. Through these procedures, the domain can be handled as an intellectual property right. In the Internet space, anyone can access the domain. The difference between general email users and malicious email users is as follows. A typical email user uses email in one of two ways. The first is to create an email account by signing up with a portal site (e.g. @gmail.com, @icloud.com). The second is to register a domain and build an email server to operate an email system. In both ways, users can send and receive email normally.


On the other hand, malicious email users hide themselves to avoid responsibility after sending malignant things. They use unregistered email servers to prevent exposure. This is the ‘email from an unknown’ origin that has recently become a problem. In most cases, when an email recipient checks the email and sender, it turns out to be an ‘unknown email sender’ or an ‘impersonated or forged email’. In the current email communication environment, email sending can be sent only by establishing an email server. However, email reception is possible only if it is an email service through a normally registered domain. Malevolent email users exploit the vulnerabilities of this environment and send emails through an unregistered domain called ‘unknown sender’. They also impersonate or falsify the domain to induce email recipients to view the email. Such email looks like a registered domain, but it is different from information registered in a DNS. Malicious senders want to bring about as much damage as possible to unspecified recipients through one-time outgoing emails. Email cannot be transmitted to the originating address of the emails they sent. In this context, there is a need to develop a technology that can protect email users by detecting and blocking malicious emails based on the characteristics of bad mailers. The proposed algorithm in this study provides the effect of protecting an email user from a malicious sender by detecting and blocking problematic emails such as ‘email from only outgoing server’, ‘email from unknown sender’, and ‘email from a private server’. 2.2 Literature Review Along with the spread and growth of emails, there have been many works to solve various derivative problems caused by email. In particular, numerous studies have created techniques for classifying large-capacity emails, advertisement emails, and unwanted emails. As the function of email evolved, malicious attackers’ sending patterns have also diversified. In the past, sending large amounts of email or advertising emails to an unspecified number of people was a typical way. Recently, hackers attack email users in ways such as spear phishing, impersonation email, and forgery email. To respond to this, industry experts and scholars have also developed new email identification and classification techniques. Gomes et al. (2017) compared two different approaches for classifying emails which are Naive Bayes and Hidden Markov Model (HMM) [5]. Two different methods are machine learning algorithms and were used for detecting the legitimacy of the email. They conducted several combinations of natural language processing techniques including stopwords removing, stemming, and lemmatizing on both algorithms to examine the gap in accuracy and search for the best method among them. In [6], the authors introduced a nature-inspired metaheuristics technique for email classification. They focused on decreasing the false-positive problem of treating spam messages as abnormal ones. They used metaheuristics-based feature selection methods and applied an extra-tree classifier to classify emails into spam and ham. The suggested algorithm had an accuracy of 95.5%, specificity of 93.7%, and F1-score of 96.3%. They compared the performance of extra-tree classifiers with other techniques like decision trees and random forests.


Sharaff and Nagwani (2020) developed a multi-label variant of email classification (ML-EC2). ML-EC2 incorporates text clustering, text classification, frequent-term calculation, and taxonomic term-mapping technique. Email clusters can be mapped to more than one email category by applying ML-EC2. The authors measured Entropy and Davies-Bouldin index to evaluate the designed algorithm [7]. Shuaib and Abdulhamid (2019) presented a metaheuristic optimization algorithm, the whale optimization algorithm (WOA) to distinguish the emails [1]. Through WOA, they chose the imperative features of emails. They used the entire dataset and assessed the rotation forest algorithm before and after feature selection with WOA. The analysis results illustrated that the rotation forest algorithm after feature selection with WOA was able to classify the emails into spam and non-spam with a performance accuracy of 99.9% and a low false-positive rate of 0.0019. Abdulhamid et al. (2018) programmed several classification algorithms including Bayesian logistic regression, hidden Naïve Bayes, radial basis function (RBF) network, voted perceptron, lazy Bayesian rule, logit boost, rotation forest, NNge, logistic model tree, REP Tree, Naïve Bayes, multilayer perceptron, random tree and J48 [8]. They measured the performance of each algorithm in terms of accuracy, precision, recall, F-ratio, root mean squared error (RMSE), receiver operator characteristics area, and root relative squared error using the WEKA data mining tool. The results described that rotation forest is the best classification algorithm achieving an accuracy of 94.2%. In [9], the authors explored the Naïve Bayes algorithm for email classification on two datasets and verified its performance. They calculated the accuracy, recall, precision, and F-measure to evaluate the performance. This research used the WEKA tool to assess the Naïve Bayes algorithm. The result indicated that the type of email and the number of instances of the dataset influences the performance of Naïve Bayes. Laksono (2020) investigated the K-nearest neighbors (KNN) algorithm to classify emails as an effort to diminish the amount of spam [10]. KNN can classify spam or ham in an email by checking it using a different K-value approach. The results showed that classification using a confusion matrix with a value of K = 1 had the highest accuracy value of 91.4%. These findings implied that the optimized KNN method using frequency distribution clustering can produce high accuracy of 100% and k-means clustering produces an accuracy of 99%. Singh (2019) designed an emerging evolutionary and swarm-based intelligent water drops method for email classification [11]. The author used the proposed algorithm along with the machine learning classification technique known as naïve Bayes classifier. The intelligent water drops algorithm is used for feature subset construction, and a naïve Bayes classifier is applied over the subset to classify the email as spam or not spam. The performance of the hybrid method was compared with other machine learning classifiers. The results showed that the performance of the proposed algorithm is better than the other hybrid algorithms. Li et al. (2019) suggested an email classification approach based on multi-view disagreement-based semi-supervised learning [12]. They argued that the multi-view method can offer richer information for classification. They examine the performance of the designed classifier with two datasets and in a real network environment. 
The results validate that the use of multi-view data can enhance the accuracy of email classification more
than the use of single-view data. Moreover, the proposed method was more effective as compared to several extant algorithms. Mohammad (2020) showed an improved spam classification based on the association classification algorithm (SCAC) [13]. Associative classification (AC) is one of the intelligent data mining techniques. It can efficiently identify spam emails. The author stated that SCAC can apply the robust rule generation procedure, improve the model creation process, and enhance the prediction mechanism. Rosander and Ahlstrand (2018) revised the manually defined rule-based algorithm by using machine learning [14]. They used the frameworks, TensorFlow, Scikit-Learn, and Gensim to conduct five experiments to test the performance of several common machine learning algorithms. The results revealed that Long short-term memory networks outperform other non-sequential models such as support vector machines and ADABoost when predicting labels for emails. In summary, existing studies have a common point and disadvantage in that they have used the collected information for email classification. These methodologies have no choice but to remain in permanent beta no matter how much they evolve. To solve this essential problem, this study proposes a methodology for classifying email without existing email communication information. If we use the SMTP code, we can check the status of the email sending server. Thus, clear classification is possible.

3 Proposed Algorithm
3.1 Terms
For consistency of terminology, this paper unifies some expressions as follows.
• The user sending the email is referred to as the sender.
• The user receiving the email is referred to as the recipient.
• The sender sends the email through the outgoing email server. This outgoing email server is referred to as the 'outgoing server', 'sending server', or 'originating server'.
• The server located on the side of the recipient is referred to as the 'incoming server' or 'receiving server'.
• When the email arrives at the recipient's side, this received email is referred to as an 'incoming email' or 'received email'.
• The algorithm proposed in this study operates in front of the receiving server and is referred to as the 'email classification engine', 'classification engine', 'engine', or 'RealMail'.
• Email sent from an email server used only for outgoing mail is referred to as 'only outgoing email'.
• Email sent from an unknown sender is referred to as 'email from unknown'.
• Email sent from a private, non-public server is referred to as 'email from a private server'.
• Impersonation email is referred to as 'impersonation email' or 'impersonated email'.
• Forgery email is referred to as 'forgery email' or 'forged email'.
• Alteration email is referred to as 'alteration email' or 'altered email'.
• SMTP codes are classified into two values. Code 1 indicates a normal status of the sending server; code 2 indicates an abnormal status of the sending server.

Figure 1 displays the overall structure of the advanced email classification. The email sender transmits the email to the recipient through the sending server. RealMail verifies the DNS registration information, domain, email header, and email server to verify incoming emails. If the verification result is normal, the email is delivered to the recipient. If it is abnormal, the email is blocked.

Fig. 1. The overall structure of advanced email classification.

3.2 Basic Concept and Procedure
The algorithm utilizes information including the sender's account, domain, and internet protocol (IP) address to analyze the association among the sender, the email header, and DNS. If any one piece of information is inconsistent with the others, the classification engine regards the email as abnormal and blocks it. Figure 2 presents the components processed in email identification: the email account, domain, and IP. In the case of a normal email, all information (account, domain, and IP) among the sender, email header, and DNS matches. In the case of an unusual email, however, some elements do not match.

Fig. 2. Email identification by checking account, domain, and IP.
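As a concrete illustration of the matching shown in Fig. 2, the following sketch compares the domain taken from the From: header, the domain of the envelope sender, and the reverse-DNS (PTR) name of the connecting IP. It is a simplified, hypothetical illustration rather than the authors' implementation; the dnspython package and the example values are our assumptions.

```python
# Hedged sketch of the account/domain/IP cross-check illustrated in Fig. 2.
# Assumes the dnspython package (pip install dnspython); all example values are hypothetical.
import email.utils
import dns.resolver
import dns.reversename


def sender_is_consistent(header_from: str, envelope_from: str, client_ip: str) -> bool:
    """Return True when the From: header, envelope sender and connecting IP agree."""
    _, header_addr = email.utils.parseaddr(header_from)
    header_domain = header_addr.rpartition("@")[2].lower()
    envelope_domain = envelope_from.rpartition("@")[2].lower()
    if not header_domain or header_domain != envelope_domain:
        return False  # header and envelope disagree: a typical impersonation clue
    try:
        # Reverse lookup (PTR) of the connecting IP, cf. the PTR check later in this section
        ptr_name = dns.reversename.from_address(client_ip)
        answers = dns.resolver.resolve(ptr_name, "PTR")
        ptr_host = str(list(answers)[0]).rstrip(".").lower()
    except Exception:
        return False  # no PTR record: the sending server cannot be confirmed
    return ptr_host.endswith(header_domain)  # e.g. mail.b01.com ends with b01.com


# Hypothetical usage, with values loosely modelled on Table 2:
# sender_is_consistent("Alice <a01@b01.com>", "a01@b01.com", "1.1.1.1")
```

A real engine would of course combine this simple consistency test with the MX, SPF, and SMTP-code verification described in the following sections.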


The email classification engine is connected between the network and the receiving server. The engine checks whether the originating domain of the incoming email is registered in the DNS. If the originating domain of the received email is not registered in DNS, the received email is classified as dangerous and blocked. If the originating domain of the received email is registered in DNS, the classification engine extracts the MX (mail exchanger) information for the outgoing domain recorded in the received email and the SPF information for the outgoing IP. Then, the classification engine connects a communication session to the originating server according to the email sender's IP. It receives the resulting SMTP code value from the originating server. If the SMTP code is 1 and SPF information covering the originating IP of the received email is detected, the received email is passed to the incoming server. If the SMTP code is 2, the received email is blocked.

3.3 Flow of Email Transmission Agent
The overall schematic diagram of the flow in the email transmission agent is shown in Fig. 3. As indicated in the upper right corner, the configuration elements comprise Postfix, policy, database, and sender & receiver. The algorithm is performed by collecting the information necessary for email classification from the sender located in the center. In this process, various data of the email are stored in the database to perform preprocessing and main processing.

Fig. 3. Flow in email transmission agent for SMTP analysis
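Figure 3 names Postfix, a policy component and a database. One common way such an engine could be wired into Postfix is the SMTPD policy delegation protocol, in which Postfix streams attribute=value lines for each recipient and expects an action= reply. Whether RealMail uses this exact mechanism is not stated in the paper, so the sketch below is only an assumption-laden illustration.

```python
# Hedged sketch: a minimal Postfix SMTPD policy service (usable via check_policy_service).
# Fig. 3 names Postfix and a policy element; whether RealMail uses this exact hook is our assumption.
import sys


def verdict(request: dict) -> str:
    """Placeholder for the classification engine; return a Postfix action string."""
    sender = request.get("sender", "")
    client_ip = request.get("client_address", "")
    # ... run the DNS / MX / SPF / PTR and SMTP-code checks on sender and client_ip here ...
    return "DUNNO"   # DUNNO = no decision, let later restrictions decide; REJECT would block


def main() -> None:
    request = {}
    for line in sys.stdin:              # Postfix sends name=value lines, a blank line ends a request
        line = line.rstrip("\n")
        if line:
            name, _, value = line.partition("=")
            request[name] = value
            continue
        print(f"action={verdict(request)}\n", flush=True)  # reply line plus the required empty line
        request = {}


if __name__ == "__main__":
    main()
```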

3.4 Logic Flow
We describe the logic flow in detail in this section. The logic proceeds as shown in Fig. 4.

Fig. 4. Flowchart of email classification by using SMTP code

DNS, A
When an email is received through the network, the email classification engine determines whether the originating domain of the received email is normally registered in DNS (checking the A record). The engine checks whether the outgoing domain of the incoming email is registered in DNS. If the outgoing domain is not registered in DNS and therefore cannot be verified, the engine classifies the email as risky and blocks it.

MX
If the originating domain of the received email is registered in DNS, MX information is detected from the received email (checking the MX record). When MX information on the outgoing domain is detected, the engine registers the number N of MX records and records the MX FLAG as 1. If MX information is not detected, the engine writes the MX FLAG as 0.

SPF
The email classification engine detects the SPF information for the received email. If SPF information is detected for the incoming email and the SPF information is INCLUDE SPF (i.e. it covers the outgoing IP), the engine writes the SPF FLAG as 2. The engine records the SPF FLAG as 1 if the SPF information is EXCLUDE SPF (i.e. it does not cover the outgoing IP). If SPF information is not detected, the SPF FLAG is recorded as 0.

PTR
The engine extracts the PTR information for the received email. When the classification engine detects PTR information for the incoming email, the PTR FLAG is recorded as 1. If PTR information is not detected, the PTR FLAG is recorded as 0.

Through the above-described procedure, the email classification engine acquires the MX, SPF, and PTR information for the received email in the form of various flags. The definitions of the flags are given in Table 1.

Table 1. Flag and definition

MX FLAG = 0: no MX record is set in DNS
MX FLAG = 1: an MX record is accurately defined in DNS
SPF FLAG = 0: SPF information is not defined in the TXT record of DNS
SPF FLAG = 1: SPF information is defined in the TXT record of DNS, but does not include the outgoing IP
SPF FLAG = 2: SPF information is accurately defined in the TXT record of DNS
PTR FLAG = 0: PTR information is not defined in DNS
PTR FLAG = 1: PTR information is accurately defined in DNS
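A minimal sketch of how the flags of Table 1 could be derived with standard DNS queries is given below. It assumes the dnspython package, and the SPF test is deliberately simplified (it only looks for a literal ip4: entry instead of performing full SPF evaluation), so it illustrates the idea rather than reproducing the authors' engine.

```python
# Hedged sketch: derive the MX / SPF / PTR flags of Table 1 with standard DNS queries.
# Assumes dnspython; the SPF test is a strong simplification of real SPF evaluation.
import dns.resolver
import dns.reversename


def _query(name, rtype):
    try:
        return list(dns.resolver.resolve(name, rtype))
    except Exception:
        return []  # NXDOMAIN, timeouts etc. are treated as "not detected"


def compute_flags(sender_domain: str, sender_ip: str) -> dict:
    flags = {}

    # MX FLAG: 1 if at least one MX record is defined for the sending domain
    flags["MX"] = 1 if _query(sender_domain, "MX") else 0

    # SPF FLAG: 0 = no SPF record, 1 = SPF present but IP not listed, 2 = IP listed
    spf_records = [r.to_text().strip('"') for r in _query(sender_domain, "TXT")
                   if r.to_text().strip('"').startswith("v=spf1")]
    if not spf_records:
        flags["SPF"] = 0
    elif any(f"ip4:{sender_ip}" in rec for rec in spf_records):  # simplified include test
        flags["SPF"] = 2
    else:
        flags["SPF"] = 1

    # PTR FLAG: 1 if a reverse (PTR) record exists for the sending IP
    flags["PTR"] = 1 if _query(dns.reversename.from_address(sender_ip), "PTR") else 0
    return flags


# Hypothetical usage with the values of Table 2: compute_flags("b01.com", "1.1.1.1")
```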


3.5 Each Step of the Algorithm
This section elaborates on each step of the proposed algorithm. First of all, the engine determines whether the incoming email is an 'only outgoing email', 'email from unknown', or 'email from a private server'. It identifies the emails by confirming the DNS, MX, SPF, and PTR information. When a problematic clue is detected, the engine classifies the incoming email as dangerous. In the case of a normal incoming email, the engine delivers the incoming email to the receiving server and eventually to the recipient. The link map of email classification is shown below.

Fig. 5. Link map of email classification

DNS, A
The email classification engine checks whether the sending domain of the incoming email is registered in the DNS when an incoming email is received through the network. The engine blocks the incoming email if the sending domain is not registered in DNS.

DNS, MX
If the sending domain of the incoming email is registered in DNS, the engine extracts the MX information (mail exchange records) for the originating domain recorded in the incoming email. The engine also collects the SPF information for the originating IP. The engine connects a communication session to the originating server according to the originating IP. If an SMTP code is returned from the outgoing server, the engine checks both the SMTP code and the SPF information. If the code value is 1 and SPF information covering the outgoing IP of the incoming email is detected, the incoming email is passed to the receiving server.


If the value of the SMTP code is 2, the incoming email is blocked. The email classification engine extracts the PTR information of the received email. The engine connects a communication session to the originating server according to the originating IP. If the code value is neither 1 nor 2 and the connection fails, the engine checks whether SPF information has been detected for the received email. If SPF information is not detected, the classification engine blocks the incoming email by classifying it as 'email from a private server' or 'only outgoing email', depending on the detection of PTR information.

SPF – EXCLUDE: Email from a Private Server
If SPF information is detected, the email classification engine checks whether the SPF information is EXCLUDE SPF information. If the SPF information is EXCLUDE SPF information and PTR information is not detected, the engine classifies the email as 'email from a private server'.

SPF – EXCLUDE, PTR, MX: Only Outgoing Email
If the SPF information is EXCLUDE SPF information and PTR information was extracted, the engine determines whether MX information for the originating domain is detected. If MX information for the originating domain has not been detected, the engine classifies the email as 'only outgoing email' and blocks it.

SPF – EXCLUDE, PTR, MX: Normal Email
If the SPF information is EXCLUDE SPF information and the PTR information is detected, the engine determines whether MX information on the originating domain is detected. If the MX information on the originating domain is detected, the engine accesses the sending server and receives the code value. If the code value is the first code, the engine classifies the email as normal and delivers it to the receiving server.

SPF – INCLUDE, MX: Only Outgoing Email
If SPF information is detected, the engine checks whether the SPF information is INCLUDE SPF information. If the SPF information is INCLUDE SPF, the engine determines whether MX information was detected for the originating domain. If MX information for the originating domain was not detected, the engine classifies the email as 'only outgoing email'.

SPF – INCLUDE, MX, SMTP
If the SPF information is INCLUDE SPF information and the MX information on the originating domain is detected, the engine receives a code value by accessing the sending server according to the MX information. The received email is passed only when the code value is the first code.

SMTP: Connection_fromIP
After the email classification engine obtains the information including MX, SPF, and PTR for the incoming email through the various flags, the engine accesses the outgoing server through port 25 of SMTP (connection_fromIP, Fig. 5) according to the incoming email's originating IP. The engine connects to the outgoing server through port 25 of the SMTP
protocol and checks whether the value of the SMTP code from the outgoing server is 250. When the SMTP code is 250, the engine checks the SPF FLAG again. If the SPF FLAG is 2, the engine passes the received email (WHITE) and transmits it to the receiving server. When the SPF FLAG is 0 or 1, the engine passes the received email (NO_WHITE) to the receiving server. Accordingly, when the SMTP code is 250, both the account and the sending server are normal, and the email may thus be transmitted to the recipient. The email classification engine then checks whether the SMTP code is 550. When the SMTP code is 550, the engine blocks the incoming email, determining that the account is unknown to the outgoing server corresponding to the IP of the incoming email. The engine also checks whether an SMTP code other than 250 or 550 has been returned, or whether the connection to the sending server IP has failed. If the engine is unable to access the outgoing server IP, or the returned code is not 250 or 550, the engine takes further steps to discriminate more accurately. The engine divides the cases into three categories using the SPF FLAG value, i.e. it checks whether the SPF FLAG value is 0, 1, or 2. In the first case, 'SPF FLAG = 0', no SPF value is set in the domain. In the second case, 'SPF FLAG = 1', an SPF value is set in the domain, but it does not include the IP of the outgoing information. Last, 'SPF FLAG = 2' indicates that the SPF value is accurately set in the domain. First, if the SPF FLAG is 0, the engine checks whether the PTR FLAG value is 0 or 1. If both the SPF FLAG and the PTR FLAG are 0, the engine regards the email as coming from an unknown email server and classifies it as 'email from a private server', because the sending server is likely to be private or cannot be validated. If the SPF FLAG is 0 and the PTR FLAG is 1, the engine judges the email to be an 'email from an only outgoing server' and blocks it. This is because there is no SPF record but a PTR value is accurately set. If the SPF FLAG is 1, SPF is set but the IP of the corresponding outgoing information is not included. Thus, the engine again checks whether the PTR FLAG value is 0 or 1. If the SPF FLAG is 1 and the PTR FLAG is 0, this indicates that the outgoing server is a private server (an unknown email server). Hence, the engine blocks the incoming email. If the PTR FLAG is 1, the engine checks whether the MX FLAG value is 0 or 1. When the MX FLAG is 0, the email was sent from an only outgoing server that cannot receive email, and the engine classifies the incoming email as 'only outgoing email'.

SMTP: Connection_MXIPs
When the MX FLAG is 1, the engine determines that the sending server on the originating side of the incoming email is normally registered and proceeds to the connection_MXIPs path (Fig. 5). If the SPF FLAG is 2, the engine checks whether the MX FLAG is 0 or 1. When the MX FLAG is 0, the engine classifies the incoming email as 'only outgoing email' and blocks it, because there is no MX record and the sending server cannot receive emails. In connection_MXIPs, the engine connects to the outgoing server based on the IP information set in the MX record and checks the returned SMTP code. Only when the returned SMTP code is 250 does the engine pass the received email and transmit it to the receiving server.
When the returned SMTP code value is 550, the engine blocks the received email because the outgoing account is not confirmed. Even when the returned SMTP code is not 250 or 550, or the connection fails, the received email is blocked. If there are multiple (N) MX record values, the engine connects to each of them using port 25 of SMTP and repeats the process of blocking or delivering the received email according to the returned SMTP code. As described above, the technical ideas described in the embodiments of this algorithm may be implemented independently or in combination with each other. Table 2 shows the sender information, MX, DNS, and telnet messages.

Table 2. Sender information, MX, DNS, and telnet messages

Normal email: account a01, domain b01.com, server 1.1.1.1; MX: Match (check); DNS: Match (check); telnet b01.com 25: "b01.com ESMTP", RCPT TO: [email protected]: "Recipient OK"; telnet 1.1.1.1 25: "b01.com ESMTP", RCPT TO: [email protected]: "Recipient OK"

Unknown email: account abc01, domain rea.net, server 2.2.2.2; MX: none; DNS: none; telnet rea.net 25: could not connect (domain not registered with DNS); telnet 10.10.10.20 25: could not connect (port 25 is not open)

Impersonation email: account a01, domain bO1.com, server 3.3.3.3; MX: none; DNS: none; telnet b01.com 25: could not connect (domain not registered with DNS); telnet 10.10.10.20 25: could not connect (port 25 is not open)

Malicious email: account Police, domain abce.cy, server 4.4.4.4; MX: none; DNS: none; telnet abce.cy 25: could not connect (domain not registered with DNS); telnet 10.10.10.20 25: could not connect (port 25 is not open)
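The telnet probes listed in Table 2 can also be issued programmatically. The sketch below uses Python's standard smtplib to open a session on port 25 and read the reply code of a RCPT TO probe; the host names, probe recipient and timeout are illustrative assumptions, and a production engine would add caching and rate limiting to stay polite toward remote servers.

```python
# Hedged sketch: reproduce the telnet probes of Table 2 with Python's standard smtplib.
# Host names, the probe recipient and the timeout are illustrative assumptions.
import smtplib


def probe_sender(host: str, rcpt: str, timeout: float = 10.0) -> int:
    """Return the SMTP reply code of a RCPT TO probe, or 0 if no connection is possible."""
    try:
        with smtplib.SMTP(host, 25, timeout=timeout) as smtp:
            smtp.helo()                      # greet the server
            smtp.mail("<>")                  # null envelope sender, typical for call-out checks
            code, _reply = smtp.rcpt(rcpt)   # e.g. 250 Recipient OK, or 550 user unknown
            return code
    except (OSError, smtplib.SMTPException):
        return 0  # the "could not connect" cases of Table 2


# Hypothetical usage, mirroring Table 2:
# probe_sender("b01.com", "a01@b01.com")    -> 250 for a normal sender
# probe_sender("rea.net", "abc01@rea.net")  -> 0 (domain not resolvable or port 25 closed)
```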


Table 3 details the SMTP reply codes, their interpretation, and their reinterpretation.

Table 3. SMTP reply code: interpretation and reinterpretation

Code 250: Match/OK; reinterpreted as Match/OK
Code 441: 4.4.1 No answer from host
Code 451: 4.4.0 DNS resolving error
Code 451: 4.3.0 Other or undefined mail system status
Code 550: 5.1.1 No such user (email address)
Code 550: User unknown
Code 553: Sorry, that domain isn't in my list of allowed rcpt hosts
Codes 441-553 above are reinterpreted as unknown email (impersonation email / private email / fake email / etc.)

Figure 6 displays the messages communicated with the sending server.

Fig. 6. Message communication in connection with sending server.
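To make the decision path of Sect. 3.5 easier to follow, the sketch below condenses it into a single function. The flag values follow Table 1 and the reply-code handling follows the connection_fromIP and connection_MXIPs descriptions above; the function name and return labels are our own shorthand, not the authors' code.

```python
# Hedged sketch: condensed decision logic of Sect. 3.5 (flags as defined in Table 1).
# probe(ip) is assumed to return the SMTP reply code of a port-25 call-out, or 0 on failure;
# the return labels are our own shorthand, not terminology fixed by the paper.

def classify(mx_flag: int, spf_flag: int, ptr_flag: int,
             sender_ip: str, mx_ips: list, probe) -> str:
    code = probe(sender_ip)                          # connection_fromIP
    if code == 250:
        return "pass (WHITE)" if spf_flag == 2 else "pass (NO_WHITE)"
    if code == 550:
        return "block: account unknown"
    # No usable answer from the sender IP: fall back to the flag combination.
    if spf_flag == 0:
        return ("block: email from a private server" if ptr_flag == 0
                else "block: only outgoing email")
    if spf_flag == 1 and ptr_flag == 0:
        return "block: email from a private server"
    if mx_flag == 0:
        return "block: only outgoing email"          # SPF exists, but the server cannot receive
    for ip in mx_ips:                                # connection_MXIPs
        code = probe(ip)
        if code == 250:
            return "pass"
        if code == 550:
            return "block: account unknown"
    return "block: sending server not verifiable"
```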


4 Performance
4.1 Computing Performance
To check the computing performance of the proposed algorithm, this study measured and verified the CPU usage rate, memory usage rate, and response time. We operated the algorithm in a specified test environment (hardware, software, and network). To secure the reliability and validity of the verification, the performance test was commissioned to the TTA (Telecommunication Technology Association), an external professional organization. The measurement environment and method were set as follows (see Table 4):

Table 4. Test environment and method for computing performance

Processing accepted email (CPU usage rate / memory usage): the test target program is executed on the administrator PC; after logging in, all email blocking policies are set to ON; after running Windows PowerShell on the PC for sending and receiving email, a script is executed that simultaneously sends unfiltered emails that are not blocked by policy.

Processing blocked email (CPU usage rate / memory usage): the test target program is executed on the administrator PC; after logging in, all email blocking policies are set to ON and the domains to be blocked are registered on the blacklist; after running Windows PowerShell on the PC for sending and receiving email, a script is executed that simultaneously sends filtered (blacklisted) emails that are blocked by policy.

Email information inquiry (CPU usage rate / memory usage): the test target program is executed on the administrator PC; after logging in, the [Email Analysis] menu is selected; after setting the period, the information of the received emails is queried.

Inquiry response time (response time): the time taken from immediately after a command such as an inquiry or request is entered into the system until the response to the command is completed.

Computing performance evaluation results are as follows. First, the computing performance for blocked email processing was measured by sequentially sending 300 emails. The peak CPU usage rate of the server is 3.00%, and the average memory usage is 2452 MB. Second, when searching for incoming email information, the highest CPU usage rate of the server is 1.00% and the average memory usage is 2438 MB. When viewing incoming email information (5793 cases) from the administrator PC, the server's maximum CPU usage rate is 1.00%, and the average memory usage is 3885 MB. Table 5 summarizes the CPU usage rate and memory usage.

Table 5. CPU usage rate and memory usage

Processing blocked email, server: highest CPU usage rate 3.00%; mean memory usage 2,452.19 MB
Inquiry for information on received email, server: highest CPU usage rate 1.00%; mean memory usage 2,438.89 MB
Inquiry for information on received email, administrator PC: highest CPU usage rate 1.35%; mean memory usage 3,885.52 MB

Third, the response time for information inquiry of incoming email (5793 cases) is as follows based on the administrator PC. Table 6 displays response time for blocked email and inquiry.

Table 6. Response time for blocked email and inquiry

Response time (sec.): run 1: 0.15; run 2: 0.16; run 3: 0.15; run 4: 0.17; run 5: 0.16; mean: 0.16

4.2 Classification Performance This study proposes a new algorithm for email classification, but it is difficult to compare our algorithm to other methodologies. The first reason is that many existing techniques apply signature-based pattern recognition. That is, the newly received email is determined based on the information obtained by analyzing the already received email. On the other hand, the core of this study checks sender information using the SMTP code in an environment where existing email reception has not been realized. Thus, the comparison is difficult. Second, the legitimacy of emails varies from user to user. Even if it is the same advertising email, some users regard it as a meaningful information email, while others may regard it as spam. As such, the definition of normal emails varies depending on the circumstances. Hence, it is difficult to compare the algorithm of this study with other benchmarking techniques. For these reasons, this study verified the classification performance by case application. This algorithm was applied to two actual organizations
for many periods and objective classification values were measured. In addition, actual email server managers of each organization participated in setting classification criteria and customized email definitions. Organization 1 Organization 1 is a national university and constituted of 10 colleges. The school has 270 majors, 22000 students, and 1000 employees. It has received more than 20,000 emails per day from all over the world. A great number of malicious emails and impersonated emails have come into the receiving server. Although a spam-blocking system was in operation, several fraudulent cases have been detected. We applied our algorithm to this institution from 22nd April to 6th May 2021. There were no abnormal emails in types of real-time black list, bomb, black, virus, and spam. We detected 358 emails whose domains were not normal. 5781 emails whose account information cannot be verified were blocked. 11268 emails in which the sender’s information was not confirmed were classified as blocked emails. 4925 emails in which the sender’s information was not confirmed were classified as blocked emails. 8042 emails with unknown sources were processed as blocked emails. We detected 1856 emails including dangerous keywords and 4,456 emails that were problematic due to other issues. Figure 7 shows the classification performance of organization 1.

Fig. 7. Classification performance: organization 1.

Organization 2 Organization 2 is a medium-sized manufacturing company that has 17 locations over the world. It has over 4500 employees. It has received more than 10000 emails per day. A great amount of spam including forged emails and altered emails has come into the server. Although a spam-blocking system was running, ransomware attacked the email system recently. It was due to negligence of the user and dated security solution. We implemented our algorithm to this institution from 22nd February to 2nd March 2021. We detected 424 RBL emails, 477 bomb emails, 123 blacklisted emails, and 1106 spam. Moreover, we found 520 emails whose domains are not normal and 2178 emails whose account information was not verified. 4547 emails whose accounts cannot be validated were blocked. 1344 emails in which the sender’s information was not confirmed were classified as blocked emails. 3624 emails with unknown sources were processed as blocked emails. We detected 4370 emails including dangerous keywords and 2183 emails that were problematic due to other issues. Figure 8 depicts the classification performance of organization 2.


Fig. 8. Classification performance: organization 2.

5 Conclusion 5.1 Summary This study proposed an intelligent email classification algorithm by supplementing existing studies. To advance the email classification technique, all email-related information such as outgoing information (account, domain, and IP), email header, and DNS were reflected. By automatically performing a communication session to the sending server, the status of sending server was checked in detail by interpreting SMTP code. Through this, ‘only outgoing email’ and ‘email from unknown’ were accurately identified. The RealMail checks whether the outgoing domain of the incoming email is registered in the DNS. If the outgoing domain is not registered in DNS, the engine blocks incoming emails. When the outgoing domain is registered in DNS, MX information about the outgoing domain is recorded in the incoming email. The proposed algorithm examined SPF information for the originating IP to check whether there is an abnormality. It connects to the outgoing server according to the outgoing IP of the incoming email and receives a code value from the outgoing server. If the SMTP code is 1 and SPF information on the originating IP of the received email is detected, the received email is passed to the receiving server. If the code is 2, the received email is blocked. 5.2 Implications for Research and Practice The academic and practical implications of this study are as follows. First, this study contributes to the literature on email classification by interpreting the SMTP code, which is basic and very dominant but has not been actively reflected in previous studies on email classification. Existing works have accumulated data based on collected emails and clarified patterns. This study supplemented extant research by investigating the DNS, email header, and IP of the sender intelligently. Through the results of this study, organizations that newly use email hosting will be able to check email senders without worrying about security in the future. In practice, service providers or distributors will be beneficial to conduct marketing by emphasizing the importance of identifying email senders. Second, this study determined the originating server through information on DNS, SPF, and PTR. All combinations were formed to classify dangerous outgoing servers and establish standards. Representatively, ’only outgoing email’, ’email from unknown’, and ’email from a private server’ were defined and processed. Prior studies
have mainly classified spam only into large-capacity emails and advertising emails. This study presented a new milestone in the email classification field by dealing with spear phishing, impersonated email, and forged email. Service providers will be able to strengthen their security levels by creating new categories in the process of email filtering. Third, this study proved its excellence by applying it to large institutions using the actual email system. This algorithm was applied to a big university and a mid-sized company. It succeeded in classifying dangerous emails into several categories. Based on the results of this application, potential customers interested in email security will be able to test this algorithm. 5.3 Limitation and Direction of Further Research The limitations of this study and further research directions are as follows. First, this study did not compare our algorithm with other techniques in the validation of classification performance. This is due to differences in the classification environment as mentioned earlier. In future studies, it will be meaningful to set an environment in which existing pattern analysis techniques and classification performance can be compared. Second, this study was applied only to two organizations. Subsequent studies should be applied to many institutions to confirm the performance of the algorithm to improve generality and validity.

References
1. Shuaib, M., et al.: Whale optimization algorithm-based email spam feature selection method using rotation forest algorithm for classification. SN Appl. Sci. 1(5), 1–17 (2019). https://doi.org/10.1007/s42452-019-0394-7
2. Stembert, N., Padmos, A., Bargh, M.S., Choenni, S., Jansen, F.: A study of preventing email (spear) phishing by enabling human intelligence. In: 2015 European Intelligence and Security Informatics Conference. IEEE, Manchester (2015)
3. World Health Organization: Beware of criminals pretending to be WHO. https://www.who.int/about/cyber-security. Accessed 01 Sept 2021
4. Straub, J.: Using simulation to understand and respond to real world and cyber crises. In: Information Technology Applications for Crisis Response and Management, pp. 111–127. IGI Global (2021)
5. Gomes, S.R., et al.: A comparative approach to email classification using Naive Bayes classifier and hidden Markov model. In: 2017 4th International Conference on Advances in Electrical Engineering (ICAEE), Dhaka (2017)
6. Sharaff, A., Gupta, H.: Extra-tree classifier with metaheuristics approach for email classification. In: Bhatia, S., Tiwari, S., Mishra, K., Trivedi, M. (eds.) Advances in Computer Communication and Computational Sciences. Advances in Intelligent Systems and Computing, vol. 924, pp. 189–197. Springer, Singapore (2019). https://doi.org/10.1007/978-981-13-6861-5_17
7. Sharaff, A., Nagwani, N.K.: ML-EC2: an algorithm for multi-label email classification using clustering. Int. J. Web-Based Learn. Teach. Technol. (IJWLTT) 15(2), 19–33 (2020)
8. Abdulhamid, S.I.M., Shuaib, M., Osho, O., Ismaila, I., Alhassan, J.K.: Comparative analysis of classification algorithms for email spam detection. Int. J. Comput. Netw. Inf. Secur. 10(1) (2018)
9. Rusland, N.F., Wahid, N., Kasim, S., Hafit, H.: Analysis of Naïve Bayes algorithm for email spam filtering across multiple datasets. In: IOP Conference Series: Materials Science and Engineering, vol. 226, no. 1, p. 012091. IOP Publishing (2017)
10. Laksono, E., Basuki, A., Bachtiar, F.: Optimization of K value in KNN algorithm for spam and ham email classification. Jurnal RESTI (Rekayasa Sistem Dan Teknologi Informasi) 4(2), 377–383 (2020)
11. Singh, M.: Classification of spam email using intelligent water drops algorithm with naïve Bayes classifier. In: Panigrahi, C., Pujari, A., Misra, S., Pati, B., Li, K.C. (eds.) Progress in Advanced Computing and Intelligent Engineering. Advances in Intelligent Systems and Computing, vol. 714, pp. 133–138. Springer, Singapore (2019). https://doi.org/10.1007/978-981-13-0224-4_1
12. Li, W., Meng, W., Tan, Z., Xiang, Y.: Design of multi-view based email classification for IoT systems via semi-supervised learning. J. Netw. Comput. Appl. 128, 56–63 (2019)
13. Mohammad, R.M.A.: An improved multi-class classification algorithm based on association classification approach and its application to spam emails. IAENG Int. J. Comput. Sci. 47(2) (2020)
14. Rosander, O., Ahlstrand, J.: Email Classification with Machine Learning and Word Embeddings for Improved Customer Support (2018)

Protecting Privacy Using Low-Cost Data Diodes and Strong Cryptography

André Frank Krause and Kai Essig

Rhein Waal University, Kamp Lintfort, Germany
{andrefrank.krause,kai.essig}@hochschule-rhein-waal.de

Abstract. Compromised near-body electronic devices, like an eye tracker or a brain-computer interface, can leak private, highly sensitive biometric or medical data. Such data must be protected at all costs to avoid mass-surveillance and hacking attempts. We review the current, dire state of network security caused by complex protocols, closed-source software and proprietary hardware. To tackle the issue, we discuss a concept that protects privacy by combining three elements: data diodes, strong encryption and true random number generators. For each element, we suggest low-complexity algorithms and low-cost hardware solutions that can be implemented using off-the-shelf components. Already a basic data diode can establish a strong barrier against hacking attempts. A carefully designed, shielded and monitored system combining data diodes and strong encryption can make most levels of attack infeasible. Keywords: Data diode · One-time-pad · Post-quantum cryptography · True random number generator · Data privacy

1 Introduction
Modern devices like eye trackers, EEG-headsets and augmented reality glasses process and record highly sensitive, intimate and often confidential physiological and environmental data. Such data must be protected at all costs to avoid mass-surveillance and hacking attempts by rogue actors. For example, eye trackers register eye movements, pupil dilations and often a video stream including audio recordings of the peripersonal space through a scene camera. On top, video based eye trackers record close-up images of one or both eyes, often using near-infrared illumination. Most commercial iris scanners to date apply the same principle and capture a high-resolution infrared image of the iris [6, 75]. Iris features are very unique to each person and appear to be stable over many decades [16, 35]. Due to this uniqueness of iris features, not only person verification, but also identification can be performed with an extremely low false match rate. This allows the reliable identification of a person among millions of other persons [17]. In the near future, eye tracking will be an essential component of many virtual- and augmented reality headsets for natural interaction [45, 48] and performance (foveated rendering, [2, 43]). For example, the Microsoft Hololens 2 incorporates two infrared eye cameras and uses iris detection for user identification [12]. Other sensitive attributes like age [38, 59], sexual arousal- and
orientation [49, 50], use of contraceptives [51], Autism [44], and Alzheimer’s disease [13] can be extracted from gaze data, see also [34]. Hence, leaked gaze-data, eye images and video + audio streams from hacked devices pose a huge privacy risk [21]. To establish trust in using such devices, users need to have full authority, control and consent about what data can be used for which purpose or where it must be transmitted confidentially. Yet, “interested parties” around the world try to circumvent encryption by hacking devices, stealing security keys or logging communication before encryption. Today’s computing devices use complex hardware- and software architectures and versatile but complicated communication protocols. This inevitably leads to security holes that are difficult to uncover and fix. Further, interested parties proactively try to weaken security standards, cryptographic algorithms and protocols. Section 2 reviews the current, dire state of network security. Data diodes (also known as unidirectional gateways) can provide a solution to such complexity- and network-security problems. They are based on provable, easy to understand physical- and electronic mechanisms. Section 3 introduces data diodes and explains the security advantages of strictly unidirectional data flow. In the remaining sections, we describe a low-complexity concept for secure data flow and communication that combines data diodes, strong encryption and true random number generation. We also present and analyze low-cost hardware solutions that can be used in consumer electronics.

2 Network Security Threats Today, essentially every network - connected device should be considered compromised. For example, a severe Bluetooth flaw was discovered in 2018 that still affects millions of unpatched devices: A man-in-the-middle attack was demonstrated that reveals the encryption key and allows for passive eavesdropping of encrypted communication [8]. This flaw not only requires software-, but also firmware patches that are often not available for older devices. Wi-Fi security as well seems to be constantly at risk. Connections using the - now deprecated - WEP protocol can be cracked within minutes, recovering the encryption key from observed network traffic [22, 65]. To solve the non-fixable issues with WEP, a new security standard called WPA2 was released in 2004 (for a security analysis see [36]). But exploits were also found for WPA2, allowing an attacker to decrypt, inject and manipulate traffic [69, 70]. WPA3 was released in 2018 to fix the protocol vulnerabilities of WPA2 with a different, more secure key exchange handshake (dragonfly key exchange). But yet again, a group of vulnerabilities called “Dragonblood” were discovered soon after WPA3 was released. They enable the extraction of the Wi-Fi password through downgrade- and side-channel attacks [71]. Besides protocol flaws, software bugs and side channel data leakage, there are also active attempts to break privacy and encryption. For example, the leaks from Edward Snowden about the NSA mass surveillance program provided strong evidence that a pseudorandom number generator (PRNG) called “Dual EC DRBG” was backdoored and pushed by the NSA to be the default algorithm in a widely used cryptography library [32, 58]. In 1999, it became apparent that Microsoft’s cryptographic subsystem for the windows operating system of that time contained an undisclosed secondary
validation key named “NSAKEY” [77]. This, combined with a testimony at the US congress in 2009 confirming that the NSA was working together with Microsoft on the development of Windows 7 and previous versions [30], raised suspicion of back- and bug doors specifically tailored for the NSA. Additional revelations about the close collaboration between Microsoft and the NSA as part of the PRISM program [23] further reduce trust in closed source software and proprietary hardware. Unfortunately, even open source software cannot always guarantee protection against fatal security holes in a timely manner, as was demonstrated by the OpenSSL Heartbleed vulnerability. Heartbleed remained undiscovered for more than two years [20]. A compromised network connection allows an adversary to passively eavesdrop on data streams or to actively inject exploits to compromise the target machine. The two main reasons for recurring, severe vulnerabilities certainly are protocol- and software complexity [56] and the difficulty of properly implementing cryptographic algorithms [53].

3 Data Diodes

Fig. 1. Concept for secure communication using data diodes. The sending device securely encrypts data and transmits the data using an optocoupler. The encrypted data is then relayed over a network and forwarded to the receiving device again through an optocoupler. The optocouplers implement the data diodes: no malicious information can enter the sending device - it becomes essentially non-hackable. On the other end, no information can escape the receiving device. If implemented properly, no information can be eavesdropped.

Data diodes enforce that information can only flow in one direction by exploiting one-way physical mechanisms [40, 61]. Such a mechanism can be the unidirectional flow of light from a light source to a detector. For example, an optocoupler can be used as a data diode: the data is supposed to flow from a light emitting diode to an optical sensor (e.g. a photo-diode/-transistor), but not in the other direction. [39] provides an overview of current optocoupler designs. Building upon this foundation, a secure system can be designed. Figure 1 illustrates the concept:
1. If information is guaranteed to flow only in one direction, the sending device can never be manipulated from the outside.
2. From the receiving device, no information can ever escape, even if the receiving device might become manipulated through a malicious data stream.


If combined with strong encryption (see Sect. 4), data can be sent over any network without the risk of eavesdropping. Still, a flawed practical implementation can leak information through diverse side-channels, see Sect. 3.2. Proper shielding (e.g. faraday cage, acoustic and thermal insulation) and a battery driven power supply can reduce such attack vectors. Due to the unidirectionality of data diodes, there is no way to acknowledge that a data package was properly received [27]. This may cause data loss due to transmission errors, a temporarily unresponsive receiver or a receiver that is too slow for the incoming amount of data. These problems can be solved by forward error correction methods (e.g. sending a data package several times; using error correcting codes [10, 27]) and equipping the receiver with a large buffer.
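Because the receiver can never acknowledge a frame, the sender itself has to add redundancy. The sketch below shows one simple possibility: every frame carries a CRC-32 checksum and is repeated a fixed number of times over the unidirectional serial link. It assumes the pyserial package and the Raspberry Pi UART device path /dev/serial0; it illustrates the forward-error-correction idea and is not the authors' firmware.

```python
# Hedged sketch: repetition + CRC framing for the unidirectional serial link of a data diode.
# Assumes the pyserial package and the Raspberry Pi UART at /dev/serial0; a real implementation
# would additionally byte-stuff the start delimiter and use a proper error-correcting code.
import struct
import zlib
import serial  # pyserial

REPEAT = 3           # send every frame three times; the receiver keeps the first intact copy
START = b"\x7e"      # simple frame delimiter


def frame(seq: int, payload: bytes) -> bytes:
    body = struct.pack(">IH", seq, len(payload)) + payload
    crc = struct.pack(">I", zlib.crc32(body) & 0xFFFFFFFF)
    return START + body + crc


def send(port: serial.Serial, seq: int, payload: bytes) -> None:
    data = frame(seq, payload)
    for _ in range(REPEAT):              # plain repetition as forward error correction
        port.write(data)
    port.flush()


if __name__ == "__main__":
    tx = serial.Serial("/dev/serial0", baudrate=115200)   # RXD left unconnected or diode-isolated
    try:
        send(tx, 0, b"gaze 0.12 0.34")   # example payload; the values are made up
    finally:
        tx.close()
```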

3.1 Low-Cost Data Diodes

Fig. 2. A basic data diode is formed by connecting the transmit data pin (TXD) of the serial port of a sending device with the receive data pin (RXD) of a receiving device through a diode (A) or a digital buffer (B).

A basic data diode can be established using the serial port of the sending device and disconnecting the receive-data pin (RXD). Unfortunately, on some single board computers (including the Raspberry Pi), the pin assignment can be freely reconfigured, e.g. by a malicious actor. To enforce directional data-flow, a diode or a digital buffer can be added, see Fig. 2. Yet, a direct, wired electrical connection between two devices provides little electromagnetic insulation, opening up potential side channels (see Sect. 3.2). A higher level of isolation is provided by mainstream optocoupler chips, for example the low-cost and low-complexity 6N136 with up to 1 MBaud transfer speed. Figure 3 shows the internal circuit and external wiring to connect two Raspberry Pi single-board computers. We could achieve a stable transmission speed of up to 600 kBaud. This is more than enough speed to transmit gaze data: for example, our eye tracking software “Libretracker” [31] currently streams 18 floating point numbers (gaze + marker data). Using cameras with a sampling rate 60 Hz, streaming requires approximately 35 kBit/s. Security Analysis. The optocoupler 6N136 contains only three components: a LED, a photo diode and a transistor. Each component can be individually tested for potential faults by external measurements. In practice, no optocoupler can be a perfect one-way


Fig. 3. Low-cost data diode based on a 6N136 optocoupler, connecting the serial ports of two Raspberry Pi’s. The 6N136 is a low-complexity device, containing an infrared LED, a photo diode and a high-speed transistor.

Security Analysis. The optocoupler 6N136 contains only three components: an LED, a photo diode and a transistor. Each component can be individually tested for potential faults by external measurements. In practice, no optocoupler can be a perfect one-way communication device. There is always a small, unavoidable amount of electromagnetic and stray capacitive coupling between the internal structures and the external pins of the optocoupler chip. In addition, a photo diode, supposed to only receive data, can emit light itself if powered. This light could then be registered by the LED, because an LED can unfortunately also work as a (relatively inefficient) photo diode [18, 37]. This breaks the common assumption that an optocoupler provides a strict one-way flow of data. Yet, in the circuit shown in Fig. 3, the photo diode is not directly accessible. Even if the GPIO pins on the receiving Raspberry Pi were reconfigured, the photo diode would still be isolated by the transistor and could not be powered. A small amount of capacitive coupling might remain, such that a very sensitive analog input (which is not available on Raspberry Pi GPIO pins) could read backward-flowing data. A digital buffer inserted between TXD and the LED (see Fig. 2) can provide further isolation. High-Speed Optocoupler. Higher transfer rates are possible using high-speed optocouplers (e.g. the medium-cost1 HCPL-7723 with up to 50 MBaud). The HCPL-7723 can be connected either directly to the Raspberry Pi serial port or to a USB-to-serial chip, see Fig. 4. A drawback of such chips is their more complex internal structure, which makes it harder to analyze their suitability as data diodes.

3.2 Side-Channels

Data can be exfiltrated from systems protected by a data diode, or even from fully air-gapped systems, through unexpected and often surprising side-channel attack methods. Side-channels can exist in the form of magnetic field fluctuations, electromagnetic radiation, thermal conduction, vibrations and acoustic sounds produced by electronic components, power-consumption variations and optical emissions. For example, carefully timed DRAM access patterns can turn the memory bus into a wifi antenna, sending data with 100 bits/second [24].

1 The HCPL-7723 chip is approximately ten times more expensive than the 6N136, but still very affordable compared to commercial data diode solutions.


Fig. 4. Low-cost data diode using the high-speed HCPL-7723 optocoupler (50 MBaud) and a USB to serial converter. The schematic is based on the design by [46], License: GNU FDL v1.3

[24, 25, 60] provide a side-channel taxonomy, corresponding exploitation methods and countermeasures. Proper shielding can block most of the above-mentioned side-channels. A carefully designed Faraday cage blocks all relevant electromagnetic emissions in the commonly used radio spectrum. Yet, slowly changing magnetic fields (e.g. 50 Hz) can even leak through a Faraday cage built with a thick, solid metal shielding [26]. Confining magnetic fields is complicated and requires several layers of special metal alloys with high magnetic permeability [11].

4 Encryption

Encryption protects not only against eavesdropping, but also complicates the injection of malicious bit streams that could compromise the receiving endpoint. Assuming that the receiving hardware and firmware (e.g. the serial chip connected to the optocoupler) are safe from manipulated bit streams, the next critical attack point is the decryption algorithm. If this algorithm is secure and hardened and the encryption key is unknown to an attacker, it will be very difficult to inject tailored bit streams to break later software stages. Only with a low-complexity encryption and decryption solution can one strive to approximate bug-free software.

4.1 One-Time-Pad

One such low-complexity encryption method is the one-time-pad (OTP) [72]. The encryption method is very simple: the message is xor’ed with a random key of the same size: encrypted = message ⊕ key. The resulting cipher-text can be decrypted by xor’ing it with the same key again: decrypted = encrypted ⊕ key. The encryption strength of the one-time-pad scales directly with the quality of the random key generator. Requirements for hardware-based true random number generators are analyzed in [19, 63].
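A minimal sketch of the xor mechanics is shown below. Here secrets.token_bytes merely stands in for pre-shared true random key material (it is itself only a CSPRNG), so this illustrates the operation, not a complete OTP deployment.

```python
import secrets

def otp_encrypt(message: bytes, key: bytes) -> bytes:
    """XOR the message with a single-use key that is at least as long as the message."""
    if len(key) < len(message):
        raise ValueError("one-time-pad key must be at least as long as the message")
    return bytes(m ^ k for m, k in zip(message, key))

otp_decrypt = otp_encrypt  # XOR is its own inverse

message = b"gaze sample 0.42 0.17"
key = secrets.token_bytes(len(message))   # stand-in for pre-shared true random data
ciphertext = otp_encrypt(message, key)
assert otp_decrypt(ciphertext, key) == message
```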


The one-time-pad is considered to be an information-theoretically secure, “unbreakable” encryption method because it is not based on any unproven assumptions like the existence of one-way functions [55]. But it is secure only if the key is truly random, is used only once and has the same or a larger size than the message: |key| ≥ |message|. Random number sources generating unbiased and fully uncorrelated bits are very difficult to construct, see Sect. 4.4. Further, the one-time-pad itself does not provide message authentication. This is problematic, because flipping bits in the ciphertext at a known position can, for example, change a number representing the amount of a money transfer [47]. One-time-pad message authentication is discussed in [47].
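The malleability issue can be made concrete with a small demonstration: under a known-plaintext assumption, flipping ciphertext bits flips exactly the corresponding plaintext bits, and nothing in the OTP itself detects the change. The message and byte index used here are, of course, illustrative.

```python
import secrets

message = b"transfer 0100 EUR"
key = secrets.token_bytes(len(message))
ciphertext = bytes(m ^ k for m, k in zip(message, key))

# An attacker who knows the message layout flips bits of the leading amount digit.
tampered = bytearray(ciphertext)
tampered[9] ^= ord("0") ^ ord("9")

forged = bytes(c ^ k for c, k in zip(tampered, key))
print(forged)  # b'transfer 9100 EUR' -- accepted unless a separate MAC is used
```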

4.2 Post-Quantum Cryptography

A major disadvantage of the one-time-pad is the key size requirement. Streaming large amounts of data can quickly deplete the available amount of pre-shared random data on the sending and receiving endpoints. Currently, some symmetric key encryption algorithms, for example 256 bit AES [14, 15], are considered safe against attacks using quantum computers. But other, very important public-key algorithms, like the Diffie-Hellman key exchange, are considered broken [1, 4, 7, 57]. Using the pre-shared random data as a private-key reservoir for a quantum-safe symmetric cipher can be an alternative to full OTP encryption. Fetching a fresh key from the pool per session or even per data-block provides forward secrecy: breaking the key of one session does not break other sessions [76].
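One way this key-reservoir idea could look in code is sketched below, carving 256-bit session keys off a pre-shared pool and encrypting each session with AES-256-GCM. The pool handling is an assumption, the AESGCM class comes from the third-party cryptography package, and os.urandom merely stands in for real pre-shared true random data.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

class KeyPool:
    """Pre-shared true random bytes, consumed once and then discarded."""

    def __init__(self, pool: bytes):
        self._pool = bytearray(pool)

    def next_key(self, nbytes: int = 32) -> bytes:
        if len(self._pool) < nbytes:
            raise RuntimeError("pre-shared key material exhausted")
        key = bytes(self._pool[:nbytes])
        del self._pool[:nbytes]        # forward secrecy: used material is destroyed
        return key

pool = KeyPool(os.urandom(4096))       # stand-in for a pool filled from a TRNG
session_key = pool.next_key()
nonce = os.urandom(12)
ciphertext = AESGCM(session_key).encrypt(nonce, b"sensor frame", None)
plaintext = AESGCM(session_key).decrypt(nonce, ciphertext, None)
```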

4.3 Meta-Data

Even with strong encryption, some information can leak in the form of “meta-data”. For example, the start time, duration, amount and speed of encrypted data flowing out of the device can provide valuable information to an adversary. To reduce the accumulation of such meta-data, the encrypted data could be embedded into a constant stream of random data or hidden using steganography methods [3, 29].
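A constant-rate cover stream could be organized roughly as follows: the link always carries fixed-size frames at a fixed rate, and random filler is sent whenever no encrypted frame is queued. Frame size, rate and the port/queue objects are illustrative assumptions; real frames would need to be padded before encryption so that filler and payload frames are indistinguishable to an observer.

```python
import os
import queue
import time

FRAME_SIZE = 256        # bytes per frame, illustrative
SEND_INTERVAL_S = 0.05  # 20 frames/s regardless of actual demand

def cover_stream(port, outgoing: "queue.Queue[bytes]") -> None:
    """Emit one frame per interval: a queued ciphertext frame if available, filler otherwise."""
    while True:
        try:
            frame = outgoing.get_nowait()   # already encrypted and padded to FRAME_SIZE
        except queue.Empty:
            frame = os.urandom(FRAME_SIZE)  # filler, indistinguishable from ciphertext
        port.write(frame)
        time.sleep(SEND_INTERVAL_S)
```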

4.4 True Random Number Generators

True random number generators (TRNG) are based on physical processes (e.g. noise, chaos or quantum effects) that are considered non-deterministic, hence non-predictable even with full knowledge of the technical details of the TRNG, all previous states and initial conditions. Implementing a scientifically provable TRNG is a very difficult task; for a review see [63]. The bits sampled from a physical process often have an uneven distribution of zeros and ones (i.e. the bits are biased, [63]). Therefore, the bit stream must be post-processed with a whitening algorithm (e.g. von Neumann de-biasing [73]). Some hardware random number generators include an additional step that expands the amount of available random bits per second using a deterministic algorithm, e.g. seeding a cryptographically secure pseudo-random number generator (CSPRNG) with the true random bits.
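As a concrete example of such whitening, the sketch below implements the classic von Neumann de-biasing step: raw bits are consumed in pairs, '01' yields 0, '10' yields 1, and equal pairs are discarded. Note that this removes bias only for independent bits and does nothing about correlations; the biased demo source is purely illustrative.

```python
import random
from typing import Iterable, Iterator

def von_neumann_debias(raw_bits: Iterable[int]) -> Iterator[int]:
    """Yield unbiased bits from a biased (but independent) 0/1 source."""
    it = iter(raw_bits)
    for first in it:
        second = next(it, None)
        if second is None:
            return
        if first != second:
            yield first          # '01' -> 0, '10' -> 1; '00' and '11' are dropped

# usage: a source with ~70% ones still yields a roughly balanced output stream,
# at the cost of throwing away most of the raw bits
biased_source = (1 if random.random() < 0.7 else 0 for _ in range(100_000))
whitened = list(von_neumann_debias(biased_source))
print(sum(whitened) / len(whitened))   # close to 0.5
```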


This is a questionable step, as seeding a deterministic algorithm cannot increase the amount of non-deterministic randomness [63, 73]. Sunar et al. [64] state: “There is great danger in deterministic methods aimed at improving the ‘appearance of randomness’.” To date, there is no mathematical proof that a high-quality CSPRNG exists that cannot be distinguished from a TRNG. Therefore, “a good true RNG should be post-processing-free or use minimal ad hoc post-processing” [63]. Establishing trust in proprietary, on-chip “true” RNGs is difficult if not impossible: companies want to keep their trade secrets, while cryptography experts need to know every detail to judge the TRNG quality. For example, the output of an on-chip hardware RNG might be computationally indistinguishable from a TRNG, but predictable if a CSPRNG is used for expansion and the seed is (partially) known. [5] describe a hardware-level trojan that is extremely difficult to detect, because no visible changes to the chip layout are made. They demonstrate an attack on Intel’s Ivy Bridge TRNG. By changing the dopant polarity of a few hundred transistors, it is possible to reduce the attack complexity of the AES-based post-processing step from 128 to 32 bits [5]. AMD CPUs seem to have recurring, severe hardware RNG bugs requiring microcode updates [33, 52].

4.5 Low-Cost TRNGs

Fig. 5. Popular, low-cost analog noise-source circuits. A) Reverse-biased PN transistor junction. B) Zener diode in reverse breakdown mode. C) The analog output can be digitized using a Schmitt trigger.

Because of the trust problems outlined above, it is reasonable to consider the use of an external noise source. We list some practical and low-cost approaches. CMOS image sensors are inherently noisy at room temperature due to a mixture of thermal, shot and quantization noise [66, 67]. This noise can be exploited to construct a low-cost and high-speed TRNG [28, 42]. Another approach samples thermal noise captured by the high-speed analog-to-digital converters found in software defined radios (SDR). The antenna input port of the SDR needs to be connected to a 75 Ω resistor and the whole SDR must be shielded from any external electromagnetic influence. What remains is the thermal noise from the resistor and the amplification stages [74]. Other solutions require some amount of analog circuitry to amplify noise generated, e.g., in a Zener diode due to electron quantum tunneling or in a reverse-biased transistor PN junction [9, 54, 62], see Fig. 5. It is important to constantly monitor the quality of such low-cost TRNGs, because their performance can degrade due to aging of electronic components. Especially the reverse-biased transistor approach is susceptible to aging [68].
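A very simple form of such runtime monitoring is sketched below: a crude monobit-proportion check over fixed-size blocks of raw output, rejecting blocks whose ones-ratio drifts outside a tolerance band. The thresholds and the source.read interface are illustrative assumptions, not a certified health-test suite.

```python
def monobit_ok(block: bytes, lo: float = 0.48, hi: float = 0.52) -> bool:
    """Crude bias check: fraction of one-bits must stay within [lo, hi]."""
    ones = sum(bin(byte).count("1") for byte in block)
    ratio = ones / (8 * len(block))
    return lo <= ratio <= hi

def monitored_blocks(source, block_bytes: int = 4096):
    """Yield raw-noise blocks, aborting if the source appears to have degraded."""
    while True:
        block = source.read(block_bytes)   # assumed raw-noise reader object
        if len(block) < block_bytes:
            return
        if not monobit_ok(block):
            raise RuntimeError("noise source degraded: bias outside tolerance")
        yield block
```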


5 Discussion

After reviewing the current state of network security and the resulting impact on protecting sensitive private data, we have discussed a concept that combines data diodes, strong cryptography and true random number generators. The majority of commercially available data diodes are optimized for speed and data throughput. They are typically very expensive and often bulky. Therefore, we reviewed and described low-cost implementations of such data diodes and true random number generators. The concept can be implemented at three different feature levels:

– Level 1: Securing the sending endpoint against manipulation and exfiltration of private data from outside attacks using a data diode.
– Level 2: Securing the sending endpoint using a data diode, encrypting the data and forwarding the encrypted data over an unprotected network to the receiving endpoint. Data leakage from the receiving endpoint is avoided using a second data diode.
– Level 3: Bidirectional communication using strictly separated receiving and sending devices on both endpoints.

For example, level 1 can be useful to protect the microphone of voice-controlled devices by isolating the offline speech-recognition unit from the rest of the system. Then only the transcribed text passes through the data diode, but never the raw audio stream. In the near future, most consumer AR and VR headsets will include eye tracking. As mentioned in the introduction, a compromised optical eye tracker could leak iris images of the user. Here, a data diode could isolate the components of the eye tracker that deal with pupil tracking (eye cameras + embedded system). Then only relevant features (e.g. pupil position, diameter) can pass to the application logic. A level 3 system for secure message exchange, called “tinfoil chat”, is documented online [41] but not yet published in paper form. Tinfoil chat does not address the issue of true random number generation, nor the potential complexity issues of its data diode design, which is based on [46]. The concept has some clear trade-offs between provable security, usability, convenience and performance:

1. The strict unidirectional communication imposed by data diodes breaks all bidirectional protocols. The design of software using data diodes needs to be adapted accordingly, for example eye tracker calibration procedures.
2. Firmware updates of components isolated by a data diode cannot be performed by the end-user anymore.
3. OTP-based encryption or quantum-safe ciphers using pre-shared keys require a direct exchange of a large amount of key material, e.g. during a face-to-face meeting (“sneaker-net”).
4. Depending on the implementation, low-cost data diodes might be slow compared to Ethernet and can introduce some additional latency due to buffering in e.g. USB-to-serial chips.

We hope that low-cost data diodes and strong encryption will become standard tools to secure everyday commodity sensors and devices.


References 1. Adrian, D., et al.: Imperfect forward secrecy: how Diffie-Hellman fails in practice. In: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 5–17 (2015) 2. Albert, R., Patney, A., Luebke, D., Kim, J.: Latency requirements for foveated rendering in virtual reality. ACM Trans. Appl. Perception (TAP) 14(4), 25 (2017) 3. Anderson, R.J., Petitcolas, F.A.P.: On the limits of steganography. IEEE J. Sel. Areas Commun. 16(4), 474–481 (1998) 4. Armknecht, F., Gagliardoni, T., Katzenbeisser, S., Peter, A.: General impossibility of group homomorphic encryption in the quantum world. In: Krawczyk, H. (ed.) PKC 2014. LNCS, vol. 8383, pp. 556–573. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-64254631-0 32 5. Becker, G.T., Regazzoni, F., Paar, C., Burleson, W.P.: Stealthy dopant-level hardware trojans. In: Bertoni, G., Coron, J.-S. (eds.) CHES 2013. LNCS, vol. 8086, pp. 197–214. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40349-1 12 6. Benalcazar, D., Perez, C., Bastias, D., Bowyer, K.: Iris recognition: comparing visible-light lateral and frontal illumination to NIR frontal illumination. In: 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 867–876. IEEE (2019) 7. Bernstein, D.J., Lange, T.: Post-quantum cryptography-dealing with the fallout of physics success. IACR Cryptology ePrint Archive, 2017:314 (2017) 8. Biham, E., Neumann, L.: Breaking the Bluetooth pairing – the fixed coordinate invalid curve attack. In: Paterson, K.G., Stebila, D. (eds.) SAC 2019. LNCS, vol. 11959, pp. 250–273. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-38471-5 11 9. Campbell, P., Cheetham, J.: OneRNG: An open hardware random number generator (2014). https://onerng.info. Accessed 19 Jan 2020 10. Chang, F., Onohara, K., Mizuochi, T.: Forward error correction for 100 g transport networks. IEEE Commun. Mag. 48(3), S48–S55 (2010) 11. Cohen, D., Schl¨apfer, U., Ahlfors, S., H¨am¨al¨ainen, M., Halgren, E.: New six-layer magnetically-shielded room for MEG. In: Proceedings of the 13th International Conference on Biomagnetism, pp. 919–921. VDE Verlag, Jena, Germany (2002) 12. Microsoft Corporation. Hololens technical specifications. https://www.microsoft.com/en-us/ hololens/hardware. Accessed 15 Jan 2020 13. Crawford, T.J., Higham, S., Mayes, J., Dale, M., Shaunak, S., Lekwuwa, G.: The role of working memory and attentional disengagement on inhibitory control: effects of aging and Alzheimer’s disease. Age 35(5), 1637–1650 (2013) 14. Daemen, J., Rijmen, V.: Aes proposal: Rijndael (1999) 15. Daemen, J., Rijmen, V.: The Design of Rijndael: AES-The Advanced Encryption Standard. Springer, Heidelberg (2013) 16. Daugman, J.: The importance of being random: statistical principles of iris recognition. Pattern Recogn. 36(2), 279–291 (2003) 17. Daugman, J.: Probing the uniqueness and randomness of iriscodes: results from 200 billion iris pair comparisons. Proc. IEEE 94(11), 1927–1935 (2006) 18. Dietz, P., Yerazunis, W., Leigh, D.: Very low-cost sensing and communication using bidirectional LEDs. In: Dey, A.K., Schmidt, A., McCarthy, J.F. (eds.) UbiComp 2003. LNCS, vol. 2864, pp. 175–191. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-396536 14 19. Dodis, Y., Spencer, J.: On the (non) universality of the one-time pad. In: The 43rd Annual IEEE Symposium on Foundations of Computer Science 2002, Proceedings, pp. 376–385. IEEE (2002)


20. Durumeric, Z., et al.: The matter of heartbleed. In: Proceedings of the 2014 Conference on Internet Measurement Conference, pp. 475–488. ACM (2014) 21. Beware the eye spies. Sci. Am. 310 (2014) 22. Fluhrer, S., Mantin, I., Shamir, A.: Weaknesses in the key scheduling algorithm of RC4. In: Vaudenay, S., Youssef, A.M. (eds.) SAC 2001. LNCS, vol. 2259, pp. 1–24. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-45537-X 1 23. Greenwald, G., MacAskill, E., Poitras, L., Ackerman, S., Rushe, D.: Microsoft handed the NSA access to encrypted messages. The Guardian, 12 (2013) 24. Guri, M.: AIR-FI: generating covert wi-fi signals from air-gapped computers. arXiv preprint arXiv:2012.06884 (2020) 25. Guri, M., Elovici, Y.: Bridgeware: the air-gap malware. Commun. ACM 61(4), 74–82 (2018) 26. Guri, M., Zadov, B., Elovici, Y.: ODINI: escaping sensitive data from faraday-caged, airgapped computers via magnetic fields. IEEE Trans. Inf. Forensics Secur. 15, 1190–1203 (2019) 27. Honggang, L.: Research on packet loss issues in unidirectional transmission. J. Comput. 8(10), 2664–2671 (2013) 28. Hughes, J.P., Gupta, Y.: “The collector”: A gigabit true random number generator using image sensor noise (2019) 29. Kadhim, I.J., Premaratne, P., Vial, P.J., Halloran, B.: Comprehensive survey of image steganography: techniques, evaluations, and trends in future research. Neurocomputing 335, 299–326 (2019) 30. Keizer, G.: NSA helped with windows 7 development. Computerworld, November 2009 31. Krause, A.F., Essig, K.: Libretracker: a free and open-source eyetracking software for headmounted eyetrackers. In: 20th European Conference on Eye Movements, (ECEM 2019), p. 391 (2019) 32. Landau, S.: Highlights from making sense of Snowden, part II: what’s significant in the NSA revelations. IEEE Secur. Privacy 12(1), 62–64 (2014) 33. Larabel, M.: Some AMD CPUs might lose RdRand randomness following suspend/resume (2019). https://www.phoronix.com/scan.php?page=news item&px=AMD-CPUs-RdRandSuspend. Accessed 19 Jan 2020 34. Liebling, D.J., Preibusch, S.: Privacy considerations for a pervasive eye tracking world. In: Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct Publication, pp. 1169–1177 (2014) 35. Mehrotra, H., Vatsa, M., Singh, R., Majhi, B.: Does iris change over time? PLoS ONE 8(11), e78333 (2013) 36. Mitchell, J.C., He, C.: Security analysis and improvements for IEEE 802.11 i. In: The 12th Annual Network and Distributed System Security Symposium (NDSS 2005) Stanford University, Stanford, pp. 90–110. Citeseer (2005) 37. Miyazaki, E., Itami, S., Araki, T.: Using a light-emitting diode as a high-speed, wavelength selective photodetector. Rev. Sci. Instrum. 69(11), 3751–3754 (1998) 38. Moschner, C., Baloh, R.W.: Age-related changes in visual tracking. J. Gerontol. 49(5), M235–M238 (1994) 39. Naatz, L.C.: Literature review of optocouplers, their polymer components and current applications (2020) 40. Okhravi, H., Sheldon, F.T.: Data diodes in support of trustworthy cyber infrastructure. In: Proceedings of the Sixth Annual Workshop on Cyber Security and Information Intelligence Research, p. 23. ACM (2010) 41. Ottela, M.: Tinfoil chat: Instant messaging with endpoint security (2017). https://github.com/ maqp/tfc/wiki. Accessed 29 Jan 2020 42. Park, B.K., et al.: Practical true random number generator using CMOS image sensor dark noise. IEEE Access 7, 91407–91413 (2019)


43. Patney, A., et al.: Towards foveated rendering for gaze-tracked virtual reality. ACM Trans. Graph. (TOG) 35(6), 179 (2016) 44. Pierce, K., et al.: Eye tracking reveals abnormal visual preference for geometric images as an early biomarker of an autism spectrum disorder subtype associated with increased symptom severity. Biol. Psychiatry 79(8), 657–666 (2016) 45. Piumsomboon, T., Lee, G., Lindeman, R.W., Billinghurst, M.: Exploring natural eye-gazebased interaction for immersive virtual reality. In: 2017 IEEE Symposium on 3D User Interfaces (3DUI), pp. 36–39. IEEE (2017) 46. Sanches, P.: (pseudonym). Low cost data diode (2016). https://imgur.com/a/5Cv19. Accessed 18 Jan 2020 47. Raub, D., Steinwandt, R., M¨uller-Quade, J.: On the security and composability of the one time pad. In: Vojt´asˇ, P., Bielikov´a, M., Charron-Bost, B., S´ykora, O. (eds.) SOFSEM 2005. LNCS, vol. 3381, pp. 288–297. Springer, Heidelberg (2005). https://doi.org/10.1007/978-3540-30577-4 32 48. Renner, P., Pfeiffer, T.: Attention guiding techniques using peripheral vision and eye tracking for feedback in augmented-reality-based assistance systems. In: 2017 IEEE Symposium on 3D User Interfaces (3DUI), pp. 186–194. IEEE (2017) 49. Rieger, G., Cash, B.M., Merrill, S.M., Jones-Rounds, J., Dharmavaram, S.M., SavinWilliams, R.C.: Sexual arousal: the correspondence of eyes and genitals. Biol. Psychol. 104, 56–64 (2015) 50. Rieger, G., Savin-Williams, R.C.: The eyes have it: sex and sexual orientation differences in pupil dilation patterns. PloS One 7(8), e40256 (2012) 51. Rupp, H.A., Wallen, K.: Sex differences in viewing sexual stimuli: an eye-tracking study in men and women. Hormones Behavior 51(4), 524–533 (2007) 52. Salter, J.: How a months-old AMD microcode bug destroyed my weekend: AMD shipped Ryzen 3000 with a serious microcode bug in its random number generator (2019). https://arstechnica.com/gadgets/2019/10/how-a-months-old-amd-microcodebug-destroyed-my-weekend/. Accessed 19 Jan 2020 53. Schneier, B.: Cryptographic design vulnerabilities. Computer 9, 29–33 (1998) 54. Seward, R.: Make your own true random number generator 2 (2014). http://robseward.com/ misc/RNG2/. Accessed 19 Jan 2020 55. Shannon, C.E.: Communication theory of secrecy systems. Bell Syst. Tech. J. 28(4), 656–715 (1949) 56. Shin, Y., Meneely, A., Williams, L., Osborne, J.A.: Evaluating complexity, code churn, and developer activity metrics as indicators of software vulnerabilities. IEEE Trans. Softw. Eng. 37(6), 772–787 (2010) 57. Shor, P.W.: Algorithms for quantum computation: discrete logarithms and factoring. In: Proceedings 35th Annual Symposium on Foundations of Computer Science, pp. 124–134. IEEE (1994) 58. Shumow, D., Ferguson, N.: On the possibility of a back door in the NIST SP800-90 dual Ec Prng. In: Proceedings of the Crypto, vol. 7 (2007) 59. Spooner, J.W., Sakala, S.M., Baloh, R.W.: Effect of aging on eye tracking. Arch. Neurol. 37(9), 575–576 (1980) 60. Sravani, M.M., Ananiah Durai, S.: Side-channel attacks on cryptographic devices and their countermeasures—a review. In: Tiwari, S., Trivedi, M.C., Mishra, K.K., Misra, A.K., Kumar, K.K. (eds.) Smart Innovations in Communication and Computational Sciences. AISC, vol. 851, pp. 209–226. Springer, Singapore (2019). https://doi.org/10.1007/978-981-13-24147 21 61. Stevens, M.W.: An implementation of an optical data diode. Technical report, DSTO Electronics and Surveillance Research Laboratory, Salisbury, South Australia (1999)


62. Stipˇcevi´c, M.: Fast nondeterministic random bit generator based on weakly correlated physical events. Rev. Sci. Instrum. 75(11), 4442–4449 (2004) 63. Stipˇcevi´c, M., Koc¸, C¸.K.: True random number generators. In: Koc¸, C¸.K. (ed.) Open Problems in Mathematics and Computational Science, pp. 275–315. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10683-0 12 64. Sunar, B., Martin, W.J., Stinson, D.R.: A provably secure true random number generator with built-in tolerance to active attacks. IEEE Trans. Comput. 56(1), 109–119 (2006) 65. Tews, E., Beck, M.: Practical attacks against WEP and WPA. In: Proceedings of the Second ACM Conference on Wireless Network Security, pp. 79–86. ACM (2009) 66. Theuwissen, A.J.P.: CMOS image sensors: state-of-the-art. Solid-State Electron. 52(9), 1401–1406 (2008) 67. Tian, H., Fowler, B., Gamal, A.E.: Analysis of temporal noise in CMOS photodiode active pixel sensor. IEEE J. Solid-State Circuits 36(1), 92–101 (2001) 68. Toufik, N., P´elanchon, F., Mialhe, P.: Degradation of junction parameters of an electrically stressed npn bipolar transistor. Act. Passive Electron. Compon. 24(3), 155–163 (2001) 69. Vanhoef, M., Piessens, F.: Key reinstallation attacks: forcing nonce reuse in WPA2. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 1313–1328. ACM (2017) 70. Vanhoef, M., Piessens, F.: Release the kraken: new KRACKs in the 802.11 standard. In: Proceedings of the 25th ACM Conference on Computer and Communications Security (CCS). ACM (2018) 71. Vanhoef, M., Ronen, E.: Dragonblood: analyzing the dragonfly handshake of WPA3 and EAP-PWD. In: Proceedings of the 2020 IEEE Symposium on Security and Privacy-S&P 2020. IEEE (2020) 72. Vernam, G.: Secret signaling systems. US Patent 1919 73. von Neumann, J.: Various techniques used in connection with random digits. In: Householder, A.S., Forsythe, G.E., Germond, H.H. (eds.) Monte Carlo Method, volume 12 of National Bureau of Standards Applied Mathematics Series, chapter 13, pp. 36–38. US Government Printing Office, Washington, DC (1951) 74. Warren, P.: An entropy generator using SDR peripherals, including RTL-SDR and BladeRF (2014). https://github.com/pwarren/rtl-entropy. Accessed 19 Jan 2020 75. Wildes, R.P.: Iris recognition: an emerging biometric technology. Proc. IEEE 85(9), 1348– 1363 (1997) 76. Wu, T.D., et al.: The secure remote password protocol. In: NDSS, vol. 98, pp. 97–111. Citeseer (1998) 77. Zaba, S.: The NSAKEY in microsoft’s crypto API: facts, fiction and speculation. Inf. Secur. Tech. Rep. 4(4), 40–46 (1999)

A Security System for National Network

Woo Young Park, Sang Hyun Kim, Duy-Son Vu, Chang Han Song, Hee Soo Jung, and Hyeon Jo(B)

RealSecu, Busan 48059, Republic of Korea
[email protected]

Abstract. Recently, cybercrime attacking national networks has increased. National infrastructure such as water purification plants, power plants, and substations is operated using programmable logic controllers (PLC). A PLC is controlled through Industrial Control Systems/Supervisory Control And Data Acquisition (ICS/SCADA), which receives operational commands and sends operational states over communication means such as Ethernet and Modbus. However, the ICS/SCADA environment, often called the Industrial IoT, is more vulnerable to security attacks than recently developed technologies such as IoT devices, systems, and networks. Therefore, this study proposes a new security system to strengthen industrial firewalls. We developed an interface for system access control, an unauthorized access blocking algorithm, and a real-time defense system. The performance of the proposed system was verified by external organizations. Five performance indicators were measured to validate the proposed system, and all of them reached 100%. We hope that this study and its results will help block and defend against cyberattacks.

Keywords: Security system · Programmable logic control · Distributed network protocol 3 · Industrial control systems · Industrial firewall

1 Introduction

In recent years, large-scale cybercrimes attacking national networks have increased. The possibility of damage caused by security attacks is becoming a reality, as PC ransomware infections and the resulting damage have occurred in national networks. In particular, hackers have targeted the programmable logic controllers (PLC) in national and industrial networks [1, 2]. A PLC is an industrial computer that has been studied and adapted for the control of manufacturing processes, including large-scale national infrastructure. PLCs can be applied to large rack-mounted modular devices with thousands of I/O points. PLCs currently play a pivotal role in national networks and industrial infrastructure. Cyberattacks on large national networks have been reported to be taking place in China and Russia [3, 4]. As an example of the damage caused by such attacks, the ‘cyberattack using Stuxnet’ on Iranian nuclear facilities is typical [5]. The attack caused Iran’s nuclear facilities to experience a dangerous situation in which centrifuges malfunctioned. In 2015, a sophisticated cyberattack targeted Ukraine’s power grid, causing wide-area power outages [6].


North Korea is conducting various hacking attempts and exercises against several countries [7]. If the operation of such facilities is paralyzed through security attacks, nationwide damage can result. These cyber security threats can hurt the physical infrastructure, economy, and society [8]. Thus, it is necessary to develop security technologies that can respond to these attacks. The national infrastructure network generally uses PLCs to manage and control each facility and piece of equipment. The PLC communicates with the server through means such as Ethernet and Modbus. An operation command and an operation state are transmitted through this communication process. The national network controls and communicates with facilities through the Industrial Control Systems/Supervisory Control And Data Acquisition (ICS/SCADA) network. In the age of the Internet of Things, SCADA has advanced into large, complicated, and distributed systems that are prone to conventional as well as new threats [9]. A SCADA firewall usually needs to examine deeper into the payload to understand the structure of the industrial network [10]. Numerous firewalls have been extending their security capabilities to support SCADA systems [10]. However, ICS/SCADA is more vulnerable to security attacks than other IoT communication devices. Therefore, a security system specialized for ICS/SCADA is required. With ICS/SCADA, the distributed network protocol 3 (DNP3) has been predominantly used by national networks such as electric utilities, power plants, and smart grids [11]. DNP3 is a set of communications protocols used between components in process automation systems. It was created for communications between several types of data acquisition and control equipment. It plays a crucial role in SCADA systems, where it is used by SCADA master stations, remote terminal units, and intelligent electronic devices. In this context, this study proposes a security system by identifying the characteristics of DNP3. Cyber security of the national network is the major purpose of the current research. Therefore, this study proposes a security system based on the characteristics of PLC, ICS/SCADA, and DNP3 constituting the national network. The development areas include an interface for system access control, unauthorized access blocking functions, and real-time defense functions. To verify the performance of the proposed security system, verification was requested from an external specialized institution. As performance verification indicators, five items were presented: unauthorized IP blocking rate, unauthorized MAC address blocking rate, blocking rate by security policy, abnormal traffic detection, and real-time monitoring of fraudulent access. The structure of this paper is as follows. Section 2 presents previous studies related to national network security. Section 3 describes the three security systems proposed in this study. Section 4 presents the environment and results of the performance evaluation of the proposed security system. A total of five indicator results are listed. Finally, Sect. 5 covers the results of this study. The main contents of this study, academic contributions, practical implications, and future research directions are included.


2 Research Background

With the development of industrial infrastructure and national networks, several studies on security have been conducted. Many scholars have performed various research on PLC, ICS/SCADA, and DNP3 in analyzing cyberattack patterns. In [12], the authors stated that modern SCADA systems are essential for monitoring and controlling electric power generation, transmission and distribution. They stressed that special attention should be paid to the implementation of new strategies that can detect, prevent, and mitigate data exfiltration attacks. We implemented a real-time monitoring function to address data exfiltration attacks. Mrabet et al. (2018) shed light on the smart grid and pointed out numerous security threats [13]. They described the security requirements and characterized several severe cyberattacks. Moreover, they suggested a cybersecurity strategy to detect and counter these attacks. This study has limitations in that it did not implement the system and only presented a strategy. Sun et al. (2018) conducted a state-of-the-art survey of the most relevant cyber security studies in power systems [14]. They reviewed research that validates cyber security risks and builds solutions to improve the security of a power grid. They also proposed defense systems to protect a power grid against cyber intruders. The results of this study have the potential to be extended to general national networks. In [15], the authors presented an intrusion detection and prevention system (IDPS) for DNP3 SCADA systems. The proposed IDPS is called DIDEROT (Dnp3 Intrusion DetEction pReventiOn sysTem). It relies on both supervised machine learning (ML) and unsupervised/outlier ML detection models. DIDEROT identifies whether a DNP3 network flow is related to a specific DNP3 cyberattack. If the corresponding network flow is detected as normal, then the unsupervised/outlier ML anomaly detection model is activated. The performance of DIDEROT was tested using real data originating from a substation environment. Rosborough et al. (2019) pointed out ICS security problems that require cryptographic solutions [16]. They explored the trade-offs and requirements for the selection of the best approach. They also elaborated on the security protocols used in ICS environments. This study laid the foundation for the development of a national security system in that it summarized the protocols used for ICS. In [17], the authors compared the use of ML techniques to classify messages of the same protocol exchanged in encrypted tunnels. They examined four simulated cases of encrypted DNP3 traffic scenarios and four different supervised machine learning algorithms including decision tree, nearest-neighbor, support vector machine, and Naive Bayes. The results showed that it was possible to extend a Peekaboo attack over multiple substations and to collect significant information from a system that communicates using encrypted DNP3 traffic. Lu and Ou (2021) suggested CPN-Tool to carry out a formal study of DNP3 security, integrity, and authentication [18]. They developed an improved Dolev-Yao attacker model to decrease the size of the state space. They evaluated the security of the protocol in the full attack state and gave the vulnerability exploitation path according to the evaluation results. They argued that the suggested CPN model provides a security assessment method for DNP3 for achieving secrecy, integrity, and authentication.


Marian et al. (2020) described asymmetric cryptography and digital signatures for enhancing the security of SCADA architectures [19]. They demonstrated the possibility of including digital signatures with a reliable data communication protocol such as DNP3. They also designed a multitenant cloud-based architecture for a SCADA environment. This study is different in that it proposes a cloud-based architecture. In [10], the authors commented that the security features in traditional SCADA firewalls have shortcomings in two main aspects. First, traditional deep packet inspection only partially inspects the content of the payload. Second, existing SCADA firewalls have poor capability for protecting industrial protocols. With these observations, they proposed a new SCADA firewall model called SCADAWall. The model is powered by comprehensive packet inspection technology. SCADAWall contains a new proprietary industrial protocols extension algorithm (PIPEA) to extend its capabilities to proprietary industrial protocol protection. They compared their security features with two commercial SCADA firewalls. The results showed that SCADAWall could effectively mitigate those drawbacks without sacrificing low-latency requirements. In summary, some previous studies have developed security systems to counter cyberattacks targeting national networks. Specifically, studies have been conducted on security specific to PLC, security required for ICS/SCADA, and security of DNP3. Based on these previous studies, this work proposes a data detection/blocking technology for cyberattacks targeting PLCs.

3 Security System

This study proposes a security system to defend against cyberattacks targeting PLCs in national infrastructure such as power plants and substations. The scope of development consists of an interface for system access control, an alert generation and unauthorized access prevention module, and a real-time defense function for the control system.

3.1 Internal/External Interface for System Access Control

We developed interfaces to define the types of fraudulent access and detect them. We designed functions to detect and block data breach incidents and precursor symptoms in order to protect against access from unauthorized systems. The proposed security system can also be configured to check for denial-of-service attacks and scanning attacks. The interfaces comprise Iga, Igb, Igc, and Igd.

• Iga interface: transfers the control data received from the control data collection/transmission block to the control protocol content analysis block.
• Igb interface: transfers control data information and preprocessed information from the control protocol content analysis block to the control process monitoring and command validation block.


• Igc interface: transfers control information, such as forwarding/blocking of the corresponding control data, to the control data collection/transmission block according to the execution results of the control protocol content analysis block.
• Igd interface: transfers control information, including forwarding/blocking of the corresponding control data, to the data collection/transmission block according to the result of the control process monitoring and command validation block.

After verifying commands by analyzing packet collection, header/payload analysis, generation, and verification data, we developed a function to maintain and extend system access rights by using the whitelist set by the user as the target. Figure 1 displays the program code for the system access control function.

Fig. 1. Program for system access control function.
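Because the program in Fig. 1 is only reproduced as a screenshot, the following hedged sketch illustrates what a whitelist-based access check of this kind might look like; the rule set, field names and protocols are assumptions, not the authors' code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_mac: str
    dst_port: int
    protocol: str                      # e.g. "dnp3", "modbus/tcp"

WHITELIST_IPS = {"192.168.10.5", "192.168.10.6"}             # assumed operator stations
WHITELIST_MACS = {"aa:bb:cc:dd:ee:01", "aa:bb:cc:dd:ee:02"}
ALLOWED_SERVICES = {("dnp3", 20000), ("modbus/tcp", 502)}

def allowed(pkt: Packet) -> bool:
    """Forward only if source address, MAC and service are all explicitly whitelisted."""
    return (pkt.src_ip in WHITELIST_IPS
            and pkt.src_mac in WHITELIST_MACS
            and (pkt.protocol, pkt.dst_port) in ALLOWED_SERVICES)

def handle(pkt: Packet) -> str:
    if allowed(pkt):
        return "FORWARD"
    # unauthorized access: block and log for the real-time monitoring report
    return "BLOCK"
```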

Table 1 shows the screen of the illegal access detection function. We detected several attacks including XMAS Scan, NULL Scan, FIN Scan, and ICMP.

Table 1. Illegal access detection function implementation screen

N   Detected attack   Detection screen
1   XMAS Scan         (screenshot)
2   NULL Scan         (screenshot)
3   FIN Scan          (screenshot)
4   ICMP              (screenshot)

Table 2 illustrates the denial-of-service attack detection implementation. We configured the interface to detect cyberattacks such as SYN flooding and UDP flooding.

Table 2. Denial of service attack detection implementation screen

N   Detected attack   Detection screen
1   SYN Flooding      (screenshot)
2   UDP Flooding      (screenshot)


Table 3 describes the data breach detection function implementation. We detected attacks including illegal access to the server and SSH brute force.

Table 3. Data breach detection function implementation screen

N   Detected attack                          Detection screen
1   Illegal Access to Server (secure log)    (screenshot)
2   SSH Brute Force                          (screenshot)

3.2 Alert Generation and Unauthorized Access Prevention Module

The control setting function is configured to enable or disable each security function according to the user’s needs. We programmed code to implement this setting function and checked whether the code worked as designed. As a result, it was confirmed that it worked as intended. The alert generation and unauthorized access prevention module was developed with functions related to the hierarchical control module, control process monitoring, and the control application instrumentation access control interface. To prevent conflicting errors arising from the operational characteristics of the control systems, operation priorities were set internally. The control module was developed as a set of classes, each of which can be activated or deactivated, so that the various functions necessary for alert generation and the prevention of illegal access can be operated as needed.


Figure 2 shows the screen using the function developed for protocol setting. Through the setting window, the administrator can set the ICS protocol name, transmission protocol, and port. Control protocols include Modbus/TCP, Modbus/UDP, etc.

Fig. 2. Screen using a developed function for protocol setting

We developed the control module as a class and designed the security program in such a way that each class is activated or deactivated (see Fig. 3). Various functions necessary for warning generation and prevention of unauthorized access were coded to operate as needed.

Fig. 3. Program code of control function composed of classes
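Since Fig. 3 is only available as a screenshot, the sketch below shows one plausible shape of such a class-based design, with modules that can be enabled or disabled and are run in priority order; class names, priorities and the packet format are illustrative assumptions rather than the authors' implementation.

```python
class ControlFunction:
    """Base class for a security function that can be switched on or off."""

    def __init__(self, name: str, priority: int):
        self.name = name
        self.priority = priority     # lower value runs first, avoiding conflicts
        self.enabled = False

    def enable(self) -> None:
        self.enabled = True

    def disable(self) -> None:
        self.enabled = False

    def process(self, packet: dict) -> str:
        raise NotImplementedError

class UnauthorizedAccessBlocker(ControlFunction):
    def __init__(self, whitelist: set):
        super().__init__("access-blocker", priority=0)
        self.whitelist = whitelist

    def process(self, packet: dict) -> str:
        return "FORWARD" if packet["src_ip"] in self.whitelist else "BLOCK"

class AlertGenerator(ControlFunction):
    def __init__(self):
        super().__init__("alert", priority=1)

    def process(self, packet: dict) -> str:
        print(f"ALERT: traffic from {packet['src_ip']} to port {packet['dst_port']}")
        return "FORWARD"

def run_pipeline(modules, packet: dict) -> str:
    """Apply all enabled modules in priority order; any BLOCK decision is final."""
    for module in sorted(modules, key=lambda m: m.priority):
        if module.enabled and module.process(packet) == "BLOCK":
            return "BLOCK"
    return "FORWARD"
```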

We conducted and monitored our experiments to detect and generate alarms for illegal access including unauthorized access. The results are shown in Fig. 4.


Fig. 4. Configuration of the screen to check the result of monitoring illegal access

3.3 Real-Time Defense Function for Control System

PLCs for power, water supply, and dam management can cause great damage if they malfunction or stop operating even for a moment. Therefore, this study developed and applied the real-time monitoring and reporting function required for security threats in critical locations. This function was implemented to operate in conjunction with the illegal access detection function described above. We programmed real-time monitoring and automatic report output of the detection history for cyberattacks (see Fig. 5). The security system accesses the connected database and queries related information. In this process, the user, monitoring period, and domain name are searched. In addition, a clearly visible information screen was constructed for high readability. A security threat real-time monitoring report was produced by properly arranging graphs and keywords.

Fig. 5. Functional program for automatic security threat reporting
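One way the report query behind Fig. 5 could be organized is sketched below: fetch the detection history for a given user and period from a database and summarize alert counts by type. The table and column names are assumptions made for illustration.

```python
import sqlite3
from collections import Counter

def threat_report(db_path: str, user: str, start: str, end: str) -> dict:
    """Query detection history and return a small summary for the report generator."""
    con = sqlite3.connect(db_path)
    rows = con.execute(
        "SELECT alert_type, src_ip, dst_port, detected_at "
        "FROM detections WHERE user = ? AND detected_at BETWEEN ? AND ?",
        (user, start, end),
    ).fetchall()
    con.close()
    return {
        "user": user,
        "period": (start, end),
        "total_alerts": len(rows),
        "alerts_by_type": dict(Counter(row[0] for row in rows)),
    }

# usage sketch
# report = threat_report("security.db", "operator1", "2021-01-01", "2021-01-31")
```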


The screen where the developed technology is operated is shown in Fig. 6. The display consists of alert by time, alert source, alert type, destination IP, and destination port. Security administrators can detect and observe various cyberattacks in real-time through this screen.

Fig. 6. Operation screen of automatic report function for security threats

4 Performance Evaluation

To measure and verify the performance of the proposed security system, this study commissioned an evaluation from an external professional organization. The equipment used for the performance test was two Envy 13 D039TU laptops (HP) and one ProBook 430 G1 laptop (HP). A total of five indicators were set to assess the performance. The detailed evaluation environment and method for each indicator are described below.

4.1 Unauthorized IP Blocking Rate

Testing Procedure
1. The test is automatically repeated 2000 times through the S/W for performance verification, and the result is recorded automatically.
2. An unauthorized IP address is automatically created with a random value.
3. Connect to the system that controls the logic controller through the created IP address.
4. On the logic controller control PC, record the connected IP address, then block the connection and record it.
5. Check through the dedicated S/W whether the IP address recorded on the PC performing the unauthorized connection function matches the IP address recorded on the logic controller control PC.


6. Repeat the above procedure 5 times.
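The matching step of this procedure can be illustrated with a small script that compares the addresses logged on the attacking PC with those logged and blocked on the controller PC and reports the resulting blocking rate; the file names and log format are assumptions, not the dedicated verification software itself.

```python
def load_ips(path: str) -> list[str]:
    """Read one IP address per line, ignoring blank lines."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

def blocking_rate(sent_log: str, blocked_log: str) -> float:
    """Percentage of attempted addresses that also appear in the blocked log."""
    sent = load_ips(sent_log)
    blocked = set(load_ips(blocked_log))
    hits = sum(1 for ip in sent if ip in blocked)
    return 100.0 * hits / len(sent)

if __name__ == "__main__":
    rate = blocking_rate("sent_ips.txt", "blocked_ips.txt")
    print(f"unauthorized IP blocking rate: {rate:.1f}%")
```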

Results

Figure 7 shows a screen of the test results for blocking unauthorized access. As a result of performing 5 rounds of access attempts with 2000 unauthorized IP addresses each, the proposed security system succeeded in blocking all of them.

Fig. 7. A screen of test results for blocking unauthorized access

4.2 Unauthorized MAC Blocking Rate

Testing Procedure
1. The test is automatically repeated 2000 times through the S/W for performance verification, and the result is recorded automatically.
2. Unauthorized MAC addresses are automatically created with random values.
3. Connect to the system that controls the logic controller through the created MAC address.
4. After recording the connected MAC address on the logic controller control PC, block the connection and record it.
5. Check through the dedicated S/W whether the MAC address recorded on the PC performing the unauthorized access function matches the MAC address recorded on the logic controller control PC.
6. Repeat the above procedure 5 times.

Results

As a result of performing 5 rounds of access attempts with 2000 unauthorized MAC addresses each, the proposed security system succeeded in blocking all of them.

4.3 Blocking Rate Based on Security Policy

Testing Procedure
1. The test is automatically repeated 2000 times through the S/W for performance verification, and the result is recorded automatically.


2. Create 10 approaches that violate the security policy and define a number for each approach.
3. Access the system that controls the PLC by arbitrarily selecting one of the 10 security policy violation approaches.
4. The logic controller PC records the violation case number of the accessing PC, then blocks the access and records it.
5. Verify through the dedicated S/W that the policy violation number recorded on the PC that performed the unauthorized access matches the policy violation number recorded on the logic controller control PC.
6. Repeat the above procedure five times.

Results

A total of 2000 security policy violation access attempts were made in each of five runs, and the proposed security system succeeded in blocking all of them.

4.4 Abnormal Traffic Detection

Testing Procedure
1. The test is automatically repeated 2000 times through the S/W for performance verification, and the result is recorded automatically.
2. Generate 10 types of abnormal traffic that violate the security policy and define a number for each.
3. Access the system that controls the logic controller by arbitrarily selecting one of the 10 abnormal traffic types.
4. The logic controller control PC records when an abnormal traffic access attempt has occurred and whether it has been blocked.
5. Repeat the above procedure five times.

Results

The system was accessed by generating 2000 abnormal traffic flows. Over a total of five experiments, the proposed security system detected all abnormal traffic.

4.5 Real-Time Monitoring of Illegal Access

Testing Procedure
1. The test is automatically repeated 2000 times through the S/W for performance verification, and the result is recorded automatically.
2. Create five normal approaches and five fraudulent approaches, then define numbers for them.
3. Randomly select one of the 10 approaches and connect to the system that controls the logic controller.


4. Record whether the case of a normal approach and the case of an abnormal approach are accurately distinguished.
5. Repeat the above procedure five times.

Results

A total of 2000 access attempts were made in each of five runs, and the proposed security system correctly distinguished all normal and fraudulent accesses across the five runs. Table 4 summarizes the performance measures and their values.

Table 4. Performance results

No   Measurement                               Value
1    Unauthorized IP blocking rate             100%
2    Unauthorized MAC blocking rate            100%
3    Blocking rate based on security policy    100%
4    Abnormal traffic detection                100%
5    Real-time monitoring of illegal access    100%

5 Conclusion

This study proposes a security system to defend against cyberattacks targeting PLCs used in national networks. To propose the new system, the current status and patterns of recent large-scale cyberattacks were reviewed, and the existing literature was summarized. Several previous studies have proposed security systems focusing on essential devices and communication systems included in national infrastructure or networks, such as PLC, ICS/SCADA, and DNP3. Based on recent cyberattack pattern analysis and previous studies, this study proposed a system required for PLC defense. The security system consists of an interface for system access control, an alert generation and unauthorized access prevention module, and a real-time defense system for the control system. Computer programming was performed and the cyber security system was constructed through repeated experiments. To verify the performance of the proposed algorithm, an evaluation was conducted by an external professional organization. As performance evaluation indicators, the unauthorized IP blocking rate, unauthorized MAC address blocking rate, blocking by security policy, abnormal traffic detection, and illegal access real-time monitoring were measured. All indicators showed the expected performance. This study contributes to the literature in that it proposes a security system that can prepare for the fatal risk of large-scale cyberattacks targeting national infrastructure. The characteristics of ICS/SCADA and DNP3 were studied in depth to construct a security system specialized for PLCs.


This study and its results provide practical implications in that they can help prevent a crisis in which the PLCs of the national backbone network stop operating due to hacking and malfunctions. This study has limitations in that the performance of the proposed system was verified only by computer experiments. Thus, future work is necessary to verify the practical performance of the proposed system by applying it to the actual national backbone network.

Acknowledgments. This work is the result of the technology development of the SME technology development support project implemented by the Ministry of SMEs and Startups, Republic of Korea.

References 1. Wu, H., Geng, Y., Liu, K., Liu, W.: Research on programmable logic controller security. In: IOP Conference Series: Materials Science and Engineering. IOP Publishing (2019) 2. Yılmaz, E.N., Gönen, S.: Attack detection/prevention system against cyber attack in industrial control systems. Comput. Secur. 77, 94–105 (2018) 3. CAP, R.L., CSSL, C.: Analytic of China cyberattack. Int. J. Multimedia Appl. 4(3), 37 (2012) 4. Ramana, M., Kurando, M.: Cyberattacks on Russia—the nation with the most nuclear weapons—pose a global threat. Bull. Atomic Scient. 75(1), 44–50 (2019) 5. Noguchi, M., Ueda, H.: An analysis of the actual status of recent cyberattacks on critical infrastructures. NEC Tech. J. Spec. Issue Cybersecur. 12(2), 19–24 (2019) 6. Kshetri, N., Voas, J.: Hacking power grids: a current problem. Computer 50(12), 91–95 (2017) 7. Hwang, J., Choi, K.-S.: North Korean cyber attacks and policy responses: an interdisciplinary theoretical framework. Int. J. Cybersecur. Intell. Cybercrime 4(2), 4–24 (2021) 8. Chun, H., Lee, H., Kim, D.: The integrated model of smartphone adoption: hedonic and utilitarian value perceptions of smartphones among Korean college students. Cyberpsychol. Behav. Soc. Netw. 15(9), 473–479 (2012) 9. Sadeeq, M.A., Zeebaree, S.: Energy management for internet of things via distributed systems. J. Appl. Sci. Technol. Trends 2(2), 59–71 (2021) 10. Li, D., Guo, H., Zhou, J., Zhou, L., Wong, J.W.: SCADAWall: a CPI-enabled firewall model for SCADA security. Comput. Secur. 80, 134–154 (2019) 11. Sundararajan, A., Chavan, A., Saleem, D., Sarwat, A.I.: A survey of protocol-level challenges and solutions for distributed energy resource cyber-physical security. Energies 11(9), 2360 (2018) 12. Maglaras, L.A., Jiang, J., Cruz, T.J.: Combining ensemble methods and social network metrics for improving accuracy of OCSVM on intrusion detection in SCADA systems. J. Inf. Secur. Appl. 30, 15–26 (2016) 13. El Mrabet, Z., Kaabouch, N., El Ghazi, H., El Ghazi, H.: Cyber-security in smart grid: survey and challenges. Comput. Electr. Eng. 67, 469–482 (2018) 14. Sun, C.-C., Hahn, A., Liu, C.-C.: Cyber security of a power grid: state-of-the-art. Int. J. Electr. Power Energy Syst. 99, 45–56 (2018) 15. Radoglou-Grammatikis, P., Sarigiannidis, P., Efstathopoulos, G., Karypidis, P.-A., Sarigiannidis, A.: Diderot: an intrusion detection and prevention system for dnp3-based scada systems. In: Proceedings of the 15th International Conference on Availability, Reliability and Security (2020)


16. Rosborough, C., Gordon, C., Waldron, B.: All about eve: comparing dnp3 secure authentication with standard security technologies for scada communications. In: 13th Australasian Information Security Conference (2019) 17. de Toledo, T.R., Torrisi, N.M.: Encrypted DNP3 traffic classification using supervised machine learning algorithms. Mach. Learn. Knowl. Extr. 1(1), 384–399 (2019) 18. Lu, Y., Ou, W.-B.: Exploitation of the distributed network protocol in ICS with improved DY model based on petri net. Int. J. Netw. Secur. 23(5), 58–768 (2021) 19. Marian, M., Cusman, A., Stîng˘a, F., Ionic˘a, D., Popescu, D.: Experimenting with digital signatures over a dnp3 protocol in a multitenant cloud-based scada architecture. IEEE Access 8, 156484–156503 (2020)

Secured Digital Oblivious Pseudorandom and Linear Regression Privacy for Connected Health Services

Renuka Mohanraj(B)

Maharishi International University, Fairfield, IA 52557, USA
[email protected]

Abstract. The security and privacy of users have become considerable concerns owing to the involvement of Internet of Things (IoT) devices in diverse applications. Cyber threats are evolving at an explosive pace, making existing security and privacy measures insufficient; consequently, everybody on the Internet is a potential target for hackers. Machine Learning (ML) algorithms can generate precise outputs from wide-reaching, complicated databases, and the outcomes obtained can be used to predict and ascertain vulnerabilities in IoT-based systems. A Secured Digital Oblivious Pseudorandom and Linear Regression Privacy Learning (SDOP-LRPL) method for connected health services is proposed in this work. The SDOP-LRPL method is designed in three sections. First, with the Health Analytics dataset provided as input, relevant features for connected health services are obtained by utilizing the Distinctive Nearest Neighbor Confidence Feature Selection model. Second, a secured authentication model is designed using a Digital Oblivious Pseudorandom Signature with the appropriate features. Finally, further communication between users (i.e., patients and doctors) is performed via the Linear Regression Privacy Preservation Communication model with the authenticated users. This method acquires essential sensitive medical healthcare data via IoT devices. Data analysis is performed via machine learning methods to ensure security and privacy for connected health services. The experimental results reveal that the proposed method achieves proper authentication accuracy for connected health services with minimum response time and a higher rate of security.

Keywords: Internet of Things · Health service · Nearest Neighbor · Confidence · Digital Oblivious · Pseudorandom · Signature-based Authentication · Linear Regression · Privacy Preservation

1 Introduction

The Internet of Things (IoT) is an environment where each linked node can communicate with other nodes in the network to transfer essential data for precise and real-time decision-making. IoT is a very efficient environment in precarious conditions such as medical applications. IoT devices are a significant part of health systems. They gather measurable and analyzable healthcare data to ease the work of healthcare systems.


Healthcare data are gathered from IoT devices using remote access mechanisms, and the data collected by the sensors are sent to the doctor. These data are essential and comprise personal information, so unauthorized access to IoT devices poses a serious risk to patients' health and confidential information. Because this personal information is widely visible and easy to access, it creates problems for privacy and security. The more recent research advances in security and privacy support 5G-enabled IoT applications; 5G technologies provide high data rates, large bandwidth, increased capacity, low latency, and high throughput across a range of speeds and coverage. A machine learning-based healthcare monitoring system using IoT was proposed in [1] to continuously monitor the vital signs of students and detect both biological and behavioral aspects using intelligent healthcare technologies. In this model, essential data were first acquired using IoT devices, and data analysis was then performed using machine learning methods. With these methods, the anticipated risk of students' physiological and behavioral changes was also monitored, contributing to accuracy in detecting students' condition. Despite the improvement observed in accuracy, the response time involved in healthcare monitoring was not addressed. To address this issue, this work focuses on 5G with the Health Analytics dataset provided as input for intelligent analytics. The proposed method is designed with Distinctive Nearest Neighbor Confidence Feature Selection for health-related applications; it selects only the relevant features and thereby effectively reduces the response time. To stimulate the deployment of fifth-generation (5G) cellular networks, millions of devices are being connected to enormous Internet of Things (IoT) networks. Nevertheless, the growth in connectivity on 5G networks may heighten the attack surface of these devices and hence increase the number of attack possibilities. To mitigate the prospective risks to IoT systems, a secure and effective three-factor user authentication scheme was constructed in [2], with computation and communication costs appropriate for extremely low-cost IoT devices. Although the computation and communication costs of healthcare monitoring via the authentication mechanism were addressed, the accuracy and security aspects were not. To address these issues, an integrated security and privacy mechanism using the Secured Digital Oblivious Pseudorandom and Linear Regression Privacy Communication algorithm is designed, which also contributes to better security and authentication accuracy to a greater extent. In conventional methods, security and privacy are the two foremost issues that are not considered for ensuring smooth connected health services while transmitting healthcare data between remote users and medical centers; in the connected health services domain, security and privacy requirements have advanced faster than existing methods can deliver in practice. Existing machine learning methods do not cover early analysis, proper authentication, or authorization policies for critical treatment and the productive utilization of transmission. Also, transmitting a massive amount of sensitive medical information in a protected yet original form becomes challenging in connected health applications.
A Secured Digital Oblivious Pseudorandom and Linear Regression Privacy Learning (SDOP-LRPL) method is proposed to overcome the existing problems.


The main contributions of the proposed SDOP-LRPL method are as follows:

• To guarantee security and privacy for smooth connected health services, compared to conventional methods, the proposed SDOP-LRPL method merges the Distinctive Nearest Neighbor Confidence Feature Selection model with the Secured Digital Oblivious Pseudorandom and Linear Regression Privacy Communication algorithm.
• To select the relevant features to be sent between remote users, Distinctive Nearest Neighbor Confidence Feature Selection optimizes the features through a novel key indicator vector matrix evaluated via nearest neighbor functions. In addition, connected health services algorithms are strongly affected by response time, which this step minimizes.
• A Secured Digital Oblivious Pseudorandom and Linear Regression Privacy Communication algorithm is proposed to address the security and authentication factors, compared to traditional methods. The Oblivious Pseudorandom function is used to secure communication between users and ensure authentication accuracy, while the Linear Regression function is applied to measure the relationship between the dependent variable and the regressors with higher security.
• Finally, the implementation is done in the Python programming language with the Health Analytics dataset. The results show that the SDOP-LRPL method performs better than state-of-the-art methods.

The rest of the paper is organized as follows. Related security and privacy mechanisms for handling healthcare applications, and the problems they leave open, are addressed in Sect. 2. Section 3 discusses the proposed method in detail, along with its symbolic representations and detailed algorithms. The simulation setup is presented in Sect. 4. The experiments and their discussion are provided in Sect. 5. The paper concludes in Sect. 6.

2 Related Works

In numerous applications, the Internet of Things (IoT) has gained wide acceptance due to its unattended sensor operation at minimum cost. IoT performs the task of sensing the medical circumstances of patients, including monitoring blood pressure, intermittent oxygen levels, heartbeat rate, and temperature, and taking appropriate actions based on emergency factors. Patient-sensitive healthcare data are transmitted to remote users and medical centers for post-analysis, and, as far as medical and healthcare applications are concerned, IoT devices grow into an ecosystem. In [3], a secure and energy-efficient framework utilizing the Internet of Medical Things (IoMT) for e-healthcare, called SEF-IoMT, was proposed. It was explicitly designed with the primary objective of reducing the overhead incurred during communication and the energy consumption between biosensors while transmitting healthcare data. Security aspects were also ensured to prevent unauthenticated and malicious users, therefore contributing to both network privacy and integrity.


The Hybrid Lightweight Encryption using Swarm Optimization algorithm (HLE-SO) was proposed in [4] to provide secure storage and access management. The method merges Paillier encryption and the KATAN algorithm to offer lightweight features. Because lightweight encryption algorithms affect the key space, a swarm optimization algorithm was then applied to optimize the key space. The key objective was to encrypt the medical data (EEG signals) and send them to the end-user with the HLE-SO method. However, the methods discussed above did not provide location privacy for either doctors or patients. To address this issue, a secure, anonymous authentication method for Wireless Body Area Networks (WBANs) that preserves location privacy was proposed in [5], but a machine learning approach was not used. To overcome this, privacy and security using machine learning were investigated in detail in [6]; however, the designed method did not focus on data sharing and data analysis. Yet another advanced machine learning approach, a comprehensive analysis of artificial-intelligence-based Internet of Things in healthcare, was presented in [7], but security and privacy were not considered. To overcome this issue, a detailed review of realizing the timely processing and analysis of medical data was provided in [8], focusing on the security and privacy of electronic health data in the cloud. Concerns related to security and privacy for Electronic Medical Records (EMR) were analyzed in detail in [9] to enhance quality performance; however, no encryption algorithm was used for secure data communication. To overcome this issue, elliptic curve encryption and public-key encryption mechanisms were utilized in [10] to ensure user anonymity and integrity, but the response time of a secure way to store, use, and share medical information was not considered. Yet another exhaustive review concentrating on next-generation intelligent systems was presented in [11]. In the past few years, a swift metamorphosis has been witnessed in healthcare applications, from a conventional hospital-oriented model to a patient-oriented model in the Smart Healthcare System (SHS); this change has occurred due to advances in different materials and methods. The literature providing the most promising solutions for ensuring security and providing privacy mechanisms for the Internet of Medical Things was investigated in [12]; however, static and dynamic analysis were not addressed. To overcome this issue, both static and dynamic security and privacy analysis of health applications was proposed in [13]. There is an increasing need for a growing body of research on evaluation mechanisms and methods to aid the estimation of the privacy and security of health apps and measures to manage their design; however, it can be challenging to explore this area and select pertinent methods. Recent literature on the security and privacy of m-Health applications was investigated in [14]. A survey of advancements in IoT-based health care mechanisms and a review of different methods concerning IoT-based health care solutions was presented in [15]; moreover, numerous analyses of IoT security and privacy features, including attacks arising from a health care perspective, were also provided.
Yet another comprehensive review of security-related aspects and the origin of threats in IoT applications was presented in [16]. Along with a detailed discussion of the security aspects, numerous emerging and existing methods aimed at attaining a high degree of trust in IoT applications were also discussed. In the fifth-generation mobile communication technology (5G) era, medical informatization and global connectivity are developing successfully. The strong growth of medical data creates enormous demand to access data, ensure security and privacy, and process the information in the IoMT; however, authentication accuracy was not enhanced. To overcome this issue, a survey of advances in IoT-based health care methods and a review of network architectures, directions, and trends in IoT-based health care solutions were presented in [17], and numerous IoT security and privacy features were analyzed from a healthcare angle. Big data has also been considered from different aspects, covering its implications in multiple fields, including healthcare; here the most important matters of concern are privacy and security, because data have to be accessed from numerous locations. To address this, a comprehensive study on big data and its effectiveness in healthcare was provided in [18], but the implementation failed to ensure security. Yet another survey of trends and classification was proposed in [19]; however, privacy was not its focus. An IoT CAD security technique was introduced in [22]: optimization-intensive CAD techniques combined with conventional accurate modeling allow the design of highly optimized IoT devices, but energy consumption was higher. Based on these reviews and the disadvantages of the current state-of-the-art literature, the integration of security and privacy aspects using machine learning for connected health services is proposed. The types of security that can be implemented on IoT devices include device authentication and device authorization. Experiments are then conducted to analyze the results by comparison with existing methods. The integrated security and privacy methods are described in the following section.

3 Methodology

With the emergence of intelligent and connected health areas worldwide, we have noticed an increasing role of such techniques in the health care sector. Numerous sensors are coupled or linked to patients and ceaselessly monitor their health by means of numerous physiological, environmental, and behavioral criteria. The different types of IoT devices are Consumer IoT, Commercial IoT, the Internet of Military Things (IoMT), the Industrial Internet of Things (IIoT), and Infrastructure IoT. Figure 1 illustrates the types of IoT devices. In healthcare, the IoT wearable sensor is a principal technology for patient monitoring. These sensors acquire essential information regarding body temperature, blood pressure, pulse rate, heartbeat rate, respiration rate, ECG, blood glucose for diabetic patients, etc. However, not all the essential information acquired from patients via sensors should be available to every medical analyst, owing to security and privacy concerns. To address these issues, in this work a Secured Digital Oblivious Pseudorandom and Linear Regression Privacy Learning (SDOP-LRPL) method for connected health services, with improved security and authentication accuracy and a focus on minimum response time, is designed. Figure 2 shows the block diagram of the SDOP-LRPL method.


Fig. 1. Types of IoT devices

Fig. 2. Block diagram of SDOP-LRPL method

The above figure shows that the proposed SDOP-LRPL method is split into three sections. First, from the Health Analytics dataset provided as input, relevant features are selected using the Distinctive Nearest Neighbor Confidence Feature Selection model. Next, with the appropriate features forming the input to the Digital Oblivious Pseudorandom Signature-based Authentication model, the security aspect is covered: only users or patients who pass the initial credential check proceed to further processing. Finally, authorization, or privacy validation, is ensured via the Secured Digital Oblivious Pseudorandom and Linear Regression Privacy Communication algorithm. A detailed description of the SDOP-LRPL method is given in the following sections.


3.1 Distinctive Nearest Neighbor Confidence Feature Selection Model

Despite the hype surrounding smart applications and connected healthcare in IoT, the sensors and numerous medical devices coupled to patients' bodies produce an enormous volume of data that is not all vital. Hence, it becomes a paramount concern to process the data within the network in an intelligent and meaningful manner. This first requires eliminating irrelevant and invalid data while recognizing and selecting relevant information and acquiring new insights into the large volume of raw captured data via efficient feature selection. As far as health data analytics is concerned, feature selection that identifies the most illustrative or typical features in the observed dataset is critical. Unlike feature extraction, feature selection maintains interpretability, meaning that the selected features impart detailed information about explicit health conditions. In our work, the Distinctive Nearest Neighbor Confidence function is applied to perform feature selection. By applying this function, the relevant features or attributes are selected to reduce the dimensionality; in addition, it supports connected health, preventative health, and personalized medicine with minimum response time. Figure 3, given below, shows the Distinctive Nearest Neighbor Confidence Feature Selection model.

Fig. 3. Block diagram of distinctive nearest neighbor confidence feature selection model

As shown in the above figure, let us consider a dataset 'DS' with 26 Key Indicators, represented by 'KI', arranged in the following Key Indicator Vector Matrix:

$$KIVM = \begin{bmatrix} KI_{1,1} & KI_{1,2} & KI_{1,3} & \dots & KI_{1,n} \\ KI_{2,1} & KI_{2,2} & KI_{2,3} & \dots & KI_{2,n} \\ \dots & \dots & \dots & \dots & \dots \\ KI_{26,1} & KI_{26,2} & KI_{26,3} & \dots & KI_{26,n} \end{bmatrix} \quad (1)$$

As given above in (1), each column of the dataset 'DS' denotes a feature or attribute, so there are 643 attributes or features (i.e., n = 643), and each row represents a key indicator, so there are 26 key indicators in this dataset. With a minimal number of features, our objective is to attain rational confidence for a Key Indicator, as specified in the objective (i.e., confidence) function given below.

$$C_f = \arg\max_{F} \left( Con_{KI} - \sum_{i=1}^{n} \frac{1}{Val_i} \right), \quad \text{subject to } |F| < \alpha \quad (2)$$

As given above in (2), with 'F' representing the set of features obtained from the Health Analytics dataset [20], we aim to reduce '|F|', which refers to the set cardinality; 'Val_i' denotes the value corresponding to each feature or attribute, and 'Con_KI' represents the confidence of the key indicators, respectively. Our objective remains to meticulously obtain the most relevant features while attaining maximum confidence in a specific class (i.e., key indicator 'KI') without exhausting the feature threshold 'α'.

$$RF(d) = (1 - W)\, I(F, P; Q) + W \left[ I(\beta; Q) - I(\beta \setminus P; Q) \right], \quad P \in \beta \quad (3)$$

From the above Eq. (3), the distinctive relevant features 'RF(d)' are obtained based on the weight 'W' and the mutual information 'I(·)' between the set of all features 'β' and a single feature 'F' in the overall set 'β', respectively, via a nearest neighbor. These distinctive and relevant features are then stored on the web server and, with proper security and privacy mechanisms, the intended data are provided to the specified medical persons for further processing. The pseudo-code representation of Distinctive Nearest Neighbor Confidence Feature Selection is given below.

Algorithm 1: Distinctive Nearest Neighbor Confidence Feature Selection algorithm
Input: Dataset 'DS', Key Indicators 'KI', Features 'F'
Output: Relevant feature selection 'RF(d)'
1: Initialize threshold 'α'
2: Begin
3: For each Dataset 'DS' with Key Indicators 'KI' and Features 'F'
4:   Obtain Key Indicator Vector Matrix as in equation (1)
5:   Estimate rational confidence as in equation (2)
6:   Evaluate relevant features as in equation (3)
7:   Return relevant features 'RF(d)'
8: End for
9: End
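To make the selection step concrete, a minimal Python sketch of Algorithm 1 is given below. It is only an illustration under assumptions not fixed by the paper: scikit-learn's nearest-neighbour mutual-information estimator (mutual_info_regression) stands in for the term I(·), the marginal-contribution bracket of Eq. (3) is approximated in a greedy, mRMR-like fashion, and the weight W and feature budget α are arbitrary example values.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

def select_relevant_features(X, y, alpha=16, W=0.5):
    """Greedy sketch of Algorithm 1 / Eq. (3).
    Relevance I(F; KI) uses scikit-learn's nearest-neighbour MI estimator;
    the marginal-contribution term is approximated by relevance minus average
    redundancy with already-selected features (a proxy, not the paper's exact rule).
    X: samples x features matrix, y: key-indicator (class) vector."""
    relevance = mutual_info_regression(X, y, random_state=0)
    selected = []
    while len(selected) < min(alpha, X.shape[1]):
        best_j, best_score = None, -np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            if selected:
                # Average MI between candidate j and the features chosen so far
                redundancy = np.mean([
                    mutual_info_regression(X[:, [s]], X[:, j], random_state=0)[0]
                    for s in selected])
            else:
                redundancy = 0.0
            marginal = relevance[j] - redundancy        # proxy for I(beta;Q)-I(beta\P;Q)
            score = (1 - W) * relevance[j] + W * marginal
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected
```

In practice, the returned column indices would play the role of the relevant features RF(d) handed to the authentication and communication stages.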

As given in the Distinctive Nearest Neighbor Confidence Feature Selection algorithm, the objective is to retrieve a distinctive feature selection that preserves the interpretability of the raw data. First, the key indicators in the raw input health analytics dataset are formulated as a vector matrix. With the obtained key indicator vector matrix, relevant features for further analysis are selected by means of the nearest neighbor confidence value using mutual information, as it estimates dependencies between key indicators or features. With this, computationally efficient relevant features to be utilized for further processing (i.e., communication between remote users and medical personnel) are selected, therefore contributing to minimum response time during connected health services. With these unique, relevant features, secured communication and privacy-preserving intelligent analytics are ensured while driving down healthcare costs.

3.2 Digital Oblivious Pseudorandom Signature-based Authentication Model

Security and privacy in connected health areas around the world are essential issues. Privacy needs suitable safeguards to preserve the confidentiality of personal health information; it gives patients rights over their health information, including requests to view their records, obtain a copy of them, and share and manage their data, and it sets limits and conditions on the uses and disclosures that may be made of such information without patient authorization. Security, on the other hand, is about preserving the confidentiality of individuals' health information created, received, used, or maintained by a covered entity; it requires suitable managerial, physical, and technical safeguards to guarantee integrity and availability, and it concentrates on protecting the data from malicious attacks and from data theft for misuse. In our work, both security and privacy aspects are addressed for intelligent and connected health areas using the Digital Oblivious Pseudorandom Signature-based Authentication model.

In our work, digital signatures with the Digital Oblivious Pseudorandom function employ two keys, a public key and a private key. In the Health Analytics dataset, when a user signs a document, the signature is created using the signer's private key. For the signer, i.e., the user who wants to keep their sensitive healthcare data on the web server, the Oblivious Pseudorandom function is utilized. The mathematical algorithm creates data matching the signed document with a hash function, thereby encrypting the sensitive healthcare data or document and forming the encrypted healthcare data. On the other end, the web server receives the sensitive healthcare data or document and applies the user's public key to the signature. If the public key cannot decrypt the signature, the signature does not belong to the intended user, and the iteration proceeds with another user; otherwise, the user is valid and hence authenticated for further processing. Figure 4 shows the block diagram of the Digital Oblivious Pseudorandom Signature-based Authentication model. The selected relevant features are first arranged into a class vector matrix 'CV':

$$CV = \begin{bmatrix} RF_{1,1} & RF_{1,2} & \dots & RF_{1,n} \\ RF_{2,1} & RF_{2,2} & \dots & RF_{2,n} \\ \dots & \dots & \dots & \dots \\ RF_{26,1} & RF_{26,2} & \dots & RF_{26,n} \end{bmatrix} \quad (4)$$

With the above-obtained class vector matrix 'CV' in (4), a hash is formed according to the feature size, and this is mathematically formulated as given below.

$$H(CV) = CV \bmod F_{size} \quad (5)$$


Fig. 4. Block diagram of digital oblivious pseudorandom signature-based authentication model

From the resultant hash value, an Oblivious Pseudorandom function is evaluated: the user locally computes '$F_k(CV_1), \dots, F_k(CV_n)$' and sends the results to the web server. The web server, on its side, only estimates the intersection between '$F_k(y_1), \dots, F_k(y_n)$' and '$F_k(CV_1), \dots, F_k(CV_n)$' to generate the digital signature, which is mathematically formulated as given below.

$$DS(U_i) \rightarrow Priv_k \cup \left[ \{F_k(y_1), \dots, F_k(y_n)\} \cap \{F_k(CV_1), \dots, F_k(CV_n)\} \right] \quad (6)$$
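A minimal sketch of the check behind Eq. (6) is shown below. It is only illustrative: HMAC-SHA-256 is assumed as a stand-in for the keyed function F_k (the paper does not specify a concrete construction), and the oblivious-evaluation step, in which the client would evaluate F_k without learning the key k, is omitted for brevity.

```python
import hashlib
import hmac

def f_k(k: bytes, value: bytes) -> bytes:
    # Stand-in for the pseudorandom function F_k; HMAC-SHA-256 is an assumption.
    return hmac.new(k, value, hashlib.sha256).digest()

def signature_valid(k: bytes, client_values, server_values) -> bool:
    """Server-side check sketched by Eq. (6): the signature is accepted only when
    the client's F_k outputs coincide with the server's expected F_k outputs."""
    client_tags = {f_k(k, v) for v in client_values}
    server_tags = {f_k(k, v) for v in server_values}
    # Hash values must be equal for the authentication to succeed
    return client_tags & server_tags == server_tags
```

If the sets do not match, the web server rejects the user, exactly as described in the next paragraph.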

From the above Eq. (6), the digital signature of the respective user for authentication is estimated based on the union of the private key and the Oblivious Pseudorandom function. During the reverse process, the web server considers the digital signature valid only when the hash values are equal, ensuring successful authentication. If the web server finds that the hash values are not equal, the user is not valid; hence, authentication fails and the process proceeds with another user. In this manner, secured handling of sensitive healthcare data is ensured.

3.3 Linear Regression Privacy Preservation Communication Model

Linear regression is one of the most accessible and popular machine learning algorithms. It is a statistical method used for predictive analysis; it measures the linear relationship between a dependent variable and one or more independent variables. An existing privacy-preserving linear regression (PPLR) scheme was introduced in [21] to improve accuracy and security with Paillier homomorphic encryption and data masking, where data masking is employed to improve efficiency. However, it failed to consider IoT with secured and privacy-protected connected health communication. The Linear Regression Privacy Preservation Communication model is introduced to overcome this issue.


With security ensured using the authentication model given above, data privacy is effectively checked in this section by utilizing the Linear Regression Privacy Preservation Communication model. From our input Health Analytics dataset, five different classes '$C = \{FF, II, JJ, KK, UU\}$', each pertaining to a different type of condition, namely fertility '$FF = (FF_1, FF_2, \dots, FF_n)$', injury '$II = (II_1, II_2, \dots, II_n)$', acute illness '$JJ = (JJ_1, JJ_2, \dots, JJ_n)$', chronic illness '$KK = (KK_1, KK_2, \dots, KK_n)$', and childhood disease '$UU = (UU_1, UU_2, \dots, UU_n)$', are utilized to determine the authorized doctors to be provided with the relevant data. In our work, the authorization to provide healthcare-sensitive medical data to the respective doctors is listed in Table 1. With the aid of the class values from the table, the class vector matrix is utilized by the doctor on the receiving end by modeling the relationship between the scalar response and more than one dependent variable. The classes of consideration (i.e., authorized classes or health disease analytics to be viewed only by the concerned medical authorities) are obtained via a set of linear regression functions. Let us consider a set '$\{C_i, RF_{i1}, RF_{i2}, \dots, RF_{im}\}_{i=1}^{n}$' over the 'n' key indicators; a linear regression model for ensuring privacy (i.e., providing the classes of sensitive healthcare data to the concerned doctors) assumes that the relationship between the dependent variable '$C_i$' and the 'm-vector' of regressors '$RF_i$' is linear. This relationship is modeled via an error term '$\varepsilon$' that adds noise to the linear relationship between the dependent variable and the regressors. This is mathematically formulated as given below.

$$C_i = \alpha_0 + \alpha_1 RF_{i1} + \dots + \alpha_m RF_{im} + \varepsilon_i = RF_i^T \alpha + \varepsilon_i \quad (7)$$

In Eq. (7), 'T' represents the transpose, so that '$RF_i^T \alpha$' denotes the inner product between the vectors '$RF_i$' (i.e., the related features) and '$\alpha$'. These 'n' equations, or key indicators, are put together for communication ensuring authorization and are mathematically expressed as given below.

$$C = RF\alpha + \varepsilon \quad (8)$$

From the above Eq. (8), the matrix notation is expressed as

$$C = \begin{pmatrix} C_1 \\ C_2 \\ \dots \\ C_n \end{pmatrix}; \quad \alpha = \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \dots \\ \alpha_n \end{pmatrix}; \quad \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \dots \\ \varepsilon_n \end{pmatrix} \quad (9)$$

$$RF = \begin{pmatrix} RF_1^T \\ RF_2^T \\ \dots \\ RF_n^T \end{pmatrix} = \begin{pmatrix} 1 & RF_{11} & \dots & RF_{1m} \\ 1 & RF_{21} & \dots & RF_{2m} \\ \dots & \dots & \dots & \dots \\ 1 & RF_{n1} & \dots & RF_{nm} \end{pmatrix} \quad (10)$$
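As a worked illustration of Eqs. (7)-(10), the sketch below fits the coefficient vector α by ordinary least squares with NumPy (a generic estimator; the paper does not state which solver it uses) and then predicts the class value for a feature vector. The synthetic values are illustrative only.

```python
import numpy as np

def fit_linear_regression(RF, C):
    """Solve C = RF_aug @ alpha + eps in the least-squares sense, where RF_aug
    prepends a column of ones so alpha_0 acts as the intercept (cf. Eq. (10))."""
    RF_aug = np.hstack([np.ones((RF.shape[0], 1)), RF])
    alpha, *_ = np.linalg.lstsq(RF_aug, C, rcond=None)
    return alpha

def predict_class(alpha, rf_row):
    """Eq. (7): C_i = alpha_0 + alpha_1 RF_i1 + ... + alpha_m RF_im."""
    return alpha[0] + np.dot(alpha[1:], rf_row)

# Tiny synthetic example
rng = np.random.default_rng(0)
RF = rng.random((20, 3))                                   # 20 rows, m = 3 relevant features
C = 1.0 + RF @ np.array([0.5, -0.2, 0.8]) + 0.01 * rng.standard_normal(20)
alpha = fit_linear_regression(RF, C)
print(predict_class(alpha, RF[0]))                         # predicted class value for row 0
```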

The above aggregated key indicators and class functions ensure authorization between the users and the receivers (doctors) via the linear regression function. With the obtained results, only the intended recipients, i.e., the doctors, receive the sensitive medical healthcare data classes.

Table 1. Class details of healthcare sensitive medical data

The pseudo-code representation of Secured Digital Oblivious Pseudorandom and Linear Regression Privacy Communication for smart and connected health is given below.


Two objectives are attained, as given in the Secured Digital Oblivious Pseudorandom and Linear Regression Privacy Communication algorithm. With the Health Analytics dataset provided as input, a security mechanism is first ensured via the Secured Digital Oblivious Pseudorandom function. Then, only for the authenticated users, further processing is performed by means of the Linear Regression learning model. With this, privacy for each user is ensured by providing each class of sensitive medical healthcare data only to the intended doctors or receivers, therefore supporting preventative health and personalized medicine efficiently.


4 Experiments and Results Section

The experimental evaluation of the proposed Secured Digital Oblivious Pseudorandom and Linear Regression Privacy Learning (SDOP-LRPL) method and of the existing machine learning-based healthcare monitoring [1] and three-factor anonymous user authentication [2] is implemented in Python, a high-level, general-purpose programming language. The proposed and existing methods use the Health Analytics dataset taken from [20]. The dataset includes 26 health indicators such as Fertility, Injury, Acute illness, Chronic illness, Childhood disease, Marriage, etc., and comprises 642 variables. The file contains data from 9 states and 284 districts of India; marriage figures are based on marriages that took place during 2009-2011. The dataset size is 2 MB. The experiments are conducted using 500 to 5000 samples with different metrics, namely response time, authentication accuracy, authentication time, and security, and the results are obtained by evaluating these metrics. First, the samples provided as input are collected from the Health Analytics dataset. The Distinctive Nearest Neighbor Confidence Feature Selection model is applied to choose the relevant features. After selecting the relevant features, the Digital Oblivious Pseudorandom Signature-based Authentication model is employed to securely handle the healthcare-sensitive data. Finally, the Linear Regression Privacy Communication algorithm is utilized to ensure security and privacy for providing smooth connected health services.
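As an illustration of this experimental setup, the sketch below shows one way the sample-scaling measurement could be scripted. It is not the paper's evaluation harness: the CSV file name comes from the Kaggle link in [20], while the column handling, sampling scheme, and use of mutual information as a stand-in for the confidence/feature-selection stage are assumptions made here for the example.

```python
import time
import pandas as pd
from sklearn.feature_selection import mutual_info_regression

# Illustrative timing loop only; file name per the Kaggle link in [20].
df = pd.read_csv("Key_indicator_statewise.csv")
numeric = df.select_dtypes(include="number").dropna(axis=1)

for n_samples in range(500, 5001, 500):
    sample = numeric.sample(n=n_samples, replace=True, random_state=1)
    X = sample.iloc[:, 1:].to_numpy()          # attribute values
    y = sample.iloc[:, 0].to_numpy()           # one key indicator used as the target
    start = time.perf_counter()
    mutual_info_regression(X, y, random_state=0)   # stand-in for the confidence stage
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    print(f"{n_samples} samples -> {elapsed_ms:.1f} ms")
```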

5 Results Section

In this section, the results of the SDOP-LRPL method are compared against two conventional methods, namely machine learning-based healthcare monitoring [1] and three-factor anonymous user authentication [2]. The performance of the SDOP-LRPL method is measured in terms of response time, authentication accuracy, and security. The performance results are provided with the aid of graphs and table values.

5.1 Case 1: Response Time Analysis

One of the most significant factors required for ensuring secured and privacy-oriented connected healthcare services is the response time involved. In other words, response time refers to the time consumed in identifying whether the healthcare team is monitoring the sensitive medical healthcare data (i.e., samples) within the stipulated time. This is mathematically formulated as given below.

$$RT = \sum_{i=1}^{n} S_i * Time_{C_f} \quad (11)$$

From the above Eq. (11), the response time 'RT' is measured based on the simulation samples used '$S_i$' and the time consumed in obtaining the confidence function '$Time_{C_f}$' for further processing. We summarize the results of the performance comparison in Table 2. It indicates that the SDOP-LRPL method has significantly less response time than the other methods.


Table 2. Tabulation for response time using SDOP-LRPL, machine learning-based healthcare monitoring [1], and three-factor anonymous user authentication [2].

Response time (ms)
Samples | SDOP-LRPL | Machine learning-based healthcare monitoring | Three-factor anonymous user authentication
500     | 367.5     | 412.5   | 477.5
1000    | 412.55    | 625.35  | 835.35
1500    | 535.15    | 815.45  | 915.45
2000    | 585.15    | 1025.35 | 1123.25
2500    | 625.35    | 1135.25 | 1325.45
3000    | 695.35    | 1245.55 | 1855.15
3500    | 745.55    | 1355.45 | 1935.25
4000    | 900.15    | 1485.25 | 1985.55
4500    | 1025.35   | 1535.25 | 2015.25
5000    | 1143.25   | 1715.25 | 2085.35

Fig. 5. Graphical representation of response time

Figure 5 above illustrates the response time involved in connected health in support of health-related applications. In the figure, the x-axis represents the samples involved in secured communication between users, and the y-axis represents the response time involved in analyzing the overall communication process. It is inferred from the figure that increasing the number of samples involved in connected health communication also increases the response time. This is because the sample healthcare data sent to the other side via the web server cause a significant increase in the time needed to ensure both security and privacy, therefore increasing the response time. Let us consider the 500 samples used in the first iteration. From this observation, the proposed SDOP-LRPL method consumed 367.5 ms, while the times taken for determining the sensitive medical healthcare data by the other two methods [1, 2] were 412.5 ms and 477.5 ms, respectively. The results show that the response time was comparatively reduced using the SDOP-LRPL method compared to [1] and [2]. The minimization of response time is due to the application of the Distinctive Nearest Neighbor Confidence Feature Selection algorithm. The key indicators obtained from the raw input dataset were formulated in the form of a vector matrix by applying this algorithm; the nearest neighbor confidence value using mutual information was then applied to retrieve relevant features, reducing the response time of the SDOP-LRPL method by 32% compared to [1] and 49% compared to [2].

5.2 Case 2: Authentication Accuracy Analysis

The second metric of importance is authentication accuracy. The accuracy with which the authentication is ensured measures the method's efficiency. In other words, the authentication accuracy '$A_{auth}$' is the percentage ratio of the number of accurately authorized samples to the total number of samples involved in the simulation. This is mathematically expressed as given below.

$$A_{auth} = \sum_{i=1}^{n} \frac{S_{AA}}{S_i} \times 100 \quad (12)$$

From the above Eq. (12), the authentication accuracy '$A_{auth}$' is measured based on the samples involved in the simulation '$S_i$' and the accurately authorized samples '$S_{AA}$'. It is measured in terms of percentage (%). Table 3 summarizes the results of the comparison in terms of authentication accuracy; the authentication accuracy of the existing methods [1] and [2] is lower than that of the proposed method. Figure 6 illustrates the authentication accuracy for different numbers of samples. From the figure, it is inferred that the authentication accuracy is inversely proportional to the number of samples provided as input; in other words, increasing the number of samples causes a decrease in the authentication accuracy. Considering 500 samples for the simulations, 488 samples are correctly authorized by the SDOP-LRPL method, i.e., 97.6%. Similarly, applying the methods of [1, 2], 480 and 475 samples are correctly authorized, i.e., 96% and 95%, respectively. Ten results are observed and compared. From these results, the authentication accuracy was observed to be better using the SDOP-LRPL method in comparison to [1] and [2]. The reason behind the improvement is the incorporation of the Secured Digital Oblivious Pseudorandom and Linear Regression Privacy Communication algorithm: authorization is ensured by means of the Linear Regression learning model, and only with these authorized users is further processing carried out. Hence, the average of the ten results indicates that the authentication accuracy of the SDOP-LRPL method is better by 6% compared to [1] and 10% compared to [2], respectively.


Table 3. Tabulation for authentication accuracy using SDOP-LRPL, machine learning-based healthcare monitoring [1], and three-factor anonymous user authentication [2].

Authentication accuracy (%)
Samples | SDOP-LRPL | Machine learning-based healthcare monitoring | Three-factor anonymous user authentication
500     | 97.6  | 96    | 95
1000    | 97.35 | 94.25 | 92.15
1500    | 95    | 92.15 | 90.35
2000    | 94.25 | 90    | 88.15
2500    | 93    | 88.35 | 85.45
3000    | 92.15 | 86.15 | 83.15
3500    | 94.55 | 87    | 84
4000    | 93.25 | 85.25 | 82.15
4500    | 92    | 84    | 81
5000    | 94.15 | 84.35 | 80.25

Fig. 6. Graphical representation of authentication accuracy

5.3 Case 3: Security Analysis

The security factor is of primary importance in IoT. The most significant security requirements comprise authentication and tracking, data and information integrity, mutual trust, privacy, and digital forgetting. CAD techniques were used in [22] to consider energy and security issues, but they failed to ensure security. In the proposed SDOP-LRPL method, which is used to provide smart connected health services in a robust manner, security plays a major role. In this work, security is measured in terms of percentage (%).


Security is defined here as the ratio of the number of sensitive medical healthcare data samples being compromised, rather than handled only by authentic users without any modification, to the total number of sensitive medical healthcare data samples. This is mathematically formulated as given below.

$$Security = \sum_{i=1}^{n} \frac{S_{comp}}{S_i} \times 100 \quad (13)$$

From the above Eq. (13), the security aspect 'Security' is measured based on the data samples involved in the simulation '$S_i$' and the samples being compromised '$S_{comp}$'. It is measured in terms of percentage (%). Table 4 below provides the tabulated results for security using the three methods.

Table 4. Tabulation for security using SDOP-LRPL, machine learning-based healthcare monitoring [1], and three-factor anonymous user authentication [2].

Security (%)
Samples | SDOP-LRPL | Machine learning-based healthcare monitoring | Three-factor anonymous user authentication
500     | 3    | 4    | 5
1000    | 3.35 | 4.25 | 5.75
1500    | 3.85 | 4.85 | 6
2000    | 4    | 5.35 | 6.35
2500    | 4.25 | 5.55 | 6.85
3000    | 4.55 | 6    | 7
3500    | 4.85 | 6.25 | 7.25
4000    | 5    | 6.85 | 7.55
4500    | 5.35 | 7    | 7.85
5000    | 5.55 | 7.25 | 8

Figure 7 above illustrates the security aspects covered for different numbers of data samples, collected between 500 and 5000 at different time intervals. From the figure, it is inferred that the security metric is directly proportional to the number of samples provided as input; in other words, increasing the number of samples causes an increase in this metric as well. In the first iteration, 500 samples are considered. The numbers of compromised samples were found to be 15 using the SDOP-LRPL method, and 20 and 25 using [1] and [2], respectively. From these results, the overall values were observed to be 3%, 4%, and 5%, respectively. The obtained results of the proposed technique are compared to the conventional methods, and various performance results are then observed; for each method, ten different results are recorded. The reason for the higher security is the application of the Secured Digital Oblivious Pseudorandom and Linear Regression Privacy Communication algorithm. By applying this algorithm, the Digital Oblivious Pseudorandom function is first applied to secure communication between users, reducing the compromised data.


Fig. 7. Graphical representation of security

On average, the number of compromised samples using SDOP-LRPL is reduced by 24% compared to [1] and 36% compared to [2], therefore contributing to security in a significant manner.

6 Conclusion

A Secured Digital Oblivious Pseudorandom and Linear Regression Privacy Learning (SDOP-LRPL) method for connected health services is proposed in this work. It is introduced to improve the privacy and security involved in connected health data communication. Specifically, the Distinctive Nearest Neighbor Confidence Feature Selection algorithm selects the relevant features that can be communicated for personalized medicine and provide rich medical information; it is driven by the Key Indicator Vector Matrix as its input parameters. Then, the Digital Oblivious Pseudorandom Signature-based Authentication model is employed to determine whether the sensitive medical healthcare data are securely communicated to the intended person or have been compromised. Any abnormality found in the connected health condition can be resolved via the Linear Regression learning model based on the linear regression function. These features support the authentication accuracy and security of the proposed SDOP-LRPL method. The efficiency of the SDOP-LRPL method was evaluated with different metrics, namely response time, authentication accuracy, and security. Across all metrics, the proposed SDOP-LRPL method achieves a high rate of security and enhances the authentication accuracy with minimum response time.


References

1. Souri, A., Ghafour, M.Y., Ahmed, A.M., Safara, F., Yamini, A., Hoseyninezhad, M.: A new machine learning-based healthcare monitoring model for student's condition diagnosis in the Internet of Things environment. Soft Comput. 24, 17111–17121 (2020). https://doi.org/10.1007/s00500-020-05003-6
2. Lee, H., Kang, D., Ryu, J., Won, D., Kim, H., Lee, Y.: A three-factor anonymous user authentication scheme for Internet of Things environments. J. Inf. Secur. Appl. 52, 102494 (2020)
3. Saba, T., Haseeb, K., Ahmed, I., Rehman, A.: Secure and energy-efficient framework using Internet of Medical Things for e-healthcare. J. Infection Public Health 13, 1567–1575 (2020)
4. Tamilarasi, K., Jawahar, A.: Medical data security for healthcare applications using hybrid lightweight encryption and swarm optimization algorithm. Wirel. Pers. Commun. 114, 1865–1886 (2020). https://doi.org/10.1007/s11277-020-07229-x
5. Vijayakumar, P., Obaidat, M.S., Azees, M., Islam, S.K.H., Kumar, N.: Efficient and secure anonymous authentication with location privacy for IoT-based WBANs. IEEE Trans. Ind. Inform. 16, 2603 (2019)
6. Waheed, N., He, X., Ikram, M., Usman, M., Hashmi, S.S., Usman, M.: Security and privacy in IoT using machine learning and blockchain: threats and countermeasures. ACM Comput. Surv. 53, 1–37 (2020)
7. Amin, S.U., Hossain, M.S.: Edge intelligence and Internet of Things in healthcare: a survey. IEEE Access 9, 45–59 (2020)
8. Sun, L., Jiang, X., Ren, H., Guo, Y.: Edge-cloud computing and artificial intelligence in Internet of Medical Things: architecture, technology and application. IEEE Access 8, 101079–101092 (2020)
9. Keshta, I., Odeh, A.: Security and privacy of electronic health records: concerns and challenges. Egyptian Inform. J. 22, 177–183 (2020)
10. Chen, C.-L., Huang, P.-T., Deng, Y.-Y., Chen, H.-C., Wang, Y.-C.: A secure electronic medical record authorization system for smart device application in cloud computing environments. Hum.-Centric Comput. Inf. Sci. 10, 1–31 (2020). https://doi.org/10.1186/s13673-020-00221-1
11. Shafique, K., Khawaja, B., Sabir, F., Qazi, S., Mustaqim, M.: Internet of Things (IoT) for next-generation smart systems: a review of current challenges, future trends and prospects for emerging 5G-IoT scenarios. IEEE Access 8, 23022–23040 (2020)
12. Vaiyapuri, T., Binbusayyis, A., Varadarajan, V.: Security, privacy and trust in IoMT enabled smart healthcare system: a systematic review of current and future trends. Int. J. Adv. Comput. Sci. Appl. 12, 731–737 (2021)
13. Politou, E., Alepis, E., Solanas, A., Patsakis, C.: Security and privacy analysis of mobile health applications: the alarming state of practice. IEEE Access 6, 9390–9403 (2018)
14. Nurgalieva, L., O'Callaghan, D., Doherty, G.: Security and privacy of mHealth applications: a scoping review. IEEE Access 8, 104247–104268 (2020)
15. Riazul Islam, S.M., Kwak, D., Humaun Kabir, Md., Hossain, M., Kwak, K.-S.: The Internet of Things for health care: a comprehensive survey. IEEE Access 3, 678–708 (2015)
16. Hassija, V., Chamola, V., Saxena, V., Jain, D., Goyal, P., Sikdar, B.: A survey on IoT security: application areas, security threats, and solution architectures. IEEE Access 7, 82721–82743 (2019)
17. Islam, S.M.R., Kwak, D., Kabir, M.H., Hossain, M., Kwak, K.-S.: The Internet of Things for health care: a comprehensive survey. IEEE Access 3, 678–708 (2015)
18. Sarkar, B.K.: Big data for secure healthcare system: a conceptual design. Complex Intell. Syst. 3, 133–151 (2017). https://doi.org/10.1007/s40747-017-0040-1


19. Aman, A.H.M., Yadegaridehkordi, E., Attarbashi, Z.S., Hassan, R., Park, Y.-J.: A survey on trend and classification of Internet of Things reviews. IEEE Access 8, 111763–111782 (2020)
20. Kaggle Page. https://www.kaggle.com/rajanand/key-indicators-of-annual-health-survey?select=Key_indicator_statewise.csv
21. Qiu, G., Gui, X., Zhao, Y.: Privacy-preserving linear regression on distributed data by homomorphic encryption and data masking. IEEE Access 8, 107601–107613 (2020)
22. Xu, T., Wendt, J.B., Potkonjak, M.: Security of IoT systems: design challenges and opportunities. In: 2014 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), January 2015

Hardware Implementation for Analog Key Encapsulation Based on ReRAM PUF

Manuel Aguilar Rios1(B), Sareh Assiri1,2, and Bertrand Cambou1

1 Northern Arizona University, Flagstaff, AZ 86011, USA
{maa778,sa2363,Bertrand.Cambou}@nau.edu
2 Computer Science and Information Systems Department, Jazan University, Al Maarefah Road, Jazan, Saudi Arabia
https://in.nau.edu/cybersecurity

Abstract. Resistive random-access memory (ReRAM) physically unclonable functions (PUFs) have recently been proposed for direct analog key encapsulation. Pre-formed ReRAM cells can produce reliable and unique responses when injected with very low currents, with the value of these responses varying with the current. The values of current used on ReRAM cells coupled with the response they produce can be used to encapsulate keys with a PUF directly. Analog key encapsulation with ReRAM cells has been simulated using data gathered from precision machinery in a lab setting and has been theorized to work. This study presents the reliability of analog key encapsulation with ReRAM PUFs on a hardware shield. It illustrates how the hardware implementation of direct analog key encapsulation encryption with ReRAM PUFs produces reliable responses on par with the simulated results. This study shows the approach followed to implement encryption and decryption of the plaintext based on hardware on both sides, the client and server. The implementation was written in C++ on the client level and Python 3 on the server level. Keywords: Analog key encapsulation · Resistive random access memory · Physical unclonable function · Analog key encapsulation with ReRAM PUFs

1 Introduction

Most cryptography algorithms use traditional digital keys for encryption and decryption purposes. Although using digital keys for encryption and decryption is worthwhile and gives a high security level, it is not secure enough for many applications. There are several vulnerabilities associated with traditional digital keys, so avoiding key issues in cryptography algorithms, such as key generation, distribution, and storage, is highly desirable [1,15]. In addition, quantum computing will be a critical issue for traditional digital keys; researchers have therefore started finding alternative routes toward post-quantum cryptography [16]. As alternative solutions, hash functions, random number generators (RNG), and PUFs have become recommended tools to mitigate the risk of the security being exploited. Cambou et al. [6,14,17,22] mention a way to use Resistive Random-Access Memories (ReRAM) to encapsulate the key; furthermore, messages can be encrypted and decrypted without using traditional digital-key cryptography algorithms. They showed that Resistive Random-Access Memories (ReRAM) can be exploited as Physically Unclonable Functions (PUFs). ReRAM PUFs have recently been hypothesized for analog key encapsulation, a type of encryption and decryption that does not rely on key generation. That is because key generation, distribution, and storage are extremely complex and have many vulnerabilities; moreover, longer secret keys and cryptographic protocols are hard to implement on power-constrained systems. Analog key encapsulation encryption utilizes the unique resistances of memristors to directly encrypt messages, removing the need for keys and the many complexities that come with them [6]. So, as mentioned in [6,22], the ciphertext is the value of the voltages or resistances extracted from the ReRAM PUF, and both sides can retrieve the plaintext. Having hardware that can encrypt and decrypt with a high level of randomness and a high level of security is highly desirable. This paper aims to implement the analog key encapsulation protocol on a hardware ReRAM shield using a physical ReRAM PUF. This study is an extension of the work in [6] and [22]. In previous work, both articles used a software file extracted from the ReRAM as a software PUF. The software file contains a finite number of ReRAM responses for each challenge; each challenge randomly selects from its finite responses when called, to simulate a physical PUF. It should also be noted that PUF responses in the simulated PUF were gathered using a precision semiconductor device analyzer. In this work, an actual hardware ReRAM PUF is used. In addition, the entire protocol in previous work was a software simulation, whereas, in this work, the client is entirely implemented on hardware, where there is much more measurement variation. This paper is organized into six further sections in the following way:

[Section 2]: This section contains the previous work involving analog key encapsulation encryption with ReRAM PUFs. The use of ReRAM PUFs for analog key encapsulation encryption has been studied; however, no work has been done using physical ReRAM PUFs.
[Section 3]: We state the aims of this study and the purpose of implementing the key encapsulation protocol.
[Section 4]: The analog key encapsulation protocol is presented. We show how we test the reliability of the ReRAM PUF responses for this protocol, and we also illustrate how the hardware is assembled and implemented.
[Section 5]: We present the hardware this protocol was implemented on and the software and programming languages that were used.
[Section 6]: We present the results of this implementation and discuss the outcomes compared to the previous works.
[Section 7]: We go over the study conclusion and future work.

2 Background

Physically Unclonable Functions (PUFs) act much like human biometrics for a physical device. A PUF exploits nanoscale device parametric variations to produce unclonable measurements of physical objects. The characteristics of PUFs are that they are hard to clone, hard to predict, and difficult to replicate. Their ability to generate and store secret information makes them a good candidate for security systems. An image of a PUF's challenges and corresponding responses is generated at enrollment time and stored on the server; the physical PUF will generate the same responses to each challenge every time authentication is needed on the client side. Several types of memory structures can be used as PUFs, such as SRAM, MRAM, and ReRAM [5,7,8,12,19]. PUFs have been used in several cybersecurity roles, such as generating true random numbers, authentication, one-way encryption, generating cryptographic keys, and, recently, encrypting and decrypting messages [2–4,10]. ReRAM-based analog key encapsulation has already been theorized and documented at Northern Arizona University. The protocol moves through several stages to be implemented and uses several tools, such as a True Random Number Generator (TRNG) [10,18], a hash function [18,20], and a ReRAM PUF [7,11,13,21]. The same strategy as the Ternary Addressable Public Key Infrastructure protocol (TAPKI) [3,9] is followed: a true random number is shared among the parties to confirm the handshake. With the TRN, the password, and hash functions, both parties can agree to use the same locations in the ReRAM PUF. References [6] and [22] show in more detail how the analog key encapsulation encryption protocol works; these articles document the steps of the analog key encapsulation encryption protocol and the memristor characteristics. The steps of the protocol are as follows:

• A handshake between the parties is initiated, where a true random number, T, is exchanged, and a password, PW, is used to generate the message digest, MD.
• The MD will contain addresses, orders, and currents, which will generate a memristor response array.
• The final cipher sent is C, calculated from the memristor response array and the message for encryption, M.
• The client will then decrypt using T, PW, and C.

Moreover, there needs to be a higher cell-to-cell variation than cell measurement variation for this protocol to work. Additionally, further work has been done on analog key encapsulation, which expands on the ideas found in the first paper [6] and uses a new protocol to encrypt and decrypt [22]. This protocol uses a PW and T as well; however, instead of using the MD to get the order, it uses M (the plaintext) to get the orders directly. This paper will go over a protocol similar to, but slightly different from, previous works and its implementation using a physical ReRAM PUF. As mentioned above, the previous papers' objective was to explain multiple analog key encapsulation protocols and design the software implementation, but they did not implement these protocols with a physical ReRAM PUF. The implementation in previous work depended on simulation of ReRAM PUF data gathered in a software file, whereas this work will show how to encrypt and decrypt using analog key encapsulation on a physical ReRAM PUF; we will explain the obstacles we faced during the implementation and how the ReRAM PUF is designed. More details about the protocol are given in Sect. 4.

3 Objective

Sharing a key, PIN code, or plain-text message demands a highly secure network or a highly secure cryptographic algorithm. Most cryptographic algorithms require long keys and considerable power and memory space to implement and store them, whereas using the ReRAM PUF helps safely transmit the key, PIN code, password, or plain-text message at a low cost compared to traditional cryptography algorithms. As mentioned in the background section, all previous studies utilized a software file of the ReRAM PUF, not the actual ReRAM PUF. This study aims to illustrate how an actual ReRAM PUF can be used to encapsulate the key, password, PIN code, or plain-text message. We will focus in depth on how the hardware is implemented to be utilized for the key encapsulation.

4 Methodology

In this study, we follow a visual representation of the keyless encryption protocol that can be summarized in Fig. 1 and Fig. 2. Figure 1 summarizes the encryption steps that we follow for implementing the keyless encryption protocol based on the ReRAM PUF on hardware devices, whereas Fig. 2 shows the decryption steps. Furthermore, in this section, we cover the protocol steps used for testing and the hardware this protocol was implemented on.

4.1 Protocol Steps

In this analog key encapsulation protocol, we have two parties, a server, and a client. The server with an image of PUF A will encrypt a key, K, for a client with actual PUF A. The protocol has two main parts: address array generation and current array generation. Both parts are done on the server and the client and rely on the unique responses of the PUF. For this process to work, both the server and the client must agree to a predefined set of currents for this encryption. Currents will be calculated by comparing the memristor response to the expected response. Currents with values further apart from each other are preferred to mitigate the error rate from the measurement variation on the hardware. If current values are close, there is a higher chance that measurement variation will bump the memristor response to match adjacent currents. The memristor response, in this case, will be the voltage of the memristor at a given current; this is interchangeable with the resistance of a memristor at a given current because of R = V /I.
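The decryption-side comparison can be pictured with the short Python sketch below. It is illustrative only: the actual client runs C++ on the shield, and the function name, the 16-entry enrolled-response table, and the millivolt values are assumptions made for the example. For a measured cell voltage, the closest of the 16 agreed currents is chosen by comparing against the enrolled responses, which is why widely spaced values are preferred.

```python
def match_current(measured_voltage_mv: float, expected_responses_mv: list) -> int:
    """Return the index (0-15) of the agreed current whose enrolled response is
    closest to the measured voltage; widely spaced expected responses reduce the
    chance that measurement variation flips the decision to an adjacent current."""
    best_idx, best_err = 0, float("inf")
    for idx, expected in enumerate(expected_responses_mv):
        err = abs(measured_voltage_mv - expected)
        if err < best_err:
            best_idx, best_err = idx, err
    return best_idx

# Example: 16 enrolled responses (mV) for one cell, deliberately spaced apart
enrolled = [120 + 35 * i for i in range(16)]
print(match_current(312.0, enrolled))   # index of the closest agreed current
```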


Fig. 1. Analog key encapsulation encryption protocol

• The first step of encryption on the server side is to combine the password and the random number into a 64-byte message digest, MD. This is done by hashing the password and the random number and then XORing the results together.
• The second step is to rotate the first 16 bits 16 times to obtain 16 different 64-byte message digests. Each of the 16 64-byte message digests is then hashed and combined into a 1024-byte (8192-bit) message digest, which is then used to generate the address array by using two bytes for each address (512 addresses).
• The third step is to generate the current array. This is done by taking K and converting it to ASCII values. With 16 possible current values, the currents are split into 4 bits each, and each is paired with an address in the address array. Since the current array holds one 4-bit value (half a byte) for each of the 512 addresses, the current array is 256 bytes long.
• The fourth step is to generate the response array, RA, from the current array. This is calculated by using an image of the ReRAM PUF and extracting the memristor response of each address at its paired current.
• In the fifth step, the RA and T are exchanged with the client during the handshake. After the handshake is complete, the client uses T, PW, and RA to begin the decryption process.
• After the handshake, the first step on the client side is to generate the same MD using PW and T.
• The second step is to use MD and RA to generate the current array. The protocol for generating the current array can vary; it can be done with either a software comparison or a hardware comparison. Once the current array is generated, the client uses those values to recover the key K that was originally encrypted. A visual representation of the analog key encapsulation protocol can be found in Fig. 1 and Fig. 2, and a code sketch of the server-side array generation follows below.
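A minimal sketch of these server-side steps is shown below. The hash function, the exact bit-rotation scheme, and the key padding are not fully specified here, so SHA-512 and a simple byte rotation are used as stand-ins; the parameter names (PW, T, K, MD) follow the text, and everything else is illustrative.

```python
import hashlib

def message_digest(pw: bytes, t: bytes) -> bytes:
    """Step 1: 64-byte MD from the password and the random number (SHA-512 XOR SHA-512)."""
    h_pw = hashlib.sha512(pw).digest()
    h_t = hashlib.sha512(t).digest()
    return bytes(a ^ b for a, b in zip(h_pw, h_t))

def expand_digest(md: bytes) -> bytes:
    """Step 2: derive a 1024-byte digest from 16 rotated variants of MD.
    A byte rotation stands in for the paper's 16-bit rotation."""
    out = b""
    for i in range(16):
        rotated = md[i:] + md[:i]
        out += hashlib.sha512(rotated).digest()  # 16 x 64 bytes = 1024 bytes
    return out

def address_array(expanded: bytes) -> list:
    """Step 2 (cont.): 512 addresses, two bytes each, mapped into the 4096-cell PUF."""
    return [int.from_bytes(expanded[2 * i:2 * i + 2], "big") % 4096 for i in range(512)]

def current_array(key: str) -> list:
    """Step 3: split the ASCII key K into 4-bit values, each selecting one of 16 currents.
    (In practice the key would be padded or repeated to fill all 512 address slots.)"""
    nibbles = []
    for ch in key.encode("ascii"):
        nibbles += [ch >> 4, ch & 0x0F]
    return nibbles

def response_array(puf_image, addresses, currents) -> list:
    """Step 4: expected response (volts) of each address at its paired current,
    looked up in the enrolled PUF image (a nested structure assumed here)."""
    return [puf_image[addr][cur] for addr, cur in zip(addresses, currents)]
```

On the client side, the same MD and address array are regenerated from PW and T, and the currents are recovered by reading the physical PUF and comparing against RA, as described in Sect. 5.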

4.2 ReRAM Shield 2.0

This analog key encapsulation protocol was implemented on NAU's ReRAM Shield 2.0, designed for ReRAM key encryption protocols. The shield is designed to be used on the ChipKit WiFire microcontroller and utilizes Crossbar's 1B chip as a ReRAM PUF. Crossbar's 1B chip has 4096 pre-formed memristors designed specifically to be used as a PUF, and it can output the responses of all memristors at 75 predefined currents. The shield has two ReRAM PUF slots. The shield was not designed for analog key encapsulation; however, its ability to read the memristor responses of the ReRAM PUFs allows analog key encapsulation protocols to be implemented.

4.3 Hardware

Since NAU's ReRAM Shield 2.0 uses a variety of hardware components in a tight space to communicate with the ReRAM PUFs, the shield picks up a notable amount of noise. Consequently, the measurement variation of some cells can become high, leading to inaccuracies in the protocol. In one enrollment, where each cell was read 20 times at a specific current, the average standard deviation of cell reads was 10 mV, with the highest being 395 mV. This measurement variation makes decryption very difficult because of the high chance of false positives and negatives. Capacitors were added to the outputs of the ReRAM PUFs to mitigate measurement variations. However, since the shield was designed for key encryption methods, the capacitors are located close to the comparator inputs, far away from the Analog-to-Digital Converters (ADC), making them less effective at mitigating measurement variations.

Fig. 2. Analog key encapsulation decryption protocol

5 Implementation

The server generates the random number and the response array. The server, in this case, is run as Python 3 code and generates its random number using the secrets Python library. The client is an NAU ReRAM Shield 2.0, paired with a ChipKit WiFire and a PUF. An image of the ReRAM PUF is uploaded to the server before the protocol starts in order to generate the RA.
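A minimal sketch of this server-side setup is shown below; only the use of the secrets module is taken from the text, while the image file name, its JSON format, and its structure are assumptions for illustration.

```python
import json
import secrets

# True random number T for the handshake (32 bytes is an assumed length).
T = secrets.token_bytes(32)

# Hypothetical enrolled PUF image: puf_image[address][current_index] -> average voltage.
with open("puf_image.json") as fh:
    puf_image = json.load(fh)
```

The image itself is produced by the enrollment procedure described next.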


The ReRAM PUF image is produced by enrolling the ReRAM PUF at the 16 predefined currents. Each address is read 20 times at each current. The average and the standard deviation of those reads are then calculated. The averages are used as the image of the PUF, and the standard deviations of the 20 reads are used to determine the flakiness of cells. Flaky cells are those with a standard deviation higher than f mV at any current; in this implementation, f was set to 16 mV. Two current-guessing methods are used. The first, method 1, reads an address at every current up to n times until it finds a match. If the memristor response is within the acceptable range r of the expected response, that current is returned. If no current is within the acceptable range, an error is returned. The main advantage of this method is that the memristor responses must match the response array closely. This makes it more secure and gives fewer false positives (the correct answer without the correct PUF). The disadvantage is that it is highly susceptible to measurement variation, leading to more false negatives (the incorrect answer with the correct PUF). In this implementation, n is 20 and r is 4 mV. The second method, method 2, reads an address at every current t times and takes the average of those t reads. It then returns the current whose average is closest to the response array value. The advantage of this method is that the memristor response is less susceptible to noise, as it uses the average of t reads. The disadvantage is a higher chance of false positives (the correct current even without the correct PUF), because this method always takes the closest guess. In this implementation, t is 10. Each method was tested on each slot, as it was found that the memristor responses and measurement variation vary depending on the slot used. The average standard deviation across addresses and currents on slot 1 was 9.6 mV, with the highest being 395 mV. On slot 2, using the same ReRAM PUF, the average standard deviation was 8.6 mV, with the highest being 230 mV. Additionally, removing flaky cells should improve the success rate of the methods, so each method was tested with and without the removal of flaky cells. A sketch of the flaky-cell filter and the two current-guessing methods is given below.
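The following sketch captures the flaky-cell filter and both current-guessing methods. The read_cell(address, current_index) function, which returns one voltage measurement from the shield, is an assumed interface; the constants follow the values quoted above.

```python
import statistics

CURRENT_INDICES = range(16)        # the 16 predefined currents
N_READS, T_READS = 20, 10          # n = 20 (method 1), t = 10 (method 2)
R_VOLTS, F_VOLTS = 0.004, 0.016    # r = 4 mV, f = 16 mV

def is_flaky(reads_per_current):
    """A cell is flaky if the standard deviation of its enrollment reads exceeds f at any current."""
    return any(statistics.stdev(reads) > F_VOLTS for reads in reads_per_current)

def guess_current_method1(address, expected_v, read_cell):
    """Method 1: return the first current whose measured response falls within r of the
    expected (response array) value; raise an error if no current matches."""
    for current in CURRENT_INDICES:
        for _ in range(N_READS):
            if abs(read_cell(address, current) - expected_v) <= R_VOLTS:
                return current
    raise ValueError("no current matched within the acceptable range")

def guess_current_method2(address, expected_v, read_cell):
    """Method 2: average t reads at every current and return the current whose average
    is closest to the expected value (always returns a guess)."""
    def avg(current):
        return statistics.mean(read_cell(address, current) for _ in range(T_READS))
    return min(CURRENT_INDICES, key=lambda c: abs(avg(c) - expected_v))
```

Method 1 trades availability for security (it can refuse to answer), whereas method 2 always answers, which is exactly the false-positive/false-negative trade-off reported in the results.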

5.1 Implementation Challenges

The measurement variations were the most difficult challenge we encountered when implementing analog key encapsulation using physical ReRAM PUFs. There is inherent measurement variation in a physical ReRAM PUF: even in a minimal-noise environment, its responses vary slightly every time they are measured. NAU's ReRAM Shield 2.0 amplified the measurement variation because of the inherent noise caused by all the signals being used in a small area. Additionally, NAU's ReRAM Shield 2.0 was designed for different encryption protocols, where analog measurements of the PUF responses were not prioritized. As a result, noise mitigation for analog responses was not implemented. This meant that we had to design the current-guessing methods to accommodate the inherent noise by widening the acceptable range of a ReRAM response and discarding the memristors with the highest measurement variations.

6 Results

After enrolling the same ReRAM PUF in both slots, the two methods were tested in each slot at room temperature. A 10-character message, "HelloWorld", was encrypted and decrypted 100 times for each test. Additionally, a false PUF, i.e. a different ReRAM PUF than the one enrolled, was used to measure the number of false positives for each method with the flaky cells removed. The results for each test can be found in Table 1.

Table 1. Analog key encapsulation testing results (successful character decryption percentage)

         Current guessing method    Flaky cells     Flaky cells    Number of flaky
                                    not removed     removed        cells removed
Slot 1   Method 1                   45.8%           58.0%          1487
         Method 2                   83.5%           90.9%
         False PUF, Method 1        NA              0.2%
         False PUF, Method 2        NA              6.7%
Slot 2   Method 1                   62.7%           68.9%          1348
         Method 2                   89.8%           97.1%
         False PUF, Method 1        NA              0.3%
         False PUF, Method 2        NA              6.5%

After testing each method using real ReRAM PUFs, it was found that method 2, which takes the closest average value, had the best success rate for the analog key encapsulation protocol implemented on NAU's ReRAM Shield 2.0. However, it also had a higher false success rate, at 6.5%. The false-positive responses are caused by a cell in the false PUF having a memristor response similar to that of the cell in the actual PUF at the same address. These coincidences are random and non-deterministic, so it is impossible to know which guessed characters are correct and which are false. False positives become a problem if their percentage gets high enough, because an attacker could then use a partially recovered key to infer the rest of the message if the key has a standard plaintext sentence structure. As expected, method 1 was substantially less accurate, but it gave substantially fewer false positives, at 0.3%. This makes it more secure but reliant on low measurement variations to be practical. Method 2 is therefore preferred and more likely to be used in practical applications, that is, in typical hardware environments with moderate measurement variations. Moreover, slot 2 had a higher success rate than slot 1 for each method. This can be linked to the higher measurement variations recorded on slot 1, which are hypothesized to be due to its location on the shield. This means that noise mitigation is crucial for future hardware designs.
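As a back-of-the-envelope illustration of the false-positive concern, using the measured 6.5% per-character rate from Table 1 and the 10-character test message, and assuming characters are independent:

```python
p_false = 0.065          # per-character false-positive rate for method 2, slot 2
msg_len = 10             # length of the "HelloWorld" test message

expected_leaked = p_false * msg_len          # ~0.65 characters recovered by a false PUF on average
p_none_leaked = (1 - p_false) ** msg_len     # ~0.51: roughly half the trials leak nothing

print(round(expected_leaked, 2), round(p_none_leaked, 2))
```

This is small per message, but it illustrates why the false-positive rate is treated as a security-relevant parameter for structured plaintext.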

7 Summary and Future Work

This paper implemented an analog key encapsulation protocol on an NAU ReRAM Shield 2.0 to determine whether analog key encapsulation provides reliable results using physical ReRAM PUFs. The protocol was implemented using two current-guessing methods under different conditions. The implementation found that the current-guessing method that takes the closest matching average value had the best success rate, at 97.1%. Overall, the results showed that this analog key encapsulation encryption is viable, as it provided reliable results using physical ReRAM PUFs. The next step in improving this protocol is to design a hardware shield specifically for analog key encapsulation encryption, combined with a hardware comparison protocol, to further increase efficiency and security. Both a hardware comparison protocol and a protocol-specific shield are planned for future implementation. The hardware comparison protocol is theorized to increase the protocol's security as well as the performance and speed of decryption. The purpose of the proposed hardware comparison circuitry is to bypass the software comparison to improve security and performance. If the software comparison is not used, the values of addresses at multiple currents are never read into the microcontroller, removing another vulnerability that a potential attacker might exploit. Moreover, a hardware comparison may also improve decryption performance because hardware comparisons are much more precise than software comparisons: the ChipKit WiFire microcontroller uses a 3.3 V, 12-bit Analog-to-Digital Converter (ADC) to read voltage values, which limits the precision of each voltage to 880 uV, whereas op amps can have precision as low as 5 uV. Additionally, hardware comparisons can be much faster because there is no need to communicate with the microcontroller.

References

1. Al Hasib, A., Haque, A.A.M.M.: A comparative study of the performance and security issues of AES and RSA cryptography. In: 2008 Third International Conference on Convergence and Hybrid Information Technology, vol. 2, pp. 505–510. IEEE (2008)
2. Assiri, S., Cambou, B.: Homomorphic password manager using multiple-hash with PUF. In: Arai, K. (ed.) FICC 2021. AISC, vol. 1363, pp. 772–792. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-73100-7_55
3. Assiri, S., Cambou, B., Booher, D.D., Miandoab, D.G., Mohammadinodoushan, M.: Key exchange using ternary system to enhance security. In: 2019 IEEE 9th Annual Computing and Communication Workshop and Conference (CCWC), pp. 0488–0492. IEEE (2019)
4. Assiri, S., Cambou, B., Booher, D.D., Mohammadinodoushan, M.: Software implementation of a SRAM PUF-based password manager. In: Arai, K., Kapoor, S., Bhatia, R. (eds.) SAI 2020. AISC, vol. 1230, pp. 361–379. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-52243-8_26
5. Cambou, B.: Encoding ternary data for PUF environments, 29 November 2018. US Patent App. 16/036,477
6. Cambou, B., Hély, D., Assiri, S.: Cryptography with analog scheme using memristors. ACM J. Emerg. Technol. Comput. Syst. (JETC) 16(4), 1–30 (2020)
7. Cambou, B., Orlowski, M.: PUF designed with resistive RAM and ternary states. In: Proceedings of the 11th Annual Cyber and Information Security Research Conference (CISRC 2016). Association for Computing Machinery, New York, NY, USA (2016)
8. Cambou, B., Philabaum, C., Booher, D., Telesca, D.A.: Response-based cryptographic methods with ternary physical unclonable functions. In: Arai, K., Bhatia, R. (eds.) FICC 2019. LNNS, vol. 70, pp. 781–800. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-12385-7_55
9. Cambou, B., Telesca, D.: Ternary computing to strengthen cybersecurity. In: Arai, K., Kapoor, S., Bhatia, R. (eds.) SAI 2018. AISC, vol. 857, pp. 898–919. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-01177-2_67
10. Cambou, B., Telesca, D., Assiri, S., Garrett, M., Jain, S., Partridge, M.: TRNGs from pre-formed ReRAM array. Cryptography 5(1), 8 (2021)
11. Cambou, B.F., Quispe, R.C., Babib, B.: PUF with dissolvable conductive paths, 28 May 2020. US Patent App. 16/493,263
12. Chen, A.: Comprehensive assessment of RRAM-based PUF for hardware security applications. In: 2015 IEEE International Electron Devices Meeting (IEDM), pp. 10–7. IEEE (2015)
13. Korenda, A.R., Afghah, F., Cambou, B.: A secret key generation scheme for Internet of Things using ternary-states ReRAM-based physical unclonable functions. In: 2018 14th International Wireless Communications & Mobile Computing Conference (IWCMC), pp. 1261–1266. IEEE (2018)
14. Korenda, A.R., Assiri, S., Afghah, F., Cambou, B.: An error correction approach to memristors PUF-based key encapsulation. In: 2021 IEEE International Conference on Omni-Layer Intelligent Systems (COINS), pp. 1–6. IEEE (2021)
15. Kumar, M.G.V., Ragupathy, U.S.: A survey on current key issues and status in cryptography. In: 2016 International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET), pp. 205–210. IEEE (2016)
16. Majot, A., Yampolskiy, R.: Global catastrophic risk and security implications of quantum computers. Futures 72, 17–26 (2015)
17. Miandoab, D.G., Assiri, S., Mihaljevic, J., Cambou, B.: Statistical analysis of ReRAM-PUF based keyless encryption protocol against frequency analysis attack. In: Arai, K. (ed.) FICC 2022. LNNS, vol. 439, pp. 928–940. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-98015-3_63
18. Paar, C., Pelzl, J.: Understanding Cryptography: A Textbook for Students and Practitioners. Springer, Cham (2009). https://doi.org/10.1007/978-3-642-04101-3
19. Shamsoshoara, A., Korenda, A., Afghah, F., Zeadally, S.: A survey on physical unclonable function (PUF)-based security solutions for Internet of Things. Comput. Netw. 183, 107593 (2020)
20. Tsai, J.-L.: Efficient multi-server authentication scheme based on one-way hash function without verification table. Comput. Secur. 27(3–4), 115–121 (2008)
21. Yoshimoto, Y., Katoh, Y., Ogasahara, S., Wei, Z., Kouno, K.: A ReRAM-based physically unclonable function with bit error rate < 0.5% after 10 years at 125 °C for 40 nm embedded application. In: 2016 IEEE Symposium on VLSI Technology, pp. 1–2 (2016)
22. Zhu, Y., Cambou, B., Hely, D., Assiri, S.: Extended protocol using keyless encryption based on memristors. In: Arai, K., Kapoor, S., Bhatia, R. (eds.) SAI 2020. AISC, vol. 1230, pp. 494–510. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-52243-8_36

Questions of Trust in Norms of Zero Trust

Allison Wylde
Cardiff University, Cardiff CF10 3EU, Wales, UK
[email protected]

Abstract. Important norms may evolve to be promoted, implemented, and enforced by policymakers; one current example is zero trust. This norm originally arose organically, as a trusted norm among cyber security practitioners. This paper explores a puzzling question: will zero trust continue to be trusted as it evolves into an enforced norm? By leveraging well-established theory on trust, this paper presents a novel approach to studying how actors may trust an evolving norm such as zero trust. The paper first examines the emergence of zero trust. Next, following the SolarWinds breach, state-led policy responses enforcing the adoption of zero trust are reviewed. Key theory on norms and trust is then revisited to create a foundation. Expanding on the integrative processes in trust building, together with a comparative assessment of the assumptions in presumptive trust and zero trust, the contribution of this paper is a new approach that enables an assessment of trust in norms (ATiN), allowing the trust in discursive, organic norms to be compared with the trust in norms evolving as policy-enforced norms. Findings from a preliminary evaluation illustrate the ability of ATiN to disentangle the elements and processes involved in building trust in a policy-enforced norm. This paper invites other researchers' interest and calls for a research agenda for trust and norms for cyber security, trust, and zero trust.

Keywords: Cyber security · Trust · Zero trust · Norms

1 Introduction

Although zero trust is an organic norm trusted for cyber security, questions concerning trust in new policy-enforced norms remain. To the best of the author's knowledge, trust in emerging and policy-enforced norms, such as zero trust, remains understudied. This paper proposes a new approach to address this important gap. The term norm is generally understood as a socially constructed and subjective belief shared by actors about actions and possible actions [1]. In international relations, where concerns are focused on relations involving policy and power, norms are seen as essential to allow the development of shared understanding, decision-making, and consensus [2]. Thus, norms for cyber security have been proposed to help create shared understanding and wide-ranging agreements [3]. The scope of this paper is limited to norms of zero trust implemented at the level of state actors and international relations in the context of cyber security.


A pertinent example is the action of state actors, notably the US, in response to the 2020 SolarWinds breach [4]. A few months after the incident, US executive orders called for the adoption and implementation of the cyber security norm zero trust to support national cybersecurity [4, 5]. Subsequently, the Cybersecurity and Infrastructure Security Agency (CISA) launched a public consultation to support US agencies as they adopt zero trust practices [5]. This development is part of wider actions by agencies such as CISA in the US [5], the UK's National Cyber Security Centre (established in 2016) [6], and numerous other state-led agencies set up to support norms. The UN Internet Governance Forum (UN IGF), established in 2006, also supports norms development and practice [3]. In 2015, the UN framework for responsible state behavior was published, establishing the foundational cyber security norms [7]. As context, current norms comprise: (1) non-interference with the public core of the internet; (2) protection of electoral infrastructure; (3) avoidance of tampering; (4) no commandeering of ICT devices into botnets; (5) a vulnerability equities process; and (6) the reduction and mitigation of significant vulnerabilities [7, 8]. Yet, in conversations on norms with senior cyber security practitioners, a puzzle has emerged: although cyber security practitioners trust zero trust, what will happen as it evolves into a policy-enforced norm [5]? In answer, this paper argues that well-established trust theory can be harnessed to shed light on, and help understand, trust in norms and enforced norms; in this case, zero trust. The remainder of the paper is organised as follows. The next section revisits key theory on norms and cyber security norms, followed by a discussion of the emergence of zero trust and then thinking on trust. Next, the SolarWinds breach is presented, followed by a discussion of state-enforced norms, a consideration of how an evolving norm may be characterised by trust-building theory, and a preliminary evaluation. The final section presents the conclusion, limitations, future studies, and implications.

2 Norms and Emergence

Norms emerge and evolve over time, embodied as agreed and shared beliefs and actions. It was once the norm for airline pilots to reach for the flight controls with a cigarette in hand, or for a vehicle driver not to wear a seatbelt, or, as now, for viewers of streaming services to share passwords [9]. These activities seemed normal at the time; in fact, as viewers may ask, what is the harm if I share my password with a cousin, a friend, or a colleague [9]? Sharing passwords currently appears to be a norm [2], though a misaligned one [10]. Key theory on norms is discussed next. For political scientists, norms are seen as the 'proper' behaviour of actors and the agreed standards and regulations that govern national security, complicated by issues of power and wealth [1, 2]. Social scientists view norms as social constructs, existing as shared beliefs and/or behaviours among group members [1, 2]. Three main forms of social norms are proposed: descriptive norms, describing what others do (what 'is'); injunctive norms, describing what 'ought to' be done [11]; and subjective norms, which concern the role of approval (or not) from important others in motivating an actor to comply with a referent [12]. Drawing from research on the technology acceptance model, individuals were found to respond positively to a favorable image (identification) with a referent group [13]. In trying to achieve this image, subjects voluntarily adopted the specific behaviors of the referent group and, in doing so, believed their own status had increased [13]. Other studies found that belief in a target group's approval, for example to illegally download music, would prompt an individual to also illegally download [14]. These examples illustrate the important role subjective norms play in influencing behavior [13], including actions that may be considered unacceptable by others, such as sharing passwords [9] or illegally downloading music [14].

Theory on the emergence of norms suggests that norms generally arise through a process or cycle, from emergence, to cascade, and finally internalisation [2]. The process is not linear, and procession to a final state may never be achieved [2]. Other studies consider the nature of the emergence and evolution of norms, seen as either a discursive or an operationalised process [15]. Discourse-driven norms rely on negotiated and cooperative recognition of shared beliefs and standards [2, 15]. Many norms are operationalised norms, constructed and/or enforced by agencies to support policy [5]. Some operationalised norms are termed securitised norms when they are developed, and lie, outside the norms of normal political action [15]. Recent studies have identified the evolution of securitised norms deliberately developed and enforced by state actors [5] in response to acts by cyber terrorists and cybercriminals [5, 6, 15]. This type of norm, also termed a reactive norm, has increasingly been found to be the preferred approach as state actors respond to and attempt to manage or enforce cyber security norms [5]. Another strand of research, not further examined here, points out that the actions of cyber terrorists and cybercriminals may themselves be suggestive of norms, though clearly misaligned norms [2, 15]. Yet, although many individuals may approve of a practice such as privacy, using secure passwords, not sharing passwords [9], or not illegally downloading music [14], many may fail to enact that practice themselves, resulting in examples of so-called misaligned norms [10, 14].

Returning to norms for cyber security, a recent IGF Best Practice Forum (BPF) in 2021 evaluated the content of thirty-six international published agreements on cyber security norms [7]. The selection criteria were: being international; containing commitments, recommendations, and goals to improve the overall state of cybersecurity; and the involvement of significant actors who operate in significant parts of the internet [8]. The agreements were then mapped to the initial eleven norms in the UN 2015 framework [3]. Key findings indicate the presence of norms for cooperation, adherence to human rights, reporting of vulnerabilities, and the provision of remedies [8]. Although trust building and trust were identified as key in most of the agreements, their roles were not made explicit [8]. Summing up, although much is known about norms, their emergence, and how trust may form, relatively little is published about trust in a discursive norm and how that trust may evolve (or not) as the norm evolves into a reactive, policy-enforced, securitised norm [15]. In this paper, trust in the norm of zero trust is examined, specifically as this norm evolves into a policy-enforced norm for cyber security.


3 Zero Trust and Norms

Worldwide increases in the numbers of cyber breaches, cyber attacks, and organised crime have led to calls for the implementation of zero trust [5]. This approach reverses current norms, which are founded on presumptive trust [5, 6]. Zero trust is examined next. As the internet has evolved, the need to address the concurrent rise in cyber security threats has become critical. Organisational boundaries have become blurred; third parties, suppliers, and clients now routinely operate inside organisational networks [4–6]. The evolution of the cloud and the internet of things has meant that organisations may not even recognise the limits of, or the assets inside, their own networks [4–6].

Fig. 1. Zero trust: founded on continuous authentication of identity [6]

The network is now viewed as hostile and trust is viewed as a vulnerability [4–6]. The zero trust standards proposed by the National Institute of Standards and Technology (NIST) aim to prevent unauthorised access to data and services, together with making access control enforcement as granular as possible [16]. Two main practices dominate in zero trust: the first relies on building trusted platforms [5, 6, 16] through continuous identity authentication, as shown above in Fig. 1; the second, programmatically based, relies on a controller granting trust entitlements [14]. The UK's National Cyber Security Centre (NCSC) relies on building confidence (as trust) through continuous authentication, authorisation, and monitoring of the identity of users, devices, and services [6]. Although zero trust as an agreed norm is increasing in importance among practitioners, questions remain concerning trust during the evolution of zero trust into a policy-driven, reactive norm. Trust is considered next.

4 Trust

Trust lies at the heart of international relations and statecraft; indeed, trust theory has been drawn on to create the steps involved in conflict resolution [17]. This paper argues that the application of well-established trust theory [18, 19], together with trust in conflict resolution [17], can be leveraged to provide a basis for understanding how individuals may trust, or not [19], in a new policy. This paper, limited to the prominent trust theory models [17, 18], re-examines the key processes. Substantial research into trust theory suggests that the development of personal and professional trust relies on an individual's belief and willingness to act, and their ability to trust, moderated by their trust experiences [17, 19]. Trust is therefore rooted in the individual's personality and belief system, which are shaped by their life experiences of trust and the trust developed in any specific relationship [17]. Finally, trust is bound, as a multifaceted construct, by the set of rules and norms constructed by society [17, 18]. Trust relations have been studied at different levels: at the level of an individual trustor and trustee, and across multiple referents, teams, organisations, and indeed institutions and nation states [20]. Trust has also been examined in non-person relations, such as trust in technology [21] and trust in institutions and institutional (or policy) structures [22]. What is important for this paper is the idea that trust extends beyond personal relations to include relationships with physical objects and entities [21, 22]. A prominent model for trust building is examined next.

The integrative trust model evaluates the processes of trust building from the perspectives of the trustor and trustee [18]. At the start, these processes comprise the presence of positive expectations of trust and an initial assessment of trustworthiness by the trustor [18]. This is moderated by the trustor's propensity to trust, together with their acceptance of vulnerability and, finally, risk-taking [18]. The trust assessment then examines a trustee's ability (can they do the job?), benevolence (do they hold good will?), and integrity (will they act honestly?), moderated by the individual trustor's propensity to trust, in other words how intrinsically trusting they are [18]. These strands are drawn on to arrive at a generally held definition: trust is based on positive expectations and the willingness of a party to be vulnerable to the actions of another party, based on an expectation that the other will perform an action important to the trustor, irrespective of any ability to monitor or control the parties [18]. This definition will serve as a frame for this paper, through which the processes of trust may be understood as norms develop. Like trust, norms may strengthen over time (seatbelt wearing, or password sharing), sometimes diminish or disappear (smoking or drink-driving), or be replaced (presumptive trust, by zero trust) [5, 6]. The case of SolarWinds is considered next.

5 SolarWinds

In 2020, the SolarWinds breach was first reported by private sector cybersecurity companies, notably FireEye, as it was recognised that the US government's in-house cyber security programme (Einstein) was reportedly unable to detect trojanised software or to read encrypted network traffic [4]. More than 1,800 organisations worldwide were affected, across governments, the military, and the healthcare sector. A second wave attacked organisations that had been carefully targeted, and further sensitive data was extracted [4].


The extent of the breach and the sensitivities involved prompted high-level responses from government agencies. In April 2021, US President Biden reportedly broke the norms of US foreign policy both by attributing the breach to a state-sponsored actor and by announcing punitive financial sanctions and expulsions of diplomats [23]. A senior official added that the SolarWinds hack was 'beyond the boundaries of acceptable cyber operations' [24]. At the time of writing, the US Cybersecurity and Infrastructure Security Agency (CISA) had launched a new Joint Cyber Defense Collaborative (JCDC) [25]. Among the first actions of CISA was to launch a consultation on a policy-enforced implementation of zero trust [5]. This enforcement response arguably lies outside normal policy [15]. Thus, for this paper, the SolarWinds breach is considered to have acted as a driver for the evolution of an operationalised, reactive norm: zero trust. The responses serve to illustrate actions, considered from the perspective of the adopter [26], in relation to trust in zero trust as it evolves into a policy-enforced norm [5]; this is examined next.

6 Understanding Trust in Zero Trust

As a result of the SolarWinds breach, the US CISA has called for agencies to adopt zero trust [5, 25]. The question at the heart of this paper concerns how this response may play out: will policy-enforced zero trust be trusted? Trust models are next revisited to set out a potential approach to understanding trust in this context. By integrating the two foundational trust models [17, 18], an assessment of trust in discursive zero trust and policy-enforced zero trust can be undertaken [26], as set out in Table 1 below. This new assessment of trust in norms (ATiN) approach is proposed to allow the nature of trust in discursive, organic norms to be compared with the trust in evolved, policy-enforced norms of zero trust. Drawing from the first trust model, established in prominent conflict resolution studies [17], the elements comprise an individual's beliefs and willingness to act, together with their ability to trust (in this case, to trust a norm of zero trust). These two elements are based on three questions. Firstly, what are the individual's experiences of zero trust? Secondly, how does zero trust fit with their personality? Thirdly, does zero trust mirror the established rules and norms? Next, the integrative trust model elements are considered, starting with the propensity of the actor to trust and their acceptance of vulnerability and risk [18]. Finally, zero trust itself is assessed: firstly, is it good at doing the job (ability); secondly, does it act with the good of all in mind (benevolence); and finally, will its implementation promote honesty (integrity) [18]? By considering each of the twelve elements and processes involved in trust building (Table 1), researchers can evaluate an actor's trust in a norm of zero trust that is organic, as compared with a policy-enforced norm. The assessment first evaluates an actor's beliefs and willingness to act and their ability to trust; in this paper, in zero trust. As stressed, for this paper zero trust is viewed as a construct: a non-person technology [21] and/or policy domain [22]. Further, it is recognised that the understanding, beliefs, and actions of actors may differ across sector and scale (government, industry, society) [2]. Researchers can ask questions to tease out these subtle differences and begin to understand the nature and likelihood of trust in, and acceptance of, a discursive norm as compared with a policy-enforced norm.


Table 1. Assessment of trust in norms (ATiN); elements and descriptions

Element   Description
1         Beliefs and willingness to act
2         Ability to trust
3         Experience of zero trust
4         Zero trust 'fit' with personality
5         Zero trust mirrors prevailing societal rules and norms
6         Positive expectations
7         Propensity
8         Ability
9         Benevolence
10        Integrity
11        Acceptance of vulnerability
12        Risk taking behaviour

The bridge provided by a comparative assessment of presumptive trust and zero trust [26] is drawn on here to allow an interpretation of trust in the discursive and policy-enforced norms. If we consider each element of the ATiN approach in turn, we can start to examine the nature of trust in the norms of zero trust. Commencing with elements 1–4 in Table 1, we suggest that when a new discursive norm appears, most actors are unlikely to have experience of, or exposure to, the new norm [2]. In consequence, their responses may be of low trust. However, when a norm has evolved as a reactive and policy-enforced norm, actors are more likely to have experience of it, and thus to trust that norm. Open questions are whether the experience will result in more, or less, trust in the evolved norm, and what role the nature of the norm's evolution plays in the actor's trust. Next, consider element 5, rules and norms, for actors with different industry experience: for example, individuals working in organic or loosely structured organisations, compared with those in industries with high levels of regulation and compliance. Trust responses could vary between a preference for discursive norms in the organic organisations and the evolved, enforced norms expected by actors in highly regulated industries. Next, to account for differences in actors' expectations and/or propensities (elements 6 and 7), we could separate actors who possess intrinsically high levels of trusting behaviour from those who are risk-takers (element 12); it can be imagined that these two groups, although very different, may both be more likely to trust an organic norm than an evolved one [15]. For the individual assessment elements (8–10), actors will make decisions based on their individual preferences; some may favour ability over benevolence, and so on.

The ATiN approach is next applied to evaluate the immediate actions of state actors after the SolarWinds breach, as a first step in the interpretation of trust in evolved norms [4, 7]. After the breach was reported, US reactions were rapid: in January 2021 the Biden administration and US agencies published their belief that state actors were responsible [6]. This response indicates the presence of element 1, willingness to act, and element 2, ability to trust (in their own decisions) [7]. Arguably, the actions were based on an experience of zero trust and personality (elements 3 and 4). Directives followed in April 2021, including prohibitions on US institutions investing in or lending to foreign banks, and sanctions imposed on foreign companies [7]. For the element of zero trust mirroring societal rules and norms (element 5), the implementation of zero trust was called for and immediately backed by the establishment of the CISA; this response possibly overlaps with elements 1–4. For the elements of positive expectations and propensity (elements 6 and 7), as a zero trust posture was deployed, both elements were in a negative state [26]. Elements 8, 9, and 10 are illustrated by the testing of the draft zero trust policy through a press release and a public consultation [25]. Finally, as evidenced by the imposition of a zero trust stance, the consequences involved both a resistance to accepting vulnerability (element 11) and an avoidance of risk-taking (element 12) [27]. Conducting an ATiN analysis has thus teased out and evaluated the separate elements of active trust building and the avoidance of presumptive trust, consistent with the implementation of a policy-led posture of zero trust [26, 27]. For the future, as it may be too soon to say now, a comparative analysis of the nature of trust in zero trust between the discursive and the policy-enforced norm remains to be conducted. In fact, all trust elements need to be considered as liable to the bias of the trustor. Summing up, the ATiN approach has enabled the multiple nuances, processes, and biases in trust formation in policy-enforced norms to be distinguished.

7 Conclusion, Limitations and Implications

In this paper, focused on trust in zero trust, well-established trust models have been leveraged in a first attempt to construct a novel approach to assessing trust in norms. The work has helped our understanding of how trust in a discursive norm may be compared with trust (or not) in a policy-enforced norm. A preliminary evaluation points to promising directions for future work. As a short paper, the discussion is inevitably limited. In reviewing key theory on norms, specifically norms for cyber security, and the emergence of zero trust, trust-building theory was leveraged to provide a mechanism for this study. The SolarWinds breach was then discussed, together with the context of the emergence of state-enforced norms. In Sect. 6 above, the ATiN approach was presented as a novel assessment that integrates prominent trust-building models [17, 18]. In setting out twelve separate elements and processes involved in trust building, ATiN enables researchers to begin to understand the nature of trust in norms. A preliminary evaluation was conducted on the immediate responses to the SolarWinds breach. The initial findings appear promising; application of ATiN has enabled examples of actions involving trust building and the implementation of non-presumptive trust to be evaluated. The findings point to a posture that accords with the presence of a policy-enforced norm based on zero trust. It is proposed that future empirical studies address the limitations identified in this paper by leveraging the foundational work presented here. Several promising avenues for future studies arise, notably research on the automation of zero trust policy management, including specifying trust parameters to enable continuous monitoring, authentication, and authorisation [5, 6, 16]. Such research could yield results that help inform the implementation of zero trust through study of the potential acceptance of a policy-enforced norm. As suggested earlier, misaligned norms also appear to evolve in parallel [10, 14]. Detailed study of behavioral intention, and trust, in password sharing or illegal downloading of media could help inform the development of countermeasures. Although governments and industry may propose and provide operational training in zero trust [25], in practice zero trust is hard to implement due to legacy infrastructure and resource limitations [5, 6]. For the future, understanding the likelihood of the successful implementation and operation of organic and policy-enforced cyber security norms is crucial. The contribution of this paper helps to disentangle the previously poorly understood, yet central, role of trust in norms.

Acknowledgments. The author gives thanks to the anonymous reviewers for their insightful comments, which helped develop and improve this manuscript. Thank you also to BPF colleagues Fred Hansen and Barbara Marchiori de Assis for their valuable discussions. Any mistakes or omissions remain the sole responsibility of the author.

References

1. Katzenstein, G.P.: The Culture of National Security: Norms and Identity in World Politics. Columbia University Press, New York (1996)
2. Finnemore, M., Sikkink, K.: International norm dynamics and political change. Int. Organ. 52(4), 894–905 (1998)
3. United Nations (UN) General Assembly: Group of governmental experts on developments in the field of information and telecommunications in the context of national security. General Assembly, UN, July 2015. https://www.ilsa.org/Jessup/Jessup16/Batch%202/UNGGEReport.pdf
4. Truesec: The SolarWinds Orion SUNBURST supply chain attack. Truesec, December 2020. https://www.truesec.com/hub/blog/the-solarwinds-orion-sunburst-supply-chain-attack
5. Cybersecurity and Infrastructure Security Agency (CISA): Zero trust maturity model, June 2021. https://www.cisa.gov/sites/default/files/publications/CISA%20Zero%20Trust%20Maturity%20Model_Draft.pdf
6. National Cyber Security Centre (NCSC): Zero trust architecture design principles. NCSC, July 2021. https://www.ncsc.gov.uk/collection/zero-trust-architecture
7. UN Internet Governance Forum (IGF) BPF: Testing norms concepts against cybersecurity events. UN IGF BPF, November 2022. https://www.intgovforum.org/en/filedepot_download/235/20025
8. UN IGF BPF: Mapping and analysis of international cybersecurity norms agreements. UN IGF BPF, November 2021. https://www.intgovforum.org/en/filedepot_download/235/19830
9. Wired: Netflix's password sharing crackdown has a silver lining. WIRED, December 2021. https://www.wired.com/story/netflix-password-sharing-crackdown
10. Smith, J., Louis, W.R.: Do as we say and as we do: the interplay of descriptive and injunctive group norms in the attitude-behaviour relationship. Br. J. Soc. Psychol. 47, 647–666 (2008)
11. Cialdini, R., Kallgren, C.A., Reno, R.: A focus on normative conduct: a theoretical refinement and reevaluation of the role of norms in human behavior. In: Zanna, M.P. (ed.) Advances in Experimental Social Psychology, pp. 201–234 (1991)
12. Ajzen, I.: The theory of planned behavior. Organ. Behav. Hum. Decis. Process. 50(2), 179–211 (1991)
13. Venkatesh, V., Davis, F.D.: A theoretical extension of the technology acceptance model: four longitudinal field studies. Manag. Sci. 46(2), 186–204 (2000)
14. Levin, A., Dato-on, M.C., Manolis, C.: Deterring illegal downloading: the effects of threat appeals, past behavior, subjective norms, and attributions of harm. J. Consum. Behav. 6(2/3), 111–122 (2007). https://doi.org/10.1002/cb.210
15. Drew, A.: Securitising cyber-capability: an analysis of norm construction methods. PhD thesis, University of London, February 2019. https://core.ac.uk/download/pdf/294771701.pdf
16. Rose, S., Borchert, O., Mitchell, S., Connelly, S.: Zero trust architecture. NIST, August 2020. https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-207.pdf
17. Deutsch, M.: Trust and suspicion. J. Confl. Resolut. 2(4), 265–279 (1958)
18. Mayer, R., Davis, J., Schoorman, F.: An integrative model of organizational trust. Acad. Manag. Rev. 20(3), 709–734 (1995)
19. Lewicki, R.J., McAllister, D., Bies, R.: Trust and distrust: new relationships and realities. Acad. Manag. Rev. 23(3), 438–458 (1998)
20. Fulmer, A., Gelfand, M.: At what level (and in whom) we trust: trust across multiple organizational levels. J. Manag. 38(4), 1167–1230 (2012)
21. Mcknight, D.H., Carter, M., Thatcher, J.B., Clay, P.: Trust in a specific technology: an investigation of its components and measures. ACM Trans. Manag. Inf. Syst. 2(2), 1–25 (2011)
22. Mcknight, D.H., Chervany, N.L.: The meanings of trust. Carlson School of Management, University of Minnesota (1996)
23. NCSC: NCSC statement on the SolarWinds compromise. NCSC, December 2020. https://www.ncsc.gov.uk/news/ncsc-statement-on-solarwinds-compromise
24. Voltz, A.: In punishing Russia for SolarWinds, Biden upends US convention on cyber espionage. Wall Street Journal, April 2021. https://www.wsj.com/articles/in-punishing-russia-for-solarwinds-biden-upends-u-s-convention-on-cyber-espionage-11618651800
25. CISA: CISA launches a new joint cyber defense collaborative. CISA, August 2021. https://www.cisa.gov/news/2021/08/05/cisa-launches-new-joint-cyber-defense-collaborative
26. Wylde, A.: Zero trust: never trust, always verify. In: 7th International Conference on Cyber Security for Trustworthy and Transparent Artificial Intelligence (CYBER SA 2021), pp. 1–4. IEEE (2021)
27. Microsoft: Security: a guide to building resilience, solution guide series. Microsoft, July 2020. https://clouddamcdnprodep.azureedge.net/gdc/gdcPJ9yCm/original

Leveraging Zero Trust Security Strategy to Facilitate Compliance to Data Protection Regulations

Jean-Hugues Migeon1 and Yuri Bobbert2
1 ON2IT B.V., Zaltbommel, Netherlands
[email protected]
2 Antwerp Management School, ON2IT B.V., Zaltbommel, Netherlands
[email protected]

Abstract. Implementing privacy requirements in technology is cumbersome. On the one hand, we see the speed at which technology develops; on the other hand, we observe ambiguity during the implementation of privacy regulations in the operations of organisations. It is like replacing one engine of a plane during flight: you cannot freeze the environment and then implement, test, and release. Zero Trust Security is a strategic approach to information security that defines critical segments housing crown jewels and implements security measures according to a structured process that also encompasses data privacy requirements. A Zero Trust segment can be a high-value asset that processes Personally Identifiable Information (PII) needing protection. The aim of this paper is to describe the GDPR implementation problems at hand and to elaborate on the empirical examination with Chief Information Security Officers (CISOs) and Data Protection Officers (DPOs) carried out to complement the ON2IT Zero Trust Framework with additional data protection requirements.

Keywords: Zero trust · Data protection · General Data Protection Regulation (GDPR) · Compliance · Design science research · Group Support System (GSS) · Cybersecurity

1 Introduction

Data protection and privacy-protective regulations developed during the past decade have set the bar for security at a much higher level than before. The EU General Data Protection Regulation (GDPR) sets principles for companies processing personal data: lawfulness, fairness and transparency of processing, purpose limitation, data minimization, storage limitation, data security, and accountability. These lay the foundation of a whole new field of documenting internal processes, increasing the administrative burden on operational employees and the managers who need to review and approve it. Often, this is executed in SharePoint or Excel lists, which is a risk on its own [6,21]. The same happened in the US, as the California Consumer Privacy Act (CCPA), amongst others, additionally emphasizes notification of data subjects before processing their information. It also relies on the awareness of employees, individuals, and organisations to make the use of privacy rights a success. Under the CCPA, any consumer can request that a company not sell their data. This makes data management much more complicated than before and, to be effective, requires a clear overview of data flows. Both regulations emphasize the importance of data security and the necessity for companies to take all measures available to ensure the protection of the personal information that a company may hold.

1.1 Problem Statement

After two full years of implementation of the European regulation, and a year for the CCPA, information security and its associated measures seem to be the most challenging issue for companies to comply with. A quantitative analysis by law firm CMS [9] of the most violated GDPR articles demonstrates that the most frequent type of GDPR violation is "insufficient legal basis for data processing" (article 6). However, the type of GDPR violation with the highest sum of (publicly published) fines imposed since the enforcement of the GDPR is "insufficient technical and organisational measures to ensure information security" [14]. As stated in article 32 of the GDPR: "Taking into account the state of the art, the costs of implementation and the nature, scope, context and purposes of processing, as well as the risk of varying likelihood and severity for the rights and freedoms of natural persons, the controller and the processor shall implement appropriate technical and organisational measures to ensure a level of security appropriate to the risk". This is understood to mean that companies must ensure the security of information according to their capacities, resources, and the technologies available on the market. It does not necessarily give companies much more insight into what they need to do to "secure" their data. It also confirms that information security and cybersecurity have become increasingly important to the core business of organisations [11]. This concurs with the idea that the legal framework had to be adapted to the latest developments, the 2010s being the first full decade in which people truly embraced the internet [17]. Since 2012, the marketing consultancy Kepios has collected data on the digital behaviour of people in every country around the world. During this period, the number of internet users increased from 2.07 billion (30% of the population) in 2012 to 4.57 billion (59%) in 2020. Moreover, the latest report showed that over 60% of internet users reported concerns about the misuse of personal data and other privacy breaches. We learn two lessons here. The first is that, back in 2012, privacy concerns were not monitored and certainly not considered to have an impact on marketing. The second is that nearly 2.75 billion people are concerned about the security of their data and their privacy. The cause or the consequence of this is a new plethora of privacy legislation with which companies need to comply. This leads to the following problem:


“Privacy frameworks lack operational guidelines for implementing security; companies are expected to make enlightened choices of technical and organisational security measures. The growing interconnection between users, devices, and networks complicates the governance of information, and most business activities are now data-driven, for which burdensome accountability is expected. As a result, organisations struggle to focus on core business processes.”

1.2 Information Security Frameworks to Protect Privacy

Protecting assets that might pose a privacy risk is a core part of doing business. In 2004, Weill and Ross linked IT governance to corporate governance practices such as risk management [20]. In Fig. 1, we distinguish corporate governance principles and their translation into business and IT objectives, to be directed as policies and standards into the organisation in order to protect assets. Since 2004, one of the major intangible assets, data, has made its entrance into the organisation. We have therefore enriched the initial diagram with the role of the data protection officer, who is responsible for ensuring compliance with the highest data protection standards. But first we elaborate on several framework developments seen over the last decades; we do not intend to be exhaustive, but aim to highlight the most relevant ones.

Fig. 1. Corporate governance and key asset governance (taken from IT governance; how top performers manage IT decision rights for superior results) 2004 (updated in 2020 by Bobbert & Migeon).


As part of an answer, Information Security frameworks have started to include Privacy as an element to be taken into account while managing information security. The National Institute of Standards and Technology (NIST) released its Privacy Framework, to help organisations identify and manage privacy risk to build innovative products and services while protecting individuals’ Privacy. As an extension to the ISO27001/2, the International Standard Organisation (ISO) released the ISO 27701 in August 2019 on Privacy information management. Thus, if an organisation has implemented ISO 27001, it can use ISO 27701 to extend its security efforts to cover privacy requirements. Organisations that have not implemented an Information Security Management System (ISMS) can implement ISO 27001 and ISO 27701 together as a single implementation project, but ISO 27701 cannot be implemented as a standalone standard. Only combined with ISO 27001, ISO 27701 can help organisations to comply with key privacy laws like the GDPR. However, one of the main issues of this extension is that the focus of ISO 27701 is data security. It does not provide firm guidance related to data privacy (governance and management) aspects mentioned in the GDPR like the rights of the data subjects, etc. These topics are only briefly “touched upon” in ISO 27701. For effective privacy governance and management, more is needed than extending the existing data security activities - Privacy asks for different skills, knowledge and mind-sets. Other frameworks, such as the NOREA privacy control framework from the homonym Dutch audit company, offers an additional 95 controls purely focused on privacy information management within your company. Yet, most of these controls require manual documentation, procedures and does not involve senior management and board level. It is in that regard that we consider that Privacy and Data Protection face a similar issue to what information security faced in the past decades. While being a “trend”, most companies threw themselves into information security without a clear governance and business approach. 1.3

1.3 Zero Trust as a Security Strategy

In the past years, a spotlight has been set on a new type of network security architecture called "Zero Trust", which is seen as a way to enhance the security of a company's network and, incidentally, of its data. Although theorised in 2010 by John Kindervag [13], then working at the research and consulting firm Forrester, Zero Trust has gained little attention at senior management and board level. It is predominantly viewed, examined and practised by technicians and architects, while the available guidance mainly addresses the managerial level and lacks the operational detail from which DevOps teams and engineers can take proper direction. Thus, stuck at mid-management level, with no senior management support and no proper guidance for the operational lines, Zero Trust has struggled to be adopted by companies. Fundamentally, a network is a system consisting of many similar parts that are connected together to allow movement or communication between or along the parts. Simply put, there exist thousands of networks; some, like the Internet, are not owned by a single entity, regroup multiple networks and can be accessed by any device.


Others, like an intranet, are under the control of a single entity (e.g. a company), and access to them is limited to chosen devices and users. In security, the traditional architecture breaks networks into zones contained by one or more firewalls. Each zone is granted some level of trust, which determines the network resources it is permitted to reach. Thus, riskier devices, such as web servers facing the public internet, are placed in an exclusion zone ("De-Militarized Zone" or DMZ). Zero Trust rejects this traditional conception of a network and considers that all devices, all users and all networks are interconnected in one global network (the Internet) and that no such thing as a "trusted" network exists. This can be summarised in five core principles for Zero Trust: 1) the network is always hostile; 2) threats can come from anywhere at any time; 3) a local network is not a trusted network; 4) every device, user or flow must be authenticated and authorized; and 5) a sensor policy must be robust and dynamic and extracted from as many data sources as possible [8]. To these five core principles an additional principle has been added: 6) assume breach [15]. This requires collecting and interpreting logs in an automated manner to respond quickly in case of a breach, sometimes referred to as a Security Orchestration, Automation and Response (SOAR) platform. A last principle follows from the fact that digital environments become more extended, hybrid (cloud, IoT, OT) and thereby complex, which raises the need for 7) real-time administration and dashboarding [8]. According to Jagasia, "Zero-Trust does not require the adoption of any new technologies. It's simply a new approach to cybersecurity to 'never trust, always verify,' or to eliminate any and all Trust, as opposed to the more common perimeter-based security approach that assumes user identities have not been compromised, all human actors are responsible and can be trusted" [12]. Although the term "Zero Trust" is often understood to mean that individual human beings cannot be trusted, Zero Trust in fact implies that humans can be trusted, but that there is a need to verify before granting access and authorisation. As Jagasia puts it: "perimeter-based security primarily follows 'trust and verify,' which is fundamentally different from ZTA's paradigm shift of 'verify', and then trust."
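To make the "never trust, always verify" principle concrete, the sketch below shows a minimal, hypothetical access-decision routine in Python: every request is evaluated against user authentication, device posture and an explicit per-segment policy, regardless of which network it originates from. The function names and policy fields are illustrative assumptions, not part of any specific Zero Trust product.

```python
from dataclasses import dataclass

@dataclass
class AccessRequest:
    user_id: str
    user_authenticated: bool      # e.g. MFA completed
    device_id: str
    device_compliant: bool        # e.g. endpoint agent reports healthy
    source_network: str           # deliberately NOT used to grant trust
    application: str
    segment: str

# Hypothetical per-segment policy: which (user group, application) pairs are allowed.
SEGMENT_POLICY = {
    "general-ledger": {("finance-users", "sap-fi")},
}

def group_of(user_id: str) -> str:
    # Placeholder for a directory lookup (e.g. LDAP/IdP group resolution).
    return "finance-users" if user_id.startswith("fin-") else "other"

def authorize(req: AccessRequest) -> bool:
    """Never trust, always verify: identity, device and explicit policy."""
    if not req.user_authenticated or not req.device_compliant:
        return False                      # no implicit trust, ever
    allowed = SEGMENT_POLICY.get(req.segment, set())
    return (group_of(req.user_id), req.application) in allowed

# The source network is ignored on purpose: a request from the "internal"
# LAN is treated exactly like one from the public internet.
print(authorize(AccessRequest("fin-042", True, "lt-17", True,
                              "internal-lan", "sap-fi", "general-ledger")))
```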

1.4 The Role of the CISO and the DPO

With data privacy legislation like the GDPR and CCPA on the rise, we can state that reputational risks have increased over time, including the financial risk of fines. Data privacy governance is a relevant (corporate) governance topic. Kuijper highlights (see Fig. 2) the dual role of the CISO and DPO in that respect, based on the principal-agent problem [14]. Kuijper quotes: “This well-known ‘agent’ serving the digital interest of the company (the ‘principle’) by protecting the companies digital assets is often the CISO (Corporate Information Security Officer). However, due to new legislation like the GDPR, we also see a new type of ‘agent-principle’ relationship occurring, namely the individual gaining more influence (via legislation like the GDPR) on how companies must handle their digital assets in case it contains privacy relevant personal information - a more outward-looking view.


Due to these new kinds of stakeholders, backed by a ‘super principle’ (the legislator), we also have seen new roles (or ‘agents’) like the DPO (or Data Privacy Officer) occurring in the organisation, focusing on safeguarding the interest of this new ‘principle’: the individual and its privacy rights (with the focus on conformance).” Kuijper continues: “This new reality with different types of ‘agents’ organized in different roles like the CISO and the DPO and different ‘principle’s’ like the company owners and/or shareholders versus the individual backed by the legislator makes (IT) governance and management much more complex and multi-faceted.”

Fig. 2. Data and InfoSec governance: more complex and multi-faceted due to multiple requirements and principle/agent relations [14]

Since the CISO and the DPO need to work together to meet individual and company obligations, we have chosen to involve both roles in validating the applicability of the Framework to privacy. To address the GDPR implementation problems at hand, we elaborate on the empirical examination with CISOs and DPOs to complement the ON2IT Zero Trust Framework with additional data protection requirements. This empirical validation allows us to incorporate new functions into the Zero Trust Framework and technology artefact. We finish by demonstrating, via screenshots of the artefact, the Relevance Score and geographical location per Zero Trust segment, essential for proving GDPR compliance.

2 Proposing the ON2IT Zero Trust Framework for Privacy

To render a Zero Trust architecture a viable strategy for a company, all actors must be involved: senior management and board members as well as operational engineers. It is in this regard that, over the past year, ON2IT has developed a framework for addressing cybersecurity and privacy issues.


The Framework was initially proposed in 2019 and published in June 2020 by Bobbert and Scheerder [7]. Their research paper established the state of the art in Zero Trust usage and inventoried the limitations of current approaches and how these are addressed in the form of critical success factors in the Zero Trust Framework developed by ON2IT 'Zero Trust Innovators'. Moreover, they describe the design and engineering of a Zero Trust artefact that addresses the problems at hand. Finally, their paper outlines the setup of an empirical validation through practitioner-oriented research, to gain broader acceptance and implementation of Zero Trust strategies. The final result is a proposed framework and associated technology which, via Zero Trust principles, addresses multiple layers of the organisation to grasp and align cybersecurity risks and to understand the readiness and fitness of the organisation and its measures to counter those risks.

– On the strategic - or governance - level, it is paramount to question the organisation's leadership capabilities, the roles and the accountability process in place in order to facilitate and execute a Zero Trust strategy at best: know your environment and capabilities.
– On the managerial level, the focus is on the structure of the organisation, its processes and the relational mechanisms (reporting, roles, tone at the top, and accountabilities) in place to execute the strategic Zero Trust objectives, e.g. capabilities for logical segmenting: know your risks.
– On the operational level, the organisation's current or future technology must be listed and assessed in order to utilise Zero Trust measures: know your technology.

As a result of that publication, Bobbert and Scheerder presented and published the engineered Zero Trust artefact to ultimately measure and monitor Zero Trust implementations [8]. They proposed empirical research in which Chief Information Security Officers (CISOs) and Data Protection Officers (DPOs) validate this Framework collectively through collaboration in small groups [16], more specifically on privacy issues that can be addressed via Zero Trust. This research paper continues that proposition of longitudinal research.
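As a rough illustration (not part of the published Framework), the sketch below shows how the three levels and their "know your ..." focus areas might be captured as a simple data structure with CMMI-style maturity ratings for current and desired state; all field names are assumptions made for this example.

```python
from dataclasses import dataclass

@dataclass
class LevelAssessment:
    level: str              # "strategic", "managerial", "operational"
    focus: str              # the "know your ..." theme for that level
    current_maturity: int   # CMMI-style 1..5 (current state / IST)
    desired_maturity: int   # CMMI-style 1..5 (desired state / SOLL)

assessment = [
    LevelAssessment("strategic",   "know your environment and capabilities", 2, 4),
    LevelAssessment("managerial",  "know your risks",                        2, 4),
    LevelAssessment("operational", "know your technology",                   3, 4),
]

# The gap per level indicates where a Zero Trust programme needs attention first.
for a in assessment:
    print(f"{a.level:<12} gap = {a.desired_maturity - a.current_maturity}")
```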

3 Research Approach

Bobbert and Scheerder propose in their paper a longitudinal empirical validation of the Zero Trust Framework [7] with practitioners via Group Support System (GSS) research. The suitability of GSS for such large-scale longitudinal research has been identified by De Vreede [19]. The opportunity for larger-scale longitudinal research lies specifically in gaining knowledge to establish a collective knowledge base [18]. Turoff states that using large data sets gained from larger groups can be very helpful to generate more and better ideas. This is desirable because "when the group represents an organisational membership, this is probably a very feasible and desirable pre-step to the execution of the examination of the issue". According to Pai, this early stakeholder engagement is also relevant for solving epistemological and ontological problems, and it makes GSS suitable as a collaborative research method.


The use of GSS in validating frameworks, technological artefacts and models is described in multiple papers [2–5]. The GSS method allowed us to deepen the participants' understanding of the topic before asking them to validate the Zero Trust questionnaire on its clearness and completeness. During the GSS sessions, the participants were invited to share their understanding, inherently generating new knowledge and peer reflection on the Zero Trust topics. For each group of practitioners, new insights were presented to the next group for consideration. This "double-loop learning" [1] provides additional scrutiny and thereby contributes to the overall quality of the ON2IT Zero Trust Framework. The sessions were run by a professional GSS moderator using the Meetingwizard technology, which, according to Hengst, is key [10]. The technology keeps a record of the scores and arguments as presented for all steps, and therefore assures objectivity, transparency, controllability and repeatability. To ensure proper preparation for the GSS session, each participant received, prior to the meeting, a predefined agenda, a clear introduction and the readiness questionnaire. The results of the 11 sessions held with 70 participants using GSS are detailed in Table 1. In the GSS-based research sessions, CISOs and DPOs were asked to validate and improve the framework and the associated maturity assessment questions. The validation criteria were: relevance (does it reflect a relevant domain), completeness (complement the list of questions to make it more exhaustive) and clearness (can participants understand the question and suggest improvements to avoid ambiguity).

Table 1. Overview of the total number of participants (CISOs and DPOs at strategic level) in the GSS validation sessions [7]

Dates of validation session (2020)   Participants
February                             5
March                                7
April                                6
May                                  9
June                                 2
July                                 7
August                               4
September                            5
October                              7
November                             8
December                             10
Total participants                   70

4 Results for a Zero Trust Framework Applied to Privacy

Based on the results of the empirical validation, we improved the Framework. The objective of the ON2IT Zero Trust Framework is to act as a guide for boards and managers prior to starting a Zero Trust strategy and during its implementation. We list the major findings for improving the Framework and for leveraging privacy enhancements:

– The use of a common language is essential. It can be achieved by making use of existing control frameworks as of level >3, for example ISF, the NIST Cybersecurity Framework, the NIST Privacy Framework, PCI DSS (Payment Card Industry Data Security Standard) or ISO 27000 controls. CMMI-based maturity levels on a 1 to 5 scale are applied, including ISO 15504 maturity criteria based on the audit terminology (ToD, ToI, ToE) that NOREA uses.
– Following the questions of the strategic level (know your environment and capabilities), we can identify additional information valuable to the organisation. We can determine whether business, privacy and IT are aligned, whether threats and trends are identified, whether they influence enterprise risk management (ERM), and whether this is followed by appropriate ownership at board and managerial level (according to the COBIT EDM model). On a managerial/tactical level, the NIST framework can be used to add more context, while on an operational level ISO can be used. This leaves the following frameworks as references for their own level:
  – COSO/COBIT - strategic (enterprise-level approach to risk management)
  – ISO - operational (initiative/program-level approach to risk management)
  – NIST - tactical (asset/project-level approach to risk management)
– An additional remark was that Zero Trust addresses cybersecurity, but the issue is much broader. In this empirical validation we have investigated cyber and privacy; the coverage of Zero Trust can even be broadened to digital security, as displayed in Fig. 3, taking into account Internet of Things (IoT), Operational IT (OT) and physical security (closed-circuit television (CCTV), also known as video surveillance), where privacy risks can also occur. (Remark from the DPO of Hikvision.)
– The Framework encourages, and should require, a strict sign-off by board members on the preconditions of the plan before Zero Trust can be implemented. Organisations that use the COBIT 5 or COBIT 2019 processes and design principles can plot the latter to the EDM layers of governance, management and operations. This brings the required common language on technical and organisational security measures.
– Each DAAS element requires clear ownership and Confidentiality, Integrity, Availability annotation in a repository (e.g. a CMDB) in order to ensure adequate asset qualification, and even quantification, so that security measures can be assigned to these assets.


– A Relevance Score, on a scale of 0–100, is defined by combining the existing standard Business Impact Assessment (BIA) and Privacy Impact Assessment (PIA) methods. The relevance score is also affected by the presence of personal data, the importance of the DAAS for the business-critical processes and the type (sensitive or regular) of personal data. It therefore determines the type of security measures to be applied, with 0 being a segment with low exposure and 100 one with high exposure. We improved this relevance score by increasing the weight of personal data processing in segments and by weighing the risk linked to sensitive personal data after the discussions with DPOs/CISOs. In addition, the context of processing personal data has been added to the relevance score, including for example the association of financial data in a PCI-related context. See the next section for our GSS reflection.
– By assessing the readiness of the organisation regarding its processes and structures, as well as its technological fitness to utilise Zero Trust, transparency is given into the current and desired states. Some participants raised the concern that in large environments this segmenting and putting measures in place might take years. Monitoring the progress is vital in order not to lose attention and urgency. (Note from the CISO of an insurance company.)

4.1 Discussion on the GSS Session Results

After each session, we reflected on the outcomes. In this section you will find our main take-aways:

– Privacy documentation truly gains value when business activities are the starting point and security-based changes are continuous. This approach provides context to the data and embeds privacy management processes within the organisation's existing processes. For example, when documenting the processing of personal data, one must determine the confidentiality type of the personal data processed and the security measures to be applied. Yet the simple differentiation between “regular” personal data and “special” personal data is somewhat arbitrary and hardly grasps the practical granularity of personal data. An additional intermediate category must be defined: “sensitive” personal data.
– The consequences of a leakage of specific data, such as a name or an email address, are much different from those of other data, such as a social security number, any unique identifier or even more sensitive health data. For most companies that have already added this step, the differentiation stops here: regular contact data is classified as “regular” (or C2 data, “C” standing for Confidentiality), financial or health data is “sensitive” (C3), and all “special data” as defined by the GDPR are classified at the highest levels (C4 or more) when their processing is justified. Yet we argue that the problem is even more prominent when it comes to combining data. Whether we process 15 types of “regular” data or 1, the potential harm for an individual differs and, therefore, security measures should differ accordingly: the more information is available, the more security measures should be enabled.


The same situation applies if an application processes financial data only and is classified as “sensitive” data (C3). What if this same application also combines the financial data with regular data, such as a name? Following a typical scoring scheme, its confidentiality level would remain C3 and, theoretically, similar security measures should be taken. Yet the potential harm for an individual would be much more significant with their name and financial status disclosed than with their financial status alone. As emerged from the discussion, the relevance score needs to be determined by the amount of data, the type of data, the number of data subjects and the possible legal, contractual or regulatory impact, in correlation with the reputational damage and financial impact in case of a breach. The detailed criteria are as follows:

– 0–25 (CI11): No personal data, no sensitive data, limited number and amount of financial data. No possible legal, contractual or regulatory impact. Minor local reputational damage possible. Medium financial impact.
– 25–50 (CI22): Limited amount of personal data (<4 different data types) and a limited number of data subjects, no sensitive data, limited number and amount of financial data. No possible legal, contractual or regulatory impact. Minor local reputational damage possible. Medium financial impact.
– 50–75 (CI33): Personal data or financial data available. Legal, contractual or regulatory impact possible. Reputational damage can occur locally. High financial impact. Industrial Control Systems with high availability (with business case; all applies, with a potential waiver process).
– 75–100 (CI44): Special personal data or sensitive financial data available. Serious legal, contractual or regulatory impact (serious fines, suspension or loss of license) possible. Risk of sustained (inter)national reputational damage. Industrial Control Systems with high availability requirements. Very high financial impact (at any cost).

Following the definition of the relevance score, we proceed to the application of DAAS labels (see Table 2 below) in order to prepare the monitoring of the segments. A similar situation arises when it comes to determining whether a processing operation is performed at a large scale. Only a few guidelines on the subject can be found, and whether it is the Article 29 Working Party or the data protection authorities, they mostly differ, notably in their interpretation of what constitutes large-scale processing. Yet it is a determinant factor in the risks posed to individuals and their rights. The relevance score of a segment should be linked to the reuse of personal data, data combination and the percentage of the population concerned by the processing.
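As an illustration of how these criteria might be operationalised, the following Python sketch maps a few segment characteristics onto a relevance band and the DAAS labels of Table 2. The thresholds mirror the four bands above, but the input fields and the simple rule order are assumptions made for this example, not the scoring algorithm used in the ON2IT portal.

```python
from dataclasses import dataclass

@dataclass
class SegmentProfile:
    special_personal_data: bool     # GDPR 'special categories' or sensitive financial data
    personal_data_types: int        # number of different personal data types processed
    data_subjects: int              # rough count of individuals concerned
    regulatory_impact: bool         # legal, contractual or regulatory impact possible
    ics_high_availability: bool     # Industrial Control System with high availability needs

def relevance_band(p: SegmentProfile) -> str:
    """Return the CI band matching the criteria sketched in the paper."""
    if p.special_personal_data or p.ics_high_availability:
        return "75-100 (CI44)"
    if p.personal_data_types >= 4 or p.regulatory_impact:
        return "50-75 (CI33)"
    if 0 < p.personal_data_types < 4 and p.data_subjects > 0:
        return "25-50 (CI22)"
    return "0-25 (CI11)"

# Simplified rendering of Table 2 (DAAS label identifiers per band).
DAAS_LABELS = {
    "0-25 (CI11)":   ["not applicable (public information)"],
    "25-50 (CI22)":  ["subject to audit", "core processing", "3rd party access", "personal data (PII)"],
    "50-75 (CI33)":  ["personal data (PII)", "core processing", "3rd party access", "confidential data"],
    "75-100 (CI44)": ["personal data (PII)", "industrial control systems", "core processing",
                      "3rd party access", "confidential data"],
}

segment = SegmentProfile(False, 6, 120_000, True, False)
band = relevance_band(segment)
print(band, "->", DAAS_LABELS[band])
```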

5 Leveraging the Zero Trust Strategy to Facilitate Compliance to Data Protection Regulations

Putting the strategy into practice, we use a portal technology (see Fig. 3) that provides information on the maturity level of each segment, the applicability of a regulation to it, the security measures in place (IST/current) and the missing security measures (SOLL/desired).

Table 2. Application of DAAS labels according to the relevance scores

Relevance score   DAAS label identifiers
0–25 (CI11)       Not applicable (public information)
25–50 (CI22)      Subject to audit, core processing, 3rd party access to data, personal data (PII)
50–75 (CI33)      Personal data (PII), core processing, 3rd party access to data, confidential data
75–100 (CI44)     Personal data (PII), Industrial Control Systems, core processing, 3rd party access to data, confidential data

As each segment has a defined owner, the DPO can immediately spot and act on non-compliance of segments, such as the absence of mandatory security measures (e.g. enforcing Multi-Factor Authentication (MFA) on a segment containing health data). When it comes to managing compliance with multiple frameworks, especially in large corporate companies, it is difficult to keep track of which legislation or regulation applies to the different processes. As an example, PCI DSS applies to companies handling credit card information; one can then immediately identify the segments to which higher security standards apply, such as stronger encryption of the data. For non-EU companies that need to comply with various regulations (GDPR, CCPA, LGPD, etc.) it becomes easier to spot the relevant segments. Segments involving personal data of European residents will potentially need different security measures than segments out of GDPR's scope.
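A minimal sketch of how such a check could look in practice is given below: it walks over segment records (as a portal might expose them) and flags segments whose mandatory measures, derived from the data they contain, are missing. The segment schema, the rule table and the field names are illustrative assumptions, not the ON2IT portal's actual data model.

```python
# Hypothetical compliance gap check over Zero Trust segment records.
segments = [
    {"name": "patient-records", "owner": "clinical-it", "data": {"health", "pii"},
     "region": "EU", "measures": {"encryption-at-rest", "content-inspection"}},
    {"name": "webshop-payments", "owner": "e-commerce", "data": {"payment-card"},
     "region": "US", "measures": {"encryption-at-rest", "mfa"}},
]

# Illustrative rules: data type -> measures considered mandatory for that data.
MANDATORY = {
    "health":       {"mfa", "encryption-at-rest"},
    "payment-card": {"encryption-at-rest", "content-inspection"},   # e.g. PCI DSS driven
    "pii":          {"encryption-at-rest"},                          # e.g. GDPR art. 32 driven
}

def compliance_gaps(segment: dict) -> set:
    """Return the mandatory measures missing from a segment."""
    required = set()
    for data_type in segment["data"]:
        required |= MANDATORY.get(data_type, set())
    return required - segment["measures"]

for seg in segments:
    gaps = compliance_gaps(seg)
    if gaps:
        print(f"{seg['name']} (owner: {seg['owner']}): missing {sorted(gaps)}")
```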

5.1 From Corporate Governance Policies to Zero Trust Measures

In more technical terms, in a Zero Trust architecture controls (i.e. measures) are implemented to minimise the attack surface in depth and to provide immediate visibility and, hence, swift and to-the-point incident response. First, by identifying traffic flows relevant to a (closely coupled) application in physical networks, the notion of segmentation is taken a step further; the term 'micro-segments' has been coined. As stated by Bobbert and Scheerder [7], such segments contain a (functional) application. Through this additional segmentation, a 'micro perimeter' is formed that can be leveraged to exert control over, and visibility into, traffic to/from the contained (functional) application.


Fig. 3. Screenshot of the zero trust segments defined for the organisation.

A policy governs the traffic flows, and a Zero Trust architecture prescribes defence in depth by isolation. With these mechanisms, we can be much more specific in our security and privacy approach:

– The policy regulating traffic to and from a Zero Trust segment
  – is specific and narrow, satisfying the 'least access' principle: it allows what is functionally necessary, and nothing more;
  – is, whenever possible, related to (functional) user groups;
  – enforces that traffic flows contain only the sanctioned network applications;
  – enforces content inspection (threat detection/mitigation).
– Visibility is ensured: logs are, whenever possible, related to individual users.
– Presence and conformance of the policy are operationally safeguarded.
– The policy is orchestrated, if applicable, across multiple components in complex network paths.
– Operational state and run-time characteristics (availability, capacity instrumentation) are structurally monitored.

At the endpoint level, agents can be introduced to detect and mitigate malicious processes. Fine-grained endpoint behaviour then extends visibility beyond the network layer, and 'large data' analysis of (user) behaviour becomes viable, further deepening both visibility and defence in depth.


Extracting the telemetry data in near real time from these technological measures is needed to feed this data back to the tactical and strategic levels and to promptly respond and telecommand back. This relates to the increasingly relevant question: how to inform the data protection authorities within hours after a data breach?
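As a rough illustration of what such a narrow, least-access segment policy could look like, the snippet below expresses one micro-segment policy as data and checks a traffic flow against it. The fields (user group, application identifier, content inspection, per-user logging) echo the list above, but the schema itself is a hypothetical simplification, not a vendor policy format.

```python
# Hypothetical least-access policy for one micro-segment.
POLICY = {
    "segment": "general-ledger",
    "allowed_flows": [
        # (user group, application id) pairs that are functionally necessary - nothing more
        ("finance-users", "sap-fi"),
        ("backup-service", "backup-agent"),
    ],
    "content_inspection": True,   # threat detection/mitigation on every flow
    "log_per_user": True,         # visibility: logs tied to individual users
}

def flow_allowed(policy: dict, user_group: str, app_id: str) -> bool:
    """Deny by default; allow only explicitly listed (group, application) pairs."""
    return (user_group, app_id) in policy["allowed_flows"]

print(flow_allowed(POLICY, "finance-users", "sap-fi"))   # True
print(flow_allowed(POLICY, "finance-users", "ssh"))      # False: not functionally necessary
```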

5.2 How Zero Trust Measures Enable Data Protection Laws

The Zero Trust Framework enables companies to comply with Article 32 of the GDPR and tackles the issue pointed out at the beginning of this paper and by all supervisory authorities: technical measures are often not implemented in relation to the risk to personal data. They are simply added because they are called "security measures", but it is unclear to what extent they lower the risk to personal data. Zero Trust enforces identifying and explicating, via the relevance score and in conjunction with business or asset owners - who understand and ratify what type of data resides in the segment - which measures are suitable for appropriate protection of the data. We mention three examples:

1. Processing large data sets. The Zero Trust measure "data analytics or content inspection" allows monitoring and inspecting traffic going back and forth to certain data stores. This also prevents large-scale unauthorised database cross-referencing (two data sets put in the same repository without the legal basis and the consent of the user to do the cross-referencing).
2. The authority requires organisations to report breaches, including the root cause of the breach as well as the affected entities. Zero Trust divides the organisation into small segments and micro-segments, so the log analytics and telemetry data gained via logs make it possible to quickly put a finger on the sore spot, including the amount of exfiltrated or impacted data.
3. Enforcement of technical measures in large global companies. Local data protection laws are hard to enforce for internationally operating companies. The Zero Trust measure "restricted inbound access" allows setting data access restrictions at the geographical segment level.
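The second example hinges on segment-aware telemetry. The sketch below shows, under assumed log and segment schemas, how logs enriched with segment metadata could be used to scope a breach for a supervisory-authority notification: which segments and owners are affected and roughly how many records were touched. It is an illustration of the idea, not the portal's actual reporting logic.

```python
from collections import defaultdict

# Hypothetical segment metadata kept alongside the Zero Trust segments.
SEGMENTS = {
    "patient-records": {"owner": "clinical-it", "data_types": ["health", "pii"], "region": "EU"},
    "marketing-crm":   {"owner": "marketing",   "data_types": ["pii"],           "region": "EU"},
}

# Hypothetical alert/log records already enriched with the segment they concern.
ALERTS = [
    {"segment": "patient-records", "indicator": "data-exfiltration", "records": 1200},
    {"segment": "patient-records", "indicator": "data-exfiltration", "records": 300},
    {"segment": "marketing-crm",   "indicator": "port-scan",         "records": 0},
]

def breach_report(alerts, segments, indicator="data-exfiltration"):
    """Summarise affected segments, owners, data types and record counts."""
    impact = defaultdict(int)
    for a in alerts:
        if a["indicator"] == indicator:
            impact[a["segment"]] += a["records"]
    return [
        {"segment": seg, "owner": segments[seg]["owner"],
         "data_types": segments[seg]["data_types"], "records_impacted": count}
        for seg, count in impact.items()
    ]

for line in breach_report(ALERTS, SEGMENTS):
    print(line)
```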

6 Discussion

The application of the Zero Trust Framework explicitly recognises the major shortcomings of Article 32 of the GDPR and the administrative burden that derives from the principle of accountability. We detail the most relevant ones. The Zero Trust relevance score goes further than the confidentiality score used in traditional standards. When assessing the confidentiality of a system, you list the data that you can find as C1, C2, C3 or C4; the system's confidentiality is then determined by the highest level you find (e.g. the most confidential data is C2). However, traditional CIA methods do not take into account the fact that the system might contain a lot of C2 information, which could be much more harmful if disclosed than isolated C3 data. The relevance score takes into account the amount of data stored and the interactions between the different types of personal data or other sensitive data.


For example, in the Netherlands the impact will differ depending on whether you have access to a single postcode; a postcode and a house number; or a postcode and a house number associated with the name of a notorious person with a controversial reputation. It also sets the triggering threshold for 'large-scale' processing much lower: large-scale processing is not merely about the number of data subjects in a system, but also about the amount of data held on each individual. By providing complete insight into the processing and involving asset owners in the legal liabilities, privacy risks become more transparent, tangible and, therefore, easier to act upon. The relevance score structures the ownership of, and responsibility for, asset risks and measures, allowing a company to justify the application of technical and organisational security measures. These assets and measures are clearly defined in the 'classic' Zero Trust concepts of segments and transaction flows. Thus, by forcing the Zero Trust concept of segmentation 'up' to the boardroom, the connection between risk and the required measures becomes much more tangible and manageable than in existing frameworks, mainly because names are attached to assets. A key design goal of the ON2IT Zero Trust Framework is to formalise the involvement of organisational asset owners from a business perspective, with privacy responsibilities, thus yielding more insightful interpretations of security measures. Because the effectiveness of operational measures is assessed in relation to the Zero Trust segments defined at the upper levels, the alignment of risk and technology can be designed and measured with greater precision. The 'relevance score' of every individual segment is a concept integrally embedded in the Zero Trust methodology and in the Zero Trust Security Orchestration, Automation and Response portal technology presented with screenshots. It steers compliance with privacy regulations and security standards with the required measures and the necessary dynamic feedback on their effectiveness. This is a real-time process; it simply cannot be static, otherwise you cannot inform the 'upper' levels with adequate and current information in case of a data breach. Participants of the GSS sessions acknowledge that the Zero Trust principles of separating the environment to reduce the blast radius and of activating logging and monitoring on data flows enable the DPO to have near-real-time insight into GDPR issues, before and after an event. Participants also see additional value in fine-grained monitoring of the effectiveness of GDPR-related measures such as authorisations (via User-ID), apps that violate GDPR principles (App-ID), cryptography and data leakage prevention, regardless of their (geo)location or environment. This enables the DPO to perform periodic checks on applications, or, when applications make the news for a GDPR violation (e.g. the Schrems I and II decisions), to immediately extract detailed reports on their usage. The use of log ingestion and contextualisation makes it possible to directly trigger actions towards the technology (e.g. using Security Orchestration, Automation and Response (SOAR) principles).


Further research and development of both the Framework and the portal technology is needed to improve the usage and adoption of the developed artefacts, with the objective of strengthening organisations' privacy and security maturity and their data protection risk administration, and of decreasing risks and lowering operational cost.

7 Conclusion

Thus, Zero Trust is not about buying more technology but about making use of existing technology in a smarter way. Not only does Zero Trust provide a security strategy supporting the CISO, it also offers the DPO a way of acting on upcoming data protection regulatory changes. By offering a list of concrete data protection measures, it provides DPOs and CISOs with a common ground for discussion and better alignment, making them each other's allies.

References

1. Argyris, C.: Double-loop learning, teaching, and research 1(2), 206–218 (2002)
2. Bobbert, Y.: Defining a research method for engineering a business information security artefact. In: Proceedings of the Enterprise Engineering Working Conference (EEWC) Forum (2017)
3. Bobbert, Y.: Improving the Maturity of Business Information Security. Radboud University (2018)
4. Bobbert, Y., Mulder, J.: A research journey into maturing the business information security of mid market organizations 1(4), 18–39 (2010)
5. Bobbert, Y., Mulder, J.: Group support systems research in the field of business information security; a practitioners view. In: 46th Hawaii International Conference on System Sciences, Hawaii, US (2013)
6. Bobbert, Y., Ozkanli, N.: LockChain technology as one source of truth for cyber, information security and privacy. In: Computing Conference (2020)
7. Bobbert, Y., Scheerder, J.: Zero trust validation: from practical approaches to theory, vol. 2 (2020)
8. Bobbert, Y.: On the design and engineering of a zero trust security artefact. In: Future of Information and Communication Conference (FICC). Springer, Cham (2021). https://doi.org/10.1007/978-3-030-73100-7_58
9. CMS: The CMS.Law GDPR Enforcement Tracker is an overview of fines and penalties which data protection authorities within the EU have imposed under the EU General Data Protection Regulation (GDPR) (2020)
10. Hengst, M., Adkins, M., Keeken, S., Lim, A.: Which facilitation functions are most challenging: a global survey of facilitators. Delft University of Technology (2005)
11. IT Governance Institute: Information Risks: Whose Business Are They? IT Governance Institute (2005)
12. Jagasia, B.: Another buzzword demystified: zero-trust architecture (2020)
13. Kindervag, J.: Build Security Into Your Network's DNA: The Zero Trust Network Architecture. Forrester Research (2010)
14. Kuijper, J.: Effective privacy governance-management research: a view on GDPR ambiguity, non-compliancy risks and effectiveness of ISO 27701:2019 as Privacy Management System. Antwerp Management School (2020)
15. Saint-Amant, D.: A perspective: zero trust concepts terminology (2019)
16. Straus, D.: How to Make Collaboration Work: Powerful Ways to Build Consensus, Solve Problems, and Make Decisions. Berrett-Koehler Publishers (2002)
17. Vanderbeken, Y.: An exploratory study into Critical Success Factors for the design of Business Platform Models. AMS (2020)
18. Vreede, G., Vogel, D., Kolfschoten, G., Wien, J.: Fifteen years of GSS in the field: a comparison across time and national boundaries. In: Proceedings of the 36th Hawaii International Conference on System Sciences, HICSS 2003 (2003)
19. De Vreede, G., Briggs, R.O., Van Duin, R., Enserink, B.: Athletics in electronic brainstorming: asynchronous electronic brainstorming in very large groups. In: Proceedings of the 33rd Hawaii International Conference on System Sciences (2000)
20. Weill, P., Ross, J.: IT Governance: How Top Performers Manage IT Decision Rights for Superior Results. Harvard Business School Press (2004)
21. Zitting, D.: Are you still auditing in Excel? (2015)

Perspectives from 50+ Years' Practical Zero Trust Experience and Learnings on Buyer Expectations and Industry Promises

Yuri Bobbert1,2(B), Jeroen Scheerder2, and Tim Timmermans2

1 Antwerp Management School, Antwerp, Belgium
2 ON2IT, Zaltbommel, Netherlands

{yuri.bobbert,js,tim.timmermans}@on2it.net

Abstract. Everybody is talking about Zero Trust (ZT). Even the White House issued an Executive Order to start implementing Zero Trust. Many technology manufacturers position their products as enabling or fulfilling ZT requirements. ZT focuses on eliminating trust in the digital network, verifying all traffic, and segmenting the environment. By enforcing a strict access and verification policy on every service, user, or application, it avoids bad actors gaining unauthorized access to systems. This paper continues the authors' previous research on examining Zero Trust approaches. It defines the core problems of vendor promises, which cause information asymmetry that impedes the understanding and successful implementation of Zero Trust. We first start with a description of Zero Trust and continue with practical lessons gathered from six expert interviews representing a collective experience of over 50 years of implementing Zero Trust in diverse settings. The paper finishes by providing concrete guidance and examples that practitioners can consider when implementing Zero Trust.

Keywords: Zero trust security · Strategy · Architecture · Cybersecurity · Digital security · Lemon market · Information asymmetry · Lessons learned

1 Introduction

Malicious actors infiltrate our 'digital' society through hacks and denial-of-service attacks. Platform-oriented businesses are typically built on API-based ecosystems of data, assets, applications, and services (DAAS) [1]. These hybrid technology landscapes, most of the time built using software-defined networks in clouds [2], lack real-time visibility and control when it comes to their operations [3, 4]. This makes it hard for boards to take ownership of and accountability for cyber risks [5]. In practice, we have seen security and privacy frameworks falter because they tend to become a goal of their own rather than a supporting frame of reference for starting dialogues with key stakeholders [6]. Kluge et al. [7] noted that frameworks as a goal do not support the intrinsic willingness and commitment to protect the company. Many other researchers [8–10] have also pointed out the necessity of empirical research into lessons learned from practice.


These theoretical voids and the practical observation of failing "compliance-oriented" approaches widen the knowledge gap [11]. This "knowing-doing gap" [12] is also perceived in the current Zero Trust approaches, which are predominantly aimed at, or pushed by, the technology industry. As Aramco's CISO stated in 2021 at the World Economic Forum, "Zero Trust … adoption across the private and public sectors has been slow and inconsistent" [13]. In previous publications, we as researchers elaborated on the impediments to a successful Zero Trust implementation [14] and on how practical experiences must be brought back to science to enrich the existing Zero Trust body of knowledge, and vice versa. Before we elaborate on the phenomenon of vendor promises, we first introduce the concept of Zero Trust.

1.1 Introducing Zero Trust

Zero Trust is conceptually simple, yet elusive. In its densest form, the Zero Trust pitch is an architecture built on the requirement that inherent trust is removed from the network [15]. The Zero Trust strategy is based on four core principles as defined by John Kindervag (2010) [16]:

1. Design from the inside out: critical assets are protected up close through Protect Surfaces.
2. The principle of least privilege: access is granted to systems and users only on a need-to-know basis.
3. Inspect and log all traffic to quickly detect and respond to malicious intent and to ensure complete visibility.
4. Never trust, always verify: inherent trust is removed from the network, and everything requires explicit verification.

Everything is in a gray zone, and security incidents cannot be entirely prevented. The critical property a security approach should establish is resilience: limiting security impact (containment) and enabling rapid response. This allows 'defense in depth' tactics and blocks the path to a single devastating knockout blow. For this reason, Zero Trust breaks loose from the classic trust vs. untrust train of thought. According to the WEF, "The Zero-Trust model has been widely recognized as an effective approach to prevent data breaches and mitigate the risk of supply chain attacks" [13]. According to Deloitte, the adoption of Zero Trust architecture has become an important topic since the COVID-19 pandemic started. The Deloitte poll was held in July 2020 among 595+ organizations that had already adopted or had plans to adopt Zero Trust; 37.4% wanted to speed up their Zero Trust implementations. Another survey, by Thales, revealed that 32% of the interviewees implement Zero Trust concepts to a great extent in their cloud journey, and 44% rely on some of the concepts. This Thales survey was conducted with 2,600 security professionals as respondents.

1.2 Industry Promises

In our previous paper published in June 2020, titled "Zero Trust Validation: From Practical Approaches to Theory" [14], we described several streams of Zero Trust, such as security vendors aiming to deliver point solutions for Zero Trust.


A query on the RSA Conference1 website on February 24th resulted in 178 items, ranging from presentations to exhibitors, with the biggest, leading vendors Microsoft2 and Palo Alto Networks3 followed by a herd of vendors such as Appgate, Centrify, Cisco, Truefort, Menlo, Illumio, MobileIron, Entrust, Fortinet, Fortanix, Pulse Secure, etc., all promising Zero Trust. An additional search on Google resulted in an impressive 684 million results for the keyword "Zero Trust." The number of vendors that "sell" Zero Trust as a concept or product is overwhelming; examples are Zscaler, CrowdStrike, McAfee, Tufin, Ivanti, Perimeter 81, Akamai, Citrix, Splunk, RSA, IBM, Okta, Bitglass, OneIdentity, SailPoint, SonicWall, Atlassian, Barracuda, Varonis, WatchGuard, Dell, Curity, Ping, LogRhythm, A10 Networks, Cloudflare, Awingu, F5 Networks, Gigamon, Lookout, and even companies like Oracle, Unisys and Blackberry - and the list goes on and on. In her article, Lily Hay Newman from Wired stated: "Vendors hear new buzzwords, and then they try to package a product they already have into that" [17]. With the recent Executive Order from the United States president directing the implementation of Zero Trust to protect federal institutions from cyber incidents, the market madness is at its peak [18]. The following section will elaborate on the problems of industry promises by vendors pushing products versus the market need for honest implementation experiences and lessons learned - mainly since "all that glitters is not gold."

Footnotes:
1 The RSA Conference is a series of IT security conferences; approximately 45,000 people attend one of the conferences each year (source: wikipedia.org). It is the leading cyber-industry conference. Source: https://www.rsaconference.com/site-search?q=zero%20trust
2 Microsoft released the "Zero Trust Maturity Model" to measure the implementation of and readiness for Zero Trust; it focuses on the implementation and use of Microsoft technology over six foundational elements: identities, devices, applications, data, infrastructure, networks.
3 Palo Alto Networks released the "Zero Trust Maturity Model". Designed using the Capability Maturity Model, it mirrors the 5-step methodology for implementing Zero Trust and should be used to measure the maturity of a single Protect Surface (https://www.paloaltonetworks.com/resources/guides/zero-trust-maturity-model).

2 The Problem

Ever since the inception of Zero Trust in 2010, research and consulting firm Forrester has put forward the thought leadership of Kindervag [19] in its approaches, focusing mainly on the managerial level but lacking the operational detail from which DevOps teams and engineers can take proper guidance. Most security measures are derived from the control objectives in control frameworks and are not directly aligned with the security measures prescribed by technology vendors. Consequently, linking the strategic objectives to operational security measures is complex and is rarely implemented [10]. The problem with an approach that lacks alignment with strategic goals lies in the limitations of mainly IT-focused security and security experts. Bobbert refers to operating in silos without reflection outside the silo [6]. Security experts operate in silos with a limited view of the world, the business drivers and the business context [11]. An example is the diverse vocabulary applied for the same activities in the Identity and Access Management domain; this mixture of tongues causes information asymmetry. For example, access reviews, role reviews, permission verification, IST/SOLL verification, and recertification all imply the same outcome, but information asymmetry occurs due to the lack of taxonomy. Information asymmetry deals with the study of decisions in transactions where one party has more or better information. This asymmetry creates an imbalance of power in transactions, which can sometimes cause the transactions to be inefficient, causing failures. An example of this problem is adverse selection. George Akerlof, in his 1970 paper "The market for lemons," highlights the effect of adverse selection in the used car market, creating an imbalance between sellers and buyers that may lead to quality uncertainty [20]. Digital security nowadays appears to have embodied some of these lemon market characteristics. Table 1 illustrates the parallel between Akerlof's lemon market and today's digital security market. Another phenomenon is FUD. FUD stands for Fear, Uncertainty, and Doubt and was coined in the late eighties. As of 1991, the expression became fashionable for any form of disinformation used against the competition. FUD is a simple but effective strategy that supplies the audience with negative, fake, or false information to influence their behavior and decisions. FUD is so effective because adverse events have a more significant impact on our brains and associated attitudes than positive ones. In psychology, this is called negativity bias, which affects behavior as well as decisions.

2.1 Problem Statement

Summarizing the issues of information asymmetry and trust between buyer and seller, we distinguish the following problems:

– Zero Trust has a predominantly technical orientation and is dominated by technology suppliers who want to sell their products.
– Zero Trust is detached from other organizational disciplines such as risk management, compliance, and legal, and very few empirical proofs of best practices and lessons learned from practice are shared.

This led to the following problem statement: Zero Trust has become a marketing term cloaked in smoke and mysticism that everybody talks about. Suppliers preach Zero Trust with a high level of information asymmetry. Still, very few have applied or empirically studied critical success factors and lessons learned. To learn from the two main problems of adopting Zero Trust, we interviewed six experts to identify lessons learned along each of the five steps in the Zero Trust methodology, being:

1. Define the Protect Surface, which contains the high-value assets: specific Data, Applications, Assets, and Services.
2. Map the transaction flows between these Protect Surfaces.
3. Define and build a Zero Trust architecture including associated measures.
4. Create Zero Trust policies within the technology.
5. Monitor and maintain log events from the network.


Table 1. Digital security as a lemon market (taken from https://www.12ways.net [21])

1. Criterion for a lemon market: The buyer cannot fully determine the value and quality before buying the product; the salesman is aware of this value (asymmetric information).
   Parallel with digital security: Although the seller market has increased significantly, it remains challenging to understand the difference between products and services. The business has too little information to determine the actual costs, benefits, and quality of digital security. Decisions are mainly based on "can we afford this" instead of understanding the benefits or being able to determine the value.

2. Criterion: The seller is stimulated to disguise a lower-quality product as a higher-quality product.
   Parallel: Security providers and security software providers often state that their software or service will solve all the problems and prevent hacks from happening. In practice, however, software can't meet that expectation, is harder to implement than expected, or has unexpectedly higher costs (e.g., storage costs when using a SIEM). See also the research report "Cybersecurity Technology Efficacy - Is cybersecurity the new market for lemons?" [21].

3. Criterion: The salesman doesn't have a credible story or technique to represent the high quality of his product.
   Parallel: Value delivery based on an outcome or committed SLA is rarely done. Sales are often made based upon Fear, Uncertainty, and Doubt: "If you don't buy my product, you'll be hacked." Comparisons with other products can hardly be made, except for what, e.g., Gartner or Forrester state in their vendor or product evaluations.

4. Criterion: Buyers are pessimistic and suspicious concerning the seller and the quality of his products.
   Parallel: Buyers become more pessimistic and suspicious since expectations are often not or only partially met. Products can't deliver what was expected or are not fully implemented or used. Also, due to the increase in competitors, significant discounts are already given with the first offer. Instead of making buyers thankful, this usually makes them suspicious about the product's actual value. Many point solutions are perceived as spaghetti, making things worse and hard to rationalize since nobody understands them.

5. Criterion: There is no adequate public supervision, nor are there general guidelines for consumers to guarantee quality standards.
   Parallel: Specific sectors (e.g., financials), and even countries, are regulated and supervised on digital security maturity. In most industries and countries there is, however, no adequate public supervision or guideline for businesses to control the quality of digital security. A general HACCP-like norm for security providers - if you don't comply, you are not allowed to operate, similar to restaurants, lawyers, airlines, hospitals, etc. - is highly desirable but not yet widely adopted. Another dilemma is the lack of parties collaborating in collectively selecting, acquiring, and consuming security services from the commercial market, also referred to as the prisoner's dilemma [22].

3 Expert Panel Interviews

The objective was to generate vendor-neutral experiences and filter out core lessons. The experts were selected on the following criteria:

– more than five years of experience in designing Zero Trust environments;
– more than five years of experience in implementing Zero Trust principles in technology;
– more than ten years of experience in IT security as a technician;
– experience as a trainer, teacher, or manager in Zero Trust.

In Table 2 we have used the initials of the experts to preserve privacy. The full names, the interview transcripts and the interview recordings are available from the authors on request.

Table 2. Expert panel interviewees characteristics

Expert   Years of ZT experience   Expert characteristics
JS       23                       Mathematical logic, networking, cryptography, protocol design, secure coding, network security, UNIX, pentesting
LJK      12                       CTO, development, coding, networking, security, large ZT architecture design, public speaking, software engineering, tech engineering, platform models, education, analytics
RM       7                        Security, rewarded architect, consulting, ZT trainer, Forrester ZT strategist
JB       7                        Cloud, networking, architecture, ZT trainer, software engineering, forensics
PP       5                        Consulting, engineering, networking, training
Total    54

4 Results

As a result of the interviews, we summarized the most important lessons per step of the five-step methodology and per Zero Trust principle. Our objective is not to be exhaustive but to summarize the core lessons that we can learn from experts who have implemented Zero Trust in several forms over multiple years.

4.1 Interview Results

Generic Zero Trust Design Principles. Lessons learned:

– Stick to the Zero Trust taxonomy, thereby avoiding misunderstanding and misconceptions about Zero Trust [23].
– Network segmentation is not the same as Zero Trust; the basis of Zero Trust lies in determining the micro-segments (formerly MCAPs). Each micro-segment is based on a specific data type, which logically includes an appropriate policy. The policy is not limited to firewall rules, as is often assumed in the context of micro-segmentation, but can also describe that data must be stored encrypted or that endpoint protection is required [24].
– A DMZ is NOT Zero Trust. The traditional DMZ setup is not secure anymore; it doesn't protect what it was supposed to protect. In fact, in many cases there are multiple servers in the DMZ serving different types of data, each with its own risks. There is no protection against lateral movement and thus no isolation in case of a compromise [25].
– Depart from an indication of the current and desired state of the Zero Trust readiness maturity level, to determine the in-house capabilities required to implement Zero Trust successfully. This Readiness Assessment and the associated frame of reference (e.g. the Framework [14]) are depicted and described in previous publications and have been validated by 73+ CISOs in the field [26].


Step 1: Define the Protect Surface, which contains the high-value assets such as specific Data, Applications, Assets, and Services. Lessons learned:

– Align with business and asset owners. They own valuable assets and understand the assets' economic value and potential impact. Involve them by design; ZT journeys require time and effort from the organization, an extended vision, and mental stamina. Not all managers have a strategic view; some look only at the day-to-day (clock watchers). A CISO's job rotation is about two years, so expect them to be replaced along the way.
– Start small, start small, expand; it's a skill. First grab three applications, put security devices such as firewalls around them, and do that well. After that, the rest is a variation on well-known earlier work.
– Pay (sincere) attention to objections; pay early attention to protests that may live in operations (such as performance - "an inline firewall like that will never work" - or "won't there be too much policy, and how do we monitor that?").
– Invest in capabilities to tell the Zero Trust journey story (to non-technicians). Awareness of Zero Trust has evolved over the years: first convincing people that it is necessary, then that it is indeed possible, and now that not every marketing story in which "zero trust" appears is Zero Trust.
– Demystify jargon; formulate your Protect Surfaces in readable language so everybody understands the value to the business. Nobody understands IP ranges, but everybody understands the Protect Surface "General Ledger."

Step 2: Map the transaction flows between these Protect Surfaces and identify users' access density and privileges, applications, and services. Lessons learned:

– Run monitoring only on App-ID and User-ID to identify the initial access density between Protect Surfaces and whether toxic combinations are present [10] (a sketch of such an analysis follows after this list).
– Identify any regulatory violation, for example large data sets being exchanged between Protect Surfaces and travelling outside the EU.
– Check your access density. Understand who has which access rights based on the last ten days or three weeks, and use that to make decisions and refine policies. App-ID and User-ID can help identify over-privileges or orphan rights.

Step 3: Build the Zero Trust architecture; define and build a Zero Trust architecture including associated security measures. Lessons learned:

– Reuse existing investments in technology; User-ID/App-ID make it manageable. Much current technology already supports Zero Trust principles. Trying to construct your Protect Surfaces from logging data doesn't work; logging only serves to validate whether you have included everything.
– Never do a Big Bang; every first Protect Surface is a big win for security, so start small. We usually begin with Active Directory, wireless environments, endpoints, etc., and assign measures to them.


– Establish a project for the implementation, defining goals, deliverables, and timelines. Assign someone to 'drive' the project internally at the customer; after a while, the focus tends to shift to other things.
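The following Python sketch illustrates the kind of App-ID/User-ID analysis referred to in Step 2: given flow records labelled with source and destination Protect Surfaces, it counts distinct users per (source, destination, application) combination as a crude access-density measure. The flow-record fields are assumed for illustration; real deployments would read this from firewall or telemetry logs.

```python
from collections import defaultdict

# Hypothetical flow records already labelled with Protect Surfaces, App-ID and User-ID.
flows = [
    {"src_ps": "workstations", "dst_ps": "general-ledger", "app_id": "sap-fi", "user_id": "fin-042"},
    {"src_ps": "workstations", "dst_ps": "general-ledger", "app_id": "sap-fi", "user_id": "fin-007"},
    {"src_ps": "workstations", "dst_ps": "general-ledger", "app_id": "ssh",    "user_id": "dev-123"},
    {"src_ps": "build-farm",   "dst_ps": "general-ledger", "app_id": "sap-fi", "user_id": "svc-build"},
]

def access_density(flow_records):
    """Count distinct users per (source PS, destination PS, application)."""
    users = defaultdict(set)
    for f in flow_records:
        users[(f["src_ps"], f["dst_ps"], f["app_id"])].add(f["user_id"])
    return {key: len(u) for key, u in users.items()}

for (src, dst, app), n_users in sorted(access_density(flows).items()):
    print(f"{src} -> {dst} via {app}: {n_users} distinct user(s)")
# Unexpected combinations (e.g. ssh into 'general-ledger') are candidates for policy review.
```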

Step 4: Create Zero Trust Policies Within the Technology. Lessons learned:

– Validate the performance of segmentation gateways, starting with your Next-Generation Firewall (NGFW). The performance of the firewall (the segmentation gateway, i.e., the policy enforcement point) is essential, so buy a device that actually achieves the promised performance.
– Align with existing security and risk frameworks: the corporate policy must align with the technical policies in a workable format. Kipling supports both corporate policy formulation and technical policies; attention points are content inspection, logging, monitoring, and crypto standards. These are dynamic, and some require clear guidance, such as Data Leakage Prevention measures.
– Apply Kipling for granular policy formulation and effectuation. After policy enforcement, monitor only to determine the level of effectiveness. The Kipling method is used in auditing practices [27] and is elaborated in the Authentic Zero Trust Guide established by Kindervag [23].

Step 5: Monitor and Maintain log events from the network. Lessons learned:

– Enrich logging with context: contextualize your logging with Zero Trust metadata that tells you more about the context of an event and its potential impact on your Protect Surface and DAAS (Data, Assets, Applications, and Services). In short, the idea is to separate the environment into segments shaped around the DAAS elements you want to protect. Because we know what is located in each segment, it is helpful to describe this, including the relevant information, adequately. A few examples: What kind of data is located in this segment? Which rules and regulations apply? Who is responsible for this data? Where is the data located? The purpose of this information is to give context to events (content) related to this segment [28] (a minimal sketch follows this list).
– In the legacy environment, start by placing a Segmentation Gateway with basic Zero Trust measures such as anti-DDoS and content inspection; this will immediately improve the security posture of the legacy environment. Recent research at Antwerp Management School confirmed this [5].
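To make the logging-enrichment lesson concrete, the sketch below attaches protect-surface context to a raw event. The catalog structure and all field names (protect_surface, data_classification, owner, and so on) are hypothetical illustrations, not a prescribed schema.

```python
# Minimal sketch: enrich a raw log event with Zero Trust metadata about the
# protect surface (PS) it belongs to. All field names are hypothetical.

PROTECT_SURFACE_CATALOG = {
    "10.20.0.0/24": {
        "protect_surface": "General ledger",
        "daas": {"data": ["financial records"], "applications": ["ERP"],
                 "assets": ["erp-db-01"], "services": ["Active Directory"]},
        "data_classification": "confidential",
        "regulations": ["GDPR"],
        "owner": "Finance department",
    },
}

def enrich_event(event: dict) -> dict:
    """Attach protect-surface context to a raw event so analysts can judge impact."""
    segment = event.get("destination_segment")
    context = PROTECT_SURFACE_CATALOG.get(segment, {})
    return {**event, "zero_trust_context": context}

raw = {"timestamp": "2022-01-15T10:02:11Z", "src_ip": "10.30.0.7",
       "destination_segment": "10.20.0.0/24", "action": "allow"}
print(enrich_event(raw))
```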

5 Important Takeaways

Based on the interview results, we have developed core takeaways that practitioners can use while examining Zero Trust strategies and their operational impact.



5.1 How to Design a Protect Surface

An example of how to define a Zero Trust Protect Surface is displayed in Table 3. The guidance here is:

– The name represents the set of DAAS elements we want to protect, in clear, unambiguous language.
– The characteristics represent the types of data processed by the set of DAAS elements. A good source for this can be the outcome of the Business Impact Analysis or the Data Protection Impact Assessments, where asset owners determine the type of data processed in and out of the protect surface.
– The relevance score represents the Confidentiality, Integrity, and Availability (CIA) of the Protect Surface. Many organizations already use CIA ratings to rate individual assets, but viewing this solely from the asset is an old practice. In Zero Trust, we consider the entire Protect Surface as a set of DAAS elements, and the individual component with the highest rating determines the relevance score of the collective.
– The reasoning is the argumentation we need for a specific protection level for the Protect Surface, which we need for the next step in the 5-step model, "Build the Protect Surface."

To elaborate on what DAAS means, we refer to the definitions in the Best Practices guide Kindervag wrote during his period at Palo Alto Networks [29]:

– Data. What data needs to be protected? Think about intellectual property such as proprietary code or processes, personally identifiable information (PII), payment card information (PCI), and personal health information (PHI) such as Health Insurance Portability and Accountability Act (HIPAA) information.
– Applications. Which applications are critical for your business functions? Which applications consume sensitive information?
– Assets. Which assets are the most sensitive? Depending on your business, these could be SCADA controls, point-of-sale terminals, medical equipment, manufacturing equipment, and groups of critical servers.
– Services. Which services can attackers exploit to disrupt IT operations and negatively impact the business, such as DNS, DHCP, and Active Directory?

Each critical DAAS element is part of a protect surface (or, in some cases, is a protect surface). For example, if your business provides health care, then personal health information (PHI) is critical to your business. The Data is the patient information. The Applications are the ones used to access PHI data, for example, EPIC. The Assets are the servers that store the data and the equipment that generates PHI, such as medical scanners or physicians' workstations. The Services are those used to access the data, such as single sign-on and Active Directory. Below we provide a template example of designing such a protect surface and its associated DAAS elements.



Table 3. Some examples of defining a ZT protect surface for a fictive healthcare organization.

1. Medical data and IT
   Characteristics: highly sensitive medical data; subject to HIPAA and GDPR; high availability need of medical systems and devices; the integrity of medical data is vital.
   Relevance: 75–100% CI44
   Reasoning: these are perceived as the organization's crown jewels; highly sensitive and critical data/IT systems that can severely impact a patient's privacy and/or treatments; only medical professionals should have access to this segment; therefore, it requires the highest level of security measures.

2. Private Workplace
   Characteristics: business-supporting systems and apps (ERP, workplaces, HR, finance, risk); contains personal data of patients and employees, hence subject to GDPR; high availability need to support business processes.
   Relevance: 50–75% CI33
   Reasoning: sensitive and confidential data should be protected; maintaining the integrity of financial data is important from a regulatory point of view; critical business processes must have high availability; therefore, it requires medium to high levels of security measures.

3. IT & Security Operations
   Characteristics: security orchestration (incident response, logging, monitoring, baselining); IT operations architecture.
   Relevance: 50–75% CI33
   Reasoning: especially important from an integrity and availability point of view; should always be up and running, and tampering with logs must be prevented; only IT & Security Operations employees should have access to this segment; therefore, it requires medium to high levels of security measures.

4. Cloud/Partner
   Characteristics: Microsoft 365 IaaS platform container.
   Relevance: 50–75% CI33
   Reasoning: hosts essential business applications and systems that run externally; therefore, it requires medium to high levels of security measures.

5. Public
   Characteristics: guest network.
   Relevance: 25–50% CI22
   Reasoning: less critical and sensitive to the organization; therefore, it requires low levels of security measures; as this network is vulnerable, it should be segmented from other more critical/sensitive networks.
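As an illustration of the template in Table 3, the sketch below captures one protect surface and its DAAS elements as structured data. The schema, class names, and example values are our own assumptions for illustration, not a format prescribed by the five-step model.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DAAS:
    data: List[str] = field(default_factory=list)
    applications: List[str] = field(default_factory=list)
    assets: List[str] = field(default_factory=list)
    services: List[str] = field(default_factory=list)

@dataclass
class ProtectSurface:
    name: str
    characteristics: List[str]
    relevance: str            # e.g. "75-100% CI44"
    reasoning: List[str]
    daas: DAAS

medical = ProtectSurface(
    name="Medical data and IT",
    characteristics=["Highly sensitive medical data", "Subject to HIPAA and GDPR",
                     "High availability need of medical systems and devices"],
    relevance="75-100% CI44",
    reasoning=["Crown jewels of the organization",
               "Only medical professionals should have access",
               "Requires the highest level of security measures"],
    daas=DAAS(data=["patient records (PHI)"], applications=["EPIC"],
              assets=["medical scanners", "physician workstations"],
              services=["single sign-on", "Active Directory"]),
)
print(medical.name, medical.relevance)
```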

5.2 Zero Trust Documentation

When implementing Zero Trust, the situation can become quite complex or get out of control if not properly documented. A consistent documentation structure supports the effective design and implementation of Zero Trust, from the corporate level down to the individual Protect Surface level. In addition, the content of these documents must be comprehensive yet concise.

5.2.1 Corporate Policy Versus Technical Procedures

We distinguish three basic types of documents in Table 4: a corporate policy, a standard, and a technical procedure. Each has a different scope, level of detail, and objective.

Table 4. Types of Zero Trust documents per organizational level

Type of document | Level | Objective | Scope | Must answer
1. Corporate policy | Strategic | Set high-level expectations | Whole organization | Why do we need to do certain things?
2. Standard | Tactical | Set quantifiable requirements or baseline | Protect surface | What needs to be done?
3. Technical procedure | Operational | Describes execution and implementation | Protect surface | How do we do it?



1. A Corporate Policy resides at the strategic level and aims to set high-level expectations for the whole organization. A corporate policy describes the why of certain decisions and directions. It is instigated by the context, such as external legislation, strategic goals, Sustainable Development Goals (SDG), etc. Examples of corporate policies are risk policies or information security policies that can initiate or refer to a standard. Example: To ensure the integrity and confidentiality of our cardholder data, we adhere to the regulatory requirements of the Payment Card Industry (PCI) Data Security Standard (DSS).
2. A Standard or baseline is derived from the corporate policy and assigns quantifiable requirements to a Protect Surface. The standard specifies what needs to be done and instigates a procedure. Example: Restrict inbound and outbound traffic to that which is necessary for the cardholder data environment, and deny all other traffic.
3. A Technical Procedure prescribes how to implement the technology in a specific domain, e.g., a Protect Surface. A procedure focuses on operational implementation aligned with its related standard(s). Example: To prevent unauthorized access to cardholder data (Why), IT Security (Who) implements vendor best practices (What) on each security boundary of the protect surface (Where) and validates weekly (When) via the config validator (How) the device, network, and security configurations (What), e.g., rule setting to meet the PCI-DSS objectives, and reports deviations in the configuration to the CISO.

5.2.2 Introducing Kipling

The Kipling method [30], also known as the 5W + 1H method and inspired by Rudyard Kipling, uses a set of questions to generate new ideas and solve problems. The questions are applied to analyze a situation before starting on a solution. Sometimes an issue is addressed so quickly that not enough attention is paid to all aspects of the problem; it is then possible to choose, too fast, a solution that, in retrospect, was not the right or the best one. The Kipling method forces you to find the correct and encompassing answer. When drawing up an action plan, developing new ideas, or developing a policy or standard, this method helps to make explicit all the essential elements that construct a comprehensive policy and fit the purpose or objective. Kipling is common practice in auditing [27]. Mills stated, "The auditor should always use Kipling's checklist… during this stage of the auditing process" [31]. Mahopo et al. note, "The Kipling method is always recommended for use because it helps to explore the problem by probing the thinking of the problem solver with the questions: what, why, how, who, when and where."



Before we explain the difference between corporate and technical policies, we describe the overarching concept of Zero Trust and the necessity for detailed prescriptions so that strategic objectives can be aligned with operational (technology) settings. In the Zero Trust strategy, conditions for access are formulated in an abstract (i.e., technology-independent) manner via the Kipling method: WHO gets access, to WHAT exactly access is provided, WHEN, (from) WHERE, WHY, and HOW (i.e., under which additional measures or conditions access is granted). The access defined in this way is made concrete by implementing granular and precisely targeted access measures with strong restrictions and enforcement of security measures via so-called policy enforcement points. User identity, membership of (functional) user groups, (geographical or organizational) origin, time, the specific (and enforced) network application, the compliance of the user and his workplace with the established frameworks, and additional substantive validation and checking of network traffic are all part of this.

As an example: it has been established via Kipling that Oracle database administrators may perform administrative queries on several Oracle database systems from a workstation that is set up following security standards and securely connected, that is located within the Benelux region, during an agreed time window, provided that the queries performed are checked and guarded against known exploits. A technical implementation of this could be, for example, in an environment in which the relevant Oracle database systems are hosted in a VMware environment, that a policy is enforced at the level of VMware networking (NSX), using Palo Alto Networks technology and using Application ID, User ID, Content ID, and Host Information Profiling. The relevant policy is then as follows: users in the user group 'Oracle DBA' are given permission every Wednesday between 18:00 and 21:00 to access Oracle database hosts via the 'oracle' network application, from a compliant, properly configured workstation that is either located at an internally earmarked location or connected within the Benelux via client VPN. This 'oracle' network traffic is inspected and alerted on or mitigated when malicious network traffic is detected. More refined implementations are possible; for example, the relevant 'oracle' traffic can be restricted to only a number of internal, highly guarded systems, which in turn are accessible, under the conditions mentioned earlier, only to the select administrator group.

Implementing "need to know" throughout the organization is not only a technical matter but also requires proper role assignment (by Human Resources), including attribute assignment (by Risk Management and the process owner). If and where this has not been done, has not been done sufficiently in the past, or has been compromised under pressure from everyday dynamics, this should be included in the implementation of Zero Trust. The emphatic advice is to do this in small steps (per, say, five protect surfaces). Short iterations provide learning and improvement points for subsequent iterations. Process and asset owners should be closely involved in Zero Trust use-case development, implementation, and evaluation.
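A minimal, vendor-neutral sketch of how the Oracle DBA example above could be expressed as a Kipling-style rule and evaluated at a policy enforcement point. The attribute names and the simplified check are assumptions for illustration; a real PEP (e.g., an NGFW enforcing App ID, User ID, and Content ID) evaluates far richer signals.

```python
from datetime import datetime

# Hypothetical Kipling-style rule: WHO, WHAT, WHEN, WHERE, WHY, HOW, plus an action.
ORACLE_DBA_RULE = {
    "who": {"user_group": "Oracle DBA"},
    "what": {"app_id": "oracle"},
    "when": {"weekday": 2, "start_hour": 18, "end_hour": 21},   # 2 = Wednesday
    "where": {"regions": {"BE", "NL", "LU"}, "workstation_compliant": True},
    "why": "administrative queries on Oracle database systems",
    "how": {"content_inspection": True},    # traffic checked against known exploits
    "action": "allow",
}

def evaluate(rule: dict, request: dict) -> str:
    """Return the rule action if every Kipling condition matches, else deny."""
    now: datetime = request["time"]
    ok = (
        request["user_group"] == rule["who"]["user_group"]
        and request["app_id"] == rule["what"]["app_id"]
        and now.weekday() == rule["when"]["weekday"]
        and rule["when"]["start_hour"] <= now.hour < rule["when"]["end_hour"]
        and request["region"] in rule["where"]["regions"]
        and request["workstation_compliant"] == rule["where"]["workstation_compliant"]
        and request["content_inspected"] == rule["how"]["content_inspection"]
    )
    return rule["action"] if ok else "deny"

request = {"user_group": "Oracle DBA", "app_id": "oracle",
           "time": datetime(2022, 1, 19, 19, 30), "region": "NL",
           "workstation_compliant": True, "content_inspected": True}
print(evaluate(ORACLE_DBA_RULE, request))   # -> "allow"
```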



5.2.3 How to Use Kipling

Explicitly identifying and referencing all of the Kipling questions helps formulate a comprehensive policy, standard, or procedure.

An Example of a Corporate Policy

• Data Classification Criteria: The CISO (who) annually (when) reviews and approves the company's data classification criteria (what) and communicates this to personnel via the information security policy on the internal SharePoint (where). The CISO determines data treatment according to its designated data classification level via standards (how).

An Example of a Tactical Standard

• Logical Access Review: The IT Manager (who) quarterly (when) verifies with the HR department that all registered users still have the appropriate RBAC rights (what) based on the personnel administration (how) to ensure that proper permissions are assigned only to relevant users (why). Additionally, the IT Manager (who) daily (when) receives and reviews automated email alerts on mutations (what) in Active Directory roles and permissions (where).
• Segregation of Duties: The IT Manager (who) bi-annually (when) verifies and updates all business roles and additional roles (what) as defined in the RBAC (where) based on the principle of least privilege (how) to avoid conflicting and over-granted permissions (why). As an additional check, the IT Manager (who) verifies that only authorized personnel (IT Operations) (who) can access the production Kubernetes environment (where) and release new changes (what).
• Penetration Testing: The CISO (who) annually (when) executes a penetration test (what) against critical systems in the production environment (where) using the use case and scope defined together with the IT Manager (how) to prevent abuse or exploitation of system vulnerabilities in the production environment (why).

An Example of a Technical Procedure

As a follow-up to the design of a protect surface, experts suggest defining the technical policy that needs to be enforced in the technology, in this case the Segmentation Gateway, which can be a firewall (a.k.a. Policy Enforcement Point). Zero Trust policies must address who, what, when, where, why, and how. We base this sample, displayed in Table 5, on the materials provided by the National Cybersecurity Center of Excellence (NCCoE), a part of the National Institute of Standards and Technology (NIST) [32]. In May 2019, Zero Trust inventor John Kindervag published his guidance on establishing technical policies via the Kipling method at Palo Alto Networks, using the firewall functionalities, e.g., security measures [33]. The experts acknowledged this method of granular refinement to be very effective and crucial to adopting Zero Trust strategies.



Table 5. Template for policy development (Kipling method)

5W + 1H | Who | What | When | Where | Why | How |
Function | User ID | App ID | Time | System object | Classification | Content ID | Action
Characteristic | Sales | Salesforce | Working hours | US/NL | Toxic | SFDC_CID | Allow

5.2.4 Research Contribution

The cybersecurity technology and services market exhibits many lemon-market and FUD-selling traits. Zero Trust has become a dominant theme in this market, notably since Joe Biden's Executive Order and EU directives that enforce or strongly advise a Zero Trust strategy for government agencies and other critical infrastructure. The necessity of demystifying and breaking down the hype surrounding Zero Trust seems evident. The outcome of this paper offers security professionals and decision-makers:

• a status quo of the problems with Zero Trust,
• solutions to these problems, from both academia and practitioners, and
• important lessons learned from experts that practitioners can take into account.

This article addresses essential Zero Trust learnings by detailing:

1. Lessons that put forward basic principles, such as using a taxonomy for Zero Trust regardless of the situation.
2. The use of a Zero Trust framework that distinguishes the organizational focal points of Strategy, Management, and Operations, which can function as a frame of reference for security, risk, and compliance professionals and supervisory bodies.
3. A five-step model to start the design and implementation of the Zero Trust journey and a prescriptive model to design the first Protect Surface. This template can be found in Table 3. It offers a practical approach to design the first five Protect Surfaces and determine the underlying Zero Trust measures that organizations need to implement. The complete list of measures is displayed in a previous publication from 2020 [14] and covers measures like traffic inspection, authentication, encryption, etc.

6 Conclusions

We conclude that the lessons learned from experts in the field provide practical guidance on designing and implementing Zero Trust in a vendor-agnostic way, without having to invest in new technology, since much of the technology already in place offers Zero Trust functionality. Since sellers tend to focus on closing new deals, this list offers a conversation starter with sellers on utilizing the existing installed technology base.



The lessons learned also serve as a conversation starter for buyers that want to contract parties to implement Zero Trust, thereby focusing not only on the gold that glitters but also on the pitfalls and failures that must be taken into account. Section 5 offers tangible guidelines for aligning frameworks (such as NIST, ISF, ISO, NIS, and PCI DSS) and policies with tactical standards and operational procedures. The guidance on applying the Kipling method to documentation and technology offers immediate control and a reduction of cyberthreat manifestation, thereby directly addressing the problem at hand. Kipling offers the precision needed for rule automation in Policy Enforcement Points (PEPs), thereby limiting lateral movement, unauthorized access, and data leakage. The guidance on how to define a Protect Surface helps practitioners overcome the "information asymmetry" and start immediately, irrespective of their technology vendor. Future research should aim to validate the success of the five-step model, the Readiness Assessment approach, and the use of automated response and reporting technology, and hence continue to share lessons learned with practitioners (buyers) to understand and balance out the information asymmetry and improve cybersecurity postures.

References

1. Bobbert, Y., Chtepen, M., Kumar, T., Vanderbeken, Y., Verslegers, D.: Strategic Approaches to Digital Platform Security Assurance. IGI Global, Hershey (2021)
2. McCarthy, M.A.: A compliance aware software defined infrastructure. In: Proceedings of the IEEE International Conference on Services Computing, pp. 560–567 (2014)
3. Bobbert, Y.: Defining a research method for engineering a business information security artefact. In: Proceedings of the Enterprise Engineering Working Conference (EEWC) Forum, Antwerp (2017)
4. Hilton, M., Nelson, N.: Trade-offs in continuous integration: assurance, security, and flexibility. In: Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering (2017)
5. ITGI: Information Risks; Who's Business are They?. IT Governance Institute, United States (2005)
6. Bobbert, Y.: Improving the Maturity of Business Information Security; On the Design and Engineering of a Business Information Security Artefact. Radboud University, Nijmegen (2018)
7. Kluge, D., Sambasivam, S.: Formal information security standards in German medium enterprises. In: CONISAR, Phoenix (2008)
8. Workman, M., Bommer, W., Straub, D.: Security lapses and the omission of information security measures: a threat control model and empirical test. Comput. Hum. Behav. 24(6), 2799–2816 (2008)
9. Lebek, B., Uffen, J., Neumann, M., Hohler, B., Breitner, M.: Information security awareness and behavior: a theory-based literature review. Manag. Res. Rev. 12(37), 1049–1092 (2014)
10. Yaokumah, W., Brown, S.: An empirical examination of the relationship between information security/business strategic alignment and information security governance. J. Bus. Syst. Gov. Ethics 2(9), 50–65 (2014)
11. Flores, W., Antonsen, E., Ekstedt, M.: Information security knowledge sharing in organizations: investigating the effect of behavioral information security governance and national culture. Comput. Secur. 43, 90–110 (2014)
12. Pfeffer, J., Sutton, R.: The Knowing-Doing Gap: How Smart Companies Turn Knowledge into Action. Harvard Business School Press (2001)



13. Al-Ruwaii, B., De Moura, G.: Why the time has come to embrace the zero-trust model of cybersecurity, 27 October 2021. https://www.weforum.org/agenda/2021/10/why-the-time-has-come-for-the-zero-trust-model-of-cybersecurity/. Accessed 15 Jan 2022
14. Bobbert, Y., Scheerder, J.: Zero trust validation: from practical approaches to theory. Sci. J. Res. Rev. 2(5) (2020). https://doi.org/10.33552/SJRR.2020.02.000546
15. Stuart, H.: Zero trust architecture design principles, November 2019. https://www.ncsc.gov.uk/blog-post/zero-trust-architecture-design-principles
16. Kindervag, J.: No more chewy centers: introducing the zero trust model of information security. Forrester Research (2010). https://www.forrester.com/report/No-More-Chewy-Centers-The-Zero-Trust-Model-Of-Information-Security/RES56682
17. Newman, L.: Wired. https://www.wired.com/story/what-is-zero-trust/. Accessed 19 Nov 2021
18. WhiteHouse: Executive Order on Improving the Nation's Cybersecurity, Washington, United States (2021). https://www.whitehouse.gov/briefing-room/presidential-actions/2021/05/12/executive-order-on-improving-the-nations-cybersecurity/
19. Kindervag, J.: Build Security into Your Network's DNA: The Zero Trust Network Architecture. Forrester Research (2010). https://www.forrester.com/report/build-security-into-your-networks-dna-the-zero-trust-network-architecture/RES57047
20. Akerlof, G.: The market for 'lemons': quality uncertainty and the market mechanism. Q. J. Econ. 84(3), 488–500 (1970). https://doi.org/10.2307/1879431
21. Butterhoff, M., Bobbert, Y.: Is digital security a market for lemons? (2021). https://12ways.net/blogs/digital-security-in-2025-when-the-novelty-wears-off-and-budget-pressure-remains/
22. Poundstone, W.: Prisoner's Dilemma. 1st Anchor Books, New York (1993). ISBN 0-385-41580-X
23. Kindervag, J.: Authentic zero trust guide; zero trust in practice, explained in one concise guide. ON2IT Zero Trust Innovators (2021). https://on2it.net/en/downloads/authentic-zero-trust-guide/
24. Maas, R.: ON2IT Zero Trust Innovators, April 2019. https://on2it.net/en/network-segmentation-vs-zero-trust/. Accessed 19 Nov 2021
25. Maas, R.: ON2IT (2020). https://on2it.net/en/broken-dmz-cybersecurity-model/. Accessed 19 Nov 2021
26. Bobbert, Y., Scheerder, J.: On the design and engineering of a zero trust security artefact. In: Future of Information and Communication Conference (FICC), Vancouver (2021). https://link.springer.com/content/pdf/bfm%3A978-3-030-73100-7%2F1.pdf
27. Mills, D.: Working methods. Qual. Audit. 122–142 (1993). https://doi.org/10.1007/978-94-011-0697-9_10
28. Maas, R.: Context is key: the data challenge of cybersecurity, 9 April 2021. https://on2it.nl/en/context-is-key-the-data-challenge-of-cybersecurity/. Accessed 19 Nov 2021
29. Kindervag, J. (PANW): Best Practices Implementing Zero Trust with Palo Alto Networks. Palo Alto Networks (2021). https://docs.paloaltonetworks.com/content/dam/techdocs/en_US/pdf/best-practices/9-1/zero-trust-best-practices/zero-trust-best-practices.pdf
30. Kipling, R.: Just So Stories. Doubleday, Page (1902)
31. Mahopo, B., Abdullah, H., Mujinga, M.: A formal qualitative risk management approach for IT security. Inf. Secur. South Africa (ISSA), 1–8 (2015). https://doi.org/10.1109/ISSA.2015.7335053
32. Pincever, N.: Zero trust architecture technical exchange meeting, November 2019. https://www.nccoe.nist.gov/sites/default/files/legacy-files/7_palo_alto.pdf. Accessed 19 Nov 2021
33. Kindervag, J.: All layers are not created equal; how the principles of journalism help define zero trust policy, May 2019. https://www.paloaltonetworks.com/blog/2019/05/network-layers-not-created-equal/. Accessed 19 Nov 2021

Bit Error Rate Analysis of Pre-formed ReRAM-based PUF

Saloni Jain1(B), Taylor Wilson1, Sareh Assiri1,2, and Bertrand Cambou1

1 Northern Arizona University, Flagstaff, AZ 86011, USA
{sj779,tjb389,Bertrand.Cambou}@nau.edu
https://in.nau.edu/cybersecurity
2 Computer Science and Information Systems Department, Jazan University, Al Maarefah Rd, Jazan, Saudi Arabia
[email protected]

Abstract. Various Resistive Random Access Memory (ReRAM) devices have been used to generate cryptographic keys. The physical characteristics exploited are often related to the forming of conductive filaments, as well as the programming of cells. In this paper, key generation methods based on pre-formed ReRAM cells are analyzed. An evaluation of the bit error rate (BER) of cryptographic keys is conducted by analyzing physically unclonable function arrays that have been exposed to changes such as temperature drifts, aging, and other factors. Understanding and utilizing this data for security requires insight into the behavior of the physical elements under varying temperatures and currents. In order to guarantee maximum data security by leveraging cryptographic key generation with these methods, we must ensure that keys have low error rates, which is only possible by producing stronger keys. We report experimental data showing conditions in which the bit error rates are as low as 10^-6.

Keywords: Resistive Random Access Memory (ReRAM) · Physical Unclonable Function (PUF) · Cryptographic key generation · Bit error rate · Unlimited digital fingerprints · Fuzzy cells · Challenge Response Pairs (CRPs)

1 Introduction

The Bit Error Rate (BER) is the ratio of the number of incorrectly received bits to the total number of bits transmitted in a secret key; by performing BER analysis we assess the reliability of the generated keys. For key generation purposes, we exploit the resistances of pre-formed Resistive Random Access Memory (ReRAM) at ultra-low currents [14]. Operating at currents on the scale of nano-amperes, the ReRAM technology does not retain any information, and due to manufacturing defects each cell is inherently different from the others, thus making the array unclonable. A methodology is presented in this paper for generating secret keys based on a ternary public key infrastructure [3,13].



The parameters responsible for errors in a Physically Unclonable Function (PUF) are repetitive use of the PUF, existing erratic cells, unfavorable effects such as system noise and aging, bits flipping from an initial state to the opposite state, and environmental factors like temperature; together these can prevent the authentication of a cryptographic key. Therefore, quantification and understanding are essential for generating reliable keys. In this study, we evaluate the error rates for various keys generated at 100 nA, 400 nA, and 800 nA in a 1024-cell population, with each cell being read for 50 consecutive cycles per current. Ideally, the intra-cell variations should be 0%, but due to the unwanted parameters mentioned above, the PUF output shows some variation. The higher the intra-cell variations, the higher the BER. We have collected data, i.e., the resistance in the pre-formed state of the ReRAM, at 0 °C, 23 °C, and 80 °C. The main objective of this paper is to calculate the BER and generate cryptographic keys at 23 °C for three different currents. Initially, the raw 256-bit key sampled from the 1024-cell population will have high BERs, but a buffer is applied to the 256-bit key to reduce them. Introducing a buffer allows more cells that are far from the threshold, also known as stable cells, to be randomly selected to generate strong keys. Our final step is to generate a million keys per current at 23 °C, including the buffer, to determine the average BERs and the minimum sample size needed to generate keys with low errors.

This paper is structured in the following order. Section 2 outlines the background information on Physical Unclonable Functions (PUFs), ternary-based PUF cryptosystems, the ReRAM technology, and ReRAM-based PUFs. In Sect. 3, we present the design of the pre-formed ReRAM-based PUF, and BER benchmarks of various PUFs are provided. The fourth section describes the experimental setup, the buffer sizes, and the method used to compute BERs. The fifth section reports the experimental results. Finally, Sect. 6 concludes the paper and describes future work.

2 Background Information

2.1 Physically Unclonable Functions (PUFs)

PUFs exploit uncontrollable process variations to derive chip-specific fingerprints, which are used to enhance security in a cost-effective way. Each device is unique and distinguishes itself from others based on factors introduced during fabrication, like critical dimensions, doping levels of semiconductors, and threshold voltages [30,32]. Despite the best efforts of the manufacturer, no identical objects can be created, since random and uncontrollable forces are involved [22]. Unlike traditional security primitives, where digital secrets are stored in nonvolatile memory (e.g., EEPROM and FLASH) with an additional layer of protection, PUF architectures do not store secrets in memory. The quality of a PUF is determined by its entropy, randomness, BER, and inter- and intra-device Hamming distances [12]. There are different types of PUFs [6], namely optical PUFs [32], ring-oscillator PUFs [19,20], arbiter PUFs [29], coating PUFs [33], and memory-based PUFs [26].



Our main focus is on memory-based PUFs such as Static RAM (SRAM), Resistive RAM (ReRAM), Dynamic RAM (DRAM), Magnetic RAM (MRAM), and Flash memory, of which we elaborate in this paper on ReRAM. PUFs rely on a black-box Challenge-Response Pair (CRP) configuration, where the challenge is the input and the response is a random output in the form of a binary digit. Challenges can change the internal combinations of variables, affecting the responses, but the reverse is not true, thus making the PUF impossible to clone [2,4,12]. This allows PUFs to provide services such as key generation and system authentication, and to serve as True Random Number Generators (TRNGs).

2.2 Existing Key Generation Protocols Based on Ternary Implementation

Cryptographic keys are essential in both encryption and decryption processes. Several methods are used to generate keys, including random number generators, hash functions, ternary computations, and PUFs [13,31]. Ternary computations are implemented when the initial PUF is read iteratively a certain number of times in a secure environment. Based on these readings, we can identify and analyze stable cells and distinguish them logically as 0 or 1, whereas unstable or fuzzy cells are designated by a third, ternary state, 'X', which makes it possible to avoid them during key generation, thus reducing the bit error rate [7,8]. The Ternary Addressable Public Key Infrastructure (TAPKI) protocol [3,8,13] generates cryptographic keys using ternary computation, allowing them to be shared among parties without having to send them over the network [8]. Despite using such advanced techniques, PUFs can generate errors as they are subject to environmental drifts, aging, and electromagnetic interference. To resolve such errors, error correction techniques have been used.

Fig. 1. Example of Error Correction Code (ECC) with PUF based architecture



Figure 1 illustrates the PUF-based architecture with error correction code (ECC). The PUF image is read into a secure environment by the server and saved as a lookup table that represents its cells in three different states (0, 1, X). In the figure, the server generates a key K using the lookup table, whereas the client tries to regenerate key K' using the physical PUF and instructions delivered by the server during the handshake. As PUFs are affected by certain factors, the keys generated from the lookup table may differ from the ones generated at the client end, so helper data is generated using the lookup table and an ECC engine at the server. The client uses the helper data and an ECC extractor, such as a fuzzy extractor, to recover the original key K and validate the user [24]. The disadvantage of this method is that it creates a burden on the client device, thus reducing the computation speed. To reduce this burden, a similar protocol was presented, called the Response Based Cryptography (RBC) protocol [8], which is used to extract accurate keys from erroneous keys. As part of this technique, the server and client generate keys K and K' respectively and encrypt a user name (ID) using the Advanced Encryption Standard (AES) to generate the ciphertexts E(ID;K) and E(ID;K'). The server uses the RBC engine to check whether the ciphertexts match. If not, it iteratively generates all possible combinations of keys until it obtains the same ciphertext as the client, thus revealing K' [10].
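A minimal sketch of the RBC idea described above: the verifier searches the neighbourhood of its own key K, flipping a small number of bits, until the resulting ciphertext matches the one received from the client. To stay self-contained we stand in for the AES encryption E(ID;K) with a SHA3-256 tag over (ID, key); the actual protocol in [8,10] uses AES as stated.

```python
import hashlib
from itertools import combinations

def tag(user_id: bytes, key: bytes) -> bytes:
    # Stand-in for E(ID; K): in the actual RBC protocol this is an AES encryption.
    return hashlib.sha3_256(user_id + key).digest()

def rbc_search(user_id: bytes, server_key: bytes, client_tag: bytes, max_flips: int = 2):
    """Search keys within Hamming distance `max_flips` of server_key for a tag match."""
    n_bits = len(server_key) * 8
    for t in range(max_flips + 1):
        for positions in combinations(range(n_bits), t):
            candidate = bytearray(server_key)
            for p in positions:
                candidate[p // 8] ^= 1 << (p % 8)        # flip bit p
            if tag(user_id, bytes(candidate)) == client_tag:
                return bytes(candidate)                   # recovered K'
    return None

server_key = bytes(32)                    # toy 256-bit key K held by the server
client_key = bytearray(server_key)
client_key[5] ^= 0b00000100               # client key K' differs by one bit error
recovered = rbc_search(b"alice", server_key, tag(b"alice", bytes(client_key)))
print(recovered == bytes(client_key))     # True
```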

Fig. 2. Ternary Addressable Public Key Infrastructure (TAPKI) protocol used to generate cryptographic keys using SRAM PUFs [3, 8, 13]

Public Key Infrastructure (PKI) [1] describes a model where the server and the client each have two keys: a public and a private key. These keys are used for the encryption and decryption of the secret message. Generating the private key from a memory-based PUF boosts the security and makes it difficult for intruders to attack the system. Other ways to improve security are generating true random numbers using the PUF and noise injection.



Two examples of key generation methods based on memory-based PUFs are explained in this section, for SRAM-based and ReRAM-based PUFs. Figure 2 explains the cryptographic key generation protocol using SRAM. The PUF is read multiple times to identify the fuzzy cells at the server end, which are marked and saved in the lookup table. A random number (RN) is generated and XORed with a user-generated or randomly generated password, and the output is hashed using SHA-3. The output of the hash (A), also known as the message digest, determines the cell addresses in the PUF. The server holds the image of the PUF, i.e., the lookup table, whereas the client is in possession of the hardware device. A binary string is constructed by reading each address at the server side; it consists of stable and fuzzy/unstable cells and is therefore referred to as a ternary key. From the ternary key, 128 stable cells with 0's and 128 stable cells with 1's are used to generate 256-bit keys [8,13,15]. In Fig. 2, the mask (M) is generated from the ternary key, where 0 represents the locations of the addresses that are used in the formation of the cryptographic key and 1 represents the other, fuzzy or unused, cells. M is further XORed with the message digest A to get S, which is sent to the client as part of the handshake. This is done to enhance the security and keep the mask hidden from intruders. The client can XOR S with the message digest A to retrieve the mask (M). RN and S are appended to form the handshake that is sent from the server to the client. After reverse engineering the entire process at the client end, the 256-bit cryptographic key is regenerated. It is possible that there are errors in the response, since the client uses a physical PUF. The keys are validated using an error correction technique, i.e., the RBC (Response Based Cryptography) engine, to determine whether the user is genuine [8] (a short sketch of this handshake arithmetic is given at the end of this subsection).

In this paper, we elaborate on the pre-formed ReRAM PUF, as this memory operates extremely fast at low power, making it hard to hack [9,28,39]. Previous work with the ReRAM PUF has been proposed in [11]. To implement the key generation schemes, enrollment of the PUF is done in a secure environment. A threshold value is selected, and the cells close to the threshold value can flip to either side of it during cycle generation. If the ratio of cells close to the threshold is too high, this can result in 5–20% CRP matching errors. This can be resolved by creating a range around the threshold in which the cells are marked as 'X', while the cells outside the range are solid 0 or 1, as shown in Fig. 3. Built-In Self-Test (BIST) modules can be used to sort all the marginal cells and mark them as 'X'. The solid 0's and 1's are then used in the key generation process. To generate PUF challenges, the memories are segmented into pairs of rows. The first row of each pair is the active row, and the second row is the companion row, which stores the complementary information about the ternary states. When a solid 0 is found in the active row, 0 is programmed in the cell and 1 is programmed in the companion row. Similarly, when a solid 1 is found in the active row, 1 is programmed in the cell and 0 is programmed in the companion row. All other bits are fuzzy bits, so the bits in the active and companion rows are identical, e.g., '0,0' or '1,1'.



Fig. 3. How 0's, 1's, and X's are programmed in a memory. The X-axis represents the physical parameter of the memory devices, which is subject to manufacturing variations, together with the threshold. The Y-axis shows the percentage of occurrence of the physical parameter in the memory device.

The ternary PUF challenge is downloaded onto the secure server for authentication, and the responses are read after powering up the device. To complete the authentication process, the positions of the 'X' cells in the memory are extracted and used to generate the ternary challenge. The response generated by the PUF and the challenge generated by the server are compared. If they match, authentication is successfully completed; otherwise, ECC techniques are used to eliminate the errors [27].
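Returning to the SRAM-based TAPKI protocol of Fig. 2, the sketch below shows only the handshake arithmetic (RN XOR P hashed with SHA3-512 into the message digest A, and the mask M hidden as S = M XOR A and recovered by the client). Lengths and variable names are ours; the full protocol also involves the PUF lookup table and RBC validation.

```python
import hashlib, secrets

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# Server side: derive message digest A from random number RN and password P.
rn = secrets.token_bytes(64)                              # 512-bit random number
password = hashlib.sha3_512(b"user password").digest()    # user password hashed to 512 bits
digest_a = hashlib.sha3_512(xor(rn, password)).digest()   # A -> determines PUF addresses

mask = secrets.token_bytes(64)        # M: marks fuzzy/unused cells (toy value here)
s = xor(mask, digest_a)               # S sent to the client together with RN

# Client side: recompute A from RN and its own password, then recover M.
digest_a_client = hashlib.sha3_512(xor(rn, password)).digest()
recovered_mask = xor(s, digest_a_client)
print(recovered_mask == mask)          # True
```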

2.3 Resistive RAM (ReRAM) Technology

Fig. 4. RS process. (a) MIM structure. (b) Oxidation of the active material and migration of cations. (c) CF formed, "ON". (d) CF dissolution by an applied negative voltage, "OFF". Repeat (b)–(d) to turn the device ON/OFF.



ReRAM is an emerging nonvolatile memory with unique properties such as low operating voltages and high scalability, and it is CMOS compatible [5,38]. In Electrochemical Metallization Memory (ECM)-type ReRAM, the cell design has three layers: 1) an electrochemically active top electrode (TE), 2) an intermediate insulating material, and 3) a passive bottom electrode (BE). The metal-insulator-metal (MIM) structure is depicted in Fig. 4(a). Unlike existing memories that hold a charge to retain data, i.e., SRAM, Dynamic Random Access Memory (DRAM), and FLASH, ReRAM relies on HIGH and LOW resistive states to encode binary logic. Depending on the polarity applied across the cell, a resistive switching (RS) event can occur, turning the cell either off or on. Applying a positive voltage to the TE allows the active electrode's material to diffuse into the insulating layer and nucleate upward. This creates a metallic conductive filament (CF) between the two electrodes, a process referred to as a SET operation. Creating a CF results in a Low Resistive State (LRS, or "ON"), whereas applying a negative voltage across the cell ruptures the CF, which is referred to as a RESET operation and reverts the cell to a High Resistive State (HRS, or "OFF"). For example, let us consider Cu/TiO2/Pt cells [36]. ReRAM is initially in a pristine state, and before switching cycles an initial step called electroforming must be executed, which is done by applying a high positive voltage that allows ions to break into the insulating material. Subject to an applied positive voltage on the Cu TE, the Cu gets oxidized, and Cu+ cations are generated and deposited into the TiO2 layer. The grounded Pt BE attracts the cations, which are reduced to Cu atoms and accumulate until the CF is formed (LRS). Next, subject to a negative voltage, the CF dissolves and the cell transitions back to the HRS. The switching process is depicted in Fig. 4.

2.4 Background on ReRAM-Based PUFs

ReRAM-based PUFs [11,25,34,35] exploit stochastic resistances that are either generated from the probabilistic switching of the devices or extracted from the inherent variations after SET/RESET processes. Recently, the ReRAM technology has been facing poor uniformity issues [16,18]. This affects the switching voltages (Vset and Vreset) as well as the resistance levels in the HRS and LRS, producing unwanted cycle-to-cycle (C2C) and device-to-device (D2D) variability. High C2C variation makes the resistances flaky, and D2D degradation within the cells brings the HRS read margin closer to the LRS range; PUFs built on these methods are therefore challenging. The difficulty in controlling the CF is believed to be the reason for these unwanted effects. To bypass these constraints, various PUF architectures have been proposed, such as ReRAM combined with an arbiter [21] or a ring oscillator [17]. These combined architectures allow more control when generating CRPs. In the next section, we introduce a new methodology for ReRAM-based PUFs.


3 ReRAM PUFs Operating at Low Power

3.1 Pre-Formed ReRAM PUF

The PUF design operates on the un-formed/pre-formed state of ReRAM and requires no changes to the technology. To generate responses, the cells are read at small currents, on the scale of nano-amperes, and return unique resistances in the range of 0.1 MΩ to 10 MΩ, giving excellent cell-to-cell variations. The design does not rely on the RS of the devices and therefore alleviates the ongoing challenges of ReRAM and the difficulty of controlling the CF. Figure 5 displays an example of the variations in the number of cells and their responses generated at 100 nA. The population's median resistance is calculated and can be used to define a binary threshold (TH), where cells above the TH are considered "1" and cells below the TH are considered "0", giving a 50% chance of selecting either state when generating keys. The resistance of each cell is independent of the others; some factors contributing to this effect are the type of materials, impurities, device dimensions, cell location, etc. Notice the responses either lying on the TH or residing near it; if selected during key generation cycles, these cells can introduce bit errors into a key. To reduce potential bit errors, a ternary state is implemented with a lower and upper boundary of size α = 0.05, and cells within this range are not selected during key generation cycles (they are masked). In this study, the boundary is constant across all generated responses, and the BER calculations are subject to it.

Fig. 5. Resistance variations of 1024 cells generated from 100 nA at room temperature, with the TH and ternary boundaries provided.



For future work, we could increase the ternary range (e.g., α = 0.10) and calculate BERs relative to that range. We hypothesize that with an increased ternary range, i.e., masking more cells near the TH, BERs will drastically decrease, thus producing error-free keys.
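A minimal sketch of the enrollment logic described above: the threshold TH is the median resistance of the population, and cells falling within the boundary around TH are assigned the ternary state 'X'. Interpreting α = 0.05 as a relative band around the median is our assumption; the paper does not spell out the exact boundary computation.

```python
import statistics

def classify_cells(resistances, alpha=0.05):
    """Return TH and per-cell states: '1' above TH, '0' below TH, 'X' in the fuzzy band."""
    th = statistics.median(resistances)
    low, high = th * (1 - alpha), th * (1 + alpha)   # assumption: relative band around TH
    states = []
    for r in resistances:
        if low <= r <= high:
            states.append("X")          # fuzzy cell, masked during key generation
        elif r > th:
            states.append("1")
        else:
            states.append("0")
    return th, states

th, states = classify_cells([0.4e6, 1.1e6, 2.3e6, 2.35e6, 5.0e6, 9.8e6])
print(th, states)   # e.g. 2325000.0 ['0', '0', 'X', 'X', '1', '1']
```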

Fig. 6. Various devices undergoing current sweeps giving unpredictable behavior.

For the cryptographic protocol, each cell is read at 15 increasing currents from 100 nA to 800 nA in 50 nA increments (100 µs per current), therefore generating 15 responses per cell. We observed that the cells can withstand higher current sweeps, up to 50 µA, proving robust; however, to reduce drifting effects and localized Joule heating, and to keep the energy consumption at a minimum, we operate the PUFs at or below 800 nA. At room temperature, the minimum energy needed to read each cell is 1.68 pJ on average: E = V · I · t, where V is the voltage, I the current, and t the read time. The response information is downloaded into the server database and is called the "Image of the PUF." The Image of the PUF is continuously referred to in future key generation and authentication cycles to match CRPs. Figure 6 displays multiple devices being read at increasing currents with their resistances decreasing, which is normal for this technology. Observe how each device drifts differently and responds differently at each current, which makes the responses difficult to predict. This effect provides more entropy to the PUF design, as keys generated at 100 nA will be different from keys generated at greater currents. The scheme hides keys in a simple manner and allows quasi-infinite possible CRPs to be generated by reading the cells at different currents.

Reliability: Reliability can be quantified by computing the intra-cell variation, which is defined as how much deviation occurs within multiple responses generated from the same challenge(s) [22].



A PUF design with low intra-cell variations can be classified as highly reliable due to its low response variations, and it requires minimal error correction. Equation (1) below is used to calculate the intra-cell variation and was applied to all responses. Each cell was read for 50 consecutive cycles at all currents, but only the minimum and maximum operating currents are presented. Figure 7 shows a frequency plot of the intra-cell variations computed for the three temperatures used in this study (i.e., 0 °C, 23 °C, and 80 °C) and for 100 nA and 800 nA. The cells prove highly reliable, or reproducible, as most responses give variations of less than 1%. The larger variations at 100 nA relative to 800 nA could be due to the difficulty of accurately measuring resistances in the MΩ range and the greater noise at low currents. The variation appears to grow tighter at higher currents.

intra-cell variation (%) = (STDEV(r50) / median(r50)) × 100    (1)

where r50 denotes the 50 consecutive resistance reads of a cell.

Fig. 7. Intra-cell variations computed for various temperatures and currents.
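Equation (1) can be computed directly from the 50 reads of a cell; the sketch below uses made-up resistance values purely for illustration.

```python
import statistics

def intra_cell_variation(readings):
    """Eq. (1): standard deviation of the reads divided by their median, in percent."""
    return statistics.stdev(readings) / statistics.median(readings) * 100

# Fifty hypothetical resistance reads (Ohms) of one pre-formed cell at 100 nA.
reads = [2.30e6 + 1.5e3 * ((i * 37) % 11 - 5) for i in range(50)]
print(f"{intra_cell_variation(reads):.3f} %")
```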


3.2 Cryptographic Key Generation Protocol Using Pre-Formed ReRAM

Fig. 8. Key generation protocol using pre-formed ReRAM for BER analysis.

Figure 5 provides a clear picture of the fuzzy area around the TH resistance. The TH resistance is computed as the median of all the cell resistances of a ReRAM. By comparing the resistance value of each cell with the TH resistance, we can read the values of the PUF and save them as a binary lookup table. This binary lookup table is then used in the protocol at the server side to obtain an error-free cryptographic key. The BER is calculated by generating cryptographic keys as shown in Fig. 8; the steps are as follows (a simplified sketch of these steps is given after this list):

1. A 512-bit random number (RN) and a password (P) are generated, XORed, and hashed using SHA3-512 to generate a 512-bit message digest (A). The password used for the protocol can be either random or user-supplied. A user-supplied password is converted to 512 bits by using the same hashing method.
2. The message digest (A) and the length of the output message digest (OMD) are the inputs to SHAKE256. The length of the OMD is the sum of the bytes required for the addresses of a 256-bit key and the buffer size, i.e., OMD = 256 * 2 + buffer size. A secret key of 256 bits requires 512 bytes, i.e., 256 two-byte addresses, where the X and Y bytes of each address are combined to produce an index value. The probability of reading a fuzzy cell location is high, thus resulting in an error in the private key. To avoid these errors, we increase the size of the buffer so as to read a few additional addresses, from which only 256 stable values will be chosen for key generation. In other words, if the buffer size = 0, the OMD would require 256 * 2, i.e., 512 bytes (OMD = 512 + 0), to read 256 index values. Additionally, if the buffer size = 20, we would need an OMD of 532 bytes (OMD = 512 + 20) in order to read 266 addresses and choose stable cells from the ReRAM PUF to generate a 256-bit key.
3. In this protocol, two pre-formed ReRAM PUFs are used. The OMD received from SHAKE256 contains the addresses of the cells in the ReRAM PUF and determines which of the PUFs computes the TH resistance; the other reads the resistance values of the cells at the addresses computed from the OMD. One PUF is used for computing the TH resistance, while the other PUF is used to read the index values for the cryptographic key. The resulting entropy corresponds to selecting 256 cells out of 4096 on one PUF and one cell out of 4096 on the second PUF, because from one PUF we determine the 256-bit key and from the second PUF the TH resistance is read from one cell.
4. In the OMD, two bytes are considered as one address. X is the first 8 bits of the address and Y is the next 8 bits, both of which are used to calculate the index value. The resistance at the index value is read and compared with the TH resistance.
5. Our next step is to determine whether the resistance value is below or above the TH and to set the bit accordingly. If the resistance value is greater than the TH, we set the bit to 1, and if it is less than the TH, we set the bit to 0. If the resistance lies in the fuzzy area, we represent it with the ternary state 'X'.
6. To generate a 256-bit key, we choose the 128 lowest and the 128 highest resistances from the set. If the buffer is zero, we must choose the lowest or highest 128 resistances out of only 128 candidates each, but when the buffer is 20, we select the lowest and highest 128 resistances while avoiding the ones in the fuzzy area, as seen in the second graph of Fig. 9, thus reducing the BER; if the fuzzy area is not defined, as in the first graph of Fig. 9, the BER will be higher.
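A simplified sketch of steps 1–6, assuming the PUF image is available as a list of 4096 resistances. A single image is used here (its median serves as TH), so the two-PUF split of step 3 is omitted; the address decoding (two bytes per address mapped to a cell index) and the relative fuzzy band are our reading of the protocol, not the exact implementation.

```python
import hashlib, secrets, statistics

def generate_key(puf_resistances, th, buffer_size=20, alpha=0.05, key_bits=256):
    """Derive a 256-bit key from a pre-formed ReRAM PUF image (steps 1-6, simplified)."""
    # Step 1: RN XOR password, hashed with SHA3-512 into message digest A.
    rn, pwd = secrets.token_bytes(64), hashlib.sha3_512(b"password").digest()
    a = hashlib.sha3_512(bytes(x ^ y for x, y in zip(rn, pwd))).digest()

    # Step 2: SHAKE256 output of length OMD = key_bits*2 + buffer bytes, 2 bytes per address.
    omd_len = key_bits * 2 + buffer_size
    omd = hashlib.shake_256(a).digest(omd_len)

    # Steps 3-4: decode (X, Y) byte pairs into cell indices and read their resistances.
    n = len(puf_resistances)
    indices = [(omd[i] << 8 | omd[i + 1]) % n for i in range(0, omd_len - 1, 2)]
    low, high = th * (1 - alpha), th * (1 + alpha)

    # Steps 5-6: skip fuzzy cells, take the 128 lowest and 128 highest stable resistances.
    stable = sorted((puf_resistances[i], i) for i in indices
                    if not (low <= puf_resistances[i] <= high))
    if len(stable) < key_bits:
        raise ValueError("buffer too small: increase buffer_size")
    chosen = stable[:key_bits // 2] + stable[-(key_bits // 2):]
    return "".join("1" if r > th else "0" for r, _ in chosen)

image = [secrets.randbelow(9_900_000) + 100_000 for _ in range(4096)]  # toy PUF image
key = generate_key(image, th=statistics.median(image), buffer_size=70)
print(len(key), key[:32])
```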

3.3 Benchmarking Various PUF BERs

BERs for various PUF architectures have been reported in the literature, and Fig. 10 provides BER benchmarks comparing different PUF technologies. These include the legacy SRAM PUF [23], ReRAM with an arbiter [21], and ReRAM exploiting the resistance variations after the SET process (Vset-based) [11,37]. These architectures implement, in one form or another, different methods to reduce BERs. For example, [37] studied BERs with a fixed threshold and found that recalculating an optimized threshold yielded significantly lower BERs relative to the fixed threshold. The authors of [11] implement a ternary state like the one implemented in our study. To reduce bit flips, their design avoids generating responses from 1.5 to 2.5 V and only uses Vset values of 1 V and 3 V to generate the 0 and 1 states, respectively. In this study, we apply the same approach but exclude resistances in a fixed range.



Fig. 9. Experimental data with and without a buffer.

Fig. 10. BERs reported for SRAM and ReRAM PUFs with different PUF designs.


4 Methodology

4.1 Experimental Setup

Pristine Al/AlOx/W cells with a device size of 180 nm were used in this study. A brief processing description follows: the BE ("W", tungsten) was deposited by sputtering, the switching layer (SL) ("AlOx", aluminum oxide) was deposited by atomic layer deposition (ALD), and the TE ("Al", aluminum) was deposited by reactive sputtering. At the wafer level, the devices were characterized with Keysight Technologies' B1500A Semiconductor Device Analyzer (SDA). Electrical tests at 0 °C and 80 °C were made possible by Micromanipulator's temperature controller, and tests at 23 °C were performed at room temperature. The B1500A SDA allows electrical tests to source either voltage or current; for the purposes of this study, a current source is used. The devices are read by multiple user-defined current sweeps for 50 consecutive cycles to explore intra-cell variations. The read pulse widths are 100 µs per current. The output is the resistance, defined as the ratio of the voltage to the current and measured in Ohms (symbol: Ω).

4.2 Computing the Error Rate of Each Cell of the ReRAM

The error rate (ER) for each cell is calculated by iteratively reading the cell 50 times, where each value is compared with the TH resistance. Suppose 10 out of 50 values lie above the TH and 40 below it; we consider the smaller group of values when computing the ER. In this case, the cell is likely to have resistance readings below the TH, but due to factors affecting the ReRAM we notice a few values above the TH, so the error rate is calculated using the values that are above the TH. The ER for this cell is therefore 10/50 * 100 = 20%. Whether the small group of readings is higher or lower than the TH resistance, it must be close to or in the fuzzy area for the cell to be considered fuzzy, as illustrated in Fig. 5. Using the steps below, the number of fuzzy cells can be calculated (a minimal sketch follows this list):

1. A counter variable is created to track the resistance readings of a cell below the TH resistance. The initial value of the counter is zero, and it is reset to zero every time the ER for the next cell is calculated.
2. Every time the resistance of the cell is less than the TH resistance and close to the fuzzy area, we increment the counter by 1. Cells whose resistance values are considerably off from the TH, i.e., much greater or much less than the TH, are considered good/stable.
3. A counter value of 0 or 50 indicates that there have been no bit errors. In contrast, a counter value between 1 and 49 indicates a high probability of an erratic cell.
4. Finally, we determine whether the value of the counter is greater than or less than half the number of reads, i.e., 50/2 = 25. Counters smaller than 25 are used directly as input for the ER calculation; otherwise, the counter is subtracted from the total number of reads, and (50 - counter) is used to calculate the ER of the cell.
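The counting procedure can be sketched as follows; for brevity the check of whether the minority reads actually lie in the fuzzy band is omitted, so this reproduces only the 20% example from the text.

```python
def cell_error_rate(reads, th):
    """Error rate of one cell: size of the minority side of TH relative to all reads."""
    below = sum(1 for r in reads if r < th)      # steps 1-2: counter of reads below TH
    minority = min(below, len(reads) - below)    # step 3: 0 or 50 means no bit errors
    return minority / len(reads) * 100           # step 4: minority fraction in percent

reads = [2.2e6] * 40 + [2.4e6] * 10              # example from the text: 10 of 50 reads flip
print(cell_error_rate(reads, th=2.3e6))          # 20.0
```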



Fig. 11. Successive resistance values of three cells, each read 50 times. Two of the cells are good, while the third is fuzzy; the blue line represents the median resistance of the PUF.

Figure 11 helps to fully comprehend the steps mentioned above; it illustrates the distinction between a stable and a fuzzy cell. Blue indicates the median resistance of all cells, i.e., the threshold resistance. The grey and yellow lines each correspond to the resistance values of a stable cell, since all their values lie above and below the TH value, respectively. The binary value assigned to the cell represented by the grey line is 1, as its resistance exceeds the TH value, whereas if the cell represented by the yellow line is selected, the binary value is 0, since its resistance is below the TH value. The orange line shows the resistance values of a fuzzy cell: some values are above the TH and some below, and moreover they lie in the fuzzy area, which is marked by the black lines above and below the TH resistance. We calculate the ER by counting the number of reads below the TH and checking whether the reads lie in the fuzzy area. Once all the data has been processed, we can generate a lookup table and an error rate file. In our study, we created lookup tables separately for the different currents of 100 nA, 400 nA, and 800 nA at the different temperatures. Using the lookup table, we can determine the locations of the fuzzy cells, which can be avoided while generating cryptographic keys, thus reducing the BER.

4.3 Compute BER with Different Buffer Size

The data sets used for this analysis are different combinations of currents and temperatures of the pre-formed ReRAM PUF. The BER experiment proceeds in the following steps (a sketch follows the equation below):

1. We generate a million keys using the protocol explained in Sect. 3.2 and the lookup tables generated in Sect. 4.2 for 100 nA, 400 nA, and 800 nA at 23 °C. For each of these data sets, the OMD value keeps increasing.
2. The TH resistance of each data set is calculated and used for computing keys.
3. With a buffer size of 0, 256 addresses are read and their resistance values are compared with the TH resistance to form a binary stream of 256 bits. These keys are likely to have high error rates because no filtering of good or fuzzy cells can be done to avoid errors.
4. As the buffer size increases, the value of the OMD also increases, resulting in more addresses being read and compared with the TH resistance. The additional addresses allow us to choose 256 good cells for private key generation, resulting in a low error rate.
5. For each data set and each OMD length, a million keys are generated and the average BER is calculated.

Based on our methodology, we expect the BER to decrease as the buffer size increases. Considering the buffer size (BS) and the bit error rate (BER), we can state that BS is inversely proportional to BER:

BS ∝ 1/BER    (2)
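The sketch below illustrates steps 3-5 under the same assumptions as before: `omd_window` is the list of addresses selected by the OMD for one key, `fresh_read` is a hypothetical callable returning a new resistance reading for an address, and `lookup`/`th` come from the table built in Sect. 4.2. It illustrates the filtering idea, not the exact protocol of Sect. 3.2.

```python
def generate_key(omd_window, fresh_read, th, lookup, key_len=256):
    """Build a key_len-bit key from the OMD window, preferring cells that the
    lookup table does not flag as fuzzy (extra addresses come from the buffer)."""
    good = [a for a in omd_window if not lookup[a]["fuzzy"]]
    fuzzy = [a for a in omd_window if lookup[a]["fuzzy"]]
    chosen = (good + fuzzy)[:key_len]            # with buffer 0, every cell is used, good or not
    bits = [1 if fresh_read(a) > th else 0 for a in chosen]
    return chosen, bits


def average_ber(omd_windows, fresh_read, th, lookup, key_len=256):
    """Average bit error rate over many generated keys (e.g. one million per data set)."""
    errors = 0
    for window in omd_windows:
        addrs, bits = generate_key(window, fresh_read, th, lookup, key_len)
        enrolled = [lookup[a]["bit"] for a in addrs]          # reference bits from enrollment
        errors += sum(b != e for b, e in zip(bits, enrolled))
    return errors / (len(omd_windows) * key_len)
```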

Below are the experimental results that demonstrate our methods are effective.

Fig. 12. Predicting buffer size for different key lengths at room temperature with 800 nA current

5 Experimental Results

The data collected at 23 °C with three input currents (100 nA, 400 nA, and 800 nA) is used for our analysis. For each data set, a million keys are generated, and the errors of all keys are averaged to compute the BER. At room temperature, when 100 nA is injected into the pre-formed/pristine ReRAM PUF and the buffer size is set to 0, the BER is at its highest; the BER decreases as the buffer size increases. In Fig. 13, we observe that a digest length of 291 (256 + 35), i.e., OMD = (256 * 2) + (35 * 2) = 582 with a total buffer size of 70, yields a BER of 10^-6; in other words, one key in a million has an error. The BER results observed with 400 nA and 800 nA input currents at room temperature are 10^-6 and 10^-5 with digest lengths of 292 and 293, respectively. This data can be used to predict the digest length for 128-, 256-, and 512-bit keys when 100 nA, 400 nA, and 800 nA are injected into the pre-formed ReRAM, as shown in Fig. 12.
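The relation between digest length, OMD length, and buffer size in the example above can be written out as follows (assuming, as stated in the text, that the OMD covers both halves of the exchange, so every quantity is doubled):

```python
key_len, extra_addresses = 256, 35          # the Fig. 13 example at 100 nA and 23 degrees C
digest_len = key_len + extra_addresses      # 256 + 35 = 291 addresses per half
omd_len = 2 * digest_len                    # (256 * 2) + (35 * 2) = 582
total_buffer = 2 * extra_addresses          # 70 extra reads in total
print(digest_len, omd_len, total_buffer)    # 291 582 70
```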

Fig. 13. Bit error analysis at room temperature with three different currents: 100 nA, 400 nA, and 800 nA. The X-axis represents log10 BER and the Y-axis shows the digest length, i.e., the number of addresses read (256 + buffer size) to generate keys.

6 Conclusions and Future Work

Based on the experimental results presented in this paper, we demonstrated methods yielding extremely low BERs, in the 10^-6 range, which is low enough for key generation when combined with light error-correcting schemes, meeting the main objective of the paper. We were able to choose stable cells whose resistances are far from the median by performing a double-verification analysis of the erratic bits during data collection and by filtering out those erratic bits during key generation. The number of fuzzy cells depends on the temperature and current, so even with small buffer sizes we can generate error-free keys for particular combinations of temperature and current, saving significant computational time. Our major goal is to generate stronger cryptographic keys with low BER. For our future research we will therefore investigate the following topics: (a) developing key generation protocols that use both PUFs and achieve a higher entropy of $\binom{4096}{256}$ (selecting 256 cells out of 4,096); the higher the entropy, the greater the randomness and the stronger the keys; (b) the randomness of the cryptographic keys; (c) the BER as the buffer size and the fuzzy area increase; (d) the behaviour of pre-formed ReRAM and its BER under diverse environmental conditions; (e) the BER over a larger range of temperatures and currents. Further analysis of these temperatures will be conducted in the future with more data and with multiple currents at different temperatures. Our results will help us understand the quality of the keys generated using the pre-formed ReRAM PUF and improve our algorithm to reduce the computational time of key generation.

Acknowledgments. The authors thank several graduate students at the cyber-security lab at Northern Arizona University for their contributions, in particular Ian Burke, Jack Austin Garrard, Michael Partridge, Christopher Philabaum, and Brit Morgan Riggs.


Correction to: Perspectives from 50+ Years’ Practical Zero Trust Experience and Learnings on Buyer Expectations and Industry Promises Yuri Bobbert, Jeroen Scheerder, and Tim Timmermans

Correction to: Chapter “Perspectives from 50+ Years’ Practical Zero Trust Experience and Learnings on Buyer Expectations and Industry Promises” in: K. Arai (Ed.): Intelligent Computing, LNNS 508, https://doi.org/10.1007/978-3-031-10467-1_53

The original version of the chapter was inadvertently published with incorrect reference number 21 and its citation, which has now been corrected. The chapter and the book have been updated with the change.

The updated original version of this chapter can be found at https://doi.org/10.1007/978-3-031-10467-1_53 © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 K. Arai (Ed.): SAI 2022, LNNS 508, p. C1, 2023. https://doi.org/10.1007/978-3-031-10467-1_55

Author Index

A Aguma, J. Ceasar, 20 Ajgaonkar, Aditya, 470 Alami, Sara, 143 Almutairi, Abdulrahman, 693 Alshihri, Saad, 506 Alshoshan, Abdullah I., 693 Álvarez-González, Ricardo, 1 Anindita, Aurpa, 316 Annarelli, Alessandro, 710 Apiola, Mikko-Ville, 238 Assiri, Sareh, 602, 825, 882 Ayuyang, Dorothy M., 682 B Bellaj, Badr, 483 Beloff, Natalia, 388, 409 Berteanu, Mihai, 225 Bertin, Emmanuel, 483 Bhattacharya, Rituparna, 388, 409 Bobbert, Yuri, 847, 864 Booher, Duane, 602 Bowers, David S., 121 Bowker, Jack, 630 Broekx, Ronny, 225 Bullen, Christopher, 193 Burke, Ian, 532 C Cambou, Bertrand, 532, 602, 825, 882 Chen, Yong, 742 Chroboci´nski, Kuba, 214 Chu, Joanna Ting Wai, 193 Ciobanu, Ileana, 225

Colque, Milagros Vega, 331 Contreras, Gustavo A. Vanegas, 279 Crespi, Noel, 483 E Easttom, Chuck, 550 Essig, Kai, 776 F Fonticoli, Lavinia Foscolo, 710 G Ghafarian, Ahmad, 581 González-Campos, Edgar R., 1 Greiff, Paul, 347 H Hakkarainen, Kai, 238 Hanoune, Mostafa, 143 Hasan, Mahady, 316 Heynssens, Julie, 532 Hoppe, Uwe, 347 Hou, Feng, 193 Hu, Anlei, 742 Huang, Ching-Chun, 429 Humpire, Javier Apaza, 331 Hutton, William, 735 I Iliescu, Alina, 225 Iskandarani, Mahmoud Zaki, 95 Islam, Mahmudul, 316 Islam, Noushin, 316


J Jain, Saloni, 882 Jo, Hyeon, 756, 789 Jo, Sung-Hyun, 532 Johora, Fatema Tuj, 316 Jung, Hee Soo, 756, 789 K Keskin, Deniz, 581 Kim, Sang Hyun, 756, 789 Korhonen, Tiina, 238 Krause, André Frank, 776 Kritzinger, E., 617 Kunkel, Julian, 34, 67 L Lautenbach, G., 617 Levinskas, Matas, 511 Li, Yinan, 660 Liao, Shih-Wei, 429 Lipponen, Sofia, 238 Liu, Fang, 660 López, Miguel A. Acuña, 279 M Ma, Zhizhong, 193 Marin, Andreea, 225 Marina, Boronenko, 294 Medina, Heberto Ferreira, 279 Mezrioui, Abdellatif, 483 Miandaob, Dina Ghanai, 602 Migeon, Jean-Hugues, 847 Mihalkovich, Aleksejus, 511 Miller, Adam L., 359 Mohanraj, Renuka, 804 Moina, Miguel Tupayachi, 331 Mozaffari, M. Hamed, 204 N Neelavathy Pari, S, 646 Nieto-Chaupis, Huber, 169 Nonino, Fabio, 710 O Oksana, Isaeva, 294 Ophoff, Jacques, 630 Ouaddah, Aafaf, 483 Ozioko, Ekene Frank, 34, 67 P Padmanabhan, Jayashree, 646 Palombi, Giulia, 710 Pan, Lanlan, 742 Park, Sooyong, 506 Park, Woo Young, 756, 789

Patel, Dhiren, 470 Petrov, Milen, 159 Pilkington, C., 258 Piórkowska, Katarzyna, 214 Priya, Vishnu, 646 Q Qiu, Ruonan, 742 Qiu, Yuanhang, 193 Quiroz-Hernández, Nicolás, 1 R Raghani, Anuj, 470 Rahimi, Ilan Daniels, 379 Rajakumar, Balaji Rajaguru, 646 Ramírez, María E. Benítez, 279 Reinken, Carla, 347 Riggs, Brit, 532 Rios, Manuel Aguilar, 825 S Sai Ganesh, S, 646 Sakalauskas, Eligijus, 511 Sánchez-Gálvez, Alba M., 1 Scheerder, Jeroen, 864 Seitamaa, Aino, 238 Serrano, J. Artur, 225 Shanbhag, Sanket, 470 Sheth, Bhavya, 470 Shukla, Dyuwan, 470 Singh, Satwinder, 193 Song, Chang Han, 756, 789 Sorysz, Danuta, 179 Sorysz, Joanna, 179 Stahl, Fredric, 34, 67 Sulla-Torres, José, 331 Sumuano, Jesús L. Soto, 279 Surya Siddharthan, S, 646 T Tay, Li-Lin, 204 Timmermans, Tim, 864 V Val Danilov, Igor, 305 Valdez, José L. Cendejas, 279 van Biljon, J., 258 van der Merwe, R., 258 Vidal, Maria Guerra, 331 Vu, Duy-Son, 756, 789 W Wang, Anyu, 742 Wang, Ruili, 193

Wang, Zan-Jun, 429 White, Martin, 388, 409 Wilson, Taylor, 532, 882 Witarski, Wojciech, 214 Wylde, Allison, 837

Y Yang, Minghui, 742 Yosifov, Georgi, 159 Yuan, Zih-shiuan Spin, 429 Yuri, Boronenko, 294

Z Ziegler, Gabriela, 448