English Pages XII, 761 [773] Year 2021
Advances in Intelligent Systems and Computing 1252
Kohei Arai Supriya Kapoor Rahul Bhatia Editors
Intelligent Systems and Applications Proceedings of the 2020 Intelligent Systems Conference (IntelliSys) Volume 3
Advances in Intelligent Systems and Computing Volume 1252
Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

Advisory Editors
Nikhil R. Pal, Indian Statistical Institute, Kolkata, India
Rafael Bello Perez, Faculty of Mathematics, Physics and Computing, Universidad Central de Las Villas, Santa Clara, Cuba
Emilio S. Corchado, University of Salamanca, Salamanca, Spain
Hani Hagras, School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK
László T. Kóczy, Department of Automation, Széchenyi István University, Gyor, Hungary
Vladik Kreinovich, Department of Computer Science, University of Texas at El Paso, El Paso, TX, USA
Chin-Teng Lin, Department of Electrical Engineering, National Chiao Tung University, Hsinchu, Taiwan
Jie Lu, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW, Australia
Patricia Melin, Graduate Program of Computer Science, Tijuana Institute of Technology, Tijuana, Mexico
Nadia Nedjah, Department of Electronics Engineering, University of Rio de Janeiro, Rio de Janeiro, Brazil
Ngoc Thanh Nguyen, Faculty of Computer Science and Management, Wrocław University of Technology, Wrocław, Poland
Jun Wang, Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong
The series “Advances in Intelligent Systems and Computing” contains publications on theory, applications, and design methods of Intelligent Systems and Intelligent Computing. Virtually all disciplines such as engineering, natural sciences, computer and information science, ICT, economics, business, e-commerce, environment, healthcare, life science are covered. The list of topics spans all the areas of modern intelligent systems and computing such as: computational intelligence, soft computing including neural networks, fuzzy systems, evolutionary computing and the fusion of these paradigms, social intelligence, ambient intelligence, computational neuroscience, artificial life, virtual worlds and society, cognitive science and systems, Perception and Vision, DNA and immune based systems, self-organizing and adaptive systems, e-Learning and teaching, human-centered and human-centric computing, recommender systems, intelligent control, robotics and mechatronics including human-machine teaming, knowledge-based paradigms, learning paradigms, machine ethics, intelligent data analysis, knowledge management, intelligent agents, intelligent decision making and support, intelligent network security, trust management, interactive entertainment, Web intelligence and multimedia. The publications within “Advances in Intelligent Systems and Computing” are primarily proceedings of important conferences, symposia and congresses. They cover significant recent developments in the field, both of a foundational and applicable character. An important characteristic feature of the series is the short publication time and world-wide distribution. This permits a rapid and broad dissemination of research results. ** Indexing: The books of this series are submitted to ISI Proceedings, EI-Compendex, DBLP, SCOPUS, Google Scholar and Springerlink **
More information about this series at http://www.springer.com/series/11156
Editors Kohei Arai Saga University Saga, Japan
Supriya Kapoor The Science and Information (SAI) Organization Bradford, West Yorkshire, UK
Rahul Bhatia The Science and Information (SAI) Organization Bradford, West Yorkshire, UK
ISSN 2194-5357 ISSN 2194-5365 (electronic) Advances in Intelligent Systems and Computing ISBN 978-3-030-55189-6 ISBN 978-3-030-55190-2 (eBook) https://doi.org/10.1007/978-3-030-55190-2 © Springer Nature Switzerland AG 2021 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Editor’s Preface
This book contains the scientific contributions included in the program of the Intelligent Systems Conference (IntelliSys) 2020, which was held during September 3–4, 2020, as a virtual conference. The Intelligent Systems Conference is a prestigious annual conference on areas of intelligent systems and artificial intelligence and their applications to the real world. This conference not only presented state-of-the-art methods and valuable experience from researchers in the related research areas, but also provided the audience with a vision of further development in the fields. We have gathered a multi-disciplinary group of contributions from both research and practice to discuss how intelligent systems are today architected, modeled, constructed, tested and applied in various domains. The aim was to further increase the body of knowledge in this specific area by providing a forum to exchange ideas and discuss results. The program committee of IntelliSys 2020 represented 25 countries, and authors submitted 545 papers from 50+ countries. This certainly attests to the widespread, international importance of the theme of the conference. Each paper was reviewed on the basis of originality, novelty and rigor. After the reviews, 214 papers were accepted for presentation, of which 177 are published in the proceedings. The conference would truly not function without the contributions and support received from authors, participants, keynote speakers, program committee members, session chairs, organizing committee members, steering committee members and others in their various roles. Their valuable support, suggestions, dedicated commitment and hard work have made IntelliSys 2020 successful. We warmly thank and greatly appreciate the contributions, and we kindly invite all to continue to contribute to future IntelliSys conferences.
It has been a great honor to serve as the General Chair for IntelliSys 2020 and to work with the conference team. We believe this event will certainly help further disseminate new ideas and inspire more international collaborations.
Kind Regards,
Kohei Arai
Conference Chair
Contents
Multimodal Affective Computing Based on Weighted Linear Fusion
Ke Jin, Yiming Wang, and Cheng Wu

Sociophysics Approach of Simulation of Mass Media Effects in Society Using New Opinion Dynamics
Akira Ishii and Nozomi Okano

ORTIA: An Algorithm to Improve Quality of Experience in HTTP Adaptive Bitrate Streaming Sessions
Usman Sharif, Adnan N. Qureshi, and Seemal Afza

Methods and Means for Analyzing Heat-Loss in Buildings for Increasing Their Energy Efficiency
Veneta Yosifova

The Energy Conservation and Consumption in Wireless Sensor Networks Based on Energy Efficiency Clustering Routing Protocol
Gaudence Stanslaus Tesha and Muhamed Amanul

Intelligent Control of Traffic Flows Under Conditions of Incomplete Information
Elena Sofronova and Askhat Diveev

Virtual Dog
José Luis Pastrana-Brincones

Context-Aware Transfer of Task-Based IoT Service Settings
Michael Zipperle, Achim Karduck, and In-Young Ko

Agent-Based Architectural Models of Supply Chain Management in Digital Ecosystems
Alexander Suleykin and Natalya Bakhtadze
A Deep Insight into Signature Verification Using Deep Neural Network
Umair Muneer Butt, Fatima Masood, Zaib unnisa, Shahid Razzaq, Zaheer Dar, Samreen Azhar, Irfan Abbas, and Munib Ahmad

Measures to Ensure the Reliability of the Functioning of Information Systems in Respect to State and Critically Important Information Systems
Askar Boranbayev, Seilkhan Boranbayev, and Askar Nurbekov

IoTManager: Concerns-Based SDN Management Framework for IoT Networks
Radwa Hamed, Mohamed Rizk, and Bassem Mokhtar

JomImage: Weight Control with Mobile SnapFudo
Viva Vivilyana, P. S. JosephNg, A. S. Shibghatullah, and H. C. Eaw

Smart Assist System for Driver Safety
Etee Kawna Roy and Shubhalaxmi Kher

On the Applicability of 2D Local Binary Patterns for Identifying Electrical Appliances in Non-intrusive Load Monitoring
Yassine Himeur, Abdullah Alsalemi, Faycal Bensaali, Abbes Amira, Christos Sardianos, Iraklis Varlamis, and George Dimitrakopoulos

Management of Compressed Air to Reduce Energy Consumption Using Intelligent Systems
Mohamad Thabet, David Sanders, Malik Haddad, Nils Bausch, Giles Tewkesbury, Victor Becarra, Tom Barker, and Jake Piner

Multi-platform Mission Planning Based on Distributed Planning Method
Yang Guo and Shao-chi Cheng

Development of Artificial Intelligence Based Module to Industrial Network Protection System
Filip Holik, Petr Dolezel, Jan Merta, and Dominik Stursa

Learning and Cognition in Financial Markets: A Paradigm Shift for Agent-Based Models
Johann Lussange, Alexis Belianin, Sacha Bourgeois-Gironde, and Boris Gutkin

Agent-Based Simulation for Testing Vehicle-On-Demand Services in Rural Areas
Marius Becherer and Achim Karduck

Overcrowding Detection Based on Crowd-Gathering Pattern Model
Liu Bai, Chen Wu, and Yiming Wang
Multi-person Spatial Interaction in a Large Immersive Display Using Smartphones as Touchpads
Gyanendra Sharma and Richard J. Radke

Predicting Vehicle Passenger Stress Based on Sensory Measurements
Dario Niermann and Andreas Lüdtke

Infinite Mixtures of Gaussian Process Experts with Latent Variables and its Application to Terminal Location Estimation from Multiple-Sensor Values
Ryo Hanafusa, Jiro Ebara, and Takeshi Okadome

Flying Sensor Network Optimization Using Bee Intelligence for Internet of Things
Abdu Salam, Qaisar Javaid, Gohar Ali, Fahad Ahmad, Masood Ahmad, and Ishtiaq Wahid

A Quantum Model for Decision Support in a Sensor Network
Shahram Payandeh

Design and Implementation of a Flexible Platform for Remote Monitoring of Environmental Variables
Francisco de Izaguirre, Maite Gil, Marco Rolón, Nicolás Pérez, and Pablo Monzón

Adaptable Embedding Algorithm to Secure Stream Data in the Wireless Sensor Network
Mohammad Amanul Islam

Intelligent Monitoring System of Environmental Biovariables in Poultry Farms
Gabriela Chiluisa-Velasco, Johana Lagla-Quinaluisa, David Rivas-Lalaleo, and Marcelo Alvarez-Veintimilla

Effect of Analysis Window and Feature Selection on Classification of Hand Movements Using EMG Signal
Asad Ullah, Sarwan Ali, Imdadullah Khan, Muhammad Asad Khan, and Safiullah Faizullah

A Convolutional Neural Network Approach for Quantification of Tremor Severity in Neurological Movement Disorders
Rajesh Ranjan, Braj Bhushan, Marimuthu Palaniswami, and Alok Verma

Brain MR Imaging Segmentation Using Convolutional Auto Encoder Network for PET Attenuation Correction
Imene Mecheter, Abbes Amira, Maysam Abbod, and Habib Zaidi
Colorimetric Analysis of Images Based on Objective Color Data
Valery V. Bakutkin, Ilya V. Bakutkin, Yuriy N. Zayko, and Vladimir A. Zelenov

A Deep Learning-Based Approach for the Classification of Gait Dynamics in Subjects with a Neurodegenerative Disease
Giovanni Paragliola and Antonio Coronato

Smartphone-Based Diabetic Retinopathy Severity Classification Using Convolution Neural Networks
Sarah Sheikh and Uvais Qidwai

Alzheimer Disease Prediction Model Based on Decision Fusion of CNN-BiLSTM Deep Neural Networks
Shaker El-Sappagh, Tamer Abuhmed, and Kyung Sup Kwak

On Mistakes We Made in Prior Computational Psychiatry Data Driven Approach Projects and How They Jeopardize Translation of Those Findings in Clinical Practice
Milena Čukić, Dragoljub Pokrajac, and Viktoria Lopez

Machine Learning Strategies to Distinguish Oral Cancer from Periodontitis Using Salivary Metabolites
Eden Romm, Jeremy Li, Valentina L. Kouznetsova, and Igor F. Tsigelny

Smart Guide System for Blind People by Means of Stereoscopic Vision
Jesús Jaime Moreno Escobar, Oswaldo Morales Matamoros, Ricardo Tejeida Padilla, Jhonatan Castañón Martínez, and Mario Mendieta López

An IoMT System for Healthcare Emergency Scenarios
Tomás Jerónimo, Bruno Silva, and Nuno Pombo

Introducing Time-Delays to Analyze Driver Reaction Times When Using a Powered Wheelchair
David Sanders, Malik Haddad, Martin Langner, Peter Omoarebun, John Chiverton, Mohamed Hassan, Shikun Zhou, and Boriana Vatchova

Intelligent Control and HCI for a Powered Wheelchair Using a Simple Expert System and Ultrasonic Sensors
David Sanders, Malik Haddad, Peter Omoarebun, Favour Ikwan, John Chiverton, Shikun Zhou, Ian Rogers, and Boriana Vatchova

Intelligent System to Analyze Data About Powered Wheelchair Drivers
Malik Haddad, David Sanders, Martin Langner, Mohamad Thabet, Peter Omoarebun, Alexander Gegov, Nils Bausch, and Khaled Giasin
Intelligent Control of the Steering for a Powered Wheelchair Using a Microcomputer
Malik Haddad, David Sanders, Martin Langner, Nils Bausch, Mohamad Thabet, Alexander Gegov, Giles Tewkesbury, and Favour Ikwan

Intelligent Risk Prediction of Storage Tank Leakage Using an Ishikawa Diagram with Probability and Impact Analysis
Favour Ikwan, David Sanders, Malik Haddad, Mohamed Hassan, Peter Omoarebun, Mohamad Thabet, Giles Tewkesbury, and Branislav Vuksanovic

Use of the Analytical Hierarchy Process to Determine the Steering Direction for a Powered Wheelchair
Malik Haddad, David Sanders, Mohamad Thabet, Alexander Gegov, Favour Ikwan, Peter Omoarebun, Giles Tewkesbury, and Mohamed Hassan

Methodology of Displaying Surveillance Area of CCTV Camera on the Map for Immediate Response in Border Defense Military System
Hyungheon Kim, Taewoo Kim, and Youngkyun Cha

Detecting Control Flow Similarities Using Machine Learning Techniques
André Schäfer and Wolfram Amme

Key to Artificial Intelligence (AI)
Bernhard Heiden and Bianca Tonino-Heiden

US Traffic Sign Recognition Using CNNs
W. Shannon Brown, Kaushik Roy, and Xiaohong Yuan

Grasping Unknown Objects Using Convolutional Neural Networks
Pranav Krishna Prasad, Benjamin Staehle, Igor Chernov, and Wolfgang Ertel

A Proposed Technology IoT Based Ecosystem for Tackling the Marine Beach Litter Problem
Stavros T. Ponis

Machine Learning Algorithms for Preventing IoT Cybersecurity Attacks
Steve Chesney, Kaushik Roy, and Sajad Khorsandroo

Development of Web-Based Management System and Dataset for Radiology-Common Data Model (R-CDM) and Its Clinical Application in Liver Cirrhosis
SeungJin Kim, Chang-Won Jeong, Tae-Hoon Kim, ChungSub Lee, Si-Hyeong Noh, Ji Eon Kim, and Kwon-Ha Yoon
Shared Autonomy in Web-Based Human Robot Interaction
Yug Ajmera and Arshad Javed

tanh Neurons Are Bayesian Decision Makers
Christian Bauckhage, Rafet Sifa, and Dirk Hecker

Solving Jigsaw Puzzles Using Variational Autoencoders
Mostafa Korashy, Islam A. T. F. Taj-Eddin, Mahmoud Elsaadany, and Shoukry I. Shams

Followers of School Shooting Online Communities in Russia: Age, Gender, Anonymity and Regulations
Anastasia Peshkovskaya, Yuliya Mundrievskaya, Galina Serbina, Valeria Matsuta, Vyacheslav Goiko, and Artem Feshchenko

Discrimination of Chronic Liver Disease in Non-contrast CT Images using CNN-Deep Learning
Tae-Hoon Kim, Si-Hyeong Noh, Chang-Won Jeong, ChungSub Lee, Ji Eon Kim, SeungJin Kim, and Kwon-Ha Yoon

Analysis and Classification of Urinary Stones Using Deep Learning Algorithm: A Clinical Application of Radiology-Common Data Model (R-CDM) Data Set
Si-Hyeong Noh, SeungJin Kim, Ji Eon Kim, Chung-Sub Lee, Seng Chan You, Tae-Hoon Kim, Yun Oh Lee, Ilseok Chae, Rae Woong Park, Sung Bin Park, Kwon-Ha Yoon, and Chang-Won Jeong

Intelligent Monitoring Using Hazard Identification Technique and Multi-sensor Data Fusion for Crude Distillation Column
Peter Omoarebun, David Sanders, Favour Ikwan, Mohamed Hassan, Malik Haddad, Mohamad Thabet, Jake Piner, and Amjad Shah

Factors Affecting the Organizational Readiness to Design Autonomous Machine Systems: Towards an Evaluation Framework
Valtteri Vuorimaa, Eetu Heikkilä, Hannu Karvonen, Kari Koskinen, and Jouko Laitinen

RADAR: Fast Approximate Reverse Rank Queries
Sourav Dutta

Author Index
Multimodal Affective Computing Based on Weighted Linear Fusion
Ke Jin, Yiming Wang, and Cheng Wu
School of Urban Rail Transportation, Soochow University, Suzhou, China
Abstract. Affective computing is one of the most important research directions in human-computer interaction and is gaining increasing popularity. However, traditional affective computing methods all make decisions based on unimodal signals, which leads to low accuracy and poor feasibility. In this article, the final classification decision is made from a multimodal fusion result that combines the decisions of a text emotion network and a visual emotion network through a weighted linear fusion algorithm. A speaker's intention can be better understood by observing the speaker's expression, listening to the speaker's tone of voice and analyzing the words. Combining auditory, visual, semantic and other modes provides more information than a single mode. Video information often contains a variety of modal characteristics, and a single mode is not enough to describe all aspects of the overall video stream.
Keywords: Semantic affective analysis · Facial expression analysis · Human-computer interaction · Weighted linear fusion
1 Introduction
With the development of social networks, people have increasingly diverse ways to express their emotions on social platforms, such as pictures, videos and text. How to recognize emotions in multimodal data (paper, voice, image, text, and sensor data) is therefore both an opportunity and a challenge for the field of emotion analysis. Previous affective computing work focused on analyzing data obtained in a single mode, but results obtained this way are incomplete. Multimodal data contains more information than unimodal data [1], and multiple modes can complement each other, which helps an AI system understand the user's emotion better. From the perspective of human-computer interaction, multimodal affective computing lets machines interact with people in a more natural way. An AI system can interpret the user's emotions based on the expressions and gestures of the people in images, the tone of voice, and the natural language in text. To sum up, the development of multimodal affective computing technology stems from the needs of real life and makes machines understand emotions more accurately.
© Springer Nature Switzerland AG 2021 K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 1–12, 2021. https://doi.org/10.1007/978-3-030-55190-2_1
Because traditional emotion recognition problems are evaluated on standard libraries in controlled environments, the results of single-task, single-perspective models cannot truly reflect the actual emotional state of the people tested. Therefore, we need to learn a joint model from data of different modes through multi-perspective and multi-task methods, so as to improve the accuracy of emotion state estimation and meet the needs of practical engineering. It is therefore necessary to create a new database that integrates a variety of emotional factors according to the demands of real engineering practice. The algorithm must also be highly efficient and easy to implement. Besides the way emotion is understood, the timeliness of the system is important. Most algorithms are proposed on standard databases (CK+, FER2013, voice databases), have poor real-time performance and are difficult to apply in a real project. The databases themselves also have shortcomings, such as incompleteness and small data volume [2]. The evaluation of an algorithm usually considers only accuracy on the standard database and ignores timeliness, feasibility and practicability in actual engineering applications. Although an algorithm can achieve high accuracy on the standard database, its efficiency and accuracy in practical applications are still not high. The rest of the paper is organized as follows. Section 2 describes related work in affective computing and summarizes the shortcomings and advantages of the mainstream research directions. Section 3 describes the overall architecture and algorithm. Section 4 describes the experimental environment and shows the simulation results and curves. Section 5 concludes the paper.
2 Related Work
Due to the importance of affective computing, affective research has received wide attention from researchers in facial expression analysis, speech identification, and natural language processing (NLP). A lot of research has been carried out on classifying and fusing information from different modes, and great progress has been made. In 1971, Ekman and Friesen suggested that human emotions can be divided into six kinds: happiness, sadness, anger, disgust, surprise and fear [3]. Furthermore, FACS [4] was proposed from the perspective of anatomy. Since human beings express emotions in a variety of ways, analysis based on a single mode is incomplete. In fact, human behaviors in multimodal affective computing are mutually complementary and indispensable [5]. Currently, the main signal modes of affective computing are visual, audio, NLP and physiological signals.
2.1 Emotion Analysis Based on Visual Signal
Vision is the main way human beings perceive the outside world. In terms of emotion recognition, facial expression is undoubtedly one of the most important emotion patterns. The algorithmic steps of facial expression recognition are face detection and pre-processing, expression feature extraction and expression feature classification [5]. In the field of face detection, traditional and typical methods are based
on the VJ detector, which relies on AdaBoost and Haar features [6], the face detector that relies on LBP features [7], and the DPM detector [8], which relies on HOG features [9]. These face detectors perform well in controlled experimental environments, but their performance degrades in outdoor scenes in practice. The main reason is that manually extracted features are sensitive to illumination, occlusion and gesture [8, 10]. In recent years, deep learning has performed well and has gradually become a mainstream research direction in the field of facial expression. Facial expression recognition and face detection based on deep learning can be divided into three categories: cascade CNN methods [11–13], proposal-based two-step approaches [14, 15], and one-step methods [16–19]. For visual expression feature extraction, the traditional approach is to extract features manually and then classify them with machine learning methods, for example the Gabor filter in [20], LBP features in [21], and HOG features in [22]. However, these manually extracted features perform very poorly in real scenes and are easily disturbed by the outside environment. Deep learning can extract appropriate features automatically, so facial expression features are now mostly extracted by deep neural networks. Problems remain, however: a deep neural network needs a lot of time to train its parameters and requires high-quality training data, and the training method and loss function are also difficult to define [23].
2.2 Emotion Analysis Based on Text
Emotion recognition based on text is an important research direction in natural language processing (NLP); it analyzes, processes and extracts subjective text using text mining techniques. The application of emotion analysis to text is increasing widely.
In particular, emotion recognition in conversations (ERC) is a challenging task that has recently gained popularity due to its potential applications. According to Poria et al. (2019) [24], ERC presents several challenges, such as conversational context modeling and emotion shift of the interlocutors, which make the task more difficult to address. Many scholars have introduced RNNs and CNNs into text mining to extract emotional words and judge whether a whole sentence is positive or negative. But pure text emotion analysis ignores the information of the whole context and is not very accurate at predicting a person's actual emotion. Simply judging the emotion of words in a sentence in isolation from context cannot really reflect a person's emotion.
2.3 Multimodal Fusion Affective Computing
Because of the shortcomings of unimodal emotion analysis, more and more scholars devote themselves to multimodal affective computing. However, because signals differ greatly across feature spaces, it is difficult to reflect the characteristics of different signals at the same time. Two kinds of fusion methods are adopted: fusion of emotion feature representations before the classifier, and fusion of classifier scores. The main fusion techniques can be divided into feature-level fusion, decision-level fusion, hybrid multimodal fusion, model-level fusion and rule-based fusion [25]. Reference [26] proposed the ModDrop multimodal fusion method [25], which was shown to be more accurate and efficient than feature vector fusion.
Reference [27] proposed the CCA (Canonical Correlation Analysis) method, which fuses features according to the correlation between them. However, neither method considers scenario and context information. It is necessary to consider not only the speaker information but also the scene and context of the dialogue, because the result is greatly influenced by the content of the previous dialogue. To address this issue, Ref. [28] uses DialogueRNN, which contains three GRUs, to model contextual information. But in that model, the global contextual state and the actor state need to be predefined. In practical engineering the global context state is not constant, the people involved in the conversation are not fixed, and dialogue information from past scenarios can affect current states. Therefore, the real-time performance of this algorithm is not very good, and it is difficult to implement. To solve the above problems, this paper designs separate networks for facial expression and text semantics and trains the parameters of both networks on the dialogues and facial expressions in the paragraphs. The results of the expression network and the semantic network are fused by a linear weighting algorithm, and the sentiment is judged from the fused result vector. This approach takes into account both contextual information and the speaker's multiple emotional modes: expression and semantics. The two modes compensate for each other and improve the accuracy of emotion recognition. To reflect the real-time performance of the whole network, we started from the original MELD database, collected the facial expressions of the people in the video clips of the corresponding sentences, and established a corresponding facial expression database.
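The decision-level fusion described above can be illustrated with a minimal Python sketch. It assumes each network outputs a probability vector over the three sentiment classes used later in the paper. The function names, the weight `w_text` and the example scores are hypothetical and chosen only for illustration; the weights the authors actually use are not stated in this section.

```python
# Minimal sketch of weighted linear (decision-level) fusion.
# Assumption: both networks emit probabilities over the same three classes.
LABELS = ("negative", "neutral", "positive")


def weighted_linear_fusion(p_text, p_visual, w_text=0.5):
    """Return w_text * p_text + (1 - w_text) * p_visual, element by element."""
    if not 0.0 <= w_text <= 1.0:
        raise ValueError("w_text must lie in [0, 1]")
    if len(p_text) != len(p_visual):
        raise ValueError("both score vectors must have the same length")
    w_visual = 1.0 - w_text
    return [w_text * t + w_visual * v for t, v in zip(p_text, p_visual)]


def predict(fused):
    """Pick the label with the largest fused score."""
    best = max(range(len(fused)), key=fused.__getitem__)
    return LABELS[best]


# Hypothetical outputs of the semantic (text) and expression (visual) networks.
text_scores = [0.2, 0.5, 0.3]
visual_scores = [0.1, 0.2, 0.7]

fused = weighted_linear_fusion(text_scores, visual_scores, w_text=0.4)
label = predict(fused)
print(fused, label)
```

Because the two weights sum to one and each input sums to one, the fused vector is again a probability vector, so the arg-max decision stays comparable across weight settings.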
3 Overall Algorithm Architecture
3.1 Semantic Network Architecture
We chose the text data from the MELD (Multimodal EmotionLines Dataset) database as the training and testing data of the semantic network. It contains dialogues with more than two speakers from the popular TV series Friends. There are 10478 sentences and 1038 scenes. We built the training, test and validation sets according to the paragraph scenario. We allow 24 sentences per scene, with a maximum of 50 words per sentence. Each word corresponds to a coded number. If a dialogue has too few sentences or a sentence has too few words, zero padding is applied, so the dimension of the training tensor is (None, 24, 50). We divided sentiments into three categories: negative, neutral and positive. We designed a multi-layer CNN to train on the tensors. The CNN contains three convolutional layers, three pooling layers and one fully connected layer. The fully connected layer is defined as softmax. We chose the ReLU function as the activation function in the convolutional layers and the tanh function as the activation function in the fully connected layer. The architecture of the CNN is shown in Fig. 1. Training uses the Adadelta optimizer, and the loss function is categorical cross-entropy. Semantic features are extracted by the three convolutional layers and then generalized and pooled by the three pooling layers to reduce computation. Semantic features are integrated through embedding. We put the 24 sentences of a context into one whole paragraph and trained the parameters paragraph by paragraph. In this way, the network can fully learn the contextual information and thus achieves higher accuracy in emotion recognition.
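The zero padding described above can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: the helper name `pad_scene`, the example word ids, and the choice to truncate inputs that exceed the limits are assumptions; only the target shape (24, 50) and the padding value 0 come from the text.

```python
MAX_SENTENCES = 24  # sentences per scene, from the text
MAX_WORDS = 50      # maximum words per sentence, from the text
PAD_ID = 0          # zero padding, from the text


def pad_scene(scene):
    """Pad (or truncate, by assumption) a scene to a 24 x 50 grid of word ids.

    `scene` is a list of sentences, each a list of integer word codes.
    """
    rows = []
    for sentence in scene[:MAX_SENTENCES]:
        ids = list(sentence[:MAX_WORDS])
        ids += [PAD_ID] * (MAX_WORDS - len(ids))
        rows.append(ids)
    while len(rows) < MAX_SENTENCES:
        rows.append([PAD_ID] * MAX_WORDS)
    return rows


# Made-up word codes for a scene with two short sentences.
example_scene = [[12, 7, 301], [45, 2]]
padded = pad_scene(example_scene)
print(len(padded), len(padded[0]))
```

Stacking such grids for a batch of scenes gives the (None, 24, 50) tensor mentioned in the text, with None standing for the batch dimension.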
Multimodal Affective Computing Based on Weighted Linear Fusion
Start
Training settings: Epoch = 100, Batch_size = 50
Parameters: Vocabulary_size, embedding_dim
Conv_2D: kernel 3 × embedding_dim, 512 convolutional filters, activation function ReLU
Conv_2D: kernel 4 × embedding_dim, 512 convolutional filters, activation function ReLU
Conv_2D: kernel 5 × embedding_dim, 512 convolutional filters, activation function ReLU
MaxPool2D: pool filter dimension (sentence_length - 3 + 1, 1)
MaxPool2D: pool filter dimension (sentence_length - 4 + 1, 1)
MaxPool2D: pool filter dimension (sentence_length - 5 + 1, 1)
Fully connected layer: Dense, 100 neuron nodes, activation function tanh
Output layer: Dense, the number of neurons is the number of classes needed
Reshape layer
End
Fig. 1. The architecture of the semantic CNN
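The pooling sizes listed in Fig. 1 can be checked with a little shape arithmetic. The sketch below assumes stride-1 convolutions without padding, that sentence_length equals the 50-word limit of Sect. 3.1, and that the three pooled branches are concatenated; none of these details is stated explicitly in the paper.

```python
SENTENCE_LENGTH = 50        # assumed equal to the 50-word limit of Sect. 3.1
KERNEL_HEIGHTS = (3, 4, 5)  # the three Conv_2D branches listed in Fig. 1
NUM_FILTERS = 512           # filters per branch, from Fig. 1


def conv_output_length(input_length, kernel_height):
    """Output length of a stride-1 convolution without padding (assumed)."""
    return input_length - kernel_height + 1


branch_features = 0
for k in KERNEL_HEIGHTS:
    conv_len = conv_output_length(SENTENCE_LENGTH, k)
    pool_window = SENTENCE_LENGTH - k + 1   # pool filter height given in Fig. 1
    pooled_len = conv_len // pool_window    # non-overlapping pooling windows
    branch_features += pooled_len * NUM_FILTERS
    print(f"kernel {k}: conv length {conv_len}, pooled length {pooled_len}")

# Feature count if the three pooled branches are concatenated (assumption).
print("features after pooling:", branch_features)
```

Under these assumptions each pooling window covers the whole convolution output, so every filter contributes a single value per branch.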
3.2 Facial Expression Recognition
3.2.1 Facial Database Establishment
We collected facial expression images of the people in the video clips of the corresponding sentences and established a corresponding facial expression database. All facial images used for expression recognition were extracted manually with the tool LabelMe. We then located the midpoint between the two eyes and used it as the center of rotation to rotate the face and align the facial features. An aligned face is shown in Fig. 2.
Fig. 2. Facial image after alignment
Each facial image corresponds to a sentence in the semantic network. The serial number at the start of the image name is the number of the sentence in MELD. All facial images are named in the same format, and the corresponding sentiment classification label appears in the middle of the name. The facial images of the characters in the video were used to train a multilayer CNN. The facial expressions are also divided into three labels: negative sentiment, neutral sentiment and positive sentiment.
3.2.2 Facial Expression CNN Architecture
The CNN contains 3 convolutional layers, 4 max-pooling layers, 2 fully connected layers and 2 dropout layers. The whole architecture is shown in Fig. 3. During training, the optimizer is Adam and the loss function is sparse categorical cross-entropy. The initial learning rate is 1e−3. As training goes on, when the loss reaches a set threshold, the learning rate is changed to 1e−4 to fine-tune the parameters.
Start
Model = Sequential()
Input tensor: width = 128, height = 128, channels = 1
Convolutional layer: 32 filters of dimension (5,5), activation function ReLU
Max pool layer: (2,2)
Convolutional layer: 32 filters of dimension (3,3), activation function ReLU
Max pool layer: (2,2)
Convolutional layer: 64 filters of dimension (3,3), activation function ReLU
Max pool layer: (2,2)
Convolutional layer: 64 filters of dimension (5,5), activation function ReLU
Max pool layer: (2,2)
Flatten()
Fully connected layer: 512 nodes, activation function ReLU
Dropout: 0.5
Fully connected layer: 512 nodes, activation function ReLU
Dropout: 0.5
Output layer: the number of nodes is the number of classes needed
End
Fig. 3. The architecture of the facial expression CNN
3.2.3 Linear Weighted Fusion Algorithm
The linear weighted fusion algorithm is one of the simplest fusion algorithms to apply and is very convenient for engineering implementation. Each sample includes one sentence and several facial images corresponding to the dialogue. Both the facial expression network and the semantic network output the probabilities of the three emotional categories. By linear weighted fusion of the outputs of the two networks, we obtain a fused output vector of dimension 3 × 1, on which the emotion classification is based. The index of the largest element is the final emotion class. The algorithm is given by formula (1). The overall fusion architecture is shown in Fig. 4.
$$ \mathrm{Score}(u, i) = \sum_{k} \beta_k \, \mathrm{rec}_k(u, i) \tag{1} $$
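The fusion step around formula (1) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: rec_k is read as the probability vector output by the k-th network and β_k as its fusion weight, the weights 0.4 and 0.6 and the probability values are made up (the paper does not report them), the label order is arbitrary, and a single facial probability vector per sample is assumed even though a sample may contain several facial images.

```python
LABELS = ("negative", "neutral", "positive")  # label order is an assumption


def fuse(prob_vectors, weights):
    """Linear weighted fusion: score[c] = sum_k weights[k] * prob_vectors[k][c]."""
    return [
        sum(w * p[c] for w, p in zip(weights, prob_vectors))
        for c in range(len(LABELS))
    ]


# Made-up network outputs, for illustration only.
semantic_probs = [0.20, 0.50, 0.30]
facial_probs = [0.70, 0.20, 0.10]
weights = [0.4, 0.6]  # hypothetical beta_k values; not reported in the paper

fused = fuse([semantic_probs, facial_probs], weights)
# The index of the largest element of the fused vector gives the class.
predicted = LABELS[max(range(len(fused)), key=fused.__getitem__)]
print(fused, predicted)
```

In this made-up case the semantic network alone would choose "neutral", while the fused vector selects "negative" because the facial network is weighted more heavily.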
Fig. 4. Overall algorithm fusion architecture
4 Experiment
We implemented the networks in Python with Keras and TensorFlow. The network structure and the training curves were drawn with TensorBoard. We used the 10478 sentences in MELD to train the semantic network, and the facial images corresponding to the sentences to train the facial network. The training curves are shown in Fig. 5.
4.1 The Training Curves of the Semantic Network
Figure 5 shows how the loss function of the semantic network changes during training. Although the curves fluctuate, the loss decreases steadily, which indicates that the semantic network is training normally without overfitting.
4.2 The Training Curves of the Facial Expression Network
We used the facial images corresponding to the sentences to train the facial network. Figure 6 shows the training curves of the loss function of the facial network. The loss of the expression network decays to 0, which indicates that the facial expression network has reached its maximum saturation level.
Fig. 5. The training curves of the semantic network
Fig. 6. The training curves of the facial expressions networks
4.3 Network Test Results
The test set contains 87 samples: 87 sentences corresponding to 242 facial images. The results are presented as per-class precision, recall and F1-score in Tables 1, 2 and 3. The data in Table 3 show that the accuracy after fusion is higher than that of the individual facial network and semantic network. The results indicate that our algorithm works effectively.
Table 1. Results of individual semantic network
Label          Precision   Recall   F1-score   Support
Neutral        0.72        0.88     0.79       24
Positive       0.50        0.43     0.46       14
Negative       0.85        0.80     0.82       49
Micro avg      0.76        0.76     0.76       87
Macro avg      0.69        0.70     0.69       87
Weighted avg   0.76        0.76     0.76       87
Table 2. Results of individual facial expression network
Label          Precision   Recall   F1-score   Support
Neutral        0.68        0.48     0.56       44
Positive       0.61        0.61     0.61       36
Negative       0.85        0.91     0.88       162
Micro avg      0.79        0.79     0.79       242
Macro avg      0.71        0.67     0.68       242
Weighted avg   0.78        0.79     0.78       242
Table 3. The results of fusion
Label          Precision   Recall   F1-score   Support
Neutral        0.89        0.67     0.76       24
Positive       0.64        0.64     0.64       14
Negative       0.84        0.94     0.88       49
Micro avg      0.82        0.82     0.82       87
Macro avg      0.79        0.75     0.76       87
Weighted avg   0.82        0.82     0.81       87
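The averaged rows of Table 3 can be cross-checked from the three per-class rows. The sketch below uses the usual definitions, which are an assumption about how the table was produced: the macro average is the unweighted mean over classes and the weighted average uses the support of each class as weight. Because the per-class values are already rounded to two decimals, agreement is only expected to about two decimals.

```python
# Rows copied from Table 3: label -> (precision, recall, f1, support)
table3 = {
    "Neutral": (0.89, 0.67, 0.76, 24),
    "Positive": (0.64, 0.64, 0.64, 14),
    "Negative": (0.84, 0.94, 0.88, 49),
}

total = sum(row[3] for row in table3.values())

# Unweighted mean over the three classes for precision, recall and F1.
macro = [sum(row[m] for row in table3.values()) / len(table3) for m in range(3)]

# Support-weighted mean over the three classes.
weighted = [
    sum(row[m] * row[3] for row in table3.values()) / total for m in range(3)
]

print("support:", total)
print("macro avg:", [round(x, 2) for x in macro])
print("weighted avg:", [round(x, 2) for x in weighted])
```

With these definitions the recomputed values agree with the Macro avg and Weighted avg rows of Table 3 to two decimals.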
5 Conclusion
The algorithm proposed in this paper combines the facial expression network with the semantic network, which improves the accuracy of sentiment prediction. The network can effectively obtain the textual information by training paragraph by paragraph. Using the real-time expressions extracted from the video as training samples has improved the real-time performance of the network. The whole algorithm is easy to implement and highly feasible.
References
1. D’mello, S.K., Kory, J.: A review and meta-analysis of multimodal affect detection systems. ACM Comput. Surv. 47(3), 43–79 (2015)
2. Sharma, A., Anderson, D.V.: Deep emotion recognition using prosodic and spectral feature extraction and classification based on cross validation and bootstrap. In: Signal Processing and Signal Processing Education Workshop, pp. 421–425. IEEE (2015)
3. Ekman, P., Friesen, W.V.: Constants across cultures in the face and emotion. J. Pers. Soc. Psychol. 17(2), 124–129 (1971)
4. Ekman, P., Friesen, W.V.: Facial action coding system: a technique for the measurement of facial movement. Palo Alto: Consul. Psychol. 17(2), 124–129 (1971)
5. Soleymani, M., Pantic, M., Pun, T.: Multimodal emotion recognition in response to videos. IEEE Trans. Affect. Comput. (TAC) 3(2), 211–223 (2012)
6. Pantic, M., Rothkrantz, L.J.M.: Automatic analysis of facial expressions: the state of the art. IEEE Trans. Pattern Anal. Mach. Intell. (TPAMI) 22(12), 1424–1445 (2000)
7. Viola, P., Jones, M.J.: Robust real-time object detection. Int. J. Comput. Vis. (IJCV) 57(2), 137–154 (2004)
8. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: Computer Vision and Pattern Recognition (CVPR), pp. 886–893 (2005)
9. Ojala, T., Pietikäinen, M., Harwood, D.: A comparative study of texture measures with classification based on feature distributions. Pattern Recogn. 29(1), 51–59 (1996)
10. Zhang, L., Chu, R., Xiang, S.: Face detection based on multi-block LBP representation. In: International Conference on Biometrics, pp. 11–18 (2007)
11. Li, H., Lin, Z., Shen, X.: A convolutional neural network cascade for face detection. In: CVPR, pp. 5325–5334 (2015)
12. Yang, S., Luo, P., Loy, C.-C., Tang, X.: From facial parts responses to face detection: a deep learning approach. In: IEEE International Conference on Computer Vision (ICCV), pp. 3676–3684 (2015)
13. Zhang, K., Zhang, Z., Li, Z., Qiao, Yu.: Joint face detection and alignment using multi task cascaded convolutional networks. Signal Process. Lett. 23(10), 1499–1503 (2016)
14. Ren, S., He, K., Girshick, R., et al.: Faster R-CNN: towards real-time object detection with region proposal networks. TPAMI 39(6), 1137–1149 (2017)
15. Jiang, H., Learned-Miller, E.: Face detection with the faster R-CNN. In: IEEE International Conference on Automatic Face & Gesture Recognition (FG), pp. 650–657 (2017)
16. Najibi, M., Samangouei, P., Chellappa, R., Davis, L.S.: SSH: single stage headless face detector. In: ICCV, pp. 4875–4884 (2017)
17. Liu, W., Anguelov, D., Erhan, D., et al.: SSD: single shot multibox detector. In: European Conference on Computer Vision (ECCV), pp. 21–37 (2016)
18. Zhang, S.F., Zhu, X.Y., Lei, Z., et al.: S3FD: single shot scale-invariant face detector. In: ICCV, pp. 192–201 (2017)
19. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: CVPR, pp. 779–788 (2016)
20. Liu, C., Wechsler, H.: Gabor feature based classification using the enhanced fisher linear discriminant model for face recognition. IEEE Trans. Image Process. (TIP) 11(4), 467–476 (2002)
21. Shan, C., Gong, S., Mcowan, P.W.: Facial expression recognition based on local binary patterns: a comprehensive study. Image Vis. Comput. 27(6), 803–816 (2009)
22. Mavadatis, M., Mahoor, M.H., Bartlett, K., et al.: DISFA: a spontaneous facial action intensity database. TAC 4(2), 151–160 (2013)
23. Tan, L., Zhang, K., Wang, K., et al.: Group emotion recognition with individual facial emotion CNNs and global image based CNNs. In: ICMI, pp. 549–552 (2017)
24. Poria, S., Majumder, N., Mihalcea, R., Hovy, E.: Emotion recognition in conversation: research challenges, datasets, and recent advances (2019). arXiv preprint arXiv:1905.02947
25. Shoumy, N.J., Ang, L.M., Seng, K.P., Rahaman, D.M.M., Zia, T.: Multimodal big data affective analytics: a comprehensive survey using text, audio, visual and physiological signals. J. Netw. Comput. Appl. 149, 102447 (2020)
26. Neverova, N., Wolf, C., Taylor, G., et al.: ModDrop: adaptive multi-modal gesture recognition. TPAMI 38(8), 1692–1706 (2016)
27. Weenink, D.: Canonical Correlation Analysis Inference For Functional Data With Applications. Springer, New York (2012)
28. Majumder, N., Poria, S., Hazarika, D., et al.: DialogueRNN: an attentive RNN for emotion detection in conversations. In: National Conference on Artificial Intelligence, pp. 6818–6825 (2019)
Sociophysics Approach of Simulation of Mass Media Effects in Society Using New Opinion Dynamics Akira Ishii(B) and Nozomi Okano Department of Applied Mathematics and Physics, Tottori University, Koyama, Tottori 680-8552, Japan [email protected] http://www.damp.tottori-u.ac.jp/lab3/index.html
Abstract. We simulated the effects of mass media in society using a new theory of opinion dynamics that incorporated both trust and distrust into human relationships. We calculated not only the case where the media works uniformly for the people of society, but also the case of microtargeting where the media works only for those who have weak opinions. We also calculated the mass media effect that drives people to a particular opinion. The network of people is a random network, the coefficient of trust between people is determined by random numbers, and half of human relationships are untrusted.
Keywords: Opinion dynamics · Trust · Distrust · Advertisement · Media effect
© Springer Nature Switzerland AG 2021. K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 13–28, 2021. https://doi.org/10.1007/978-3-030-55190-2_2
1 Introduction
The effect of mass media on society is a very interesting and important topic in sociology. The effect of mass media on people's opinions is very significant in politics, sociology and marketing science. There are many works on mass media and public opinion [1,2]. However, there are few works on the mass media effect in opinion dynamics. Opinion dynamics discusses the movement of opinion in society, taking into account the influence between people. It is therefore very natural to include the influence of the mass media in it. However, such a study has not been carried out. For the analysis of social opinion via social network services, a study of mass media effects in opinion dynamics would be very helpful. In particular, such a theory will be very useful for simulating the microtargeting effect in advertisement [3]. The traditional approach to opinion dynamics has addressed this issue with models that treat opinion as a continuously changing quantity, rather than with a binary approach in which an agent either agrees or disagrees. A representative theory of opinion dynamics with continuous opinion transitions is the bounded confidence model. These theories are aimed at consensus building [4–8]. Simulations using these theories assume only trust relationships between individuals. Therefore, the main purpose of these opinion dynamics is consensus building in society. However, in society, people's opinions do not always reach consensus. Opinions can be divided or confused. This is a problem that cannot be handled by conventional opinion dynamics. Recently, Ishii et al. proposed a new theory of opinion dynamics that deals with human relationships of both trust and distrust [9,10]. Using this theory, for example, Ishii and Kawahata calculated the effect of a charismatic person in the simulation [11]. Ishii and Okano [12,13] used the same new opinion dynamics to study people who are distrusted by everyone in society. Therefore, we use Ishii's new opinion dynamics to simulate the effects of mass media on society. The theory of Ishii et al. contains a term for the effect of the mass media, and we make use of it here. Although Ishii et al. presented this term in their theory, it has not been used in their research so far.
2 Theory
The dynamics of opinion, a theory for analyzing the process of reaching consensus within society (or within a small group), has been studied from different perspectives over the years [14–19]. Attempts have also been made to apply game theory to consensus formation as a form of opinion dynamics [20]. A comprehensive review of opinion dynamics theory is also available [8]. In today's society, however, exchanges of opinion leave behind a large amount of log data. To analyze these data, we need a theory that supports quantitative analysis and can be integrated with the analysis of large-scale data. Theories of opinion dynamics suitable for quantitative analysis can be roughly classified into two types. One type treats each person's opinion as 1 or 0, or as 1 or −1. Such a binary theory is easy to apply to cases such as the US presidential election, the French presidential election or the Brexit referendum, in which a vote is cast for one side or the other. The other type regards opinions as one-dimensional (or multidimensional) continuous values. Phenomena such as consensus building are often treated in this way. A typical example of the discrete binary theory applies the Ising model of magnetism, in which the magnetic moments of electrons point in only two directions [21–23]. In addition, the local majority voting model, which uses the concept of the renormalization group from theoretical physics, describes the process by which two opinions, approval and disapproval, determine local public opinion through local majority votes [24,25]. As applications of this theory, Galam published an analysis of the referendum on Britain's departure from the EU (Brexit) [26] and a study of the election of US President Trump [27]. On the other hand, typical examples of theories that treat opinion as continuous values are the bounded confidence models: the Deffuant-Weisbuch model [4,5] and the Hegselmann-Krause model [6]. It is worth noting that in these models opinions take continuous values from 0 to 1, with 1 indicating consent and 0 indifference. There is no assumption of dissent there. Attempts to deal with the polarization resulting from disagreements were made as extensions of the Deffuant-Weisbuch model [28–30]. In these extensions, the influence of the N people on each other is constant. It is determined by the degree of disagreement between them and does not mean that each person has his or her own trust or distrust of the others. The work of Ishii et al. [9,10] is based on the theoretical form of the Hegselmann-Krause model and describes how opinions change through the exchange of opinions among N people. It includes the possibility that people distrust each other and also includes the influence of the mass media. For a fixed agent i, where $1 \leq i \leq N$, we denote the agent's opinion at time t by $I_i(t)$. Agent i can be affected by the opinions of the surrounding people. According to Hegselmann-Krause [6], the opinion formation of agent i can be described as follows.
$$ I_i(t+1) = \sum_{j=1}^{N} D_{ij} I_j(t) \tag{1} $$
This can be written in the following form.
$$ \Delta I_i(t) = \sum_{j=1}^{N} D_{ij} I_j(t) \, \Delta t \tag{2} $$
where $D_{ij} \geq 0$ for all i, j, as in the Hegselmann-Krause model. With this definition, $D_{ij} = 0$ means that the views of agent i are unaffected by the views of agent j. The Hegselmann-Krause model implicitly assumes that consensus building is the ultimate goal of negotiations between people. In the real world, however, people do not always reach consensus. World history offers many examples in international politics where no consensus could be formed, not to mention the American Civil War. Even in domestic affairs, there is no consensus between people pursuing economic development and those advocating conservation, and agreement is difficult. Applying game theory can be difficult because it is impossible to define a payoff matrix for such serious political conflicts. Therefore, to deal with problems on which consensus is difficult to form, the theory of opinion dynamics needs to include the lack of trust among people. Considering the opinion on a one-dimensional axis, the value range of $I_i(t)$ is $-\infty \leq I_i(t) \leq +\infty$. Here we assume that $I_i(t) > 0$ means a positive opinion and $I_i(t) < 0$ means a negative opinion. Within the restrictions of the Hegselmann-Krause model, $1/2 \leq I_i(t) \leq 1$ corresponds to a positive opinion and $0 \leq I_i(t) \leq 1/2$ to a negative opinion. In the work of Ishii et al. [9,10], however, the confrontation of opinions is shown clearly by whether the opinion value is positive or negative. Ishii et al. reinterpret the coefficient $D_{ij}$ as a confidence coefficient: they assume $D_{ij} > 0$ if there is trust toward the other person and $D_{ij} < 0$ if there is distrust. Following the previous theory [10], we ignore opinions that are very far from one's own, which provoke neither agreement nor repulsion. Also, a person is not particularly affected by opinions that are very close to his or her own. To include these two effects, we use the following expression instead of $D_{ij} I_j(t)$:
$$ D_{ij} \, \Phi(I_i, I_j) \, (I_j(t) - I_i(t)) \tag{3} $$
where
$$ \Phi(I_i, I_j) = \frac{1}{1 + \exp(\beta(|I_i - I_j| - b))} \tag{4} $$
This is the well-known sigmoid function, which acts as a smooth cutoff at $|I_i - I_j| = b$. A typical graph of this function is shown in Fig. 1. With this function, two opinions that are too far apart do not affect each other at all. Further, because of the factor $I_j(t) - I_i(t)$, the opinion $I_i(t)$ is not affected by the opinion $I_j(t)$ when the two are almost the same. Whether $I_i(t)$ and $I_j(t)$ are both positive, both negative, or of opposite signs, the term $I_j(t) - I_i(t)$ acts in the same way, through the difference between the two opinions. This is very natural. For example, even among conservatives there is intense debate between moderates and radicals.
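The cutoff function of Eq. (4) and the pair term of Eq. (3) can be written down directly. In the sketch below the function names and the default values β = 1 and b = 5 are illustrative assumptions (the text above does not give the values used in the simulations); the overflow guard is a numerical convenience, not part of the model.

```python
import math


def phi(opinion_i, opinion_j, beta=1.0, b=5.0):
    """Smooth cutoff of Eq. (4); beta and b defaults are illustrative only."""
    z = beta * (abs(opinion_i - opinion_j) - b)
    if z > 700.0:  # exp would overflow; the cutoff is numerically zero here
        return 0.0
    return 1.0 / (1.0 + math.exp(z))


def pair_term(d_ij, opinion_i, opinion_j, beta=1.0, b=5.0):
    """Interaction term of Eq. (3): D_ij * Phi(I_i, I_j) * (I_j - I_i)."""
    return d_ij * phi(opinion_i, opinion_j, beta, b) * (opinion_j - opinion_i)


# Identical opinions, opinions exactly b apart, and opinions far apart.
print(phi(0.0, 0.0), phi(0.0, 5.0), phi(0.0, 20.0))
# Trust pulls agent i toward agent j; distrust pushes it away.
print(pair_term(1.0, 0.0, 1.0), pair_term(-1.0, 0.0, 1.0))
```

The cutoff equals 1/2 when the two opinions are exactly b apart and decays quickly beyond that distance, while the pair term vanishes for identical opinions regardless of the sign of D_ij.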
Fig. 1. The typical graph of the Sigmoid function (4) as smooth cut-off function.
On the other hand, the influence of the mass media on people's opinions must also be considered. The statements of the mass media and the government are very important for the formation of public opinion and need to be incorporated into opinion dynamics. Such mass media effects are also important in small-group negotiations. The mass media term introduced by Ishii et al. [9,10] is similar to the one in Ishii's model of hit phenomena [31], a sociophysical model that analyzes the popularity of a particular topic. We introduce the same mass media effect here. Let $A(t)$ be the external pressure at time t, and let the coefficient $c_i$ represent the difference in each agent's reaction. The coefficient $c_i$ has a different value for each person and can be positive or negative. If $c_i$ is positive, person i moves his or her opinion toward that of the mass media. Conversely, if $c_i$ is negative, the person's opinion changes in the direction opposite to that of the mass media. Using a different $c_i$ for each individual corresponds to microtargeting marketing. Including the above mass media effect in the framework of opinion dynamics, the change in the opinion of agent i can be expressed as follows.
$$ \Delta I_i(t) = c_i A(t) \Delta t + \sum_{j=1}^{N} D_{ij} \, \Phi(I_i(t), I_j(t)) \, (I_j(t) - I_i(t)) \, \Delta t \tag{5} $$
We assume here that $D_{ij}$ is an asymmetric matrix: in general $D_{ij} \neq D_{ji}$, and $D_{ij}$ and $D_{ji}$ can have different signs. Long-term behavior requires attenuation, which means that topics are forgotten over time. Here we introduce exponential attenuation. The expression is as follows.
$$ \Delta I_i(t) = -\alpha I_i(t) \Delta t + c_i A(t) \Delta t + \sum_{j=1}^{N} D_{ij} \, \Phi(I_i(t), I_j(t)) \, (I_j(t) - I_i(t)) \, \Delta t \tag{6} $$
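One way to iterate Eq. (6) numerically is to apply the increment once per time step for all agents simultaneously. The sketch below does this for a generic list of opinions; the function names, the synchronous update and the values of β, b and Δt are assumptions for illustration, since the paper does not describe its numerical scheme.

```python
import math


def phi(x_i, x_j, beta, b):
    """Smooth cutoff of Eq. (4), with a guard against overflow in exp."""
    z = beta * (abs(x_i - x_j) - b)
    return 0.0 if z > 700.0 else 1.0 / (1.0 + math.exp(z))


def step(opinions, D, c, A, alpha, dt, beta=1.0, b=5.0):
    """Apply the increment of Eq. (6) once to every agent; returns a new list.

    D[i][j] is the confidence coefficient of agent i toward agent j,
    c[i] the media sensitivity of agent i, and A the media term at this time.
    """
    n = len(opinions)
    new_opinions = []
    for i in range(n):
        interaction = sum(
            D[i][j] * phi(opinions[i], opinions[j], beta, b)
            * (opinions[j] - opinions[i])
            for j in range(n)
        )
        delta = (-alpha * opinions[i] + c[i] * A + interaction) * dt
        new_opinions.append(opinions[i] + delta)
    return new_opinions


# Two agents with opposite opinions, no media and no attenuation.
start = [1.0, -1.0]
trusting = [[0.0, 1.0], [1.0, 0.0]]
distrusting = [[0.0, -1.0], [-1.0, 0.0]]
closer = step(start, trusting, c=[1.0, 1.0], A=0.0, alpha=0.0, dt=0.1)
apart = step(start, distrusting, c=[1.0, 1.0], A=0.0, alpha=0.0, dt=0.1)
print(closer, apart)
```

In this two-agent example mutual trust moves the opinions toward each other, and mutual distrust moves them apart, which is the qualitative behavior the sign of D_ij is meant to encode.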
3 Results
In this paper, we consider the mass media effect in opinion dynamics theory. The effect of mass media has already been included in the mathematical model of hit phenomena by A. Ishii et al. [31,32]. In those works, the mass media effect is introduced through the amount of advertisement time on television, and it appears to work well. Using a similar idea, we include the mass media effect as $c_i A(t)$ in the following opinion dynamics equation. In the simulations below, we assume $c_i = 1$ for simplicity.
$$ \Delta I_i(t) = c_i A(t) \Delta t + \sum_{j=1}^{N} D_{ij} \, \Phi(I_i(t), I_j(t)) \, (I_j(t) - I_i(t)) \, \Delta t \tag{7} $$
As an example, Fig. 2 shows a calculation for 300 people using the opinion dynamics theory of this paper. Here, the mass media effect $A(t)$ is a constant value that acts equally on all 300 people. The network that connects people is a random network with a link connection probability of 50%. The values of the confidence coefficient matrix $D_{ij}$ between the 300 people are drawn from a uniform random distribution between −1 and 1. As can be seen, people's opinions spread to some extent from positive to negative.
In this calculation, we set the mass media effect $A(t) = 0$, so that the resulting opinion distribution covers positive and negative values without bias. The trajectories of the opinions of the 300 people are distributed evenly over positive and negative values, and the system as a whole appears to be in a certain equilibrium state. Also, as the calculated distribution shows, a calculation that starts from a nearly uniform opinion distribution with small differences spreads out and balances to some extent, but the final opinion distribution is not uniform: it is divided into clusters of several opinion groups. Hereafter, we use the calculation result of Fig. 2 as the reference.
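The setup described above (300 people, a random network with link probability 0.5, confidence coefficients uniform between −1 and 1) can be generated as in the following sketch. Several details are assumptions because the text does not specify them: links are taken to be undirected, there are no self-links, $D_{ij}$ and $D_{ji}$ are drawn independently on each link, and unlinked pairs get a coefficient of 0. The helper name `build_trust_matrix` and the fixed seed are illustrative.

```python
import random


def build_trust_matrix(n, link_probability, rng):
    """Return D with D[i][j] uniform in [-1, 1] on linked pairs and 0 elsewhere."""
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < link_probability:   # undirected link between i and j
                D[i][j] = rng.uniform(-1.0, 1.0)  # how i regards j
                D[j][i] = rng.uniform(-1.0, 1.0)  # how j regards i, drawn independently
    return D


rng = random.Random(0)  # fixed seed only to make the sketch reproducible
N = 300
D = build_trust_matrix(N, 0.5, rng)

links = sum(1 for i in range(N) for j in range(i + 1, N) if D[i][j] != 0.0)
distrust = sum(1 for row in D for value in row if value < 0.0)
trust = sum(1 for row in D for value in row if value > 0.0)
print(links, trust, distrust)
```

With coefficients drawn symmetrically around zero, roughly half of the directed relationships come out as distrust, in line with the statement in the abstract that half of human relationships are untrusted.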
Fig. 2. Calculation result for N = 300. The human network is assumed to be a random network with link connection probability 0.5. Left: time development of the opinion trajectories. Right: distribution of opinions at the final time of the calculation. The $D_{ij}$ are set randomly between −1 and 1, so that each person either trusts or distrusts each other person. The mass media effect is assumed to be zero, $A(t) = 0$.
3.1 Constant Media Effect
First, we apply a constant media effect to the opinion dynamics. Namely, we assume that $A(t) = A_c$, where $A_c$ is a constant. Figure 3 shows calculations under the same conditions as Fig. 2 but with a nonzero media effect: $A(t)$ is set to 0, 0.5 and 5. Comparing $A(t) = 0$ with $A(t) = 0.5$, the opinion distribution for $A(t) = 0.5$ is slightly biased toward positive opinions. For $A(t) = 5.0$, the calculated opinion distribution is clearly skewed toward positive opinions. Thus, in Ishii's theory of opinion dynamics [10], the media effect qualitatively explains the phenomenon that the opinion of society is biased in the direction induced by the media. Next, we introduce a media effect that pushes people with a positive opinion in the positive direction and people with a negative opinion in the negative direction. Figure 4 shows the calculation result when the influence of the media is set to A = +5 for people with a positive opinion and A = −5 for people with a negative opinion.
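A property of Eq. (7) as written helps in reading the constant-media case: with $c_i = 1$ and a constant $A$, the interaction term depends only on differences between opinions, so the media term moves every trajectory by the same amount, $A$ times the elapsed time. The sketch below checks this numerically on a small system. It is an observation about the equation, not a result reported by the authors, and the system size (20 agents instead of 300), the independently drawn directed links, β = 1, b = 5 and the step size are all assumptions.

```python
import math
import random


def phi(x_i, x_j, beta=1.0, b=5.0):
    """Smooth cutoff of Eq. (4); beta and b are illustrative values."""
    z = beta * (abs(x_i - x_j) - b)
    return 0.0 if z > 700.0 else 1.0 / (1.0 + math.exp(z))


def simulate(opinions, D, A, dt, steps):
    """Iterate Eq. (7) with c_i = 1 and a constant media term A."""
    x = list(opinions)
    n = len(x)
    for _ in range(steps):
        x = [
            x[i] + dt * (A + sum(
                D[i][j] * phi(x[i], x[j]) * (x[j] - x[i]) for j in range(n)
            ))
            for i in range(n)
        ]
    return x


rng = random.Random(1)
n = 20  # far smaller than the paper's N = 300, to keep the sketch fast
D = [
    [rng.uniform(-1.0, 1.0) if i != j and rng.random() < 0.5 else 0.0
     for j in range(n)]
    for i in range(n)
]
start = [rng.uniform(-1.0, 1.0) for _ in range(n)]

without_media = simulate(start, D, A=0.0, dt=0.01, steps=50)
with_media = simulate(start, D, A=5.0, dt=0.01, steps=50)
shifts = [w - v for w, v in zip(with_media, without_media)]
print(min(shifts), max(shifts))  # both close to A * dt * steps = 2.5
```

Under these assumptions a uniform constant media term translates the whole opinion distribution without changing its shape; the shape changes only when the media term differs between agents, as in the sign-dependent and threshold cases.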
Fig. 3. Calculation results for N = 300 with different mass media effects, $A(t)$ = 0, 0.5 and 5.0. The human network is assumed to be a random network with link connection probability 0.5. The panels show the distribution of opinions at the final time of each calculation. The $D_{ij}$ are set randomly between −1 and 1, so that each person either trusts or distrusts each other person.
As can be seen from the calculation results, this media effect causes the opinions of people with positive opinions to be largely separated from those with negative opinions. Such phenomena can arise in political conflicts, but also in championships between two popular teams at major sporting events.
Fig. 4. Calculation result of the opinion distribution for N = 300. The influence of the media is set to A = +5 for people with a positive opinion and A = −5 for people with a negative opinion. The human network is assumed to be a random network with link connection probability 0.5. Left: time development of the opinion trajectories. Right: distribution of opinions at the final time of the calculation. The $D_{ij}$ are set randomly between −1 and 1, so that each person either trusts or distrusts each other person.
3.2 Media Effect for Weak Opinion People
Consider the case where the media effect works only on those who do not have strong opinions. For a certain threshold value $A_0$, we calculate the case in which the media effect works only on people with weak opinions between $-A_0$ and $A_0$. Here, we set the strength of the media effect to 5. This situation corresponds to a very simple case of microtargeting. As shown in Fig. 5, suppose that the media effect works at A = 5 only in the range of weak opinions near zero.
$$ A(t) = \begin{cases} 0 & (I_i(t) > A_0) \\ 5 & (A_0 > I_i(t) > -A_0) \\ 0 & (-A_0 > I_i(t)) \end{cases} \tag{8} $$
Figure 6 shows how the opinion dynamics depend on $A_0$. The influence of the media is set to act only on people with weak opinions, that is, those whose opinion lies between $-A_0$ and $+A_0$. $A_0$ is set to 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0 and 9.0. The human network is assumed to be a random network with link connection probability 0.5. The $D_{ij}$ are set randomly between −1 and 1, so that each person either trusts or distrusts each other person. Figure 7 shows similar calculations for $A_0$ = 10, 12 and 15.
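The threshold rule of Eq. (8) amounts to a media term that depends on each person's current opinion. A minimal sketch, assuming the strict inequalities of Eq. (8) (so the media term is zero exactly at ±A_0, a boundary case the text does not discuss) and using the hypothetical function name `targeted_media`:

```python
def targeted_media(opinion, threshold, strength=5.0):
    """Eq. (8): the media acts only on weak opinions, -threshold < opinion < threshold."""
    return strength if -threshold < opinion < threshold else 0.0


A0 = 2.0  # one of the threshold values considered in the text
sample_opinions = (-3.0, -1.0, 0.0, 1.5, 2.5)
print([targeted_media(x, A0) for x in sample_opinions])
```

In a simulation this value would replace the constant A in the update of each agent, evaluated with that agent's current opinion at every time step.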
Fig. 5. The influence of the media is set to work only for those with weak opinions. In this figure, the influence of the media reaches only those in the light pink area with opinions ranging from −10 to +10. The distribution drawn here is the same as Fig. 2.
Fig. 6. Calculation results of the opinion distribution for N = 300. The influence of the media is set to A = 5 and acts only on those with weak opinions, that is, those whose opinion lies between $-A_0$ and $+A_0$. $A_0$ is set to 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0 and 9.0. The human network is assumed to be a random network with link connection probability 0.5. The $D_{ij}$ are set randomly between −1 and 1, so that each person either trusts or distrusts each other person.
Looking at the calculation results in Fig. 6 and Fig. 7, the number of people with opinions near zero tends to decrease, especially when $A_0$ is large. This is because people whose opinions lie in that range are guided toward positive opinions. In particular, for $A_0 = 15$, almost everyone moves to a positive opinion above 10.
Fig. 7. Calculation result of the opinion trajectory and the opinion distribution for N = 300. The influence of the media is set to work at A = 5. The influence of the media is set to work only for those with weak opinions. The media is only available to those whose opinion ranges from −A0 to +A0 . A0 are set to be 10, 12. and 15. The human network is assumed to be random network with the link connection probability 0.5. Dij are set to be −1 to 1 randomly so that every person trust or distrust for everyone.
Next, we consider the variation of the strength of the media effect. In the calculation, we set N = 300, and the human network is assumed to be a random network with link connection probability 0.5. The Dij are set randomly between −1 and 1, so that every person trusts or distrusts everyone else. The media reaches only those whose opinions range from −10 to +10. Here we set the media influence to A = 1, 5, 7 and 10. The calculated result is shown in Fig. 8. According to this result, the distribution of opinions becomes more biased in the positive direction as the media effect becomes stronger. Next, the calculation is performed while changing the connection probability of the nodes of the random network. We set N = 300, and the Dij are again set randomly between −1 and 1. The media reaches only those whose opinions range from −10 to +10, and the media influence is A = 5. We change the probability of connecting to other nodes in the random network to 50%, 30%, 20%, 10%, 5% and 2%. The calculated result is shown in Fig. 9.
Fig. 8. Calculated opinion trajectories and opinion distribution for N = 300. The human network is assumed to be a random network with link connection probability 0.5. The Dij are set randomly between −1 and 1, so that every person trusts or distrusts everyone else. The media influence works only on those with weak opinions, that is, those whose opinions range from −10 to +10. The media influence is set to A = 1, 5, 7 and 10.
Fig. 9. Calculated opinion trajectories and opinion distribution for N = 300. The Dij are set randomly between −1 and 1, so that every person trusts or distrusts everyone else. The media influence works only on those with weak opinions, that is, those whose opinions range from −10 to +10. The media influence is set to A = 5. The human network is assumed to be a random network with link connection probability 0.5, 0.3, 0.2, 0.1, 0.05 and 0.02.
This calculation shows that if the connection probability between nodes is 2%, the part of the distribution between −10 and 10 is clearly missing. On the other hand, if the connection probability is 50%, the missing part of the opinion distribution is less distinct. This means that when the connection probability is high, the influence of other persons' opinions is large, and the opinion distribution does not change simply as guided by the media effect.
3.3 Convergence of Opinion by Media Effect
Finally, we calculate the convergence of people's opinions toward one particular opinion. The following function is used as a media effect that makes the whole opinion distribution converge to a desired opinion I0:

$$
A(t) = -\tanh\bigl(\alpha\,(I_i(t) - I_0)\bigr)
\tag{9}
$$

Fig. 10 shows the dependence on α. Here the opinions converge to zero, and the width of the converged opinion distribution depends on the value of α.
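As a small illustration of Eq. (9), the converging media term can be sketched as follows. Only the media term is shown; the function and parameter names are illustrative and not taken from the paper.

```python
import math


def converging_media_effect(opinion: float, target: float = 0.0, alpha: float = 0.2) -> float:
    """Media term A(t) = -tanh(alpha * (I_i(t) - I0)) of Eq. (9).

    The term is negative when the opinion lies above the target I0 and
    positive when it lies below, so it always points back toward I0.
    Its magnitude is bounded by 1 because of the tanh saturation.
    """
    return -math.tanh(alpha * (opinion - target))


# The same deviation from the target gives a weaker pull for smaller alpha.
weak_pull = converging_media_effect(5.0, target=0.0, alpha=0.02)
strong_pull = converging_media_effect(5.0, target=0.0, alpha=0.2)
```

Because the pull near I0 grows with α, a larger α holds opinions closer to the target, which is in line with the α-dependent width of the converged distribution described for Fig. 10.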
Fig. 10. Calculated opinion trajectories and opinion distribution for N = 300. The human network is assumed to be a random network with link connection probability 0.5. The Dij are set randomly between −1 and 1, so that every person trusts or distrusts everyone else. The media effect is set to (a) A(t) = − tanh(0.02 ∗ Ii (t)) and (b) A(t) = − tanh(0.2 ∗ Ii (t)).
Figure 11 shows the calculation results when the value of I0 is changed so that the distribution of opinions concentrates at 0, +10 and −10. From this, at least in terms of calculation, we can represent a situation in which people's opinions are freely guided by the media.
Fig. 11. Calculated opinion trajectories and opinion distribution for N = 300. The human network is assumed to be a random network with link connection probability 0.5. The Dij are set randomly between −1 and 1, so that every person trusts or distrusts everyone else. The media effect is set to (a) A(t) = − tanh(0.2 ∗ Ii (t)), (b) A(t) = − tanh(0.2 ∗ (Ii (t) − 10)) and (c) A(t) = − tanh(0.2 ∗ (Ii (t) + 10)).
4 Discussion
This study investigated the effects of media using a new theory of opinion dynamics that incorporates both trust and distrust in human relationships. Here we consider the computational results shown above. In the simulations, the number of persons is set to 300, and the trust coefficient Dij connecting the persons is set to a random number between −1 and 1. Thus, Dij takes a positive value or a negative value with a probability of 50% each. The person-to-person connections form a random network in which only part of the complete graph is actually connected. Except in Fig. 9, we set the link probability between nodes to 50%.

Figure 3 is the simulation result for a media effect that works uniformly on people. The results show that the stronger the media effect, the more the opinions of people in the society are skewed toward the media. Figure 4 shows that the influence of the media induces people with positive opinions to become more positive and those with negative opinions to become more negative, resulting in a division of social opinion.

The case where the media effect works only on those who do not hold strong opinions is a very interesting situation for sociology, because the mass media is mainly used for people with such weak opinions. Usually, people with strong opinions are not affected by the mass media. In the calculations of Figs. 6, 7, 8 and 9, we simulate the case in which the mass media works only on people with weak opinions. We set a threshold value A0 and assume that the media effect works only on people with weak opinions that are −A0 or more and A0 or less. For Figs. 6 and 7, we check the dependence on the threshold value A0. If the threshold is very small, such as A0 = 1, the mass media effect seems to be negligible. However, for a large threshold such as A0 = 15 in Fig. 7, almost all opinions move in the positive direction and very few people hold negative opinions. This means that even the opinions of those who hold strong negative opinions fluctuate through communication with other people, and once they fall into the range of weak opinions, they are guided to positive opinions by the mass media working there.

This is confirmed by the results shown in Fig. 9, where we check the dependence on the link probability between nodes in the random network. If the probability of connecting nodes is low, communication becomes sparse and people are hardly affected by the opinions of others. Therefore, at a probability of 2%, only the part of the opinion distribution from −A0 to A0 is guided by the mass media and moves toward positive opinions. However, at a probability of 30% or 50%, the opinion of a person with a negative opinion fluctuates under the influence of other people's opinions and falls into the range of weak opinions, where the mass media working in that range guides it toward positive opinions.

If we set a suitable value of ci for each person separately, depending on that person's properties, the simulation becomes a simulation of microtargeting [3], which is now discussed mainly as a political problem. Thus, our theory can be used in simulations both for advancing microtargeting and for preventing it.

In Figs. 10 and 11, we try calculations that make people's opinions converge to a desired opinion. The function used to make the whole opinion distribution converge to I0 is the simple function below.

$$
A(t) = -\tanh\bigl(\alpha\,(I_i(t) - I_0)\bigr)
\tag{10}
$$
In the real world, it is difficult to make people converge freely on a particular opinion as this function does. However, as an application of the theory used in this paper, such guidance of people toward a particular opinion can be calculated. It can be applied to cases where people are led by government propaganda.
5 Conclusion
This paper examines the impact of the mass media on society, using a new theory of opinion dynamics that includes both trust and distrust in human relationships. It is shown that when the mass media affects the society uniformly, the opinion distribution is biased in the direction guided by the media. A media effect that acts only on those with weak opinions can change their opinions in the direction that the media leads. This can be applied as a model of microtargeting. In addition, it is shown that when the connections between people are strong, even people whose opinions are contrary to the media are guided to some extent. It was also shown that it is possible to make people's opinions converge to an intended opinion. Elucidating the mechanism of media effects based on the theory of opinion dynamics presented in this paper will provide a computational social science method that is useful for research on society and mass media.

Acknowledgment. This work was supported by JSPS KAKENHI Grant Number JP19K04881.
References
1. Baum, M.B., Potter, P.B.K.: The relationships between mass media, public opinion, and foreign policy: toward a theoretical synthesis. Annu. Rev. Polit. Sci. 11, 39–65 (2008)
2. McCombs, M.: Setting the Agenda: Mass Media and Public Opinion. John Wiley & Sons, New York (2018)
3. Barbu, O.: Advertizing, microtargeting and social media. Procedia - Soc. Behav. Sci. 163, 44–49 (2014)
4. Deffuant, G., Neau, D., Frédéric, A., Gérard, W.: Mixing beliefs among interacting agents. Adv. Complex Syst. 3(15), 87–98 (2000)
5. Weisbuch, G., Deffuant, G., Amblard, F., Nadal, J.P.: Meet, discuss and segregate!. Complexity 7(3), 55–63 (2002)
6. Hegselmann, R., Krause, U.: Opinion dynamics and bounded confidence models, analysis, and simulation. J. Artif. Soc. Soc. Simul. 5(3) (2002)
7. Castellano, C., Fortunato, S., Loreto, V.: Statistical physics of social dynamics. Rev. Mod. Phys. 81(2), 591–646 (2009)
8. Sîrbu, A., Loreto, V., Servedio, V.D.P., Tria, F.: Opinion dynamics: models, extensions and external effects. In: Loreto, V., et al. (eds.) Participatory Sensing, Opinions and Collective Awareness. Understanding Complex Systems, 42 pp. Springer, Cham (2017)
9. Ishii, A., Kawahata, Y.: Opinion dynamics theory for analysis of consensus formation and division of opinion on the internet. In: Proceedings of The 22nd Asia Pacific Symposium on Intelligent and Evolutionary Systems (IES2018), pp. 71–76 (2018). arXiv:1812.11845 [physics.soc-ph]
10. Ishii, A.: Opinion dynamics theory considering trust and suspicion in human relations. In: Morais, D.C., Carreras, A., de Almeida, A.T., Vetschera, R. (eds.) GDN 2019. LNBIP, vol. 351, pp. 193–204. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-21711-2_15
11. Ishii, A., Kawahata, Y.: Opinion dynamics theory considering interpersonal relationship of trust and distrust and media effects. In: The 33rd Annual Conference of the Japanese Society for Artificial Intelligence, 33 JSAI2019 2F3-OS-5a-05 (2019)
12. Okano, N., Ishii, A.: Sociophysics approach of simulation of charismatic person and distrusted people in society using opinion dynamics. In: Sato, H., Iwanaga, S., Ishii, A. (eds.) Proceedings of the 23rd Asia-Pacific Symposium on Intelligent and Evolutionary Systems, pp. 238–252. Springer (2019)
13. Okano, N., Ishii, A.: Isolated, untrusted people in society and charismatic person using opinion dynamics. In: Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence 2019, Thessaloniki, Greece (2019)
14. French, J.R.P.: A formal theory of social power. Psychol. Rev. 63(3), 181–194 (1956)
15. Harary, F.: A criterion for unanimity in French's theory of social power. In: Cartwright, D. (ed.) Studies in Social Power. Institute for Social Research, Ann Arbor (1959)
16. Abelson, R.P.: Mathematical models of the distribution of attitudes under controversy. In: Frederiksen, N., Gulliksen, H. (eds.) Contributions to Mathematical Psychology. Holt, Rinehart, and Winston, New York (1964)
17. De Groot, M.H.: Reaching a consensus. J. Amer. Statist. Assoc. 69, 118–121 (1974)
18. Lehrer, K.: Social consensus and rational agnoiology. Synthese 31, 141–160 (1975)
19. Chatterjee, S.: Reaching a consensus: Some limit theorems. In: Proc. Int. Statist. Inst. 159–164 (1975)
20. Tanimoto, K., Kita, H., Mitukuni, A.: Opinion choice model in public meeting by using evolutionary game theory infrastructure. Plan. Rev. 18, 89–95 (2001)
21. Galam, S.: Rational group decision making: a random field Ising model at T = 0. Physica A 238, 66 (1997)
22. Sznajd-Weron, K., Sznajd, J.: Opinion evolution in closed community. Int. J. Mod. Phys. C 11(6), 1157–1165 (2000)
23. Sznajd-Weron, K., Tabiszewski, M., Timpanaro, A.M.: Phase transition in the Sznajd model with independence. Europhys. Lett. 96(4), 48002 (2011)
24. Galam, S.: Application of statistical physics to politics. Phys. A: Stat. Mech. Appl. 274, 132–139 (1999)
25. Galam, S.: Real space renormalization group and totalitarian paradox of majority rule voting. Phys. A: Stat. Mech. Appl. 285, 66–76 (2000)
26. Galam, S.: Are referendums a mechanism to turn our prejudices into rational choices? An unfortunate answer from sociophysics. In: Morel, L., Qvortrup, M. (eds.) The Routledge Handbook to Referendums and Direct Democracy. Taylor & Francis, London (2017). Chapter 19
27. Galam, S.: The Trump phenomenon: an explanation from sociophysics. Int. J. Mod. Phys. B 31, 1742015 (2017)
28. Jager, W., Amblard, F.: Uniformity, bipolarization and pluriformity captured as generic stylized behavior with an agent-based simulation model of attitude change. Comput. Math. Organ. Theory 10, 295–303 (2004)
29. Jager, W., Amblard, F.: Multiple attitude dynamics in large populations. In: Presented in the Agent 2005 Conference on Generative Social Processes, Models and Mechanisms, 13–15 October 2005 at The University of Chicago (2005)
30. Kurmyshev, E., Juárez, H.A., González-Silva, R.A.: Dynamics of bounded confidence opinion in heterogeneous social networks: concord against partial antagonism. Physica A 390, 2945–2955 (2011)
31. Ishii, A., Arakaki, H., Matsuda, N., Umemura, S., Urushidani, T., Yamagata, N., Yoshida, N.: The 'hit' phenomenon: a mathematical model of human dynamics interactions as a stochastic process. New J. Phys. 14, 063018 (2012)
32. Ishii, A., Kawahata, Y.: Sociophysics analysis of the dynamics of peoples' interests in society. Front. Phys. (2018). https://doi.org/10.3389/fphy.2018.00089
ORTIA: An Algorithm to Improve Quality of Experience in HTTP Adaptive Bitrate Streaming Sessions
Usman Sharif1(B), Adnan N. Qureshi2, and Seemal Afza2
1 Punjab University, University of Central Punjab, Lahore, Pakistan
[email protected]
2 University of Central Punjab, Lahore, Pakistan
Abstract. Adaptive Bitrate (ABR) streaming is used at large scale in online video streaming to improve viewer perception. Modern online streaming relies on bitrate adaptation algorithms that run in video players. To improve the viewer's quality of experience, an ABR algorithm has to select high-quality video according to the available network throughput and deliver it with minimal stops and low bitrate fluctuation. However, current ABR algorithms suffer from stops and bitrate fluctuations because they do not consider the influence of atypical values, known in statistics as outliers, during throughput prediction. Here we propose a new ABR algorithm, the Outlier-Removal-based Throughput Improvement Algorithm (ORTIA), which provides a realistic forecast of throughput under changing network conditions, thus minimizing stops and bitrate fluctuations. We examined several statistical methods for detecting outliers. We used the Inter-Quartile Range method to identify throughput values that differ significantly from the other values in the dataset and used it to maximize the precision of throughput predictions. Results from real-time experimentation show that stops were eliminated when our algorithm was used. In addition, a comparison with the latest-generation ABR algorithm, DYNAMIC, shows that ORTIA is superior to the current ABR algorithm. Simply put, our algorithm delivers an excellent viewer Quality of Experience in adaptive transmission of real video. It is noteworthy that our algorithm beats DYNAMIC, which is now officially part of the DASH-IF reference player and is used by video content suppliers in production environments.

Keywords: Outliers · Adaptive video streaming · Bitrate adaptation · DASH
© Springer Nature Switzerland AG 2021. K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 29–44, 2021. https://doi.org/10.1007/978-3-030-55190-2_3

1 Introduction

Online video production companies are growing rapidly, which has led to reports of high internet activity. Online video streaming now rivals traditional TV in terms of content usage. HTTP Adaptive Streaming (HAS) is currently the primary technology for video transmission over the Internet. Adaptive streaming has changed the shape of the internet: the time spent consuming video content on a TV set is steadily shrinking. Instead of conventional television programs, another model, the playback of videos on mobile phones or tablets, has emerged. In fact, a new routine has been established around the concept of watching whatever one wants, whenever one wants and wherever one wants [1, 2]. This transformation is supported by the huge growth of mobile cellular networks and the deployment of 4th generation (4G) and 5th generation (5G) mobile networking, which can support mobile video on demand with tolerable latency.

These facts are reinforced by a report issued by Cisco, which states that wired devices are responsible for 48% of internet traffic, which is less than half. In addition, wireless devices are predicted to consume more traffic than wired devices over 2017–2022. Wireless and mobile video traffic is also expected to be far larger in 2022 than in 2017; in short, it will be around 71% of all Internet traffic. IP video traffic will be equal to 82% of overall IP traffic, which will thus consist mostly of video content [3].

Video streaming has never been an easy process, because the Web was not designed for running applications that need tight control over the Quality of Service (QoS). Back in 2013, around 26.9% of video streaming sessions over the Internet faced playback disruption. Moreover, more than 43.3% of the sessions experienced low resolution and around 4.8% could not start [4]. Viewers are exposed to channel constraints, especially on cellular networks, such as quality of service that varies because of interruptions, cross talk and multipath fading. Considering that the viewer's quality of experience can be enhanced through bitrate adjustment, we focus on bitrate adjustment based on throughput.
Specifically, we performed empirical analyses of a bitrate adaptation algorithm that depends on network throughput. Because adaptive streaming continuously adjusts the transmission parameters to changing network conditions, the viewing experience is smoother, with fewer interruptions during playback and more intelligent use of the available network resources. Bitrate adaptation requires that each video be divided into segments lasting a few seconds. The content of each segment is encoded in several versions to obtain different video qualities, which are recorded in the MPD file and stored on the server. Every quality index is defined by the average of its associated bitrate and is later used by the client to select a quality index for a given average bitrate. In particular, HTTP-based Adaptive Streaming (HAS) [5] is considered the transmission standard over the Internet. One of the industry standards is the open Dynamic Adaptive Streaming over HTTP, also known as MPEG-DASH [6, 7]. The DASH architecture is shown in Fig. 1. The HAS player starts a new video by loading the manifest file (MPD). The manifest file contains information about the quality indices of the videos held by the server, along with the duration of the video file and its periods. The HAS player estimates the next segment's quality index based on the state of the network and the fill level of the playback buffer. Many authors refer to this mechanism as the rate-prediction algorithm. The goal of the rate-prediction algorithm is to improve the viewer's experience of watching video over the internet.
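The client-side step of picking a quality index for a given average bitrate can be sketched as a lookup in a sorted bitrate ladder. This is a minimal sketch, not the reference player's code. Only the two highest thresholds (4953 and 9914 kbps) are values the paper quotes later for its throughput slabs; every other ladder entry and all names below are hypothetical placeholders.

```python
from bisect import bisect_right

# Minimum average bitrate (kbps) needed for quality indices 0..9.
# ASSUMPTION: only 4953 and 9914 come from the text; the remaining
# thresholds are made-up placeholders for illustration.
HYPOTHETICAL_LADDER_KBPS = [0, 150, 300, 600, 1000, 1500, 2200, 3300, 4953, 9914]


def select_quality_index(avg_bitrate_kbps: float,
                         ladder=HYPOTHETICAL_LADDER_KBPS) -> int:
    """Return the highest quality index whose threshold does not exceed
    the given average bitrate (index 0 if the bitrate is below all)."""
    index = bisect_right(ladder, avg_bitrate_kbps) - 1
    return max(index, 0)
```

With this ladder, 9914 kbps or more maps to index 9 and anything from 4953 to 9913 kbps maps to index 8, matching the two slabs the paper spells out.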
ORTIA: An Algorithm to Improve QoE in HTTP ABR Streaming Sessions
The concept of quality of experience (QoE) was introduced so that the various phenomena that affect a viewer's perception of video content can be evaluated impartially [8]. The key objective of QoE is to deliver high bitrate quality while keeping re-buffering and video stalls to a minimum. The major factors that affect QoE are listed below:
(1) Number of pauses
(2) Number of bitrate switches
(3) Average bitrate
It is important to note that these three factors cannot be considered independently and must be handled simultaneously. For example, always choosing the lowest video quality reduces playback pauses and bitrate switches and allows the lowest playback latency. On the other hand, always choosing the highest video quality usually leads to an unacceptably large number of playback interruptions. Much research has been done to improve QoE, and many earlier studies focused on "Bitrate Adaptation".
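The three QoE factors listed above can be tallied from a per-segment log of a playback session. The sketch below assumes a hypothetical log format (one selected bitrate per downloaded segment plus a stall counter); it does not combine the factors into a single score, since the text gives no weighting for them.

```python
def summarize_qoe(bitrates_kbps, stall_count):
    """Summarize the three QoE factors for one playback session.

    bitrates_kbps -- bitrate selected for each downloaded segment (hypothetical log)
    stall_count   -- number of playback pauses observed in the session
    """
    # A switch is counted whenever two consecutive segments differ in bitrate.
    switches = sum(
        1 for previous, current in zip(bitrates_kbps, bitrates_kbps[1:])
        if previous != current
    )
    average = sum(bitrates_kbps) / len(bitrates_kbps) if bitrates_kbps else 0.0
    return {
        "pauses": stall_count,
        "bitrate_switches": switches,
        "average_bitrate_kbps": average,
    }


session = summarize_qoe([300, 300, 600, 600, 300], stall_count=1)
```

A session that always plays the lowest quality would report zero switches and few pauses but also a low average bitrate, which is the trade-off described above.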
Fig. 1. System architecture of MPEG-DASH [10].
Bitrate adaptation approaches are partitioned into three principal classes:
1) Client-based
2) Server-based
3) Network-assisted
In recent papers, a significant part of the literature treats client-based adaptation as the most effective way to improve the viewer's QoE, as indicated in the DASH standard [9]. These methods adapt to varying network conditions by measuring the data transfer capacity of a link. They consider several input variables, two of which are the current bandwidth and the buffer size. The player feeds these inputs to the algorithm to pick the proper representation for the next segment to be downloaded. Such methods help avoid transmission issues such as video stops, frequent video quality changes and buffer depletion during a video streaming session. These methods work to achieve:
(1) Minimal re-buffering events caused by depletion of the playback buffer
(2) Minimal startup delay in videos
(3) High average bitrate quality relative to the network's bandwidth
(4) Low fluctuation in video quality caused by frequent transitions.
An IEEE survey [10] released in early 2019 summarizes results for the above classes. We further divide client-based bitrate adaptation algorithms into two fundamental classes: 1) available-throughput-based adaptation and 2) playback-buffer-based adaptation. In throughput-based adaptation, the client determines the bitrate from the network's data transmission capacity. In playback-buffer-based adaptation, the media player uses the fill level of the playback buffer as the input variable: for each buffer fill level there is a corresponding bitrate representation that is used to select the next segment. DYNAMIC [11] is a hybrid of the buffer-based and throughput-based algorithms. DYNAMIC has attracted a great deal of attention from researchers and software houses, and it is used as the benchmark in numerous IEEE research publications. DYNAMIC is regarded as state of the art for rate adaptation at large scale and has also been implemented by the DASH industry. To compare our proposed Outlier-Removal-based Throughput Improvement Algorithm (ORTIA) fairly with DYNAMIC, we used the same environment. In the bitrate adaptation algorithm of the DASH-IF reference player, the average bitrate of the last N segments is calculated and multiplied by 0.9 as a safety margin. The size of N depends on the state of the network; its minimum value is four and its maximum can equal the number of downloaded segments. The algorithm for determining N is given below.
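The averaging step described above (mean of the last N segment throughputs, scaled by 0.9) can be sketched as follows. This is not the reference player's code and it does not reproduce the rule for choosing N; here N is simply passed in as a hypothetical `window` argument and clamped to the stated minimum of four.

```python
def safe_throughput(segment_throughputs_kbps, window, safety_factor=0.9):
    """Mean throughput of the last `window` segments times a safety factor.

    segment_throughputs_kbps -- measured throughput of each downloaded segment
    window                   -- N, the number of recent segments to average
                                (how N is chosen is outside this sketch)
    safety_factor            -- 0.9 in the text
    """
    if not segment_throughputs_kbps:
        raise ValueError("at least one downloaded segment is required")
    # The text gives four as the minimum N; slicing caps N at the number
    # of segments downloaded so far.
    window = max(4, window)
    recent = segment_throughputs_kbps[-window:]
    return safety_factor * sum(recent) / len(recent)
```

For example, `safe_throughput([200, 200, 200, 200, 6000], window=4)` averages the last four samples, including the 6000 kbps spike, before applying the 0.9 factor.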
Consider a network situation such as the one in Fig. 2(a), where the network is in a steady state and abruptly begins to vary. Afterwards, the network becomes steady again and later starts varying once more. In such situations, the measured throughput does not give an accurate picture of network capacity. Consequently, DYNAMIC and related algorithms have specific shortcomings, for example stalls and bitrate switches caused by the mismatch between the actual throughput and the estimated throughput. There is therefore considerable room for new algorithms that can fill these gaps. Considering these deficiencies, we propose a modified DYNAMIC with a robust computation that can provide better QoE by minimizing stalls and bitrate switches under such network conditions. Ten levels of video quality represent different levels of throughput, as shown in Fig. 3. When the average bitrate reaches or exceeds 9914 kbps, video quality index 9 is downloaded; when it is at least 4953 kbps but at most 9913 kbps, quality index 8 is selected; and so on in the same pattern down to video level 0. For the network throughput in Fig. 2(a), the averaging method, which is already used in the MPEG-DASH player [6, 7], predicts the quality of the next segment to be downloaded. Figure 2(b) shows the average bitrate calculated after each downloaded segment, while Table 1 shows the video quality selected for that bitrate. Figure 2(a) shows that the bandwidth increases considerably at segment 5, which simultaneously raises the average bitrate and the level of video quality. Right after the 5th segment, the network returns to its previous state, in which the bandwidth is low, but the averaging function still indicates a high bitrate.
The contrast with the actual throughput is evident at the 6th segment. An average always lies between the smallest and the largest number in a series, so if one very large number is included, the average differs sharply from the other numbers because of that anomaly. Thus, as long as outliers remain in the averaging window, the average does not reflect the real throughput and reports a high throughput that is not achievable. In Table 1, this situation occurs from the sixth to the ninth segment. That is why an end user is confronted with a stall when playing video.

Fig. 2. (a) Throughput received for each segment. (b) Average throughput (kbps) and safe throughput (kbps) for the next segment after each downloaded segment.

Table 1. Video quality index chosen from the throughput slabs of Fig. 3 for the safe throughput predicted at every segment.

Seg. #    Quality index
1         0
2         0
3         0
4         0
5         4
6         3
7         3
8         3
9         3
10        0

Fig. 3. Throughput slabs and the corresponding video quality indices.
In this scenario, it can be concluded that the speed of 6000 kbps disturbs the throughput average. Looking at segment 7 and later, it is easy to see that if the outliers are left out of the average formula, the average is approximately 200, which is closer to the current throughput. This problem cannot be solved merely by using a sliding-window average or a weighted sliding-window average. The sliding-window average is the methodology previously employed in the literature. However, this scheme does not produce the required results when the window contains one or more outliers. The weighted average is not well suited to such cases either. For example, if heavy weights are assigned to the earlier segments, the outlier distorts the average, and the effect remains visible until the window ends. If instead heavy weights are assigned to the most recent segments and lighter weights to the earliest segments in the window, the average is not expected to rise instantly. But once the outlier coincides with the most heavily weighted segment as the window moves, its weight gets heavier and the average begins to increase, contrary to the actual throughput.
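The effect of a single spike on a sliding-window average can be reproduced with a few lines. The throughput series below is hypothetical: a steady 200 kbps link with one 6000 kbps sample at segment 5, chosen only to mirror the scenario discussed above, and the window length of four is likewise an assumption.

```python
def sliding_mean(values, window):
    """Mean of the last `window` values, recomputed after each new sample."""
    means = []
    for end in range(1, len(values) + 1):
        recent = values[max(0, end - window):end]
        means.append(sum(recent) / len(recent))
    return means


# Hypothetical per-segment throughput in kbps; segment 5 is the outlier.
throughputs_kbps = [200, 200, 200, 200, 6000, 200, 200, 200, 200, 200]
estimates = sliding_mean(throughputs_kbps, window=4)
```

In this series the estimate stays inflated for as long as the spike remains inside the window, even though every later sample is back at 200 kbps, and it only drops back once the spike has slid out.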
2 Methodology

We now propose another scheme to reduce the impact of outliers. The ideal way to limit the large disruption that outliers cause to the average is to exclude the outliers from the data being averaged. We adopt the customary definition of an outlier: "An outlier is an exceptional value which differs considerably from all other members from a sample in the data set" [12]. The most widely used techniques for finding outliers include: (i) Z-Score (ii) Inter Quartile Range (IQR).

2.1 Z-Score

A Z-score is a numeric measure used in statistics to describe the relationship between a value and the mean of a group of values. It is measured in units of standard deviations. A Z-score of 0 indicates that the data point equals the mean, while a Z-score of 1.0 indicates that the value is one standard deviation from the mean. Z-scores can be positive or negative: a positive value shows that the score is above the mean and a negative value shows that it is below the mean. The Z-score is given in terms of the data point, the mean and the standard deviation:

$$
z_i = \frac{x_i - \bar{x}}{s}
$$

where z_i is the Z-score of the ith data point in the given set, x_i is the value of the ith data point, \bar{x} is the mean of the data set and s is the standard deviation. Any data point with a Z-score greater than three or less than negative three is treated as an outlier. This threshold is based on the empirical rule: in normally distributed data, practically all values (approx. 99.7%) lie within three standard deviations of the mean, as depicted in Fig. 4.
ORTIA: An Algorithm to Improve QoE in HTTP ABR Streaming Sessions
Fig. 4. Normal distribution curve showing the percentages of data within the 1st, 2nd and 3rd standard deviations from the mean.
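The Z-score rule of Sect. 2.1 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names `z_scores` and `z_outliers` are hypothetical, and the use of the sample standard deviation (n − 1 in the denominator) is an assumption.

```python
from statistics import mean, stdev


def z_scores(values):
    """Return the Z-score of every value (sample standard deviation assumed)."""
    if len(values) < 2:
        return [0.0 for _ in values]  # a spread cannot be estimated from one point
    m = mean(values)
    s = stdev(values)  # sample standard deviation, n - 1 in the denominator
    if s == 0:
        return [0.0 for _ in values]  # all values identical, nothing deviates
    return [(x - m) / s for x in values]


def z_outliers(values, threshold=3.0):
    """Return the values whose absolute Z-score exceeds the threshold."""
    return [x for x, z in zip(values, z_scores(values)) if abs(z) > threshold]


# Throughputs (kbps) of the first four segments of the example scenario.
throughputs = [100, 100, 200, 5000]
print([round(z, 2) for z in z_scores(throughputs)])  # [-0.51, -0.51, -0.47, 1.5]
print(z_outliers(throughputs))                        # []
```

With only four samples the 5000 kbps segment has a Z-score of about 1.5, so the ±3 rule does not flag it when it arrives.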
2.2 Inter Quartile Range
The IQR is a measure of variability based on dividing a data set into quartiles. Quartiles partition a rank-ordered data set into four equal parts. The values that separate the parts are called the first, second and third quartiles, and they are denoted Q1, Q2 and Q3 respectively.
• Q1 is the middle value (median) of the first 50% of the ordered data set.
• Q2 is the median value of the set.
• Q3 is the middle value (median) of the second 50% of the ordered data set.
• IQR = Q3 − Q1
Note: The IQR shows how the middle 50% of the data is spread.
IQR rule for outliers: The IQR can be used to identify outliers as follows. Multiply the IQR by 1.5. Add this value to the third quartile Q3; any value greater than the result (the upper cut-off) is a suspected outlier. Subtract the same value from the first quartile Q1; any value smaller than the result (the lower cut-off) is also a suspected outlier. Note that this is a rule of thumb.
Let us now apply both rules to the scenario above. Table 3 shows the results of outlier detection with the Z-score method and Table 4 shows the results of the IQR method. In the Z-score method, a data point is identified as an outlier, and marked in the table, when its score is less than −3 or greater than 3. The Z-scores are recalculated on the arrival of every new segment, as can be seen in Table 3. At the arrival of the fourth segment, no outlier was flagged because the Z-score of every point lay between −3 and 3. The fourth segment was flagged only at the arrival of the thirteenth segment, but detection at that point is of no use: by then the fourth segment is no longer included in the formula for the average throughput. In other words, it no longer lies in the averaging window.
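The IQR rule described above can be sketched as follows. The function names are hypothetical, and the quartiles are computed with linear interpolation (`method="inclusive"` in Python's `statistics.quantiles`), which is an assumption: the text does not say which quartile convention was used, and other conventions give slightly different cut-offs.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) cut-offs: Q1 - k*IQR and Q3 + k*IQR."""
    if len(values) < 2:
        return float("-inf"), float("inf")  # too few points to estimate quartiles
    # Interpolated quartiles; the interpolation method is an assumption.
    q1, _q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def iqr_outliers(values, k=1.5):
    """Return the values that fall outside the IQR cut-offs."""
    lower, upper = iqr_bounds(values, k)
    return [x for x in values if x < lower or x > upper]


# Throughputs (kbps) of the first four segments of the example scenario.
throughputs = [100, 100, 200, 5000]
print(iqr_bounds(throughputs))    # (-1850.0, 3350.0)
print(iqr_outliers(throughputs))  # [5000]
```

Unlike the Z-score rule, this check already flags the 5000 kbps sample on the arrival of the fourth segment, because the quartiles are far less sensitive to a single extreme value than the mean and standard deviation are.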
U. Sharif et al.
In Table 4, the IQR method is used to detect outliers, and all of them are detected successfully, because in every case the outlier falls outside the range bounded by the lower and upper cut-off values. More specifically, every time a new segment arrives we recompute the lower and upper cut-offs and then compare the throughputs of all segments against this range to find the outliers. Comparing the two methods, the Z-score method flags only the throughput of the fourth segment, and only later, which gives no advantage. It does not flag the fourteenth and fifteenth segments, which are also outliers because of their very large size compared with the rest of the data. In particular, it flags a data point as an outlier only once that point is no longer in the window used to compute the average, so the Z-score method brings no benefit here. The IQR method, on the other hand, identifies the fourth, fourteenth and fifteenth segments as outliers immediately, as soon as each of these data points arrives. The IQR method therefore gives the better result.

Table 2. Average throughputs of different methods for each segment.
Seg. #   Throughput (kbps)   Average   Average Z-score   Average IQR
1        100                 100       100               100
2        100                 100       100               100
3        200                 133       133               133
4        5000                1350      1350              133
5        400                 1160      1160              200
6        300                 1017      1017              220
7        800                 1133      1133              317
8        600                 1217      1217              400
9        400                 1250      1250              450
10       500                 500       500               500
11       450                 508       508               508
13       200                 492       492               492
14       5000                1136      1136              464
15       6000                1879      1879              464
16       200                 1669      1669              431
17       250                 1625      1625              425
Table 2 shows that the Z-score method gives no visible improvement over the DYNAMIC averaging strategy, whereas the IQR averaging strategy shows visible improvements over both. We choose the IQR strategy because of its flexibility and its robustness to network throughput outliers. Since neither the simple average nor the weighted average can handle this situation, we present a new algorithm, the Outlier-Removal based Throughput Improvement Algorithm (ORTIA), shown in Fig. 5. Its idea is to remove the outliers before averaging the last N segments. In our comparison this scheme outperforms the best method implemented in the DASH reference player. ORTIA takes a throughput history array as input. In the first step it computes N, the number of most recent throughput observations to use, from Algorithm 1. In the second step it removes the outliers from the throughput history array. In the third step it restricts the array to the latest N observations, and finally it computes and returns the average of this shortlisted set.
2.3 Test Environment
A workstation with NetLimiter 4 Pro for bandwidth control and a fast Internet connection was used to play the “Big Buck Bunny” video from a digital video dataset [14], which is available online at the following URL: https://dash.akamaized.net/akamai/bbb_30fps/bbb_30fps.mpd. Note: we played the video multiple times on different networks, for example at the workplace, at home and at the college, and the results presented below exclude the first 30 s of every session. The network bandwidth was limited to 30 kbps and was randomly released and limited again to 30 kbps with NetLimiter 4 Pro. We collected the results by playing the video under the same conditions for the two algorithms, the DYNAMIC method and the ORTIA method, recording the overall quality of the video session for each iteration of each algorithm.
For each iteration we recorded the average video quality and the number of stops; these values, together with the overall quality of the video sessions and the average number of stops per session, are given in Table 5.
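The three ORTIA steps described in Sect. 2 (obtain N, remove outliers from the throughput history, average the latest N remaining observations) can be sketched as below. This is a hedged illustration, not the algorithm of Fig. 5 itself: the function names are hypothetical, N is passed in as a parameter because Algorithm 1 is not reproduced here, and the interpolated quartile convention is an assumption.

```python
from statistics import mean, quantiles


def remove_outliers(history, k=1.5):
    """Drop every observation outside [Q1 - k*IQR, Q3 + k*IQR]."""
    if len(history) < 2:
        return list(history)  # too few points to estimate quartiles
    # Interpolated quartiles; the interpolation method is an assumption.
    q1, _q2, q3 = quantiles(history, n=4, method="inclusive")
    spread = k * (q3 - q1)
    return [x for x in history if q1 - spread <= x <= q3 + spread]


def ortia_average(history, n):
    """Average of the latest n observations after outlier removal.

    `n` stands in for the value that Algorithm 1 would supply.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    cleaned = remove_outliers(history)  # step 2: remove outliers
    window = cleaned[-n:]               # step 3: keep the latest n observations
    return mean(window)                 # final step: average the shortlisted set


# Throughput history (kbps) up to segment 9 of the example scenario, with N = 6 assumed.
history = [100, 100, 200, 5000, 400, 300, 800, 600, 400]
print(ortia_average(history, 6))  # 450
```

Under these assumptions the sketch returns 450 kbps for segment 9, the value listed in the "Average IQR" column of Table 2, whereas a plain average over a window that still contains the 5000 kbps sample is far higher.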
3 Results and Discussion
The test results were gathered by playing the video at several locations and are presented here for analysis. Table 5 lists the factors we consider in order to make a fair comparison under real-life conditions. Specifically, we compare the number of stops with the ORTIA algorithm against the number of stops with the DYNAMIC algorithm, which is based on a naïve average. Table 5 shows that the video quality of a session is the same for both methods in a network whose throughput peaks are outliers. However, the number of stops is considerably lower with ORTIA than with DYNAMIC. This improvement is achieved by suppressing the outliers under variable network conditions. Specifically, the DYNAMIC algorithm averages 5.6 stops per session (roughly 6), while the proposed ORTIA algorithm averages only 2.
Table 3. Segments with their throughputs and Z-score values against each segment.
[Table: S. #, Throughput (Kbps) and the Z-score of every segment, recomputed at the arrival of each new segment, for segments 1 to 17. The Z-score of the fourth segment (5000 kbps) rises from 1.5 at its arrival to 3.14 at the arrival of segment 13.]
Table 4. Segments with their throughputs, lower and upper cut-off values against each segment. 1 represents outlier and 0 represents non-outlier.
[Table: S. #, Throughput (Kbps), Q1, Q3, IQR, Lower cut-off value, Upper cut-off value, and the outlier flag (0 or 1) of every segment at the arrival of each new segment, for segments 1 to 17.]
The proposed algorithm thus reduces the number of stops by about 64.3% ((5.6 − 2)/5.6). Hence, the overall service quality is considerably improved in constrained networks. The average quality of the video sessions remains unaltered. This is due to the insufficient-buffer rule: when DYNAMIC [11] requests a high-quality segment because of a throughput outlier, the request is dropped, because the client player actually receives low throughput and the buffer ahead runs out before the download is complete (the default buffer-ahead value in dash.js is 20 s [13]). So, when the insufficient-buffer rule finds that the buffer will run out before the segment is fully downloaded, the download is cancelled and the rule requests a low-quality segment instead to prevent rebuffering. This does not change the quality, because the cancelled segment is never sent to the media player for on-screen display, so the overall quality of the session is the same with DYNAMIC and ORTIA.

Fig. 5. Outlier-Removal based Throughput Improvement Algorithm (ORTIA)

Table 5. Video sessions along with video quality level and number of stops occurring in each iteration, using the state-of-the-art strategy (naive average) and our own (ORTIA).
Sr.#      Video quality of session      Video quality of session   Stops using      Stops using
          using naive average (kbps)    using ORTIA (kbps)         naive average    ORTIA
1         200                           200                        7                2
2         200                           200                        5                2
3         200                           200                        6                2
4         200                           200                        4                2
5         200                           200                        4                1
6         200                           200                        5                2
7         200                           200                        5                2
8         200                           200                        6                2
9         200                           200                        6                3
10        200                           200                        5                1
11        200                           200                        5                2
12        200                           200                        5                2
13        200                           200                        4                1
14        200                           200                        9                3
15        200                           200                        8                3
Average   200                           200                        5.6              2
4 Conclusion
ORTIA outperforms DYNAMIC when outliers occur, and in the absence of outliers ORTIA and the DYNAMIC method perform similarly, because the two approaches then give the same throughput estimate. We therefore recommend that the online media-player industry (YouTube, Sling, Netflix, Hulu, etc.) employ the suggested technique to improve the end-user experience. This will in particular enhance the QoE of viewers in areas where broadband service is not available, for example in developing countries such as Bangladesh, India and Pakistan. Although the proposed algorithm is implemented in JavaScript, it also applies to other environments that support adaptive streaming, such as Microsoft Smooth Streaming and Apple HLS. We are also working on the method proposed in SARA, together with predicting the average size of the segment to be downloaded. This is the basis of our future work.
References
1. ComScore: U.S.: Digital Future in Focus. White Paper, comScore, Inc., Reston, VA (2014)
2. Conviva: Internet TV: Bringing Control to Chaos. White Paper, Conviva, Foster City, CA (2015)
3. Cisco Visual Networking Index: Forecast and Trends, 2017–2022. White Paper
4. Conviva: Viewer Experience Report. White Paper, Conviva, Foster City, CA (2014)
5. Andelin, T., Chetty, V., Harbaugh, D., Warnick, S., Zappala, D.: Quality selection for dynamic adaptive streaming over HTTP with scalable video coding. In: Proceedings of the 3rd Multimedia Systems Conference (MMSys), pp. 149–154 (2012)
6. Rainer, B., Petscharnig, S., Timmerer, C., Hellwagner, H.: Statistically indifferent quality variation: an approach for reducing multimedia distribution cost for adaptive video streaming services. IEEE Trans. Multimedia 19(4), 849–860 (2017)
7. Li, Z., et al.: Probe and adapt: rate adaptation for HTTP video streaming at scale. IEEE J. Sel. Areas Commun. 32(4), 719–733 (2014)
8. ITU-T: Vocabulary for Performance and Quality of Service, Amendment 2: New Definitions for Inclusion in Recommendation ITU-T P.10/G.100. Recommendation, ITU-T (2008)
9. Stockhammer, T.: Dynamic adaptive streaming over HTTP: standards and design principles. In: Proceedings of the Second Annual ACM Conference on Multimedia Systems (MMSys), pp. 133–144 (2011)
10. Bentaleb, A., Taani, B., Begen, A.C., Timmerer, C., Zimmermann, R.: A survey on bitrate adaptation schemes for streaming media over HTTP. IEEE Commun. Surv. Tutorials 21(1), 562–585 (2019)
11. Spiteri, K., Sitaraman, R., Sparacio, D.: From theory to practice: improving bitrate adaptation in the DASH reference player. In: Proceedings ACM Multimedia Systems Conference (MMSys’18) (2018)
12. Hodge, V., Austin, J.: A survey of outlier detection methodologies. Artif. Intell. Rev. 22, 85–126 (2003)
13. dash.js DASH Industry Forum. https://github.com/Dash-Industry-Forum/dash.js/wiki. Accessed 01 April 2019
14. ITEC: Dynamic Adaptive Streaming over HTTP (2016). http://www.itec.uni-klu.ac.at/ftp/datasets/DASHDataset2014/. Accessed 23 March 2019
Methods and Means for Analyzing Heat-Loss in Buildings for Increasing Their Energy Efficiency
Veneta Yosifova
Institute of Information and Communication Technologies, Bulgarian Academy of Science, Acad. G. Bonchev St., 25A, 1113 Sofia, Bulgaria
Abstract. Energy efficiency in buildings is a popular topic when it comes to reducing worldwide energy consumption, the release of harmful gases and global climate change, since buildings consume around 40% of the world's energy supply. Heat losses account for a significant share of a building's poor energy performance. The paper reviews the main problems with heat losses in buildings that reduce their energy efficiency. The procedures for determining the energy performance of old buildings are described. Thermal imaging is presented together with its contribution to discovering heat losses, thermal bridges, insulation problems and other sources of inefficiency. Experiments with a thermal camera are carried out on building faults that affect energy efficiency.
Keywords: Heat loss · Energy Efficiency · Thermography · Thermal imaging · Thermal bridges
© Springer Nature Switzerland AG 2021 K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 45–54, 2021. https://doi.org/10.1007/978-3-030-55190-2_4

1 Introduction
Energy efficiency is one of the most relevant topics of the 21st century, as a means of reducing energy consumption, air pollution and climate change. The 2012 Energy Efficiency Directive sets out a set of measures to help the EU reach its 20% energy efficiency target by 2020, and after the update in 2016 this target was increased to 30% by 2030. The directive requires all EU countries to use energy more efficiently at all stages of the energy chain, from production to final consumption. The building sector consumes more than 40% of all energy, which makes it an important sector in which efficiency measures should be taken into consideration along with the Building Code. Energy performance is usually expressed as energy used per unit of surface area per year: kWh/m²/year. It can be evaluated before construction starts, from the characteristics of the building's materials and construction elements, assuming that all materials and constructions are well assembled and that the building functions as planned. Outside natural factors such as sunlight, seasonal characteristics and geographical location also change the heating and cooling needed [1]. All these theoretical estimates deviate once the actual building is finished and starts to operate, which is why additional non-destructive analysis can be made to determine the real energy efficiency. To reduce energy consumption and increase energy efficiency in any building, one of the components that should be taken into consideration is heat loss. In old buildings, heat losses are dominated by the transfer of heat through building components. The better a construction conducts heat, the greater the heat losses [2]. For that reason a detailed heat-loss analysis should be carried out. With modern thermographic imaging, heat losses can be detected and measures for their elimination can be applied. Among the most common construction faults causing heat loss are thermal bridges due to bad construction and element assembly, bad insulation, cheap or old materials, etc. To better understand heat losses and their prevention, a broad analysis should be carried out that considers many examples from on-site research, in order to identify the main problems and start planning their prevention.
2 Heat Loss in Buildings
Buildings act as a barrier between the indoor and outdoor climate, and their architecture and construction should be matched to the local climate conditions and seasons. For example, in warm places it is important to have wide openings and shaded spaces in order to achieve good ventilation, while in a cold region the sun must be allowed to enter the rooms as much as possible for natural heating and lighting. Heat always flows from the hotter to the colder surface, and the transfer occurs whenever there is a difference between the temperatures of the two surfaces. The main energy losses of a building are distributed over its envelope as follows: 35% through the walls, 25% through windows and doors, 25% through the roof and 15% through the floor. These heat losses occur by convection, conduction and radiation [3].
Conduction is direct heat flow through matter (molecular motion). It results from direct physical contact between elements. Heat always moves from hot to cold and takes the shortest route. The denser a substance, the better it conducts. Because air has low density, the share of heat transferred by conduction through air is comparatively small.
Convection is the transport of heat within a gas or liquid, caused by the actual flow of the material itself (mass motion). In building spaces, natural convection heat flow is largely upward; this is called free convection. When the flow is mechanically induced (by a fan or a heater) it becomes forced convection.
Radiation is the transmission of electromagnetic rays through space, and it is invisible. Infrared rays lie between light and radar waves (between 3 and 15 microns of the spectrum). Any material with a temperature above absolute zero emits infrared radiation; the rays leave its surface in all directions in straight lines until they are reflected or absorbed by another object, such as walls, floors or furniture. These invisible rays travel at the speed of light; they have no temperature, only energy. Heating an object excites its surface molecules, causing them to give off infrared radiation. When these infrared rays strike the surface of another object, they are absorbed and heat is produced, which spreads throughout the mass by conduction. The heated object then emits infrared rays from its surfaces exposed to air by radiation [4].
2.1 Calculating Heat Losses
The calculation of the heat losses H, in watts, can be simplified as the sum of the heat losses from every building element, each given by the equation:
$$H = U \cdot A \cdot dT \qquad (1)$$
where:
U = Thermal transmittance [W/(m² K)]
A = Area of surface [m²]
dT = Temperature difference [K]
The formula does not include the heat loss through thermal bridges. Since the surface areas are architectural decisions and the temperature difference depends on the climate, the only way to reduce the heat losses is to design the building elements with a lower U-value [5].
2.2 Heat Loss Due to Ventilation and Infiltration
The exchange of outdoor air with the air already in a building can be divided into two broad classes: ventilation and infiltration. Ventilation is the intentional introduction of air from the outside into a building; it can be further subdivided into natural ventilation and forced ventilation. Natural ventilation is the flow of air through openings such as windows and doors, driven by natural and/or artificially produced pressure differences. Forced ventilation is the intentional movement of air into and out of a building using fans and vents; it is also called mechanical ventilation. Infiltration is the flow of outdoor air into a building through cracks and other unintentional openings and through the normal use of exterior doors and windows. Infiltration is also known as air leakage into a building. Like natural ventilation, infiltration and exfiltration are driven by natural and/or artificial pressure differences. These modes of air exchange affect energy consumption, air quality and thermal comfort differently, and each can vary with weather conditions, building operation and use. Although one mode may be expected to dominate in a particular building, all must be considered for the proper design and operation of an HVAC system [6]. Heat loss due to air change inside a building is calculated using:
$$H = \rho \cdot V \cdot C \cdot dT \qquad (2)$$
where:
ρ = Density of air [kg/m³]
V = Ventilation/infiltration rate [m³/s]
C = Specific heat capacity [J/(kg K)]
dT = Temperature difference inside to outside [K]
Heat loss due to air change inside a building can be simplified to:
$$H = 0.33 \cdot N \cdot v \cdot dT \qquad (3)$$
where:
N = Number of air changes per hour
v = Volume of room [m³]
dT = Temperature difference inside to outside [K]
This is based on the following assumptions: density of air ρ = 1.2 kg/m³ at 20 °C and specific heat capacity C = 1000 J/(kg K), so that ρ·C/3600 ≈ 0.33 when the air change rate is given per hour.
When designing buildings, the volume of the building (v) is set by the owner's needs. The temperature difference (dT) between inside and outside is set by the climate. Only the number of air changes per hour (N) can be changed by the designer, which means that to minimise heat loss the number of air changes must be reduced. However, a minimum amount of air change is necessary to provide good indoor air quality. To minimise air infiltration, all uncontrolled air flow paths should be closed off. Heat loss is inevitable, but it is the architect's task to control how quickly heat is lost, through the use of appropriate building materials and techniques that establish and maintain a watertight building enclosure with high levels of insulation [5, 7].
2.3 Analysing Heat Loss in Old Buildings
In existing constructions, determining the heat loss is a long process, especially when it comes to old buildings and architectural sites. Every certification of a building starts with an audit of its characteristics concerning energy efficiency. The audit includes a detailed description of the building, an analysis of the condition of the technical systems that provide and manage energy, an evaluation of the energy consumption and the energy balance, and the determination of possible energy savings. Based on that, measures for cutting energy consumption can be defined, together with a plan for their consistent application. The certification of a building usually takes place after construction and before use, sale or renting. The calculated energy characteristics are compared to the specified norms based on the building's purpose, construction and climate zone.
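The relations of Sects. 2.1 and 2.2 (Eqs. 1 to 3) can be illustrated with a short calculation. The sketch below is only an example under stated assumptions: the function names are hypothetical, and the wall area, U-value, room volume, air change rate and temperature difference are made-up inputs, not measurements from the paper.

```python
AIR_DENSITY = 1.2          # kg/m^3 at 20 °C, as assumed in the text
AIR_SPECIFIC_HEAT = 1000   # J/(kg K), as assumed in the text


def transmission_loss(u_value, area, delta_t):
    """Eq. (1): H = U * A * dT, in watts, for one building element."""
    return u_value * area * delta_t


def ventilation_loss(flow_rate, delta_t,
                     density=AIR_DENSITY, specific_heat=AIR_SPECIFIC_HEAT):
    """Eq. (2): H = rho * V * C * dT, in watts, with V in m^3/s."""
    return density * flow_rate * specific_heat * delta_t


def ventilation_loss_simplified(air_changes_per_hour, volume, delta_t):
    """Eq. (3): H = 0.33 * N * v * dT, in watts."""
    return 0.33 * air_changes_per_hour * volume * delta_t


# Hypothetical inputs: a 20 m^2 wall with U = 0.5 W/(m^2 K), a 50 m^3 room,
# 0.5 air changes per hour and a 20 K inside-to-outside difference.
wall = transmission_loss(0.5, 20.0, 20.0)             # about 200 W
flow = 0.5 * 50.0 / 3600                              # air changes/h -> m^3/s
exact = ventilation_loss(flow, 20.0)                  # about 166.7 W
approx = ventilation_loss_simplified(0.5, 50.0, 20.0) # about 165 W
print(round(wall, 1), round(exact, 1), round(approx, 1))
```

The small gap between the last two results comes only from rounding ρ·C/3600 = 0.333 down to 0.33 in Eq. (3).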
The indicators for energy consumption are divided into three main groups:
1. Indicators for the surrounding construction elements and their physical characteristics, such as their heat transfer coefficient (U), efficiency and geometry.
2. Indicators for the energy consumption of built-in systems, such as heating, ventilation, water heating, lighting, system heat losses and annual consumption.
3. Indicators for the overall energy consumption of the built-in systems.
The energy characteristics may also be determined from the annual CO2 emissions caused. Climate conditions should be taken into consideration, such as the monthly average temperature, the duration of the heating season, the degree days (DD) for the climate zone and the daily (24 h) sunshine. The overall energy balance is based on the energy losses and gains from those calculations [8].
2.4 Determining U Value of Materials
Since there is a temperature difference between the heated space and the environment, heat is transferred from the warmer to the colder side by conduction through the surrounding elements of the building (walls, roof, floor). The thermal properties of the enclosing elements are characterized by the coefficient of heat transfer U and, accordingly, the resistance to heat transfer R. They depend on the structure, density, humidity and pressure in the materials. The two indicators are reciprocals of each other:
$$R = 1/U \quad [\mathrm{m^2 K/W}] \qquad (4)$$
$$U = 1/R \quad [\mathrm{W/m^2 K}] \qquad (5)$$
The principle of the method for calculating the heat transfer resistance is to determine the thermal resistance of the individual homogeneous layers of the building element (wall, floor, roof) and also the heat transfer resistance between the air and the surfaces (internal and external) of the element. The total resistance of a multilayer surrounding element is determined as the sum of the resistances of its layers, each calculated by the formula:
$$R_i = d_i / \lambda_i \quad [\mathrm{m^2 K/W}] \qquad (6)$$
where:
d_i is the thickness of layer i [m]
λ_i is the calculated thermal conductivity of the material of layer i, i.e. the ability of the material to transmit heat [W/m °C]; it can be determined from the national standards for each material [9]. Examples are given in Table 1.

Table 1. Thermal conductivity of materials.
Material             Thermal conductivity [W/m °C]
Concrete             1.65
Brick wall           0.79
Bitumen insulation   0.27
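Equations (4) to (6) and the values of Table 1 can be combined into a small U-value calculation. The following is a sketch under assumptions: the function names, the layer thicknesses and the two surface resistances `r_si` and `r_se` (0.13 and 0.04 m²K/W, commonly used conventional values) are illustrative and are not given in the text.

```python
# Thermal conductivities from Table 1, in W/(m °C).
THERMAL_CONDUCTIVITY = {
    "Concrete": 1.65,
    "Brick wall": 0.79,
    "Bitumen insulation": 0.27,
}


def layer_resistance(thickness_m, conductivity):
    """Eq. (6): R_i = d_i / lambda_i, in m^2 K/W."""
    return thickness_m / conductivity


def u_value(layers, r_si=0.13, r_se=0.04):
    """Eq. (5): U = 1 / R for a multilayer element.

    `layers` is a list of (material, thickness in m) pairs. The internal and
    external surface resistances r_si and r_se are assumed values.
    """
    r_layers = sum(layer_resistance(d, THERMAL_CONDUCTIVITY[material])
                   for material, d in layers)
    r_total = r_si + r_layers + r_se
    return 1.0 / r_total


# Hypothetical wall: 25 cm of brick and 10 cm of concrete.
wall = [("Brick wall", 0.25), ("Concrete", 0.10)]
print(round(u_value(wall), 2))  # about 1.83 W/(m^2 K)
```

Adding a layer with low thermal conductivity raises the total resistance and therefore lowers U, which is the design lever identified in Sect. 2.1.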
3 Thermography
Thermography is the only technology known so far that is able to deliver an overview of the whole building envelope as a complete system. Thermal images, or thermograms, display the amount of infrared energy emitted, transmitted and reflected by an object. Thermography measures surface temperatures using infrared video and still cameras (IRC). Infrared technologies are used in almost every industry, with applications in commerce, medicine, the military and, more recently, building analysis. Working with thermal imaging cameras has many benefits: it gives a detailed image of the situation, locates and identifies problems, measures temperatures, finds faults and saves time. It can serve many industries, for example high-temperature processes using furnaces, where it determines small temperature differences in heated components [10]. Thermal imaging cameras for building applications are powerful and non-invasive tools for monitoring and diagnosing the condition not only of buildings but of every kind of construction that can be accessed. With a thermal imaging camera, problems can be identified early, so that they can be documented and corrected before they become more serious and more costly to repair. A building diagnostics inspection with a thermal imaging camera can help to:
– Visualize energy losses
– Detect missing or defective insulation
– Source air leaks
– Find moisture in insulation, in roofs and walls, both in the internal and the external structure
– Detect mold and badly insulated areas
– Locate thermal bridges [11]
3.1 Experimental Work with Thermal Camera
In order to verify the possibilities of thermography concerning heat loss and energy efficiency, a series of experiments was carried out with a Flir P640 thermal camera to identify construction faults related to a building's energy efficiency. In that way we can better understand the heat-loss problems that occur while buildings are in use. The Thermo Vision SDK utility software is used, which controls access to the camera and the image data. Programs can be developed to use the camera in different applications, including civil engineering, in order to:
– Create communication between apps and camera;
– Capture images via FireWire™ and Ethernet interfaces;
– Adjust the camera configuration parameters and focusing;
– Manage camera calibration;
– Send other commands to the camera;
– Generate real-time 16-bit temperature images;
– Conduct tests for image analysis and processing.
The analysis is carried out with the specialized FLIR Reporter software, designed for image manipulation. This makes the use of the thermal camera more efficient and allows in-depth analysis to prevent future accidents and problems, or to manage and remedy them before they appear [12].
Thermal Bridges in Window Frames
The exterior of the façade of IICT's building in Sofia was monitored with the Flir P640 thermal camera. On the first floor the windows are wooden and date from the construction of the building. On the second floor the windows have been replaced with PVC windows. The photos taken with the thermal camera and the recorded temperature difference clearly show thermal bridges at the old wooden windows (Fig. 1). This means that warm air from the heated room flows freely through defects in the window, which inevitably leads to greater heating demand and occupant discomfort. No such phenomenon is observed at the renovated PVC windows, so there the energy required for heating is used with as little loss as possible. After the old window frames are replaced, a new observation will be made to determine the improvement in energy efficiency.
Fig. 1. Thermal bridges in window frames
Rooftop Thermal Bridges
In order to guarantee the quality of any finishing work during construction, it is necessary to carry out quality control. A number of irregularities can be detected with the thermal camera, such as compromised thermal insulation, thermal bridges, poor waterproofing and other structural defects. Problems of this type degrade the quality of the building and its energy efficiency. Poor roof waterproofing around a chimney was observed with the thermal camera in an office building in Sofia (Fig. 2). When the roof has an irregular surface or has structural elements such as chimneys, ventilation shafts and drainage pipes, the continuity of waterproofing and thermal insulation can easily be disrupted. This results in loss of heat, water collection, condensation and subsequently mold and faults in the rest of the insulation.
Fig. 2. Thermal bridge around chimney on rooftop
Insulation Performance
If the wall thermal insulation is incorrectly installed, its insulating ability is significantly reduced. Defects in the applied thermal insulation, otherwise invisible under the applied cover, were detected by means of a thermal camera. Fig. 3 shows gaps between the thermal insulation sheets, which lead to thermal bridges, and poor distribution of the attachment material. All this compromises the integrity of the insulation and its energy efficiency [13]. The research shows that construction defects introduced during the building process can significantly decrease the energy performance of a building. Heat loss that is otherwise invisible to the human eye can be detected with a thermal camera, which gives the opportunity for effective repair work. This is why thermographic analysis is crucial for determining the energy efficiency of existing buildings and constructions. It is also a way to validate the builder's work and to secure the efficient energy performance of new buildings, depending on their function, during their future use.
Fig. 3. Faults in wall insulation
4 Conclusion
With the increasing demand for energy-efficient buildings, the energy performance of old buildings should be re-examined. One of the biggest problems is that old building techniques and methods were not adapted for efficient energy consumption, which leads to significant heat and energy losses. With methods and means for observing the façade and construction elements of old buildings, such as thermal imaging, such faults can be detected and a plan for their repair can be carried out. After the current energy performance of a building is determined, a series of improvements can be made to increase its energy efficiency, such as new insulation, integration of systems using renewable energy resources, and integrated smart building technologies for observing and managing energy consumption.
Acknowledgments. The paper is partially supported and financed by National Science Fund of Ministry of Education and Science of Republic of Bulgaria, by Project for junior basic researchers and postdocs – 2019, Contract № KP-06-M37/2 from 06.12.2019 and by Bulgarian Ministry of Education and Science under the National Research Programme “Young scientists and postdoctoral students” approved by DCM # 577/17.08.2018.
References
1. Mangematin, E., Pandraud, G., Roux, D.: Quick measurements of energy efficiency of buildings. C. R. Phys. 13(4), 383–390 (2012)
2. Volker, K., Hans, E., Helmut, K., Thomas, L., Johannes, W., Andreas, W.: The Significance of Thermal Insulation. KEA Climate Protection and Energy Agency of Baden-Württemberg GmbH
3. Eduardo Souza: Learn How to Avoid Energy Loss in Your Buildings, ArchDaily, June 2019. https://www.archdaily.com/915546/learn-how-to-avoid-the-main-sources-of-energy-loss-in-your-home. Accessed 12 Dec 2019
4. Heat gain/loss in buildings. ProTek-USA. https://www.protek-usa.com/pdf-new/Heat-GainLoss-Buildings.pdf. Accessed 12 Dec 2019
5. Tipperary Energy Agency. Fabric and ventilation heat loss, September 2011. https://tippenergy.ie/wp-content/uploads/2011/09/Module-2.3-Fabric-and-ventilation-heat-loss.pdf. Accessed 12 Dec 2019
6. Ventilation and Infiltration. 2005 ASHRAE Handbook—Fundamentals (SI), Chapter 27. https://moodle-arquivo.ciencias.ulisboa.pt. Accessed 12 Dec 2019
7. Hall, F.: Building Services & Equipment, vol. 1, 3rd edn., Chapter 4, ISBN 0-582-23652-5
8. Ministry of Regional Development and Public Works, Bulgaria. Energy efficiency agency, Guidelines for the implementation of energy efficiency regulations for new and existing buildings (September 2005)
9. Ministry of Regional Development and Public Works, Bulgaria. Energy efficiency agency, Regulation no. 7 on heating and energy savings in buildings (December 2004)
10. Stoimenov, N., Dimitrov, L., Karastoyanov, D., Georgieva, V., Klochkov, L.: Experimental study of furnace temperature for metallization of polypropylene, Part II. Temperature differences analysis in heating unit for silver metallization of polypropylene. In: Problems of Engineering Cybernetics and Robotics, vol. 67, pp. 27–33. Prof. Marin Drinov Publishing House of Bulgarian Academy of Sciences, Sofia (2016). ISSN 0204-9848
11. Yosifova, V.: Study of the energy efficiency for buildings and facilities. Insulation materials and detection of construction defects with thermal camera. In: International Conference Robotics, Automation and Mechatronics’ 15 (RAM 2015), Sofia, Bulgaria, 05 November 2015, pp. 37–43 (2015). ISSN 1314-4634
12. Chikurtev, D., Grueva, M., Karastoyanov, D.: Intelligent Service Mobile Robots – Localization, Navigation and Observation Using IR Thermal Camera, ADP 2016, June, Sozopol, Bulgaria, pp. 200–207 (2016). ISSN 1310-3946
13. Yosifova, V.: Methods and means for observing the energy efficiency of buildings and constructions. Dissertation (December 2018)
The Energy Conservation and Consumption in Wireless Sensor Networks Based on Energy Efficiency Clustering Routing Protocol
Gaudence Stanslaus Tesha(B) and Muhamed Amanul
School of Telecommunication Engineering, Xidian University, Xi’an 710071, China
[email protected]
Abstract. A wireless sensor network consists of irreplaceable sensor nodes equipped with limited energy resources. Minimizing energy consumption is one of the main problems in wireless sensor networks. This paper proposes an approach to energy conservation and consumption in wireless sensor networks based on a clustering routing protocol. The protocol adopts clustering technology to solve the hot spot problem and proposes a cluster head priority mechanism to reduce the energy consumption of head nodes in the cluster partition. First, a new network structure model built from cluster members and cluster heads is combined with an existing energy consumption model to construct a new method that determines the optimum for total energy conservation. Finally, we simulated this protocol on a network of 100 nodes. Simulation results show that the proposed protocol performs better than LEACH in terms of first node death, network lifetime, and throughput.
Keywords: Wireless sensor network · Energy consumption · Clustering routing protocol · Efficiency energy · Energy conservation · Lifetime
1 Introduction
As a symbol of change for global electronics companies, a wireless sensor network (WSN) is a distributed self-organizing network that integrates data acquisition, processing and communication. WSNs are a necessity for future application areas such as ecological monitoring or smart cities, in which hundreds or thousands of sensor nodes are deployed. WSNs provide numerous advantages: the network can be set up without fixed infrastructure, it is suitable for hard-to-reach places such as the sea, mountains, rural areas or deep forests, and it is flexible when an additional node is needed at short notice. In addition, a WSN is cheap to implement because much wiring is avoided, and it can be accessed through a centralized monitor.
© Springer Nature Switzerland AG 2021. K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 55–72, 2021. https://doi.org/10.1007/978-3-030-55190-2_5
Because of the advantages of
WSNs, they have a wide range of applications that describe WSNs at different levels. Examples include human behavior recognition [1], critical navigation services [2], healthcare [3], and model driven architecture [4]. In particular, WSNs have been used mostly for MAC protocol applications [5], traffic and monitoring [6–8], and smart city management [9]. The main problem for WSNs is that sensor nodes are powered by limited batteries that are difficult to recharge or replace after deployment, so energy conservation and consumption is a very important issue. In recent years, researchers around the world have proposed various kinds of clustering routing protocols for WSNs. The primary and backup cluster head (CH) selection model sets the LEACH protocol aside and mainly explains CH selection and how two CHs are initiated and send data to the base station (BS) [10,11]. Moreover, recent theoretical work shows that a two-CH model prolongs the lifetime of WSNs using energy harvesting, transfer and conservation [12]. A CH selection model with at least two CHs, in the form of a primary and a backup CH, is therefore necessary. Similarly, a CH model with inter-cluster communication, in which the sensor field is partitioned into a number of equal-size rectangular units, is usually sufficient to understand detailed CH selection methods [13]. For large-scale WSNs, energy-efficient routing protocols balance energy consumption in homogeneous proactive networks [14–16]. However, which energy-balancing schemes benefit the lifetime of the whole WSN is still not well understood [17]. Here we address this question by focusing on the cluster routing protocol.
The cluster protocol, one of the most well-studied approaches to energy consumption, traditionally plays an essential role in WSN energy consumption, for example in unequal clustering protocols [18], rotating energy efficient clustering for heterogeneous devices (REECHD) [19,20] and unequal double cluster head (UDCH) [21]. In the original idea for WSNs, each sensor node collects data and sends it directly to the BS. In a large WSN, data congestion and collisions occur when a large number of sensor nodes transmit at the same time, which causes more retransmissions and leads to high energy consumption. To solve these problems, an energy-efficient clustering routing protocol (EECRP) is proposed in this paper. In this protocol, clustering is adopted to address the distance both from the cluster member (CM) to the BS and from the CH to the BS. A CH closer to the BS takes on more forwarding tasks, so its cluster should be smaller to reduce the overhead of the CH. Meanwhile, the cluster formation model considers the residual energy of nodes within the cluster partition region: the node with the most residual energy in the region becomes its CH. If the energy of the CH falls below a threshold, it is replaced by another node with high residual energy. In addition, to balance energy consumption, a network model based on a cost function of cluster partition region distance and residual energy is applied in each round. In large-scale WSNs, multi-hop communication is usually used to save energy. In this scenario, the nodes with more
residual energy within the cluster partition region take on more forwarding tasks from the CMs, acting as CHs towards the BS. This results in heavy CH overhead for data aggregation, fusion and transmission to the BS, so that these nodes run out of power earlier than other nodes in the cluster partition region. This creates a large number of disconnections in the WSN, that is, CMs that cannot connect to their CH and CHs that cannot connect to the BS. Communication then fails because the CHs run out of power, which distorts the energy balance of the entire WSN. This is referred to as the hot spot problem. In fact, while all CMs have the same average residual energy for communication traffic, a node selected as CH near the BS carries more traffic than the CMs. LEACH does not take the residual energy of nodes into account, whereas the proposed protocol EECRP provides an efficient CH formation system. At every initialization step, the protocol checks whether the energy of the CH has fallen below a defined threshold before starting the cluster formation process. This ensures that the CH continues its operation and saves energy that would otherwise be wasted. The main innovative points of the paper include the construction of the CH selection mechanism through cluster partition regions, based on the uniformity of the CH distribution, the residual energy, and the distances from the CMs to the BS. Apart from CH selection, regular comparisons of CH residual energy every 1/ρ rounds (ρ is the probability of becoming CH) balance the energy consumption of the WSN, since any CM can be elected as CH in each round according to its residual energy, and avoid the group CH communication used by LEACH.
2 Preliminary
In this section, we introduce LEACH, which is widely used and can be applied in various scenarios. LEACH is a clustering-based protocol that uses randomized rotation of local cluster base stations (cluster heads) to distribute the energy load evenly among the sensors in the network. LEACH uses localized coordination to enable scalability and robustness for dynamic networks, and incorporates data fusion into the routing protocol to reduce the amount of information that must be transmitted to the BS. In addition, LEACH is able to distribute energy dissipation evenly throughout the sensors, doubling the useful system lifetime of the network.
2.1 Low-Energy Adaptive Clustering Hierarchy
The operation of low-energy adaptive clustering hierarchy (LEACH) is cyclic, and each cycle is called a round. Each round starts with a setup phase, in which the clusters are formed and each node decides whether it is a cluster head, followed by a steady-state phase, in which each node transmits information to the CH according to the TDMA schedule fixed during setup.
The setup phase and steady-state phase of LEACH are described in five phases:
– Phase 1: Advertisement phase. Initially, when clusters are being created, each CM decides whether or not to become a CH for the current round. This decision is based on the suggested percentage of CHs for the network and the number of times the CM has been a CH so far. The node n makes this decision by choosing a random number between 0 and 1. If the number is less than a threshold T(n), the CM becomes a CH for the current round. The threshold is set as in Eq. (1):
$$T(n) = \begin{cases} \dfrac{\rho}{1 - \rho \left( r \bmod \frac{1}{\rho} \right)} & \text{if } n \in G \\ 0 & \text{otherwise} \end{cases} \tag{1}$$
where ρ is the desired percentage of CHs (ρ = 0.1), r is the current round, and G is the set of CMs that have not been CH in the last 1/ρ rounds.
– Phase 2: Cluster setup phase. Once a CM has decided which cluster it belongs to, it must inform the CH that it will be a member of the cluster. During this phase all CHs must keep their receivers on.
– Phase 3: Schedule creation. The CH receives all the messages from CMs that would like to be included in the cluster. Based on the number of CMs in the cluster, the CH creates a TDMA schedule telling each CM when it can transmit. This schedule is broadcast back to the CMs in the cluster.
– Phase 4: Data transmission. After the clusters are created and the TDMA schedule is fixed, data transmission from the CMs can begin. Assuming the CMs always have data to send, they send it to the CH during their allocated transmission time. The radio of each non-CH node can be turned off until the node’s allocated transmission time, thus minimizing energy dissipation in these nodes.
– Phase 5: Multiple clusters. Radio is inherently a broadcast medium, so the radio communication in one cluster affects communication in nearby clusters. To reduce interference among clusters, each cluster communicates using a different CDMA code.
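The advertisement phase above can be sketched in a few lines of Python. This is a minimal illustration of Eq. (1) only, not the authors' MATLAB implementation; the function names `leach_threshold` and `elect_cluster_heads`, and the representation of the set G as a Python set of node identifiers, are assumptions made for the example.

```python
import random


def leach_threshold(rho: float, r: int, in_G: bool) -> float:
    """Eq. (1): rho / (1 - rho * (r mod 1/rho)) if the node is in G, else 0."""
    if not in_G:
        return 0.0
    period = round(1 / rho)  # number of rounds in one rotation cycle (1/rho)
    return rho / (1 - rho * (r % period))


def elect_cluster_heads(node_ids, eligible, rho, r, rng):
    """Return the nodes whose random draw in [0, 1) falls below T(n).

    `eligible` plays the role of G: nodes that have not been CH
    during the last 1/rho rounds (hypothetical bookkeeping).
    """
    heads = []
    for n in node_ids:
        if rng.random() < leach_threshold(rho, r, n in eligible):
            heads.append(n)
    return heads


# Example call: 100 nodes, all still eligible, round 3 of the cycle.
nodes = list(range(100))
heads = elect_cluster_heads(nodes, set(nodes), rho=0.1, r=3, rng=random.Random(42))
```

A caller would remove the elected nodes from `eligible` after each round and refill the set when a new cycle of 1/ρ rounds begins.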
2.2 Lifetime and Energy Consumption
The network lifetime depends on the low-energy adaptive clustering hierarchy scheme, in which nodes transmit information to the head of the cluster to which they belong. The WSN is partitioned into clusters managed by cluster heads. The cluster heads aggregate the information received from the non-CH nodes, or cluster members, and forward it to the BS, which then passes it on to the Internet. LEACH is a cluster-based routing protocol whose main aim is to extend the lifetime of the WSN, and it is a good example of a self-adaptive and self-organized protocol. Its general operation is based on rounds, and each round
Fig. 1. A typical architecture of a wireless sensor network.
consists of two phases. Figure 1 depicts a typical WSN architecture with each of its communication components. As can be seen, a typical WSN consists of sensor nodes, a sensing field to be monitored, the Internet, communication links, and a user computer control system. The main advantage of the LEACH protocol is that it reduces the energy consumption of each node in the network and extends the network lifetime by using the CH to collect the data from the members of its cluster. The most questionable parts are the CH selection in LEACH and the energy efficiency of clustering, in which the nodes send information to the CH following the TDMA schedule during the steady-state phase [17]. In general, the shortcomings of the LEACH protocol that need improvement are as follows:
– There is no guarantee that the CH nodes are well distributed through the network. LEACH cannot scale to a large number of nodes.
– The CH is selected randomly, without considering the residual energy of the nodes within the cluster partition. Therefore, nodes with less residual energy can be selected, which causes them to die earlier than nodes with high residual energy.
– Hierarchical clustering problem. Hierarchical clustering is promising for larger networks and could save a lot of energy compared with using the group CHs.
2.3 Cluster Head Selection Principle
LEACH defines the concept of a “round”. A round consists of two phases: initialization and stabilization. To avoid extra processing overhead, the steady state generally lasts for a relatively long time (in this experiment a cycle is 1/P rounds, that is, 10 rounds form one cycle) [22]. During the initialization phase, the CH is generated through the following mechanism. The CM generates a random number temp
between 0 and 1. If the value of the random number is less than the threshold T, the node is selected as the CH, consistent with Eq. (1). T is calculated as shown in Eq. (2):
$$T(i) = \frac{P}{1 - P\left[r \bmod (1/P)\right]}, \quad i = 1, 2, 3, \ldots, n \tag{2}$$
where i is the index of the CM, P is the fraction of CMs that become CH (P = 0.1, that is 10%, in this experiment), and r is the current round number.
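As a worked illustration of Eq. (2), take the experiment's P = 0.1, so that one cycle is 1/P = 10 rounds. The round values r = 0, 5 and 9 below are picked only for illustration and apply to a node that has not yet served as CH in the current cycle:

```latex
\begin{align*}
T\big|_{r=0} &= \frac{0.1}{1 - 0.1 \cdot 0} = 0.1, \\
T\big|_{r=5} &= \frac{0.1}{1 - 0.1 \cdot 5} = 0.2, \\
T\big|_{r=9} &= \frac{0.1}{1 - 0.1 \cdot 9} = 1.
\end{align*}
```

The threshold therefore grows over the cycle, and in the last round of the cycle every node that has not yet been CH is selected with certainty, which is how the rotation spreads the CH role across the nodes.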
3 Proposed Energy Efficiency Cluster Routing Protocol
3.1 Overview
The energy efficiency cluster routing protocol overcomes the limitations of LEACH. It assumes that after each selection of a CH in a round, the whole CH creation process is carried out again. The protocol therefore improves the CH and CM handling by adding a scheme that reduces the power consumption of the WSN. To balance the energy consumption of the sensor nodes in the WSN, a node death decision is made after each round of data transmission. If a node dies within the round, the system returns to rebuild the architecture in step (1); otherwise it continues with step (3). In general, steps (1), (2) and (3) form the initialization process of the protocol, while steps (4) and (5) form the equilibrium process. The main steps are as follows:
Step 1. Build up the WSN architecture.
Step 2. Identify and assign initial energy to the sensor nodes.
Step 3. Select the CH from the CMs based on the residual energy as shown in Eq. (3):
$$\mathrm{Function}(S(i).E) = \frac{\mathrm{distance}(S(i).E)}{d_{iCH} + d_{iBS}} \tag{3}$$
β00 = α0 |1 − α1 |1 when results equal to 11
where S(i).E is the residual energy of node i, $d_{iCH}$ is the distance from node i to the CH, and $d_{iBS}$ is the distance from node i to the BS.
Step 4. For round = 1, allocate the CH depending on its separation from the CMs.
Step 5. For the next round, calculate the node degree from the CMs and compare it with the chosen CH.
Step 6. Calculate the residual energy if S(i).E […]
Step 7. […]
Step 8. […]
[…] 0) cost_function(i) = distance(i) / S(i).E
3: begin
4: Elect as next forwarder the node with the minimum cost function
5: minimum_node = minimum_function; node_number = i; node_select(r) = node_number
6: If the energy of the node is greater than the threshold
7: if (node_number.E > threshold) then
8: Send packet to CH: packet_to_CH = packet_to_CH + 1
9: begin
10: else
11: If the energy of the node is less than the threshold
12: if (node_number.E < threshold) D = sqrt((S(i).xd − sink.x)^2 + (S(i).yd − sink.y)^2)
13: end if
14: while begin do
15: if CM sends packets to CH then
16: PropDelay = PropDelay + delay(i)
17: CH sends aggregated packets to BS
18: end if
19: end while
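The surviving part of the selection procedure above elects, among nodes that still have energy, the one with the smallest ratio of distance to residual energy. A minimal Python sketch of that single step follows. It is an interpretation under stated assumptions, since the beginning of the algorithm is not available: the `Node` class, the function names, and the use of the distance to the sink as `distance(i)` are hypothetical choices for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    node_id: int
    x: float
    y: float
    energy: float  # residual energy S(i).E, in joules (assumed unit)


def distance_to_sink(node: Node, sink_xy) -> float:
    """Euclidean distance, as in D = sqrt((x - sink.x)^2 + (y - sink.y)^2)."""
    return math.hypot(node.x - sink_xy[0], node.y - sink_xy[1])


def select_forwarder(nodes, sink_xy):
    """Return the alive node with minimum cost = distance / residual energy.

    Nodes with no residual energy are skipped; None is returned if all are dead.
    """
    alive = [n for n in nodes if n.energy > 0]
    if not alive:
        return None
    return min(alive, key=lambda n: distance_to_sink(n, sink_xy) / n.energy)


# Example call with a sink at the centre of a 100 x 100 field.
candidates = [Node(1, 10, 10, 0.5), Node(2, 40, 50, 0.5), Node(3, 90, 90, 0.4)]
chosen = select_forwarder(candidates, (50, 50))
```

Dividing by the residual energy means that, at equal distance, a node with more remaining energy gets a lower cost and is preferred, which matches the stated goal of protecting low-energy nodes.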
Before the data initialization process starts, each member of every cluster partition region sends a set of control messages to its CH to indicate its status and cluster identification number. The CH sends a control message to indicate its status within the cluster partition region. Once the data initialization process starts, the CMs that have received the CH status send their information. The energy consumption of a CM within the cluster partition region can be expressed as in Eq. (9):
$$E_{cmt} = k E_s + k E_{TxCH} + k \varepsilon_r d^2 \tag{9}$$
where, in Eq. (9), $E_{cmt}$ is the total energy of a CM, k is the number of data bits (k = 4 kbit in this paper), $E_s$ is the sensing energy, $E_{TxCH}$ is the energy consumed for transmission from CM to CH, $\varepsilon_r$ is the amplification power, and d is the distance from CM to CH. On average, the area occupied by each cluster is $A^2/c$. If the density distribution of CMs in the WSN is ρ(x, y), the expected value of $d^2$ in Eq. (9) is given by Eq. (10):
$$E(d^2) = \iint (x^2 + y^2)\,\rho(x, y)\,dx\,dy = \iint r^2 \rho(r, \theta)\, r\,dr\,d\theta \tag{10}$$
Since the distribution of CMs in the cluster partition region is modeled as a circular area with constant density ρ, Eq. (10) can be written as Eq. (11):
$$E(d^2) = \rho \int_{\theta=0}^{2\pi} \int_{r=0}^{A/\sqrt{\pi c}} r^3\,dr\,d\theta = \frac{\rho A^4}{2\pi c^2} \tag{11}$$
3.5 Distance Formulation Model
The distance of packet transmission from the sensing area to the BS is modeled following [17]. For the actual distance from the CH to the BS, both the free space and the multipath fading channel models need to be analyzed comprehensively, because different factors apply on the CH-to-BS link. To minimize energy consumption and prolong WSN lifetime, we therefore integrate all energy consumption terms as shown in Eq. (12).
Algorithm 2. Cluster data formation algorithm
Require: CH with coordinates and transmitting and receiving energy Et and Er
1: Select node with less residual energy
2: begin
3: All nodes compute residual energy
4: if generated energy Et > Er (required energy) then
5: CH broadcasts to all nodes
6: begin
7: else
8: Send detailed information to CMs
9: end if
10: while begin do
11: if information is sent to common nodes then
12: itr = itr + 1
13: Update information to CH
14: end if
15: end while
$$\text{Minimize } E_{Total} = E_s + E_{Tx} + E_{Rx} + E_{fs} + E_I \tag{12}$$
where $E_{Total}$ is the total energy to transmit k bits to the destination, $E_s$ is the sensing energy of the node, $E_{Tx} + E_{Rx}$ is the transmission and reception energy, and $E_{fs} + E_I$ is the amplification and idle energy. In clustering, sensor nodes are divided into different virtual groups according to a set of rules [23]. Some nodes are CHs, chosen by residual energy, and the others are CMs, as presented in Sect. 3.4. CHs are responsible for managing the CMs: they receive and process the data sent by the CMs and eventually transmit it to the BS. As can be seen from Algorithm 2, to minimize total energy consumption and balance the energy consumption of CMs in the WSN, the paper makes a node death decision after each round of data transmission, based on the distance from the sensing area to the BS. Algorithm 2 indicates that once a node dies, the protocol returns to the initial setup in step (1); otherwise it returns to step (6). In addition, the setup configuration of the protocol is in steps (1), (2) and (3), while from step (6) onward packets are transmitted based on the residual energy in each round until the time limit threshold is met.
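The expected squared CM-to-CH distance from Eqs. (10) and (11) can be checked numerically. The sketch below compares the closed form ρA⁴/(2πc²) with a Monte Carlo average over points drawn uniformly in a disk of radius A/√(πc). It assumes that ρ is the uniform probability density over one cluster, ρ = c/A², which the text does not state explicitly; the function names and the sample values A = 100 and c = 10 are likewise chosen only for illustration.

```python
import math
import random


def expected_d2_closed_form(A: float, c: int) -> float:
    """Right-hand side of Eq. (11) with the assumed density rho = c / A^2."""
    rho = c / A**2  # assumption: uniform density over one cluster of area A^2 / c
    return rho * A**4 / (2 * math.pi * c**2)


def expected_d2_monte_carlo(A: float, c: int, samples: int, rng: random.Random) -> float:
    """Average of r^2 over points uniform in a disk of radius A / sqrt(pi * c)."""
    radius = A / math.sqrt(math.pi * c)
    total = 0.0
    for _ in range(samples):
        # For a uniform point in a disk, r = R * sqrt(u) with u uniform in [0, 1).
        r = radius * math.sqrt(rng.random())
        total += r * r
    return total / samples


closed = expected_d2_closed_form(100.0, 10)
estimate = expected_d2_monte_carlo(100.0, 10, 200_000, random.Random(0))
```

Under the stated assumption the closed form reduces to A²/(2πc), so the expected squared distance shrinks as the number of clusters c grows, which is the lever the clustering scheme uses to reduce the $k \varepsilon_r d^2$ term of Eq. (9).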
4 Simulation Analysis and Performance Evaluation
The simulation analysis compares the LEACH protocol with the improved LEACH in terms of energy consumption. The network lifetime of a WSN is very important because energy is restricted. To investigate effectiveness, the following metrics were considered: network lifetime, measured by the number of dead sensor nodes; energy consumption, measured by the average residual energy; and throughput, measured by the packets sent and received at the BS. The data analysis was implemented in MATLAB. A total of 100 nodes are randomly distributed in a 100 × 100 area, and a BS is placed at the central position (50, 50). The simulation parameters are shown in Table 1.
Table 1. Simulation parameters
Parameter name | Symbol | Value
Number of nodes | n | 100
Distance of BS X axis | m | 500
Random election probability | P | 0.1
Multipath attenuation energy | EDA (nJ) | 5
Distance condition | d0 | 87.7
Data packet size | kbit | 4
Transmit and receive energy per bit | nJ | 50
Free space energy | Efs (pJ/bit/m²) | 100
Attenuation of space energy | Emp (pJ/bit/m⁴) | 0.0013
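To make the Table 1 values concrete, the sketch below plugs them into the first-order radio model that is commonly used alongside LEACH, where the amplifier term follows the free space law below the distance condition d0 and the multipath law above it. The paper names these two channel models but does not print the formulas in this section, so the model itself, the mapping of "transmit and receive energy per bit" to an electronics energy `E_ELEC`, and the reading of 4 kbit as 4000 bits are all assumptions.

```python
# Table 1 values converted to SI units (joules, metres, bits).
E_ELEC = 50e-9       # J/bit, "transmit and receive energy per bit" (assumed mapping)
E_FS = 100e-12       # J/bit/m^2, free space energy as listed in Table 1
E_MP = 0.0013e-12    # J/bit/m^4, attenuation of space energy as listed in Table 1
D0 = 87.7            # m, distance condition taken directly from Table 1
K_BITS = 4000        # bits per packet (assumption: 4 kbit = 4000 bits)


def tx_energy(k_bits: int, d: float) -> float:
    """Energy to transmit k bits over distance d (assumed first-order model)."""
    if d < D0:
        return k_bits * (E_ELEC + E_FS * d**2)   # free space branch, d^2 law
    return k_bits * (E_ELEC + E_MP * d**4)       # multipath branch, d^4 law


def rx_energy(k_bits: int) -> float:
    """Energy to receive k bits (electronics only)."""
    return k_bits * E_ELEC


# Example call: one packet from a CM to a CH that is 10 m away.
per_packet = tx_energy(K_BITS, 10.0) + rx_energy(K_BITS)
```

Subtracting such per-packet costs from each node's residual energy every round is what drives the node death counts reported in the simulation.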
Table 2. Comparisons of node deaths for both protocols
Protocol | First dead node (round) | Last dead node (round)
LEACH | 1389 | 2379
Improved LEACH | 7571 | 11991
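The lifetime gains implied by Table 2 can be checked directly from the round counts. The ratios below are plain arithmetic on the table entries:

```latex
\begin{align*}
\frac{\text{first dead node (improved LEACH)}}{\text{first dead node (LEACH)}} &= \frac{7571}{1389} \approx 5.45, \\
\frac{\text{last dead node (improved LEACH)}}{\text{last dead node (LEACH)}} &= \frac{11991}{2379} \approx 5.04.
\end{align*}
```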
Fig. 4. Node distribution for LEACH
Fig. 5. Node distribution for improved LEACH
Fig. 6. LEACH dead node distribution after 3500 rounds
Fig. 7. Improved LEACH dead node distribution after 14000 rounds
The networks were analyzed based on node deaths from two aspects: the number of rounds until the death of the first node and of the last node, and the number of rounds until network coverage fails. The results for the first dead node can be seen in Table 2. The first node dies after 1389 rounds for LEACH, whereas for improved LEACH it dies after 7571 rounds. The last node dies at 2379 rounds for LEACH and at 11991 rounds for improved LEACH, so the lifetime until the first node dies is increased by about 5 times compared with LEACH. Figures 4 and 5 show the node distributions for LEACH and improved LEACH. Figure 6 shows the node distribution for LEACH after 3500 rounds; most of the nodes are no longer visible because they have died by then. For improved LEACH, Fig. 7 shows that the nodes are no longer visible after 14000 rounds. The number of dead nodes for LEACH
Fig. 8. Node deaths for LEACH (x-axis: rounds; y-axis: number of nodes)
Fig. 9. Node deaths for Improved LEACH
shows that all nodes are dead at about 2300 rounds, while for improved LEACH the last node dies at about 12000 rounds, as shown in Fig. 8 for LEACH and Fig. 9 for improved LEACH. The average energy consumption shows an improvement for the proposed method, decreasing with the number of rounds as indicated in Fig. 10. Thus, for improved LEACH the energy consumption continues to decrease as the rounds progress.
Fig. 10. Network energy change for LEACH (x-axis: rounds; y-axis: number of nodes)
5 Conclusion
This paper proposes an energy efficiency clustering routing protocol for WSNs that improves on the existing LEACH protocol to raise WSN performance in its applications. The clustering routing protocol design reduces the energy consumption of WSNs and improves network lifetime. We have shown the improvement in energy consumption from LEACH to improved LEACH, which increases the WSN lifetime. In comparison with LEACH, our method is better suited to WSNs and needs less time for evaluation. As future work, this study can be extended to the case of energy mobility for enabling WSNs. Further, by introducing additional parameters in the design of the energy clustering routing protocol, the cost function offers another way to improve the network lifetime of WSNs.
References
1. Tao, D., Jin, L., Wang, Y., Li, X.: Rank preserving discriminant analysis for human behavior recognition on wireless sensor networks. IEEE Trans. Ind. Inform. 10, 813–823 (2014)
2. Wang, C., Lin, H., Zhang, R., Jiang, H.: Send: a situation-aware emergency navigation algorithm with sensor networks. IEEE Trans. Mob. Comput. 16, 1149–1162 (2017)
3. Loubet, G., Takacs, A., Dragomirescu, D.: Implementation of a battery free wireless sensor for cyber-physical systems dedicated to structural health monitoring applications. IEEE Access 7, 24679–24690 (2019)
4. Anwar, M.W., Azam, F., Khan, M.A., Butt, W.H.: The applications of model driven architecture (MDA) in wireless sensor networks (WSN): techniques and tools. Future Inf. Commun. Conf. 69, 14–27 (2019). https://doi.org/10.1007/978-3-030-12388-8_2
5. Egea-López, E., Vales-Alonso, J., Martínez-Sala, A., García-Haro, P.J., Pavón-Mariño, P., Delgado, M.B.: A wireless sensor networks MAC protocol for real-time applications. Pers. Ubiquit. Comput. 12, 111–122 (2008)
6. Byamukama, M., Nannono, J.N., Ruhinda, K., Pehrson, B., Nsabagwa, M., Akol, R., Olsson, R., Bakkabulindi, G., Kondela, E.: Design guidelines for ultra-low power gateways in environment monitoring wireless sensor networks. In: 2017 IEEE AFRICON, pp. 1472–1478. IEEE (2017)
7. Shao, X., Wang, C., Zhao, C., Gao, J.: Traffic shaped network coding aware routing for wireless sensor networks. IEEE Access 6, 71767–71782 (2018)
8. Ali, S., Ashraf, A., Qaisar, S.B., Afridi, M.K., Saeed, H., Rashid, S., Felemban, E.A., Sheikh, A.A.: Simplimote: a wireless sensor network monitoring platform for oil and gas pipelines. IEEE Syst. J. 12, 778–789 (2018)
9. Naranjo, P.G.V., Pooranian, Z., Shojafar, M., Conti, M., Buyya, R.: FOCAN: a fog-supported smart city network architecture for management of applications in the internet of everything environments. J. Parallel Distrib. Comput. 132, 1–16 (2018)
10. Xin, G., Yang, W.H., DeGang, B.: EEHCA energy efficiency hierarchical clustering algorithm for wireless sensor networks. Inf. Technol. J. 7, 245–252 (2008)
11. Soua, R., Minet, P.: A survey on energy efficient techniques in wireless sensor networks. In: 2011 4th Joint IFIP Wireless and Mobile Networking Conference (WMNC), pp. 1–9. IEEE (2011)
12. Engmann, F., Katsriku, F.A., Abdulai, J.D., Adu-Manu, K.S., Banaseka, F.K.: Prolonging the lifetime of wireless sensor networks: a review of current techniques. Wirel. Commun. Mob. Comput. 2018, 1–24 (2018)
13. Yang, L., Zhi, Y., Chang, L., Simon, Z., Yang, X.: An unequal cluster-based routing scheme for multi-level heterogeneous wireless sensor networks. Telecommun. Syst. 68(1), 11–26 (2018)
14. Mohamed, R.E., Saleh, A.I., Abdelrazzak, M., Samra, A.S.: Survey on wireless sensor network applications and energy efficient routing protocols. In: Wireless Personal Communications, vol. 101, pp. 1019–1055 (2018)
15. Tam, N.T., Hai, D.T., et al.: Improving lifetime and network connections of 3d wireless sensor networks based on fuzzy clustering and particle swarm optimization. Wirel. Netw. 24, 1477–1490 (2018)
16. Arjunan, S., Sujatha, P.: Lifetime maximization of wireless sensor network using fuzzy based unequal clustering and ACO based routing hybrid protocol. Appl. Intell. Springer. 48(8), 2229–2246 (2018)
17. Pachouri, R., Saraf, S., Jain, A.: Energy efficient clustering technique using leach protocol for wireless sensor network. Int. J. Adv. Res. Comput. Sci. 9, 74–80 (2018)
18. Arjunan, S., Sujatha, P.: A survey on unequal clustering protocols in Wireless Sensor Networks. J. King Saud Univ. - Comput. Inf. Sci. 31, 304–317 (2019)
19. Cacciagrano, D., Culmone, R., Micheletti, M., Mostarda, L.: Energy-efficient clustering for wireless sensor devices in Internet of Things. Perform. Int. Things 7, 59–80 (2019)
20. Han, G., Jiang, X., Qian, A., Rodrigues, J.J., Cheng, L.: A comparative study of routing protocols of heterogeneous wireless sensor networks. Sci. World J., pp. 1–11 (2014). Article ID 415415
21. Zhu, F., Wei, J.: An energy-efficient unequal clustering routing protocol for wireless sensor networks. Int. J. Distrib. Sens. Netw. 15, 1–15 (2019)
72
G. S. Tesha and M. Amanul
22. Koucheryavy, A., Salim, A.: Cluster head selection for homogeneous Wireless Sensor Networks. In: 2009 11th International Conference on Advanced Communication Technology, vol. 3, pp. 2141–2146. IEEE (2009) 23. Yang, H., Wang, X.: ECOCS: energy consumption optimized compressive sensing in group sensor networks. Comput. Netw. 146, 159–166 (2018)
Intelligent Control of Traffic Flows Under Conditions of Incomplete Information
Elena Sofronova and Askhat Diveev
Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, Moscow, Russia
sofronova [email protected], [email protected]
Abstract. An intelligent control of traffic flows under conditions of incomplete information is considered. A method for constructing a model of traffic flows control based on the controlled networks theory is described. The optimal control problem is formulated. The control is searched as a program for traffic lights switching at intersections. The search for the optimal control is performed by the variational genetic algorithm. An example of solving the problem for a road network consisting of four intersections is given.

Keywords: Traffic flows · Optimal control · Traffic flows model

1 Introduction
The traffic flow control problem belongs to the class of artificial intelligence problems. We first propose to treat it as an optimal control problem, which requires a model that adequately describes the control object. One of the main obstacles is that the mathematical models commonly used for traffic do not coincide with the models used in the theory of optimal control. Most of them either do not use ordinary differential equations with a free control vector, or lack a mathematical expression that explicitly relates the control vector to the quantitative characteristics of traffic.

Historically, the simulation of traffic flows has followed two main approaches, deterministic and probabilistic (stochastic) [1,2]. In the deterministic approach, models contain functional relationships between individual indicators, for example, the speed of and the distance between cars in a flow. In stochastic models, traffic flow is considered as a probabilistic process.

Models of traffic flows can be divided into three classes: analogue models, follow-the-leader models, and probabilistic models. In analogue models, the movement of traffic is compared to physical flows, for example, fluid flows in pipes. The resulting models are systems of partial differential equations without explicit control in the right-hand sides. In follow-the-leader models, the relationship between the motion of a following car and that of the leading car is investigated. In probabilistic models, stochastic equations of traffic flows are considered, taking into account the influence of control on their statistical characteristics.

None of these models is fully suitable for stating the optimal control problem in its classical form; otherwise, new optimization problems have to be formulated, most likely in the class of discrete optimization problems related to NP problems. In any case, the models should include mathematical expressions describing the functional dependence of traffic flow parameters on control parameters.

In 2010, a traffic flow model based on the controlled networks theory [3] was developed in [4]. This model was presented in the form of finite difference equations, which correspond most closely to the systems of differential equations used in the theory of optimal control. The resulting model was subsequently widely studied in [5–7]. The model yields accurate quantitative characteristics of traffic flows in a network of urban roads.

The main feature of the model is that it requires the exact values of some parameters. Another feature is that it is always open for modelling large city networks, i.e. a large number of subnetworks can be included in it, although it is not known in advance which of the subnetworks have the greatest impact. Therefore, it is necessary to restrict the model to the part of the network about which information is known and to treat the rest of the network as “external”.

The article contains some key information on the theory of controlled networks in Sect. 2. The description of the proposed traffic flow model is given in Sect. 3. The optimal control problem for this kind of object is formulated in Sect. 4. The search for the optimal control is performed by the variational genetic algorithm given in Sect. 5. An example of solving the problem for a road network consisting of four intersections is given in Sect. 6.

© Springer Nature Switzerland AG 2021. K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 73–87, 2021. https://doi.org/10.1007/978-3-030-55190-2_6
2 Controlled Networks Theory
Controlled networks were introduced and studied in detail in [3]. Here we briefly present only the main ideas and give an example.

A controlled network consists of a base network $G = (V, E)$, where $V = \{v_1, \dots, v_l\}$ is the set of vertices, $E = \{e_1, \dots, e_p\}$ is the set of edges, $e_j = (v_{i_j}, v_{k_j})$, $j = \overline{1, p}$, and a subset $K(G) = \{G_1, \dots, G_q\}$ of the set of its subnetworks, where $G_i = (V, E_i)$, $E_i \subseteq E$, $i = \overline{1, q}$. A subnetwork is obtained by removing several edges from the base network and is called a configuration of the controlled network.

Let us define the control vector that determines the rule for choosing the network configuration:

$$\mathbf{u} = [u_1 \dots u_M]^T, \quad \mathbf{u} \in U = U_1 \times U_2 \times \dots \times U_M, \tag{1}$$
where $u_i \in U_i = \{0, 1, \dots, u_i^+\}$, $u_i^+ \in \mathbb{Z}_+$, $i = \overline{1, M}$, and $\mathbb{Z}_+$ is the set of positive integers.

Each edge of the base network corresponds to a certain element of the control vector. For certain values of a control vector element, the associated edge is removed from the base network, thus determining a network configuration. A control vector element may be associated with several edges of the network. For certain values of such an element, some of the associated edges are removed from the base network, while the others are not.

Let us give a formal description of the controlled network. The structure of the graph of the base network is given by the adjacency matrix of the base network

$$\mathbf{A} = [a_{i,j}], \quad a_{i,j} \in \{0, 1\}, \quad i, j = \overline{1, L}. \tag{2}$$
The elements of $\mathbf{u} = [u_1 \dots u_M]^T$ are related to the edges of the base network by the control matrix

$$\mathbf{C} = [c_{i,j}], \quad c_{i,j} \in \{0, 1, 2, \dots, M\}, \quad i, j = \overline{1, L}. \tag{3}$$

Here, $c_{i,j}$ is either the index of the control vector element associated with the edge of the base network leading from vertex $i$ to vertex $j$, or 0 if there is no such edge; therefore, $c_{i,j} = 0$ whenever $a_{i,j} = 0$, $i, j = \overline{1, L}$.

To specify which values of a control vector element keep the corresponding edge in the base network, we use the matrix of admissible phases

$$\mathbf{F} = [F_{i,j}], \quad F_{i,j} \subseteq U_{c_{i,j}}, \quad i, j = \overline{1, L}. \tag{4}$$
Here, the element $F_{i,j}$ of $\mathbf{F}$ is the set of values of the element $u_{c_{i,j}} \in U_{c_{i,j}} = \{0, 1, \dots, u^+_{c_{i,j}}\}$ of the control vector $\mathbf{u} = [u_1 \dots u_M]^T$ for which the edge from vertex $i$ to vertex $j$ is not removed from the graph of the base network; therefore, $F_{i,j} = \emptyset$ if $a_{i,j} = 0$, $i, j = \overline{1, L}$.

The matrices $\mathbf{A}$, $\mathbf{C}$, and $\mathbf{F}$ allow us to represent the configuration of the base network as a function of the value of the control vector $\mathbf{u} = [u_1 \dots u_M]^T$. The structure of the graph of a subnetwork is given by the adjacency matrix of the controlled network configuration

$$\mathbf{A}(\mathbf{u}) = [a_{i,j}(\mathbf{u})], \quad i, j = \overline{1, L}, \tag{5}$$

where

$$a_{i,j}(\mathbf{u}) = \begin{cases} 1, & \text{if } u_{c_{i,j}} \in F_{i,j}, \\ 0, & \text{otherwise}, \end{cases} \quad i, j = \overline{1, L}.$$

The configuration of the controlled network changes at certain moments called control steps:

$$\mathbf{U}(\cdot) = (\mathbf{u}(1), \dots, \mathbf{u}(K)), \tag{6}$$
where $\mathbf{u}(k) = [u_1(k) \dots u_M(k)]^T$ is the control vector at time $k$, $k = \overline{1, K}$.

Let us define the flow vectors in controlled networks. A flow vector is a vector with real-valued elements, each of which is associated with a certain vertex of the network:

$$\mathbf{x} = [x_1 \dots x_L]^T, \quad x_i \in \mathbb{R}^1_+, \quad i = \overline{1, L}, \quad \mathbf{x} \in \mathbb{R}^L_+ = \underbrace{\mathbb{R}^1_+ \times \dots \times \mathbb{R}^1_+}_{L}. \tag{7}$$
Here, $x_i$ is the traffic flow at vertex $i$, $i = \overline{1, L}$, and $\mathbb{R}^1_+$ is the set of nonnegative real numbers. The flow vector can change its values at every control step $k$ depending on the current configuration of the network.

The controlled network affects the flow vector $\mathbf{x}(k)$. A control vector $\mathbf{u} = [u_1 \dots u_M]^T$ is chosen that determines the network configuration $\mathbf{A}(\mathbf{u}(k))$ for each $k = \overline{0, K}$. Depending on the configuration and on the values of the flow vector at the preceding step, $\mathbf{x}(k-1)$, the values of the flow vector $\mathbf{x}(k)$ are modified.

To describe the constraints imposed on the flow, we specify the matrix of edge capacities

$$\mathbf{B} = [b_{i,j}], \quad b_{i,j} \in \mathbb{R}^1_+, \quad i, j = \overline{1, L}, \tag{8}$$

where $b_{i,j}$ is the capacity of the edge leading from vertex $i$ to vertex $j$; therefore, $b_{i,j} = 0$ when $a_{i,j} = 0$, $i, j = \overline{1, L}$.

The distribution of the flow among the directions specified by the edges leaving a vertex is described by the distribution matrix

$$\mathbf{D} = [d_{i,j}], \quad d_{i,j} \in \mathbb{R}^1_+, \quad i, j = \overline{1, L}, \tag{9}$$

where $d_{i,j}$ is the fraction of the flow that goes from vertex $i$ in the direction of vertex $j$; therefore, $d_{i,j} = 0$ when $a_{i,j} = 0$, $i, j = \overline{1, L}$. The entries of $\mathbf{D}$ should satisfy the equation

$$\sum_{j=1}^{L} d_{i,j} = 1, \quad i = \overline{1, L}. \tag{10}$$
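The change of the flow vector $\mathbf{x}(k) = [x_1(k) \dots x_L(k)]^T$ at the control step $k$, depending on the network configuration $\mathbf{A}(\mathbf{u}(k))$ determined by the control $\mathbf{u}(k) = [u_1(k) \dots u_M(k)]^T$, is described by

$$
\begin{aligned}
\mathbf{x}(k) = \mathbf{x}(k-1) &- \Big( (\mathbf{x}(k-1)\mathbf{1}_L^T) \odot \mathbf{A}(\mathbf{u}(k)) \odot \mathbf{D} \,\dot{-}\, \big( (\mathbf{x}(k-1)\mathbf{1}_L^T) \odot \mathbf{A}(\mathbf{u}(k)) \odot \mathbf{D} \,\dot{-}\, \mathbf{A}(\mathbf{u}(k)) \odot \mathbf{B} \big) \Big) \mathbf{1}_L \\
&+ \Big( (\mathbf{x}(k-1)\mathbf{1}_L^T) \odot \mathbf{A}(\mathbf{u}(k)) \odot \mathbf{D} \,\dot{-}\, \big( (\mathbf{x}(k-1)\mathbf{1}_L^T) \odot \mathbf{A}(\mathbf{u}(k)) \odot \mathbf{D} \,\dot{-}\, \mathbf{A}(\mathbf{u}(k)) \odot \mathbf{B} \big) \Big)^T \mathbf{1}_L,
\end{aligned} \tag{11}
$$

where $\mathbf{1}_L^T = [1 \dots 1]$ is a row of $L$ ones, $\odot$ is the Hadamard (elementwise) matrix multiplication, and $\dot{-}$ is the truncated subtraction, $a \,\dot{-}\, b = \max\{0, a - b\}$, applied elementwise.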
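A componentwise reading of (11) may be helpful. The following is a sketch under the assumption that $\dot{-}$ acts elementwise as truncated subtraction, $a \,\dot{-}\, b = \max\{0, a - b\}$; the symbol $\varphi_{i,j}(k)$ for the flow along the edge $(i, j)$ is introduced here only for illustration and is not part of the original notation. For nonnegative $a$ and $b$,

$$a \,\dot{-}\, (a \,\dot{-}\, b) = \min\{a, b\},$$

so, since $a_{i,j}(\mathbf{u}) \in \{0, 1\}$, the matrix in the outer brackets of (11) has the entries

$$\varphi_{i,j}(k) = a_{i,j}(\mathbf{u}(k)) \, \min\{x_i(k-1)\, d_{i,j},\; b_{i,j}\},$$

and (11) can be written for each vertex as

$$x_i(k) = x_i(k-1) - \sum_{j=1}^{L} \varphi_{i,j}(k) + \sum_{j=1}^{L} \varphi_{j,i}(k), \quad i = \overline{1, L}.$$

In words, each open edge carries the flow requested by the distribution matrix, limited by the edge capacity; the flow leaving a vertex is subtracted from it, and the flow arriving at a vertex is added to it.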
Consider an example. Let the controlled network be described by the base graph depicted in Fig. 1. This network has five vertices ($L = 5$), vertices 1 and 3 are sources, and vertex 5 is a sink. The adjacency matrix of the base graph is

$$\mathbf{A} = \begin{bmatrix} 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}.$$

Fig. 1. A base graph

Suppose that a control vector of dimension two is used to change the network configuration, $\mathbf{u} = [u_1\; u_2]^T$, $u_1 \in U_1 = \{0, 1\}$, $u_2 \in U_2 = \{0, 1, 2\}$. Element $u_1$ affects the states of the edges (1, 2) and (2, 5), and element $u_2$ affects the states of the edges (1, 4), (3, 4), and (4, 5). Then, the control matrix is

$$\mathbf{C} = \begin{bmatrix} 0 & 1 & 0 & 2 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 0 & 2 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}.$$

Assume that the edge (1, 2) appears in the graph when $u_1 = 0$, the edge (2, 5) when $u_1 = 0$ or $u_1 = 1$, the edge (1, 4) when $u_2 = 0$, the edge (3, 4) when $u_2 = 1$, and the edge (4, 5) when $u_2 = 0$ or $u_2 = 2$. Then, the matrix of admissible phases in the controlled network is

$$\mathbf{F} = \begin{bmatrix} \emptyset & \{0\} & \emptyset & \{0\} & \emptyset \\ \emptyset & \emptyset & \emptyset & \emptyset & \{0, 1\} \\ \emptyset & \emptyset & \emptyset & \{1\} & \emptyset \\ \emptyset & \emptyset & \emptyset & \emptyset & \{0, 2\} \\ \emptyset & \emptyset & \emptyset & \emptyset & \emptyset \end{bmatrix}.$$

Suppose that the edges of the base network have limited capacities specified by the capacity matrix

$$\mathbf{B} = \begin{bmatrix} 0 & 10 & 0 & 20 & 0 \\ 0 & 0 & 0 & 0 & 15 \\ 0 & 0 & 0 & 10 & 0 \\ 0 & 0 & 0 & 0 & 14 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}.$$

The distribution of the flow moving from each vertex in different directions is given by the distribution matrix

$$\mathbf{D} = \begin{bmatrix} 0 & 0.4 & 0 & 0.6 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}.$$

This controlled network has six different configurations, $|U_1||U_2| = 2 \times 3 = 6$, presented in Fig. 2. For each configuration, the corresponding value of the control vector is also given.
Fig. 2. Configurations of base graph
For this example, applying (11) for one control step with the control $\mathbf{u}(k) = [0\; 1]^T$ and $\mathbf{x}(k-1) = [250\; 40\; 60\; 50\; 0]^T$ gives $\mathbf{x}(k) = [240\; 35\; 50\; 60\; 15]^T$, which means that 15 vehicles moved to the sink vertex 5.
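The worked example can be checked numerically. The sketch below builds the configuration matrix from the matrices of the example and performs one control step; it assumes that the nested truncated subtractions in (11) amount to an elementwise minimum of requested flow and capacity. The function names `configuration` and `step` are illustrative and do not come from the original.

```python
# Sketch: one control step of (11) on the five-vertex example network.
# Vertices 1..5 of the text are indices 0..4 here. The use of min() assumes
# the truncated-subtraction reading of (11).
from itertools import product

A = [[0, 1, 0, 1, 0],
     [0, 0, 0, 0, 1],
     [0, 0, 0, 1, 0],
     [0, 0, 0, 0, 1],
     [0, 0, 0, 0, 0]]
C = [[0, 1, 0, 2, 0],
     [0, 0, 0, 0, 1],
     [0, 0, 0, 2, 0],
     [0, 0, 0, 0, 2],
     [0, 0, 0, 0, 0]]
E = frozenset()  # empty set of admissible phases marks a missing edge
F = [[E, {0}, E, {0}, E],
     [E, E, E, E, {0, 1}],
     [E, E, E, {1}, E],
     [E, E, E, E, {0, 2}],
     [E, E, E, E, E]]
B = [[0, 10, 0, 20, 0],
     [0, 0, 0, 0, 15],
     [0, 0, 0, 10, 0],
     [0, 0, 0, 0, 14],
     [0, 0, 0, 0, 0]]
D = [[0, 0.4, 0, 0.6, 0],
     [0, 0, 0, 0, 1],
     [0, 0, 0, 1, 0],
     [0, 0, 0, 0, 1],
     [0, 0, 0, 0, 0]]
L = 5


def configuration(u):
    """A(u) from (5): edge (i, j) stays if u_{c_ij} belongs to F_ij."""
    return [[1 if A[i][j] and u[C[i][j] - 1] in F[i][j] else 0
             for j in range(L)] for i in range(L)]


def step(x, u):
    """x(k) computed from x(k-1) and u(k) according to (11)."""
    a_u = configuration(u)
    flow = [[a_u[i][j] * min(x[i] * D[i][j], B[i][j]) for j in range(L)]
            for i in range(L)]
    outgoing = [sum(row) for row in flow]
    incoming = [sum(flow[i][j] for i in range(L)) for j in range(L)]
    return [x[i] - outgoing[i] + incoming[i] for i in range(L)]


x_next = step([250, 40, 60, 50, 0], [0, 1])
print(x_next)  # [240, 35, 50, 60, 15], as in the text

# All |U1| * |U2| = 6 control values give distinct configurations.
configs = {tuple(map(tuple, configuration(u)))
           for u in product((0, 1), (0, 1, 2))}
print(len(configs))  # 6
```

With $\mathbf{u} = [0\; 1]^T$ only the edges (1, 2), (2, 5) and (3, 4) are open, and all three carry exactly their capacities (10, 15 and 10), which gives the state reported above.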
3 Universal Traffic Flow Model
The control time is divided into control steps. All traffic light phases are numbered by integers, and all traffic lights are synchronized. We can pass from the integer step $k$ to time $t$ if we know the interval $\Delta t$ between control steps: $t = k\Delta t$.

To determine the values of the traffic flows, let us divide all roads of the network into sections. A road section is a small part of a road on which the traffic flow moves in one direction without maneuvers. The travel time along a section is small and is not taken into account. Each road section is associated with one component of the state vector, which stores the number of average-sized cars currently on that section. The state vector is

$$\mathbf{x}(k) = [x_1(k) \dots x_L(k)]^T, \tag{12}$$

where $x_i(k) \in \mathbb{R}^1$ is the number of average-sized vehicles on road section $i$ at step $k$, $i = \overline{1, L}$, $k = \overline{0, K}$, and $K$ is the number of control steps.

The traffic flow model based on controlled networks [3,4] is (11) with the addition of the term $\boldsymbol{\delta}(k) = [\delta_1(k) \dots \delta_L(k)]^T$, where $\delta_i(k)$ is a randomly given value of the incoming/outgoing flow on road section $i$, $i = \overline{1, L}$, at control step $k$. The quantities in (11) have the following meaning.

$\mathbf{A}$ is the adjacency matrix, $\mathbf{A} = [a_{i,j}]$, $a_{i,j} \in \{0, 1\}$, $i, j = \overline{1, L}$. If $a_{i,j} = 1$, then there is a maneuver from section $i$ to section $j$.

$\mathbf{u}$ is the control vector, $\mathbf{u} = [u_1 \dots u_M]^T$, $u_i \in \{0, 1, \dots, u_i^+ - 1\}$, where $u_i$ is the number of the traffic light phase at intersection $i$, $u_i^+$ is the maximal number of phases at intersection $i$, $i = \overline{1, M}$, and $M$ is the number of intersections.

$\mathbf{A}(\mathbf{u})$ is the configuration matrix, $\mathbf{A}(\mathbf{u}) = [a_{i,j}(\mathbf{u})]$, $a_{i,j}(\mathbf{u}) \in \{0, 1\}$, $i, j = \overline{1, L}$. It is obtained from the adjacency matrix $\mathbf{A}$ by replacing some of its elements with 0:

$$a_{i,j}(\mathbf{u}) = \begin{cases} 1, & \text{if } a_{i,j} = 1 \text{ and } u_{c_{i,j}} \in F_{i,j}, \\ 0, & \text{otherwise}. \end{cases}$$

$\mathbf{C}$ is the control matrix, $\mathbf{C} = [c_{i,j}]$, $c_{i,j} \in \{1, \dots, M\}$, where $c_{i,j}$ is the index of the intersection at which the maneuver from section $i$ to section $j$ is performed.

$\mathbf{F}$ is the matrix of phases, $\mathbf{F} = [F_{i,j}]$, $F_{i,j} = \{f_{i,j,1}, \dots, f_{i,j,p(c_{i,j})}\}$, $f_{i,j,r} \in \{0, 1, \dots, u^+_{c_{i,j}}\}$, $1 \le r \le p(c_{i,j})$, where $F_{i,j}$ is the set of phase indices that allow the maneuver from section $i$ to section $j$, $u^+_{c_{i,j}}$ is the maximal number of active phases at intersection $c_{i,j}$, and $p(c_{i,j})$ is the number of phases at intersection $c_{i,j}$ that allow the maneuver from section $i$ to section $j$.

$\mathbf{D}$ is the distribution matrix, $\mathbf{D} = [d_{i,j}]$, $d_{i,j} \in [0, 1]$, where $d_{i,j}$ is the fraction of the traffic on section $i$ that makes the maneuver to section $j$.

$\mathbf{B}$ is the capacity matrix, $\mathbf{B} = [b_{i,j}]$, $b_{i,j} \in \mathbb{R}^1_+ \cup \{0\}$, where $b_{i,j}$ is the capacity of the maneuver from section $i$ to section $j$, i.e. the maximal flow from $i$ to $j$ in one control step.
The matrices $\mathbf{A}$, $\mathbf{F}$, $\mathbf{C}$, $\mathbf{B}$, $\mathbf{D}$ have the same dimension and structure: if $a_{i,j} = 1$, then $b_{i,j} > 0$, $c_{i,j} > 0$, $d_{i,j} > 0$, $F_{i,j} \ne \emptyset$; otherwise $b_{i,j} = 0$, $c_{i,j} = 0$, $d_{i,j} = 0$, $F_{i,j} = \emptyset$. If the elements of $\mathbf{B}$ and $\mathbf{D}$ are known, then the model (11) makes it possible to find the flows on each road section at each control step.

Model (11) is universal and describes any road network with all possible maneuvers. If the road network has unregulated intersections at which maneuvers are possible without traffic lights, such an intersection can be considered as a regulated intersection at which all maneuvers are always permitted. If the network has a ring road, for example, or turns are possible on some section, then this section should be divided into subsections, and it should be assumed that there are always phases between these subsections that allow the maneuvers. Long sections also have to be divided into subsections. If several road sections have no maneuvers from them, for example when they are output roads of the network, these sections may be joined into one.

The generality of the model means that if we divide a network into subnetworks connected through intersections, then the resulting controlled aggregated network of sections and subnetworks can also be described by (11).
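Because most entries of the matrices are zero, the model can also be stored as a sparse list of maneuvers. The sketch below illustrates this together with the extra term $\boldsymbol{\delta}(k)$; the three-section road, the tuple layout and all numbers are hypothetical, and a constant inflow is used instead of a random one so that the run is reproducible.

```python
# Sketch: model (11) with the extra inflow term delta(k), stored as a sparse
# list of maneuvers instead of full L x L matrices. The three-section road and
# all numbers below are hypothetical.

def step(x, phase, maneuvers, delta):
    """Advance the state by one control step.

    x         -- vehicles per road section at step k-1
    phase     -- current phase number of every intersection
    maneuvers -- {(i, j): (intersection, allowed_phases, share, capacity)}
    delta     -- external inflow/outflow per section at this step
    """
    new_x = list(x)
    for (i, j), (node, allowed, share, capacity) in maneuvers.items():
        if phase[node] in allowed:               # a_ij(u) = 1
            flow = min(x[i] * share, capacity)   # demand limited by capacity
            new_x[i] -= flow
            new_x[j] += flow
    return [value + d for value, d in zip(new_x, delta)]


# Section 0 is an entrance, 1 is internal, 2 is an exit; one intersection.
maneuvers = {
    (0, 1): (0, {0}, 1.0, 8),     # allowed only in phase 0
    (1, 2): (0, {0, 1}, 1.0, 6),  # allowed in every phase (unregulated)
}

x = [20, 5, 0]
for phase in ([0], [0], [1]):
    # constant inflow of 5 vehicles per step on the entrance section
    x = step(x, phase, maneuvers, delta=[5, 0, 0])
print(x)
```

The maneuver (1, 2) is permitted in every phase, which is how the text suggests treating maneuvers that do not depend on a traffic light.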
4 Optimal Control of Traffic Flows
For the traffic flow model (11), let us define the initial state on all $L$ road sections of the network:

$$\mathbf{x}(0) = \mathbf{x}^0 = [x_1^0 \dots x_L^0]^T. \tag{13}$$

The state vector is constrained:

$$x_i(k) \le x_i^+, \quad i = \overline{1, L}, \tag{14}$$

where $k$ is the current control step, $k = \overline{1, K}$, and $K$ is the duration of the control process. The control should minimize one or several quality criteria:

$$J_1 = -\sum_{i \in I_1} x_i(K) \to \min, \tag{15}$$

$$J_2 = \sum_{i \in I_0} x_i(K) - \sum_{j \in I_1} x_j(K) \to \min, \tag{16}$$

$$J_3 = \alpha \sum_{k=0}^{K} \sum_{i \notin I_0 \cup I_1} \vartheta(x_i(k) - x_i^+)(x_i(k) - x_i^+) + J_1 \to \min, \tag{17}$$

$$J_4 = \alpha \sum_{k=0}^{K} \sum_{i \notin I_0 \cup I_1} \vartheta(x_i(k) - x_i^+)(x_i(k) - x_i^+) + J_2 \to \min, \tag{18}$$
where $\alpha$ is a weight coefficient, $I_0$ is the set of indices of the state vector elements associated with the entrance road sections of the network, $I_1$ is the set of indices of the state vector elements associated with the exit road sections of the network, and

$$\vartheta(a) = \begin{cases} 1, & \text{if } a > 0, \\ 0, & \text{otherwise}. \end{cases}$$

The solution, a control program, is sought in the following form:

$$\tilde{\mathbf{u}}(\cdot) = (\tilde{\mathbf{u}}(0), \dots, \tilde{\mathbf{u}}(K)), \tag{19}$$

$$\tilde{\mathbf{u}}(k) = [\tilde{u}_1(k) \dots \tilde{u}_M(k)]^T, \tag{20}$$

where $\tilde{u}_i(k) \in \{0, 1\}$, $i = \overline{1, M}$, and $M$ is the number of controlled intersections.

The order of the working phases of the traffic lights is fixed, and the control program only switches the phases. The elements of the control program are ones and zeros: a one switches the current phase to the next one in the specified order, and a zero leaves the phase unchanged. When the maximal phase number $u_i^+$ is reached, the phase returns to the initial one:

$$u_i(k) = \begin{cases} (u_i(k-1) + 1) \bmod u_i^+, & \text{if } \tilde{u}_i(k) = 1, \\ u_i(k-1), & \text{otherwise}. \end{cases} \tag{21}$$

A phase may be switched only after several control steps. The number of control steps during which a phase remains unchanged is bounded:

$$\tilde{u}_j^- \le b_j - a_j \le \tilde{u}_j^+, \tag{22}$$

where $a_j$ and $b_j$, $a_j < b_j$, are the nearest indices of control steps at which the control program takes the value 1, $\tilde{u}_j(a_j) = 1$, $\tilde{u}_j(b_j) = 1$, and $\tilde{u}_j^-$, $\tilde{u}_j^+$ are the lower and upper permissible numbers of control steps for one working phase of the traffic light. Between the moments $a_j$ and $b_j$ the components of the control program are equal to zero: $\tilde{u}_j(a_j + i) = 0$, $i = \overline{1, b_j - a_j - 1}$.

The control program is found by a modification of the genetic algorithm, the variational genetic algorithm [7].
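The switching rule (21) and the dwell-time bounds (22) can be sketched as follows. The sample program for a single intersection with three phases is hypothetical, and the function names are illustrative only.

```python
# Sketch: turning a 0/1 switching program into phase numbers, rule (21),
# and checking the dwell-time bounds (22). The sample program is hypothetical.

def decode_phases(program, initial, n_phases):
    """program[k][i] = 1 switches intersection i to its next phase at step k."""
    phases, current = [], list(initial)
    for row in program:
        current = [(p + 1) % m if s == 1 else p
                   for p, s, m in zip(current, row, n_phases)]
        phases.append(current)
    return phases


def dwell_ok(program, i, lower, upper):
    """True if consecutive switches of intersection i are lower..upper steps apart."""
    switches = [k for k, row in enumerate(program) if row[i] == 1]
    return all(lower <= b - a <= upper
               for a, b in zip(switches, switches[1:]))


program = [[0], [1], [0], [0], [1], [1]]  # one intersection, six control steps
phases = decode_phases(program, initial=[2], n_phases=[3])
print([row[0] for row in phases])   # [2, 0, 0, 0, 1, 2]
print(dwell_ok(program, 0, 1, 3))   # True: the gaps are 3 and 1 steps
print(dwell_ok(program, 0, 2, 3))   # False: the last gap is too short
```

Storing only the switching moments keeps the search space binary, while the phase order itself stays fixed, as required in the text.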
5 Search Algorithm
To solve the optimal control problem, we use a modification of the genetic algorithm, the variational genetic algorithm (variational GA). It was derived from the classical genetic algorithm in 2014 in order to reduce the amount of computation. The main operations of the variational GA remain nearly the same as in the classical genetic algorithm, see Fig. 3. The variational GA is described in detail in [7]; here we give only a brief description.

Fig. 3. Flowchart of variational genetic algorithm

To implement the variational genetic algorithm, we set one basic control program, which we call the basic solution, and a set of small variations of the basic solution, as in [8]. For the traffic control problem, a vector of small variations contains three elements: the index of an intersection, the index of a control step, and a new value of the control program element. A vector of small variations changes the basic solution and thus changes the control program. The basic solution, as a control program for each control step, is

$$\tilde{\mathbf{u}}^0(\cdot) = (\tilde{\mathbf{u}}^0(0), \dots, \tilde{\mathbf{u}}^0(K)). \tag{23}$$
The initial population of possible solutions consists of the basic solution (23) and a set of ordered sets of vectors of small variations

$$W = (W_1, \dots, W_H), \tag{24}$$

where $W_i$ is an ordered set of vectors of small variations,

$$W_i = (\mathbf{w}^{i,1}, \dots, \mathbf{w}^{i,d}), \quad \mathbf{w}^{i,j} = [w_1^{i,j}\; w_2^{i,j}\; w_3^{i,j}]^T, \tag{25}$$

$i = \overline{1, H}$, $j = \overline{1, d}$, $H$ is the number of possible solutions in the initial population, and $d$ is the depth of variation, i.e. the maximal number of variations of the basic solution. To evaluate each solution in the population, we calculate the quality criterion (15)–(18):

$$F = (f_0 = J(\tilde{\mathbf{u}}^0(\cdot)), \dots, f_H = J(\tilde{\mathbf{u}}^H(\cdot))). \tag{26}$$
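Next, the best possible solution is found:

$$f_{i_-} = \min\{f_0, \dots, f_H\}. \tag{27}$$

The main genetic operations, crossover and mutation, are performed on the ordered sets of vectors of small variations. For crossover, we select two ordered sets $W_\alpha = (\mathbf{w}^{\alpha,1}, \dots, \mathbf{w}^{\alpha,d})$ and $W_\beta = (\mathbf{w}^{\beta,1}, \dots, \mathbf{w}^{\beta,d})$ and calculate the probability of crossover

$$p_c = \max\left\{\frac{f_{i_-}}{f_\alpha}, \frac{f_{i_-}}{f_\beta}\right\}. \tag{28}$$

A random number from 0 to 1 is generated, and if it is smaller than $p_c$, the crossover is performed. The crossover point $\sigma \in \{1, \dots, d\}$ is chosen randomly, and two new ordered sets are formed:

$$W_{H+1} = (\mathbf{w}^{\alpha,1}, \dots, \mathbf{w}^{\alpha,\sigma-1}, \mathbf{w}^{\beta,\sigma}, \dots, \mathbf{w}^{\beta,d}), \tag{29}$$

$$W_{H+2} = (\mathbf{w}^{\beta,1}, \dots, \mathbf{w}^{\beta,\sigma-1}, \mathbf{w}^{\alpha,\sigma}, \dots, \mathbf{w}^{\alpha,d}). \tag{30}$$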
To perform a mutation, a random number from 0 to 1 is compared with a given probability of mutation $p_\mu$; if the random number is smaller, the mutation is performed. For $W_{H+1}$ and $W_{H+2}$, the mutation points $\mu_i \in \{1, \dots, d\}$, $i = 1, 2$, are chosen randomly, and two new variation vectors are generated:

$$\mathbf{w}^{H+i,\mu_i} = [w_1^{H+i,\mu_i}\; w_2^{H+i,\mu_i}\; w_3^{H+i,\mu_i}]^T, \quad i = 1, 2. \tag{31}$$

The new vectors (31) are inserted at the corresponding places of the new possible solutions $W_{H+1}$ and $W_{H+2}$. The new possible solutions are evaluated by the functional (15)–(18):

$$f_{H+1} = J(W_{H+1} \circ \tilde{\mathbf{u}}^0(\cdot)), \tag{32}$$

$$f_{H+2} = J(W_{H+2} \circ \tilde{\mathbf{u}}^0(\cdot)). \tag{33}$$

For each new possible solution, the solution with the worst value of the quality criterion is found:

$$f_{i_+} = \max\{f_0, \dots, f_H\}. \tag{34}$$

If the worst possible solution (34) is worse than (32) and/or (33), it is replaced by the new possible solution(s), i.e. if $f_{i_+} > f_{H+i}$, then

$$W_{i_+} \leftarrow W_{H+i}, \quad f_{i_+} \leftarrow f_{H+i}, \quad i = 1, 2. \tag{35}$$

Crossover and mutation are repeated a given number of times. Then the basic solution is replaced by the best solution found, the functionals (15)–(18) are calculated again, and the algorithm is repeated. The calculations are terminated when the basic solution has been changed a given number of times. The solution found by that time is taken as the best possible solution.
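The operators on ordered sets of small variations can be sketched as follows. This is an illustration rather than the authors' implementation: indices are 0-based, the tiny basic solution is hypothetical, and the function names are chosen here for readability.

```python
# Sketch of the operators acting on ordered sets of small variations.
# A variation is a triple (intersection index, control step, new value);
# indices are 0-based here, and the tiny basic solution is hypothetical.

def apply_variations(basic, variations):
    """W applied to u0: the variations act in order on a copy of the program."""
    program = [row[:] for row in basic]
    for intersection, k, value in variations:
        program[k][intersection] = value
    return program


def crossover(w_alpha, w_beta, sigma):
    """One-point crossover (29)-(30); sigma in 1..d is the crossover point."""
    cut = sigma - 1
    return w_alpha[:cut] + w_beta[cut:], w_beta[:cut] + w_alpha[cut:]


def mutate(w, mu, new_vector):
    """Replace the variation at position mu (1-based) by a new vector, cf. (31)."""
    return w[:mu - 1] + [new_vector] + w[mu:]


def crossover_probability(f_best, f_alpha, f_beta):
    """p_c from (28); assumes positive criterion values."""
    return max(f_best / f_alpha, f_best / f_beta)


basic = [[0, 0], [0, 0], [0, 0]]          # 3 control steps, 2 intersections
w_a = [(0, 1, 1), (1, 2, 1)]
w_b = [(1, 0, 1), (0, 2, 1)]
child_1, child_2 = crossover(w_a, w_b, sigma=2)
print(child_1)                            # [(0, 1, 1), (0, 2, 1)]
print(apply_variations(basic, child_1))   # [[0, 0], [1, 0], [1, 0]]
```

Only the variations are recombined and mutated; the basic solution itself is left unchanged until it is replaced by the best solution found.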
6 Test Road Network
For the computational experiment, a network with 4 intersections and 24 road sections was used. The network is shown schematically in Fig. 4.
Fig. 4. Test road network with 4 controlled intersections
Entrance road sections are sections 1–8, internal sections are 9–16, and exit sections are 17–24. If road sections are taken as the nodes of a graph, we can construct a graph model of the network in which the vertices are road sections and the arcs are the possible maneuvers between them. Since maneuvers have a certain direction, a directed graph of the network is obtained. The graph for the network of Fig. 4 is shown in Fig. 5.

Control at intersections is carried out by sequentially switching the phases. The maximal number of control modes at each intersection is 8. It is supposed that all traffic lights are synchronized. Each control mode is characterized by its permitted maneuvers. The control modes change in the established order, and the durations of the control modes are the control actions. For example, for intersection 1, control mode 0 allows only the maneuvers shown in Fig. 6 (left), and control mode 2 allows the maneuvers shown in Fig. 6 (right).

The initial state of the network is

$$\mathbf{x}^0 = (30, 25, 28, 29, 22, 32, 26, 29, 6, 9, 8, 7, 9, 8, 7, 5, 0, 0, 0, 0, 0, 0, 0, 0).$$

The constraints on the road sections are

$$\mathbf{x}^+ = (100, 100, 100, 150, 120, 150, 220, 100, 20, 20, 20, 30, 40, 20, 20, 25, 8000, 8000, 8000, 8000, 8000, 8000, 8000, 8000).$$

The increments on the entrance road sections are

$$\Delta = (20, 20, 20, 20, 16, 16, 16, 22).$$
Fig. 5. A graph of network
Fig. 6. Control modes 0 (left) and 2 (right) at intersection 1
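A part of the basic solution is

$$\tilde{\mathbf{u}}^0_{450 \times 4} = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots \\ 0 & 0 & 1 & 1 \end{bmatrix}.$$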
Table 1 presents the parameters of the computational experiment. In the computational experiment we used two quality criteria, (15) and (16). To choose a solution in the multicriteria problem, we used a Pareto set (see Table 2).

Table 1. Computational parameters

| Parameter | Value |
|---|---|
| Number of intersections, $M$ | 4 |
| Number of road sections, $L$ | 24 |
| Number of control steps, $K$ | 450 |
| Initial phases at intersections, $u^0$ | [1 2 3 4] |
| Minimal phase durations, $u^-$ | [5 5 5 5] |
| Maximal phase durations, $u^+$ | [50 50 50 50] |
| Maximal number of phases at intersections, $u^+$ | [8 8 8 8] |
| Parameters of variational GA: | |
| Number of possible solutions, $H$ | 512 |
| Number of generations, $G$ | 128 |
| Number of pairs for crossover in 1 generation, $R$ | 128 |
| Depth of variations, $d$ | 8 |
| Probability of mutation, $p_\mu$ | 0.75 |
This set contains the indices of the best solutions obtained and the values of the criteria. For example, in Table 2 we have three solutions: 15, 64, and 113.

Table 2. Pareto set

| Index of solution | $J_1$ | $J_2$ |
|---|---|---|
| 15 | 2252 | 310 |
| 64 | 2240 | 324 |
| 113 | 2298 | 290 |
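Selecting a Pareto set for two minimized criteria can be sketched as below. The candidate values are those of Table 2 together with the basic-solution values reported in the text ($J_1 = 2665$, $J_2 = 392$); the function names are illustrative.

```python
# Sketch: filtering non-dominated solutions for two minimized criteria.
# The (J1, J2) pairs are those of Table 2 plus the basic solution.

def dominates(a, b):
    """True if a is no worse than b in every criterion and better in at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_set(candidates):
    """Keep the candidates that are not dominated by any other candidate."""
    return {name: value for name, value in candidates.items()
            if not any(dominates(other, value)
                       for other in candidates.values())}


candidates = {
    "basic": (2665, 392),
    "15": (2252, 310),
    "64": (2240, 324),
    "113": (2298, 290),
}
front = pareto_set(candidates)
print(sorted(front))  # ['113', '15', '64']
```

None of the three tabulated solutions dominates another, while each of them dominates the basic solution in both criteria.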
The simulation with the basic solution gave $J_1 = 2665$ and $J_2 = 392$, which is, for example, about 20% worse than the obtained solution No. 64 with $J_1 = 2240$ and $J_2 = 324$.

Acknowledgment. Research was partially supported by the Russian Foundation for Basic Research, project No 19-08-01047-a.
References

1. Semenov, V.V.: Mathematical simulation of traffic flows dynamics in megapolis. Institute of Applied Mathematics of RAS, vol. 34, 44 p. Moscow (2004)
2. Allsop, R.: Some reflections on forty years’ evolution of transport studies. In: 38th Annual Conference of the Universities Transport Study Group (2006)
3. Diveev, A.I.: Controlled networks and their applications. Comput. Math. Math. Phys. 48(8), 1428–1442 (2008)
4. Alnovani, G.H.A., Diveev, A.I., Pupkov, K.A., Sofronova, E.A.: Control synthesis for traffic simulation in the urban road network. In: Proceedings of the 18th IFAC World Congress, pp. 2196–2201 (2011)
5. Diveev, A.I., Sofronova, E.A.: Synthesis of intelligent control of traffic flows in urban roads based on the logical network operator method. In: Proceedings of European Control Conference (ECC-2013), 17–19 July 2013, pp. 3512–3517, Zürich (2013)
6. Diveev, A.I., Sofronova, E.A., Mikhalev, V.A., Belyakov, A.A.: Intelligent traffic flows control software for megapolis. Procedia Comput. Sci. 103, 20–27 (2017)
7. Sofronova, E.A., Belyakov, A.A., Khamadiyarov, D.B.: Optimal control for traffic flows in the urban road networks and its solution by variational genetic algorithm. Procedia Comput. Sci. 150, 302–308 (2019)
8. Diveev, A.I.: Small variations of basic solution method for non-numerical optimization. In: Proceedings of 16th IFAC Workshop on Control Applications of Optimization, pp. 28–33 (2015)
Virtual Dog
José Luis Pastrana-Brincones
University of Málaga, Málaga, Spain
[email protected]
Abstract. The present project aims to develop a navigation mobile application for blind people that is not just a route guide: it will describe the user's current environment within a configurable radius. It will be integrated in a social network that allows new elements to be added to the environment and shared by all users. It will also work inside buildings by estimating the current position using the step counter, compass, altimeter, etc.

Keywords: Visual impaired software · Smart cities · Ambient assisted living
1 Introduction The World Blind Union (WBU) estimated in 2020 that 253 million people are blind, visual impairment or partially sighted around the world. One of the most important problem in everyday life as a person with visual impairment is the decreased mobility and also, the difficulties found in discovering and getting around new and unknown environments. Blind people mobility around a city by themselves is a problem even harder when they are inside a building they don’t know. Those people take help in common life from other people, dogs and their own stick. However, what happens when they are alone? They usually know the way the must follow but what happens when there is a new obstacle like a trash bin moved in their path or they are inside a building they never have been. How could they get the toilet or the room the look for? Maybe there are Braille signs but where? And how to find them? Using GPS devices for guiding visually impaired people was first proposed by Collins [4]. and Loomis [6]. There are many GPS navigation apps as Apple Maps, Google Maps, and other apps that will tell you where you are and what is nearby anytime you wish, but this information is usually got by writing addresses, GPS coordinates or tapping icons, which can slow you down significantly if you are visually impaired. Blindness-aware navigation apps should inform using voice all the information at regular intervals as you move, or, if you prefer, whenever you touch or shake your mobile phone. You can set a blindness-aware navigation app to keep you in the direction you have to go by using the compass existing in the device. This could be very useful to orient yourself in a busy city area, or when crossing a street or a wide parking lot back to the street. Standard GPS navigator apps usually alert you when you have to make a turn, but what happen when you have to go across streets? A blindness-aware navigation app © Springer Nature Switzerland AG 2021 K. Arai et al. 
(Eds.): IntelliSys 2020, AISC 1252, pp. 88–95, 2021. https://doi.org/10.1007/978-3-030-55190-2_7
Virtual Dog
89
should tell the distance to as well as the name of each cross street as you approach it. It also should announce nearby addresses from time to time so you know about how far you have traveled, the distance remaining and whether your destination is on the left or right side of the street. A sighted traveler can see all of the shops, buildings, office and other points of interest he/she is passing along his/her journey. A blindness-aware navigation app should offer the information in a same way as you were sighted. So, you should have all nearby points of interest announced as soon as you approach them, along with their distance and direction. Of course, the amount of information could be limited avoiding announcements from various categories, such as banks and restaurants, or to places you have marked as favorites or not interested. The UE ASSAM (Assistants for Safe Mobility project at http://www.aal-europe.eu/ projects/assam/) is a project that aims to compensate problems that people suffering from declining physical and cognitive capabilities have to face by developing navigation assistants, which are user-centered, for several mobility platforms, such as walker, wheelchair, and tricycle, improving sustained everyday mobility and autonomy with seamless transition from home to street in many environments such as hotels, residential complexes, parks, neighborhoods, or touristic areas. Those systems provide obstacle avoidance and a set of assistance utilities as navigational aid or cognitive assistance for visual impairment, and security by connection to a help center in emergency situations. BlindSquare is one of most used in the world accessible GPS-app developed for the blind and visually impaired. It is paired with third-party navigation apps. BlindSquare is a self-voicing app that delivers detailed points of interest and intersections for getting a safe and reliable travel both outside and inside. 
Google also released two Android applications that are designed to help the blind with walking directions that pair Google Maps with GPS navigation technology. Those applications are WalkyTalky and Intersection Explorer. They both use spoken walking directions from Google Maps giving the blind (or visually-impaired) the opportunity to explore the map of streets and the path to go before navigating them in the physical world. WalkyTalky uses voice and it is considered as is an audible directions app, while Intersection Explorer features touch exploration. These two new apps will like to map nerds who like to virtually roaming even if they are not blind. Lazarillo is another guidance App. It allows people to know your current location and nearby services by voice messages. You can also get information regarding bus stations, cafes, banks, restaurants, street intersections and more. There are no many research projects concerning indoor navigation [3, 7, 9]. One of them could be Ubica2 (http://smartprojects.zed.com/?p=12) that works on the developing a guidance platform for several types of impaired people using previously uploaded routes inside buildings equipped with location sender devices. Blattner, in [1] proposes one indoor guidance using system that uses Bluetooth and the mapping standard WGS84 where maps are static and previously uploaded. Pereira, in [7]. proposes one indoor guidance using system by reading QR codes. Most popular guidance system for blind people uses Bluetooth Beacons [2]. that are programmed to communicate with smartphones and send audio instructions to get the target goal. Wayfindr project [3] (https://www.wayfindr.net/) in Pimlico subway station
in London, developed by the Royal London Society for Blind People (RLSB) and UsTwo, uses large Bluetooth tags mounted on the station walls to guide people through the station.
2 Business Case
2.1 Business Case 1: Blind Student Going to the University
Peter is a visually impaired computer science student at the University (see Fig. 1). On his walk to the college there is a trash bin in the middle of the street, which a blind person cannot avoid without help.
Fig. 1. Visually impaired student going to the university.
A classmate walked through that area before him and added the obstacle to the application (see Fig. 2).
Fig. 2. Obstacle added to the system.
This way (as seen in Fig. 3), Peter goes around it and avoids the obstacle.
Fig. 3. Avoiding one obstacle
2.2 Business Case 2: Blind Person Attending a Conference
Peter is also interested in tourism, so he is going to the International Conference on Tourism for every people. A taxi drops him at the main gate of the hotel, but he does not know where the registration desk or the toilets are. However, the registration desk, toilets and rooms have been added to the application, so he can get wherever he wants to go. Fig. 4 shows what happens when he wants to go to the toilet: the system guides him to the toilet corridor and tells him where the toilet is.
Fig. 4. Looking for the toilet
3 The App
The present project arises from my personal experience as a teacher at Miami University in Oxford, Ohio, where I taught a blind student. All those everyday things that we do not think about, such as arriving on campus or finding a classroom or a teacher’s office, were a really hard problem for that student and became the motivation for developing this project.
This project aims to develop a mobile navigation application for blind people that is more than a route guide: it describes the user’s current environment within a configurable radius, announcing where a pedestrian crossing, a light, or any obstacle is, with messages such as “Elevator is 2 m left hand side” or “There is a trash bin 1 m ahead”. The application will be part of a social network in which everyone can add or update information, forming a live community whose members share information and help each other. This way, the first person who finds a new obstacle can add it, and everyone who later walks near it will be notified, thus “not bumping into the same stone twice”. To make the application really useful, building maps and points of interest inside buildings can be added using three-dimensional coordinates (latitude, longitude and altitude), which makes the application extensible by everyone in the network.
Another key point is that the application must work inside buildings, where a GPS or Wi-Fi signal may not be available. Today’s mobile devices include many sensors, such as a step counter, compass, and altimeter, that allow the current position to be calculated even when there is no GPS, Wi-Fi or Bluetooth beacon signal. This makes it possible to guide the user to the toilet or to an office inside a building.
The main goals of the present project are:
1. To develop a navigation application for blind people that works on mobile devices.
2. To describe the user’s surroundings within a configurable radius.
3. To be part of a social network community that shares information.
4. To work inside buildings without the need for GPS, Wi-Fi or Bluetooth beacons.
5. To allow building maps to be added at runtime.
Figure 5 shows the schema of the system we are developing. The Virtual Dog Server takes map information from the Google Maps API and adds the points of interest (POI) that Virtual Dog users have added and that are stored in our server database.
This way, the navigator tells the user what is around them within the configured radius of the current GPS position. While a GPS signal is available, the app running on the smartphone estimates the user’s step length by counting how many steps have been taken over the distance walked. The current position can then be estimated easily when no GPS signal is available, for example indoors, because the distance is known from the step count and the direction is given by the compass.
Existing assistive technologies for the visually impaired generally rely on speech, on the tactile sense, or on both as alternative input and output channels. Our app’s interface uses voice together with large areas on the screen that the user can touch to select options, as can be seen in Fig. 6. Every user of the application has the options shown in Fig. 7.
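The step-based position estimate described in this section can be sketched as follows. This is only an illustration under stated assumptions, not the app’s actual code: the function names, the flat-earth approximation and the Earth radius constant are choices made for the example, and the heading is assumed to be a compass bearing measured clockwise from north.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; an assumption of this sketch


def calibrate_step_length(gps_distance_m, step_count):
    """Average step length from a walk measured while GPS was available."""
    if step_count <= 0:
        raise ValueError("step_count must be positive")
    return gps_distance_m / step_count


def dead_reckon(lat_deg, lon_deg, heading_deg, steps, step_length_m):
    """Advance a position by `steps` steps along a compass heading.

    Uses a flat-earth approximation, which is adequate over the short
    distances walked inside a building.
    """
    distance = steps * step_length_m
    heading = math.radians(heading_deg)
    d_north = distance * math.cos(heading)  # metres moved towards north
    d_east = distance * math.sin(heading)   # metres moved towards east
    new_lat = lat_deg + math.degrees(d_north / EARTH_RADIUS_M)
    new_lon = lon_deg + math.degrees(
        d_east / (EARTH_RADIUS_M * math.cos(math.radians(lat_deg)))
    )
    return new_lat, new_lon
```

In such a sketch, `calibrate_step_length` would be called while GPS is available, and `dead_reckon` would be applied from the last known GPS fix once the signal is lost. Errors accumulate with every step, so the estimate is best corrected whenever a new fix or a known point of interest is reached.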
Fig. 5. Virtual dog system schema
Fig. 6. Virtual dog interface
He/she can start the navigation, listen to the instructions, and set and save a point of interest, classifying it with predefined labels such as “cut road”, “trash bin”, “catch basin”, “fence”, etc. So, while moving, he/she will receive auditory information about the surroundings, based on the one hand on points of interest provided by Google Maps and on the other hand on points of interest saved by any user.
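A minimal sketch of how announcements for nearby points of interest might be generated is given below. It assumes that each point of interest is stored with a label, latitude and longitude; the class and function names, the 45 degree direction sectors and the message wording are illustrative assumptions, not the app’s actual interface.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; an assumption of this sketch


@dataclass
class PointOfInterest:
    label: str  # e.g. "trash bin", "elevator"
    lat: float
    lon: float


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres (haversine formula)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial bearing from point 1 to point 2, clockwise from north."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dl = math.radians(lon2 - lon1)
    y = math.sin(dl) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dl)
    return math.degrees(math.atan2(y, x)) % 360


def relative_direction(heading_deg, target_bearing_deg):
    """Coarse direction of a target relative to where the user is facing."""
    diff = (target_bearing_deg - heading_deg + 180) % 360 - 180  # in [-180, 180)
    if abs(diff) <= 45:
        return "ahead"
    if abs(diff) >= 135:
        return "behind"
    return "right" if diff > 0 else "left"


def announcements(user_lat, user_lon, heading_deg, pois, radius_m):
    """Messages for all points of interest inside the radius, nearest first."""
    nearby = []
    for poi in pois:
        d = distance_m(user_lat, user_lon, poi.lat, poi.lon)
        if d <= radius_m:
            b = bearing_deg(user_lat, user_lon, poi.lat, poi.lon)
            direction = relative_direction(heading_deg, b)
            nearby.append((d, f"{poi.label}, {d:.0f} m {direction}"))
    return [message for _, message in sorted(nearby)]
```

The resulting strings would then be passed to a text-to-speech engine. Sorting by distance means that the closest obstacle is announced first.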
Fig. 7. User story board
4 Conclusions and Future Work
GPS navigation systems have transformed daily life for people with visual impairments [5, 8]. The developed app offers blind people and people with sight problems audible directions and guidance that help them navigate everyday life with more independence. In addition, it addresses one of the biggest problems of traditional GPS, one that is essential in daily life and that blind people in particular need solved: GPS is unable to provide assistance indoors. Needless to say, people who are blind or visually impaired have many problems when visiting indoor places and find themselves at a disadvantage when trying to move around without the technology they are used to. The prototypes developed demonstrate that our app works and allows indoor navigation. It is also self-adaptive, because it calibrates the step length of the user.
The implementation of a working prototype has been successful. Work should now focus on improving the prototype’s communication and on optimization. The barrier to entry is high for assistive technologies like this visual aid, because they must be accurate to be effective. A qualitative system evaluation involving visually impaired people, to reveal underlying system qualities and characteristics, would be an important step in bringing the solution closer to market.
References
1. Blattner, A., et al.: Mobile indoor navigation assistance for mobility impaired people. Proc. Manuf. 3, 51–58 (2015)
2. Castillo-Cara, M., et al.: Ray: smart indoor/outdoor routes for the blind using Bluetooth 4.0 BLE. Proc. Comput. Sci. 83, 690–694 (2016)
3. Chang, Y., et al.: Indoor wayfinding based on wireless sensor networks for individuals with multiple special needs. Cybern. Syst. 41(4), 317–333 (2010)
4. Collins, C.C.: On mobility aids for the blind. In: Warren, D.R., Strelow, E.R. (eds.) Electronic Spatial Sensing for the Blind, pp. 35–64. Boston (1985). https://doi.org/10.1007/978-94-0171400-6
5. Ding, D., et al.: Design considerations for a personalized wheelchair navigation system. In: EMBS (2007)
6. Loomis, M.: Digital map and navigation system for the visually impaired. Department of Psychology, University of California-Santa Barbara (1985)
7. Pereira, C., et al.: Open-source indoor navigation system adapted to users with motor disabilities. Proc. Comput. Sci. 67, 38–47 (2015)
8. Queraltó, P., et al.: Herramienta de cálculo de rutas óptimas según parámetros de accesibilidad física en itinerarios urbanos. Ciudad y Territorio Virtual (2010)
9. Torrado, J.C., et al.: Easing the integration: a feasible indoor wayfinding system for cognitive impaired people. Pervasive Mob. Comput. 31, 137–146 (2016)
Context-Aware Transfer of Task-Based IoT Service Settings
Michael Zipperle1,2(B), Achim Karduck1, and In-Young Ko3
1 Furtwangen University, Furtwangen, Germany [email protected], [email protected]
2 University of New South Wales, Canberra, Australia
3 Korea Advanced Institute of Science and Technology, Daejeon, South Korea [email protected]
Abstract. The number of available Internet of Things (IoT) devices is growing rapidly, and users can utilize them via associated services to accomplish their tasks more efficiently. However, setting up IoT services based on the user context, the environmental context, and the task requirements is usually a time-consuming job. Moreover, these IoT services operate in distributed computing environments in which spatially cohesive IoT devices communicate via an ad-hoc network, and their availability is not predictable due to their mobility. To the best of our knowledge, no research has been done on saving and recovering users’ task-based IoT service settings while considering the context and task requirements. In this paper, we propose a framework for describing task-based IoT services and their settings semantically, and for providing semantic task-based IoT services effectively. The framework uses a machine learning technique to store and recover users’ task-based IoT service settings. We evaluated the effectiveness of the framework by conducting a user study.
Keywords: Semantic web · Context awareness · Web services · Internet of Things · Service description · Mobile computing
1 Introduction
A growing number of IoT devices can be found in our everyday life, and they can be used to provide users with IoT services that help them accomplish their tasks efficiently and effectively. One forecast predicts that the number of available IoT devices will double by the end of 2020 [1]. However, as the number of available IoT devices increases, users need to spend more time and effort setting up their IoT services. This may lead to a situation in which the effort needed to set up the IoT-based services for a task is higher than the effort required to perform the task without their support. In particular, users need to determine which IoT services are required and which settings need to be configured.
© Springer Nature Switzerland AG 2021 K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 96–114, 2021. https://doi.org/10.1007/978-3-030-55190-2_8
However, the settings of the IoT services for a particular task may need to differ according to the context of a user. Users usually have different contexts and preferences, which makes it difficult to reuse the same IoT service settings for a task across different users. In summary, IoT service settings depend on three factors: the user context, the environmental context, and the task to be accomplished by the user. Furthermore, users often perform their tasks in heterogeneous environments, where the environmental context may change frequently. Not only are environmental conditions such as temperature, loudness, brightness, and humidity subject to change, but the availability of IoT devices and their services can change as well. Nowadays, IoT devices operate in distributed computing environments where spatially cohesive IoT devices communicate with each other over an ad-hoc network. Moreover, it is hard to predict the availability of IoT services due to the mobility of IoT devices.
The following example scenario for the provision of services in an IoT environment is used throughout the paper. A user has the task “watch a movie” and begins to accomplish this task in the environment “living room”. To do this, he sets up various IoT services according to his preferences. After some time, the user realizes that friends are coming to visit and that he therefore has to cook dinner. However, he also wants to keep watching the movie, so he moves to the environment “kitchen” and continues the task “watch a movie” there. Since the context of the environment has changed, it is not possible to recover the IoT service settings from the environment “living room”. Whereas the user’s preference for the brightness in the “living room” was dark, in the “kitchen” it is bright. In addition, the offered IoT services differ.
In the “living room” the air conditioning was used to adjust the temperature, but in the “kitchen” there is only a fan and a smart window opener. This leads to the first problem to be solved: task-based IoT service settings cannot simply be saved and recovered in another environment with a different context. Next, the user’s friends arrive earlier than expected, and the user has still not finished his task “watch a movie”. Therefore, the user decides to finish his task the next day in the student lounge at his university. The user has never performed this task in this environment before, and it is very time-consuming for him to select the right IoT services and assign the right settings to them. This leads to the second problem to be solved: setting up task-based IoT service settings manually is time-consuming and inefficient.
In order to solve these problems, we propose an approach that considers the user context, the environmental context, and the user’s task itself to set up task-based IoT service settings. Our approach is composed of two essential components, which are the main contributions of our work. First, a semantic task-based IoT service description, which is necessary to match the IoT services required by the user with the IoT services offered by the environment. Second, a semantic setting description that detaches settings from specific IoT devices and makes them more interchangeable. The core of the framework is an agent that predicts the task-based IoT service settings while considering the user context and environmental context. We have implemented a prototype of the agent to evaluate our approach.
To this end, we conducted a user study, whose results show that our approach can sufficiently solve these problems. The remainder of this paper is organized as follows. In Sect. 2, we describe related work on storing settings, collecting context information, and representing IoT device information. In Sect. 3, we introduce our approach, which consists of the semantic task-based IoT service description, the semantic setting description, and the framework itself. A prototypical implementation of the agent of the framework is shown in Sect. 4. In Sect. 5, we present the evaluation of the approach. Finally, in Sect. 6, we summarize our main contributions and limitations and propose a plan for future work.
2 Related Work
A Google patent [2] describes how user settings can be stored and transferred from one device to another. This supports the modern paradigm shift from device-centric to user-centric computing. However, this patent assumes that the same settings are available on all devices. AWARE [3] is a framework for collecting context information about the environment and the user from the sensors integrated into a smartphone. This context information can be used by context-aware applications to adapt their behavior. Nevertheless, AWARE does not take the user’s task into account while collecting context information.
Currently, there are several context schemes that try to standardize the representation of IoT device information. In [4], the authors propose a model for scalable, semantics-aware context storage. This d-dimension model enables scalability and semantic extraction of IoT device information and, furthermore, semantic comparison of different IoT device information. The authors of [5] proposed a context definition and query language for IoT device information. This language can express temporal relations between situations, which context-aware applications can use to adapt to a specific situation. The IoT-Device Description Language (DDL) [6] is a machine-readable and human-readable description language for IoT devices whose goal is to homogenize these devices so that they can easily be integrated into the surrounding environment, creating opportunities for powerful IoT applications. In [7], an architecture for a context-as-a-service platform is proposed. This platform stores and manages the context of IoT devices and offers this information as a service to context-aware applications. However, none of these context schemes considers the user’s context or the IoT device settings.
In conclusion, there is related work on storing settings, collecting context information, and representing IoT device information.
Nevertheless, none of these approaches can completely solve the problems of this work, as they are lacking in the following points. First, they do not take into account that the IoT service settings depend on the context of the user and the environment. Second, they do not consider the correlation between the task of the user and the context of the user and his environment.
3 Key Components
A solution is required that allows a user’s IoT service settings to be stored and recovered in the same or a different physical environment, and that allows IoT service settings to be set up efficiently for a new task or physical environment. In the following, we propose two essential components: a semantic task-based description of services and a semantic setting description. These two components are required by our framework, which is introduced afterward.
3.1 Semantic Task-Based IoT Service Description
In recent years, there has been a growing demand for ubiquitous computing environments that enable users to utilize services embedded in the infrastructure or on the web anytime, anywhere. Previous concepts followed the central cloud computing model, which enabled the efficient collection and management of IoT services. Due to the rapid increase in the number of IoT services, such a centralized model is not sufficiently scalable. A distributed computing model offers high scalability by allowing IoT resources to interact over an ad-hoc network, thereby providing services to users efficiently and with more flexibility. A user can combine IoT services available on the web to accomplish a task [8]. The authors of [9] proposed a task-based provision framework in which a task describes the goal of a user and the services required to achieve it. A concept for describing the services and their settings required for a task is necessary. The task-based provision framework is well suited as a basis, but the concept needs to be extended to provide a suitable task-based IoT service description.
The extended concept of the task-based provision framework is shown in Fig. 1. The concept separates the abstract description from the physical description and defines three layers. First, the task layer contains the tasks that a user wants to achieve. Second, the service layer describes the services needed to accomplish a task. The service layer is divided into an abstract (high-level) and a physical (low-level) layer. The abstract service layer describes the services with the help of semantic information and decouples them from explicit physical devices. This decoupling is necessary due to the properties of distributed computing environments and the unpredictable availability of IoT services. Additionally, it allows the task-based IoT service description to be transferred into another physical environment. Furthermore, the description of services using semantic information enables efficient service discovery, selection, and invocation. The physical service layer maps the semantic description of a service to a physical service or a composition of physical services through the service selection. Lastly, the IoT resource layer describes the physical devices and represents the relationship between them and a physical service.
For instance, a user wants to accomplish the task “watch a movie” and therefore requires two services. First, a film must be played and displayed on a screen, and second, the room temperature must be controlled. Therefore, the
Fig. 1. Semantic task-based IoT service description
abstract services “play movie” and “adjust the temperature” are needed. This gives an abstract task-based IoT service description, and the settings preferred by a user for each abstract service can be defined. When a user wants to execute the task, the service discovery searches for physical services that match the abstract service description. In some cases, a composition of multiple physical services is necessary to provide the functionality of an abstract service. The selected services are then invoked with the user’s preferred settings. If a physical service is no longer available, the service discovery searches for replacements. Thus, the abstract task-based IoT service description is static, and the physical description can dynamically adapt to the availability of physical services. Furthermore, the semantic task-based IoT service description can be transferred to any environment, and a user can accomplish his/her task there.
3.2 Semantic Setting Description
A user naturally has specific preferences for IoT settings when he/she utilizes IoT services to accomplish a task. Assume that a user needs an IoT audio service in order to accomplish his/her task “watch a movie”. The user uses an IoT audio service from a Bose speaker, which has volume levels of 0–10. Later, the IoT audio service from the Bose speaker becomes unavailable, and the service discovery finds another matching IoT audio service from a Sony speaker, which has volume levels of 0–20. Two problems arise in this scenario: First, how
can the Bose speaker setting be adjusted for the Sony speaker? Second, how can this adjustment be automated by a machine? Therefore, we propose a concept for the semantic description of IoT service settings. This is essential to decouple the setting from a particular physical IoT service and to make the setting machine-readable and open to automated reasoning. For settings with a concrete value, abstract levels are defined, and a range of concrete values is assigned to each of them. Within the scope of our work, we propose semantic setting descriptions for IoT services that influence environmental conditions such as the brightness, loudness, and temperature of an environment.
Brightness. In the following, we present a semantic setting description for the brightness. The brightness of an environment can be specified in lux, which we used as the basis for defining the abstract levels. The brightness can be influenced by IoT services such as light control and shutter control. The authors of [10] describe which brightness values occur in indoor and outdoor environments and also assign them to certain activities. We propose a division into five levels: very dark, dark, medium, light and very light. We mapped these levels to the brightness values given by the authors; the plotted result is shown in Fig. 2.
Fig. 2. Semantic brightness description
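The speaker example from the beginning of this section can be illustrated with a short sketch. It is not the paper’s implementation: it assumes that each audio service advertises its native volume range and that a device-independent value can be obtained by linear rescaling, with a normalized fraction standing in for the abstract level. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VolumeScale:
    """Native volume range advertised by a (hypothetical) IoT audio service."""
    minimum: int
    maximum: int


def to_semantic(value, scale):
    """Device-specific value -> device-independent fraction in [0, 1]."""
    if not scale.minimum <= value <= scale.maximum:
        raise ValueError("value outside the device's range")
    return (value - scale.minimum) / (scale.maximum - scale.minimum)


def from_semantic(fraction, scale):
    """Device-independent fraction -> nearest native value on the target device."""
    return scale.minimum + round(fraction * (scale.maximum - scale.minimum))


def transfer(value, source, target):
    """Carry a setting from one device to another via the semantic value."""
    return from_semantic(to_semantic(value, source), target)


bose = VolumeScale(0, 10)  # volume levels 0-10, as in the example above
sony = VolumeScale(0, 20)  # volume levels 0-20, as in the example above
```

With these assumptions, `transfer(6, bose, sony)` yields 12. A linear mapping is only a first approximation, because two speakers set to the same fraction of their range need not be equally loud; tying the levels to measured quantities such as dB or lux avoids this problem.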
Loudness. Next, we propose a semantic setting description for the loudness. The authors of [11] and [12] describe the noise level of different activities. For example, the loudness of a whisper is 30 dB, and that of a chainsaw is 100 dB. We set the number of loudness levels to ten; for reference, an Android smartphone offers 15 and Amazon Alexa offers ten loudness levels. We set the minimum loudness value to 30 dB, because lower values are hardly audible for
humans, and the maximum loudness value to 100 dB, because higher values are harmful to human ears. Finally, we divided the total loudness range into ten levels: level 1 covers all values up to 30 dB, and levels 2 to 10 divide the range from 30 dB to 100 dB evenly. Detailed definitions of the loudness ranges and average values for each level are given in Table 1.

Table 1. Semantic loudness description ranges
Level   Range (dB)     Average (dB)
1       0.00–30.00     15.00
2       30.01–37.78    33.89
3       37.79–45.56    41.67
...     ...            ...
10      92.23–100.0    96.11
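The ranges and averages in Table 1 can be reproduced with a few lines of arithmetic. The sketch below assumes, as the table values suggest, that level 1 spans 0 to 30 dB and that levels 2 to 10 split the range from 30 dB to 100 dB into nine equal steps of about 7.78 dB. The function names are illustrative.

```python
import math

MIN_DB = 30.0   # below this, sound is hardly audible (level 1)
MAX_DB = 100.0  # above this, sound is harmful to human ears
LEVELS = 10
STEP_DB = (MAX_DB - MIN_DB) / (LEVELS - 1)  # width of levels 2..10


def level_bounds(level):
    """Lower and upper bound (dB) of a semantic loudness level."""
    if not 1 <= level <= LEVELS:
        raise ValueError("level must be between 1 and 10")
    if level == 1:
        return 0.0, MIN_DB
    return MIN_DB + (level - 2) * STEP_DB, MIN_DB + (level - 1) * STEP_DB


def level_average(level):
    """Midpoint of a level, usable as the concrete value to apply."""
    low, high = level_bounds(level)
    return (low + high) / 2


def level_for(db):
    """Semantic level for a measured loudness; upper bounds are inclusive."""
    if db <= MIN_DB:
        return 1
    return min(LEVELS, 1 + math.ceil((db - MIN_DB) / STEP_DB))
```

Rounded to two decimals, `level_average(2)` gives 33.89 and `level_average(10)` gives 96.11, matching the table. Values above 100 dB are clamped to level 10 in this sketch.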
Temperature. Lastly, we introduce a semantic setting description for the temperature. According to the authors of [13], temperature preferences strongly depend on the user context. For instance, temperature sensitivity differs between children and seniors, between males and females, and between Asian and white people. For this reason, we defined only the levels, without temperature ranges, for the semantic description of temperature settings. The temperature ranges for each level can be defined dynamically based on the user context. The standard BS EN ISO 7730 [14] defines seven temperature levels, and we used this division for our semantic description of temperature settings. The temperature levels are very cold, cold, cool, medium, warm, hot, and very hot.
3.3 Semantic Task-Based IoT Service Framework
In this section, we propose the framework that aims to store and recover task-based IoT service settings while considering the context. The proposed concepts of semantic IoT service description and semantic setting description are integrated into the framework to achieve this goal. An overview of the framework is shown in Fig. 3, and the core components are described in more detail below.
Context Manager. The context manager is responsible for the administration and processing of the context. As soon as a user wants to perform a task, he/she provides his/her user context to the context manager. After receiving a new user context, the context manager queries the environmental context, which changes frequently. Then, the context manager processes the user context and environmental context and integrates the services required for the user’s task into the data schema. The final data schema is then used as input for the agent.
Fig. 3. Semantic task-based IoT service framework

Table 2. Data schema of the IoT service setting prediction
User context     Envir. context     Task             Service        Setting
Age    ...       Name      ...
18     ...       Kitchen   ...      Watch a Movie    Brightness     Bright
18     ...       Kitchen   ...      Watch a Movie    Loudness       6
18     ...       Kitchen   ...      Watch a Movie    Temperature    Warm
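To make the data schema of Table 2 more concrete, the sketch below represents its rows in code and turns them into numeric inputs and labels for a learner. Only the columns visible in the table are modeled; the columns elided with “...” are left out. The class name, the field names and the one-hot encoding are assumptions made for illustration and are not taken from the prototype.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SettingSample:
    """One row of the data schema, restricted to the columns shown in Table 2."""
    user_age: int
    environment: str
    task: str
    service: str
    setting: str  # label to predict, e.g. "Bright", "6", "Warm"


ROWS = [
    SettingSample(18, "Kitchen", "Watch a Movie", "Brightness", "Bright"),
    SettingSample(18, "Kitchen", "Watch a Movie", "Loudness", "6"),
    SettingSample(18, "Kitchen", "Watch a Movie", "Temperature", "Warm"),
]


def build_vocabulary(rows, field):
    """Sorted distinct values of a categorical column."""
    return sorted({getattr(row, field) for row in rows})


def encode(row, vocabularies):
    """Feature vector: age followed by one-hot encoded categorical columns."""
    features = [float(row.user_age)]
    for field, values in vocabularies.items():
        features.extend(1.0 if getattr(row, field) == v else 0.0 for v in values)
    return features


VOCAB = {f: build_vocabulary(ROWS, f) for f in ("environment", "task", "service")}
X = [encode(row, VOCAB) for row in ROWS]  # model inputs
y = [row.setting for row in ROWS]         # labels (the settings)
```

Each row contributes one training example: the context, task and service columns form the input, and the setting column is the label.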
Agent. The agent is the core of the framework. Based on the data schema provided by the context manager, the agent can predict the IoT service settings as well as the temperature values for each semantic temperature level for a user. To achieve this, two separate models are required, one for the IoT service settings and one for the temperature values. The data schema for the prediction of IoT service settings is shown in Table 2. The capabilities of the agent can be realized with supervised learning. A labeled training data set is required; in this case, the labels are the IoT service settings or the temperature values. If no
historical data is available, an initial training data set with random labels can be used. The models are retrained each time a user performs a task and needs to adjust the IoT service settings or temperature values. This increases the accuracy of the models over time.
Service Requesting. Service requests are made to the matchmaking engine for the services required by the user to accomplish his/her task. The service defined in the user profile may be described in a different service description language than that of the advertised services in the semantic IoT service repository. Therefore, the framework must be flexible enough to convert a service description into a consistent service description language. This conversion is performed during service requesting.
Semantic IoT Service Repository. IoT devices enter an environment and connect with each other via an ad-hoc network. They then advertise their semantic service description to the semantic IoT service repository, which manages the service descriptions of all IoT services available in the environment. OWL-S, which builds on the Web Ontology Language (OWL), is used to model semantic service descriptions and provides a sound basis for the matchmaking engine to analyze the descriptions.
Matchmaking Engine. The service matchmaking compares requested services with the advertised services from the semantic IoT service repository. As a result, a list of candidate services is provided along with a matching score for each.
Service Negotiation. The candidate services may differ in their Quality of Service (QoS) properties. The goal of the service negotiation is to refine the list output by the matchmaking engine while considering the QoS characteristics. In cases where several candidate services offer the same functionality but differ in quality and cost, the most appropriate service must be prioritized through the service negotiation.
Service Selection. The final process is the service selection, which selects the IoT services based on the results of the service negotiation and invokes them with the settings predicted by the agent.
3.4 Transfer of User Profile
Now, we show how a user can make his/her profile available to the framework. First, the structure of the user profile is explained in more detail. A user profile consists of two parts: the user context and the task profiles. The user context defines attributes of the user such as age, gender, and origin, and each task profile defines which services a user requires to perform the task. Furthermore, a semantic description is given for each service so that the framework can match the
services requested by the user with the services advertised in the environment. Services are described with OWL-S, for which the service profile, service grounding, and service model are defined.
A user has three different methods of providing a user profile to the framework. First, most users nowadays have a mobile device such as a smartphone or smartwatch. The user profile can be stored on the mobile device, and as soon as the user enters an environment, the mobile device connects to the framework via an ad-hoc network and provides the user profile. This variant also offers the advantage that the user context can be automatically updated and managed by sensors embedded in the mobile device [15]. Second, the user can make a user profile available on the web and activate facial recognition as an authentication method. In this case, cameras distributed in the environment recognize the user and then retrieve the user profile from the web. Lastly, analogous to the second method, voice recognition can be activated as an authentication method, and microphones distributed in the environment recognize the user’s voice. The advantage of the last two methods is that the user does not have to carry a mobile device or storage medium in order to make a user profile available to the framework.
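The structure of a user profile, and the matching of requested services against advertised ones, might be sketched as follows. The framework itself relies on OWL-S descriptions; here a plain set of capability keywords stands in for them, and the coverage-based score and the threshold are assumptions made for illustration. All class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceDescription:
    """Stand-in for an OWL-S description: a name plus capability keywords."""
    name: str
    capabilities: frozenset


@dataclass
class TaskProfile:
    task: str
    required_services: list  # list of ServiceDescription


@dataclass
class UserProfile:
    context: dict                                      # e.g. {"age": 18}
    task_profiles: dict = field(default_factory=dict)  # task name -> TaskProfile


def match_score(requested, advertised):
    """Fraction of the requested capabilities that the advertised service covers."""
    if not requested.capabilities:
        return 0.0
    covered = requested.capabilities & advertised.capabilities
    return len(covered) / len(requested.capabilities)


def candidates(requested, repository, threshold=0.5):
    """Advertised services at or above the threshold, best match first."""
    scored = [(match_score(requested, s), s.name) for s in repository]
    return sorted(
        ((score, name) for score, name in scored if score >= threshold),
        reverse=True,
    )
```

For a requested service “adjust the temperature” with the capabilities temperature and cooling, an air conditioner that covers both would score 1.0 and a fan that only cools would score 0.5. A real matchmaker would reason over the OWL-S ontology instead of comparing keywords.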
4 Implementation
In the following, we present the prototypical implementation. It was not possible to implement all components of the semantic task-based IoT service framework within the scope of this work. Therefore, we implemented the components most necessary to evaluate our concept, namely the context manager and the agent.
4.1 Scenario
We selected the task “watch a movie” as the scenario for the implementation and evaluation. This scenario is well suited because the environmental context can easily be simulated, and thus the concept can be verified by a user study. In this scenario, a user uses four IoT services to perform the task: brightness adjustment, loudness adjustment, temperature adjustment, and content playback. Moreover, we selected three different environments for the scenario to check the dependency of the IoT service settings on the environmental context. The selected environments are bedroom, living room, and kitchen.
4.2 Implementation of the Agent
Next, we explain the prototypical implementation of the agent, for which we followed the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology [16]. It is a robust and well-proven methodology that provides a structured approach to implementing data mining and machine learning projects.
Business Understanding. First, the agent should be able to predict the IoT service settings for a user’s task. To achieve this, machine learning is used to predict the task-based IoT service settings based on the user context and environmental context. Second, the agent should be able to predict the temperature ranges based on the user context for a semantic temperature setting description. Both are classification problems that can be solved by supervised learning. Since neural networks have proven themselves in recent years on numerous problems in various areas through their high accuracy and excellent performance, we used a neural network as the algorithm to implement the agent’s models.
Data Understanding. One requirement of supervised learning is a large labeled training dataset. The training dataset forms the basis for supervised learning, and its quality determines the accuracy of the agent’s predictions. Therefore, we conducted an online survey, which was created with Google Forms and divided into five parts:
1. Introduction: The first part gave the participants a brief introduction to the subject area and explained the purpose of the survey.
2. Personal information: In the second part, the participants gave information about themselves, i.e. age, sex, country of origin, and ethnic origin. In order to better evaluate the survey results, the participants had to select a suitable answer from a predefined list of answers [17].
3. Semantic Temperature Description: In this part, the participants were asked to indicate the most appropriate temperature value for each semantic temperature level in an indoor environment.
4. Scenario: Next, the participants were asked to specify their settings for the IoT services needed to accomplish the task “watch a movie” in the environments living room, bedroom, and kitchen. For simplification, the movie “Batman 7 ” was fixed, since the IoT service settings could differ for movies of different genres. The IoT service settings for brightness, loudness, and temperature were requested.
Data Preprocessing. We created two datasets based on the online survey results: one for the task-based IoT service setting prediction model and one for the temperature value prediction model. We preprocessed the datasets to match the input format of the models and applied various data optimization methods to improve model performance [18]. The methods were added iteratively, and after each iteration the models were evaluated to check how the performance changed. The following preprocessing steps were taken: formatting, encoding, splitting, oversampling, scaling, and transforming.
Modeling. After preprocessing the datasets, we created the machine learning models. The models were implemented with version 1.13.1 of the deep learning framework TensorFlow, the latest version at the time. In the following, the architectures of the two models are explained.
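As a brief aside on the preprocessing stage, three of the steps named there (encoding, oversampling, and scaling) can be sketched in plain Python. The text does not state which concrete methods were used, so one-hot encoding, random oversampling, and min-max scaling are assumptions chosen for illustration, and all function names are hypothetical.

```python
import random
from collections import Counter


def one_hot(value, categories):
    """Encode a categorical value as a 0/1 vector over a fixed category list."""
    return [1 if value == c else 0 for c in categories]


def min_max_scale(values):
    """Scale numbers linearly into [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until every class
    is as frequent as the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for label, count in counts.items():
        pool = [s for s, l in zip(samples, labels) if l == label]
        for _ in range(target - count):
            out_samples.append(rng.choice(pool))
            out_labels.append(label)
    return out_samples, out_labels


# Illustrative use with the three environments of the scenario.
environments = ["bedroom", "kitchen", "living room"]
print(one_hot("kitchen", environments))  # [0, 1, 0]
print(min_max_scale([18, 21, 24]))       # [0.0, 0.5, 1.0]
```

Because splitting precedes oversampling in the listed order, a sketch like this would be applied to the training split only, so that duplicated samples do not leak into the evaluation data.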
Fig. 4. Learning accuracy of (a) the IoT service setting prediction model and (b) the temperature value prediction model
A sequential model, that is, a linear stack of layers, was used for the IoT service setting prediction. The model contains a total of five layers with 2048, 1024, 512, 256, and 128 neurons, respectively. The number of layers and neurons per layer was set experimentally and then adjusted based on the result of the model evaluation until the performance of the model could not be improved further. For the first four layers in the stack, the Rectified Linear Unit (ReLU) is used as the activation function, which returns its input unchanged if it is positive and 0 otherwise. For the last layer, softmax is used as the activation function. It converts the results into probabilities that sum to one, so the output of the softmax function corresponds to a categorical probability distribution, which indicates the probability of each class being the true one. In addition, the dropout mechanism was applied to the IoT service settings prediction model to reduce overfitting [18].
The architecture of the temperature value prediction model was constructed similarly. However, fewer layers and fewer neurons per layer were used, as this resulted in better model performance. The final model contains a total of four layers with 1024, 512, 256, and 128 neurons, respectively.
Model Evaluation. We used three metrics to evaluate the models: top-1 accuracy, top-3 accuracy, and top-5 accuracy. The models were evaluated after each training iteration, and the architecture of the models was optimized based on the results. The final accuracies of the models, evaluated on the training dataset, are shown in Fig. 4a for the IoT service setting prediction model and in Fig. 4b for the temperature value prediction model.
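The two activation functions and the top-k accuracy metric described above can be written down in a few lines. The sketch below is framework-free Python for illustration only; the models themselves were built with TensorFlow, and the function names here are hypothetical.

```python
import math


def relu(values):
    """Return each input unchanged if positive, otherwise 0."""
    return [max(0.0, v) for v in values]


def softmax(logits):
    """Convert raw scores into probabilities that sum to one."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_accuracy(probabilities, true_labels, k):
    """Share of samples whose true class index is among the k most probable classes."""
    hits = 0
    for probs, label in zip(probabilities, true_labels):
        ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(true_labels)


# Toy example with three classes and three samples (made-up numbers).
probs = [[0.1, 0.6, 0.3], [0.5, 0.2, 0.3], [0.2, 0.3, 0.5]]
labels = [1, 2, 2]
print(top_k_accuracy(probs, labels, 1))  # two of three samples are hits
print(top_k_accuracy(probs, labels, 2))  # 1.0
```

Top-1 accuracy counts only exact matches, whereas top-3 and top-5 accuracy also count a prediction as correct when the true class is among the highest-ranked alternatives, which fits a setting where several IoT service settings may be acceptable to the same user.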
5 Evaluation of Concept
In this section, we present the evaluation of the concept by means of the online survey and the user study. The evaluation is intended to confirm that the proposed concepts solve the problems posed at the beginning of this work.
5.1 Online Survey Results
The results of the online survey support the assumptions stated at the beginning of our paper and underline the need for an approach to solve the problems. First, we analyzed the participants of the online survey; a total of 189 participants took part. The participants show a high variance in their user context, which provides a reasonable basis for the training dataset. Participants came from 8 different age groups, with the majority aged between 18 and 24. Moreover, three-quarters of the participants were male and one-quarter female. Participants came from 17 different countries, most of them from Germany (about 60%). The participants were distributed over four ethnic groups, most of them white (about 80%).
Second, we analyzed the assignment of temperature values to the semantic temperature levels. Figure 5 shows that the temperature values assigned to a semantic temperature level are widely distributed. Moreover, when the distribution of temperature values for a semantic temperature level is considered in relation to the country of origin, the temperature values of southern countries are higher than those of northern countries. This confirms the assumption and calls for a dynamic mapping of temperature values to a semantic temperature level based on the user context.
Finally, we analyzed the IoT service settings selected by the participants for the task “watch a movie”. Table 3 shows exemplary excerpts from the results. Rows 1 to 3 show the IoT service settings of the same participant (18–24, female, Germany, white) for the three environments living room, bedroom, and kitchen. The results confirm the assumption that IoT service settings depend on the environmental context. Furthermore, the IoT service settings of two further participants for the environment living room are shown in rows 4 and 5. Comparing the IoT service settings of the three participants also confirms that the IoT service settings depend on the user context.
Fig. 5. Online survey: temperature distribution analysis
Table 3. Examples of the IoT service settings for the task “Watch a Movie” from the online survey
User context                     Environmental context   Service       Setting
18–24, Female, Germany, White    Living Room             Temperature   Warm
18–24, Female, Germany, White    Bedroom                 Temperature   Medium
18–24, Female, Germany, White    Kitchen                 Temperature   Warm
18–24, Male, Germany, White      Living Room             Temperature   Medium
25–34, Male, Thailand, Asian     Living Room             Temperature   Cool
5.2 User Study
In the following, we describe the experimental setup, the procedure, and the results of the user study.
Experimental Setup. We conducted the user study using the scenario “watch a movie” presented in Sect. 4.1. We followed the recommendations of [19] and [20] for the experimental setup and the conduct of the user study. The different environmental contexts were simulated in the same room by changing the objects and their arrangement. The IoT services that adjust the environmental settings were simulated as follows:
– Brightness: The room is equipped with two ceiling lights. In addition, we installed three more light sources to adjust the brightness more finely. We then measured the light intensity with a lux meter and determined which combination of the five light sources has to be switched on to reach each of the five brightness levels defined in Fig. 2.
– Temperature: The room is equipped with an air conditioner, and an additional fan was installed. Since the temperature ranges for the semantic temperature levels are determined dynamically on the basis of the user context, the corresponding settings for the air conditioner or the fan could not be determined in advance. As soon as the user context is known, the agent can predict the temperature values for each semantic temperature level, and the settings of the air conditioner and fan can be adjusted accordingly. We used a thermometer to check these settings.
– Loudness: A Bluetooth speaker, controlled by a laptop, reproduced the sound of the movie. We set the Bluetooth speaker to maximum volume and determined the laptop settings for the respective loudness levels defined in Table 1.
In addition, we wrapped the scenario in a story so that the participants could follow it more easily. Before a participant took part in the user study, the user context was provided. From the user context, a user profile was created, which was then used in combination with the environmental context (living room, kitchen, bedroom) as input of the agent to predict the IoT service settings and temperature values. After a participant completed the user study, we conducted a survey to determine the participant’s satisfaction with the particular IoT service settings for each environmental context. A participant could rate the settings as satisfied, almost satisfied, or not satisfied.
Procedure and Results. We recruited participants with different user contexts for the user study. In each environmental context, the participant was given as much time as needed to experience the IoT service settings. A total of 19 participants took part in the user study. The participants were between 18 and 34 years old; three quarters were male and one quarter female. They came from nine different countries of origin, most of them from South Korea. 60% of the participants were white and 40% Asian. No participant was dissatisfied. Across all three environmental contexts, three quarters were fully satisfied and one quarter was almost satisfied. There are small differences between the environmental contexts: nearly 70% of the participants were satisfied in the living room, 75% in the bedroom, and 85% in the kitchen. The satisfaction with the individual IoT service settings for temperature, brightness, and loudness can also be compared: the participants were more satisfied with the temperature settings than with the brightness and loudness settings. This supports the effectiveness of the dynamic temperature range prediction approach based on the user context. The high satisfaction of the participants supports the ability of the agent to predict task-based IoT service settings based on the user context and environmental context.
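Satisfaction shares of the kind reported above can be obtained by tallying the three possible ratings. The following Python sketch shows one way to do so; the function name is hypothetical, and the four ratings in the example are made-up input chosen for easy arithmetic, not study data.

```python
from collections import Counter

# The three ratings a participant could give in the user study.
RATINGS = ("satisfied", "almost satisfied", "not satisfied")


def satisfaction_shares(feedback):
    """Return the share of each rating among the given feedback entries."""
    if not feedback:
        raise ValueError("feedback must not be empty")
    counts = Counter(feedback)
    unknown = set(counts) - set(RATINGS)
    if unknown:
        raise ValueError(f"unknown ratings: {sorted(unknown)}")
    return {rating: counts[rating] / len(feedback) for rating in RATINGS}


# Made-up example input.
example = ["satisfied", "satisfied", "almost satisfied", "satisfied"]
print(satisfaction_shares(example))
# {'satisfied': 0.75, 'almost satisfied': 0.25, 'not satisfied': 0.0}
```

Running the same tally separately per environmental context or per IoT service would give breakdowns of the kind compared in the text.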
Taken together, the user study results support the effectiveness of the semantic task-based IoT service framework and of the concepts it contains, namely the semantic task-based IoT service description and the semantic setting description. However, while the effectiveness of the semantic setting description could be confirmed, its necessity could not. Therefore, a further user study is necessary in which, on the one hand, the semantic setting descriptions and, on the other hand, the device-dependent settings are transferred into another environmental context with other IoT devices. The two approaches can then be compared based on the respective satisfaction of the participants. Unfortunately, we could not carry out this user study within the scope of this work; it is planned for future work. In summary, the concepts presented in our work allow the user’s task-based IoT service settings to be stored using the semantic IoT service description and semantic setting description.
5.3 Limitations
In this section, we highlight the limitations of our work. One limitation is that an increasing number of tasks, user contexts, and environmental contexts raises the complexity of the machine learning models. The number of possible outputs increases, which can negatively affect the accuracy of the prediction.
A limitation of the implementation is that the data collection in the online survey only considered four attributes of the user context. Moreover, these attributes were static, and dynamic attributes such as the mood of a user have not been considered so far. The evaluation of the online survey showed that there are several possible settings for an IoT service for the same user context. These possible settings can be used to improve the order of the settings in the User Interface (UI). However, considering more than four attributes of the user context can reduce the number of possible settings for an IoT service. Likewise, for the environmental context only one static attribute, the name of the environment, was used. Here, too, more and particularly dynamic attributes are needed that describe the environmental context in more detail. A more detailed description of the user context and environmental context enables the agent to learn and predict more accurately. Furthermore, only the agent was prototypically implemented, so it was not possible to evaluate the entire semantic task-based IoT service framework. One reason for this is that other components of the framework are still subjects of ongoing research and are therefore challenging to implement.
6 Conclusion
In this section, we summarize the main findings and research contributions of our work and, lastly, present issues for future work.
6.1 Summary and Results
The number of available IoT devices is increasing rapidly, enabling users to accomplish their tasks more efficiently. These IoT devices can be invoked with chosen settings via the associated IoT services. However, there is currently no approach to save and recover these settings. Consequently, a user has to spend considerable effort setting up IoT services. Besides, the settings strongly depend on the user context and the environmental context. Moreover, the distributed computing environments in which IoT devices communicate with each other via an ad-hoc network pose additional challenges, such as the unpredictable availability of IoT services. Therefore, we proposed in Sect. 3 an approach to store a user’s IoT service settings, recover them in the same or a different physical environment, and efficiently set up IoT service settings for a new task or physical environment. This approach consists of three main research contributions, which are briefly summarized below.
The semantic task-based IoT service description enables the allocation of IoT services to a task. An abstract IoT service description separates the description from the physical environment. This allows a context-independent description of IoT services, which makes the description exchangeable between different physical environments. In addition, the corresponding semantic setting description can be assigned to these semantic task-based IoT service descriptions.
The semantic setting description allows an abstract description of settings. Thus, the settings can be separated from a physical IoT device and made interchangeable. This makes it possible to transfer settings from an unavailable IoT service to a comparable available IoT service or to recover IoT service settings in a different physical context with other available IoT services. In addition, specific semantic setting descriptions for brightness, loudness, and temperature were specified. Semantic levels were created, each assigned a particular value range. For the semantic setting description of the temperature, a dynamic assignment of the value range for each semantic level was defined, since the value ranges strongly depend on the user context.
Finally, the semantic task-based IoT service framework includes an agent that uses the semantic task-based IoT service description to predict the associated semantic setting description based on the user context and the environmental context. For this purpose, the agent uses a neural network trained on historical user data. When a user adjusts the IoT service settings, the model is optimized based on the changes. The agent is also able to predict task-based IoT service settings for a new user, task, or environment. A user has three options to make the user context available to the framework. First, the user stores the user context on a mobile device that connects to the framework via an ad-hoc network. Second and third, the user stores the user context on the web and registers either face or voice as an authentication method; in the environment of the framework, face or voice recognition then detects the user and requests the user context from the web. The environmental context is captured by the framework itself through the context manager.
In Sect. 4, we described the prototypical implementation of the agent. An online survey was conducted to obtain an initial training dataset for the models. The models are neural networks implemented with the Python frameworks TensorFlow and Keras. The models allow the agent to predict the IoT service settings for either scenario based on the user context and environmental context.
We used the prototypical implementation of the agent to evaluate the concept; detailed results were presented in Sect. 5. Three-quarters of the participants were satisfied with all predicted IoT service settings, one-quarter were almost satisfied, and none were dissatisfied. This supports the effectiveness of the proposed concepts in addressing the research questions. However, the approach still has some limitations. First, the complexity of the model can increase rapidly with an increasing number of different tasks, user contexts, and environmental contexts, which can lead to a deterioration in the accuracy of the prediction. Second, in the prototypical implementation, only four attributes for the user context and one attribute for the environmental context were considered.
6.2 Future Work
To conclude this work, issues for future work will be discussed. First of all, further user studies can be conducted to confirm on the one hand the necessity of semantic setting description, and on the other hand, the effectiveness of dynamic assignment of value ranges to each semantic level based on the user context.
Furthermore, the prototypical implementation can be improved in two ways: first, by considering more attributes for the user context as well as for the environmental context; second, by conducting a user study in a real environment and thus collecting higher-quality training data for the model. In addition, the proposed concept can be applied to many other scenarios, such as the car industry. In recent years, cars have become more intelligent through the installation of more smart devices such as sensors and actuators, which has improved both comfort and safety. Also, the car-sharing concept is increasingly accepted by the population [21]. Every time a user borrows a car, the user has to manually adjust IoT services such as radio channel selection, temperature adjustment, and seat position adjustment. The manual adjustment is very time-consuming and, in the long run, annoying for the user. The framework presented in this work allows an automatic adjustment of all these IoT services based on the user context and environmental context. This could considerably increase user satisfaction and help the car-sharing concept spread further.
Acknowledgment. This work was partially supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2019R1A2C1087430). This work was also supported by Next-Generation Information Computing Development Program through NRF funded by the Ministry of Science and ICT (NRF-2017M3C4A7066210).
References
1. Gartner. Prognose zur anzahl der vernetzten geräte im internet der dinge (iot) weltweit in den jahren 2016 bis 2020 (in millionen einheiten). Statista, Ed. https://de.statista.com/statistik/daten/studie/537093/umfrage/anzahl-dervernetzten-geraete-im-internet-der-dingeiot-weltweit/. Accessed 07 May 2019
2. Derek, P.M.: Transferring user settings from one device to another (2015). US20150067099A1, 2015. https://patents.google.com/patent/US20150067099A1/en. Accessed 07 Sep 2019
3. Ferreira, D., Kostakos, V., Dey, A.K.: Aware: mobile context instrumentation framework. Front. ICT 2 (2015). https://doi.org/10.3389/fict.2015.00006
4. Antunes, M., Gomes, D., Aguiar, R.L.: Scalable semantic aware context storage. Future Gener. Comput. Syst. 56, 675–683 (2016). https://doi.org/10.1016/j.future.2015.09.008. ISSN: 0167739X
5. Medvedev, A., Hassani, A., Haghighi, P.D., Ling, S., Indrawan-Santiago, M., Zaslavsky, A., Fastenrath, U., Mayer, F., Jayaraman, P.P., Kolbe, N.: Situation modelling, representation, and querying in context-as-a-service IoT platform. In: Piscataway, NJ (ed.) 2018 Global Internet of Things Summit (GIoTS), GIoTS (2018), IEEE, pp. 1–6, 2018. https://doi.org/10.1109/GIOTS.2018.8534571. ISBN: 978-1-5386-6451-3
6. Khaled, A.E., Helal, A., Lindquist, W., Lee, C.: IoT-DDL-device description language for the “T” in IoT. IEEE Access 6, 24048–24063 (2018). https://doi.org/10.1109/ACCESS.2018.2825295
7. Medvedev, A., Indrawan-Santiago, M., Haghighi, P.D., Hassani, A., Zaslavsky, A., Jayaraman, P.P.: Architecting IoT context storage management for context-as-a-service platform. In: GIoTS2017, pp. 1–6. IEEE, Piscataway (2017). https://doi.org/10.1109/GIOTS.2017.8016228. ISBN: 978-1-5090-5873-0
8. Baek, K.-D., Ko, I.-Y.: Spatially cohesive service discovery and dynamic service handover for distributed IoT environments. In: Cabot, J., de Virgilio, R., Torlone, R. (eds.) Web Engineering. Lecture Notes in Computer Science, vol. 10360, pp. 60–78. Springer International Publishing and Springer, Cham (2017). https://doi.org/10.1007/978-3-319-60131-1_4. ISBN: 978-3-319-60130-4
9. Jimenez-Molina, A., Ko, I.-Y.: Spontaneous task composition in urban computing environments based on social, spatial, and temporal aspects. Eng. Appl. Artif. Intell. 24(8), 1446–1460 (2011). https://doi.org/10.1016/j.engappai.2011.05.006. ISSN: 09521976
10. Recommended light levels (illuminance) for outdoor and indoor venues. https://www.noao.edu/education/QLTkit/ACTIVITY_Documents/Safety/LightLevels_outdoor+indoor.pdf. Accessed 15 April 2019
11. IAC Acoustics, a Division of Sound Seal Inc. Comparitive examples of noise levels - industrial noise control. http://www.industrialnoisecontrol.com/comparativenoise-examples.htm. Accessed 15 April 2019
12. Intensity and loudness of sound - ck-12 foundation. https://www.ck12.org/book/CK-12-Physical-Science-Concepts-For-Middle-School/section/17.3/. Accessed 16 April 2019
13. Xu, J., Lee, Y.-H., Tsai, W.-T., Li, W., Son, Y.-S., Park, J.-H., Moon, K.-D.: Ontology-based smart home solution and service composition. In: ICESS 2009, pp. 297–304. IEEE Computer Society, Los Alamitos (2009). https://doi.org/10.1109/ICESS.2009.60. ISBN: 978-0-7695-3678-1
14. Dwyer, T.: Module 113: determining thermal comfort in naturally conditioned buildings (2017). https://www.cibsejournal.com/cpd/modules/2017-07nat/. Accessed 15 April 2019
15. Korpipaa, P., Mantyjarvi, J., Kela, J., Keranen, H., Malm, E.-J.: Managing context information in mobile devices. IEEE Pervasive Comput. 2(3), 42–51 (2003). https://doi.org/10.1109/MPRV.2003.1228526. ISSN: 1536-1268
16. CRISP-DM: Cross industry standard process for data mining (CRISP-DM). http://crisp-dm.eu/. Accessed 17 Aug 2019
17. Connelly, R., Gayle, V., Lambert, P.S.: Ethnicity and ethnic group measures in social survey research. Methodol. Innov. 9(4), 205979911664288 (2016). https://doi.org/10.1177/2059799116642885. ISSN: 2059-7991
18. Moolayil, J.: Learn Keras for Deep Neural Networks: A Fast-Track Approach to Modern Deep Learning with Python. Apress, New York (2019). ISBN: 9781484242407
19. Veal, R.L.: How to conduct user experience research like a professional. careerfoundry, Ed. https://careerfoundry.com/en/blog/ux-design/how-to-conduct-userexperience-research-like-aprofessional/. Accessed 18 May 2019
20. Vermeeren, A.P.O.S., Law, E.L.-C., Roto, V., Obrist, M., Hoonhout, J., Väänänen-Vainio-Mattila, K.: User experience evaluation methods: current state and development needs. In: Hvannberg, E.T. (ed.) Proceedings of the 6th Nordic Conference on Human-Computer Interaction, p. 521. ACM, New York (2010). https://doi.org/10.1145/1868914.1868973. ISBN: 9781605589343
21. Horizont & Bundesverband CarSharing. Anzahl registrierter carsharing-nutzer in deutschland in den jahren 2008 bis 2019 statista. https://de.statista.com/statistik/daten/studie/324692/umfrage/carsharing-nutzer-in-deutschland/. Accessed 16 Aug 2019
Agent-Based Architectural Models of Supply Chain Management in Digital Ecosystems
Alexander Suleykin and Natalya Bakhtadze
V.A. Trapeznikov Institute of Control Sciences of the Russian Academy of Science, 65 Profsoyuznaya Street, Moscow 117997, Russia
Abstract. Digitalization is rapidly penetrating the field of supply chain management, and digital ecosystems of supply chains are emerging as integrated platforms for monitoring, planning, and management in the supply chain. The paper provides an overview of the main research results on the creation of digital ecosystems in the field of supply chains, describes a business model, and presents models of general and extended agent-based architectural interaction of information flows between different classes of systems and within the digital ecosystem itself. The main advantages of developing such ecosystems for companies and enterprises are presented.
Keywords: Digital ecosystems · Supply chains · Agent-based architectural ecosystem · Services
© Springer Nature Switzerland AG 2021 K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 115–127, 2021. https://doi.org/10.1007/978-3-030-55190-2_9
1 Introduction
The competitiveness of industrial manufacturers today is largely determined by the flexibility and efficiency of their activities, achieved through advanced technologies that provide fast and efficient exchange of digital information between various production management systems, as well as through intelligent applications for solving specific production problems. Such systems have been dubbed Smart Manufacturing Systems (SMS) in the USA and Industrie 4.0 in Germany. Since Industrie 4.0 was proposed in Germany in 2011, the experience of Industrie 4.0 programs in companies and industrial enterprises of many countries has shown that global digital transformation has become a reality and that a transition from high-tech industry to the “digital industry” is under way. This transition is characterized by the use of cloud services, additive technologies, the industrial Internet of Things, RFID technologies, digital twins, and big data analysis technologies, and by the predominance of cluster network systems with horizontal ties that reduce the time needed for decision making, project execution, bringing products to market, etc. A pronounced trend in the global economy is that industry and intersectoral digital platforms are transforming into digital ecosystems, which make it possible to create new business models and innovations and to increase competitiveness [1]. The most recent period in the management of industrial production was characterized by the development of digital platforms and, in particular, by the expansion of their functionality [2]: “Companies create their products, letting other companies use their products or create new products based on their services. Thanks to this approach, Google, Facebook, Apple, Amazon, AirBnB, and most of the current major IT companies.” However, as the authors note, today “a new era has come - the creation of digital ecosystems.” We understand a digital ecosystem as a distributed socio-technical system with the properties of adaptability, self-organization, and sustainability, functioning in an environment of competition and cooperation between the various subjects of this system (automated systems and economic entities) for the exchange of knowledge in the context of the evolutionary development of the system. The digital ecosystem operates on a computer network infrastructure using intelligent control technologies, for example, multi-agent technologies [3]. The formation of national digital ecosystems based on the global digital space is today a necessary condition for faster economic growth of states [1]. Breakthrough economic development can be achieved through qualitative changes in the structure and management system of economic assets. Large volumes of diversified data are becoming one of the most significant assets. Big Data management and real-time predictive models are becoming a priority. In-memory computing technologies speed up the processing of large amounts of data and are therefore, like big data itself, becoming increasingly popular among enterprises. Models of objects and their dynamics, as well as models of dynamic processes, that are adjustable in real time use patterns recovered from historical and current data. These patterns are the essence of the term inductive knowledge [4].
Increasing information richness and complexity of real-time diverse production tasks at different levels of production management (process control, operational production management and production logistics, resource and supplier relationship management and by consumers) make a natural transition from rigid hierarchical verticals of management organization to clusternetwork format. Of particular importance are ensuring stability under dynamic conditions. unstable external environment (for example, economic) and ensuring sustainable development of the system [5]. In recent years, the most popular class of control systems for complex technological objects (non-linear objects, objects with significant transport lag objects with interconnected adjustable variables, objects with many simultaneously observed restrictions) has become a management technology based on predictive object models [6]. Currently, there are certain differences in the definition of a digital ecosystem and below are listed some of them. F. Nachira, P. Dini, A.A. Nicolai [7]: the digital ecosystem (DE) is formed through the integration of IT networks (IT - information technology), social and exchange knowledge. E-learning ecosystem and digital concepts ecosystem (machine learning ecosystems and digital ecosystems) are identified. The DE provides access to knowledge, global value chains, specific services, adaptation of new technologies, adoption of new business models. The economy is no longer seen as fully managed system for which a functioning plan is drawn up: individual active elements determine its functioning depending on the current situation, this is an ecosystem [8]. H. Dong, F.K. Hussain, E. Chang by DE mean digital twins and the infrastructure of data transmission, storage and processing, as well as system users, including social
Agent-Based Architectural Models of Supply Chain Management
economic, political, psychological and other factors affecting the implementation of interactions [9]. In the digital ecosystem “partners and competitors interact as a team, combining resources, knowledge for joint work on projects in a mutual completeness of information and creation (co-creation), without stopping competing in the line of others processes” [10]. Information becomes a resource that can be used, produced and transformed in the same way as material resources. A key ecological idea concerns preserving and increasing the use value of information [11]. An ecosystem is defined as a domain of a cluster environment in which all participants are loosely coupled, pursue their own benefit and preserve the environment. With the development of information and communication technologies, people began to live simultaneously in digital and ecological environments, that is, in a dual environment [12]. In a digital ecosystem, an economy spontaneously transforms into a network, that is, into a “continuous space of flows”, acquiring the capacity for continuous renewal. Nonlinear forms of communication arise, with spatial and temporal boundaries erased [13]. However, to ensure sustainable development of the production system and to achieve competitive advantage, it is not enough to guarantee optimal management of the key production processes of a particular enterprise. It is necessary to study the entire value chain, including the interfaces between the various phases of the production process and the preceding stages. Supply chain management (SCM) technology therefore plays a decisive role in integrating all logistics processes, both at the enterprise that manufactures the products and at the suppliers of raw materials, components, information, transport and storage services. This work focuses on digital ecosystem research in SCM.
2 Digital Ecosystems in Supply Chain Management

Given the features of the modern digital infrastructure for production management, the design, planning and implementation of supply chain management are becoming highly important for maximizing the synergy of internal and inter-organizational integration and coordination within digital industrial business ecosystems. Organizing supply chain operations in production and business ecosystems is a complex task that cannot be solved without an effective, scalable, decentralized, fault-tolerant and sustainable architecture for such systems. We give a brief overview of the main current research directions in the development of digital ecosystems that govern supply chains. We then propose a methodology, a technological basis and a basic architecture for building systems of this class. Many models are known from the literature and from supply chain management practice [14]. The concept of an ecosystem as applied to a supply chain is formulated in various ways [15]. For example, in [16] the term ecosystem is used as a unit of analysis for describing supplier groupings and distribution chains, which are understood as loose groups of organizations involved in the creation and delivery of products and services. The same term is used by Iansiti and Levien [17] in their description of “strategy as an ecology.” The
A. Suleykin and N. Bakhtadze
concept of value creation was defined by Porter [18] as “a vertical chain continuing from a resource provider to businesses and customers goods and services from these firms.” The value chain in this context is a set of subjects, resources and processes corresponding to the primary and supporting activities that together represent the stages of product processing and the set of services. Primary activities are directly related to the creation of the product or service. Each primary activity is linked to supporting activities that help increase its effectiveness. This structure was later extended [19], and the concept of a supply chain emerged: the union of all suppliers, intermediaries and customers, which together represent the value chain of the organization and the market. The supply chain extends the internal value chain of the enterprise to external cooperation and to the exchange of raw materials, knowledge, goods and services with other players. In [20] a supply chain is defined as a “network of organizations that are involved through ascending and descending relationships in various processes and activities that view a product in the form of products and services in consumer’s hand at the end.” The advent of digital ecosystems reconfigures value creation. In [21], three types of “constellations of control points, which represent three models of topologies of chain members: closed vertically integrated model, model of a loosely coupled coalition and multilateral platform model” are presented. A new topological model arises around large players who seek to dominate global markets and disrupt industry boundaries. This model is formed by overlapping business communities that include large organizations and their respective ecosystems.
For example, in [22] the author describes the “global evolutionary tendency towards the formation of supply chains driven by digital ecosystems”, setting out the general concept of supply chain development and its place in digital ecosystems.

2.1 Mathematical Modeling

Mathematical methods are receiving increasing attention in supply chain optimization [14], mainly because of their lower cost and wider possibilities. In supply chain management, mathematical modeling is not specific to any particular level of production management: models can be used at any level (strategic, tactical or operational), taking into account factors such as routing, network distribution or warehouse operations. The mathematical and simulation methods commonly considered in supply chain management tasks include linear programming, integer and mixed-integer linear programming, nonlinear programming, multi-objective programming, fuzzy mathematical programming, stochastic programming, heuristic algorithms, as well as meta-heuristics and hybrid models. In [23], the authors examine the relationship between the ripple effect and the bullwhip effect under conditions of perishable products, stochastic demand, inventory management policies, batch planning and occasional capacity outages. The simulation model developed there is used in experiments to obtain managerial insights for mitigating the propagation of disruptions in the supply chain. An objective function is formulated whose arguments may include the number of distribution centers, the number of delivery days, the production batch size, the batch production time, the warehouse volume, etc. For example, in [24] the objective function of the model is chosen as the minimum of the
total cost of the product, subject to the minimum required level of service (i.e., the ratio of delivered to ordered products). Another type of objective function is the total cost function, which is the sum of the costs of holding stock in distribution centers, the costs of writing off products (i.e., expired products), transportation costs, production costs and fines for late deliveries. Total costs are calculated over a large number of periods as the sum of storage, transportation, write-off, penalty and production costs. When inventory expires, write-off costs increase in proportion to purchase prices. If a customer's order size exceeds the stock in the distribution center, a fine applies. Production costs depend on setup costs, taking into account the number of setups, and on the production costs for the capacity used.

2.2 Digital Ecosystems, Agent-Based Modeling and Supply Chains

Digital ecosystems can be described [25] as digital analogues of biological ecosystems. The self-organizing properties of biological ecosystems have been described; they provide a robust, self-organizing and scalable architecture for automatically solving complex and dynamic problems. The principles and semantics used in digital ecosystems are formulated in [26]. Research into digital ecosystems continues, expanding into application areas such as transport, education and health [27]. In previous work, we studied the application of digital ecosystems to the energy sector and described the architecture of such systems and a methodology for creating new mathematical models based on wavelet transforms and associative search methods [28]. Many studies of agent-based systems in manufacturing have been carried out by A. Dolgui.
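The total-cost objective described in Sect. 2.1 can be made concrete with a small sketch. The code below is only an illustration under stated assumptions: the field names, the linear form of every cost term and the rule that the fine is proportional to the undelivered quantity are hypothetical choices, not the formulation used in [23] or [24].

```python
from dataclasses import dataclass


@dataclass
class PeriodData:
    """Quantities observed in one planning period (hypothetical fields)."""
    stock_units: float     # units held in distribution centers
    expired_units: float   # units written off because they expired
    shipped_units: float   # units transported
    produced_units: float  # units produced
    setups: int            # number of production setups
    unmet_units: float     # ordered units that could not be delivered


@dataclass
class CostRates:
    """Cost per unit for each term (assumed linear)."""
    holding: float
    purchase_price: float  # write-off cost grows with the purchase price
    transport: float
    production: float
    setup: float
    penalty: float


def period_cost(p: PeriodData, r: CostRates) -> float:
    storage = r.holding * p.stock_units
    write_off = r.purchase_price * p.expired_units
    transport = r.transport * p.shipped_units
    production = r.setup * p.setups + r.production * p.produced_units
    penalty = r.penalty * p.unmet_units
    return storage + write_off + transport + production + penalty


def total_cost(periods, rates: CostRates) -> float:
    """Sum of the per-period costs over the whole horizon."""
    return sum(period_cost(p, rates) for p in periods)


def service_level(delivered: float, ordered: float) -> float:
    """Ratio of delivered to ordered products."""
    return delivered / ordered if ordered else 1.0
```

An optimizer or simulation experiment would then search over decision variables such as batch size or the number of distribution centers, minimizing `total_cost` while keeping `service_level` above a required threshold.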
In [29], the dynamic behavior of the system is investigated using the concepts and methods of the theory of nonlinear dynamical systems, and new ways of controlling chaotic behavior are suggested, while [30] presents a multi-agent system design technique and algorithms for relay intercity freighting within a geographical cluster using swap-body trucks. In [31], the authors investigate how dynamical behavior depends on parameters inherent to the agents' negotiation rules, showing that chaotic dynamics may occur for some values of these parameters. In [32], the authors consider the possibility of developing a digital ecosystem for transport and warehouse logistics. The Virtual Joint Consortium implementing Australia's digital ecosystem is an example of a collaborative environment for all who are involved in creating a product distribution chain. The latter involves the creation of a supply chain that promotes the integration and cooperation of small and medium enterprises and, in particular, stimulates cooperation and improves business efficiency [33]. In [34], a distributed supply chain model based on multi-agent technology is described. In [35], the supply chain formation problem is formulated in terms of task dependency networks; equilibrium and convergence are studied on the basis of a mathematical model, and an application to supply chain formation in the automotive industry is proposed. In [36], attention is paid to value creation in digital supply chains for the organizations in a research program in which theoretical research is combined with the implementation of a fully autonomous supply chain, starting at the factory and ending with the client company.
2.3 Existing Digital Platforms

Digital platforms that are essentially supply chain ecosystems already exist on the market. One example is Oracle Transportation Management (OTM), a platform for managing all transport activities in the supply chain. OTM is a solution that forms part of Oracle Logistics. It is designed to support both transport companies and logistics providers and makes it possible to manage all aspects of transport in the global supply chain. The product helps reduce freight costs, optimize service levels and automate processes so that companies can perform logistics operations more efficiently. Another example is the independent pan-European platform Cargo Stream [18], which coordinates the actions of shippers, logistics service providers and multimodal terminals in order to optimize transportation routes. Shippers communicate their regular delivery needs. The platform's optimization option anonymizes these data and makes them available to approved optimizers. The optimizers analyze and optimize the data and engage approved providers to develop proposals that satisfy the requests of all shippers. Once a solution has been developed, the providers post order-execution updates “on the platform” so that the information of the relevant shippers is kept up to date. A further example: the Australian office of PwC, the Australian Chamber of Commerce and Industry (ACCI) and the Port of Brisbane (Australia) are developing a blockchain-based solution to improve supply chain efficiency. The solution is called Trade Community System [37].
3 Business-Model of Digital Supply Chain

The business goal of the digital supply chain is to produce the product and deliver it into the hands of the client not only as fast as possible but also responsibly and reliably, while increasing efficiency and reducing costs through automation. This goal cannot be achieved until the supply chain is formed as a reliable and efficient association of suppliers, production, logistics, warehousing and customers. With this form of integration, the signals that trigger supply chain events may come from anywhere in the network and report any problem affecting supply or demand, such as a lack of raw materials, components, finished products or spare parts. This form of organizing production processes is especially effective for flexible production, in particular custom production, which is rapidly gaining popularity as customer demands increase. The key to success for any supply chain is effective exchange of information. The traditional organization of supply chains suffers from complications caused by the lack of complete and timely information about the process. Sudden changes in demand, shortages of raw materials and emergencies create a high risk of supply chain breakdown. That is why the main goal of the digital supply chain is to open the supply chain to view and make it transparent to all participants. B2C markets use information from companies for this purpose. In addition, providing the required level of visibility calls for more information on the arrival of cargo, updated in real time. In B2B networks, manufacturers expect timely information on the status of their supplies, which is usually tied to production plans. Continuously updated and reliable transportation
information support can significantly improve customer satisfaction, which can in turn have a positive effect on brand loyalty. Achieving a high degree of transparency in the system is not an easy task: it involves both high technical complexity and a considerable amount of human effort. But once it is achieved, the benefits will be significant and will not be limited to stockpiling and better planning.
4 Agent-Based Architectural Models for Digital Supply Chain

4.1 Agent-Based Modeling and Intelligent Agents

“Intelligence” in this context is understood as the possibility of feedback in accordance with, for example, the results of analyzing search queries according to the designed model. In our research on SCM, we consider that each Agent should have the following features:
• autonomy (the agent independently solves local problems), which means that all Agents should perform their assigned tasks without human interaction;
• interaction with other agents, including the ability to play different roles when interacting with the same agent;
• reactivity - the ability to maintain interaction with the external environment, receiving information from it and responding with actions. According to the developed algorithms, Agents should react differently to specific data inputs received from external Agents;
• communicability - agents can communicate with other agents using RESTful APIs and can trigger the launch of new Agents;
• mobility - the possibility of transferring the agent code from one server to another and deploying agents to any server that satisfies the Agent system requirements;
• repeatability of tasks - the ability to autonomously repeat the tasks assigned to the Agent, as well as the possibility of triggering new tasks.

4.2 Basic Agent-Based Architectural Model for Digital SCM

For interaction and information exchange between the components (Agents) of the digital supply chain ecosystem, we propose to distinguish the architectural components shown in Fig. 1. The main data sources for digital ecosystems are various transaction systems (ERP, CRM, SRM, WMS), where data are collected on suppliers, consumers, planned production resources, warehouse balances, planned cargo delivery dates, etc.
Next, the data are transferred through the corporate data bus (Integration Agent) to the Message-Oriented Agent, where they are consumed by both the Data Lake Agent and the Predictive Modeling Agent. At the level of the Visualization and Decision-Making Agent, access to the model outputs is granted to all those interested in efficiency and transparency: suppliers, consumers, and managers of production, warehousing and distribution of goods
Fig. 1. Basic agent-based architectural model for digital SCM.
and/or services. At this level, operational, tactical and strategic management is carried out, along with decision support based on data analysis. These can be decisions on changing production periods, inventory levels in warehouses, the number of suppliers or customers, etc. All decisions at this level have a direct impact on plans and records for orders from customers and suppliers, changing the corresponding plans and records in the source systems.

4.3 Extended Agent-Based Architectural Model for Digital SCM

An extended model of digital ecosystems in the supply chain is characterized by a “transition to full digitalization” in all components of the architecture. Such systems must have the following characteristics:
• the ability to obtain data on a single platform from internal and external sources, including transport tracking devices and social listening;
• the data are aggregated and supplemented by cross-referencing, for example with supply chain events that affect a supply proposal. Relevant information may be extracted from weather monitoring data, traffic and news feeds, as well as social networks;
• this “enriched” information is then communicated within the system, with additional analytical and simulation algorithms implementing strategic optimization at various levels. This information should go to a control center, which uses modern methods of analysis and forecasting and logistics management algorithms;
• as a result, “a single source of truth” allows companies to optimize their choices under various conditions, using all available information to warn enterprises, warehouses and customers about risks and to take actions that reduce these risks.
Monitoring the status of transport units and the expected external influences during order execution, together with the ability to change plans in real time, will be useful for companies that seek to use their supply chain to achieve a competitive advantage and to manage supply chain risks more carefully.
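The cross-referencing step listed above, in which shipment data are enriched with external events and turned into warnings, can be sketched as follows. This is a minimal illustration, assuming that events and shipments are matched by region and by whether the expected arrival time falls inside the event window; the class names, fields and matching rule are hypothetical and are not part of the proposed architecture.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Shipment:
    shipment_id: str
    region: str
    eta: datetime                 # expected time of arrival
    risks: list = field(default_factory=list)


@dataclass
class ExternalEvent:
    source: str                   # e.g. "weather", "traffic", "news"
    region: str
    start: datetime
    end: datetime
    description: str


def enrich(shipments, events):
    """Attach to each shipment the external events that may affect it."""
    for s in shipments:
        s.risks = [
            e for e in events
            if e.region == s.region and e.start <= s.eta <= e.end
        ]
    return shipments


def alerts(shipments):
    """Flatten the attached risks into warning messages."""
    return [
        f"{s.shipment_id}: {e.source} - {e.description}"
        for s in shipments
        for e in s.risks
    ]
```

In a real deployment the matching rule would likely use routes and geofences rather than a single region label; the sketch only shows where the enrichment sits between data collection and the control center.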
Over time, machine learning algorithms will become mature enough (“smart enough”) to automate this kind of human intervention, allowing managers and other stakeholders to make better-informed decisions every day. These algorithms will provide advice on mitigating the effects of past decisions. The visibility of the chain will increase through an effective tracking system that allows participants to determine the status of any parameter of the shipped goods at any time and in any place. Transport data and process status information will come from the enterprise resource planning system as well as from carriers, either through direct connections or through third-party portals. GPS technology will allow companies to check precise locations, while sensors will monitor environmental conditions such as temperature and humidity and even provide remote protection against theft. Because the data come from different sources (suppliers, carriers, warehouses, distributors), their quality and interoperability are crucial. One of the most important advantages of the proposed extended architectural model is the prospect of new services emerging that can subsequently develop into independent ecosystems interacting with other, external ecosystems. By forming such “ecosystem networks”, ecosystems themselves serve as drivers for the development of new services and, possibly, entirely new promising technological solutions. The extended agent-based architecture for the digital supply chain is presented below in Fig. 2:
Fig. 2. Extended agent-based architectural model for digital SCM.
This model embodies the principles of distributed computing and of building high-load Big Data systems based on Lambda architectural principles [38]. Lambda architecture is a general-purpose, scalable, fault-tolerant data processing architecture designed for both batch and low-latency scenarios. It provides efficient processing of large datasets in layers: batch processing, stream processing and a serving layer that minimizes the latency of queries over big data. The Agents can be described as follows:
• The Integration Agent serves as a single entry point for all incoming information flows exchanged between different ecosystems (agents). It transforms all incoming data flows to a single format using unified data transfer protocols and checks data quality for the presence of the mandatory integration fields between ecosystems (agents).
• The Message-Oriented Agent, in turn, gives different subscribers, or data consumers (Consumer Agents), access to the same data. It allows the data to be used to build different control models both in real time and on batch data streams, and to be loaded into a repository (the Data Lake Agent) for subsequent analysis, visualization and construction of other models that do not require constant online updating.
• The Data Lake Agent is a repository of a large amount of unstructured data generated or collected by a company or enterprise, together with all the ETL processes and web services that serve this repository. In addition, batch-based models can be trained here and then applied for prediction.
• The Predictive Modeling Agent comprises all predictive models that forecast outputs according to the trained algorithm in (near) real time.
• The Visualization and Decision-Making Agent supports decisions based on the results of data analysis, data visualization or the predictions of machine learning models, both in real time and in batch mode.
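The layered Lambda processing mentioned at the start of this description can be illustrated with a toy, single-process sketch. It assumes one simple aggregate (total delivered units per supplier); the class and method names are hypothetical, and a production system would use distributed storage and stream processing rather than in-memory counters.

```python
from collections import Counter


class LambdaStore:
    """Toy Lambda-style store: batch view + speed view, merged at query time."""

    def __init__(self):
        self.master = []             # append-only master dataset
        self.batch_view = Counter()  # recomputed from scratch by the batch layer
        self.speed_view = Counter()  # incremental updates since the last batch run

    def ingest(self, supplier, delivered_units):
        # Every record goes to the master dataset and to the speed layer.
        self.master.append((supplier, delivered_units))
        self.speed_view[supplier] += delivered_units

    def run_batch(self):
        # Batch layer: rebuild the view from the full master dataset,
        # then discard the speed view it has now absorbed.
        self.batch_view = Counter()
        for supplier, units in self.master:
            self.batch_view[supplier] += units
        self.speed_view = Counter()

    def query(self, supplier):
        # Serving layer: merge the precomputed batch result with recent data.
        return self.batch_view[supplier] + self.speed_view[supplier]
```

The point of the split is that queries stay fast and up to date between batch runs, while the batch recomputation can correct any errors accumulated in the incremental view.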
Decision makers monitor and control the real system based on the results of the analysis and on signaling messages (alert module) in real time, and also involve suppliers and consumers in making such decisions, which in turn creates collaboration that maximizes the economic effect of joint activities. Thus, new services are formed, which subsequently drive the development of other new services. Such a model becomes autonomous, self-learning and self-organizing.
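The data flow between the Agents described in this section (Integration Agent, then Message-Oriented Agent, then Data Lake and Predictive Modeling Agents) can be sketched in a few lines. This is a minimal in-memory illustration, not the authors' implementation: the mandatory field names, the topic naming and the moving-average "model" are assumptions made only for the example.

```python
from collections import defaultdict

# Hypothetical mandatory integration fields checked by the Integration Agent.
MANDATORY_FIELDS = ("source", "entity", "payload")


class MessageOrientedAgent:
    """Gives several subscribers access to the same data (publish/subscribe)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        for handler in self._subscribers[topic]:
            handler(message)


class IntegrationAgent:
    """Single entry point: validates records and converts them to one format."""

    def __init__(self, bus):
        self.bus = bus
        self.rejected = []

    def accept(self, record):
        missing = [f for f in MANDATORY_FIELDS if f not in record]
        if missing:
            self.rejected.append((record, missing))
            return False
        message = {
            "source": str(record["source"]).upper(),
            "entity": record["entity"],
            "payload": dict(record["payload"]),
        }
        self.bus.publish(message["entity"], message)
        return True


class DataLakeAgent:
    """Stores every message for later batch analysis."""

    def __init__(self):
        self.storage = []

    def handle(self, message):
        self.storage.append(message)


class PredictiveModelingAgent:
    """Placeholder model: moving average of the last `window` quantities."""

    def __init__(self, window=3):
        self.window = window
        self.history = []
        self.forecast = None

    def handle(self, message):
        quantity = message["payload"].get("quantity")
        if quantity is None:
            return
        self.history.append(quantity)
        recent = self.history[-self.window:]
        self.forecast = sum(recent) / len(recent)
```

For example, subscribing both `DataLakeAgent.handle` and `PredictiveModelingAgent.handle` to an "order" topic and passing ERP order records to `IntegrationAgent.accept` sends each valid record to both consumers, while records with missing fields are collected in `rejected`.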
5 Conclusion

Supply chains are extremely complex organisms, and how efficiently their interaction is organized, with transparency, flexibility and scalability, directly affects the success of the development of such systems. Digitalization leads to the emergence of ecosystems, and their creation in the field of supply chains ensures integrity, transparency, monitoring, management and control over the entire product life cycle: from the choice of supplier through production, warehousing, distribution and delivery to the final consumer. Modeling SCM as a digital ecosystem can not only increase the efficiency of the process and make it more transparent at all stages and for all entities involved, but also lead to the emergence of new services that can develop as separate ecosystems interacting with other, external ecosystems.
In summary, the proposed methodology for using Digital Ecosystems in SCM describes the main Agents, their features and the requirements placed on them, and the architecture of such systems in both the Basic and the Extended configuration, where new services (Agents) are triggered by the existing Agents. Eventually, the complexity of such systems and the number of Agents are expected to grow considerably as new Agents, services, goods and technologies are created.
References

1. © 2018 International Bank for Reconstruction and Development. World Bank, Washington, DC. https://openknowledge.worldbank.org/bitstream/handle/10986/30584/AUS0000158RU.pdf?sequence=4&isAllowed=yMark
2. Babiolakis, M.: Forget Products. Build Ecosystems. How products are transforming to open interconnectable interfaces. https://medium.com/@manolisbabiolakis/forget-productsbuildecosystems-792dea2cc4f2
3. Senyo, P.K., Liu, K., Effah, J.: Understanding behaviour patterns of multi-agents in digital business ecosystems: an organisational semiotics inspired framework. In: Advances in Human Factors, Business Management and Society. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-94709-9_21
4. Vapnik, V.N.: Statistical Learning Theory. Wiley, Hoboken (1998)
5. “Industry 4.0”: the creation of a digital enterprise. Global review of the implementation of the Industry 4.0 concept for 2016. http://www.pwc.ru/ru/technology/assets/global_industry2016_rus.pdf
6. Qin, S.J., Badgwell, T.A.: A survey of industrial model predictive control technology. Control Eng. Pract. 11, 733–764 (2003)
7. Nachira, F., Dini, P., Nicolai, A.A.: Network of Digital Business Ecosystems for Europe: Roots, Processes and Perspectives. Digital Business Ecosystems. European Commission, Bruxelles (2007)
8. Chang, E., West, M.: Digital ecosystems: a next generation of the collaborative environment. In: iiWAS, pp. 3–24 (2006)
9. Dong, H., Hussain, F.K., Chang, E.: An integrative view of the concept of digital ecosystem. In: Proceedings of the Third International Conference on Networking and Services, pp. 42–44. IEEE Computer Society, Washington, DC (2007)
10. Baker, K.S., Bowker, G.C.: Information ecology: open system environment for data, memories, and knowing. J. Intell. Inf. Syst. 29(1), 127–144 (2007)
11. Kastel’s, M.: Informatsionnaia epokha. Ekonomika, obshchestva, kul’tura. GU VShE, Moscow (2000). 129 s. (in Russian)
12. Fuller, M.: Media Ecologies: Materialist Energies in Art and Technoculture (Leonardo Books). The MIT Press, Cambridge (2007)
13. Papaioannou, T., Wield, D., Chataway, J.: Knowledge ecologies and ecosystems. An empirically grounded reflection on recent developments in innovation systems theory. Environ. Plan. C: Gov. Policy 27(2), 319–339 (2009)
14. Ivanov, D.A., Sethi, S.P., Dolgui, A., Sokolov, B.V.: A survey on control theory applications to operational systems, supply chain management, and Industry 4.0. Ann. Rev. Control 46, 134–147 (2018)
15. Seuring, S.: A review of modeling approaches for sustainable supply chain management. Decis. Supp. Syst. 54(4), 1–8 (2012)
16. Markus, M.L., Loebbecke, C.: Commoditized digital processes and business community platforms: new opportunities and challenges for digital business strategies. MIS Q. 37(2), 649–654 (2013)
17. Iansiti, M., Levien, R.: Strategy as ecology. Harv. Bus. Rev. 82(3), 68–81 (2004)
18. Porter, M.E.: Competitive Advantage: Creating and Sustaining Superior Performance. Free Press, New York (1985)
19. Brandenburger, A.M., Stuart, H.W.J.: Value-based business strategy. J. Econ. Manag. Strategy 5, 2–25 (1996)
20. Christopher, M.: Logistics and supply chain management. Financial Times (1992)
21. Pagani, M.: Digital business strategy and value creation: framing the dynamic cycle of control points. MIS Q. 37(2), 617–632 (2013)
22. Averian, A.: Supply chain modelling as digital ecosystem. In: International Scientific Conference ITEMA, Budapest, Hungary, 26 October 2017 (2017)
23. Dolgui, A., Ivanov, D., Rozhkov, M.: Does the ripple effect influence the bullwhip effect? An integrated analysis of structural and operational dynamics in the supply chain. IFAC-PapersOnLine 51(11), 1448–1452 (2018)
24. Briscoe, G., De Wilde, P.: Digital ecosystems: evolving service-orientated architectures. In: 2006 1st BioInspired Model. Network, Network, Information, and Computing Systems (2006)
25. Boley, H., Chang, E.: Digital ecosystems: principles and semantics. In: Proceedings of the 2007 Inaugural IEEE-IES Digital EcoSystems and Technologies Conference, DEST 2007, pp. 398–403 (2007)
26. Chang, E., West, M.: Digital ecosystems and comparison to existing collaboration environment. WSEAS Trans. Environ. Dev. 2(11), 1396–1404 (2006)
27. Chang, E., West, M.: Digital ecosystems a next generation of the collaborative environment. In: Eighth International Conference, vol. 214, pp. 3–23 (2006)
28. Suleykin, A., Bakhtadze, N., Pavlov, B., Pyatetsky, V.: Digital energy ecosystems. IFAC-PapersOnLine 52(13), 30–35 (2019)
29. Benaissa, K., Diep, D., Dolgui, A.: Control of chaos in agent based manufacturing systems. In: IEEE International Conference on Emerging Technologies and Factory Automation, ETFA, pp. 1252–1259 (2008). https://doi.org/10.1109/etfa.2008.4638561
30. Dolgui, A., Bakhtadze, N., Sabitov, R., Smirnova, G., Sirazetdinov, B., Sabitov, S., Lototsky, V.: Design of a multi-agent system to manage relay intercity freighting. IFAC-PapersOnLine 49, 1656–1661 (2016). https://doi.org/10.1016/j.ifacol.2016.07.818
31. Benaissa, K., Diep, D., Dolgui, A.: Emergent chaotic behaviour in agent based manufacturing systems. In: WETICE, 17th International Workshop on Enabling Technologies: Infrastructures for Collaboratives Enterprises, Rome, Italy, June 2008 (2008). https://doi.org/10.1109/wetice.2008.42. hal-00356827
32. Camarinha-Matos, L.M., Afsarmanesh, H., Galeano, N., Molina, A.: Collaborative networked organizations - concepts and practice in manufacturing enterprises. Comput. Ind. Eng. 57(1), 46–60 (2009)
33. Walsh, W.E., Wellman, M.P.: Modeling Supply Chain formation in Multiagent Systems, vol. 1788, pp. 94–101 (1999)
34. Walsh, W.E., Wellman, M.P.: Decentralized supplychain formation: a market protocol and competitive equilibrium analysis. J. Artif. Intell. Res. 19, 513–567 (2003)
35. Roto, V., Heikkilä, M.: Design for value in a digital supply chain ecosystem. In: New Value Transactions - Understanding and Designing for Distributed Autonomous Organisations, Edinburgh, UK, June 2017
36. Cargo Stream. https://www.cargostream.net/
37. Ozhiganova, M.S., Rudskaya, E.N.: The role of blockchain technology in the development of payment systems. In: Vector of Economics, no. 11 (2018). www.vectoreconomy.ru. ISSN 2500-3666
38. Suleykin, A., Panfilov, P.: Distributed big data driven framework for cellular network monitoring data. In: Balandin, S., Deart, V., Tyutina, T. (eds.) Proceedings of the 24th Conference of Open Innovations Association FRUCT, Moscow, Russia, pp. 430–436. FRUCT Oy, Helsinki, Finland (2019). https://doi.org/10.23919/fruct.2019.8711912. ISSN 2305-7254, ISBN 978-952-68653-8-6, e-ISSN 2343-0737 (license CC BY-ND)
A Deep Insight into Signature Verification Using Deep Neural Network

Umair Muneer Butt (1,2, corresponding author), Fatima Masood (2), Zaib unnisa (2), Shahid Razzaq (3), Zaheer Dar (2), Samreen Azhar (2), Irfan Abbas (2), and Munib Ahmad (4)

1 University Sains Malaysia, Gelugor, Malaysia
2 University of Lahore Chenab Campus, Gujrat, Pakistan
3 National University of Sciences and Technology (NUST), Islamabad, Pakistan
4 Minhaj University, Lahore, Pakistan
Abstract. The signature is an essential biometric modality used in daily life to authenticate a person's claimed identity. Owing to the widespread use of signatures in office work, banks, forensic documents, etc., it has attracted great interest from researchers for many years. With advances in technology and the internet, automatic signature verification is being considered with renewed interest. In this paper, we present the state of the art in signature verification and a novel method for performing signature verification on RGB images. We train our model using an advanced deep learning technique, a Deep Neural Network (DNN), to differentiate between genuine and forged classes of signatures. The proposed algorithm is evaluated on the 4NSigComp2010 and 4NSigComp2012 datasets with the same experimental setup. The results show that the proposed algorithm outperforms the systems presented in the competitions, with an EER of 12 and 11.75, respectively.
Keywords: Feature analysis · SURF · Signature verification · DNN · EER
1 Introduction
Signature verification is an important and challenging problem in many fields. Automatic signature verification is a broad field that draws on many disciplines, from computer science to neuroscience and from system science to engineering and human anatomy [10]. Signatures are used in banks, security agencies, governmental organisations, and forensic institutes. Handwritten signatures are considered an essential biometric modality because of their widespread usage and the increasing security demands of daily life [18]. Biometric systems use an individual's distinguishing and vital features for recognition.
© Springer Nature Switzerland AG 2021. K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 128–138, 2021. https://doi.org/10.1007/978-3-030-55190-2_10
Recognition techniques used by the community are broadly divided into two categories: physical and inherited. Physical methods perform verification based on a badge, ID card, RFID, etc. One drawback of these techniques is that such material things may be stolen, lost, or duplicated, so we cannot rely on them completely. Biometric systems, on the other hand, use the inherited properties of an individual to perform verification and identification, which is a challenging but trusted way to perform automatic verification. Handwritten signature verification systems can perform two tasks in a pipeline: identifying and verifying signatures. In the identification phase, the system establishes the relationship between the questioned signature and the identities present in the database [12]. A wide range of verification systems has been reported so far, but none is efficient enough to deal with the practical issues of automatic signature verification.
Handwritten signatures are considered an essential biometric modality for authorisation and authentication in daily routines [19,25,31]. They have been used in financial institutions as a legal means of verification for many years [22,30]. People also find it comfortable and easy to sign paperwork in administrative processes [37]. Handwriting is a complex biological and physiological process that depends on the writer's mood and behaviour. Many psychophysical theories have been reported that try to identify and understand the underlying handwriting mechanism [23,32,35] and how this mechanism relates directly to the ink usage process while writing signatures [13,20,24]. The contributions made towards automatic signature verification so far are still not enough; judging signatures as genuine, forged, or disguised remains a challenging and open area of research.
Fig. 1. A general signature verification process
Automatic signature verification generally consists of three main components, as shown in Fig. 1:
1. Data acquisition and pre-processing
2. Feature extraction
3. Classification
In the preprocessing phase, signatures are cleaned of noise, background information, and any other information that disturbs the signatures. In the next step, the most essential and distinguishing features are extracted and stored in a database called the reference database. The preprocessed signatures are then sent to the classification phase, where a model is trained using the extracted features. Keypoint features are derived from the questioned signature and passed through the model for classification. Based on a threshold calculated during experiments, the signature is classified as genuine or forged.
In this paper, we present the state-of-the-art signature verification mechanism and discuss recent advancements in the area of automatic signature verification. We perform signature verification in the RGB domain and leave the hyperspectral domain for future work. For the RGB domain, we evaluate our proposed algorithm on the 4NSigComp2010 [8] and 4NSigComp2012 [25] datasets, which contain genuine, forged, and disguised signatures. A signature is said to be genuine if it is verified to be the signature of the claimed person. If the signature is simulated or written by someone else in order to forge it, it is called a forged signature. Disguised signatures are written by the genuine writer himself, who makes his own signature look like a forgery so that he can deny it later. A questioned signature can arise in different scenarios:
1. The questioned signature is written by the original writer himself; it is called genuine.
2. The questioned signature is fabricated: written by one person and claimed by another.
3. The questioned signature is not genuine: the writer traced an original signature made by the original writer with the intention of forgery.
We use part-based keypoint detection techniques to extract signature features from the document and use them for verification. We performed experiments with SURF, SIFT, HOG, LBP, FREAK, FAST, etc. We propose a hybrid approach that combines SURF and HOG features and beats the systems presented in the competitions mentioned above with a low equal error rate. The rest of this paper is organised as follows. Section 2 provides an overview of existing methods and of the use of signature verification in areas such as document analysis, banking, and forensics. Section 3 gives details of the datasets used in our experiments. Section 4 describes our automatic signature verification method. Section 5 presents the results. The paper concludes with a discussion of future work.
2 Related Work
Signature verification has been an active area of research for many years. It is divided into two parts: online and offline. A large number of systems have been proposed for offline signature verification. These systems are based on global, local, and combined features and were evaluated on various datasets; therefore, their results are not directly comparable to those of the proposed method. Bajaj et al. [16] presented a system based on global features, in particular profile and contour analysis, and classified signatures using a feed-forward neural network. Fang et al. [4] address the sparse data problem using a Mahalanobis distance classifier and peripheral features. Armand et al. [5] use contour-based global features such as centroid, skew, surface area, and length for signature verification. Deng et al. [15] use comprehensive curvature data in a multiresolution approach and decompose them into signals using the wavelet transform; the signature is verified using statistical measures that select the most stable signature. Fadhel et al. [7] combine wavelet-based global features with geometrical and analytical features of the signature to verify it. Ferrer et al. [11] compared Euclidean distance, SVM, and HMM classifiers on signature contours, where HMM outperformed the other approaches in detecting genuine or forged signatures. Various geometric parameters are extracted by scaling the signature at different magnitudes, and an MLP is then used for the final classification. Ramesh et al. [14] applied a genetic approach to geometric and global features to classify signatures. Ueda et al. [17] developed a pattern matching system based on extracting, thinning, and blurring signature strokes. In the past few years, systems based on local signature features have been reported, e.g., methods that apply local keypoint detectors and descriptors such as SURF [33], FAST [36], SIFT [6], and FREAK [34]; local features have been used for object and character recognition, degraded handwritten signatures, and writer identification. Malik et al. [3,26] applied a combination of local features from SURF, FAST, and FREAK to the 4NSigComp2010 data [27]. Srikanta et al. [9] also used SURF and proposed GSURF for offline verification using an SVM.
3 Dataset
In this paper, we use the ICFHR 4NSigComp2010 [8] and 4NSigComp2012 [25] datasets for evaluation (both are available at http://www.iapr-tc11.org/mediawiki/index.php). These datasets were used for competitions at the ICFHR conference. The 4NSigComp2010 collection contains only offline signature images. The images were scanned at 600 DPI and cropped at the Netherlands Forensic Institute, as shown in Figs. 2 and 3. For model training, there are 209 RGB images, divided into 9 reference signatures and 200 questioned signatures. The 200 questioned signatures consist of 76 genuine signatures written by the original writer in his natural writing style, 104 skilled forgeries (written by 27 forgers, who could practise copying the signatures for as long as they wanted), and 20 disguised signatures. The evaluation set contains 125 signatures: 25 reference signatures by a different author and 100 questioned signatures, as shown in Table 1. The 100 questioned signatures are further divided into 3 genuine signatures written by the reference writer in her typical style, 90 forgeries by skilled forgers, and 7 disguised signatures. All signatures were written with the same type of pen and ink and on the same kind of paper.
Fig. 2. Glimpse of 4NSigComp 2010 dataset
Table 1. 4NSigComp 2010 dataset specifications
Parameter | Values
Training set | 209
Test set | 125
Resolution (dpi) | 600
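The split sizes in Table 1 follow directly from the per-class counts given in the text, which can be checked with a few lines of Python. This is only a bookkeeping sketch: the dictionary layout and the names `SIGCOMP2010`, `split_size`, and `questioned_size` are hypothetical, and the count of 7 disguised test signatures is the one stated in the evaluation section.

```python
# Per-class signature counts for 4NSigComp2010 as stated in the text.
# The structure and names here are illustrative, not part of the dataset.
SIGCOMP2010 = {
    "train": {"reference": 9, "genuine": 76, "forged": 104, "disguised": 20},
    "test": {"reference": 25, "genuine": 3, "forged": 90, "disguised": 7},
}


def split_size(split):
    """Total number of images in a split (reference + questioned)."""
    return sum(SIGCOMP2010[split].values())


def questioned_size(split):
    """Number of questioned signatures, i.e. everything except references."""
    return split_size(split) - SIGCOMP2010[split]["reference"]


for name in ("train", "test"):
    print(name, split_size(name), questioned_size(name))
```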
Table 2. 4NSigComp 2012 dataset specifications
Parameter | Values
Training set | 330
Test set | 500
Resolution (dpi) | 300
The 4NSigComp2012 dataset also contains offline signature samples. These images were captured at 300 DPI and cropped at the same institute as mentioned above. The training and test sets of 4NSigComp2010 were used as the training set in the competition. For the test set, La Trobe University provided samples from three different authors, A1, A2, and A3. The questioned signatures were a mixture of genuine signatures, disguised signatures, and skilled forgeries, as shown in Table 2. All signatures were written using the same ballpoint pen and the same quality of paper.
Fig. 3. Glimpse of 4NSigComp 2012 dataset
4 Proposed Methodology
The proposed methodology of automatic signature verification consists of four phases. The first phase is preprocessing, where raw signatures are prepared for feature extraction. Preprocessing involves noise removal, binarization, and resizing, so that the image becomes smooth and has an appropriate size; our experiments showed that resizing can improve system performance. The second phase is feature extraction, where the preprocessed signatures are used to detect and extract the distinguishing features of the signatures using part-based keypoint detection and extraction. The last phase is classification, where we use deep learning and train the system on the keypoints extracted from the training data. For evaluation, the proposed system extracts the features of the questioned signature and passes them through the model to classify it as genuine or forged. We analyse and extract the signature features with SURF and HOG. SURF is a keypoint detector and descriptor that represents the image as a set of keypoints. It is robust and invariant to translation, rotation, and scale, which makes it powerful and efficient. To extract part-based features, it first detects the keypoints in the image that are unique and distinguishable within their neighbourhood and identifies blob-like structures in the image. HOG is a feature descriptor only and is generally used for object detection. As the name indicates, it is based on histogram processing and considers the gradient orientation in local areas of the image. HOG features are more efficient and effective than other feature descriptors because they use uniformly sized cells and overlapping information to capture contrast. For the RGB domain, signature verification is performed on the 4NSigComp2010 and 4NSigComp2012 datasets. The following procedure is performed:
1. Keypoint features of all preprocessed genuine signatures are extracted using SURF and HOG and stored in a temporary database.
2. Take the first genuine signature and compare its keypoints, by distance, with the keypoints of all other genuine signatures in the temporary database.
3. Mark all keypoints whose distance is below an empirically found threshold and save them in a reference database together with their votes (matching scores with other keypoints), in an n-1 cross-validation manner.
4. The reference database is then scrutinised further, and the features with the most votes, i.e. the most essential features, which have small distances and are common to all genuine signatures, are added to the voted database.
5. A Deep Neural Network is trained on the voted database.
6. Once the model is trained, the features of the questioned signature are detected and extracted using SURF and HOG and input to the trained model.
7. Finally, a threshold is used to decide whether the signature is genuine or forged.
The automatic signature verification procedure is shown in Fig. 4.
Fig. 4. Proposed signature verification methodology
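Steps 2 to 4 of the procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: descriptors are assumed to be already extracted (for example by SURF and HOG) and are represented as tuples of floats, Euclidean distance is assumed as the matching distance, and the names `vote_keypoints`, `dist_threshold`, and `min_votes` are hypothetical.

```python
import math


def vote_keypoints(descriptor_sets, dist_threshold, min_votes):
    """Leave-one-out voting over the genuine signatures.

    descriptor_sets: one list of descriptors (tuples of floats) per
        genuine signature; assumed to be precomputed.
    dist_threshold: assumed empirical matching threshold.
    min_votes: minimum number of other signatures that must contain a
        close match for a descriptor to be kept.
    Returns the descriptors that collected at least `min_votes` votes.
    """
    voted = []
    for i, current in enumerate(descriptor_sets):
        for desc in current:
            votes = 0
            for j, other in enumerate(descriptor_sets):
                if i == j:
                    continue  # never match a signature against itself
                # One vote per other signature holding a close keypoint.
                if any(math.dist(desc, cand) < dist_threshold for cand in other):
                    votes += 1
            if votes >= min_votes:
                voted.append(desc)
    return voted


# Toy 2-D descriptors for three genuine signatures (illustrative values).
genuine_sets = [
    [(0.0, 0.0), (10.0, 10.0)],
    [(0.1, 0.0), (5.0, 5.0)],
    [(0.0, 0.1), (20.0, 20.0)],
]
stable = vote_keypoints(genuine_sets, dist_threshold=0.5, min_votes=2)
print(stable)
```

In this toy input only the keypoints near the origin recur in every signature, so they are the ones retained; the resulting list plays the role of the voted database on which the classifier is trained.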
5 Evaluation
To evaluate the performance of the proposed method, we used the test set of the ICFHR 4NSigComp2010 competition and the dataset of the 4NSigComp2012 competition. 4NSigComp2010 was the first signature verification competition to include disguised signatures. Its evaluation dataset contains 7 disguised, 3 genuine, and 90 forged signatures. Using only the test set does not affect the evaluation, because we used the same experimental setup as the competitions. The performance of the proposed algorithm is evaluated with standard measures such as the Equal Error Rate (EER), for which the False Accept Rate (FAR, the rate at which forged signatures are misclassified as genuine) and the False Rejection Rate (FRR, the rate at which genuine signatures are misclassified as forged) are calculated. We used 25 genuine reference signatures for training, and the 100 questioned signatures were classified based on this training set. We performed various experiments and compared the proposed system with the other methods presented in the competition. The proposed system outperforms all systems presented in the competition, as shown in Table 3, achieving an equal error rate of 12. One reason could be that the proposed system considers both local and global insights of the signatures and considers only those signatures that are stable.
Table 3. Proposed vs. state-of-the-art on 4NSigComp2010
System | Equal Error Rate (EER)
Locally stable SURF [28] | 209
Proposed system | 12
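The relation between FAR, FRR, and EER can be made concrete with a short sketch. It assumes that the classifier outputs a score per questioned signature and that a higher score means "more likely genuine"; the paper does not specify the score convention, and the function names `far_frr` and `equal_error_rate` as well as the toy scores are hypothetical.

```python
def far_frr(genuine_scores, forged_scores, threshold):
    """FAR and FRR at one threshold.

    Assumed convention: a score >= threshold is accepted as genuine.
    FAR = share of forged signatures accepted as genuine.
    FRR = share of genuine signatures rejected as forged.
    """
    far = sum(s >= threshold for s in forged_scores) / len(forged_scores)
    frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
    return far, frr


def equal_error_rate(genuine_scores, forged_scores):
    """Scan the observed scores as candidate thresholds and return
    (eer, threshold) at the point where FAR and FRR are closest;
    the EER is reported as the mean of the two rates at that point."""
    best = None
    for t in sorted(set(genuine_scores) | set(forged_scores)):
        far, frr = far_frr(genuine_scores, forged_scores, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores, for illustration only (not results from the paper).
genuine = [0.9, 0.8, 0.7, 0.4]
forged = [0.1, 0.2, 0.3, 0.6]
eer, threshold = equal_error_rate(genuine, forged)
print(eer, threshold)
```

Lowering the threshold reduces the FRR at the cost of a higher FAR, and raising it does the opposite; the EER summarises the operating point where the two error rates balance.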
The proposed method was also evaluated on the ICFHR 4NSigComp2012 dataset, using the same experimental setup as in the 4NSigComp2010 competition. The test set contains signatures from the three authors.
Table 4. Proposed vs. state-of-the-art on 4NSigComp2012
System | Equal Error Rate (EER)
Griffith University [29] | 15.82
Proposed system | 11.75
Five systems were presented in the competition; System 1, from Griffith University, won the competition with an FRR of 15.8, an FAR of 14.29, and an accuracy of 85%. The proposed system beats this top system with an FRR of 11.75 and an accuracy of 88%, as shown in Table 4. The combination of local and global features, which captures the potential and distinct areas of the signatures, is what sets the proposed system apart from the other systems.
6 Conclusion and Future Work
This section presents concluding remarks on general signature verification systems and their use in daily life. The signature is an essential biometric modality used in every field of life, from a simple utility store to multinational organisations. Over the past many years, various signature verification systems have been reported, but they are unable to deal with real-world scenarios. Most techniques assume that signatures are pre-segmented or that they do not overlap with other information present in documents such as bank cheques, tables, application forms, suicide notes, and wills. Signature verification is the step that follows segmentation in the whole signature verification process. Verification depends directly on the segmentation process, which can improve or degrade the accuracy and performance of the system. We used part-based keypoint features in a hybrid manner. We tried various local feature detection approaches, both singly and in combination. We present a HoggySURF feature, which is a composition of HOG and SURF. Keypoint features of the genuine signatures are extracted to build a voted database, which is used to classify questioned signatures as genuine or forged. We evaluated our proposed algorithm on the 4NSigComp2010 and 4NSigComp2012 datasets, and it outperforms all systems presented in the competitions. In this paper we perform verification on RGB images, but in future we can take it into the hyperspectral domain. Hyperspectral imaging (HSI) provides more information than RGB. It has been used for segmentation [9], mismatch detection, and in the agriculture industry. HSI is becoming increasingly popular due to algorithmic advances. It is used for unmixing the spectra captured in a single image pixel [1] and for creating high-resolution HSI images with the aid of high-spatial-density RGB images of the same scene [2]. There are also methods for compressive sensing of HSI data [21]. We will extend this verification process to the hyperspectral domain in future, where hyperspectral imaging is proving surprisingly effective.
References
1. Akhtar, N., Shafait, F., Mian, A.: Futuristic greedy approach to sparse unmixing of hyperspectral data. IEEE Trans. Geosci. Remote Sens. 53(4), 2157–2174 (2014)
2. Akhtar, N., Shafait, F., Mian, A.: Bayesian sparse representation for hyperspectral image super resolution. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3631–3640 (2015)
3. Alahi, A., Ortiz, R., Vandergheynst, P.: FREAK: fast retina keypoint. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 510–517. IEEE (2012)
4. Amer, M., Goldstein, M., Abdennadher, S.: Enhancing one-class support vector machines for unsupervised anomaly detection. In: Proceedings of the ACM SIGKDD Workshop on Outlier Detection and Description, pp. 8–15 (2013)
5. Bajaj, R., Chaudhury, S.: Signature verification using multiple neural classifiers. Pattern Recogn. 30(1), 1–7 (1997)
6. Bay, H., Ess, A., Tuytelaars, T., Van Gool, L.: Speeded-up robust features (SURF). Comput. Vis. Image Underst. 110(3), 346–359 (2008)
7. Blumenstein, M., Armand, S., Muthukkumarasamy, V.: Off-line signature verification using the enhanced modified direction feature and neural based classification. In: International Joint Conference on Neural Networks (2006)
8. Blumenstein, M., Ferrer, M.A., Vargas, J.F.: The 4NSigComp2010 off-line signature verification competition: scenario 2. In: 2010 12th International Conference on Frontiers in Handwriting Recognition, pp. 721–726. IEEE (2010)
9. Butt, U.M., Ahmad, S., Shafait, F., Nansen, C., Mian, A.S., Malik, M.I.: Automatic signature segmentation using hyper-spectral imaging. In: 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 19–24. IEEE (2016)
10. National Research Council: Strengthening forensic science in the United States: a path forward. National Academies Press (2009)
11. Deng, P.S., Mark Liao, H.-Y., Ho, C.W., Tyan, H.-R.: Wavelet-based off-line handwritten signature verification. Comput. Vis. Image Underst. 76(3), 173–190 (1999)
12. Di Lecce, V., Dimauro, G., Guerriero, A., Impedovo, S., Pirlo, G., Salzo, A., Sarcinella, L.: Selection of reference signatures for automatic signature verification. In: Proceedings of the Fifth International Conference on Document Analysis and Recognition. ICDAR 1999 (Cat. No. PR00318) (1999)
13. Dyer, A.G., Found, B., Rogers, D.: Visual attention and expertise for forensic signature analysis. J. Forensic Sci. 51(6), 1397–1404 (2006)
14. Fadhel, E.A., Bhattacharyya, P.: Application of a steerable wavelet transform using neural network for signature verification. Pattern Anal. Appl. 2(2), 184–195 (1999)
15. Fang, B., Leung, C.H., Tang, Y.Y., Tse, K.W., Kwok, P.C.K., Wong, Y.K.: Offline signature verification by the tracking of feature and stroke positions. Pattern Recogn. 36(1), 91–101 (2003)
16. Fergani, B., Davy, M., Houacine, A.: Speaker diarization using one-class support vector machines. Speech Commun. 50(5), 355–365 (2008)
17. Ferrer, M.A., Alonso, J.B., Travieso, C.M.: Offline geometric parameters for automatic signature verification using fixed-point arithmetic. IEEE Trans. Pattern Anal. Mach. Intell. 27(6), 993–997 (2005)
18. Gonzalez-Rodriguez, J., Fierrez-Aguilar, J., Ramos-Castro, D., Ortega-Garcia, J.: Bayesian analysis of fingerprint, face and signature evidences with automatic biometric systems. Forensic Sci. Int. 155(2–3), 126–140 (2005)
19. Huang, K., Yan, H.: Stability and style-variation modeling for on-line signature verification. Pattern Recogn. 36(10), 2253–2270 (2003)
20. Huber, R.A., Headrick, A.M.: Handwriting Identification: Facts and Fundamentals. CRC Press, Boca Raton (1999)
21. Khan, Z., Shafait, F., Mian, A.: Joint group sparse PCA for compressed hyperspectral imaging. IEEE Trans. Image Process. 24(12), 4934–4942 (2015)
22. Kovari, B., Charaf, H.: A study on the consistency and significance of local features in off-line signature verification. Pattern Recogn. Lett. 34(3), 247–255 (2013)
23. Lindblom, B.S., Kelly, J.S.: Organization and content of book. In: Scientific Examination of Questioned Documents, pp. 21–26. CRC Press (2006)
24. Liwicki, M., Malik, M.I.: Surprising? Power of local features for automated signature verification. In: The 15th International Graphonomics Society Conference (IGS2011), pp. 18–21 (2011)
25. Liwicki, M., Malik, M.I., Alewijnse, L., van den Heuvel, E., Found, B.: ICFHR 2012 competition on automatic forensic signature verification (4NsigComp 2012). In: 2012 International Conference on Frontiers in Handwriting Recognition, pp. 823–828. IEEE (2012)
26. Lowe, D.G.: Object recognition from local scale-invariant features. In: Proceedings of the Seventh IEEE International Conference on Computer Vision, vol. 2, pp. 1150–1157. IEEE (1999)
27. Malik, M.I., Ahmed, S., Liwicki, M., Dengel, A.: FREAK for real time forensic signature verification. In: 2013 12th International Conference on Document Analysis and Recognition, pp. 971–975. IEEE (2013)
28. Malik, M.I., Liwicki, M., Dengel, A., Uchida, S., Frinken, V.: Automatic signature stability analysis and verification using local features. In: 2014 14th International Conference on Frontiers in Handwriting Recognition, pp. 621–626. IEEE (2014)
29. Nguyen, V., Blumenstein, M.: An application of the 2D Gaussian filter for enhancing feature extraction in off-line signature verification. In: 2011 International Conference on Document Analysis and Recognition, pp. 339–343. IEEE (2011)
30. Nyssen, E., Sahli, H., Zhang, K.: A multi-stage online signature verification system. Pattern Anal. Appl. 5(3), 288–295 (2002)
31. Pirlo, G., Impedovo, D.: Cosine similarity for analysis and verification of static signatures. IET Biom. 2(4), 151–158 (2013)
32. Plamondon, R., Srihari, S.N.: Online and off-line handwriting recognition: a comprehensive survey. IEEE Trans. Pattern Anal. Mach. Intell. 22(1), 63–84 (2000)
33. Ramesh, V.E., Murty, M.N.: Off-line signature verification using genetically optimized weighted features. Pattern Recogn. 32(2), 217–233 (1999)
34. Rosten, E., Drummond, T.: Fusing points and lines for high performance tracking. In: Tenth IEEE International Conference on Computer Vision (ICCV 2005), vol. 2, pp. 1508–1515. IEEE (2005)
35. Stoker, W.H.: Questioned documents. In: Introduction to Forensic Sciences, p. 133 (1996)
36. Viswanathan, D.G.: Features from accelerated segment test (FAST). Inf. Ed. Ac. Uk (2009)
37. Wijesoma, W.S., Yue, K.W., Chien, K.L., Chow, T.K.: Online handwritten signature verification for electronic commerce over the Internet. In: Asia-Pacific Conference on Web Intelligence, pp. 227–236. Springer (2001)
Measures to Ensure the Reliability of the Functioning of Information Systems in Respect to State and Critically Important Information Systems
Askar Boranbayev1, Seilkhan Boranbayev2, and Askar Nurbekov2
1 Nazarbayev University, Nur-Sultan, Kazakhstan
2 L.N. Gumilyov Eurasian National University, Nur-Sultan, Kazakhstan
Abstract. This article is devoted to the development of organizational, technical, and legal measures to ensure the reliable functioning of governmental and critically important information systems. It discusses measures and recommendations for ensuring the reliability of public information systems; measures for ensuring the reliability of critical information systems; the main criteria for classifying information and communication infrastructure objects as critical; critical sectors, their subsectors, and critical services; questionnaires for evaluating infrastructure that provides critical services; criteria for assessing the criticality of services; and the differences between a standard information system (IS) and a critical information system. It also looks at one possible option for the process of sustainable operation of a critical information system. Applying the proposed methods and approaches will increase the reliability of information systems and ensure the timely identification and elimination of relevant risks.
Ensuring the reliability of public information systems involves: organizing control over the planning of activities aimed at the reliable and safe functioning of state information systems; implementing processes to ensure the reliability, fault tolerance, and security of information systems; forming regulatory requirements to ensure the reliability of public information systems; monitoring the continuous training and certification of employees working with information systems; raising the awareness of employees and management responsible for the performance of public information systems about the most relevant threats, vulnerabilities, risks, and incidents related to the reliability and security of information systems; conducting regular audits and assessments of the level of reliability of state information systems; carrying out preventive measures to eliminate incidents and reduce the risk of information system failures; connecting the telecommunications network to a single gateway to the Internet; and connecting the state information system to the information security event monitoring system and transmitting operational information on threats and incidents to the national Information Security coordination center.
Keywords: Information system · Reliability · Security · Fault tolerance
© Springer Nature Switzerland AG 2021. K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 139–152, 2021. https://doi.org/10.1007/978-3-030-55190-2_11
1 Introduction
The law of the Republic of Kazakhstan “On Informatization” regulates public relations in the field of informatization arising on the territory of the Republic of Kazakhstan between state bodies, individuals, and legal entities in the creation, development, and operation of information objects, as well as in the state support of the development of information and communication technologies [1]. The law also regulates the basic rules regarding state information systems and resources. In accordance with the law, the protection of state information systems is provided by their owners.
2 Measures to Ensure the Reliability of Information Systems in Relation to Public Information Systems
Let us consider the basic requirements for the information system of a state body. The information system of a state body is created, operated, and developed in accordance with the legislation of the Republic of Kazakhstan, the standards in force on the territory of the Republic of Kazakhstan, and the life cycle of the information system, taking into account the unified requirements in the field of information and communication technologies and information security. The main stages of creation and development of the information system of a state body, taking information security into account, include:
– pilot operation of the information system of the state body, carried out in accordance with the unified requirements in the field of information and communication technologies and information security, including testing for compliance with information security requirements, optimization, and elimination of identified defects with their subsequent correction;
– implementation of the information system of the state body in accordance with the standards applicable on the territory of the Republic of Kazakhstan;
– putting the information system of the state body into commercial operation (testing of the information system, the “electronic government” information and communication platform, and the Internet resource of the state body).
In the industrial operation of the information system of the state body, the following are ensured:
1) compliance with the unified requirements in the field of information and communication technologies and information security;
2) safety, protection, and restoration of electronic information resources in case of failure or damage;
3) monitoring of information security events of the information system of the state body and transfer of the results to the information security monitoring system of the State Technical Service.
Table 1. Information security measures taken regarding state information systems in the context of the stages of the life cycle
Information system life cycle phase | Information security measures | Legislation
Pilot operation, commissioning | Testing for compliance with information security requirements | Methods and rules of testing: service software, information and communication platform “electronic government”, the Internet resource of the state body and the information system to meet the requirements of information security
Commercial operation | Monitoring of information security of information system | Rules of information security monitoring of “electronic government” informatization objects and critical information and communication infrastructure objects
Creation, implementation, industrial operation (on the initiative of the IS owner) | Audit of information system | Law of the Republic of Kazakhstan «On Informatization»
Industrial operation (telecommunication networks) | Connecting telecommunication networks to the Internet through a single Internet access gateway | Law of the Republic of Kazakhstan «On Informatization»
On January 9, 2017, the government resolution “On approval of Unified requirements in the field of information and communication technologies and information security” came into force. Its requirements for information security and reliability are mandatory for state information systems and critical information and communication infrastructure facilities, which include industrial enterprises and other categories of economic facilities with automated technological processes whose disruption may affect the security of the country. It is important to organize appropriate measures at specific stages of the life cycle. The main information security measures taken for information systems at each life cycle stage are shown in Table 1. Table 1 shows that in Kazakhstan it is currently mandatory to pass tests at the pilot operation stage of a state information system. When monitoring the protection of “electronic government” informatization objects, an instrumental survey is carried out, that is, scanning by means of software for remote or local diagnostics of communication channels, nodes, servers, workstations, application and system software, databases, and network elements to identify vulnerabilities in them (a deficiency of an object that can disrupt its operation or lead to unauthorized access bypassing the information protection means in use) [2]. Taking the above into account, there is a positive trend of improving the legislation on the development of reliable and stable informatization of the Republic. However, although legislative requirements have been established, they are not always implemented in a timely manner. This often leads to the undesirable operation of vulnerable and unstable information systems. For example, owners of information systems that process the personal data of EU citizens risk, if they fail to comply with the General Data Protection Regulation (GDPR) of the European Union, a fine of up to 20 million euros or up to 4% of the company's annual turnover, whichever is greater. In this regard, it is recommended to strengthen control over the requirements for ensuring the reliability of information systems by imposing fines when these requirements are not fulfilled.
2.1 Recommendations for Ensuring the Reliability of Information Systems in Relation to Public Information Systems
Summarizing the above, we present the main recommendations for state information systems:
– to organize control over the planning of activities aimed at the reliable and safe functioning of state information systems;
– to implement and develop the main processes that ensure the reliability, fault tolerance and security of public information systems (vulnerability management, incident management, network security management, update management, management of information system user rights, redundancy management, etc.);
– to use international experience in forming regulatory requirements for the reliability of public information systems;
– to monitor the continuous training and certification of employees who work with information systems that provide public services;
– to raise the awareness of employees and management responsible for the performance of public information systems about the most relevant threats, vulnerabilities, risks and incidents related to the reliability and security of information systems;
– to conduct regular audits and assessments of the level of reliability of state information systems;
– to carry out preventive measures to eliminate incidents and reduce the risk of information system failures;
– to connect the telecommunications network in which the information system is located to the single Internet access gateway;
– to connect state information systems to the information security event monitoring system and to transmit operational information on threats and incidents to the national Information Security coordination center.
3 Measures to Ensure the Reliability of Information Systems in Relation to Critical Information Systems
Identification and protection of critical information systems is a priority task for any country. To prevent threats, failures and information security incidents in a country's critical infrastructure, it is necessary to begin by identifying the critical information systems. Critical information systems are significant in all countries because of their importance to national, social and socio-economic security, so identifying them correctly and in time and applying appropriate protection measures to them can prevent numerous disasters [3–5]. Countries around the world face negative information security events in the critical infrastructure sector that arise from various causes [6–8]. These events often lead to numerous serious casualties and losses, stoppage of critical production and destruction of the environment [9–12]. Currently, critical information systems in Kazakhstan are determined using the “Rules and criteria for classifying the objects of information and communication infrastructure as critical objects of information and communication infrastructure” [13]. The criteria proposed in this normative legal act are given in Table 2.
Table 2. Main criteria for classifying information and communication infrastructure objects as critical information and communication infrastructure objects [14]
Impact of information and communication infrastructure on continuous and safe operation
Especially important state objects, in case of violation of the functioning of which the activity of especially important state objects will be stopped
Strategic objects, in case of violation of the functioning of which the activity of strategic objects will be stopped or there is a threat of an emergency of man-made nature
Objects of economic sectors of strategic importance, the violation of which will stop the activities of objects of economic sectors of strategic importance, or there is a threat of an emergency of a man-made nature
In the Republic of Lithuania, the Government Decree No. 742 of 20 July 2016 “On approval of the methodology for identification of critical information infrastructure” is in force. In accordance with this Decree, the country has approved a list of critical information infrastructure, access to which is limited. The Decree identifies critical sectors, their subsectors and services (Table 3). For each sector, there are questions to calculate the level of damage in the event of a sector service interruption (Table 4). In case of exceeding 16 points, the infrastructure is checked for compliance with the criteria for assessing what information infrastructure is necessary to ensure the continued provision of essential services (Table 5). The information system that meets all criteria after evaluation is defined as a critical information system.
Table 3. Critical sectors, their subsectors and critical services [15]
| Sector | Subsector and services |
|---|---|
| Energy sector | Electricity subsector: power generation, transmission and distribution services, electricity market services. Oil and petroleum products subsector: oil production, oil refining, oil pipeline maintenance, storage of oil and petroleum products. Natural gas subsector: transmission, distribution and storage of natural gas. District heating subsector |
| Information technology and electronic communications sector | Information technology subsector: data center/cloud service. Electronic communications subsector: fixed and mobile telephony, data services, Internet access, .lt domain service |
| Water sector | Drinking water subsector: drinking water storage, supply and quality assurance. Sewage subsector: waste water collection and treatment service |
| Food industry | Agricultural and/or food production. Provision of food (public storage). Food quality and safety |
| Health sector | Emergency medical services, inpatient and outpatient care, supply of medicines, vaccines, blood and medical supplies, control of infectious and/or epidemic diseases |
| Financial sector | Banking services, transfer payment, stock exchange |
| Transport and postal sector | Air transport subsector: air traffic management and navigation services; airport services. Road transport subsector: bus service; road network maintenance. Railway transport subsector: railway passenger service; transportation of goods by rail. Marine subsector: marine control service. Postal subsector: universal postal service |
| Public security and law enforcement sector | The social security. The service of judicial and penal system |
| Industrial sector, chemical and nuclear subsector | Hazardous waste storage and waste management. Security service of industrial enterprises with a high level of risk |
| General government | Service of public authority functions |
| Civil defence sector | General emergency telephone communication; warning of emergency situations, their elimination, elimination of consequences, organization of the population for rescue and coordination |
| Environmental sector | Air pollution monitoring and early warning service. Meteorological observation and early warning service. Surveillance and early warning service (rivers, lakes). Marine pollution monitoring and control service |
| National defence sector | State defence service |
| Foreign affairs and security policy sector | Foreign affairs and security policy service |
Table 4. An example of a questionnaire for the evaluation of infrastructure providing critical services [15]
| Question | Answer A (3 points) | … | Answer D (0 points) | Score (points) |
|---|---|---|---|---|
| Industry questions: can the destruction, damage or disruption of an object adversely affect the provision of a critical service? | | | | |
| Electricity subsector services | Electricity supply will be cut off for more than 145,000 residents, or in more than 3 municipalities, or for category I consumers, for more than 24 h | … | Electricity supply will be cut off for fewer than 500 residents | 2 |
| … | … | … | … | … |
| General assessment of the importance | | | | 17 |
| Is this a critical infrastructure/system (at least 16 points)? (Yes/no) | | | | Yes |
Table 5. Criteria for assessing the criticality of services [15]
The critical service provided by a critical infrastructure depends on the proper functioning of the facility’s information infrastructure
A cyber incident in a critical infrastructure has a significant impact on the disruption of a critical service provided by that facility
There is no alternative way to operate the facility and provide the critical service if the information infrastructure of the critical infrastructure fails
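The two-step procedure described above (the scoring questionnaire of Table 4 followed by the three criteria of Table 5) can be sketched in Python. This is only an illustration under stated assumptions: the class, field and function names are hypothetical, each questionnaire answer is assumed to be worth 0 to 3 points, and the threshold follows the "at least 16 points" wording of Table 4.

```python
from dataclasses import dataclass

# Threshold taken from Table 4 ("at least 16 points"). Treat it as an assumption,
# since the running text speaks of "exceeding 16 points".
CRITICALITY_THRESHOLD = 16


@dataclass
class Table5Criteria:
    """The three yes/no criteria of Table 5 (field names are hypothetical)."""
    service_depends_on_information_infrastructure: bool
    cyber_incident_significantly_disrupts_service: bool
    no_alternative_way_to_operate: bool

    def all_met(self) -> bool:
        return (self.service_depends_on_information_infrastructure
                and self.cyber_incident_significantly_disrupts_service
                and self.no_alternative_way_to_operate)


def total_score(question_points: list) -> int:
    """Sum the questionnaire points; each answer is assumed to be worth 0 to 3."""
    for points in question_points:
        if not 0 <= points <= 3:
            raise ValueError(f"points must be between 0 and 3, got {points}")
    return sum(question_points)


def is_critical_information_system(question_points: list,
                                   criteria: Table5Criteria) -> bool:
    """Step 1: the damage score reaches the threshold. Step 2: all criteria hold."""
    if total_score(question_points) < CRITICALITY_THRESHOLD:
        return False
    return criteria.all_met()
```

For example, answers worth `[3, 3, 2, 3, 3, 3]` points give a total of 17, so the system is classified as critical only if all three Table 5 criteria are also met.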
Today, the action plan for implementing the Concept of Cybersecurity (“Cyber Shield of Kazakhstan”) until 2022 sets the task of improving the procedure for determining critical information systems, taking into account best international practices. This paper examines the procedure for classifying information systems as critical based on the experience of other countries. Once critical information systems are identified, it is important to take appropriate measures to ensure their reliability. The main measures applied to critical information systems in Kazakhstan at particular stages of the life cycle are similar to the information security measures taken with respect to state information systems (Table 1).
The concept of “infrastructure reliability” can be understood as the ability of an infrastructure at risk to adapt to the situation and recover from losses while maintaining the functioning of critical structures and elements. Increased reliability is achieved through risk management. For infrastructure, systems and network assets to be reliable, they must be flexible and adaptable. Improving reliability requires accurate and timely threat information, analysis of expected risks, identification of mitigation measures, response to threats and, accordingly, the ability to recover. The main differences between the requirements for critical and standard information systems are shown in Table 6.
Table 6. Differences between standard information system (IS) and critical information system.
| Requirements | Standard IS | Critical IS |
|---|---|---|
| Availability | Average (IS rebooting is permissible) | High (redundancy is most often used) |
| Integrity | Depends on the requirements of the company | High (loss of information is not allowed) |
| Confidentiality | Depends on the requirements of the company | High |
| Performance | High | Average |
| Risk management | Depends on the requirements of the company | The risk of significant harm |
Countries around the world are facing adverse information security events in the critical infrastructure sector. These events often lead to numerous and significant losses, serious disruption of production, destruction of the environment, etc. The consequences of realized threats can be catastrophic, so it is necessary to take all possible measures to protect information systems and their data and to ensure their sustainable functioning. Figure 1 shows one option for ensuring the sustainability of critical information systems.
Fig. 1. One of the possible options for the process of sustainable operation of critical information system
As shown in Fig. 1, the sustainable functioning of a critical information system depends on three components: cybersecurity, the human factor and physical protection. Figure 1 shows cybersecurity in more detail. Identification of critical information systems is the first phase of protecting critical infrastructure. The second phase includes: identification, assessment and elimination of vulnerabilities; identification, assessment and reduction of risks; identification, assessment and mitigation of threats; incident handling; monitoring (24 × 7); configuration of protective devices; certification; audit; mitigation; standardization; law enforcement; and others.
Taking into account the above, it should be noted that although the list of critical ICI objects was determined only in 2017, the legislation has already defined a number of requirements for their protection. The following measures are recommended to ensure the reliability of critical information systems:
– to continue to improve legislation on the identification of critical ICI objects and on the requirements for their reliable and safe operation;
– to use international experience in forming regulatory requirements for the identification and reliability of critical information systems;
– to form requirements for the reliability of information systems that take into account their criticality and importance for production, i.e. requirements that carry no risk (or minimal risk) of information system failure;
– to divide critical systems into types: information systems, information resources, ICI, and automated process control systems;
– to control and monitor the digitalization of critical non-IT systems (for example, automated process control systems);
– to monitor the continuous training and certification of employees working with critical information systems;
– to raise the awareness of employees and management responsible for the performance of critical information systems about the most relevant threats, vulnerabilities, risks and incidents related to the reliability and security of critical information systems;
– to require a duty shift for round-the-clock monitoring of the performance and availability of critical information systems;
– to conduct regular audits and assessments of the level of reliability of critical information systems;
– to carry out preventive measures to eliminate incidents and reduce the risk of information system failures.
3.1 Recommendations Aimed at Improving Measures to Ensure the Reliability of Information Systems
Table 7 presents indicators of the growing use of ICT/ICI technologies, based on statistics from the official resource stat.gov.kz. As can be seen from Table 7, the information services and ICT technologies presented are in demand. This makes it possible to predict with high probability that demand for the reliability of these technologies, including information systems, will grow. Thus, the recommendations to ensure the reliability of information systems are relevant. The importance of this task is also emphasized by the strategic objectives in the country's legislative acts. In accordance with the Concept of Cybersecurity of Kazakhstan, one of the important tasks is the development of reliable domestic software products and information systems. The search for such products, including information systems, is facilitated by a register of trusted electronic products and software. The register includes products that have been awarded a certificate of compliance with information security requirements at a trust level of not lower than 4 in accordance with ST RK ISO/IEC 15408-3-2006. An increase in the number of software products in the register will increase competition among domestic products and provide a wide choice for potential consumers.
Table 7. Indicators of ICT/ICI-technologies use for 2016–2018 [16]
Indicator
2016
2017
2018
The proportion of Internet users aged 6–74 years, the percentage
76,8
78,8
81,3
Total costs of information and communication technologies (including the organization of public administration), million tenge
269 526
349 943
305217
Number of computers in organizations (including public administration organizations), computer
1042813
951 777
977 192
Number of organizations using the Internet (including public administration organizations), organization
75 779
79 658
100702
Volume of information and communication technologies services, million tenge
944 398
1 034 849
–
The proportion of organizations with Internet resources, the percentage
18,5
21,7
22,3
The proportion of organizations using the Internet, the percentage
26,7
30,2
32,5
In addition, we note that the basis for ensuring the reliable operation and information security of information systems is the maximum level of secure operation of the information system server. Recommendations aimed at improving measures to ensure the reliability of information systems in Kazakhstan:
– regular installation of security updates for the operating system;
– regular software updates;
– use of an SSL certificate so that the resource is accessed only via HTTPS;
– removal of software that is not a necessary component of the functioning of the information system;
– disabling unused services installed by default (such as FTP or SMTP) and unused server extensions;
– disabling directory browsing if it is not necessary;
– logging of events and periodic analysis of the logs, with automatic notifications on detection of suspicious activity;
– installation of a firewall and other means of information protection if necessary.
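The recommendation on logging and periodic log analysis with automatic notification can be illustrated with a small Python sketch. The log line format, the threshold of five failed logins and the `notify` callback are all assumptions made for this example; a real deployment would read the server's actual log format and send the notification by mail or to a monitoring system.

```python
import re
from collections import Counter
from typing import Callable, Dict, Iterable

# Hypothetical log line format: "<timestamp> LOGIN_FAILED ip=<address> user=<name>"
FAILED_LOGIN = re.compile(r"LOGIN_FAILED ip=(?P<ip>\S+)")


def find_suspicious_ips(lines: Iterable[str], threshold: int = 5) -> Dict[str, int]:
    """Return the IP addresses with at least `threshold` failed logins."""
    failures = Counter()
    for line in lines:
        match = FAILED_LOGIN.search(line)
        if match:
            failures[match.group("ip")] += 1
    return {ip: count for ip, count in failures.items() if count >= threshold}


def analyse_and_notify(lines: Iterable[str],
                       notify: Callable[[str], None],
                       threshold: int = 5) -> int:
    """Run one periodic analysis pass and call `notify` once per suspicious IP."""
    suspicious = find_suspicious_ips(lines, threshold)
    for ip, count in sorted(suspicious.items()):
        notify(f"Suspicious activity: {count} failed logins from {ip}")
    return len(suspicious)


# Example run on made-up log lines; alerts are collected in a list instead of mailed.
sample_log = (
    ["2020-01-01T10:00:0%d LOGIN_FAILED ip=203.0.113.7 user=admin" % i for i in range(6)]
    + ["2020-01-01T10:01:00 LOGIN_OK ip=198.51.100.2 user=operator",
       "2020-01-01T10:02:00 LOGIN_FAILED ip=198.51.100.2 user=operator"]
)
alerts = []
analyse_and_notify(sample_log, alerts.append)
```

Passing the notification function in as a parameter keeps the analysis step independent of how the administrator is actually alerted.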
4 Conclusion
Summarizing the above recommendations, it is important to implement a set of measures that improve the reliability of the functioning of information systems in Kazakhstan: organizational and legal measures, organizational and technical measures, measures for human resources management, and cooperation and promotion of the safe use of information and communication technologies. Applying the proposed methods and approaches will increase the level of reliability of information systems and ensure the timely identification and elimination of relevant risks. The proposed approaches can be used to further develop the research in [17–32].
References
1. The law of the Republic of Kazakhstan dated November 24, 2015, № 418-V «About Informatization»
2. The Order of the Acting Minister of Investments and Development of the Republic of Kazakhstan dated January 28, 2016, № 108 «On approval of the Methodology for the certification examination of the information system, the information and communication platform of “electronic government”, the Internet resource of a state body for compliance with information security requirements»
3. Yazdani, M., Alidoosti, A., Zavadskas, E.K.: Risk analysis of critical infrastructures using fuzzy COPRAS. Econ. Res. 24(4), 27–40 (2011)
4. Assaf, D.: Models of critical information infrastructure protection. Int. J. Crit. Infrastruct. Protect. 1, 6–14 (2008)
5. Nickolov, E.: Critical information infrastructure protection: analysis, evaluation and expectations. Inf. Secur. Int. J. 17, 105–119 (2005)
6. Too, E.G.: Capability for infrastructure asset capacity management. Int. J. Strat. Prop. Manag. 15(2), 139–151 (2011)
7. Rudock, L., Amaratunga, D.: Post-Tsunami reconstruction in Sri Lanka: assessing the economic impact. Int. J. Strat. Prop. Manag. 14(3), 219–232 (2010)
8. Miao, X., Yub, B., Xic, B., Tangd, Y.-H.: Modeling of bilevel games and incentives for sustainable critical infrastructure system. Technol. Econ. Dev. Econ. 16(3), 365–379 (2010)
9. Darby, S.: Energy feedback in buildings - improving the infrastructure for demand reduction. Build. Res. Inf. 36(5), 499–508 (2008)
10. Little, R.G.: Tending the infrastructure commons: ensuring the sustainability of our vital public systems. Struct. Infrastruct. Eng. 1(4), 263–270 (2005)
11. Yusta, J.M., Correa, G.J., Lacal-Arántegui, R.: Methodologies and applications for critical infrastructure protection: state-of-the-art. Energy Policy 39(10), 6100–6119 (2011)
12. Tofani, A., Castorini, E., Palazzari, P., Usov, A., Beyel, C., Rome, E., Servillo, P.: An ontological approach to simulate critical infrastructures. J. Comput. Sci. 1(4), 221–228 (2010)
13. Boranbayev, A., Boranbayev, S., Nurusheva, A., Yersakhanov, K.: The modern state and the further development prospects of information security in the Republic of Kazakhstan. In: Advances in Intelligent Systems and Computing, vol. 738, pp. 33–38 (2018)
14. Decree of the Government of the Republic of Kazakhstan dated September 8, 2016 No. 529 “On approval of the Rules and criteria for classifying information and communication infrastructure objects as critical information and communication infrastructure objects”
15. Decree of the Government of the Republic of Lithuania “On Approving the Methodology for Identifying Critical Information Infrastructure” dated July 20, 2016 No. 742
16. Official website of the Committee on Statistics of the Ministry of National Economy of the Republic of Kazakhstan. http://stat.gov.kz/faces/wcnav_externalId/homeNumbersInformationSociety?_afrLoop=8713314598324629#%40%3F_afrLoop%3D8713314598324629%26_adf.ctrl-state%3D1a9qhbqz2k_148
17. Boranbayev, A., Boranbayev, S., Nurusheva, A., Yersakhanov, K.: Development of a software system to ensure the reliability and fault tolerance in information systems. J. Eng. Appl. Sci. 13(23), 10080–10085 (2018)
18. Boranbayev, A., Boranbayev, S., Yersakhanov, K., Nurusheva, A., Taberkhan, R.: Methods of ensuring the reliability and fault tolerance of information systems. In: Advances in Intelligent Systems and Computing, vol. 738, pp. 729–730 (2018)
19. Akhmetova, Z., Boranbayev, S., Zhuzbayev, S.: The visual representation of numerical solution for a non-stationary deformation in a solid body. In: Advances in Intelligent Systems and Computing, vol. 448, pp. 473–482 (2016)
20. Akhmetova, Z., Zhuzbaev, S., Boranbayev, S.: The method and software for the solution of dynamic waves propagation problem in elastic medium. Acta Physica Polonica A 130(1), 352–354 (2016)
21. Hritonenko, N., Yatsenko, Y., Boranbayev, S.: Environmentally sustainable industrial modernization and resource consumption: is the Hotelling’s rule too steep? Appl. Math. Model. 39(15), 4365–4377 (2015)
22. Boranbayev, S., Altayev, S., Boranbayev, A.: Applying the method of diverse redundancy in cloud based systems for increasing reliability. In: The 12th International Conference on Information Technology: New Generations, ITNG 2015, Las Vegas, Nevada, USA, 13–15 April 2015, pp. 796–799 (2015)
23. Turskis, Z., Goranin, N., Nurusheva, A., Boranbayev, S.: A fuzzy WASPAS-based approach to determine critical information infrastructures of EU sustainable development. Sustainability (Switzerland) 11(2), 424 (2019)
24. Turskis, Z., Goranin, N., Nurusheva, A., Boranbayev, S.: Information security risk assessment in critical infrastructure: a hybrid MCDM approach. Informatica (Netherlands) 30(1), 187–211 (2019)
25. Boranbayev, A.S., Boranbayev, S.N., Nurusheva, A.M., Yersakhanov, K.B., Seitkulov, Y.N.: Development of web application for detection and mitigation of risks of information and automated systems. Eurasian J. Math. Comput. Appl. 7(1), 4–22 (2019)
26. Boranbayev, A.S., Boranbayev, S.N., Nurusheva, A.M., Seitkulov, Y.N., Sissenov, N.M.: A method to determine the level of the information system fault-tolerance. Eurasian J. Math. Comput. Appl. 7(3), 13–32 (2019)
27. Boranbayev, A., Boranbayev, S., Nurbekov, A., Taberkhan, R.: The development of a software system for solving the problem of data classification and data processing. In: 16th International Conference on Information Technology - New Generations, ITNG 2019, vol. 800, pp. 621–623 (2019)
28. Boranbayev, A., Boranbayev, S., Nurusheva, A., Seitkulov, Y., Nurbekov, A.: Multi criteria method for determining the failure resistance of information system components. In: Advances in Intelligent Systems and Computing, vol. 1070, pp. 324–337 (2020)
29. Boranbayev, A., Boranbayev, S., Nurusheva, A., Yersakhanov, K., Seitkulov, Y.: A software system for risk management of information systems. In: Proceedings of the 2018 IEEE 12th International Conference on Application of Information and Communication Technologies, AICT 2018, Almaty, Kazakhstan, 17–19 October 2018, pp. 284–289 (2018)
30. Boranbayev, S., Altayev, S., Boranbayev, A., Nurbekov, A.: Mathematical model for optimal designing of reliable information systems. In: Proceedings of the 2014 IEEE 8th International Conference on Application of Information and Communication Technologies-AICT 2014, Astana, Kazakhstan, 15–17 October 2014, pp. 123–127 (2014)
31. Boranbayev, S., Altayev, S., Boranbayev, A., Seitkulov, Y.: Application of diversity method for reliability of cloud computing. In: Proceedings of the 2014 IEEE 8th International Conference on Application of Information and Communication Technologies-AICT 2014, Astana, Kazakhstan, 15–17 October 2014, pp. 244–248 (2014)
32. Boranbayev, S.: Mathematical model for the development and performance of sustainable economic programs. Int. J. Ecol. Dev. 6(1), 15–20 (2007)
IoTManager: Concerns-Based SDN Management Framework for IoT Networks
Radwa Hamed1, Mohamed Rizk1, and Bassem Mokhtar1,2(B)
1 Department of Electrical Engineering, Faculty of Engineering, Alexandria University, Alexandria, Egypt
{eng-radwa.hamed,mohamed.rizk,bmokhtar}@alexu.edu.eg
2 College of Information Technology, University of Fujairah, Fujairah, UAE
[email protected]
Abstract. Software Defined Networking (SDN) is an evolving technology in computer networks. One of the challenges of SDN is the lack of a standard high-level framework that provides an abstraction of the lower levels through an easy-to-use high-level interface, without requiring difficult and complex programming. Consequently, we propose IoTManager, an SDN concern-based management framework for IoT networks. IoTManager provides a high-level abstraction in the SDN management layer through the separation of network concerns. IoTManager translates a set of input keywords into policies based on the network management function and the network concerns. Our translator is hybrid: it is proactive, in that it inserts OpenFlow rules into switches based on the defined policies, and reactive, in that it handles events according to the defined policies. A network scenario simulated in Mininet shows the framework’s functionality and the high-level policies it can support and convert to low-level flow rules. In addition, performance is measured by calculating the different processing times. Moreover, the proposed abstraction facilitates network management through separation of concerns, dividing the network management problem into less complex sub-problems. Additionally, it helps close the semantic gap between the intents of network administrators and the languages in which those intents can be encoded.
Keywords: Software Defined Networking · IoT · Network management · Separation of concerns · Network abstraction
© Springer Nature Switzerland AG 2021 K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 153–167, 2021. https://doi.org/10.1007/978-3-030-55190-2_12
1 Introduction
Software defined networking [1] separates the network into layers, or planes. The infrastructure layer, or data plane, contains the forwarding switches. The control plane contains the controller, which makes all forwarding and control decisions. The management plane takes the form of a programming language, a policy-based framework or network applications. Each layer provides an abstraction of the underlying layers. The southbound interface is the communication protocol between the infrastructure layer and the controller; the most commonly used protocol is OpenFlow [2]. The northbound interface is the communication protocol between the management plane and the controller. There is no standard northbound interface.
The SDN literature [1] has reviewed the work proposed for the different SDN planes, together with its limitations, challenges and further research directions [3, 4]. These challenges include providing a high-level abstraction framework that can be used easily by network administrators who lack a programming background. To address them, related work has proposed a set of programming languages that give developers an abstraction for creating network applications. Policy-based frameworks have also been proposed that give the network administrator an interface and control parameters for defining network policies. However, further research is required before a standard high-level SDN abstraction is adopted by enterprises. Examples of recent related work are OSDF [5], an intent-based SDN programming framework, and OpenSec [6], a policy-based security framework. Other works in the literature, such as [7–9], introduced SDN controller designs with different SDN architectures, such as a logically centralized, physically distributed architecture, which provide management capabilities for networking operations and for various applications and services in computer networks.
IoTManager differs in its design, its architecture, the control domains it provides, and its interface for defining policies through keywords. Furthermore, IoTManager is event-driven: it handles events related to time, data usage, traffic type, and network and server resources. IoTManager is a proposal for the network management plane that provides an abstraction of the lower layers and an interface through which the network administrator manages the network.
This paper extends our previously published paper [10], which introduced the framework, its main components and its architecture, and provided preliminary simulation results. As stated there, the framework’s main contributions and goals are as follows (details can be found in [10]):
• Providing a high-level abstraction in the management plane of the SDN model, with a multifunction framework and an easy interface for policy definition.
• Extending the control model provided by other SDN systems.
This paper is arranged as follows: Sect. 2 presents the updated building blocks of our framework, its components and its architecture. Section 3 evaluates our framework. Some points of future work are discussed in Sect. 4. Finally, Sect. 5 concludes our work.
2 IoTManager Components
IoTManager components are shown in Fig. 1. They are built on the POX controller [11] and implemented in Python.
Fig. 1. IoTManager components and building blocks.
2.1 Policy Input GUI
The policy definition keywords are listed in Table 1. They are presented in a graphical user interface, shown in Fig. 2. The primary tabs are QoS, monitoring, and security; the secondary tabs are application, communication, and resources. Under each tab are the keywords related to that concern. Some keywords are control parameters that correspond to flow fields [2] and select the traffic to which the policy applies. Other control parameters are event-related: they define the network conditions under which the policy applies by specifying when to raise events and how to handle them. Global keywords can be changed by the administrator, which changes the action taken, and can be used to resolve conflicts between policies.
R. Hamed et al.
Fig. 2. Policy input graphical user interface.
Table 1. IoTManager policy definition keywords.
Global keyword | Values
Network management function | Quality of service, traffic monitoring, security
Concern | Application, communication, resources

Concern | Control parameter | Keyword
Application | Flow fields | Protocol (TCP/UDP); src port; dest port
Application | Traffic type (event-related) | Application type; attack
Communication | Flow fields | TOS field; src mac addr; dest mac addr; vlan; frame type; src ip addr; dest ip addr
Communication | Data usage (event-related) | Data rate (mb/s); bytes; flow duration (s); no. of connections
Resources | Flow fields | Sw in_port
Resources | Network resources (event-related) | Link bandwidth (mb/s); server resources: server ip, RAM usage %, processor usage %, storage usage %
Default, App, Comm, Resources, ACR, AC, AR, CR | Action | Allow; notify the admin by mail; block; log; decrease data rate (using TOS field and queues); change vlan; reroute; specify queue; detecting application type; detecting attacks
Default, App, Comm, Resources, ACR, AC, AR, CR | Time (event-related) | Time interval start; time interval end
Default, App, Comm, Resources, ACR, AC, AR, CR | Priority | Number
Default, App, Comm, Resources, ACR, AC, AR, CR | Policy expiry time (event-related) | Y/M/D/H (year/month/day/hour)
2.2 Policy Database
The keyword values defined through the policy input interface are written to a file, where they remain until the policy is deleted by the administrator or expires.
2.3 Policy Reader
Reads the policies from the file and converts them to policy objects.
2.4 Policy to Flow Rules Block
A function that checks the policy objects and inserts flow rules into the switches accordingly. It is called by the “handling packetin” function after the switching or routing function has determined the flow rule’s output port. It is also called by the event handlers to modify a flow rule’s action or to add a new rule in response to a raised event. For every keyword of a policy that has a value specified, it adds that value to the corresponding flow field in the flow rule. For event-related keywords, it instead checks whether the event has been raised for the flow to which the packetin belongs and updates a flag accordingly. Every event has an array that keeps track of all the policies related to it by storing their policy IDs. After all the keywords have been checked, the flag indicates whether all the policy conditions hold, including the flow fields and the other network parameters. If they do, the flow rule created from the policy is added to the switch. Otherwise, that rule is not added, and a rule with the default action is added for this flow instead. A switch flow rule contains only flow fields and an action, while a policy contains other control parameters as well. The flow rule action is specified proactively according to the current values of the control parameters. The rules inserted in the switch are changed reactively when an event is raised indicating a change in these parameters. If a policy sets a keyword related to an event, a flag activates the event source, which checks for the specified condition and raises the event accordingly.
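The conversion described in Sect. 2.4 can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the framework’s actual code: the names `Policy`, `policy_to_flow_rule`, the field names, and the dictionary shape of a flow rule are all hypothetical, and no POX API is used.

```python
from dataclasses import dataclass
from typing import Dict, Set

# Hypothetical names: Policy, policy_to_flow_rule and the field names are
# illustrative only and are not taken from the IoTManager source code.


@dataclass
class Policy:
    policy_id: int
    flow_fields: Dict[str, object]   # specified flow-field keywords
    event_keywords: Set[str]         # events that must have been raised
    action: str                      # action to take when all conditions hold


def policy_to_flow_rule(policy, packet_fields, raised_events, out_port,
                        default_action="allow"):
    """Build the flow rule for the flow that triggered a packet-in."""
    # Flow-field keywords: every specified value must match the packet.
    applies = all(packet_fields.get(name) == value
                  for name, value in policy.flow_fields.items())
    # Event-related keywords: every listed event must already be raised.
    applies = applies and policy.event_keywords <= raised_events

    if applies:
        match = dict(policy.flow_fields)
        action = policy.action
    else:
        # Conditions not met: fall back to the default action for this flow.
        match = dict(packet_fields)
        action = default_action
    return {"match": match, "action": action, "out_port": out_port}
```

When an event is raised later, the same function could be called again with the updated set of raised events, which mirrors the reactive rule change described in the text.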
2.5 Packetin Handler
A function that is called when a packetin event is raised, that is, when a packet is sent from the switch to the controller. When a flow reaches the switch, the switch searches the rules in its flow tables. If none of the rules match, a packet from the flow is sent to the controller, which takes the decision and adds a rule for the flow to the switch. The packetin handler checks the flags for any activated event source related to the flow or to network conditions, according to the inserted policies, and calls the functions of the activated event sources.
2.6 Application and Attack Detection Block (Event Source)
Application types and attacks are detected from traffic behavior, using the flow and port statistics received from the switch and the values of the flow fields. When an application type or attack specified in a policy is detected, an event is raised and handled according to the action in the policy. In addition, for policies whose action is to detect an application or attack, a packet is sent to this block to detect its application type. The packet can also be sent to external intrusion detection systems for analysis.
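The table-miss behavior behind the packetin handler can be illustrated with a toy model. The sketch below is not POX code: `Switch`, `Controller`, and `handle_packet_in` are hypothetical names, the match is reduced to source and destination addresses, and the installed action is assumed to be the default “allow”.

```python
# Hypothetical sketch: class and method names are illustrative, not POX APIs.


class Controller:
    def __init__(self):
        # Event sources activated by policies: name -> callable(packet).
        self.active_event_sources = {}
        self.called = []

    def handle_packet_in(self, switch, packet):
        # Call only the event sources that the policies have activated.
        for name, source in self.active_event_sources.items():
            source(packet)
            self.called.append(name)
        # Install a rule for this flow (default action assumed here).
        action = "allow"
        match = {"src": packet["src"], "dst": packet["dst"]}
        switch.flow_table.append((match, action))
        return action


class Switch:
    def __init__(self, controller):
        self.flow_table = []   # list of (match dict, action) pairs
        self.controller = controller

    def receive(self, packet):
        for match, action in self.flow_table:
            if all(packet.get(k) == v for k, v in match.items()):
                return action  # table hit: the controller is not involved
        # Table miss: send the packet to the controller (packet-in).
        return self.controller.handle_packet_in(self, packet)
```

Only the first packet of a flow reaches the controller in this model; later packets hit the installed rule.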
2.7 Switching and Routing Block
A function that performs the tasks of layer 2 and layer 3 switches to determine the output switch port, which is added to the flow rule action and determines the path of the flow.
2.8 Statistics Processing Block (Event Source)
A function that processes the statistics received from the switches to measure flow bytes, flow duration, consumed bandwidth, available bandwidth, and data rate. It also acts as an event source, raising an event when the values specified in the related policies are reached. Statistics are requested every 10 s. The event source corresponding to a policy keyword is activated when that keyword is set.
2.9 Event Handler
Handles the events received from event sources, which are raised by changes in network control parameters and by the traffic. It checks the policies related to the event and adds or modifies the corresponding flow rules according to the raised event and the defined policies.
2.10 Time Check (Event Source)
Checks the current time and raises an event when it matches the start or end of a time interval specified in a policy, or when a policy expires.
2.11 Servers Resources’ Detector (Event Source)
Measures the resources (processor, RAM, storage) of the server specified in the policy, and acts as an event source, raising an event when the values specified in the related policies are reached.
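As a rough illustration of the statistics processing block, the sketch below derives a data rate from two successive byte counters and raises an event when a threshold is exceeded. The 10 s polling interval comes from the text; everything else is an assumption: the class and function names are hypothetical, and “mb/s” is interpreted as megabits per second.

```python
POLL_INTERVAL_S = 10  # statistics are requested every 10 s (from the text)


def data_rate_mbps(prev_bytes, curr_bytes, interval_s=POLL_INTERVAL_S):
    """Average rate over one polling interval, assuming megabits per second."""
    return (curr_bytes - prev_bytes) * 8 / interval_s / 1e6


class StatisticsEventSource:
    """Hypothetical event source: names are illustrative only."""

    def __init__(self, threshold_mbps, on_event):
        self.threshold_mbps = threshold_mbps
        self.on_event = on_event      # callback(flow_id, rate)
        self.last_bytes = {}          # flow_id -> byte counter at last poll

    def process(self, flow_id, byte_count):
        prev = self.last_bytes.get(flow_id)
        self.last_bytes[flow_id] = byte_count
        if prev is None:
            return None               # first sample: no rate available yet
        rate = data_rate_mbps(prev, byte_count)
        if rate > self.threshold_mbps:
            self.on_event(flow_id, rate)  # raise the event for this flow
        return rate
```

A rate can only be computed once two samples exist, so in this sketch an event cannot fire earlier than one polling interval after a flow first appears in the statistics.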
3 Implementation and Evaluation
In this section, we implement a network scenario simulated in Mininet. The scenario shows how our framework is used, its functionality, and the policies it can support. Through the given control parameters, the network administrator can generate the rules required to manage and tune the network, saving the effort of writing code to achieve the required scenario.
3.1 Network Topology
Each policy is implemented separately in a client-server model with 3 hops between client and server, as shown in Fig. 3. s1, s2, and s3 represent the OpenFlow switches, and c0 represents our framework built on the POX controller.
Fig. 3. Network topology for testing the network scenarios in mininet
3.2 Network Scenario Implementation
Policy 1: QoS-oriented. The aim of this policy is to prevent applications such as FTP, and high-data-rate traffic from a certain subnet, from consuming the bandwidth and blocking the network. This is done by taking an action on the corresponding flow, such as decreasing the data rate, rerouting the traffic, or blocking or allowing the traffic while notifying the network administrator. Whether the action is taken depends on the time at which the policy applies and on other network conditions and resources, such as the available link bandwidth. The objective of this policy is to enhance QoS. Below the QoS tab, the network administrator can find the required keywords, classified according to network concerns.
Policy definition using keywords. Policies are defined through the GUI shown in Fig. 2, using the keywords presented in Table 1. Table 2 shows the values to be entered for the keywords below the different tabs to define policy 1.
Simulation results. The policy action differs according to the global keywords. Policy 1 presents three different cases, as defined by the network administrator. In case 1, when the function is “QoS” and the concern is “resources”, the action is “block”. In case 2, when the function is “QoS” and the concern is “application/resources”, the action is “decrease data rate”. In case 3, when the function is “Monitoring”, regardless of the concern, the action is “allow and notify the administrator”. The simulation results for the three cases are shown in Fig. 4. The default action is “allow”, and it is taken about 0.0759 s after the flow starts. The policy-based action is taken when all of the policy’s conditions hold. According to Table 2, in policy 1 the action is taken for ftp traffic from subnet 10.0.0.0/24 when the traffic’s data rate exceeds 9 mb/s and the link’s bandwidth utilization exceeds 50%. The conditions were met about 10 s after the flow started.

Table 2. Policy1 keywords definition.
Keyword | Value
Keywords below QoS/application tab and Monitoring/application tab:
Application type | ftp
Keywords below QoS/communication tab and Monitoring/communication tab:
Source network address | 10.0.0.0
Source mask | 24
Data rate | 9
Keywords below QoS/resources tab and Monitoring/resources tab:
Link bandwidth consumption % | 50
Concern-based action:
Function: QoS, concern: resources | Block
Function: QoS, concern: application/resources | Decrease data rate
Function: monitoring, regardless of concern | Allow-notify
Time interval start | 9
Time interval end | 14

Fig. 4. Throughput for the three cases for ftp traffic from the subnet 10.0.0.0/24 according to policy1 and global keywords’ values.
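The way policy 1 combines its conditions with the global keywords can be sketched as follows. The values are those of Table 2; the function and parameter names are hypothetical, and treating the time interval 9 to 14 as hours of the day is an assumption, since the source does not state the unit.

```python
import ipaddress

# Values from Table 2. Names are hypothetical; the time interval is
# assumed to be in hours of the day, which the source does not state.
POLICY1 = {
    "application": "ftp",
    "subnet": ipaddress.ip_network("10.0.0.0/24"),
    "data_rate": 9,            # mb/s
    "link_utilization": 50,    # %
    "time_start": 9,
    "time_end": 14,
}

# Concern-based actions for cases 1 and 2 described in the text.
ACTIONS = {
    ("QoS", "resources"): "block",
    ("QoS", "application/resources"): "decrease data rate",
}


def policy1_action(function, concern, app, src_ip, rate, utilization, hour):
    """Return the action for a flow under policy 1."""
    conditions_met = (
        app == POLICY1["application"]
        and ipaddress.ip_address(src_ip) in POLICY1["subnet"]
        and rate > POLICY1["data_rate"]
        and utilization > POLICY1["link_utilization"]
        and POLICY1["time_start"] <= hour < POLICY1["time_end"]
    )
    if not conditions_met:
        return "allow"                  # default action
    if function == "Monitoring":
        return "allow and notify"       # case 3: regardless of the concern
    return ACTIONS.get((function, concern), "allow")
```

For example, `policy1_action("QoS", "resources", "ftp", "10.0.0.5", 9.5, 60, 10)` corresponds to case 1 and yields the blocking action, while the same flow from another subnet falls back to the default.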
Policy 2: Control Traffic According to Server Resources
Policy definition using keywords. Policy 2 checks server resources and takes action when the server’s storage usage reaches a certain percentage. The action differs according to the concern specified by the network administrator through the global keywords, as follows:
• Concern: resources (server resources); action: block.
• Concern: communication and resources (link resources); action: decrease data rate.
• Concern: application; action: allow and notify.
Fig. 5 shows the policy definition using keywords.
Fig. 5. Check the storage of server 10.0.0.3 and when it exceeds 55% raise event to take an action
Simulation results. Figure 6 shows the data rate of the traffic from a client sending a big file to the server and how it is controlled according to the policy. Figure 7 shows the storage utilization percentage while the traffic is being transferred. In case 1 (“resources” is the only concern), the traffic is blocked once the server storage limit is exceeded, after about 170 s, to prevent a server crash due to full storage. The storage utilization reaches a maximum of 56% and increases no further. The file can be transferred later, after the storage problem has been solved by increasing the server storage or deleting unnecessary data. In case 2 (“communication” and “resources” concerns), the traffic’s data rate is decreased, so the transfer continues at a lower data rate and consumes less bandwidth. The file is completely transferred after about 750 s, at which point the storage utilization reaches its maximum of 61%. In case 3 (“application” is the only concern), the traffic is allowed. The file is completely transferred after about 470 s, at which point the storage utilization again reaches its maximum of 61%.
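A minimal sketch of the storage check behind policy 2 is given below. It measures storage on the local machine with the standard library, whereas the framework measures a server named in the policy, so this is only an approximation; the function names and the concern-to-action mapping structure are hypothetical, and the 55% limit is the one stated in the caption of Fig. 5.

```python
import shutil

STORAGE_LIMIT_PERCENT = 55  # threshold from the caption of Fig. 5

# Hypothetical mapping of the three concern settings to their actions.
ACTION_BY_CONCERN = {
    frozenset({"resources"}): "block",
    frozenset({"communication", "resources"}): "decrease data rate",
    frozenset({"application"}): "allow and notify",
}


def storage_usage_percent(path="."):
    """Storage utilization of the filesystem holding `path` (local only)."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def policy2_action(concerns, usage_percent, limit=STORAGE_LIMIT_PERCENT):
    """Return the action once the storage limit is exceeded."""
    if usage_percent <= limit:
        return "allow"   # limit not exceeded: no event is raised
    return ACTION_BY_CONCERN.get(frozenset(concerns), "allow")
```

In this sketch, a utilization of 56% with “resources” as the only concern gives the blocking action, matching case 1 above.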
Fig. 6. Traffic data rate from the client 10.0.0.5 writing to the server 10.0.0.3 according to policy2 and global keywords values.
Fig. 7. Storage percent utilization on server 10.0.0.3 according to policy2 and global keywords values
Policy 3: Security-Oriented Policy (Access Control)
For a certain subnet, this policy allows access only to a given list of MAC addresses, to prevent unauthorized computers from accessing the network. The objective of this policy is security. Below the security tab, the network administrator can find the required keywords, classified according to network concerns.
Policy definition using keywords. Set all actions to “allow”, regardless of the concern, as shown in Fig. 8.
Simulation results. For the host with MAC address “00:00:00:00:00:02”, the traffic is accepted because the source MAC address is in the access list. The policy-based action is taken, allowing the traffic, after about 0.053 s. The traffic from the host with MAC address “00:00:00:00:00:12” is blocked, as shown in Fig. 9, because it is not in the access list. Policy 3 is also applied to the traffic of policy 1 and policy 2: because the security function is activated there, policy 3 is active.
Fig. 8. Matching traffic by MAC address and subnet; allowed MAC addresses are entered separated by hyphens (00:00:00:00:00:01-00:00:00:00:00:02-00:00:00:00:00:03-00:00:00:00:00:04-00:00:00:00:00:05)
Fig. 9. Traffic with a source MAC address specified in the policy is allowed, while other traffic is blocked.
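The access-list check of policy 3 can be sketched as follows, assuming the allowed addresses arrive as one hyphen-separated string, as in Fig. 8. The function names are hypothetical, and padding each octet to two hexadecimal digits is an added convenience, not something the source describes.

```python
def normalize_mac(mac):
    """Lower-case a colon-separated MAC and pad each octet to two digits."""
    return ":".join(part.zfill(2).lower() for part in mac.strip().split(":"))


def parse_mac_list(text):
    """Split a hyphen-separated list of MAC addresses into a set."""
    return {normalize_mac(m) for m in text.split("-") if m}


def access_action(src_mac, allowed):
    """Allow traffic only from source MAC addresses in the access list."""
    return "allow" if normalize_mac(src_mac) in allowed else "block"
```

Because of the normalization, an address written with a short final octet compares equal to its zero-padded form in the list.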
3.3 Evaluation
Usability (qualitative evaluation). IoTManager facilitates network management through:
• Separation of concerns.
• Saving the effort needed to write code to achieve the required network scenario.
Feasibility (quantitative evaluation). Table 3 shows the processing times measured for the three policies, averaged. The processing times were measured with statistics requested from the switches every 10 s and with Mininet running in a Linux virtual machine on a Windows host with an Intel Core i7 processor and 8 GB of RAM. The number of active policies is three, and there are three switches (hops) between each server and client.

Table 3. IoTManager processing times.
Processing time | Average value (s)
Time writing defined policy keywords to a file (T1) | 0.0104
Time converting policy saved in policy file to policy object (T2) | 0.0024
Time converting policy objects to OpenFlow rules inserted in switches, T3 (time between reading policy objects and adding flow to switch) | 0.0316
Time converting defined policy keywords to OpenFlow rules (T1 + T2 + T3) | 0.0445
Time handling events (time between raising the event and handling it by calling “policy to flow rules function” to check policies related to this event) | 0.0557
Time taking the action (time between receiving packetin event and adding a corresponding flow to the switch), default action | 0.0759
Time taking the action, policy-based action | When policy conditions are met (shown in graphs)
4 Future Work
This paper introduced the initial version of the IoTManager framework. Several ideas could improve the framework, yield better results, and get the most out of the design.
Possible future work includes the following:
• Completing a list in IoTManager of applications and attacks and of how to detect them through their behavior, so that the network administrator can define an application or attack by name and IoTManager detects it.
• Adding a block for user authentication.
• Adding a block for policy feedback to ensure that policies are correctly implemented.
• Deploying the framework on OpenFlow-enabled switches with real traffic.
• Measuring the processing times with a greater number of policies, hosts, and switches, to determine the effect of increasing each on the processing times.
• Developing a generic translator from policies to any language, for compatibility with all controllers and newer OpenFlow versions.
• Using machine learning to define the policies automatically, without the network administrator’s intervention, so that the concern and function global values change automatically according to network conditions and parameters.
5 Conclusion
We have introduced IoTManager, a concern-based SDN network management framework. It operates in the SDN management plane and provides an abstraction for the lower layers. IoTManager provides an interface for defining policies through keywords, saving the effort of writing code in the controller to achieve the required scenario. The framework’s design is based on separation of concerns according to the network management function and the various network concerns (application, communication, and resources). The provided keywords allow flow fields to be defined and matched against the traffic’s flows; such keywords are proactively converted to flow rules inserted in the switches. The keywords also allow other network conditions to be defined, so that events are raised when the conditions are met and handled according to the policies. A network scenario implemented in Mininet demonstrated our framework’s functionality and the policies it can support. In addition, performance was assessed by measuring the processing times needed to enforce the policies. The introduced abstraction, which adopts the separation of concerns concept in the management plane, helps divide the network management problem into lightweight, easily handled subproblems. It would also serve IoT networks by meeting the different requirements of network services.
References
1. Open Networking Foundation: Software-defined networking: the new norm for networks. ONF White Paper, 2, 2.6–6.1 (2012)
2. McKeown, N., et al.: OpenFlow: enabling innovation in campus networks. ACM SIGCOMM Comput. Commun. Rev. 38(2), 69–74 (2008)
3. Kreutz, D., et al.: Software-defined networking: a comprehensive survey. Proc. IEEE 103(1), 14–76 (2015)
4. Cox, J.H., et al.: Advancing software-defined networks: a survey. IEEE Access 5, 25487–25526 (2017)
5. Comer, D., Rastegarnia, A.: OSDF: a framework for software defined network programming. In: 2018 15th IEEE Annual Consumer Communications & Networking Conference (CCNC) (2018)
6. Lara, A., Ramamurthy, B.: OpenSec: policy-based security using software-defined networking. IEEE Trans. Network Service Manage. 13(1), 30–42 (2016)
7. Krainyk, Y., Dvornik, O., Krainyk, O.: Software-defined network application-aware controller for Internet-of-Things. In: 2019 3rd International Conference on Advanced Information and Communications Technologies (AICT) (2019)
8. Bera, S., et al.: Soft-WSN: software-defined WSN management system for IoT applications. IEEE Syst. J. 12(3), 2074–2081 (2016)
9. Tadros, C.N., Mokhtar, B., Rizk, M.R.: Logically centralized-physically distributed software defined network controller architecture. In: 2018 IEEE Global Conference on Internet of Things (GCIoT) (2018)
10. Hamed, R., Mokhtar, B., Rizk, M.: Towards concern-based SDN management framework for computer networks. In: 2018 35th National Radio Science Conference (NRSC) (2018)
11. Kaur, S., Singh, J., Ghumman, N.S.: Network programmability using POX controller. In: ICCCS International Conference on Communication, Computing & Systems (2014)
JomImage: Weight Control with Mobile SnapFudo
Viva Vivilyana1, P. S. JosephNg1(B), A. S. Shibghatullah1, and H. C. Eaw2
1 Institute of Computer Science and Digital Innovation, UCSI University, Kuala Lumpur, Malaysia
[email protected], {josephng,abdulsamad}@ucsiuniversity.edu.my
2 Faculty of Business and Information Science, UCSI University, Kuala Lumpur, Malaysia
[email protected]
Abstract. Tracking the calories in the food one consumes becomes more difficult when eating out. With many inexpensive food choices available, fast food consumption is high among Malaysians. In 2014, WHO reported that Malaysia was the fattest country in South East Asia. Many calorie-tracking applications are now available in app stores to help with weight loss. However, these applications may not be practical, because they require many user inputs for food logging. This study proposes an Android mobile app to improve healthy eating habits and promote weight loss. It uses image recognition of food to minimize user input in food logging. According to the survey results, most respondents are aware of their body mass index but find it difficult to track their calorie intake when dining out. The interviews indicate that it is difficult to know every ingredient, and how much of it, the chef puts in the meals served. The mobile application therefore offers image recognition and a simple interface to ease the input burden with which most calorie-tracking applications currently struggle. As a result, the implemented app can offer users information on the most effective way to lose weight and motivate them to achieve their weight goal.
Keywords: Calorie · Diet · Healthy diet · Weight loss · Calorie tracker · Food control · Obesity · Overweight · Malaysia · Image recognition · Food recognition · Image classification · My Fitness Pal · Calorie tracking · Fat loss · Dietary habits
1 Introduction
When eating out, people generally choose food that is convenient for them. Fast food restaurants are available everywhere and offer lower prices, so fast food consumption can be expected to rise over time. In Malaysia, the consumption of fast food among young Malays is high [1]. Lack of exercise also contributes to weight gain. Choi and Zhao (2012) noted that high consumption of restaurant food can increase the obesity level [2]. Currently, more than 70% of the world’s population is categorized as overweight or obese [3]. In 2014, WHO also reported that “Malaysia is the fattest country in South East Asia” [4]. According to the Malaysian National Health and Morbidity Survey in 2012, 33.3% of people are pre-obese and 27.2% are obese [1]. Overweight and obesity cause 2.8 million deaths per year [5]. People with obesity are more likely to have a variety of conditions, such as high cholesterol, high blood pressure, hypertension, high lipids, cardiovascular disease, and type-2 diabetes [3, 5]. Beyond medical issues, overweight and obesity can affect job performance: obese people tend to score higher on productivity loss than people with a normal BMI [6]. However, most people who are obese tend to eat unhealthy, very high-calorie food [7]. A habit is difficult to change once it is ingrained [8]. As a result, the appetite for unhealthy food can be hard to resist. Appetite level is reflected in how people eat and how much they consume [9]. A high level of fat mass can trigger the hormone that controls hunger [9], leading people to overindulge and gain more weight. However, a dietary plan combined with the user’s self-control can help them diet successfully and prevent overeating [10]. Thus, balanced calorie and nutrition levels can help people maintain or lose weight. Controlling calorie intake is also very important for weight loss. Typically, the approximate daily calorie intake is around 2500 calories for adult men and 2000 calories for adult women [5]. Nonetheless, recording all food intake can be tedious, particularly for busy people.

© Springer Nature Switzerland AG 2021 K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 168–180, 2021. https://doi.org/10.1007/978-3-030-55190-2_13
Mobile apps offer a convenient way to track a user’s calorie intake. Mobile apps have been shown to affect individuals’ healthy eating habits [11]. The smartphone has become a must-have item in people’s lives: currently, around 72% of the population in Malaysia are smartphone users [12]. My Fitness Pal, from the US, is the most popular mobile application for tracking food and calorie intake [13]. In Malaysia, other mobile applications such as My Diet Coach also provide a BMI counter and motivational tips on the elements of a balanced diet. Users remain free to choose the health application that gives them the greatest value in return. According to Hingle and Patrick, a health application can only provide nutrition tips; users still need an engaging experience during the iterative process of health improvement [14], which maximizes the chance that they try out the health tips given. There are other similar mobile apps, such as Lifesum and Fat Secret. Table 1 compares these apps. It shows that all of them have food input and calorie tracking as their main features. However, [15] noted that many people dislike using calorie counter apps for weight loss because they require many inputs and measurements. Such apps are therefore hard to sustain, given the constant input required from users, who can become demotivated by having to update their tracking every time they log food. Yet this data is needed for the program to produce output for the user. Just-in-time food recording might help reduce the food logging problem [3].
V. Vivilyana et al.
Table 1. Feature comparison in current mobile applications
Features | MFP | Lifesum | Fat Secret
Food Logging | / | / | /
Scan Barcode | / | / | /
Calorie Goals | / | / | /
Food Diary | / | / | /
Location (Restaurant nearby) | / | X | X
Recipes | / | / | /
Water intake | / | / | X
Nutrition chart | / | X | /
Diet Progression | / | / | /
Meal recommendation | X | X | X
This paper aims to meet the research objectives by answering the research questions in Table 2.
Table 2. Research questions
RQ1: What effect could a mobile application approach have on the healthy food intake of students and staff in a university?
RQ2: What type of food should people eat to lose weight when they dine out in the university area?
RQ3: How can people be encouraged to use mobile applications to track calorie intake without requiring many user inputs?
RQ4: How can technology encourage people in the university environment to eat healthy food and keep them motivated to achieve their weight loss?
Four research questions were defined to examine the value creation. To narrow the scope, the investigation is conducted in a Private University only. Each research question is addressed by a corresponding research objective in Table 3. As Table 3 shows, this study aims to develop a mobile application that encourages people to lose weight and achieve a healthy lifestyle, even when eating food served in restaurants. R01: To encourage people to achieve their weight goal and a healthy body, the mobile application serves as a platform that supports users in their diet. It gives people more convenience whenever and wherever they are.
Table 3. Research objective
R01: To encourage people to achieve their weight goal and healthy body among staff and students in a Private University with a mobile application.
R02: To recommend healthy food available within all restaurants around a Private University.
R03: To minimize the user’s actions by adding image recognition in the program for inputting data.
R04: To develop a mobile application that encourages people in a Private University to eat healthily.
R02: To recommend healthy food available in all restaurants around a Private University. This recommendation gives the user a choice of food and aims to persuade them to choose low-calorie food over tasty food. R03: To minimize the user’s actions by adding image recognition to the program for data input. This lets people log their food more easily and with less effort: the user can enter their food without the burden of typing and searching for its name. R04: To develop a mobile application that encourages people in a Private University to eat healthily. With all the features provided in the app, the user can keep track of the food they have consumed. The following hypotheses are derived from these objectives and are summarized in Fig. 1.
Fig. 1. Research model
H1: Controlling calories will bring people’s BMI level into the normal range. Calories play the most important role in achieving a healthy weight: a calorie deficit is needed for weight loss, while a calorie surplus is needed for weight gain. For instance, to lose weight, the calories consumed within a day must not exceed the TDEE. No matter how healthy the food eaten is, if it exceeds the calorie intake the body needs, it will lead to weight gain. Having a normal BMI level promotes a healthy body. Moreover, it helps prevent many diseases, such as those associated with obesity and overweight.
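The quantities behind H1 can be made concrete with a short sketch. BMI is body weight in kilograms divided by the square of height in metres, and the category limits used below are the standard WHO adult cut-offs. TDEE is read here as total daily energy expenditure; the function names are hypothetical, and the TDEE value itself must be supplied by the caller, since the source does not say how it is estimated.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kg divided by height in metres squared."""
    return weight_kg / height_m ** 2


def bmi_category(value):
    """Standard WHO adult BMI categories."""
    if value < 18.5:
        return "underweight"
    if value < 25:
        return "normal"
    if value < 30:
        return "pre-obese"
    return "obese"


def calorie_balance(consumed_kcal, tdee_kcal):
    """Negative result: calorie deficit. Positive result: calorie surplus."""
    return consumed_kcal - tdee_kcal
```

A day with 1800 kcal consumed against a TDEE of 2000 kcal gives a balance of -200 kcal, that is, a deficit in the sense of H1.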
H2: Food recommendation will improve the consumption of healthy food at a Private University. Recommending the healthy food available gives the user options. When healthy food is recommended together with its calorie content, the user will tend to choose the food recommended by the system rather than following their instinct for what is tastier, which is mostly not healthy for a diet. H3: Image recognition will help reduce the data input required by the app. By taking a picture of the food about to be consumed, the user can log the food without typing in the ingredients one by one. This is more efficient and saves time, which keeps the user more motivated in their diet. Moreover, the user no longer has a reason to postpone their diet plan because of troublesome data input in the calorie tracking app. H4: Calorie tracking apps will help people achieve their weight goals. With all the features provided in the app, the user can keep track of the food they have consumed. In addition, with the convenience of a smartphone, the user can open the app wherever they go with a single click. This can stimulate awareness of taking care of their body, so that they can achieve their weight goals and a healthy body.
2 Methodology
This study applies a mixed method to answer the how and why questions from the target audience more effectively. Data is collected through a survey and interviews, following the methodology summarized in Table 4.
Table 4. Research methodology [16–18]
Research dimension | Explanatory sequential design
Research methodology | Qualitative reasoning
Research methods | Personalized interview; simulation testing
As Table 4 indicates, the research follows an explanatory sequential design, which lays out each step of the data collection. A mixed-mode (mixed-method) approach is applied, with a random survey and interviews as the primary data collection. The steps of data collection are illustrated by the sequential design in Fig. 2.
Fig. 2. Sequential design [19–24]
The quantitative data collection, a survey, gathers generalized information. The survey is conducted by randomly choosing people at a Private University. Once the generalized explanatory analysis has been identified, it is followed by the qualitative data collection, the interviews. The interviewees are people who are on a diet and eat healthily as part of their habits. The interviews provide in-depth analysis of, and the reasoning behind, the quantitative findings. Finally, all the information is characterized and combined to reach one conclusion. A preliminary data collection was conducted first to test whether the questionnaires were understandable to the respondents, with 5 survey respondents and 3 interview respondents. Small changes were made to the survey and interview questions because some respondents gave irrelevant answers. In the actual data collection, there were 31 survey respondents and 7 interviewees. For the survey, random people around a Private University were asked to fill in the questionnaire. In addition, people who are on a diet and trying to achieve a healthy lifestyle were interviewed about their opinions on calorie intake and on the role mobile apps can play in weight loss. After the data collection is completed, the results are used to decide how the app will be developed. Android Studio is used as the IDE because it makes implementing the app easy and fast. Moreover, based on the survey result [26], most people use Android as the operating system of their phone. Because this study proposes image recognition as an additional food input for calorie tracking, the app involves an image recognition algorithm. This study reuses the image recognition algorithm from the Google API, which offers higher accuracy and saves more time than training the system to recognize the items.
Moreover, the system matches the recognized image with the food in the database. Therefore, if the food captured by the camera does not exist in the database, the app displays no result on the screen.
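The matching step described above can be sketched as a simple lookup. This is a minimal illustration only: the table `FOOD_DB`, the function `match_label`, the shop names, and the calorie values are hypothetical and are not taken from the app's actual database or from the Google API response format.

```python
from typing import Dict, List

# Hypothetical food table: recognized label -> entries sold in different shops.
# Shop names and calorie values are made up for illustration.
FOOD_DB: Dict[str, List[dict]] = {
    "sweet and sour": [
        {"shop": "Shop A", "kcal": 350},
        {"shop": "Shop B", "kcal": 420},
    ],
}


def match_label(label: str, db: Dict[str, List[dict]] = FOOD_DB) -> List[dict]:
    """Return the database entries for a recognized label, or [] if it is unknown."""
    # Normalize the label so that case and surrounding spaces do not matter.
    return db.get(label.strip().lower(), [])


# An unknown label yields an empty list, so nothing is shown on the screen.
results = match_label("Sweet and Sour")
unknown = match_label("pizza")
```

Returning an empty list for an unknown label mirrors the behaviour described in the text, where an unrecognized food produces no result.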
V. Vivilyana et al.
3 Results and Findings
The research found that awareness of BMI at the private university is high in both the preliminary and the technical data collection. Figure 3 summarizes the responses.
[Bar chart "Post-test: awareness on BMI". Response categories: Very Not Aware, Not Aware, Neutral, Aware, Very Aware. Bar values (order as extracted): 12, 10, 6, 2, 1.]
Fig. 3. Post-test results of awareness of BMI
As illustrated in Fig. 3, most respondents are aware of BMI and their weight. This awareness supports the idea of encouraging people to reach their weight goal and a healthy body with a mobile app. In addition, 15 out of 31 people chose food and calories as the most impactful cause of obesity. In the interviews, all respondents agreed that calories play the most important role in a diet because they determine how much energy is consumed from the food eaten and how much is used. However, it is difficult to track the calories of food served in most restaurants. According to the survey, 55% of people (15 out of 31) have difficulty controlling their calorie intake when eating out at a restaurant, because it is difficult to know what ingredients and condiments the chef puts in the meal. Furthermore, most restaurants focus more on taste than on the healthiness of their food. Figure 4 shows that 5 out of 7 interview respondents think that, because of the ingredients used, it is hard to count how many calories a given food contains. Besides, most restaurants tend to cook their meals to be tastier rather than healthier in order to attract more customers. Therefore, it is hard to determine the exact number of calories contained. All respondents agreed that food recommendation could promote weight loss if the nutritional information of the food is provided and the user is given many choices. The user can then compare the nutrients of each food and decide what to eat. The use of image recognition for food logging in the mobile app received positive feedback from the respondents: 40% of respondents prefer image-based logging and 57% prefer both image-based and text-based food logging in the app.
[Pie chart "Reason why tracking calorie of food outside is difficult". Segments: Portion size, Taste, Ingredients. Shares shown: 12%, 63%, 25%.]
Fig. 4. Interview result of reason why tracking calorie of food outside is difficult
The interview respondents also gave positive feedback about the mobile app as a calorie tracker because of its convenience. With the app, users receive notifications as reminders and have no reason not to track their food, since they have the phone and the app with them. Besides, the app could educate people who are new to dieting and guide them toward their goals. Based on the feedback from the data collected, the product solution was implemented with three types of user: admin, shop owner, and general user. The app has been implemented and recognizes the food captured by the camera. Figure 5 illustrates the app's home screen.
Fig. 5. Home Screen of the app
Figure 5 displays the home screen, where the user can log their food and track calories. This is the first screen displayed after the user successfully logs in to the app. The right view in Fig. 5 shows the screen after the user has successfully logged their food. The app automatically calculates the calories that the user has eaten within a day. The bottom navigation menu offers three items: home, camera, and account. The camera item opens a new activity where the user can access the camera to log their food. Image recognition runs in this activity, as shown in Fig. 6.
Fig. 6. Real-time image recognition on the developed app
Figure 6 shows that the app recognized the captured image as sweet and sour and displays the result on a new screen together with its calories and the shop in which the food is sold. As this dish is commonly found in many shops at UCSI, the result lists many foods. After the user taps one of the listed items, they are taken to the food details. On the food details screen, the user can enter how many servings they have eaten and log the food into their food diary. Afterwards, the system performs the calculation and brings the user back to the home screen shown in Fig. 5.
Besides image recognition, another main feature of the app is food recommendation, shown in Fig. 7. As illustrated in Fig. 7, the food recommendation is based on the meal category: breakfast, lunch, dinner, snack and drink. The app calculates the suggested calories for the user in each meal: 25% of the total calories for breakfast, 30% for lunch, 35% for dinner, and 10% for snacks and drinks. These percentages are the basic recommended share of calories for each meal per day. The illustrated result is the food recommendation for breakfast. The foods are sorted from the lowest to the highest calorie, with the details displayed for each item. Therefore, the user will tend to choose lower-calorie food for their meal.
Fig. 7. Food recommendation of the app
The account page is where the user can personalize their data such as weight, height, or name. This activity also lets them see their weight loss progress and a BMI indicator. With the chart and indicator displayed, the user will be more aware of their weight and keen to achieve the target weight they have set. In Fig. 8, the progress bar shows how far the user is from the target weight relative to their starting point. Below the chart, there are two buttons to update weight and height. The system also shows the BMI value and the category the user falls in. After a few users tested the app, the features included in the calorie tracking app were found to make tracking calories more enjoyable and to motivate users to keep tracking their calorie intake. The calorie tracker works as well as other calorie tracking apps on the market. All features were developed successfully, and no errors occurred.
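The per-meal calorie split, the calorie-sorted recommendation list, and the BMI value described above can be illustrated with a short sketch. The function names, the sample foods, and the step of filtering foods by the meal budget are assumptions made for illustration; the paper states only the percentages, the ascending sort, and that a BMI value is shown.

```python
# Share of the daily calorie target per meal category, as stated in the text.
MEAL_SHARE = {
    "breakfast": 0.25,
    "lunch": 0.30,
    "dinner": 0.35,
    "snack_and_drink": 0.10,
}


def meal_budgets(daily_kcal: float) -> dict:
    """Split a daily calorie target into per-meal budgets."""
    return {meal: daily_kcal * share for meal, share in MEAL_SHARE.items()}


def recommend(foods: list, budget_kcal: float) -> list:
    """List foods from lowest to highest calories.

    Dropping foods above the meal budget is an assumption of this sketch.
    """
    within = [food for food in foods if food["kcal"] <= budget_kcal]
    return sorted(within, key=lambda food: food["kcal"])


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2


# Hypothetical sample data.
foods = [
    {"name": "fried rice", "kcal": 600},
    {"name": "fruit bowl", "kcal": 300},
    {"name": "chicken sandwich", "kcal": 450},
]
breakfast_budget = meal_budgets(2000)["breakfast"]
breakfast_list = recommend(foods, breakfast_budget)
```

With a 2000 kcal daily target, the breakfast budget is a quarter of the total, and only the sample foods at or below that budget remain in the list.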
Fig. 8. Account page of the app
4 Conclusion and Future Works
In conclusion, the implementation of the mobile app as a medium for calorie tracking that encourages people to achieve a healthy body was successful. Image recognition in the app, as another way to log food, could minimize user input. Besides, food recommendation can help control calorie intake and support weight loss. The feature works by determining the recommended calorie intake for the user and showing the resulting foods based on their calorie and macronutrient values. Effective use of the health app could help the user reduce the risk of heart disease. Based on statistics reported in 2019, Dr. Lee, the Deputy Health Minister, stated that about 50 Malaysians are killed daily by coronary heart disease [25]. Hence, with the aid of a health app, the government could save more of its budget on medicine cost allocation [26, 27]. However, as this app has a limited scope, covering only shops within a private university, future studies may enlarge the scope to involve more restaurants. Even though the app by itself could motivate users to achieve their weight goals and a healthy body, the impact depends on the user's actions and consistency. If many users use this app and successfully achieve a healthy body, Malaysia can become better and healthier. However, it might take time to see a significant difference in each person, since every person has a different metabolism. Future studies might enlarge the range of food available and add more vegetarian and halal options. Moreover, an experimental focus group could be conducted to closely monitor the effectiveness of the health app tips for the users and their BMI results. Besides, data-driven quantitative research could derive more numerical statistics for this health app study.
References 1. Abdullah, N.N, Mokhtar, M.M., Bakar, M.H.A., Al-Kubaisky, W.: The trend of fast food consumption in relation to obesity among Selangor urban community. Proc. – Soc. Behav. Sci. 202, 505–513 (2015). ABRA International Conference on Quality of Life, AQoL2014, Istanbul, Turkey, 26–28 December 2014 2. Choi, J., Zhao, J.: Customers’ behaviours when eating out: does eating out change customers’ intention to eat healthily? Br. Food J. 116(3), 494–509 (2012) 3. Dun, C.G., Turner-McGrievy, G.M., Wilcox, S., Hutto, B.: Dietary self-monitoring through calorie tracking but not through a digital photography app is associated with significant weight loss: the 2SMART pilot study—A 6-month randomized trial. J. Acad. Nutr. Diet. 119(9), 1525–1532 (2019) 4. Tan, A.K.G., Wang, Y., Yen, S.T., Feisul, M.I.: Physical activity and body weight among adults in Malaysia. Appl. Econ. Perspect. Policy 38(2), 318–333 (2016) 5. Harous, S., Menshawy, M.E., Serhani, M.E., Benharref, A.: Mobile health architecture for obesity management using sensory and social data. Inform. Med. Unlocked 10, 27–44 (2018) 6. Ku, B., Phillips, K.E., Fitzpatrick, J.J.: The Relationship of body mass index (BMI) to job performance, absenteeism and risk of eating disorder among hospital-based nurses. Appl. Nurs. Res. 49(10), 77–79 (2019) 7. Crovetto, M., Valladares, M., Espinoza, V., Mena, F., Oñate, G., Fernandez, M., DuránAgüero, S.: Effect of healthy and unhealthy habits on obesity: a multicentric study. Nutrition 54, 7–11 (2018) 8. Ohtomo, S.: Exposure to diet priming images as cues to reduce the influence of unhealthy eating habits. Appetite 109, 83–92 (2017) 9. Blundell, J.E.: Appetite control – biological and psychological factors (chap. 3). In: Eating Disorder and Obesity in Children and Adolescents, pp. 17–22 (2019) 10. Naughton, P., McCarthy, M., McCarthy, S.: Acting to self-regulate unhealthy eating habits. 
An investigation into the effects of habit, hedonic hunger and self-regulation on sugar consumption from confectionery foods. Food Qual. Prefer. 46, 173–183 (2015) 11. Ipjian, M.L., Johnston, C.S.: Smartphone technology facilitates dietary change in healthy adults. Nutrition 33(2017), 343–347 (2017) 12. Osman, M.A., Talib, A.Z., Sanusi, Z.A., Shiang-Yen, T., Alwi, A.S.: A study of the trend of smartphone and its usage behavior in Malaysia. Int. J. New Comput. Archit. Appl. (IJNCAA) 2(1), 275–286 (2012) 13. Chen, J., Berkman, W., Bardouh, M., Kammy, C.Y., Farinelli, M.A.: The use of a food logging app in the naturalistic setting fails to provide accurate measurements of nutrients and poses usability challenges. Nutrition 57(2019), 208–216 (2019) 14. Hingle, M., Patrick, H.: There are thousands of apps for that: navigating mobile technology for nutrition education and behaviour. J. Nutr. Educ. Behav. 48(3), 213–218 (2016) 15. Levinson, C.A., Fewell, L., Brosof, L.C.: My Fitness Pal calorie tracker usage in the eating disorders. Eat. Behav. 27(2017), 14–16 (2017) 16. JosephNg, P.S., Kang, C.M., Mahmood, A.K., Wong, S.W., Phan, K.Y., Saw, S.H., Lim, J.T.: EaaS: available yet hidden infrastructure inside MSE. In: 5th International Conference on Network, Communication and Computing, ACM International Conference Proceeding Series, Kyoto, Japan, pp. 17–20 (2016)
17. JosephNg, P.S., Ahmad Kamil, M.: Infrastructure utility framework for SME competitiveness during economic turbulence. In: Forum for Interdisciplinary and Integrative Studies, Langkawi, Malaysia, pp. 1–10 (2013) 18. Ng Poh Soon, J., Yin, C.P., Wan, W.S., Nazmudeen, M.S.H.: Energizing ICT infrastructure for Malaysia SME during economic turbulence. In: Student Conference on Research and Development, Cyberjaya, Malaysia, pp. 310–314. IEEE Explore (2011) 19. JosephNg, P.S.: EaaS infrastructure disruptor for MSE. Int. J. Bus. Inf. Syst. 30(3), 373–385 (2019) 20. JosephNg, P.S.: EaaS optimization: available yet hidden information technology infrastructure inside medium size enterprises. J. Technol. Forecast. Soc. Change 132(July), 165–173 (2018) 21. JosephNg, P.S., Kang, C.M.: Beyond barebone cloud infrastructure services: Stumbling competitiveness during economic turbulence. J. Sci. Technol. 24(1), 101–121 (2016) 22. JosephNg, P.S., Kang, C.M., Mahmood, A.K., Choo, P.Y., Wong, S.W., Phan, K.Y., Lim, E.H.: Exostructure services for infrastructure resources optimization. J. Telecommun. Electron. Comput. Eng. 8(4), 65–69 (2016) 23. JosephNg, P.S., Yin, C.P., Wan, W.S., Yuen, P.K., Heng, L.E.: Hibernating ICT infrastructure during rainy days. J. Emerg. Trends Comput. Inf. Sci. 3(1), 112–116 (2012) 24. Vivilyana, V., et al.: JomImage SnapFudo: control your food in snap. In: IEEE 6th International Conference on Engineering Technologies and Applied Science, Kuala Lumpur, Malaysia (2020) 25. AdrianChin, Y.K., JosephNg, P.S., Shibghatullah, A.S., Loh, Y.F.: JomDataMining: learning behaviour affecting their academic performance, really?. In: IEEE 6th International Conference on Engineering Technologies and Applied Science, Kuala Lumpur, Malaysia (2020) 26. Kang, C.M., et al.: JomCai: the contribution of computer assisted instruction on learning improvement of science student. In: Terengganu International Business and Economic Conference, Terengganu, Malaysia, pp. 
707–802 (2016) 27. Zahari, B.J.: Heart disease kills 50 Malaysian every day (2019). https://www.nst.com.my/ news/nation/2019/11/541868/heart-disease-kills-50-malaysians-every-day. Accessed 30 Jan 2020
Smart Assist System for Driver Safety
Etee Kawna Roy(B) and Shubhalaxmi Kher
Arkansas State University, Jonesboro, AR 72467, USA
[email protected], [email protected]
Abstract. This work presents our research aimed at developing a driver safety assistant system. The idea is to use an in-vehicle camera with a vision sensor to detect the emotional distress level of the driver while driving. An algorithm to identify the facial expression of the driver is developed in the Python programming language. In addition, a prototype of facial expression detection along with a car parking assist system is developed using an Arduino Uno (ATMEL ATMEGA328 microcontroller) interfaced with a webcam and a motor to demonstrate the concept. The camera mounted on the dashboard continuously monitors the driver's face and captures the facial expressions. The captured facial expressions help assess the situation of the driver (particularly a truck driver) and identify it in terms of severe pain, headache, cardiac arrest, etc. Once the system identifies the situation, the controller assists in driving the car to the curb and bringing it to a complete stop. The facial expression identification algorithm also uses sensor data (such as speed and steering) to detect an abnormality and subsequently alerts the driver for 30 s. The system continuously checks the driver's profile. If the driver continues driving while in pain for another 30 s, the embedded vehicle control system takes charge of maneuvering the vehicle and slowly parking it at the curb. While parking on the right side of the road, the vehicle control system continuously checks the traffic in the adjacent lane before parking slowly at the curb. Additionally, the turn indicators support the maneuver by keeping the turn signal on. To model the system, a network is trained using deep learning with 5000 data instances. The trained model is then validated using real-time images from the camera to check whether the face image conforms to the normal pattern or indicates pain.
Keywords: Safe maneuvering · Deep learning · Closed loop control · Smart assist system
© Springer Nature Switzerland AG 2021 K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 181–187, 2021. https://doi.org/10.1007/978-3-030-55190-2_14
1 Introduction
Today's technology may become tomorrow's worst nightmare. According to the World Health Organization (WHO), about 1.35 million people die each year in road crashes [1]. Approximately 38,800 people died and an estimated 4.4 million were injured in motor vehicle crashes involving a distracted driver in 2019 in the United States [2]. 10% of all drivers involved in fatal crashes were between 15 and 19 years of age and were reported as distracted at the time of the crash. Many health problems, ranging from a bad cold to heart disease, may cause emotional distress while
driving. Emotions can have a great effect on our driving ability. We may not be able to drive comfortably if we are overly worried, excited, afraid, angry, or just "down". According to current research, factors such as fatigue, impairment, and distraction while getting behind the wheel in a certain mental state (angry or sad) were reported in nearly 90 percent of motor vehicle crashes. Virginia Tech Transportation Institute researchers find that drivers increase their crash risk nearly tenfold when they get behind the wheel while observably angry, sad, crying, or emotionally agitated [3].
This project is motivated by a focus on driver safety. If the driver frequently experiences sickness while driving, he/she may lose control and need immediate attention. A camera installed on the dash in front of the driver captures images of the face at a predetermined time interval and sends them to the controller. An algorithm is designed that reads the data comprehensively and decides whether assistive control of the vehicle should be applied. The algorithm uses a deep learning model to analyze and detect the driver's condition (normal/in pain). Additionally, it generates several alarm signals for emergency help, etc. (depending on the driver's condition), maneuvers the vehicle to the shoulder after assessing the severity of the situation, and finally brings the vehicle to a safe zone, i.e., to a complete stop.
2 Literature Review
A number of studies have addressed facial expression recognition. For example, an Adaptive Attenuation Quantification Retinex (AAQR) method is used to enhance details in nighttime images and increase the performance of a face detection method, for the deployment of nighttime driver assistance strategies in future Advanced Driver Assistance Systems (ADASs) or human-vehicle cooperative driving systems for autonomous vehicles [4]. The differences in response time in facial emotion recognition were studied in [5]. An introductory analysis was also carried out to develop a real-time driver safety assistant system using an in-vehicle video camera to detect passengers and driver fatigue conditions for safe driving [6]. A fast facial expression recognition (FER) algorithm has been developed for monitoring a driver's emotions that is capable of operating on low-specification devices installed in vehicles [7]. A prototype alcohol detection and car locking system was developed to monitor the driver and check whether the driver is drunk. If the sensor detects alcohol in the driver's breath, the car will not start or accelerate [8].
Several commercial products have been launched so far. For example, in 2006 Toyota introduced a "Driver Attention Monitor" vehicle safety system with closed-eye detection for the latest Toyota and Lexus models [9, 10]. The system detects whether the driver's eyes are open and issues an alert if an object is in front of the vehicle. Nissan has developed a new concept car that checks for a drunk driver and takes steps such as warning the driver and stopping the car based on the driver's condition [11].
3 Methodology
The method presented below is implemented using the HAAR cascade machine learning object detection algorithm [12]. To detect the emotional distress of the driver while
driving, the flowchart in Fig. 1 is considered. A prototype for a facial emotion detection system as well as a control system for car parking is developed using an Arduino Uno microcontroller board interfaced with a web camera and a breadboard to blink an LED and run a motor in order to demonstrate the concept. The flowchart for this prototype is shown in Fig. 2. The data collected by the camera and other sensors are analyzed by the system and applied to a neural network for training. This is essentially a deep learning mechanism. To train the network, the first step is to ensure frame alignment for capturing the face of a person. While the camera continuously captures the facial expressions of the driver, several frames in a row are analyzed to identify an expression that is not normal. When such an expression occurs, the image is processed, which includes cropping, edge detection, etc. The normalized images are then saved to a directory. The network is then trained with these input images using TensorFlow [15] and HAAR cascade with a number of classifiers.
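The idea of observing several frames in a row before treating an expression as abnormal can be sketched as a simple run-length check over per-frame labels. The function name `first_abnormal_run` and the threshold `min_run` are hypothetical; the paper does not state how many consecutive frames are required, and the per-frame labels are assumed to come from the trained classifier.

```python
from typing import List, Optional


def first_abnormal_run(labels: List[str], min_run: int = 3) -> Optional[int]:
    """Return the frame index at which `min_run` consecutive non-normal
    labels are first reached, or None if that never happens."""
    run = 0
    for index, label in enumerate(labels):
        # Extend the current run on a non-normal frame, reset it otherwise.
        run = run + 1 if label != "Normal" else 0
        if run >= min_run:
            return index
    return None


# Hypothetical per-frame classifier outputs.
frames = ["Normal", "In Pain", "In Pain", "Normal", "In Pain", "In Pain", "In Pain"]
trigger_index = first_abnormal_run(frames)
```

Requiring a run of frames rather than a single frame is one way to avoid reacting to a momentary misclassification; the threshold would have to be tuned against the camera frame rate.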
Fig. 1. Facial expression detection of the system
Fig. 2. Prototype design
For the detection of facial expression, the OpenCV library [13] is used. Its inputs are the webcam and the driving profile. It takes frames from the webcam, together with the driving profile the driver exhibits, to identify the driver's condition (normal/in pain). For the proof of concept, an Arduino Uno microcontroller board with a breadboard and several devices such as switches, LEDs, and a motor was employed. The code for the algorithms was written in Python and in the Arduino IDE (C++). A deep learning [14, 15] algorithm using a neural network was employed, and a training data set with 5000 samples was used. After the network was trained, real-time test data was used to check the correctness of the proposed algorithm and the validity of the output. Figure 3 shows the block diagram of the system and Fig. 4 shows the prototype using the Arduino Uno with the ATMEL ATMEGA328 microcontroller.
Fig. 3. Block schematic of the system.
Fig. 4. Prototype interface using Arduino Uno.
4 Testing and Experimental Results
The software implementation for this system is shown in Fig. 5. After the hex file was generated from the Arduino IDE, it was copied from the Arduino file directory and linked to the prototype for simulation.
Fig. 5. Device programming.
An LED is used as the indicator for the "In Pain" condition. A 660 Ω resistor is placed in series with the LED to limit the current through it. To connect the motor, an NPN transistor is used, which acts as a switch controlling the power to the motor, and a diode is used to control the direction of current flow. The input resistor R is connected to Arduino board pin 9. The interfacing of the DC motor with the microcontroller is shown in Fig. 6.
Fig. 6. Interfacing DC Motor with Arduino Uno [16].
To train the model, 5000 images with different expressions were used. Offline sample datasets were created and used. Additionally, real-time images were used to validate the system, some of which are shown in Fig. 7 and Fig. 8.
Fig. 7. Captured facial emotion test data: "Normal".
Fig. 8. Captured facial emotion test data: "In Pain".
As Fig. 7 and Fig. 8 show, the code outputs "Normal" or "In Pain" based on the facial expression detected. As soon as the controller detects that the driver is in pain, an alarm is generated for the next 30 s. Since the controller monitors the driver's activity continuously, it checks whether the driver regains a normal condition during the 30 s alarm period. If the driver resumes the normal pattern within the alarm period, the alert and the speed checking are switched off. Otherwise, the system checks the speed fluctuation of the vehicle and begins another control routine to maneuver the vehicle safely. In other words, if the vehicle is at a steady speed and the driver is still in pain, the controller takes control and maneuvers the vehicle to the shoulder while continuously checking its position, the traffic, the speed of the vehicle, and the blind spot before coming to a complete stop at the shoulder.
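The alert and takeover logic described above can be sketched as a small state machine. This is an illustrative reading of the text, not the authors' controller code: the state names, the function `next_state`, and the choice to remain in the alert state when the speed is not steady are assumptions. Only the 30 s alarm period and the conditions "still in pain" and "steady speed" come from the paper.

```python
from typing import Optional, Tuple

ALERT_PERIOD_S = 30.0  # alarm period stated in the text


def next_state(
    state: str,
    alert_started: Optional[float],
    now: float,
    in_pain: bool,
    speed_steady: bool,
) -> Tuple[str, Optional[float]]:
    """Advance the monitor by one observation.

    Returns the new state and the time at which the current alert began.
    States (hypothetical names): MONITOR, ALERT, PULL_OVER.
    """
    if state == "MONITOR":
        # Pain detected: start the alarm and remember when it began.
        return ("ALERT", now) if in_pain else ("MONITOR", None)
    if state == "ALERT":
        if not in_pain:
            # Driver regained the normal pattern: alert and speed check go off.
            return "MONITOR", None
        if now - alert_started >= ALERT_PERIOD_S and speed_steady:
            # Still in pain after the alarm period at steady speed: take over.
            return "PULL_OVER", alert_started
        return "ALERT", alert_started
    # PULL_OVER is terminal in this sketch: the vehicle is being parked.
    return "PULL_OVER", alert_started


# Example: pain detected at t = 0 s and still present at t = 30 s.
state, started = next_state("MONITOR", None, 0.0, True, True)
state, started = next_state(state, started, 30.0, True, True)
```

Keeping the transition logic in one pure function makes each rule in the text testable on its own, independent of the camera and the motor hardware.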
5 Conclusion and Future Work
In this research, we implemented a prototype of a driver safety smart assist system that recognizes emotional distress or pain in drivers. Based on the facial expression, the embedded controller alerts the driver NOT to drive and turns on an emergency signal. The system prototype was designed and implemented successfully using an Arduino Uno with the ATMEGA328 microcontroller. Validation showed that the webcam was able to deliver a response when emotional distress (the "In Pain" condition) was detected. However, there was a small time delay between the webcam response and the LED blinking, because the network was detecting the expression in the image captured just before the current image. Speed control based on the real-time images is demonstrated using a 6 V DC motor. In the future, an alarm system and a demo car with feedback control using a small camera will be used to implement the system. The system will generate an alarm to warn the driver of his/her health condition and take charge of the vehicle to safely pull it onto the shoulder and bring it to a stop. Besides, the device can be programmed to continuously monitor for any stationary obstacle within a particular range of the car and, if one is found, warn the driver or slowly lower the speed.
References
1. World Health Organization: Road traffic injuries. https://www.who.int/news-room/fact-sheets/detail/road-traffic-Injuries. Accessed 4 Nov 2019
2. National Safety Council: Fatality Estimates. https://www.nsc.org/road-safety/safety-topics/fatality-estimates. Accessed 24 May 2020
3. EHSToday: Sad or Mad? Stay Out of the Car!. https://www.ehstoday.com/safety/article/21917996/sad-or-mad-stay-out-of-the-car. Accessed 4 Nov 2019
4. Shen, J., et al.: Nighttime driving safety improvement via image enhancement for driver face detection. IEEE Access 6, 45625–45634 (2018)
5. Trepáčová, M., et al.: Differences in facial affect recognition between non-offending and offending drivers. Transp. Res. Part F: Traffic Psychol. Behav. 60, 582–589 (2019)
6. Alsibai, M.H., Abdul Manap, S.: A study on driver fatigue notification systems. ARPN J. Eng. Appl. Sci. 11(18), 10987–10992 (2016)
7. Jeong, M., Ko, B.C.: Driver's facial expression recognition in real-time for safe driving. Sensors 18(12), 4270 (2018)
8. Gbenga, D.E., et al.: Alcohol detection of drunk drivers with automatic car engine locking system. Nova J. Eng. Appl. Sci. 6(1), 1–15 (2017)
9. Wikipedia: Driver monitoring system. http://en.wikipedia.org/wiki/Driver_Monitoring_System. Accessed 5 Nov 2019
10. Lexus: LS hybrid features – safety. https://www.lexus.com/models/LS-hybrid/safety. Accessed 25 Oct 2019
11. Nissan motor corporation: Drunk-driving prevention concept car - Future technology. https://www.nissan-global.com/EN/TECHNOLOGY/OVERVIEW/dpcc.html. Accessed 5 Nov 2019
12. Will Berger: Deep Learning HAAR Cascade Explained. http://www.willberger.org/cascadehaar-explained/. Accessed 4 Nov 2019
13. OpenCV, About. https://opencv.org/about/. Accessed 4 Nov 2019
14. Investopedia: Deep learning. https://www.investopedia.com/terms/d/deep-learning.asp. Accessed 5 Nov 2019
15. Abadi, M., et al.: Tensorflow: a system for large-scale machine learning. In: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2016) (2016)
16. Arduino - DC motor. https://www.tutorialspoint.com/arduino/arduino_dc_motor.htm. Accessed 12 Dec 2019
On the Applicability of 2D Local Binary Patterns for Identifying Electrical Appliances in Non-intrusive Load Monitoring
Yassine Himeur1(B), Abdullah Alsalemi1, Faycal Bensaali1, Abbes Amira2, Christos Sardianos3, Iraklis Varlamis3, and George Dimitrakopoulos3
1 Department of Electrical Engineering, Qatar University, Doha, Qatar {yassine.himeur,a.alsalemi,f.bensaali}@qu.edu.qa
2 Institute of Artificial Intelligence, De Montfort University, Leicester, UK [email protected]
3 Department of Informatics and Telematics, Harokopio University of Athens, Kallithea, Greece {sardianos,varlamis,gdimitra}@hua.gr
Abstract. In recent years, the automatic identification of electrical devices through their power consumption signals has found a variety of applications in smart home monitoring and non-intrusive load monitoring (NILM). This work proposes a novel appliance identification scheme and introduces a new feature extraction method that represents power signals in a 2D space, similar to images, and then extracts their properties. In this context, the local binary pattern (LBP) and several of its variants are investigated for their ability to extract histograms of 2D binary patterns of power signals. Specifically, by moving to a 2D representation space, each power sample is surrounded by at least eight neighbors. This can help extract pertinent characteristics and provides more possibilities to encode power signals robustly. Moreover, unlike existing models, the proposed identification technique has the main advantage of accurately recognizing electrical devices independently of their states and on/off events. Three public databases including real household power consumption measurements at the appliance level are employed to assess the performance of the proposed system while considering various machine learning classifiers. The promising performance obtained in terms of accuracy and F-score proves the successful application of the 2D LBP in recognizing electrical devices and creates new possibilities for energy efficiency based on NILM models.
Keywords: Appliance identification · Non-intrusive load monitoring · Local binary pattern · Classifiers · Power consumption signals · Feature extraction
© Springer Nature Switzerland AG 2021 K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 188–205, 2021. https://doi.org/10.1007/978-3-030-55190-2_15
1 Introduction
The collection and analysis of data has recently risen to prominence in various disciplines and is a catalyst for broader technological progress [6,26]. In energy efficiency, power consumption data can be a significant factor in understanding consumer behavior and fostering positive change towards improved efficiency [4,9]. In particular, appliance-level energy consumption data can lead to a finer analysis of "good" and "bad" energy usage and can help identify the corresponding source (e.g. a given appliance) [24] and its surrounding context (e.g. indoor/outdoor environmental conditions and consumer presence) [5,7]. Hence, it is prudent to collect and nurture this rich source of information on a regular basis via pragmatic and convenient means. Non-intrusive load monitoring can greatly aid in accumulating appliance-level datasets with minimum technical burden in terms of hardware installation cost and scalability [33]. Monitoring the power consumption of electrical devices in households can supply consumers with their daily consumption footprints. This can directly trigger energy-saving behavior in end-users [8]. Furthermore, it makes a major contribution to the elaboration and development of smart-grid demand management [16,28]. Power consumption monitoring mainly includes two major categories of schemes: intrusive and non-intrusive load monitoring, namely ILM and NILM, respectively. ILM requires the installation of smart sensors at the front end of every domestic device to collect real-time load usage fingerprints. Although this approach is highly accurate, it has a high implementation cost, since it requires a significant number of sensors to be installed, and it is highly intrusive, since it requires modifying the existing cable installation [15]. On the other hand, the NILM technique makes use of aggregated power records collected from the main supply and extracts approximate consumption data at the appliance level.
This type of technique is easier to install than ILM and has a very low cost. It offers wide practical perspectives and has attracted considerable interest recently [14]. 2D local texture techniques are quite popular for feature extraction, mainly from images and videos. In particular, the local binary pattern (LBP) is a non-parametric feature extraction technique that aims at effectively summarizing local regions of an image at a very low complexity [1,2]. It has attracted growing attention in several areas of computer vision and has demonstrated efficacy in various applications such as face identification [1], object recognition, medical image segmentation [29], and iris and fingerprint recognition [18]. The features extracted with the LBP technique are highly discriminative and cheap to compute, and the LBP descriptor greatly reduces the dimensionality of the query data. These facts make the LBP a good candidate for further signal processing applications.
Y. Himeur et al.
Motivated by the benefits of LBP, we propose in this paper a new appliance identification system based on 2D LBP, which is used to enhance the performance of NILM systems. More specifically, the 1D power consumption signals of electrical devices are transformed to 2D representations and the LBP model is then applied on the resulting 2D matrices to collect the LBP histogram of each power signal. These histograms can robustly represent power consumption signals from different appliance groups by minimizing the distance between devices of the same class while increasing the gap between devices from dissimilar appliance classes. This leads to better identification performance in comparison to existing techniques. To the best of our knowledge, this is the first work that considers the application of the 2D LBP descriptor to extract features from power signals. In addition, different variants of LBP are investigated in this paper and their performance is extensively compared within this framework. The remainder of this paper is structured as follows. A review of related appliance identification frameworks is conducted in Sect. 2. In Sect. 3, detailed explanations of the proposed appliance identification system based on the LBP descriptor are presented. Empirical results on public realistic databases are reported in Sect. 4 to assess the performance of the proposed solution and compare it with recent related works. Finally, Sect. 5 draws the main findings and recommendations emerging from this framework.
2 Related Work
2.1 Literature Review
Appliance identification is an important step for developing low-cost and easy to setup NILM solutions. For this purpose, the methods that identify the appliances and their status from a collective consumption signal have gained in popularity among researchers. They are mainly based on extracting useful features from the current, voltage or active power signals, mainly associated with appliance on/off events, and then use these features to create a separate fingerprint for each appliance [19]. They usually employ machine learning techniques, classifier ensembles and clustering. The success of machine learning and deep learning techniques in several data mining tasks (e.g. classification, prediction, etc.) attracted the interest of researchers on appliance identification for NILM. In [21], authors train a set of auto-associative neural networks (AANNs) in order for each one to capture the characteristics of a particular electrical appliance. The transient power signal of an on/off event in each electrical appliance is used as input and output vector for training each AANN. In operation (test) time, the AANN ensemble processes the input vector of the collective power signal and the output signals are compared to the input. The closest reconstruction (winner AANN) is used to identify the electrical appliance that has been switched on or off. A cogent confabulation neural network (CCNN) has been used in [22] for the same purpose. The advantage of CCNN is that it does not require multiplications in the identification phase,
On the Applicability of 2D Local Binary Patterns
which makes it an effective choice for systems with low computational capability. It also allows the power patterns of combined appliance usage to be learned, which further improves identification performance. In [10], the authors use second- and fourth-order cumulants of the electric current signal and a trained artificial neural network (ANN) for appliance identification through classification. Finally, the authors in [34] choose the extreme learning machine model as their base learner and the AdaBoost algorithm for building their ensemble extreme learning machine load recognition model. In [27], the authors use only one power meter, installed on the main electric supply of a residence, and employ mean-shift clustering and multidimensional linear discriminants for identifying the appliance associated with each power consumption change. An event detection module distinguishes between insignificant fluctuations, fast switching events (triangles) and steady working events (rectangles) using the recorded values of current, voltage and active power. The aim of the device identification module is to detect the device associated with each fast switching event. In a similar line, the authors in [25] employed a clustering technique on changes in the collective current measurements of three appliances to identify the appliance that has been switched on or off. In a similar task, the authors in [13] followed a fuzzy rule-based approach and used the harmonic impedance of different groups of loads (multiple appliances) in order to identify appliances. Their evaluation, however, was on simulated data. A load event matching method that builds on the improved Kuhn-Munkres algorithm is proposed in [30]. The method first detects the load change event (appliances are turned on and/or off).
It then uses a number of independent features (including the appliances’ active and reactive power signatures) and their distributions, together with a graph matching technique, to find the best match between power changes and appliance status changes. A different approach is used in [11], which is based on the spectral information extracted from low-frequency smart meter measurements. In that work, the authors distinguish three types of appliances depending on how they operate: i) “single state” (SS), ii) “continuous varying” (CV) and iii) “multi state” (MS) appliances. This differs from the binary status (on/off or steady/changing) used in other works. In [31], household devices are identified using features extracted with an iterative time representation that is based on detecting the peaks and estimating the correlation between the feature set and the training sets pertaining to the same class and to other classes; two threshold values are then used to extract the characteristics that represent each power signal. The collected features are then fed to a Bayesian classifier to recognize each device category.
2.2 Drawbacks and Limitations
One of the main drawbacks of machine learning techniques used for appliance identification is that they can identify only the appliances for which they have been trained. Moreover, techniques based on artificial neural networks (ANN) incur a high computational complexity when implemented.
Fig. 1. The block diagram of the proposed NILM system.
When unsupervised techniques are employed (e.g. clustering), they are mainly used for feature extraction and not for identifying similar power change behaviors between appliances, so they still rely on (supervised) classification techniques for identifying appliances. A third issue is that most of the techniques assume only transient state and normal operation, whereas certain appliances may have additional modes of operation (e.g. gradually reducing or increasing power consumption). Finally, even though it is rare, a composite event can happen, e.g. turning on one appliance whilst switching off another. In such events, the appliance identification task is harder and single-appliance models will fail. In this paper, we propose an original approach that is based on transforming electrical power signatures to the 2D space, extracting binary features and then returning to the initial representation space by extracting histograms of the 2D representations. The proposed scheme has several advantages: (i) by representing the power signals within a 2D space, new power signatures are generated that characterize the power signal differently, and local texture descriptors can be used to extract prominent features; (ii) it can identify household appliances with a high accuracy without relying on the appliances’ events or states; and (iii) the identification task is performed at a very low complexity because the proposed approach also acts as a dimensionality reduction, in which
short feature vectors with a length of 256 samples are derived to represent each electrical device.
3 Proposed System
3.1 Feature Extraction from Power Signals
In this section, we present the details of the proposed NILM method, which is based on 2D feature extraction. The block diagram of the whole architecture of the NILM system is shown in Fig. 1. The proposed approach represents the power consumption signals in a 2D representation space and then treats the appliance identification problem as a content-based image retrieval (CBIR) task. In this context, any feature extraction approach that can be applied to images can also be deployed on 2D representations of power signals. To that end, the proposed 2D descriptor first transforms the power signal into an image representation. Then the local neighborhood of every power sample is analyzed using a block splitting process to derive local information. Specifically, the LBP descriptor is deployed to abstract a histogram representation of the power signal values from their 2D representation. It mainly focuses on encoding the local structure around a central power sample in each specific block. As illustrated in Fig. 2, every central sample is compared with its surrounding samples in a patch of 3 × 3 neighbors by subtracting the central sample value from each of them. An encoding process follows in which non-negative differences become 1 while negative differences become 0 (see Eq. (2)). A binary stream is then collected in clockwise order. The extracted binary patterns are the corresponding LBP codes. The next step is to collect the binary codes from all the kernels in a binary 2D representation, which is then converted to a decimal representation. Finally, a histogram is computed over the resulting decimal matrix to obtain the LBP histogram. The proposed feature extraction process based on the LBP descriptor is outlined in Algorithm 1. As a result, the LBP-based power local descriptor generates a histogram of 256 values.
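The feature extraction steps described above can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names `signal_to_matrix` and `lbp_histogram` are hypothetical, the 1D signal is assumed to be min-max normalized and truncated to the largest square matrix (the paper does not specify the 2D layout), and the clockwise reading is assumed to start at the top-left neighbor.

```python
import numpy as np


def signal_to_matrix(signal):
    """Min-max normalize a 1D power signal and reshape it into a square matrix.

    Assumption: samples are truncated to the largest square that fits;
    the paper does not state how the 2D layout is chosen.
    """
    x = np.asarray(signal, dtype=float)
    span = x.max() - x.min()
    x = (x - x.min()) / span if span > 0 else np.zeros_like(x)
    side = int(np.floor(np.sqrt(x.size)))
    return x[: side * side].reshape(side, side)


def lbp_histogram(matrix):
    """Return the normalized 256-bin histogram of 3x3 LBP codes (N = 8)."""
    matrix = np.asarray(matrix, dtype=float)
    rows, cols = matrix.shape
    center = matrix[1:-1, 1:-1]
    # Clockwise neighbor offsets, starting at the top-left corner (assumed).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for n, (du, dv) in enumerate(offsets):
        neighbor = matrix[1 + du: rows - 1 + du, 1 + dv: cols - 1 + dv]
        # b(j_n - j_c) = 1 when the difference is non-negative, weighted by 2^n.
        codes += (neighbor >= center).astype(np.int64) << n
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()


# Usage on a synthetic signal (illustrative data only).
rng = np.random.default_rng(0)
demo_signal = rng.random(30000)
demo_features = lbp_histogram(signal_to_matrix(demo_signal))
print(demo_features.shape)
```

Whatever the length of the input signal, the resulting feature vector has 256 entries, which is the dimensionality reduction effect discussed in this section.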
Consequently, it acts as a novel feature representation space and as a dimensionality reduction scheme, which can effectively reduce the complexity of the NILM system.
3.2 Classification Models
This section briefly describes the machine learning classifiers used to classify the electrical devices based on the LBP histograms.
– Support vector machines (SVM): This classification model is built upon the theory of structural risk minimization. It seeks the optimum separating hyperplane, i.e. the one that maximizes the margin between the features of different appliance classes. If the feature patterns cannot be separated
Result: $H_{LBP}$: the histogram of local binary patterns (LBP)
a. Set the power signals matrix $Y(i, j)$, where $i$ is the index of feature vectors and $j$ is the index of patterns in each vector;
while $i \le M$ (with $M$ the total number of power signals in the whole dataset) do
1. Normalize and convert the power signal $i$ into a 2D representation.
2. Estimate the LBP values for a power sample $(u_c, v_c)$ in the power matrix using an $S \times S$ patch as follows:
$$\mathrm{LBP}_{N,S}(u_c, v_c) = \sum_{n=0}^{N-1} b(j_n - j_c)\, 2^n \tag{1}$$
where $j_c$ and $j_n$ represent the power values of the central sample and of the $n$-th of the $N$ surrounding samples in the square neighborhood with patch size $S$. The binary encoding function $b(u)$ is given by:
$$b(u) = \begin{cases} 1 & \text{if } u \ge 0 \\ 0 & \text{if } u < 0 \end{cases} \tag{2}$$
3. Collect the binary patterns $\mathrm{LBP}_{N,S}(u_c, v_c)$ from each block and convert the obtained binary matrix to a decimal representation.
4. Apply a histogramming process on the resulting decimal matrix to derive the histogram of the LBP representation $H_{LBP}(N, S)$ estimated through each block, and use it as texture features to identify electrical devices based on their 2D representations. The descriptor $H_{LBP}(N, S)$ produces $2^N$ values, one for each of the $2^N$ binary patterns that can be generated by the $N$ neighboring power samples of each patch.
5. Increment $i$.
end
Algorithm 1: The LBP algorithm applied on the power consumption signals to extract the LBP histograms.
linearly in the initial space, the feature data are transformed to a new space with higher dimensions by making use of kernel functions.
– K-nearest neighbors (KNN): To classify appliance feature patterns, this model computes the distance from a candidate feature vector to the training vectors in order to find its K closest neighbors. Their labels are then used to assign a class label to the candidate feature vector by majority vote, which defines the class of the respective appliance.
– Decision tree (DT): This model mainly consists of a root node, various internal nodes and leaf nodes. The main idea behind this classifier is to split a complex problem into several simple ones and then adopt a progressive approach to solve the classification problem step by step.
– Deep neural networks (DNN): This model encompasses multiple hidden layers, each of which is fully connected to the previous layer. Usually, rectified linear units (ReLUs) are deployed at the output of the hidden layers, except the last one, to improve nonlinear discrimination. A softmax function is deployed after the last hidden layer to predict the appliance category.
– Ensemble bagging trees (EBT): This is a machine learning architecture in which several weak learners are trained on the same problem and then fused through bootstrap aggregation to obtain better prediction performance.
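Of the classifiers listed above, KNN is the simplest to illustrate. The following is a minimal sketch of a K-nearest-neighbors rule with the Euclidean distance and majority voting, written with NumPy only. The function name `knn_predict` and the toy feature vectors are hypothetical; in the proposed system the feature vectors would be 256-bin LBP histograms.

```python
import numpy as np


def knn_predict(train_x, train_y, query_x, k=1):
    """Label each query vector by majority vote among its k nearest
    training vectors (Euclidean distance)."""
    train_x = np.asarray(train_x, dtype=float)
    query_x = np.asarray(query_x, dtype=float)
    train_y = np.asarray(train_y)
    # Pairwise distances with shape (n_queries, n_train).
    dists = np.linalg.norm(query_x[:, None, :] - train_x[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1, kind="stable")[:, :k]
    predictions = []
    for row in nearest:
        labels, counts = np.unique(train_y[row], return_counts=True)
        # On a tied vote, the label that sorts first is returned.
        predictions.append(labels[np.argmax(counts)])
    return np.array(predictions)


# Toy usage with 2D feature vectors (illustrative data only).
toy_x = [[0.0, 0.0], [1.0, 1.0], [0.1, 0.0]]
toy_y = ["fridge", "kettle", "fridge"]
print(knn_predict(toy_x, toy_y, [[0.05, 0.0], [0.9, 1.0]], k=1))
```

With k = 1 the rule reduces to copying the label of the single closest training histogram, so it requires no training phase beyond storing the labeled feature vectors.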
Fig. 2. The flowchart of the proposed 2D descriptor applied to power signals.
4 Experimental Results
4.1 Datasets Description
This section presents the results of an empirical evaluation performed on the three benchmark power consumption datasets, GREEND [20], PLAID [12] and WHITED [17], which are widely used in NILM and appliance identification systems. The selected datasets are collected at different resolutions, i.e. at both low and high frequencies to conduct a comprehensive study and check the efficiency of the proposed solution when the frequency varies. For the GREEND dataset, consumption profiles are recorded for time periods ranging from 6–12 months at a frequency resolution of 1 Hz. To validate the proposed system, we use the energy usage footprints gathered from a typical house that includes six appliances. Fingerprints of 11 appliance classes are monitored in the PLAID dataset at a sampling frequency of 30 kHz. For the WHITED dataset, power consumption footprints of 11 device groups are collected at a resolution of 44 kHz. Table 1 summarizes the appliance categories, number of electrical devices and/or number of observed days in each category.
Table 1. Description of the monitored appliances and their number for the PLAID and WHITED datasets, and the number of observed days for the GREEND dataset.
| Tag | PLAID appliance category | # app | WHITED appliance category | # app | GREEND appliance category | # days |
|-----|--------------------------|-------|---------------------------|-------|---------------------------|--------|
| 1 | Fluorescent lamp | 90 | Modems/receiver | 20 | Coffee machine | 242 |
| 2 | Fridge | 30 | CFL | 20 | Radio | 242 |
| 3 | Hairdryer | 96 | Charger | 30 | Fridge w/freezer | 240 |
| 4 | Microwave | 94 | Coffee machine | 20 | Dishwasher | 242 |
| 5 | Air conditioner | 51 | Drilling machine | 20 | Kitchen lamp | 242 |
| 6 | Laptop | 107 | Fan | 30 | TV | 242 |
| 7 | Vacuum | 8 | Flatron | 20 | | |
| 8 | Incandescent light bulb | 79 | LED light | 20 | | |
| 9 | Fan | 96 | Kettles | 20 | | |
| 10 | Washing machine | 22 | Microwave | 20 | | |
| 11 | Heater | 30 | Iron | 20 | | |
4.2 Evaluation Metrics
To evaluate the performance of the proposed system objectively, the accuracy and F-score metrics are used. The accuracy metric represents the percentage of correctly identified appliances in the testbed. However, accuracy alone is not a robust metric, especially for imbalanced databases (typically the case of the PLAID dataset, which is highly imbalanced). The F-score, which is more reliable in such cases, is therefore assessed as well in order to guarantee an objective performance inspection. The F-score is the harmonic mean of the precision and recall metrics, i.e. F-score = 2 × precision × recall / (precision + recall).
4.3 Comparison of Normalized Cross-Correlation (NCC) Matrices
First, it is helpful to understand how LBP features differ from the original power consumption signals. This section therefore investigates how the original power signals pertaining to the same appliance relate to one another, and how the LBP histograms of the same signals can increase their discriminative ability and improve the classification rates. For this purpose, six power consumption signals s1, s2, · · · , s6 are selected randomly from each appliance class of the GREEND dataset. The NCC rates
between the signals are computed to explain why the LBP makes it easier to correlate signals of the same appliance group. Figure 3 illustrates the NCC matrices estimated between the six raw signals and between the respective LBP descriptions for four appliance classes: coffee machine, radio, fridge w/ freezer and washing machine. The plots on the left side of Fig. 3 portray the correlation between the original power signals. It can clearly be seen that the NCC values are very low and change randomly when comparing two signals; there is no specific interval that bounds the NCC measure. However, for the LBP descriptions on the right of Fig. 3, the NCC values are clearly higher than those obtained from the original power signals. More specifically, the NCC values for the LBP descriptions are above 0.97 for all combinations considered in this study. Figure 4 portrays an example of six power signals from the GREEND dataset, their encoded 2D representations and the final histograms collected using the LBP descriptor. By moving to a higher-dimensional space, the power signals can be treated as images and any image feature extraction technique can be applied accordingly. Further, in the 2D representation each power sample is surrounded by 8 neighbors instead of only 2 neighbors in the 1D representation, which gives more opportunities to extract features from the power signals in a robust manner. It can thus help correlate appliances pertaining to the same appliance category while increasing the distance between appliances from different classes. Furthermore, the proposed appliance identification scheme based on LBP does not depend on the appliance state (steady or transient) nor on appliance on/off events. This is another advantage of this technique, which can identify every appliance without referring to its state.
4.4 Performance with Reference to Various Classifiers
In this subsection, the classification performance of the machine learning schemes presented in Sect. 3.2, including KNN, DNN, SVM, DT and EBT, is reported in terms of accuracy and F-score. Table 2 reports the accuracy and F-score of the LBP descriptor with reference to several classifiers using different classification parameter settings. These results are obtained with 10-fold cross validation. The KNN classifier based on the Euclidean distance and K = 1 outperforms the other classification models and provides the best overall results on the three datasets considered in this framework. For the GREEND dataset, an accuracy of 97.50% and an F-score of 97.49% are obtained, while for the PLAID and WHITED datasets the performance is slightly worse. This can be explained by the higher sampling frequency of these datasets in comparison to GREEND, and suggests that the LBP descriptor performs better at low frequencies than at high frequencies. Further, the LBP descriptor acts not just as a feature extraction scheme but also as a dimensionality reduction technique. This is because the final LBP histogram is represented in only 256 bins while the raw power signals have much higher lengths (i.e. 30000,
(I) Coffee machine
(II) Radio
(III) Fridge w/ freezer
(IV) Washing machine
Fig. 3. Correlation matrices measured between: (a) raw signals pertaining to the same appliance classes and (b) their LBP histograms.
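The kind of correlation matrix summarized in Fig. 3 can be sketched as follows. The paper does not give the exact NCC formula, so this sketch assumes the zero-lag normalized cross-correlation without mean removal, i.e. the inner product of two vectors divided by the product of their Euclidean norms. The function name `ncc_matrix` is hypothetical, and the rows are assumed to have equal length (raw signals of the same duration, or 256-bin LBP histograms).

```python
import numpy as np


def ncc_matrix(vectors):
    """Zero-lag normalized cross-correlation between every pair of rows.

    Assumption: NCC(x, y) = <x, y> / (||x|| * ||y||), without mean removal.
    """
    x = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    unit = x / norms
    return unit @ unit.T


# Toy usage: rows 0 and 1 are proportional, row 2 is unrelated.
toy = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 0.0, -1.0]]
print(np.round(ncc_matrix(toy), 3))
```

Applied once to six raw signals and once to their six LBP histograms, this would give a pair of 6 × 6 matrices of the kind compared in each panel of Fig. 3.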
[Fig. 4 panels, each shown for signals (a)-(f): I. Example of power signals from the GREEND dataset (power in watts versus samples). II. Image representation of LBP encoding of the power signals. III. Histograms of LBP representations of the power signals (amplitude versus samples).]
Fig. 4. Example of power signals, their 2D LBP representations and their LBP histograms from the GREEND dataset: a) Coffee machine, b) Radio, c) Fridge w/freezer, d) Dishwasher, e) Kitchen lamp and f) TV.
57600 and 22491 samples for the GREEND, PLAID and WHITED datasets, respectively).
4.5 Comparison with LBP Variants
The success of the LBP descriptor in other fields has motivated many researchers to develop improved versions of this descriptor. In this
Table 2. The accuracy and F-score results (%) obtained using the LBP descriptor with different classifiers.
| Classifier | Classifier parameters | GREEND Accuracy | GREEND F-score | PLAID Accuracy | PLAID F-score | WHITED Accuracy | WHITED F-score |
|------------|-----------------------|-----------------|----------------|----------------|---------------|-----------------|----------------|
| LDA | / | 93.71 | 93.53 | 84.71 | 77.93 | 82.50 | 77.41 |
| DT | Fine, 100 splits | 97.42 | 97.37 | 75.42 | 66.9 | 92.5 | 90.49 |
| DT | Medium, 20 splits | 96.51 | 96.5 | 65.85 | 50.20 | 91.25 | 90.84 |
| DT | Coarse, 4 splits | 73.86 | 69.38 | 49 | 31.15 | 34.16 | 28.36 |
| DNN | 50 hidden layers | 71.69 | 69.82 | 78.14 | 76.09 | 82.37 | 81.86 |
| EBT | 30 learners, 42 k splits | 82.51 | 81.26 | 82.57 | 74.98 | 91.66 | 88.67 |
| SVM | Linear kernel | 94.84 | 95 | 81.85 | 71.61 | 84.58 | 82.52 |
| SVM | Gaussian kernel | 89.31 | 98.93 | 85 | 77.57 | 84.91 | 87.91 |
| SVM | Quadratic kernel | 93.93 | 93.81 | 89.14 | 85.34 | 92.5 | 89.07 |
| KNN | K = 10/Weighted Euclidean dist | 96.96 | 96.81 | 82.14 | 73.57 | 87.91 | 82.71 |
| KNN | K = 10/Cosine dist | 96.13 | 96.01 | 75.57 | 65.57 | 84.58 | 80.1 |
| KNN | K = 1/Euclidean dist | 97.50 | 97.49 | 91.85 | 89.18 | 92.50 | 90.04 |
section, we investigate the performance of three LBP variants in comparison to the original LBP model. Local Directional Patterns (LDP) [23]: For every sample of the power matrix, an 8-bit binary stream is computed using the LDP. This binary stream is estimated through the convolution of a small 3 × 3 patch of the power matrix with Kirsch kernels in 8 orientations. An example of the Kirsch kernels is depicted in Fig. 5.
Fig. 5. Example of the Kirsch kernels used in the LDP descriptor.
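The LDP encoding can be sketched as follows. This is an illustrative sketch rather than the configuration used in the experiments. It assumes the commonly used LDP formulation in which the k = 3 strongest absolute Kirsch responses are set to 1 and the remaining bits to 0, which yields C(8, 3) = 56 distinct codes and is consistent with the histogram length of 56 reported for the LDP in Table 3. The masks are applied as a correlation (without kernel flipping), and the function names are hypothetical.

```python
import numpy as np
from itertools import combinations

# Border positions of a 3x3 mask, clockwise from the top-left corner.
RING = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]


def kirsch_kernels():
    """Build the 8 Kirsch masks by rotating the border values of one mask."""
    base = [5, 5, 5, -3, -3, -3, -3, -3]  # three 5s on the top row
    kernels = []
    for shift in range(8):
        kernel = np.zeros((3, 3))
        for idx, (r, c) in enumerate(RING):
            kernel[r, c] = base[(idx - shift) % 8]
        kernels.append(kernel)
    return kernels


def ldp_histogram(matrix, k=3):
    """Normalized histogram of LDP codes (56 bins when k = 3)."""
    m = np.asarray(matrix, dtype=float)
    rows, cols = m.shape
    responses = []
    for kernel in kirsch_kernels():
        resp = np.zeros((rows - 2, cols - 2))
        for r in range(3):
            for c in range(3):
                resp += kernel[r, c] * m[r: rows - 2 + r, c: cols - 2 + c]
        responses.append(np.abs(resp))
    responses = np.stack(responses, axis=-1)  # shape (H-2, W-2, 8)
    # Indices of the k strongest directional responses at each position.
    top = np.argsort(-responses, axis=-1, kind="stable")[..., :k]
    codes = np.zeros(responses.shape[:2], dtype=np.int64)
    for j in range(k):
        codes += np.left_shift(1, top[..., j])
    # Map every 8-bit code with exactly k set bits to a compact bin index.
    valid = np.array(sorted(sum(1 << b for b in combo)
                            for combo in combinations(range(8), k)))
    bins = np.searchsorted(valid, codes)
    hist = np.bincount(bins.ravel(), minlength=valid.size).astype(float)
    return hist / hist.sum()


# Usage on a random matrix (illustrative data only).
demo = ldp_histogram(np.random.default_rng(0).random((20, 20)))
print(demo.shape)
```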
Local Ternary Pattern (LTeP) [32]: In contrast to the LBP descriptor, the LTeP does not threshold the power sample in each matrix into 0 or 1, instead
it makes use of a threshold parameter to encode the power samples into three values. Given $\mathit{thr}$ as the threshold factor, $s_c$ as the central power sample in a kernel of 3 × 3 and $s_n$ as the neighboring power samples, the LTeP encoding of each neighbor $s_n$ of the central power sample $s_c$ is defined as:
$$\mathrm{LTeP}(s_n) = \begin{cases} 1 & \text{if } s_n > s_c + \mathit{thr} \\ 0 & \text{if } s_n > s_c - \mathit{thr} \text{ and } s_n < s_c + \mathit{thr} \\ -1 & \text{if } s_n < s_c - \mathit{thr} \end{cases} \tag{3}$$
Local Transitional Pattern (LTrP) [3]: It encodes the 2D representation of the power signals by comparing the transitions of intensity change of neighboring power samples in various orientations. To generate each bit of the code, the LTrP compares the intensity of the central power sample in a 3 × 3 kernel with the intensities of only two of its neighbors along a specific direction, and sets a single bit value accordingly.
Table 3 compares the LBP descriptor with the other variants, including the LDP, LTeP and LTrP, in terms of histogram length, accuracy and F-score with reference to the KNN classifier based on the Euclidean distance (K = 1). All descriptors perform well on the GREEND dataset, with accuracies and F-scores of about 97% or higher. Moreover, the LDP and LTeP slightly outperform the LBP on this database. However, on the PLAID and WHITED datasets the performance of all descriptors drops. Here the LBP shows a much better performance, especially on the WHITED dataset. For example, its accuracy exceeds those of the LDP, LTeP and LTrP descriptors by more than 10.5%, 10% and 11%, respectively. In addition, regarding the F-score, the LBP outperforms the LDP, LTeP and LTrP descriptors by about 10.7%, 9.9% and 11.3%, respectively (differences in percentage points computed from the values in Table 3). On the other hand, the difference in performance between the GREEND and both the PLAID and WHITED datasets is mainly related to the fact that GREEND monitors the same appliances on different days.
In other words, it records different daily power observations of these devices, whereas the PLAID and WHITED datasets collect power fingerprints of different appliances (from different brands) pertaining to the same appliance category.
Table 3. Comparison of the LBP descriptor with other variants in terms of the histogram length, accuracy (%) and F-score (%).
| Algorithm | Histogram length | GREEND Accuracy | GREEND F-score | PLAID Accuracy | PLAID F-score | WHITED Accuracy | WHITED F-score |
|-----------|------------------|-----------------|----------------|----------------|---------------|-----------------|----------------|
| LDP | 56 | 99.46 | 99.50 | 89.85 | 85.82 | 81.66 | 79.38 |
| LTeP | 512 | 98.86 | 98.80 | 91.28 | 88.97 | 82.08 | 80.15 |
| LTrP | 256 | 97.04 | 96.99 | 85.85 | 81.37 | 81.25 | 78.78 |
| LBP | 256 | 97.50 | 97.49 | 91.85 | 89.18 | 92.5 | 90.04 |
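The ternary encoding of Eq. (3) can be sketched as follows. The paper does not state how the ternary codes are turned into a histogram, so this sketch assumes the usual construction for ternary patterns: the +1 and −1 decisions are split into an upper and a lower binary pattern, and their two 256-bin histograms are concatenated, which matches the histogram length of 512 listed for the LTeP in Table 3. The function name `ltep_histogram` and the default threshold are hypothetical.

```python
import numpy as np


def ltep_histogram(matrix, thr=0.05):
    """512-bin LTeP histogram: upper (+1) and lower (-1) binary patterns."""
    m = np.asarray(matrix, dtype=float)
    rows, cols = m.shape
    center = m[1:-1, 1:-1]
    # Clockwise neighbor offsets, starting at the top-left corner (assumed).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    upper = np.zeros(center.shape, dtype=np.int64)
    lower = np.zeros(center.shape, dtype=np.int64)
    for n, (du, dv) in enumerate(offsets):
        neighbor = m[1 + du: rows - 1 + du, 1 + dv: cols - 1 + dv]
        upper += (neighbor > center + thr).astype(np.int64) << n  # code +1
        lower += (neighbor < center - thr).astype(np.int64) << n  # code -1
    hist = np.concatenate([
        np.bincount(upper.ravel(), minlength=256),
        np.bincount(lower.ravel(), minlength=256),
    ]).astype(float)
    return hist / hist.sum()


# Usage on a random matrix (illustrative data only).
demo_ltep = ltep_histogram(np.random.default_rng(0).random((20, 20)), thr=0.1)
print(demo_ltep.shape)
```

Neighbors whose difference from the central sample stays within the threshold contribute to neither pattern, which is what makes the ternary code less sensitive to small fluctuations than the binary code of Eq. (2).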
4.6 Comparison with Recent Appliance Identification Systems
Table 4 presents a comparison between the proposed identification system based on the LBP descriptor and other recent state-of-the-art systems in terms of different characteristics, including the architecture model, learning type, number of appliance classes, frequency resolution, complexity level and accuracy rate. It can be observed that the proposed system presents the best compromise between complexity and identification accuracy, since it provides more than 90% accuracy at a low computational complexity in comparison to other frameworks. Moreover, it is evaluated on three different datasets having different sampling rates, which gives more credibility to our evaluation study and further indicates that it can be implemented independently of the frequency resolution of the collected data.
Table 4. Comparison of the proposed appliance identification system with other recent frameworks with regard to various parameters.
| Work | Architecture | Learning scheme | # device classes | Resolution | Complexity | Accuracy (%) |
|------|--------------|-----------------|------------------|------------|------------|--------------|
| [21] | AANN | supervised | 5 | 1/3 Hz | high | 98.7 |
| [27] | CCNN | unsupervised | 9 | 1/10 Hz | medium | 80 |
| [22] | ANN | supervised | 8 | 1 Hz | medium | 83.8 |
| [11] | Karhunen-Loève | supervised | – | 1/3 Hz | high | 87 |
| [19] | fingerprint-weighting KNN | supervised | 6 | 1/6 Hz | low | 83.25 |
| [10] | ANN | supervised | 11 | 15.36 | high | 96.8 |
| [34] | AdaBoost | supervised | 5 | 10 kHz | medium | 94.8 |
| [13] | Fuzzy model | supervised | 7 | 2 kHz | high | 91.5 |
| [31] | Bayesian classifier + correlation | supervised | 29 | 1 min | medium | 95.6 |
| Our | LBP + KNN | supervised | 11/6/11 | 1 Hz/30 kHz/44 kHz | low | 97.5/91.85/92.5 |
5 Conclusion
In this framework, an original system for household appliance identification in NILM is presented, which relies on representing power consumption signals in a 2D representation space and then applying LBP descriptor to extract short histograms that can efficiently represent each device category with a unique code. The main idea behind our novel approach relies on the transformation of the power signals to the 2D space opening more possibilities to extract relevant
features in contrast to the 1D representation. Furthermore, through representing power signals as images, we can easily take advantage of the image feature extraction techniques to recognize the electrical devices. The proposed system has successfully been evaluated by considering three accessible databases comprising real household power consumption footprints at the appliance-level. Various classifiers are also used with different setting parameters. In addition, the proposed appliance identification based on the LBP descriptor is evaluated with regard to other LBP variants and other recent appliance identification systems, in terms of different properties such as architecture model, learning type, number of electrical devices, resolution of collected data, complexity level and accuracy. In fact, the proposed solution shows more than 90% accuracy for all the evaluated databases at a low complexity, and further it presents the best compromise between the accuracy and computational complexity when compared to other frameworks considered in this paper. Acknowledgments. This paper was made possible by National Priorities Research Program (NPRP) grant No. 10-0130-170288 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.
References
1. Ahonen, T., Hadid, A., Pietikäinen, M.: Face description with local binary patterns: application to face recognition. IEEE Trans. Pattern Anal. Mach. Intell. 28(12), 2037–2041 (2006)
2. Ahonen, T., Hadid, A., Pietikäinen, M.: Face recognition with local binary patterns. In: Pajdla, T., Matas, J. (eds.) Computer Vision - ECCV 2004, pp. 469–481. Springer, Heidelberg (2004)
3. Ahsan, T., Jabid, T., Chong, U.-P.: Facial expression recognition using local transitional pattern on Gabor filtered facial images. IETE Tech. Rev. 30(1), 47–52 (2013)
4. Alsalemi, A., Himeur, Y., Bensaali, F., Amira, A., Sardianos, C., Varlamis, I., Dimitrakopoulos, G.: Achieving domestic energy efficiency using micro-moments and intelligent recommendations. IEEE Access 8, 15047–15055 (2020)
5. Alsalemi, A., Sardianos, C., Bensaali, F., Varlamis, I., Amira, A., Dimitrakopoulos, G.: The role of micro-moments: a survey of habitual behavior change and recommender systems for energy saving. IEEE Syst. J. 13(3), 3376–3387 (2019)
6. Alsalemi, A., Bensaali, F., Amira, A., Fetais, N., Sardianos, C., Varlamis, I.: Smart energy usage and visualization based on micro-moments. In: Bi, Y., Bhatia, R., Kapoor, S. (eds.) Intelligent Systems and Applications, pp. 557–566. Springer, Cham (2020)
7. Alsalemi, A., Ramadan, M., Bensaali, F., Amira, A., Sardianos, C., Varlamis, I., Dimitrakopoulos, G.: Boosting domestic energy efficiency through accurate consumption data collection. In: 5th International Symposium on Real-Time Data Processing for Cloud Computing (RTDPCC), Leicester, UK (2019)
8. Alsalemi, A., Ramadan, M., Bensaali, F., Amira, A., Sardianos, C., Varlamis, I., Dimitrakopoulos, G.: Endorsing domestic energy saving behavior using micro-moment classification. Appl. Energy 250, 1302–1311 (2019)
9. Amasyali, K., El-Gohary, N.M.: A review of data-driven building energy consumption prediction studies. Renew. Sustain. Energy Rev. 81, 1192–1205 (2018) 10. Guedes, J.D.S., Ferreira, D.D., Barbosa, B.H.G., Duque, C.A., Cerqueira, A.S.: Non-intrusive appliance load identification based on higher-order statistics. IEEE Latin Am. Trans. 13(10), 3343–3349 (2015) 11. Dinesh, C., Nettasinghe, B.W., Godaliyadda, R.I., Ekanayake, M.P.B., Ekanayake, J., Wijayakulasooriya, J.V.: Residential appliance identification based on spectral information of low frequency smart meter measurements. IEEE Trans. Smart Grid 7(6), 2781–2792 (2016) 12. Gao, J., Giri, S., Kara, E.C., Berg´es, M.: PLAID: a public dataset of high-resolution electrical appliance measurements for load identification research: demo abstract. In: Proceedings of the 1st ACM Conference on Embedded Systems for EnergyEfficient Buildings, BuildSys 2014, pp. 198–199. ACM, New York (2014) 13. Ghosh, S., Chatterjee, A., Chatterjee, D.: Improved non-intrusive identification technique of electrical appliances for a smart residential system. IET Gener. Transm. Distrib. 13(5), 695–702 (2019) 14. Himeur, Y., Elsalemi, A., Bensaali, F., Amira, A.: Efficient multi-descriptor fusion for non-intrusive appliance recognition. In: The IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1–5, May 2020 15. Himeur, Y., Elsalemi, A., Bensaali, F., Amira, A.: Improving in-home appliance identification using fuzzy-neighbors-preserving analysis based QR-decomposition. In: International Congress on Information and Communication Technology (ICICT), pp. 1–8, February 2020 16. Houidi, S., Auger, F., Sethom, H.B.A., Fourer, D., Mi`egeville, L.: Multivariate event detection methods for non-intrusive load monitoring in smart homes and residential buildings. Energy Build. 208, 109624 (2020) 17. Kahl, M., Haq, A.U., Kriechbaumer, T., Jacobsen, H.-A.: Whited-a worldwide household and industry transient energy data set. 
In: 3rd International Workshop on Non-Intrusive Load Monitoring (2016) 18. Kruti, R., Patil, A., Gornale, S.S.: Fusion of local binary pattern and local phase quantization features set for gender classification using fingerprints. Int. J. Comput. Sci. Eng. 7(1), 22–29 (2019) 19. Ma, M., Lin, W., Zhang, J., Wang, P., Zhou, Y., Liang, X.: Toward energyawareness smart building: discover the fingerprint of your electrical appliances. IEEE Trans. Ind. Inf. 14(4), 1458–1468 (2018) 20. Monacchi, A., Egarter, D., Elmenreich, W., D’Alessandro, S., Tonello, A.M.: GREEND: an energy consumption dataset of households in Italy and Austria. In: IEEE International Conference on Smart Grid Communications (SmartGridComm), pp. 511–516, November 2014 21. Morais, L.R., Castro, A.R.G.: Competitive autoassociative neural networks for electrical appliance identification for non-intrusive load monitoring. IEEE Access 7, 111746–111755 (2019) 22. Park, S.W., Baker, L.B., Franzon, P.D.: Appliance identification algorithm for a non-intrusive home energy monitor using cogent confabulation. IEEE Trans. Smart Grid 10(1), 714–721 (2019) 23. Srinivasa Perumal, R., Chandra Mouli, P.V.S.S.R.: Dimensionality reduced local directional pattern (DR-LDP) for face recognition. Expert Syst. Appl. 63, 66–73 (2016) 24. Sardianos, C., Varlamis, I., Chronis, C., Dimitrakopoulos, G., Alsalemi, A., Himeur, Y., Bensaali, F., Amira, A.: A model for predicting room occupancy based on motion sensor data, vol. 45, September 2020
25. Sardianos, C., Varlamis, I., Dimitrakopoulos, G., Anagnostopoulos, D., Alsalemi, A., Bensaali, F., Amira A.: “i want to... change”: micro-moment based recommendations can change users’ energy habits. In: Proceedings of the 8th International Conference on Smart Cities and Green ICT Systems (SMARTGREENS 2019), pp. 30–39. SCITEPRESS (2019) 26. Wang, R., Ji, W., Liu, M., Wang, X., Weng, J., Deng, S., Gao, S., Yuan, C.A.: Review on mining data from multiple data sources. Pattern Recogn. Lett. 109, 120– 128 (2018). Special Issue on Pattern Discovery from Multi-Source Data (PDMSD) 27. Wang, Z., Zheng, G.: Residential appliances identification and monitoring by a nonintrusive method. IEEE Trans. Smart Grid 3(1), 80–92 (2012) 28. Welikala, S., Dinesh, C., Ekanayake, M.P.B., Godaliyadda, R.I., Ekanayake, J.: Incorporating appliance usage patterns for non-intrusive load monitoring and load forecasting. IEEE Trans. Smart Grid 10(1), 448–461 (2019) 29. Wu, C.-H., Lai, C.-C., Lo, H.-J., Wang, P.-S.: A comparative study on encoding methods of local binary patterns for image segmentation. In: International Conference on Smart Vehicular Technology, Transportation, Communication and Applications, pp. 277–283. Springer (2018) 30. Xiao, Y., Hu, Y., He, H., Zhou, D., Zhao, Y., Hu, W.: Non-intrusive load identification method based on improved KM algorithm. IEEE Access 7, 151368–151377 (2019) 31. Yan, D., Jin, Y., Sun, H., Dong, B., Ye, Z., Li, Z., Yuan, Y.: Household appliance recognition through a Bayes classification model. Sustain. Cities Soc. 46, 101393 (2019) 32. Yuan, J.-H., Zhu, H.-D., Gan, Y., Shang, L.: Enhanced local ternary pattern for texture classification. In: Huang, D.-S., Bevilacqua, V., Premaratne, P. (eds.) Intelligent Computing Theory, pp. 443–448. Springer, Cham (2014) 33. Zeifman, M., Roth, K.: Nonintrusive appliance load monitoring: review and outlook. IEEE Trans. Consum. Electron. 57(1), 76–84 (2011) 34. 
Zhiren, R., Bo, T., Longfeng, W., Hui, L., Yanfei, L., Haiping, W.: Non-intrusive load identification method based on integrated intelligence strategy. In: 2019 25th International Conference on Automation and Computing (ICAC), pp. 1–6, September 2019
Management of Compressed Air to Reduce Energy Consumption Using Intelligent Systems
Mohamad Thabet1, David Sanders1, Malik Haddad1(B), Nils Bausch1, Giles Tewkesbury1, Victor Becarra1, Tom Barker2, and Jake Piner3
1 University of Portsmouth, Portsmouth PO1 2UP, UK {mohamad.thabet,malik.haddad}@port.ac.uk
2 Gems Sensors, Lennox Road, Basingstoke RG22 4AW, UK
3 InTandem, Watton Farm, Southampton SO32 3HA, UK
Abstract. This research investigated the use of intelligent systems for reducing energy consumption in compressed air systems. An initial literature review has been completed and mathematical models that describe typical compressed air components (compressor, tank, piping network, etc.) were created. The investigations suggested that the energy used or wasted in connection with compressed air is a valuable area of research for saving energy. The research progressed to investigating ways of minimising energy use for air compressors based on real-time conditions (including anticipated future requirements), using intelligent systems to monitor and make decisions.
Keywords: Systems · Intelligent · Efficiency · Air · Energy
© Springer Nature Switzerland AG 2021 K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 206–217, 2021. https://doi.org/10.1007/978-3-030-55190-2_16
1 Introduction
The research described in this paper aims to create innovative intelligent methods to reduce energy consumption in a compressed air system (CAS). The current goal is to combine ambient sensing information with artificial intelligence (AI) and knowledge management (KM) in real time. This is expected to increase the efficiency of energy-intensive manufacture. Comprehensive information about performance will be provided by ambient sensors (data sources) and used by an AI system to make automated decisions. Information will be processed by a KM system, and human operators will be provided with advice on how to maintain productivity and reduce energy usage. The aims and objectives are summarised as follows:
1. Use AI and ambient sensing to evaluate and monitor performance.
2. Use KM to decrease energy use.
3. Construct a collaborative interface for the human operators.
Air compressors alone account for over 10% of UK industrial energy use (>15,000 GWh) [1, 2]. Industry is faced with high energy costs and needs to reduce both the financial and environmental impact of using energy. The scientific community can help
by introducing new technologies and paradigms to create a step improvement in the energy efficiency of manufacturing industries. CAS are often the most expensive and inefficient industrial systems. For every 10 units of energy supplied during the creation of compressed air, fewer than one unit is actually turned into useful compressed air [1, 3]. This research will minimise energy use for air compressors based on real-time manufacturing conditions. Over a 10 year life cycle, energy accounts for 75% of the cost of running a CAS [4]. Many of these systems could operate more efficiently, leading to energy savings and more reliable supply of compressed air [4]. Improving energy performance of these systems has been of interest to many researchers, with some studies claiming that savings of 20–50% may be achievable [5]. This paper includes a general literature review on research concerned with improving energy efficiency of CAS using intelligent energy management techniques. Moreover, mathematical models to evaluate energy performance of CAS were developed. Our research is now concentrating on the analysis and automatic detection of malfunctions (such as leaks) that negatively impact energy performance. Ongoing and future research at the University of Portsmouth will combine real time ambient sensing with artificial intelligence (AI) and knowledge management (KM) to automatically improve efficiency, especially in energy intensive manufacturing where compressed air is widely used. Ambient data will provide information on performance and AI will make sense of that data and automatically act. KM will facilitate the processing of information to advise human operators on actions to reduce energy use and maintain productivity. Some decisions might be made automatically, for example to increase pressure or shut down a compressor if a leak is detected. 
Characteristics of inefficiencies in CAS are being considered and the new work will address automatic detection and decision making. Compressed air leaks, heat waste and overall inefficiencies within the process are some of the major causes of energy waste in CAS. By creating new methods for automatic detection and performance optimisation, a new framework for energy management of compressed air systems may be realised. Section 2 of this paper reviews similar research in the area of intelligent energy management of CAS. Section 3 describes the gaps identified from the literature review. Section 4 begins to address the gaps by creating MATLAB mathematical models for CAS components. Sections 5 and 6 are the discussion and conclusion, respectively.
2 Review of Similar Research
Traditionally, research on the energy efficiency of CAS has focused on methods to reduce energy waste, approaches to model and simulate CAS, and the development of tools and methods to identify air leakage. During the past three decades, energy management systems have gradually been introduced into companies and facilities, mainly to decrease the high economic and environmental costs of energy-intensive processes [6]. These systems control energy consumption by evaluating performance, identifying malfunctions or energy efficiency opportunities and recommending corrective actions.
Because complex systems such as CAS have an inherent variability in their energy consumption, are influenced by many factors and require regular maintenance, controlling their energy performance over long periods of time can be challenging. These challenges, in addition to other factors, have stimulated interest in self-learning energy management and automation technologies [6]. Santolamazza investigated the energy and performance optimization of CAS with machine learning [7–9]. In [7], a methodology to monitor and control energy performance in industrial plants was presented. The methodology applied a sequence of steps that supported the identification of changes in energy consumption patterns or declines in energy performance due to faults. In [8], Artificial Neural Networks (ANNs) were implemented to monitor the energy performance of CAS and detect failures, which are typically preceded by irregularities in energy usage. It was concluded that ANNs were capable of characterizing system behavior, allowing irregularities in performance to be associated with their causes through a control chart. In [9], Santolamazza studied three methods to monitor and control energy consumption in CAS: two machine learning approaches, ANNs and support vector machines (SVM), and a classical statistical approach. The results demonstrated that machine learning techniques allowed detection of anomalies, failure analysis and prescriptive maintenance, whereas statistical methods were mainly effective in identifying major anomalies. Santolamazza concluded that data collected from CAS and their operating environments could help in the detection of abnormalities (faults or energy inefficiencies) and in the suggestion of suitable countermeasures. However, the association of these irregularities with their probable causes, and the generation of a troubleshooting procedure, were not examined comprehensively.
Moreover, the suggested methodology and algorithm were not validated experimentally, and therefore their effectiveness in real time remains to be confirmed by experiment.
Energy management systems may be combined with sensors and information systems that assist with collecting and analyzing data. Boehm and Franke investigated the concept of cyber physical CAS (CPCAS) [10]. CPCAS are industrial CAS equipped with automation equipment and AI. These systems obtain basic operating parameters (such as pressure, volume and temperature) and assist in achieving more efficient operation. Further research into the technical characteristics and specifications of CPCAS components is required.
Other researchers investigated the use of machine learning in compressor design and monitoring. Ghorbanian investigated the construction of compressor performance maps using ANNs instead of costly and time-consuming experimental methods [11]. The ANN results agreed with the experimental data, while reducing development time and cost at the initial design stages. The study in [11] concerned the design and monitoring of compressors, and the approach used may be applied to other CAS components.
The control and monitoring of CAS using data and intelligent technologies is an emerging field [6]. A major gap in research in this field is the creation of algorithms capable of detecting abnormalities in performance, associating those abnormalities with appropriate causes and developing a troubleshooting procedure.
Another gap in research is the development of the characteristics and specifications of CPCAS. CPCAS may reduce energy usage; however, they require more research into system components and functionalities.
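The control-chart idea mentioned in the review of [8] can be illustrated with a minimal Python sketch. It is not the method of the cited papers: the energy predictor (an ANN in [8]) is left abstract, and the function names `control_limits` and `flag_anomalies`, as well as the three-sigma rule, are assumptions made purely for illustration.

```python
from statistics import mean, stdev


def control_limits(baseline_residuals, k=3.0):
    """Return (lower, upper) limits from residuals recorded during normal operation.

    The k-sigma rule (k = 3 by default) is an illustrative assumption.
    """
    centre = mean(baseline_residuals)
    spread = stdev(baseline_residuals)
    return centre - k * spread, centre + k * spread


def flag_anomalies(measured, predicted, limits):
    """Return indices where (measured - predicted) falls outside the limits."""
    lower, upper = limits
    return [
        i
        for i, (m, p) in enumerate(zip(measured, predicted))
        if not lower <= (m - p) <= upper
    ]


if __name__ == "__main__":
    # Hypothetical residuals (kWh) between measured and predicted energy use.
    limits = control_limits([-1.0, 0.0, 1.0, 0.0, -1.0, 1.0, 0.0, 0.0])
    print(flag_anomalies([100.0, 101.0, 110.0], [100.0, 100.0, 100.0], limits))
```

Flagging a point is only the detection step; associating it with a cause and a troubleshooting procedure is the part the review identifies as still open.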
3 Gaps in Literature
Six gaps in research were identified:
1. Measures to reduce energy consumption are well developed theoretically in the literature, but many are not realized because of various barriers [12]. One of these barriers is the high cost of gathering information and deciding on the most cost-effective and applicable measures. Traditionally, manufacturers look for energy saving opportunities through expensive and time-consuming energy audits. A methodology to support decision making regarding suitable measures to save energy in CAS may result in a useful and innovative tool for enhanced energy efficiency.
2. Evaluation of CAS performance may be achieved through modelling and simulation; however, most models in the literature cover either the supply or the demand side separately. Coupling compressed air supply and demand in a single model may be investigated in future research. Moreover, such a model could integrate other energy consuming equipment (heating, cooling, lighting, etc.) with CAS.
3. Investigation into intelligent CAS energy management technologies has been gaining momentum. Machine learning techniques have proved to be a suitable tool for CAS performance optimization by detecting anomalies in performance, but this has not been investigated thoroughly. An energy management system that detects abnormalities in energy performance, associates the abnormalities with suitable causes and sets up a troubleshooting procedure has not yet been established.
4. Intelligent energy management technologies for CAS anomaly detection and prescriptive maintenance using machine learning and manufacturing data have not been validated experimentally. To properly test the effectiveness of such an energy management system, future work needs to experimentally validate the theoretical work.
5. The concept of cyber physical CAS (CPCAS) was introduced in [10]. CPCAS are equipped with additional components, such as sensors, actuators and processors for autonomous control. In addition, they are capable of exchanging information with energy management systems, and therefore may play a role in the intelligent energy management of CAS. However, the concept of CPCAS is not well developed and there is a gap in research regarding the technical and functional characteristics of the components.
6. Compressed air leaks are a major source of energy waste in CAS and therefore their detection and treatment is an essential step in reducing energy waste. Compressed air leak detection technologies face many challenges, including inability to operate during production, inaccuracy in sensors and noise from operating environments. Moreover, technologies such as ultrasound and acoustic leakage detection fail to effectively identify leaks from small openings [13]. Future research could further develop these technologies to improve accuracy, ease of use and range of applicability.
4 Modelling CAS Equipment
Compressed air systems consist of a supply side and a demand side. The supply side converts inlet air into compressed air, and typically includes equipment such as compressors, dryers, filters and coolers. The demand side delivers the required compressed air to end users and normally includes piping networks, controllers and end use equipment [14]. A variety of sub-components make up a CAS, and numerous system configurations are possible. In this section, a basic CAS consisting of a compressor, cooler, filter, storage tank and distribution network is considered. A schematic diagram of such a system is shown in Fig. 1.
Fig. 1. CAS configuration considered
Mathematical formulations that describe the performance of the sub-components were considered.
4.1 Compressor
According to [15], the power consumed by a compressor during normal operation can be modelled using (1).
$$P_{comp} = \frac{P_{\text{air in}}\,\dot{V}_{air}}{\eta_{Comp}} \times \frac{n}{n-1} \times \left[\left(\frac{P_{\text{air out}}}{P_{\text{air in}}}\right)^{\frac{n-1}{n}} - 1\right] \tag{1}$$
The term ‘η’ refers to the efficiency of the compressor; however, if other equipment is involved in the compression process (motor, gearbox, etc.), their efficiency should also be taken into account. Air was assumed to behave like an ideal gas, and the volumetric flow (capacity) was obtained from the mass flow at ambient inlet conditions using the ideal gas law shown in (2):
$$\dot{V}_{air} = \frac{\dot{m}_{\text{air in}} \times R \times T_{\text{air in}}}{P_{\text{air in}}} \tag{2}$$
It can be seen from the above equations that air inlet conditions, flow rate and compressor specifications (efficiency) are the main variables influencing compressor performance.
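Equations (1) and (2) can be evaluated directly, as in the minimal Python sketch below. The paper's models were implemented in MATLAB-Simulink, so this is an illustrative re-expression rather than the authors' code; the function names, the default gas constant for dry air and the example operating point are assumptions.

```python
def volumetric_flow(m_dot_air_in, T_air_in, P_air_in, R=287.05):
    """Eq. (2): volumetric flow [m^3/s] from mass flow [kg/s] via the ideal gas law.

    R defaults to the specific gas constant of dry air [J/(kg K)] (assumed value).
    """
    return m_dot_air_in * R * T_air_in / P_air_in


def compressor_power(V_dot_air, P_air_in, P_air_out, eta_comp, n=1.3):
    """Eq. (1): compressor input power [W] for polytropic compression.

    Pressures are absolute [Pa]; n is the polytropic coefficient (assumed default).
    """
    pressure_term = (P_air_out / P_air_in) ** ((n - 1.0) / n) - 1.0
    return (P_air_in * V_dot_air / eta_comp) * (n / (n - 1.0)) * pressure_term


if __name__ == "__main__":
    # Hypothetical operating point: 0.1 kg/s of air at 20 degC, 1 atm -> 8 bar(a).
    v_dot = volumetric_flow(0.1, 293.15, 101325.0)
    print(f"V_dot = {v_dot:.4f} m^3/s")
    print(f"P_comp = {compressor_power(v_dot, 101325.0, 8.0e5, eta_comp=0.7):.0f} W")
```

Keeping the two equations as separate functions mirrors the text: the inlet conditions enter through Eq. (2), and the compressor specification (efficiency, pressure ratio) enters through Eq. (1).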
4.2 Cooler
The mechanical compression of air causes an increase in its temperature. To reduce air temperature, coolers are typically installed after the final stage of compression. As air temperature decreases, water vapor in the air is condensed, separated, collected and drained from the system [14]. Anglani et al. stated in [15] that the heat content of compressed air can be estimated using (3).
$$Q_{CA} = \dot{m}_{CA} \times C_p \times (T_{CA} - T_{\text{air in}}) + \dot{m}_v h_v \tag{3}$$
The temperature of compressed air leaving the compressor depends on the air inlet temperature and on the compression ratio. It was estimated using (4).
$$T_{CA} = T_{\text{air in}} \times \left(\frac{P_{\text{air out}}}{P_{\text{air in}}}\right)^{\frac{n-1}{n}} \tag{4}$$
Water vapor mass flow rate was estimated using (5).
$$\dot{m}_v = U_r\,\rho_{s,in} - \rho_{s,out}\,\frac{1 - U_r \dfrac{\rho_{s,in}}{P_{\text{air in}}}}{1 - \dfrac{\rho_{s,out}}{P_{\text{air out}}}} \times \frac{T_{CA}}{T_{\text{air in}}} \times \frac{P_{\text{air in}}}{P_{\text{air out}}} \tag{5}$$
An important parameter to calculate is the compressed air temperature at the cooler outlet. Usually, the cooler's effectiveness, a ratio that measures the cooler's ability to transfer heat [16], is provided by the manufacturer. The effectiveness was obtained with (6).
$$\varepsilon_{cooler} = \frac{T_{h,i} - T_{h,o}}{T_{h,i} - T_{c,i}} \tag{6}$$
Equation (6) is valid when the mass flow rate of cooling fluid is larger than the mass flow rate of compressed air.
4.3 Filters
Compressed air usually contains particles, dust and oil (except for oil-free compressors). Removing these impurities is essential to provide improved compressed air quality and to reduce component wear. For this purpose, filters are fitted in a CAS. The air flow through the filter causes a pressure drop [15], which is usually specified by the filter manufacturer. Over time, the efficiency of the filter might deteriorate and the pressure drop might increase. The pressure of air exiting the filter was estimated using (7).
$$P_{f,out} = P_{f,in} - P_{f,drop} \tag{7}$$
Equation (7) indicates that the main variable describing filter performance is the pressure drop associated with it.
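As a small illustration of how Eqs. (4), (6) and (7) chain together, the Python sketch below computes the compressor discharge temperature, then the cooler outlet temperature by rearranging Eq. (6) for T_h,o, and finally the filter outlet pressure. The function names and the example values (effectiveness, coolant temperature, pressure drop) are assumptions for illustration only.

```python
def compressor_outlet_temperature(T_air_in, P_air_in, P_air_out, n=1.3):
    """Eq. (4): temperature [K] of air leaving the compressor."""
    return T_air_in * (P_air_out / P_air_in) ** ((n - 1.0) / n)


def cooler_outlet_temperature(T_h_in, T_c_in, effectiveness):
    """Eq. (6) rearranged: T_h,o = T_h,i - eps * (T_h,i - T_c,i)."""
    return T_h_in - effectiveness * (T_h_in - T_c_in)


def filter_outlet_pressure(P_f_in, P_f_drop):
    """Eq. (7): pressure [Pa] at the filter outlet."""
    return P_f_in - P_f_drop


if __name__ == "__main__":
    # Hypothetical values: 20 degC inlet air, 1 atm -> 8 bar(a), coolant at 25 degC.
    T_ca = compressor_outlet_temperature(293.15, 101325.0, 8.0e5)
    T_cooled = cooler_outlet_temperature(T_ca, 298.15, effectiveness=0.8)
    P_filtered = filter_outlet_pressure(8.0e5, P_f_drop=2.0e4)
    print(f"T_CA = {T_ca:.1f} K, cooler outlet = {T_cooled:.1f} K, "
          f"filter outlet = {P_filtered:.0f} Pa")
```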
4.4 Storage Tank
Kleiser and Rauth developed a dynamic model for the pressure and mass variation in the storage tank [17]. The variation of air mass inside the tank was obtained using the conservation of mass principle, shown in (8).
$$\frac{dm}{dt} = \dot{m}_{\text{Tank in}} - \dot{m}_{\text{Tank out}} \tag{8}$$
The mass of air at any point in time was estimated by integrating Eq. (8) with respect to time. The result of this integration is shown in (9).
$$m(t) = \int_0^t \left(\dot{m}_{\text{Tank in}} - \dot{m}_{\text{Tank out}}\right) dt + m_{\text{Tank initial}} \tag{9}$$
The instantaneous pressure of air inside the tank was then estimated using the ideal gas law equation in (10):
$$P(t) = \frac{m(t) \times R \times T_{tank}}{V_{tank}} \tag{10}$$
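The tank model in Eqs. (8) to (10) lends itself to a simple time-stepping sketch. The Python example below uses a fixed-step rectangle-rule (explicit Euler) integration, which is an assumption chosen for brevity and not necessarily the solver used in the MATLAB-Simulink implementation; the function name, the gas constant and the sample flows are likewise hypothetical. The tank temperature is held constant, as Eq. (10) implies.

```python
def simulate_tank(m_dot_in, m_dot_out, m_initial, T_tank, V_tank, dt, R=287.05):
    """Integrate Eq. (8) step by step and evaluate Eq. (10) after each step.

    m_dot_in, m_dot_out: sequences of mass flows [kg/s], one sample per step dt [s].
    Returns a list of (mass [kg], pressure [Pa]) tuples.
    """
    mass = m_initial
    history = []
    for inflow, outflow in zip(m_dot_in, m_dot_out):
        mass += (inflow - outflow) * dt        # Eq. (9), rectangle-rule step
        pressure = mass * R * T_tank / V_tank  # Eq. (10), ideal gas law
        history.append((mass, pressure))
    return history


if __name__ == "__main__":
    # Hypothetical case: 1 m^3 tank at 20 degC, net filling at 0.05 kg/s for 10 s.
    trace = simulate_tank([0.10] * 10, [0.05] * 10, m_initial=1.0,
                          T_tank=293.15, V_tank=1.0, dt=1.0)
    final_mass, final_pressure = trace[-1]
    print(f"mass = {final_mass:.2f} kg, pressure = {final_pressure:.0f} Pa")
```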
4.5 Pressure Drop in Pipes
Anglani et al. [15] estimated the pressure drop in a circular pipe with compressed air flowing through it using Darcy's formula shown in (11).
$$\Delta P = f \times \frac{\rho_{air} \times L \times V_{air}^{2}}{2 \times D} \tag{11}$$
L is the equivalent length of the pipe, which includes the actual length of the pipe and the equivalent length of curved parts, valves, etc. The coefficient ‘f’ represents the pipe friction factor, which depends on the flow characteristics. For a laminar flow (Reynolds number (Re) less than 4000) the friction factor was estimated using (12).
$$f = \frac{64}{Re} \tag{12}$$
For the case of turbulent flow (which is most common in CAS), the friction factor was estimated with the empirical Colebrook-White formula, which is shown in Eq. (13). In this equation, ε corresponds to the pipe absolute roughness, which depends on the material it is made of.
$$\frac{1}{\sqrt{f}} = -2 \times \log\left(\frac{\varepsilon / D}{3.7} + \frac{2.51}{Re\,\sqrt{f}}\right) \tag{13}$$
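Because Eq. (13) is implicit in f, it has to be solved numerically. The Python sketch below uses a plain fixed-point iteration on 1/√f, which is one common choice and an assumption here, not necessarily what the authors used. The laminar limit of 4000 follows the text above and is exposed as a parameter, since textbooks often place the end of the laminar regime lower (around Re ≈ 2300). Function names and example values are hypothetical.

```python
import math


def friction_factor(reynolds, rel_roughness, laminar_limit=4000.0,
                    tol=1e-10, max_iter=100):
    """Darcy friction factor: Eq. (12) below laminar_limit, Eq. (13) otherwise.

    rel_roughness is the ratio epsilon / D (dimensionless).
    """
    if reynolds < laminar_limit:
        return 64.0 / reynolds  # Eq. (12)
    x = 1.0 / math.sqrt(0.02)   # initial guess for 1/sqrt(f)
    for _ in range(max_iter):
        # Eq. (13) rewritten with x = 1/sqrt(f): x = -2 log10(eps/D/3.7 + 2.51 x / Re)
        x_new = -2.0 * math.log10(rel_roughness / 3.7 + 2.51 * x / reynolds)
        converged = abs(x_new - x) < tol
        x = x_new
        if converged:
            break
    return 1.0 / (x * x)


def pressure_drop(f, rho_air, length, velocity, diameter):
    """Eq. (11): Darcy pressure drop [Pa] over an equivalent pipe length [m]."""
    return f * rho_air * length * velocity ** 2 / (2.0 * diameter)


if __name__ == "__main__":
    # Hypothetical pipe: Re = 1e5, relative roughness 1e-4, 50 m long, 50 mm bore.
    f = friction_factor(1.0e5, 1.0e-4)
    print(f"f = {f:.5f}, dP = {pressure_drop(f, 9.5, 50.0, 6.0, 0.05):.0f} Pa")
```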
4.6 Combining the Models
The individual models described in Sects. 4.1 to 4.5 were implemented in MATLAB-Simulink. To properly describe the overall performance, these models are being combined to create a single integrated CAS model. The next stage of the research will investigate the development of an integrated CAS model and a possible coupling with a model of a heating, ventilation and air conditioning (HVAC) system.
5 Discussion
The authors aim to address all six of the identified research gaps. In response to the first gap mentioned in Sect. 3, a methodology is being created to support decisions about suitable measures to save energy in CAS, which may result in a useful and innovative tool for improved energy efficiency. The second gap was that most models in the literature focus on either the supply or the demand side of compressed air. The research described in this paper has so far investigated modelling individual CAS components. Future research may investigate integrating other plant energy usage (HVAC, lighting, etc.) with CAS, to obtain a model that describes a plant's total energy consumption. The third gap emphasized the need for smart energy management technologies to assist in automatic malfunction detection. This research will investigate machine learning techniques for optimizing the performance of CAS. An energy management system that detects abnormalities in energy performance, associates the abnormalities with suitable causes and sets up a troubleshooting procedure will be investigated. Previous research on smart energy management and machine learning technologies for CAS was not validated experimentally. Currently, an experimental set-up is being developed at the University of Portsmouth. The aim is to investigate control, sensors, actuators and data processing tools that enable information exchange and subsequent performance optimization. This research will investigate validating intelligent energy management tools experimentally, which might also help address the gap relating to CPCAS. Finally, leak detection and treatment is an essential step in reducing waste in CAS. Technologies currently in use face several challenges, such as inability to operate during production, inaccuracy in sensors and noise from operating environments.
Research into this area will investigate the creation of new intelligent leak detection techniques. The modelling completed so far has provided an understanding of the equations and variables describing CAS components and processes. The main variables describing the performance of CAS will be investigated to discover their effects. This will provide an understanding of the data required to describe the current state of a CAS, and that data can be used in the creation of new algorithms capable of detecting energy performance abnormalities. The algorithms will also associate each abnormality with its possible causes and will suggest the most probable one. Such systems may also automate the detection of faults and the identification of energy efficiency measures, which may be an innovative tool in CAS
performance management. In that way, innovative intelligent methods will be created that reduce energy consumption in CAS. In the future, research at the University of Portsmouth will combine ambient sensing information with AI and KM in real time. This should increase the efficiency of energy-intensive manufacture. The work in this paper is a first step towards creating the AI systems that will make automated decisions. A KM system will help to provide human operators with advice on how to maintain productivity and reduce energy usage. AI will make sense of the ambient data and then act automatically, for example to increase pressure or shut down a compressor if a leak is detected.
6 Conclusions
The initial stages of the research have consisted of identifying research gaps, defining problem characteristics and studying compressed air and building energy consumption. A preliminary literature review was completed. Mathematical models for CAS components were created and implemented in MATLAB. This is the first step in creating innovative intelligent methods to reduce energy consumption in CAS and so increase the efficiency of energy-intensive manufacture. Future work aims to use AI and ambient sensing to evaluate and monitor performance, use KM to decrease energy use, and construct a collaborative interface for the human operators. The models provided an understanding of the critical variables that influence CAS performance. This is a first step towards determining the type of data required to build AI algorithms to reduce energy use and maintain productivity. AI will then be combined with KM, which will advise human operators on any actions required. Some decisions might be made automatically, for example for safety reasons, or to increase pressure or shut down a compressor if a leak is detected. Next steps will include collecting data from real CAS. This data will be analysed and the possibility of using pattern recognition to improve energy efficiency will be evaluated. In addition, CAS modelling will be further investigated and the possibility of combining CAS with an HVAC model will be evaluated. The approach might enable the creation of new tools and methods to save energy in industrial facilities. Future work will need to investigate AI [18–25] and decision making. The decision making research will include the Analytical Hierarchy Process (AHP) and the Preference Ranking Organization METHod for Enrichment of Evaluation (PROMETHEE) [26–33], and uncertainty will probably be represented using probability functions, fuzzy set theory and percentages.
Acknowledgment. Research in this paper was funded by the DTA3/COFUND Marie Skłodowska-Curie PhD Fellowship programme partly funded by the Horizon 2020 European Programme.
Nomenclature
$P_{comp}$: Compressor input power
$P_{\text{air in}}$: Absolute air pressure at compressor inlet
$\dot{V}_{air}$: Volumetric flow rate of air
$\eta$: Compressor efficiency (including drive system)
$P_{\text{air out}}$: Absolute air pressure at compressor outlet
$n$: Polytropic coefficient
$\dot{m}_{\text{air in}}$: Air mass flow at compressor inlet
$T_{\text{air in}}$: Temperature of air at compressor inlet
$R$: Gas constant
$Q_{CA}$: Heat content of compressed air
$C_p$: Air specific heat
$T_{CA}$: Temperature of compressed air at compressor outlet
$\dot{m}_v$: Water vapor mass flow rate
$h_v$: Latent heat of condensation for water vapor
$U_r$: Relative humidity
$\rho_{s,in}$: Water vapor content in saturation at compressor inlet conditions
$\rho_{s,out}$: Water vapor content in saturation at compressor outlet conditions
$\varepsilon_{cooler}$: Cooler heat transfer effectiveness
$T_{h,i}$: Temperature of hot fluid entering the cooler
$T_{h,o}$: Temperature of hot fluid exiting the cooler
$T_{c,i}$: Temperature of cold fluid entering the cooler
$P_{f,out}$: Pressure of air at filter outlet
$P_{f,in}$: Pressure of air at filter inlet
$P_{f,drop}$: Pressure drop in filter
$\dot{m}_{\text{Tank in}}$: Air mass flow rate at storage tank inlet
$\dot{m}_{\text{Tank out}}$: Air mass flow rate at storage tank outlet
$m_{\text{Tank initial}}$: Initial mass of air in storage tank
$T_{tank}$: Temperature of air in storage tank
$V_{tank}$: Volume of storage tank
$f$: Pipe friction factor
$\rho_{air}$: Air density
$L$: Pipe length
$D$: Pipe diameter
$V_{air}$: Air velocity in pipe
$Re$: Reynolds number
$\varepsilon$: Pipe roughness
References 1. Sanders, D.A., Robinson, D.C., Hassan, M., Haddad, M., Gegov, A., Ahmed, N.: Making decisions about saving energy in compressed air systems using ambient intelligence and artificial intelligence. Adv. Intell. Syst. Comput. 869(September), 1229–1236 (2018) 2. Thabet, M., Sanders, D., Beccera, V., Tewkesbury, G., Haddad, M., Barker, T.: Intelligent energy management of compressed air systems. In: IEEE Proceedings of 10th International Conference on Intelligent Systems, Varna, Bulgaria (2020, in press) 3. Saidur, R., Rahim, N.A., Hasanuzzaman, M.: A review on compressed-air energy use and energy savings. Renew. Sustain. Energy Rev. 14(4), 1135–1153 (2010)
4. Fridén, H., Bergfors, L., Björk, A., Mazharsolook, E.: Energy and LCC optimised design of compressed air systems: a mixed integer optimisation approach with general applicability. In: Proceedings of 2012 14th International Conference Model Simulation, UKSim, pp. 491–496 (2012) 5. Murphy, S., Kissock, K.: Simulating energy efficient control of multiple-compressor compressed air systems. In: Proceedings of Ind Energy Technology Conference (2015) 6. Benedetti, M., Cesarotti, V., Introna, V., Serranti, J.: Energy consumption control automation using artificial neural networks and adaptive algorithms: proposal of a new methodology and case study. Appl. Energy 165, 60–71 (2016) 7. Bonfá, F., Benedetti, M., Ubertini, S., Introna, V., Santolamazza, A.: New efficiency opportunities arising from intelligent real time control tools applications: the case of compressed air systems’ energy efficiency in production and use. Energy Procedia 158, 4198–4203 (2019) 8. Santolamazza, A., Cesarotti, V., Introna, V.: Anomaly detection in energy consumption for condition-based maintenance of compressed air generation systems: an approach based on artificial neural networks. IFAC-PapersOnLine 51(11), 1131–1136 (2018) 9. Santolamazza, A., Cesarotti, V., Introna, V.: Evaluation of machine learning techniques to enact energy consumption control of compressed air generation in production plants. Proc. Summer Sch. Fr. Turco. 2004, 79–86 (2018) 10. Boehm, R., Franke, J.: Demand-side-management by flexible generation of compressed air. Procedia CIRP 63, 195–200 (2017) 11. Ghorbanian, K., Gholamrezaei, M.: An artificial neural network approach to compressor performance prediction. Appl. Energy 86(7–8), 1210–1221 (2009) 12. Nehler, T., Parra, R., Thollander, P.: Implementation of energy efficiency measures in compressed air systems: barriers, drivers and non-energy benefits. Energy Effi. 11(5), 1281–1302 (2018) 13. Dudić, S., Ignjatović, I., Šešlija, D., Blagojević, V., Stojiljković, M.: Leakage quantification of compressed air using ultrasound and infrared thermography. Meas. J. Int. Meas. Confed. 45(7), 1689–1694 (2012) 14. Berkeley, L.: Compressed air: a sourcebook for industry, pp. 1–128 (2003) 15. Anglani, N., Bossi, M., Quartarone, G.: Energy conversion systems: the case study of compressed air, an introduction to a new simulation toolbox. In: 2012 IEEE International Energy Conference Exhibition ENERGYCON 2012, pp. 32–38 (2012) 16. Bergman, T., Lavine, A., Incropera, F., Dewitt, D.: Fundamentals of Heat and Mass Transfer, 1076 p. Wiley (2011) 17. Kleiser, G., Rauth, V.: Dynamic modelling of compressed air energy storage for small-scale industry applications. Int. J. Energy Eng. 3(3), 127–137 (2013) 18. Sanders, D., Gegov, A., Ndzi, D.: Knowledge-based expert system using a set of rules to assist a tele-operated mobile robot. In: Bi, Y., Kapoor, S., Bhatia, R. (eds.) Studies in Computational Intelligence, vol. 751, pp. 371–392. Springer (2018) 19. Sanders, D., Sanders, H., Gegov, A., Ndzi, D.: Rule-based system to assist a tele-operator with driving a mobile robot. In: Lecture Notes in Networks and Systems, vol. 16, pp. 599–615. Springer (2018) 20. Sanders, D., Okonor, O.M., Langner, M., Hassan Sayed, M., Khaustov, S.A., Omoarebun, P.: Using a simple expert system to assist a powered wheelchair user. In: Bi, Y., Bhatia, R., Kapoor, S. (eds.) Advances in Intelligent Systems and Computing, vol. 1037, pp. 662–679. Springer (2019) 21. Gegov, A., Gobalakrishnan, N., Sanders, D.A.: Rule base compression in fuzzy systems by filtration of non-monotonic rules. J. Intell. Fuzzy Syst. 27(4), 2029–2043 (2014)
Management of Compressed Air to Reduce Energy Consumption
217
22. Sanders, D., Gegov, A., Haddad, M., Ikwan, F., Wiltshire, D., Tan, Y.C.: A rule-based expert system to decide on direction and speed of a powered wheelchair. In: IEEE Proceedings of the SAI Conference on IntelliSys, London, U.K., pp. 426–433 (2018) 23. Sanders, D., Gegov, A., Haddad, M., Ikwan, F., Wiltshire, D., Tan, Y.C.: A rule-based expert system to decide on direction and speed of a powered wheelchair. In: Arai, K., Kapoor, S., Bhatia, R. (eds.) Advances in Intelligent Systems and Computing, vol. 868, pp. 822–838. Springer (2019) 24. Sanders, D.: Recognizing shipbuilding parts using artificial neural networks and Fourier descriptors. Proc. Inst. Mech. Eng. Part B – J. Eng. Manuf. 223(3), 337–342 (2009) 25. Sanders, D.: Using self-reliance factors to decide how to share control between human powered wheelchair drivers and ultrasonic sensors. IEEE Trans. Neural Syst. Rehabil. Eng. 25(8), 1221–1229 (2017) 26. Haddad, M., Sanders, D., Bausch, N., Tewkesburyvv, G., Gegov, A., Hassan Sayed M.: Learning to make intelligent decisions using an expert system for the intelligent selection of either PROMETHEE II or the analytical hierarchy process. In: Arai, K., Kapoor, S., Bhatia, R. (eds.) Advances in Intelligent Systems and Computing, vol. 868, pp. 1303–1316. Springer (2019) 27. Haddad, M.J.M., Sanders, D., Gegov, A., Hassan Sayed, M., Huang, Y., Al-Mosawi, M.: Combining multiple criteria decision making with vector manipulation to decide on the direction for a powered wheelchair. In: Bi, Y., Bhatia, R., Kapoor, S. (eds.) Advances in Intelligent Systems and Computing, vol. 1037, pp. 680–693. Springer (2019) 28. Haddad, M.J.M., Sanders, D., Tewkesbury, G., Gegov, A., Hassan Sayed, M., Ikwan, F.: Initial results from using preference ranking organization METHods for enrichment of evaluations to help steer a powered wheelchair. In: Bi, Y., Bhatia, R., Kapoor, S. (eds.) Advances in Intelligent Systems and Computing, vol. 1037, pp. 648–661. Springer (2019) 29. 
Sanders, D., Robinson, D.C., Hassan Sayed, M., Haddad, M.J.M., Gegov, A., Ahmed, N.: Making decisions about saving energy in compressed air systems using ambient intelligence and artificial intelligence. In: Arai, K., Kapoor, S., Bhatia, R. (eds.) Advances in Intelligent Systems and Computing, vol. 869, pp. 1229–1236. Springer (2018) 30. Haddad, M., Sanders, D., Bausch, N.: Selecting a robust decision making method to evaluate employee performance. Int. J. Manag. Decis. Mak. 8(4), 333–351 (2019) 31. Haddad, M., Sanders, D.: The behavior of three discrete multiple criteria decision making methods in the presence of uncertainty. Oper. Res. Perspect. (to be published) 32. Haddad, M.J.M., Sanders, D., Bausch, N.: Selecting a robust decision making method to evaluate employee performance. Int. J. Manag. Decis. Mak. 18(4), 333–351 (2019) 33. Haddad, M.J.M., Sanders, D.: Selecting a best compromise direction for a powered wheelchair using PROMETHEE. IEEE Trans. Neural Syst. Rehabil. Eng. 27(2), 228–235. https://doi.org/ 10.1109/TNSRE.2019.2892587
Multi-platform Mission Planning Based on Distributed Planning Method Yang Guo(B) and Shao-chi Cheng PLA Academy of Military Science, Beijing 10091, China [email protected]
Abstract. At present, modules such as sensors deployed on an unmanned platform are strongly coupled with the unmanned platform. This means that these modules can only be used by the unmanned platform alone. The disadvantage of this model is the low efficiency of resource utilization. Multi-agent systems consisting of unmanned platforms are booming. How to achieve efficient allocation of tasks and resources is a key problem that multi-agent systems need to solve. In order to solve this problem, the idea of decoupling modules such as sensors from unmanned platforms is proposed by referring to the development of software defined radio technology. In this way, the resources of one platform can be used for multiple tasks at the same time. Based on this, a distributed task planning algorithm is designed with the goal of shortening the task completion time. Simulation experiments verify the effectiveness of the distributed task planning method. The results show that the distributed task planning method can improve resource utilization efficiency and shorten task completion time.
Keywords: Multi-agent system · Resource allocation · Heuristic algorithm · Distributed planning · Critical path
1 Introduction
Scientific allocation of platforms to optimize mission objectives is the core issue of mission planning [1, 2]. The problem can be described simply: a task needs certain resources in order to be carried out, and platforms provide these resources. Mission planning is therefore essentially a task-platform allocation problem under given constraints. With the development of unmanned technology, more unmanned platforms will replace human beings in performing various tasks. These platforms can cooperate with each other, forming a multi-agent system that jointly completes a specified task, and the resource allocation problem involved is worth studying.
Levchuk [3] proposed a multi-dimensional dynamic list scheduling (MDLS) algorithm, together with a mathematical model, to minimize the mission time. However, the model assumes that a platform can serve only a single task at a time and that the platform must reach the task location to provide its services. With the development of technology, the capabilities of modules such as communications equipment installed on unmanned platforms have improved significantly. Take the communication service scenario as an example. On the one hand, the service radius of a platform has been greatly expanded, so the platform can serve users who are far away from it. On the other hand, modules such as communications equipment on older platforms are tightly coupled to the platform and can only be controlled by the platform to which they belong. With the development of technologies such as interoperability [4], however, communication modules can be decoupled from the platform, and different modules on the same platform can serve different tasks at the same time. One platform can thus process multiple tasks simultaneously, which both improves the flexibility of the service and shortens the mission time.
In this paper, we propose a distributed multi-platform task planning method in which a platform can serve multiple tasks at the same time because its resources are decoupled from it. The effectiveness of the distributed collaboration framework is verified by simulation experiments.
© Springer Nature Switzerland AG 2021 K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 218–228, 2021. https://doi.org/10.1007/978-3-030-55190-2_17
2 Distributed Planning Framework
2.1 Mission Structure
The elements involved in distributed mission planning are resources, tasks and platforms. They are defined as follows.
Definition 1: Resources are the indivisible basic units of communication services in service activities. Different types of resources have different capabilities.
Definition 2: A task is an activity that entails the use of relevant resources to accomplish the mission objectives. We characterize task T_i by specifying the following attributes:
(1) task ID i, i = 1, …, N, where N is the number of tasks.
(2) geographical constraint that specifies the location (x_i, y_i) of task T_i.
(3) resource requirement vector [R_i1, R_i2, …, R_iL], where R_il is the number of units of resource type l required for processing task T_i (l = 1, …, L, where L is the number of resource types).
(4) processing time t_i of task T_i.
The example used in this study is modified from the example in Reference [1]. There are 18 tasks in the mission (N = 18), and the dependency diagram that details the interrelationships between these tasks is shown in Fig. 1. A task can be started only after all its direct predecessors have been completed. For example, Task 9 becomes ready only when Task 1, Task 3 and Task 7 have all been completed. In addition, eight types of resources are required to accomplish these tasks (L = 8). The task parameters are shown in Table 1.
Definition 3: A platform is the basic unit that provides resource capabilities and is used to process tasks. We characterize platform P_j by specifying the following attributes:
Fig. 1. Task dependency graph.
Table 1. Illustration of task parameters.
Task ID | Location | Resource requirement (types 1–8) | Processing time
1 | (70,15) | 5 3 10 0 0 8 0 6 | 30
2 | (64,75) | 5 3 10 0 0 8 0 6 | 30
3 | (15,40) | 0 3 0 0 0 0 0 0 | 10
4 | (30,95) | 0 3 0 0 0 0 0 0 | 10
5 | (28,73) | 0 3 0 0 0 0 10 0 | 10
6 | (24,60) | 0 0 0 10 14 12 0 0 | 10
7 | (28,73) | 0 0 0 10 14 12 0 0 | 10
8 | (28,83) | 0 0 0 10 14 12 0 0 | 10
9 | (28,73) | 5 0 0 0 0 5 0 0 | 10
10 | (28,83) | 5 0 0 0 0 5 0 0 | 10
11 | (25,45) | 0 0 0 0 0 10 5 0 | 10
12 | (5,95) | 0 0 0 0 0 10 5 0 | 10
13 | (25,45) | 0 0 0 0 0 8 0 6 | 20
14 | (5,95) | 0 0 0 0 0 8 0 6 | 20
15 | (25,45) | 0 0 0 20 10 4 0 0 | 15
16 | (5,95) | 0 0 0 20 10 4 0 0 | 15
17 | (5,60) | 0 0 0 0 0 8 0 4 | 10
18 | (5,60) | 0 0 0 8 6 0 4 10 | 20
(1) platform ID j, j = 1, …, K, where K is the number of platforms.
(2) geographical constraint that specifies the location (x_j, y_j) of platform P_j.
(3) resource capability vector [r_j1, r_j2, …, r_jL], where r_jl is the number of units of resource type l available on platform P_j (l = 1, …, L, where L is the number of resource types).
(4) average velocity v_j of platform P_j.
(5) service radius R_j of platform P_j.
There are 20 platforms in this model (K = 20). Unlike the example in Reference [3], we add the service radius of the platform to the model, which brings it closer to the real situation. For simplicity, we assume that the initial position and the service radius R of all platforms are the same. The platform parameters are listed in Table 2.
Table 2. Illustration of platform parameters.
Platform ID | Initial location | Resource capability (types 1–8) | Velocity | Service radius
1 | (100,0) | 10 10 1 0 9 5 0 0 | 2 | R
2 | (100,0) | 1 4 10 0 4 3 0 0 | 2 | R
3 | (100,0) | 10 10 1 0 9 2 0 0 | 2 | R
4 | (100,0) | 0 0 0 2 0 0 5 0 | 4 | R
5 | (100,0) | 1 0 0 10 2 2 1 0 | 1.35 | R
6 | (100,0) | 5 0 0 0 0 0 0 0 | 4 | R
7 | (100,0) | 3 4 0 0 6 10 1 0 | 4 | R
8 | (100,0) | 1 3 0 0 10 8 1 0 | 4 | R
9 | (100,0) | 1 3 0 0 10 8 1 0 | 4 | R
10 | (100,0) | 1 3 0 0 10 8 1 0 | 4 | R
11 | (100,0) | 6 1 0 0 1 1 0 0 | 4.5 | R
12 | (100,0) | 6 1 0 0 1 1 0 0 | 4.5 | R
13 | (100,0) | 6 1 0 0 1 1 0 0 | 4.5 | R
14 | (100,0) | 0 0 0 0 0 0 10 0 | 2 | R
15 | (100,0) | 0 0 0 0 0 0 0 6 | 5 | R
16 | (100,0) | 0 0 0 0 0 0 0 6 | 7 | R
17 | (100,0) | 0 0 0 6 6 0 1 10 | 2.5 | R
18 | (100,0) | 1 0 0 10 2 2 1 0 | 1.35 | R
19 | (100,0) | 1 0 0 10 2 2 1 0 | 1.35 | R
20 | (100,0) | 1 0 0 10 2 2 1 0 | 1.35 | R
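The task and platform attributes of Definitions 2 and 3 map naturally onto two small record types. The following is a minimal sketch, not the authors' implementation; the class and field names are hypothetical, and the sample rows are copied from Table 1 and Table 2 with the service radius set to 60, the value used for illustration in Section 3.

```python
from dataclasses import dataclass
from typing import Tuple

NUM_RESOURCE_TYPES = 8  # L = 8 in the example of this paper


@dataclass(frozen=True)
class Task:
    """Task T_i of Definition 2 (field names are illustrative assumptions)."""
    task_id: int
    location: Tuple[float, float]   # (x_i, y_i)
    requirement: Tuple[int, ...]    # [R_i1, ..., R_iL]
    processing_time: float          # t_i

    def __post_init__(self):
        if len(self.requirement) != NUM_RESOURCE_TYPES:
            raise ValueError("requirement vector must have L entries")


@dataclass
class Platform:
    """Platform P_j of Definition 3 (field names are illustrative assumptions)."""
    platform_id: int
    location: Tuple[float, float]   # (x_j, y_j), changes as the platform moves
    capability: Tuple[int, ...]     # [r_j1, ..., r_jL]
    velocity: float                 # v_j
    service_radius: float           # R_j

    def __post_init__(self):
        if len(self.capability) != NUM_RESOURCE_TYPES:
            raise ValueError("capability vector must have L entries")


# Sample rows taken from Table 1 and Table 2.
R = 60.0  # common service radius, an experiment parameter
task_1 = Task(1, (70, 15), (5, 3, 10, 0, 0, 8, 0, 6), 30)
task_9 = Task(9, (28, 73), (5, 0, 0, 0, 0, 5, 0, 0), 10)
platform_4 = Platform(4, (100, 0), (0, 0, 0, 2, 0, 0, 5, 0), 4.0, R)
platform_5 = Platform(5, (100, 0), (1, 0, 0, 10, 2, 2, 1, 0), 1.35, R)
```

Keeping the task record immutable and the platform record mutable reflects the model: task parameters are fixed inputs, whereas a platform's position changes during the mission.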
2.2 Distributed Collaboration Framework
The objective of our model is to minimize the mission completion time. The following notation is used in the subsequent sections:
(1) READY is the set of all tasks that are eligible for execution.
(2) FREE is the set of free platforms.
(3) CP is the critical path of a task.
The distributed collaboration framework designed in this paper consists of the following steps.
Task Selection. A task enters the READY set when all its direct predecessors have been completed. The critical path (CP) algorithm [5, 6] is then used to select the task with the highest priority from the READY set. Many task allocation schemes are built on the CP algorithm, the idea being that the critical path determines the shortest possible execution time of the mission. The CP of each task is calculated from the task precedence graph (Fig. 1) and the task processing times. In this step, the ready task with the largest CP is selected. When several tasks have equal CPs, the task with the largest number of direct successors is chosen.
Platform Selection. In this step, a group of platforms is chosen to process the selected task. Two factors guide the selection. On the one hand, the resources consumed by the selected task should affect the processing of the other tasks in the READY set as little as possible. On the other hand, the platforms that can reach the task location soonest should be chosen, to minimize the completion time of the selected task. Each platform is assigned a coefficient V, and assignments are made in ascending order of V. The coefficient V is computed as:
$$V_1 = s_{l_m} + t_{l_m} + \frac{d_{l_m,i}}{v_m} \qquad (1)$$
$$V_2 = \frac{B(m,i)}{BR(m) - B(m,i)} \qquad (2)$$
$$V = V_1 \cdot V_2 \qquad (3)$$
The parameters are as follows.
(1) l_m is the last task processed on platform m.
(2) i is the next task selected for execution on platform m.
(3) s_{l_m} is the start time of task l_m.
(4) t_{l_m} is the processing time of task l_m.
(5) d_{l_m,i} is the distance between the location of task l_m and the location of task i.
(6) v_m is the average speed of platform m.
(7) B(m, i) is the amount of resources from platform m used to process task i.
(8) BR(m) is $\sum_{i \in \mathrm{READY}} B(m, i)$.
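Equations (1) to (3) can be evaluated directly once the quantities in the parameter list are known. The sketch below follows the equations as printed; the function name and argument names are hypothetical. The paper does not say what happens when BR(m) equals B(m, i), so returning infinity in that case is purely an assumption made to keep the function total.

```python
import math


def coefficient_v(s_lm, t_lm, loc_lm, loc_i, v_m, b_mi, br_m):
    """Platform selection coefficient V = V1 * V2, Eqs. (1)-(3).

    s_lm, t_lm : start and processing time of the last task l_m on platform m
    loc_lm     : location of task l_m
    loc_i      : location of the selected task i
    v_m        : average speed of platform m
    b_mi       : B(m, i), resources of platform m used for task i
    br_m       : BR(m), sum of B(m, i) over all tasks in READY
    """
    surplus = br_m - b_mi
    if surplus <= 0:
        # Assumption: the source does not define V2 for a zero denominator.
        return math.inf
    d = math.dist(loc_lm, loc_i)          # d_{l_m, i}
    v1 = s_lm + t_lm + d / v_m            # Eq. (1)
    v2 = b_mi / surplus                   # Eq. (2)
    return v1 * v2                        # Eq. (3)


# Illustrative numbers only: last task started at 0, took 10 time units,
# and the next task is 5 distance units away for a platform of speed 1.
v = coefficient_v(s_lm=0, t_lm=10, loc_lm=(0, 0), loc_i=(3, 4),
                  v_m=1.0, b_mi=2, br_m=6)
```

With these illustrative values V1 is 0 + 10 + 5/1 = 15 and V2 is 2/(6 − 2) = 0.5, so V is 7.5. Candidate platforms would then be sorted by this value in ascending order.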
When a task is completed, all the platforms that were processing it become free (enter the FREE set), and all the tasks for which it was the last predecessor to be completed become ready (enter the READY set). Then, if the READY set contains a task whose requirement vector is component-wise less than or equal to the aggregated capability vector of the FREE set, an assignment can be made. Otherwise, the next completion time is considered.
In the initial stage of task planning, the platforms in the FREE set can meet the resource requirements of the selected task. At this stage, a platform is removed from the FREE set as soon as it is assigned to a task, so each selected platform serves only a single task. The platform's moving path is designed as follows: the position of the platform and the position of the target are connected by a straight line, and the platform moves along this line until it is at distance R (the service radius of the platform) from the target, where it processes the task, as shown in Fig. 2. When the platforms in the FREE set cannot meet the resource requirements of the selected task, the distributed task planning mechanism is activated.
Fig. 2. The moving path of selected platform before the distributed planning mechanism is activated.
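The two checks described above, namely whether the FREE set can cover a task's requirement and where a platform stops on the straight line towards the task, can be sketched as follows. The function names are hypothetical, and leaving a platform where it is when it already lies within the service radius is an assumption that the text does not state explicitly.

```python
import math


def can_assign(free_capabilities, requirement):
    """True if the aggregated capability of the FREE set covers the task.

    free_capabilities: list of capability vectors, one per free platform.
    requirement: requirement vector of the candidate task.
    """
    total = [sum(cap[k] for cap in free_capabilities)
             for k in range(len(requirement))]
    return all(t >= r for t, r in zip(total, requirement))


def stop_position(platform_xy, task_xy, radius):
    """Point on the straight line platform -> task at distance `radius` from the task."""
    px, py = platform_xy
    tx, ty = task_xy
    d = math.hypot(tx - px, ty - py)
    if d <= radius:
        # Assumption: a platform already within range does not move.
        return (px, py)
    f = (d - radius) / d              # fraction of the line actually travelled
    return (px + f * (tx - px), py + f * (ty - py))


def travel_time(platform_xy, task_xy, radius, velocity):
    """Time needed to reach the stop position at constant average velocity."""
    sx, sy = stop_position(platform_xy, task_xy, radius)
    return math.hypot(sx - platform_xy[0], sy - platform_xy[1]) / velocity
```

For example, a platform at (0, 0) with service radius 4 heading for a task at (10, 0) stops at (6, 0); at velocity 2 it needs 3 time units to get there.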
Distributed Planning Mechanism. The distributed planning mechanism in this model has two aspects. First, resources such as communication equipment are no longer tightly coupled with the platform: the right to use a module is not limited to the platform on which it is deployed. With the development of interoperability technology, one platform can use modules deployed on other platforms, so the resources deployed on all platforms in different regions can be fully used. Second, we borrow the idea of software-defined networks [7–10]: the resources of a specific module can be finely divided and used. As a result, a task that uses only part of a module's resources no longer prevents other tasks from using the remainder at the same time. Different platforms can use exactly the module resources they need at the same time, which further promotes the efficient use of platform resources. These ideas are illustrated in Fig. 3.
Fig. 3. Schematic diagram of distributed task planning mechanism.
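The fine-grained division of resources can be illustrated with a small bookkeeping routine that draws units of each resource type from whatever the platforms still have available. This is a sketch under simplifying assumptions: platforms are visited in order of their ID rather than by the coefficient V, geographic reachability is ignored, and the function name is hypothetical.

```python
def allocate_fine_grained(remaining, requirement):
    """Draw the required units from the platforms' remaining resources.

    remaining: dict platform_id -> list of remaining units per resource type
               (updated in place only if the whole requirement can be met).
    requirement: list of required units per resource type.
    Returns dict platform_id -> units taken per type, or None if infeasible.
    """
    num_types = len(requirement)
    plan = {}
    for k in range(num_types):
        need = requirement[k]
        for pid in sorted(remaining):        # assumption: ID order, not V order
            if need == 0:
                break
            take = min(need, remaining[pid][k])
            if take > 0:
                plan.setdefault(pid, [0] * num_types)[k] = take
                need -= take
        if need > 0:
            return None                      # requirement cannot be covered
    for pid, taken in plan.items():          # commit the allocation
        for k in range(num_types):
            remaining[pid][k] -= taken[k]
    return plan


# Two platforms, two resource types (illustrative numbers only).
stock = {1: [10, 0], 2: [5, 6]}
first = allocate_fine_grained(stock, [12, 4])   # uses both platforms
second = allocate_fine_grained(stock, [3, 2])   # served from platform 2's remainder
third = allocate_fine_grained(stock, [1, 0])    # nothing left of type 1
```

In this example platform 2 contributes to two tasks at once, which is exactly the situation the mechanism is meant to permit: the part of its stock not used by the first task remains available to the second.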
According to the above ideas, the following algorithm is designed in this model.
Step 1: Initialization. In the beginning, all tasks without a direct predecessor constitute the READY set. The FREE set includes all platforms.
Step 2: Task selection. The critical path algorithm is used to select the task with the highest priority from the READY set. The selected task is then removed from the READY set.
Step 3: Platform group selection. For the selected task, a group of platforms is selected from the FREE set according to the coefficient V. By default, a single platform can perform only one task at a time, and the selected platforms are removed from the FREE set. When all the resources required by the task have reached the specified location, the task starts. The start and end times of the task are recorded, together with the resources of each platform used by the task. When the task ends, the platform resources it occupied become free and enter the FREE set again.
Step 4: Determine whether the distributed planning mechanism needs to be activated. When the remaining platforms in the FREE set cannot meet the resource requirements of the selected task, the distributed task planning mechanism is activated and the composition of the FREE set is adjusted. The "New" FREE set is composed of all the platforms with remaining resources, and each platform's remaining resources are finely divided to serve different tasks. This means that a single platform can process two or more tasks simultaneously. The platforms that process the selected task are chosen from the "New" FREE set.
Step 5: Update data. Update the task completion time, the READY set and the FREE set. Then repeat the above process until all tasks are completed.
The distributed multi-platform mission planning algorithm is illustrated in Fig. 4.
Fig. 4. Distributed multi-platform mission planning algorithm
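Step 2 of the algorithm relies on critical-path priorities. A common way to compute them, assumed here, is to take the CP of a task as its own processing time plus the largest CP among its direct successors. The sketch below applies this to a fragment of the mission only: the text states that Task 9 depends on Tasks 1, 3 and 7, and the processing times come from Table 1. The full graph of Fig. 1 is not reproduced, and breaking remaining ties by the smaller task ID is an assumption added for determinism.

```python
def critical_paths(proc_time, successors):
    """CP of each task: own processing time plus the longest chain after it."""
    cp = {}

    def visit(task):
        if task not in cp:
            tail = max((visit(s) for s in successors.get(task, ())), default=0)
            cp[task] = proc_time[task] + tail
        return cp[task]

    for task in proc_time:
        visit(task)
    return cp


def select_task(ready, cp, successors):
    """Largest CP first; ties go to the task with more direct successors."""
    return max(ready,
               key=lambda t: (cp[t], len(successors.get(t, ())), -t))


# Fragment of the mission: Tasks 1, 3 and 7 precede Task 9 (times from Table 1).
proc_time = {1: 30, 3: 10, 7: 10, 9: 10}
successors = {1: [9], 3: [9], 7: [9]}

cp = critical_paths(proc_time, successors)
chosen = select_task([1, 3, 7], cp, successors)
```

In this fragment Task 9 has a CP of 10, Task 1 a CP of 40, and Tasks 3 and 7 a CP of 20 each, so Task 1 is selected first from a READY set containing Tasks 1, 3 and 7.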
3 Results and Discussion
In this section, an example is presented to verify the effectiveness of the distributed task planning mechanism. For simplicity, we assume that a single platform can process at most two tasks simultaneously. Whether a platform that is processing one task can serve another task at the same time depends on the distance between the locations of the two tasks. There are three situations:
(1) When the distance between the locations of the two tasks is more than twice the service radius of the platform, the platform cannot process both tasks at the same time.
(2) When the distance between the locations of the two tasks is exactly twice the service radius of the platform, the platform can process both tasks at the same time, provided that it moves to the midpoint of the line connecting the two task locations.
(3) When the distance between the locations of the two tasks is less than twice the service radius of the platform, the platform can also process both tasks at the same time. However, whether the platform needs to move depends on its position relative to the two tasks, as shown in Fig. 5. When the platform is inside the shaded area (such as location 2 in Fig. 5), it can handle both tasks without moving. When it is outside the shaded area (such as location 1 in Fig. 5), it needs to move along the shortest path to the edge of the shaded area.
Fig. 5. The moving path of selected platform after the distributed planning mechanism is activated.
In this model, the task-platform allocation and the mission schedule are calculated based on the distributed framework shown in Fig. 4. For illustration, the results when R is 60 are shown in Fig. 6. It can be seen from Fig. 6 that platform 9 processes task 14 and task 15 simultaneously, which improves the utilization of platforms and resources.
Fig. 6. Results of an experiment of platform-task assignment when R is 60.
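The three situations listed at the start of this section amount to a small geometric test: a platform can serve two tasks at once only from a point that lies within distance R of both task locations, that is, inside the overlap of two discs of radius R. The sketch below is one possible reading of that rule, with a hypothetical function name. It returns None when the tasks are too far apart, the platform's own position when it is already inside the overlap, and otherwise the nearest point of the overlap.

```python
import math


def dual_service_position(platform, task_a, task_b, radius, tol=1e-9):
    """Where a platform should stand to serve both tasks, or None if impossible."""
    d = math.dist(task_a, task_b)
    if d > 2 * radius + tol:
        return None                               # situation (1)

    def in_disc(point, centre):
        return math.dist(point, centre) <= radius + tol

    if in_disc(platform, task_a) and in_disc(platform, task_b):
        return platform                           # already inside the overlap

    candidates = []
    # Radial projections onto each circle, kept only if they lie in the other disc.
    for centre, other in ((task_a, task_b), (task_b, task_a)):
        dist = math.dist(platform, centre)
        if dist > 0:
            q = (centre[0] + radius * (platform[0] - centre[0]) / dist,
                 centre[1] + radius * (platform[1] - centre[1]) / dist)
            if in_disc(q, other):
                candidates.append(q)
    # The two points where the circles intersect (they coincide when d = 2R).
    if d > 0:
        mx, my = (task_a[0] + task_b[0]) / 2, (task_a[1] + task_b[1]) / 2
        ux, uy = (task_b[0] - task_a[0]) / d, (task_b[1] - task_a[1]) / d
        h = math.sqrt(max(radius ** 2 - (d / 2) ** 2, 0.0))
        candidates.append((mx - uy * h, my + ux * h))
        candidates.append((mx + uy * h, my - ux * h))
    return min(candidates, key=lambda q: math.dist(platform, q))
```

When the distance between the tasks is exactly 2R, the two intersection points collapse onto the midpoint of the connecting line, which reproduces situation (2). The tolerance `tol` is an implementation detail for floating-point comparison and is not part of the model.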
We refer to the method adopted in Reference [3] as centralized task planning, in contrast to distributed task planning. To compare the two methods, the task completion time of each is calculated under the same service radius. The results are shown in Table 3.
Table 3. Comparison of the results of centralized planning and distributed planning.
Service radius | Completion time, centralized planning | Completion time, distributed planning
10 | 194.742 | 158.5714
20 | 181.5308 | 141.5132
30 | 161.6715 | 132.8348
40 | 166.5969 | 147.6543
50 | 157.3341 | 121.5963
60 | 149.4606 | 120.3337
70 | 142.0531 | 111.5963
80 | 137.3429 | 106.5963
90 | 129.6457 | 105
100 | 119.172 | 105
The results show that the mission completion time of distributed planning is less than that of centralized planning in all experiments. In this model, distributed planning has a clear advantage over centralized planning: it improves the utilization efficiency of the platforms and shortens the mission completion time. In addition, the experiments in this paper limit each platform to at most two tasks at the same time. With the development of intelligent platforms and control technologies in the future, a single platform may be able to process more tasks simultaneously, and the utilization efficiency of resources is expected to improve further.
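The size of the advantage can be read off Table 3 by computing the relative reduction in completion time for each service radius. The short script below does this arithmetic on the published values; the variable and function names are illustrative.

```python
# (centralized, distributed) completion times from Table 3, keyed by service radius
completion_time = {
    10: (194.742, 158.5714),
    20: (181.5308, 141.5132),
    30: (161.6715, 132.8348),
    40: (166.5969, 147.6543),
    50: (157.3341, 121.5963),
    60: (149.4606, 120.3337),
    70: (142.0531, 111.5963),
    80: (137.3429, 106.5963),
    90: (129.6457, 105.0),
    100: (119.172, 105.0),
}


def reduction_percent(centralized, distributed):
    """Relative reduction of the completion time, in percent."""
    return 100.0 * (centralized - distributed) / centralized


reductions = {radius: reduction_percent(c, d)
              for radius, (c, d) in completion_time.items()}

for radius, value in reductions.items():
    print(f"R = {radius:3d}: {value:5.1f} % shorter with distributed planning")
```

For R = 10, for instance, the reduction is (194.742 − 158.5714) / 194.742, or roughly 18.6 %.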
4 Conclusions
A distributed task planning framework was proposed for the specific scenario of task-platform allocation. Based on this framework, the task-platform allocation and the mission schedule under specific conditions were calculated, and the results were compared with those of centralized planning. The experimental results show that, under distributed planning, one platform can process two tasks simultaneously, which improves the utilization efficiency of the platforms. At the same time, the mission completion time of distributed planning is shorter than that of centralized planning.
References
1. Han, X., Bui, H., Mandal, S., et al.: Optimization-based decision support software for a team-in-the-loop experiment: asset package selection and planning. IEEE Trans. Syst. 43(2), 237–251 (2013)
2. Yu, F., Tu, F., Pattipati, K.R.: Integration of a holonic organization control architecture and multi-object evolutionary algorithm for flexible distributed scheduling. IEEE Trans. Syst. Man Cybern. Part A: Syst. Hum. 38(5), 1001–1017 (2008)
3. Georgiy, M.L., Yuri, N.L., Jie, L., et al.: Normative design of organizations-Part I: mission planning. IEEE Trans. Syst. Man, Cybern. Part A: Syst. Hum. 32(3), 346–359 (2002)
4. Gabriel, S.S.L., Wided, G., Herve, P.: Interoperability assessment: A systematic literature review. Comput. Ind. 106, 111–132 (2019)
5. Franca, P.M., Gendreau, M., Laporte, G., et al.: A composite heuristic for the identical parallel machine scheduling problem with minimum makespan objective. Comput. Oper. Res. 21(2), 205–210 (1994)
6. Shirazi, B., Wang, M., Pathak, G.: Analysis and evaluation of heuristic methods for static task scheduling. J. Parallel Distrib. Comput. 10, 222–232 (1990)
7. Fadi, A.T., Mohanmmad, A., Arman, M., et al.: UAVs assessment in software-defined IoT networks: an overview. Comput. Commun. 150, 519–537 (2020)
8. Anithaashri, T.P., Ravichandran, G., Baskaran, R.: Security enhancement for software defined network using game theoretical approach. Comput. Netw. 157, 112–121 (2019)
9. Mohammad, A., Fadi, A.T., Murat, F.: Software-defined wireless sensor networks in smart grid: an overview. Sustainable Cities and Society 51, 101754 (2019)
10. Babu, T.K.S.R., Balamurugan, N.M., Suresh, S., et al.: An assessment of software defined networking approach in surveillance using sparse optimization algorithm. Comput. Commun. 151, 98–110 (2020)
Development of Artificial Intelligence Based Module to Industrial Network Protection System Filip Holik, Petr Dolezel(B) , Jan Merta, and Dominik Stursa University of Pardubice, Pardubice, Czech Republic [email protected] http://www.upce.cz/fei
Abstract. The paper deals with the software-defined networking concept applied to industrial networks. This innovative concept supports network programmability and dynamic implementation of customized features, including security related ones. In a previous work of the authors, the industrial network protection system (INPS) was designed and implemented. The INPS provides complex security features of various traditional and modern security solutions within a single system. In this paper, the AI module, which is one of the crucial parts of the INPS, is dealt with. In particular, a detailed report focused on the development of the AI module decision function is provided. As a result, an artificial neural network, used for the network traffic evaluation in the AI module, is developed and comprehensively tested.
Keywords: Artificial neural network · Industrial networks · Software defined networks
1 Introduction
The recent transformation of industrial networks from private, closed networks into standardized IP networks has brought many advantages, but it has also introduced new security risks. These networks are now connected to cloud data centers and centralized management systems, and they integrate Internet of Things (IoT) devices. These changes increase traffic volume, heterogeneity and complexity, which widens the scope for potential attacks. The danger is magnified by the fact that these networks are nowadays permanently connected to the Internet and can therefore, in theory, be accessed by anyone. Performing an attack is easier today than at any time in the past: attack tools are publicly accessible and require no deep knowledge to use. Defense against these threats requires a more comprehensive approach than manual filtering by a human operator, and automation involving artificial intelligence (AI) has become a necessity. Many commercially available protection tools use cloud-based artificial intelligence, but few can be deployed locally. This can be an important limitation if data privacy is an issue.
One of the most promising approaches in this area is the use of software-defined networking (SDN), which provides a solid basis for AI implementation. Unfortunately, the use of SDN in security is still being researched. Traditional firewalls have been implemented in SDN in many works [3,6,18,19], but AI-based firewalls have rarely been studied. Only the paper [5] presented a firewall with machine learning for securing cloud data centers. However, the operation of this firewall was severely limited, as it supported only two functions: allowing and blocking packets. Such a system does not meet the criteria for modern security systems.
Therefore, a protection system with AI was designed and developed by the authors of this article in [11]. The solution was called the Industrial Network Protection System (INPS), and its aim was to provide complex security features within a single system. One of the most crucial parts of the system was the AI module, used for network traffic evaluation. The work [11] used only a basic design of artificial neural networks and did not explore multiple approaches. The goal of this contribution is to provide a detailed report on the development of the AI module decision function. Hence, two approaches to the AI module decision function are defined, implemented and tested in order to obtain suitable functionality of the AI module.
The rest of the article is organized as follows. In the next section, the idea of an industrial network protection system is summarized. Then, the AI module is proposed; the main contribution of this article, the development of the AI module decision function architecture, is described within that section. Finally, the AI module performance is tested and evaluated.
© Springer Nature Switzerland AG 2021 K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 229–240, 2021. https://doi.org/10.1007/978-3-030-55190-2_18
2 Industrial Network Protection System
This section describes only the overall architecture of the INPS. A more detailed definition can be found in [11]. The INPS is developed to comply with the main operational requirements of industrial networks, i.e. component lifetime, critical infrastructure, fault tolerance, high availability, limited component access, non-upgradability, performance, proprietary communication protocols and system certification [12,23]. It provides centralized network management with monitoring of data flows in real time. Each data flow can be blocked or allowed, and the system also supports more advanced filtering features, including redirecting the flow to specified ports, setting QoS values, and storing the payload for application layer inspection. All these features can be performed manually or automatically by the AI module. The architecture of the system can be divided into four components, as shown in Fig. 1.
1. The main module - it provides the basic INPS functionality, including its control via two separate web pages: one for traffic monitoring and filtering, and the other for the system settings. The module is integrated into the ONOS controller and interacts with its distributed core directly via customized internal interfaces.
2. The AI module - it performs optional automated filtering. It is located in a separate package, which uses files with trained artificial neural networks to determine traffic decisions. These files can be imported directly or via the application web interface.
3. The ONOS SDN controller - it performs standard networking functions. Depending on the network topology, these can include forwarding, routing, loop prevention, load-balancing etc. The controller also has a web user interface for configuration and management of the provided features.
4. External AI training application - it is implemented as a stand-alone application. It takes an exported traffic map, which is a file with data flows and manually set firewall rules. Based on this map, it trains artificial neural networks by one-time offline learning. The output of the application is a file with trained neural networks, which can then be imported into the AI module.
Fig. 1. INPS architecture
In the next section, a suitable architecture for the AI module decision function is discussed and developed.
3 Artificial Intelligence in INPS
The AI in the INPS serves as a decision method. Put simply, it selects one of the decision states according to the characteristics of the incoming flow. The state space includes the following items: allow, block, forward to selected ports, application layer inspection, and four levels of QoS settings (low, normal, high and critical).
Communication protocols used in industrial networks can be classified into two basic types - network layer protocols (L3) and transport layer protocols (L4) [1]. To keep the article clear, only the more general L4 communication is considered in this work. However, more granular functionality, which includes other possible protocols, can be achieved by the INPS by performing similar steps. Hence, the AI-based decision method is expected to decide based on the incoming flow characteristics, defined by the source and destination IP addresses and by the amount of traffic expressed in packets per second. Each IP address is composed of four octets, each of which is used as a separate input. In addition, source and destination port numbers are considered as relevant inputs. Therefore, eleven inputs specify the decision.

Feedforward multilayer neural networks have been shown to solve such input-output mapping problems well [7,13]. An artificial neural network is a group of algorithms that, generally, aims to recognize relationships in a set of data through a process that emulates the way a biological neural network operates. A feedforward multilayer neural network is one of the most widely applied architectures [10]. It consists of two or more sequentially connected layers of artificial neurons, with the signal propagated only in the forward direction. Over the last decades, two types of feedforward multilayer neural networks have proven to be particularly competent for input-output mapping problems. The first is a feedforward neural network with dense (fully connected) layers (FFNN), the second is a feedforward convolutional neural network (CNN). Both types are considered in this approach as possible architectures for the AI module decision function. The design procedure of a neural network involves acquisition of the training and validation sets, training, pruning and validation.
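The eleven inputs and the eight decision states described above can be sketched as follows. This is only an illustration: the names `Flow`, `Decision` and `encode_flow`, the ordering of the inputs, and the scaling constants (255, 65535 and `max_pps`) are assumptions made for this example and are not taken from the INPS implementation.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    """The eight decision states listed in the text (numbering is arbitrary)."""
    ALLOW = 0
    BLOCK = 1
    FORWARD_TO_PORTS = 2
    APP_LAYER_INSPECTION = 3
    QOS_LOW = 4
    QOS_NORMAL = 5
    QOS_HIGH = 6
    QOS_CRITICAL = 7


@dataclass
class Flow:
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    packets_per_second: float


def encode_flow(flow: Flow, max_pps: float = 10_000.0) -> list[float]:
    """Map a flow to eleven inputs scaled to [0, 1].

    Order (assumed): 4 source octets, 4 destination octets,
    source port, destination port, packets per second.
    """
    octets = [int(part) for ip in (flow.src_ip, flow.dst_ip) for part in ip.split(".")]
    if len(octets) != 8 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError("expected two dotted-quad IPv4 addresses")
    features = [o / 255.0 for o in octets]
    features += [flow.src_port / 65535.0, flow.dst_port / 65535.0]
    # Traffic above max_pps is clipped; max_pps is an illustrative constant.
    features.append(min(flow.packets_per_second, max_pps) / max_pps)
    return features
```

A call such as `encode_flow(Flow("10.0.0.1", "192.168.1.255", 502, 65535, 20000.0))` yields a list of eleven values, one per network input.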
The essential information related to this procedure, as it is adapted for the INPS, is described in the following sections. More information regarding the design, as well as a discussion of each part of the process, can be found in [10].

3.1 Training Dataset
The dataset simulates a highly utilized industrial network; the traffic was generated by a custom-developed application [11]. The traffic map, generated for neural network training, contains 80 000 unique data patterns. The dataset is divided into a training set (70%), a validation set (15%) and a testing set (the remaining 15%). The training set is used to adapt the neural network parameters during training, the validation set is used to identify the best network configuration during training, and the testing set is used for the final evaluation of the AI module.
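The split described above can be illustrated with a short sketch. The function name `split_dataset`, the shuffling and the fixed seed are assumptions for this example; the paper does not state how the patterns were assigned to the three sets.

```python
import random


def split_dataset(patterns, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle and split into training, validation and testing subsets."""
    shuffled = list(patterns)
    random.Random(seed).shuffle(shuffled)  # seeded shuffle (assumption)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])  # the remainder is the testing set


# Integers stand in for the 80 000 data patterns.
train, val, test = split_dataset(range(80_000))
print(len(train), len(val), len(test))  # 56000 12000 12000
```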
3.2 FFNN Design
In order to be capable of solving the problem, the FFNN needs to satisfy certain conditions [10]. Specifically, at least one hidden layer with enough neurons needs to be implemented. In addition, monotonic, continuous and bounded transfer (activation) functions must be applied in the neurons of the hidden layer. In this contribution, the number of hidden layers is set based on experimental results. A hyperbolic tangent transfer function is considered for the neurons in the hidden layers. Since the FFNN is intended to be used as a decision element, the softmax transfer function is considered for the neurons in the output layer. The number of output neurons is defined by the number of elements in the output state space.

FFNN design consists mainly of training and pruning. Training provides suitable weights and biases of the FFNN. Pruning is superordinate to training: it should convert a redundantly designed network into a simpler one with no decrease in performance. In our work, pruning is based on repeated training of various FFNNs with different topologies. A mean square error function (E), applied to the validation set, serves as the cost function and measures the performance of the network. Since training is a stochastic process, 100 training runs are executed for every considered topology in order to get statistically significant results.

The training parameters are set according to a pilot study and the authors' previous experience. Specifically, the Nguyen-Widrow technique [21] is used for the initial setting of the weights and biases at the beginning of the training. Then, the Levenberg-Marquardt search technique [9] is applied for the adaptation of the weights and biases. The input values in the training set are normalized in order to avoid unequal influence of individual values during the training process.
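As a minimal sketch of the structure described above, the following code runs a forward pass through one hidden layer with hyperbolic tangent units and a softmax output layer, and computes a mean square error. The layer sizes (11 inputs, 10 hidden neurons, 8 outputs) follow the text; the uniform random initialisation is only a stand-in for the Nguyen-Widrow technique, and all function names are hypothetical.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Small uniform random weights; a stand-in for Nguyen-Widrow initialisation."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
    return weights, biases


def affine(weights, biases, x):
    """Weighted sum plus bias for every neuron of a dense layer."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, hidden, output):
    """tanh hidden layer followed by a softmax output layer."""
    h = [math.tanh(v) for v in affine(*hidden, x)]
    return softmax(affine(*output, h))


def mean_square_error(predictions, targets):
    """Mean of squared differences over all patterns and all outputs."""
    n = sum(len(t) for t in targets)
    return sum((p - t) ** 2
               for pred, targ in zip(predictions, targets)
               for p, t in zip(pred, targ)) / n


rng = random.Random(0)
hidden = init_layer(11, 10, rng)   # eleven inputs, ten hidden neurons
output = init_layer(10, 8, rng)    # eight decision states
y = forward([0.5] * 11, hidden, output)
```

The vector `y` holds one value per decision state and sums to one, so the largest entry can be read as the selected decision.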
The parameters of the training and pruning process, including the formal parameters of all the applied techniques, are summarized in Table 1.

Table 1. Parameters of the experiments with FFNN
Training algorithm        Levenberg-Marquardt search technique
Initialization            Nguyen-Widrow technique
Maximum epochs            500
Stopping criterion        Maximum epochs reached
Adaptive coefficient μ    0.001
Increment μ               10
Decrement μ               0.1
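The role of the adaptive coefficient μ and its increment and decrement factors can be illustrated on a toy problem. The sketch below fits a single weight of the hypothetical model y = exp(w x) with a scalar Levenberg-Marquardt loop: μ is multiplied by the decrement after a successful step and by the increment after a failed one. The upper limit `mu_max`, the lower floor on μ and the toy model itself are assumptions for this example; this is not the authors' training code.

```python
import math


def lm_fit(xs, ys, w=0.0, mu=0.001, mu_inc=10.0, mu_dec=0.1,
           max_epochs=500, mu_max=1e10):
    """Fit y = exp(w * x) by a scalar Levenberg-Marquardt iteration."""
    def sse(weight):
        return sum((math.exp(weight * x) - y) ** 2 for x, y in zip(xs, ys))

    error = sse(w)
    for _ in range(max_epochs):
        residuals = [math.exp(w * x) - y for x, y in zip(xs, ys)]
        jacobian = [x * math.exp(w * x) for x in xs]   # d(prediction)/dw
        jtj = sum(j * j for j in jacobian)
        jte = sum(j * e for j, e in zip(jacobian, residuals))
        while mu <= mu_max:
            candidate = w - jte / (jtj + mu)           # damped Gauss-Newton step
            candidate_error = sse(candidate)
            if candidate_error < error:
                # Success: accept the step and reduce the damping.
                w, error = candidate, candidate_error
                mu = max(mu * mu_dec, 1e-20)           # floor is an assumption
                break
            mu *= mu_inc                               # failure: increase the damping
        else:
            break  # mu exceeded mu_max, no improving step was found
    return w, error


xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [math.exp(0.7 * x) for x in xs]   # synthetic data generated with w = 0.7
w_fit, final_error = lm_fit(xs, ys)
```

With a large μ the step approaches a small gradient-descent step, while a small μ gives a Gauss-Newton step, which is why the damping is lowered only after the error has decreased.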
Box plots of the resulting cost function values obtained for various FFNN topologies during the training process are shown in Fig. 2. Each box plot shows the median value (central mark), the 25th and 75th percentiles (edges of the box) and the extreme data points (the whiskers). The best performance is clearly provided by the topology with ten neurons in one hidden layer. The topologies with two hidden layers fail to provide better results. Nevertheless, the best representatives of all topologies are also tested using the testing set.
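The quantities shown in each box plot can be computed from the validation errors of the repeated training runs, as sketched below. Selecting the topology by the lowest median error is an assumption made for this example, since the text only reads the result off the plot; the function names and the numbers in the usage lines are placeholders, not results from the paper.

```python
import statistics


def box_plot_summary(errors):
    """Median, quartiles and extremes of one topology's validation errors."""
    q1, median, q3 = statistics.quantiles(errors, n=4, method="inclusive")
    return {"min": min(errors), "q1": q1, "median": median, "q3": q3, "max": max(errors)}


def select_topology(results):
    """Return the topology whose runs have the lowest median error (assumed rule)."""
    return min(results, key=lambda name: statistics.median(results[name]))


# Placeholder values for illustration only.
runs = {"A": [0.30, 0.20, 0.40], "B": [0.10, 0.50, 0.15]}
best = select_topology(runs)
```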
Fig. 2. Resulting values of error function for various FFNN topologies. (Vertical axis: E, ticks from 0.09 to 0.17; horizontal axis: Topology, with [2-1], [4-1], [6-1], [10-1], [12-1], [15-1], [5-5-1] and [7-7-1].)
3.3 CNN Design
With current possibilities in parallel computing, CNNs are considered a leading topology among neural networks. A list of well-recognized CNN topologies can be found in [2]. Convolutional neural networks mainly include three types of layers, namely convolutional layers, pooling layers and dense (fully connected) layers. A good summary of convolutional neural networks and how to implement them can be found in [8].

Nowadays, a huge number of CNN topologies are available for implementation. In this paper, five different architectures are selected for possible application. Architectures Net1 and Net2 are relatively simple. They consist of a sequence of convolutional layers and max-pooling layers. Both architectures end with a final hidden dense layer with 512 neurons, followed by a softmax output layer that classifies the output. Both architectures are adapted from [20]. In addition to these networks, more complex and widely accepted topologies are selected: LeNet-5 [4,17], AlexNet [16] and VGG-16 [22].

As in the previous case, the mentioned architectures are trained to correctly map the dataset described in Sect. 3.1. However, it is generally accepted that the performance of CNNs is especially high when they are applied to multidimensional data processing. Image processing is one of the most obvious examples [15]. Hence, it could be useful to find a transformation that converts the eleven inputs of the AI module decision function into a two- or three-dimensional structure, preferably a graphical figure. As a first engineering approach to this transformation, a polar line chart is suggested in this article, as demonstrated in Fig. 3. In the further text, this operation is referred to as depiction.
Fig. 3. Demonstration of visualization of multidimensional data in 2D. In this demonstration, a 6-dimensional vector [1, 0.2, 0.8, 0.6, 0.8, 0.4] is visualized.
Therefore, the whole dataset (see Sect. 3.1) is normalized and transformed to a set of filled polar line chart figures. The charts are stored as [122 × 122] px grayscale images. A selected group of resulting images is demonstrated in Fig. 4.
Fig. 4. Examples of transformed dataset.
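One possible reading of the depiction step is sketched below: each normalized input is placed on its own equally spaced polar axis, the points are joined into a polygon, and the filled polygon is rasterised into a 122 × 122 grayscale image. The angular layout, the fill value of 255 and the function names are assumptions for this illustration; the exact rendering used to produce the images in Fig. 4 is not specified in the text.

```python
import math


def polar_vertices(values, size=122):
    """Place value k at angle 2*pi*k/n, with radius proportional to the value."""
    centre = (size - 1) / 2.0
    n = len(values)
    return [(centre + v * centre * math.cos(2 * math.pi * k / n),
             centre - v * centre * math.sin(2 * math.pi * k / n))
            for k, v in enumerate(values)]


def inside(px, py, polygon):
    """Even-odd ray casting test for a point against a closed polygon."""
    hit = False
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        if (y1 > py) != (y2 > py):  # the edge crosses the horizontal line through py
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                hit = not hit
    return hit


def depict(values, size=122):
    """Return a size x size grayscale image (0 = background, 255 = filled chart)."""
    polygon = polar_vertices(values, size)
    return [[255 if inside(col, row, polygon) else 0 for col in range(size)]
            for row in range(size)]


# The 6-dimensional vector used in the demonstration of Fig. 3.
image = depict([1, 0.2, 0.8, 0.6, 0.8, 0.4])
```

Since every pattern has to pass through such a transformation before classification, its cost adds directly to the decision time of the AI module.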
Subsequently, the selected architectures are trained. The ADAM search technique is used as the optimizer because of its generally good performance [14]. Initial weights are set randomly in this case, with a Gaussian distribution (location = 0, scale = 0.05). As in the previous training, the training processes are executed a hundred times and the same cost function is computed over the validation set - see Table 2 for all the parameters of the training processes. The resulting values are shown in Fig. 5. After the training process, LeNet-5 appears to be the most suitable CNN to implement as the AI module decision function. However, the best representatives of each architecture are tested in the next section.

Table 2. Parameters of the experiments with CNN
Input shape                    122 × 122 × 1
Training algorithm             ADAM algorithm
Initialization                 Normal distribution (mean = 0, std = 0.05)
Maximum epochs                 50
Stopping criterion             Maximum epochs reached
Learning rate α                0.001
Exponential decay rate 1 β1    0.9
Exponential decay rate 2 β2    0.999
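The meaning of the learning rate α and the two exponential decay rates can be shown with a plain implementation of the Adam update rule from [14], applied here to a toy one-dimensional quadratic rather than to a CNN. The value of `eps` and the function name `adam_minimise` are assumptions, since the table does not list them.

```python
import math


def adam_minimise(grad, theta, steps, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Plain Adam loop for a list of parameters; eps is an assumed default."""
    m = [0.0] * len(theta)   # first-moment (mean) estimate
    v = [0.0] * len(theta)   # second-moment (uncentred variance) estimate
    for t in range(1, steps + 1):
        g = grad(theta)
        m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
        v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g)]
        m_hat = [mi / (1 - beta1 ** t) for mi in m]   # bias correction
        v_hat = [vi / (1 - beta2 ** t) for vi in v]
        theta = [p - alpha * mh / (math.sqrt(vh) + eps)
                 for p, mh, vh in zip(theta, m_hat, v_hat)]
    return theta


def toy_gradient(theta):
    """Gradient of the toy cost (theta - 3)^2."""
    return [2.0 * (theta[0] - 3.0)]


first_step = adam_minimise(toy_gradient, [0.0], steps=1)
```

After the bias correction, the very first update has a magnitude close to α regardless of the gradient scale, which is one reason the default α = 0.001 transfers reasonably well between problems.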
Fig. 5. Resulting values of error function for various CNN topologies. (Vertical axis: E, ticks from 0 to 0.6; horizontal axis: Net1, Net2, LeNet-5, AlexNet and VGG-16.)
4 FFNN and CNN Testing and Evaluation
In the previous sections, two architectures of feedforward multilayer artificial neural networks were designed for implementation as the decision method in the AI module. Both are depicted in Fig. 6.
Fig. 6. Considered architectures of AI module
Now, the best representatives of both architectures are tested using the testing set (see Sect. 3.1). Note that the data in the testing set are not used during training. Accuracy, defined as the ratio of correctly made decisions to all performed decisions, is used as the metric. The resulting values of the metric are shown in Table 3. The results show a number of interesting outcomes. Above all, the highest accuracy is provided by the VGG-16 architecture in combination with depiction of inputs. Thus, the best value of the error function during training does not guarantee the best accuracy over the testing set. Furthermore, feedforward neural networks with dense layers generally provide less variable results. Convolutional architectures, on the other hand, range from totally unacceptable to very reasonable behavior.
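The accuracy metric defined above amounts to the following computation. The helper `decide`, which picks the state with the largest softmax output, is an assumption about how a network output is turned into a decision.

```python
def accuracy(predicted, expected):
    """Ratio of correctly made decisions to all performed decisions."""
    if len(predicted) != len(expected) or not expected:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


def decide(probabilities):
    """Index of the largest network output (assumed decision rule)."""
    return max(range(len(probabilities)), key=probabilities.__getitem__)
```

For example, three correct decisions out of four, as in `accuracy([0, 1, 2, 2], [0, 1, 1, 2])`, give an accuracy of 0.75.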
Table 3. Testing results
Topology    Accuracy
2-1         0.7920
4-1         0.9088
6-1         0.9126
10-1        0.9188
12-1        0.8918
15-1        0.9097
5-5-1       0.9084
7-7-1       0.8664
Net1        0.8301
Net2        0.8752
LeNet       0.8346
AlexNet     0.7262
VGG-16      0.9501

5 Conclusion
As a continuation of the authors' previous work, this article deals with the development of the AI module, which is a part of the industrial network protection system. Two architectures, based on feedforward multilayer neural networks, are considered for implementation as the decision function of the AI module. Both architectures are then developed extensively in order to achieve high decision-making accuracy of the AI module. The tests presented at the end of the paper indicate that the highest accuracy is provided by the VGG-16 architecture in combination with depiction of inputs. However, this outcome should be understood as a preliminary result, since many aspects of the development procedure still need to be examined. First of all, there are many possible depiction processes, and only one is considered in this paper. A different depiction process could well lead to entirely different results. Another issue is computational complexity. The protection system is supposed to provide efficient real-time traffic monitoring, and the depiction process could be one of the weak spots from this point of view. These aspects will be dealt with in future work.
References
1. IEC 61850-5: Communication networks and systems in substation (2003)
2. Aloysius, N., Geetha, M.: A review on deep convolutional neural networks. In: 2017 International Conference on Communication and Signal Processing (ICCSP), pp. 0588–0592, April 2017
3. Bakhareva, N.F., Polezhaev, P.N., Ushakov, Y.A., Shukhman, A.E.: SDN-based firewall implementation for large corporate networks (2019)
4. Bottou, L., Cortes, C., Denker, J.S., Drucker, H., Guyon, I., Jackel, L.D., LeCun, Y., Muller, U.A., Sackinger, E., Simard, P., Vapnik, V.: Comparison of classifier methods: a case study in handwritten digit recognition. In: Proceedings of the 12th IAPR International Conference on Pattern Recognition, Vol. 3 - Conference C: Signal Processing (Cat. No. 94CH3440-5), vol. 2, pp. 77–82, October 1994
5. Cheng, Q., Wu, C., Zhou, H., Zhang, Y., Wang, R., Ruan, W.: Guarding the perimeter of cloud-based enterprise networks: an intelligent SDN firewall. In: 2018 IEEE 20th International Conference on High Performance Computing and Communications; IEEE 16th International Conference on Smart City; IEEE 4th International Conference on Data Science and Systems (HPCC/SmartCity/DSS), pp. 897–902, June 2018
6. Fiessler, A., Lorenz, C., Hager, S., Scheuermann, B.: FireFlow - high performance hybrid SDN-firewalls with OpenFlow, October 2018, pp. 267–270 (2019)
7. Gencay, R., Liu, T.: Nonlinear modelling and prediction with feedforward and recurrent networks. Phys. D 108(1–2), 119–134 (1997)
8. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016). http://www.deeplearningbook.org
9. Hagan, M.T., Menhaj, M.B.: Training feedforward networks with the Marquardt algorithm. IEEE Trans. Neural Netw. 5(6), 989–993 (1994)
10. Haykin, S.: Neural Networks: A Comprehensive Foundation. Prentice Hall, Upper Saddle River (1999)
11. Holik, F., Dolezel, P.: Industrial network protection by SDN-based IPS with AI. In: Communications in Computer and Information Science (2020). In press
12. Holik, F.: Meeting smart city latency demands with SDN. In: Studies in Computational Intelligence, pp. 43–54 (2020)
13. Hornik, K., Stinchcombe, M., White, H.: Multilayer feedforward networks are universal approximators. Neural Netw. 2(5), 359–366 (1989)
14. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. CoRR, abs/1412.6980 (2014)
15. Kizuna, H., Sato, H.: The entering and exiting management system by person specification using Deep-CNN. In: 2017 Fifth International Symposium on Computing and Networking (CANDAR), pp. 542–545, November 2017
16. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks, vol. 2, pp. 1097–1105 (2012)
17. Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
18. Li, H., Wei, F., Hu, H.: Enabling dynamic network access control with anomaly-based IDS and SDN, pp. 13–16 (2019)
19. Charfadine, S.M., Flauzac, O., Nolot, F., Rabat, C., Gonzalez, C.: Secure exchanges activity in function of event detection with the SDN. In: Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering, LNICST, vol. 275, pp. 315–324 (2019)
20. Millstein, F.: Deep Learning with Keras. CreateSpace Independent Publishing Platform (2018)
21. Nguyen, D., Widrow, B.: Improving the learning speed of 2-layer neural networks by choosing initial values of the adaptive weights, pp. 21–26 (1990)
22. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556, September 2014
23. Stouffer, K.A., Falco, J.A., Scarfone, K.A.: SP 800-82. Guide to industrial control systems (ICS) security: supervisory control and data acquisition (SCADA) systems, distributed control systems (DCS), and other control system configurations such as programmable logic controllers (PLC). Technical report, Gaithersburg, MD, United States (2011)
Learning and Cognition in Financial Markets: A Paradigm Shift for Agent-Based Models

Johann Lussange1(B), Alexis Belianin2, Sacha Bourgeois-Gironde3,4, and Boris Gutkin1,5

1 Laboratoire des Neurosciences Cognitives, INSERM U960, Département des Études Cognitives, École Normale Supérieure PSL University, 29 rue d’Ulm, 75005 Paris, France
[email protected]
2 ICEF, National Research University Higher School of Economics and Primakov Institute for World Economy and International Relations, Russian Academy of Sciences, 8 Myasnitskaya st., 101000 Moscow, Russia
3 Institut Jean-Nicod, UMR 8129, Département des Études Cognitives, École Normale Supérieure PSL University, 29 rue d’Ulm, 75005 Paris, France
4 Laboratoire d’Économie Mathématique et de Microéconomie Appliquée, EA 4442, Université Paris II Panthéon-Assas, 4 rue Blaise Desgoffe, 75006 Paris, France
5 Center for Cognition and Decision Making, Department of Psychology, NU University Higher School of Economics, 8 Myasnitskaya st., 101000 Moscow, Russia
Abstract. The history of research in finance and economics has been widely impacted by the field of Agent-based Computational Economics (ACE). While popular among natural science researchers for its proximity to the successful methods of physics and chemistry, for example, the field of ACE has also been criticised by part of the social science community for its lack of empiricism. Yet recent trends have shifted the weights of these general arguments and potentially given ACE a whole new range of realism. At the base of these trends are two present-day major scientific breakthroughs: the steady shift of psychology towards a hard science due to the advances of neuropsychology, and the progress of reinforcement learning due to increasing computational power and big data. We outline here the main lines of a computational research study where each agent would trade by reinforcement learning.

Keywords: Financial markets · Agent-based models · Multi-agent systems · Reinforcement learning

1 Past Research
© Springer Nature Switzerland AG 2021. K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 241–255, 2021. https://doi.org/10.1007/978-3-030-55190-2_19

The field of finance and economics has used various approaches to model financial markets dynamics. Among these we can historically distinguish three important classes of models. The first and most frequently encountered are statistical models, which are calibrated to fit time series such as past price history. These can bring interesting results pertaining to general volatility [1,2] or log-return forecasting [3], as long as the variability of the calibration parameters is not too strong. The second are known as Dynamic Stochastic General Equilibrium (DSGE) models, which provide explicit agent-based microfoundations for the sectoral dynamics and aggregate fluctuations [4]. Modern developments of DSGE models strive to add realism to the basic model structure, accounting for agent heterogeneity, bounded rationality and imperfect learning, and (in the New Keynesian versions) replace the rational expectations hypothesis by market rigidities and exogenous stochastic shocks to emulate true market environment conditions [5–7]. These two classes of models have shown a variety of promising results over the years. However, if we consider a top-down approach to system inference, we can say that they are based on rough approximations of reality [8,9], and will not explain the wealth and diversity of price microstructure traditionally seen in markets.

This leads to a third class of models, called Agent-Based Models (ABM) or sometimes Multi-Agent Systems (MAS), which probe and emulate markets from a pure bottom-up approach [10–12] and consider them as the complex systems [13] that they truly are. Among financial ABM, we can also include order book models [14,15], even though some may see those as a midway approach. In a financial ABM, market investors or traders are modelled as agents trading together via an order book (such as a double auction order book [16]). This is a discrete-time algorithm taking in the trading bids and offers for specific securities from all agents at time t, and matching them at transaction prices which then collectively define the market price of these securities at time step t + 1.
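The order book mechanism described above can be sketched as a single discrete-time clearing step. This is a minimal illustration under stated assumptions: each matched pair trades at the midpoint of the bid and the offer, the price at t + 1 is the volume-weighted mean of the transaction prices, and the previous price is kept when nothing trades. The names `Order` and `clear_order_book` are hypothetical, and the order book models cited in the text may use different matching rules.

```python
from dataclasses import dataclass


@dataclass
class Order:
    agent_id: int
    price: float
    quantity: int


def clear_order_book(bids, offers, last_price):
    """Match bids and offers submitted at time t; return trades and the price at t + 1."""
    # Work on copies so that the callers' orders are not modified.
    bids = sorted((Order(o.agent_id, o.price, o.quantity) for o in bids),
                  key=lambda o: -o.price)        # highest bid first
    offers = sorted((Order(o.agent_id, o.price, o.quantity) for o in offers),
                    key=lambda o: o.price)       # lowest offer first
    trades = []  # tuples (buyer, seller, price, quantity)
    i = j = 0
    while i < len(bids) and j < len(offers) and bids[i].price >= offers[j].price:
        quantity = min(bids[i].quantity, offers[j].quantity)
        price = (bids[i].price + offers[j].price) / 2.0   # midpoint rule (assumption)
        trades.append((bids[i].agent_id, offers[j].agent_id, price, quantity))
        bids[i].quantity -= quantity
        offers[j].quantity -= quantity
        if bids[i].quantity == 0:
            i += 1
        if offers[j].quantity == 0:
            j += 1
    volume = sum(q for _, _, _, q in trades)
    if volume == 0:
        return trades, last_price                # no transaction: price unchanged
    next_price = sum(p * q for _, _, p, q in trades) / volume
    return trades, next_price
```

Trading only occurs when at least one bid is at or above one offer, which is why some form of price disagreement between agents is needed to generate activity.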
ABM have been used in many scientific disciplines [17–19]. In economics, these models have emerged by way of psychological learning models [20], evolutionary biology [21,22], and especially game theory [23–27]. In recent years, ABM also became popular as a tool to study macroeconomics [28–31], specifically the impact of trading taxes, market regulatory policies, quantitative easing, and the general role of central banks [32]. ABM can also play an important role in the analysis of the impact of the cross-market structure [33]. From a regulatory point of view, this implies a generally stronger role for ABM to play [34]. Jean-Claude Trichet declared for instance in 2010: “As a policymaker during the crisis, I found the available models of limited help. In fact, I would go further: in the face of the crisis, we felt abandoned by conventional tools. [...] Agent-based modelling dispenses with the optimisation assumption and allows for more complex interactions between agents. Such approaches are worthy of our attention.” A decade after the financial crisis, the structural causes of the repetition of such systemic risks in financial markets are far from being eliminated. One could hence say that their social and political implications make ABM research at least as relevant today as it was a decade ago.

Even if ABM are often designed with many parameters and hence subject to the delicate issue of overfitting, one of their biggest advantages is that they require fewer assumptions (e.g. normal distribution of returns, no-arbitrage) than top-down models [35]. Added to this, ABM display the famous complexity emergence proper to bottom-up approaches [36] and can hence show completely new regimes of transitions, akin to phase transitions in statistical physics [37]. However, being models, ABM are of course imperfect and need a thorough and lengthy cross-market validation [38]. Yet at the same time, one should keep in mind that such a general and cautious validation is in fact applicable and necessary to any other model as well [39].

From now on we consider the application of ABM to financial markets. We note that among financial ABM some exclusively pertain to high-frequency trading [40,41], while others take both high- and low-frequency trading into account [42–44]. Another popular topic of literature in financial ABM is the emulation of the widespread Minority Game [45,46], which formally is not a financial market problem, but a general game theory problem which can be related to the financial issues of pricing and forecasting. In order to generate a dynamic trading activity in financial ABM, a basic economic assumption is that the agents disagree on the present security price or trade at different frequencies (a possibility that is sometimes explicitly denied in the economics literature [47]), and are hence willing to go long or short on the same security at different prices. In other words, there must be some sort of price disagreement and an original pricing mechanism at the discretion of each individual agent. In the literature, this pricing mechanism in financial ABM has in general been designed according to two basic mechanisms of trading: in some models at least a part of the agents trade in a random way as ‘noise traders’ [40,41,48–52], and in other models agents use realistic trading strategies known to real financial markets, depending on the variability and stochasticity of the market [53–56].
2 Accuracy
Over the years, economic research (and especially econophysics research) has gradually discovered a certain number of non-trivial statistical features of, or stylised facts about, financial time series. These stylised facts are based on variations in prices that have universal statistical properties in common from market to market, over different types of instruments, and time periods. Among these, those pertaining to returns distribution or volatility clustering, for example, were gradually discovered during the nineties: Kim-Markowitz [57], Levy-Levy-Solomon [58–64], Cont-Bouchaud [65], Solomon-Weisbuch [66], Lux-Marchesi [53,67], Donangelo-Sneppen [68–71], Solomon-Levy-Huang [72]. It was also not until this time that ABM started to emulate these stylised facts. The importance of the universality of stylised facts to really gauge financial markets comes from the fact that the price evolutions of different markets may have very different exogenous or endogenous causes. As a consequence they highlight general underlying financial mechanisms that are market-independent, and which can in turn be exploited for ABM architecture design. From a scientific point of view, stylised facts are hence extremely interesting and their faithful emulation has been an active topic of research in the past fifteen years or so [73,74]. Their precise characteristics have varied ever so slightly over the years and across the literature, but the most widespread and unanimously accepted stylised facts can in fact be grouped in three broad, mutually overlapping categories:

Non-Gaussian returns: the returns distribution is non-Gaussian and hence asset prices should not be modeled as Brownian random walks [75,76], despite what is taught in most textbooks and applied in sell-side finance. In particular, the real distributions of returns differ from normal distributions in that they (i) have fatter tails and hence more extreme events, with the tails of the cumulative distribution being well approximated [75,77] by a power law of exponent belonging to the interval [2, 4] (albeit this is still the subject of a discussion [78,79] famously started by Mandelbrot [80] and his Levy stable model for financial returns), (ii) are negatively skewed and asymmetric in many observed markets [81], with more large negative returns than large positive returns, (iii) are platykurtic and as a consequence have fewer mean-centered events [82], and (iv) have multifractal k-moments, so that their exponent is not linear in k, as seen in [83–86].

Clustered volatilities: market volatility tends to aggregate or form clusters [2]. Therefore, compared to the average, the probability of having a large volatility in the near future is greater if it was also large in the near past [73,87,88]. Regardless of whether the next return is positive or negative, one can thus say that large (resp. small) return jumps are likely to be followed by the same [80], and thus display some sort of long memory behaviour [89]. Because volatilities and trading volumes are often correlated, we also observe a related volume clustering.
Decaying auto-correlations: the auto-correlation function of the returns of financial time series is essentially zero for any value of the auto-correlation lag, except for very short lags (e.g. half-hour lags for intraday data) because of a mean-reverting microstructure mechanism for which there is a negative auto-correlation [81,89]. This is sometimes used to support the general argument of the well-known Efficient Market Hypothesis [90,91] that markets have no memory and hence that one cannot predict future prices based on past prices or information [87,88]. According to this view, there is hence no opportunity for arbitrage within a financial market [77]. It has been observed, however, that certain nonlinear functions of returns, such as squared returns or absolute returns, display certain steady auto-correlations over longer lags [89].

Since then, ABM of financial markets have steadily increased in realism and can generate progressively more robust scaling experiments. We can specifically highlight the potential of these simulations to forecast real financial time series via reverse-engineering. A promising recent perspective for such use of ABM has been highlighted in the field of statistics by [92–94]: the agent-based model parameters are calibrated to fit real financial time series, and the model is then allowed to evolve over a given time period as a basic forecast measure on the original time series used for calibration. With this, one could thus say that ABM are now reaching Friedman’s [95] methodological requirement that a theory must be “judged by its predictive power for the class of phenomena which it is intended to explain.”
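The stylised facts listed in this section can be checked numerically on any price series, real or simulated, with a few summary statistics. The sketch below uses population moments and a simple auto-correlation estimator; the function names are hypothetical and other estimators could equally be used.

```python
import math


def log_returns(prices):
    """Logarithmic returns between consecutive prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def skewness(returns):
    """Third standardised moment (negative for a heavier left tail)."""
    n = len(returns)
    mean = sum(returns) / n
    m2 = sum((r - mean) ** 2 for r in returns) / n
    m3 = sum((r - mean) ** 3 for r in returns) / n
    return m3 / m2 ** 1.5


def excess_kurtosis(returns):
    """Fourth standardised moment minus 3 (zero for a normal distribution)."""
    n = len(returns)
    mean = sum(returns) / n
    m2 = sum((r - mean) ** 2 for r in returns) / n
    m4 = sum((r - mean) ** 4 for r in returns) / n
    return m4 / m2 ** 2 - 3.0


def autocorrelation(series, lag):
    """Sample auto-correlation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return cov / var
```

Volatility clustering would show up as `autocorrelation([abs(r) for r in returns], lag)` staying positive over many lags, while `autocorrelation(returns, lag)` itself stays close to zero.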
3 Calibration
Just as for any other model, the parameters of the ABM must be calibrated to real financial data in order to perform realistic emulation. Together with architecture design, calibration is the most technical and crucial aspect of the ABM [96]. Yet in the literature most calibration is done by hand, so that the stylised facts are re-enacted in a satisfactory way. The ABM calibration step is therefore often performed in a way that is sub-optimal [48,97,98]. An efficient calibration methodology, on the other hand, would need two important steps. First, a fully automated meta-algorithm in charge of the calibration should be incorporated, so that a decently large amount of financial data could be treated and the aforementioned scope of validity of ABM studied via cross-market validations [48,99]. This is important as the robustness of a calibration always relies on many runs of ABM simulations [100]. Second, this calibration meta-algorithm should work through the issues of overfitting and underfitting, which may constitute a severe challenge due to the ever-changing stochastic nature of financial markets. Part of this calibration problem is to thoroughly and cautiously define the parameter space. This step is particularly sensitive, since it can lead to potentially problematic simplifications. For instance, what should be the size of the time step of the simulation? ABM with a daily time tick will of course produce time series that are much coarser than those coming from real financial data, which include a wealth of intraday events [101–103].
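One very simple form such an automated calibration meta-algorithm could take is a random search over the parameter space, scoring each candidate by the distance between simulated and target summary statistics, averaged over several simulation runs. The sketch below is only an illustration: `simulate`, `summarise`, the squared-error loss and the search strategy are all hypothetical choices, and it does not by itself address overfitting, which would require evaluating the selected parameters on data not used for calibration.

```python
import random


def calibrate(simulate, summarise, target, bounds, n_candidates=200, n_runs=5, seed=0):
    """Random search over the parameter space.

    simulate(params, run_seed) returns a simulated series,
    summarise(series) returns a list of statistics,
    target is the list of the same statistics measured on real data,
    bounds maps each parameter name to a (low, high) interval.
    """
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_candidates):
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}
        loss = 0.0
        for run in range(n_runs):   # average over several stochastic runs
            stats = summarise(simulate(params, run))
            loss += sum((s - t) ** 2 for s, t in zip(stats, target))
        loss /= n_runs
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss
```

In practice the statistics returned by `summarise` would be chosen among the stylised facts that the model is expected to reproduce, and the calibrated parameters would then be validated on other markets or periods.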
4 Trends
As previously mentioned, the emergence and recent progress of two separate fields of research will likely have a major impact on economic and financial ABM. The first is the recent development of cognitive neuroscience and neuroeconomics [104–108], which has revolutionised behavioural economics with its ever lower cost experimental methods of functional magnetic resonance imaging (fMRI), electro-encephalography (EEG), or magneto-encephalography (MEG) applied to decision [109,110] and game theories [111]. The second is the recent progress of reinforcement learning, which has reached superhuman performance in some tasks [112,113]. Among the multiple reinforcement learning research fields, we can highlight in particular the recent progress of self-play reinforcement learning [113,114], end-to-end reinforcement learning and artificial neural networks [115–117], reinforcement learning and Monte Carlo tree search [118], multi-agent learning [119–121], not to mention new types of unsupervised algorithms [122,123].
For the ABM field, this implies that the realism of the economic agents can be greatly increased via these two recent technological developments. The agents can be endowed with numerous cognitive and behavioural biases emulating those of human investors, for instance. Their trading strategies can also be more faithful to reality, in the sense that they can be dynamic and versatile depending on the general stochasticity or variability of the market, because the agents learn to trade and invest via reinforcement learning. In this respect, we shall mention, along with recent ABM literature [124], the recent attempts to design order book models with reinforcement learning [125], and the study of market making by reinforcement learning in a zero-intelligence ABM framework [126]. At a time when the economy and financial markets are progressively more and more automated, this impact of reinforcement learning on ABM should thus be explored. One should keep in mind that the central challenges (and in fact hypotheses) of ABM are the realism of the agents, but also the realism of the economic transactions and interactions between the agents.
5 Cognition and Behaviour
The reinforcement learning framework shared by all agents makes it possible to implement in the agents’ cognition and behaviour certain traits or biases that are similar to those of human investors. The correspondence between reinforcement learning and decision making in the brain is an active field of research [127–130]. One could then reverse-calibrate these traits and biases, implemented in the reinforcement learning architecture of the autonomous agents, in order to quantitatively gauge their impact on financial market dynamics at the macro level. As financial stock markets are increasingly impacted by algorithmic trading and the automation of transaction orders, the relevance of such a study rests on the tight portfolio management constraints (e.g. risk management, liquidity preferences, acceptable drawdown levels, etc.) imposed by human investors, which algorithmic trading strategies take as cost functions. We hence propose here a set of a dozen particularly interesting and important cognitive and behavioural biases, and their possible implementation within reinforcement learning algorithms. In such a framework, each agent would be initialised at the beginning of the simulation with some or all of the following cognitive and behavioural biases, according to specific bias distributions in the population of agents:
Belief Revision: Defined as changing one’s belief insufficiently in the face of new evidence. Each agent is naturally endowed with a parameter relevant to this bias in reinforcement learning, called the reinforcement learning rate.
Hyperbolic Discounting: Defined as having greater economic utility for immediate rewards than for delayed ones. This economic value could follow a quasi-hyperbolic discount function following [131]. It could be modelled by giving a certain proportion of the agents a shorter investment horizon, or via a quasi-hyperbolic function weighting the returns of the agents accordingly.
Loss Aversion: Defined as demanding much more to sell an asset than to buy it. Because of loss aversion, the agent would for instance favour an ask price much higher than the bid price in the transaction order it sends to the order book.
Illusory Superiority (Resp. Inferiority): Defined as overestimating (resp. underestimating) one’s own abilities compared to others. This could be implemented in two ways. Because market volatility is generally considered in portfolio management as a main indicator of risk or uncertainty, the agent’s illusory superiority (resp. inferiority) could first be modelled via an alteration of the agent’s state, by decreasing (resp. increasing) its perceived stock price volatility. Another possible implementation would be to enhance (resp. lessen) the agent’s past returns. These updates could be performed with varying degrees of frequency.
Fear and Greed: These are the two main driving forces of economic investors. An agent implementing a long-only equity strategy would display fear (resp. greed) by being more prone to short (resp. long) equity. One could also implement such fear and greed in the agents by varying the type of order sent to the order book, e.g. limit vs. market orders. These updates could be performed with varying degrees of frequency.
Exaggerated (Resp. Lowered) Expectation: Also called regressive bias, this is defined as overestimating (resp. underestimating) high values and likelihoods while underestimating (resp. overestimating) low values and likelihoods. This could be implemented in two ways: the agent state relevant to the perceived stock price trend (whether it would increase, decrease, or remain stable) could be modified accordingly, or the stock price volatility perceived by the agent could be modified accordingly. These updates could be performed with varying degrees of frequency.
Negativity (Resp. Positivity) Bias: Defined as better recalling unpleasant (resp. pleasant) memories than pleasant (resp. unpleasant) ones. This could easily be modelled by amending the agent’s past returns accordingly. This is an easy implementation in the reinforcement learning framework applied to portfolio optimisation, since the return achieved by the agent is fully known at the end of its investment time horizon, and it would not require a computationally heavy operation on the agent’s policy and Q-function to forget and relearn the associated impact of its returns. These updates could be performed with varying degrees of frequency.
Egocentric Bias: Defined as remembering the past as better or worse than it was in a self-serving manner. This could be modelled as above.
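To make three of the biases above more concrete, the following is a minimal sketch, not the implementation used by the authors. The class name, the parameter names, and the way losses are amplified are assumptions made for illustration; only the learning-rate update and the quasi-hyperbolic weights 1, βδ, βδ², … follow standard definitions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BiasedAgent:
    """Hypothetical container for a few bias parameters (names are illustrative)."""
    learning_rate: float = 0.1  # belief revision: small values update beliefs slowly
    beta: float = 0.7           # present bias of the quasi-hyperbolic discount
    delta: float = 0.95         # long-run discount factor per time step
    negativity: float = 1.0     # > 1 amplifies remembered losses (assumed form)

    def update_value(self, old_value: float, target: float) -> float:
        """Move a value estimate towards the target by the learning rate."""
        return old_value + self.learning_rate * (target - old_value)

    def discount_weights(self, horizon: int) -> List[float]:
        """Quasi-hyperbolic weights: 1 at t = 0, then beta * delta**t."""
        return [1.0 if t == 0 else self.beta * self.delta ** t
                for t in range(horizon)]

    def remembered_return(self, realised: float) -> float:
        """Amend a past return before learning: losses are scaled up, gains kept."""
        return realised * self.negativity if realised < 0 else realised
```

Drawing `learning_rate`, `beta`, and `negativity` from population-level distributions at initialisation would be one way to obtain the heterogeneous agent populations described above.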
6 Application
Besides studying the agents’ collective trading interactions with one another, an ABM stock market simulation could be used to probe the following specific fields of study:
Market Macrostructure: One could study how human cognitive and behavioural biases change agents’ behaviour at the level of market macrostructure. The two main topics of market macrostructure of interest are naturally those pertaining to systemic risk [132,133] (bubble and crash formation, problem of illiquidity) and those related to herding-type phenomena [134] (information cascades, rational imitation, reflexivity). A key aspect of this work would also be to carefully calibrate and compare the stylised facts to real financial data (one needs to recover standard effects such as volatility clustering, leptokurtic log-returns, fat tails, and long memory in absolute returns). Indeed, some of these macro-effects have been shown to arise in other agent-based market simulators [11,135]; however, it is not yet fully understood how they are impacted by the agents’ learning process or by their cognitive and behavioural biases such as risk aversion, greed, cooperation, inter-temporality, and the like.
Price Formation: One could study how these biases change the arbitrage possibilities in the market via their blatant violation of the axioms of the Efficient Market Hypothesis [136]. Indeed, agent-based market simulators have so far often relied on the aforementioned ‘noise traders’ in order to generate the necessary conditions for basic business activity [12,33,137]. The novel aspect here would be to go one step further and replace this notion of purely random trading by implementing specific neuroeconomic biases common to real human behaviours. Another problem of past research in agent-based stock market simulators is that these have often relied on agents getting their information for price forecasting from a board of technical indicators common to all agents [138]. In contrast, we should develop agent models that allow each agent to be autonomous and to use and improve its own forecasting tools. This is a crucial aspect in view of the well-known fact that information is at the heart of price formation [141]. We hypothesise that such agent learning dynamics are fundamental to effects based on market reflexivity and to their impact on price formation, and hint at the role played by fundamental pricing versus technical pricing of assets. Of major interest is the issue of global market liquidity provision and its relation to the law of supply and demand and bid-ask spread formation [139,140,142].
Credit Risk: One could study the effect of market evolutionary dynamics when the agents are allowed to learn and improve their trading strategy and cognitive biases via reinforcement learning. In other words, it would be interesting to see the survival rates of the agent population [143–145] (cf. credit risk), and overall price formation with respect to the arbitrage-free condition of markets when we increase the variability of the intelligence in the trading agents [146]. Indeed, recent studies [137,147,148] suggest that arbitrage opportunities in markets arise mainly from the collective number of non-optimal trading decisions that shift the prices of assets from their fundamental values via the law of supply and demand. Another topic would be to compare such agent survivability with Zipf’s law [145].
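As an illustration of how the stylised facts mentioned under Market Macrostructure could be checked on a simulated price series, the sketch below computes log-returns, their excess kurtosis (positive for leptokurtic returns), and the autocorrelation of a series at a given lag (applied to absolute returns, slowly decaying positive values indicate volatility clustering and long memory). The function names are our own, and the estimators are the plain sample versions rather than any calibration procedure from the cited literature.

```python
import math
from typing import List


def log_returns(prices: List[float]) -> List[float]:
    """Logarithmic returns between consecutive prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def excess_kurtosis(x: List[float]) -> float:
    """Sample excess kurtosis: fourth central moment over squared variance, minus 3."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / var ** 2 - 3.0


def autocorrelation(x: List[float], lag: int) -> float:
    """Sample autocorrelation of x at the given positive lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return cov / var
```

For a simulated price list `prices`, one would typically inspect `excess_kurtosis(log_returns(prices))` and `autocorrelation([abs(r) for r in log_returns(prices)], lag)` over a range of lags, and compare them with the same quantities computed on real data.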
7 Preliminary Results
We have recently started to develop an ABM stock market simulator with autonomous agents trading by reinforcement learning, whose general architecture is described in [149]. In this simulator, whose parameters are calibrated to real data from the London Stock Exchange between 2008 and 2018, a number I of agents trade J different stocks over T time steps. These agents first learn by reinforcement how to forecast future prices, with actions being econometric parameters set for mean-reverting, averaging, or trend-following, states depending on the market volatilities at different time scales, and rewards being the mismatch between such predictions and the realised market price. A second reinforcement learning algorithm is used by each agent to trade according to this econometric output and learned information, with its action being to sell, buy, or hold its assets, and at what price, and with the reward being defined as the realised cashflow resulting from this decision. At each time step of the simulation, agents thus send their trading orders to a centralised order book, which shuffles and clears them by matching orders. The latest transaction sets the market price of the stock for the next time step, which in turn is used by all agents to update their state. We show below some early results from the simulations of this ABM model.
Fig. 1. (a) Absolute logarithmic price returns of the simulated stock market as a function of time. (b) Price volatility at two-week intervals of the simulated stock market as a function of time. (c) Trading volumes of the simulated stock market as a function of time. The simulation is for 500 agents over 2875 time steps.
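The order book step of the simulator described above (orders are shuffled, matched, and the latest transaction sets the next market price) can be sketched as follows. This is only an illustration under stated assumptions: the `Order` fields, the price-priority matching, and the mid-point transaction price are our own choices, since the text does not specify the clearing rule of the simulator in [149].

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Order:
    agent_id: int
    side: str       # "buy" or "sell"
    price: float    # limit price
    quantity: int


def clear_order_book(orders: List[Order], last_price: float,
                     seed: int = 0) -> Tuple[List[Tuple[int, int, float, int]], float]:
    """Match the best bids with the best asks; return the trades and the new price."""
    book = list(orders)
    random.Random(seed).shuffle(book)  # ties are then not broken by arrival order
    bids = sorted([o for o in book if o.side == "buy"],
                  key=lambda o: o.price, reverse=True)
    asks = sorted([o for o in book if o.side == "sell"], key=lambda o: o.price)
    bid_left = [o.quantity for o in bids]
    ask_left = [o.quantity for o in asks]
    trades = []
    price = last_price  # unchanged if no orders cross
    i = j = 0
    while i < len(bids) and j < len(asks) and bids[i].price >= asks[j].price:
        qty = min(bid_left[i], ask_left[j])
        price = 0.5 * (bids[i].price + asks[j].price)  # assumed mid-point rule
        trades.append((bids[i].agent_id, asks[j].agent_id, price, qty))
        bid_left[i] -= qty
        ask_left[j] -= qty
        if bid_left[i] == 0:
            i += 1
        if ask_left[j] == 0:
            j += 1
    return trades, price
```

The returned price is that of the last matched pair, which would then serve as the market price seen by all agents at the next time step.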
We first want to check the model’s capacity to emulate the aforementioned clustering activity of the agents. We show this in Fig. 1, as an output of the simulation for 500 agents trading a given stock over 2875 time steps, with respect to absolute logarithmic price returns, price volatilities over a two-week rolling interval, and trading volumes. We then want to see, as a preliminary result, whether the agents learn correctly, and whether their performance can be revealed by de-trending their profits from market prices. We show in Fig. 2 the performance of four randomly selected agents over 2875 time steps, de-trended from the stock price. After a number of time steps corresponding to about half the total simulation time, one can notice a real performance, and hence a learning process, for the second agent.
Fig. 2. Example of certain agents’ profit, de-trended from market performance and as a function of time. The simulation is for 500 agents over 2875 time steps.
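The two diagnostics used in Figs. 1 and 2 can be sketched as below. The exact definitions behind the figures are not given in the text, so both functions are assumptions: volatility is taken as the standard deviation of returns over a rolling window whose length is a free parameter, and de-trending is taken as the agent's relative wealth growth minus the relative growth of the stock price.

```python
import math
from typing import List


def rolling_volatility(returns: List[float], window: int) -> List[float]:
    """Standard deviation of returns over each full rolling window."""
    out = []
    for end in range(window, len(returns) + 1):
        chunk = returns[end - window:end]
        mean = sum(chunk) / window
        out.append(math.sqrt(sum((r - mean) ** 2 for r in chunk) / window))
    return out


def detrended_performance(wealth: List[float], prices: List[float]) -> List[float]:
    """Agent wealth growth minus stock price growth, both relative to time 0."""
    return [w / wealth[0] - p / prices[0] for w, p in zip(wealth, prices)]
```

With this convention, an agent that simply holds the stock has a de-trended performance of zero, so sustained positive values point to profits that are not explained by the market trend alone.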
8 Conclusion
We have thus highlighted new and exciting perspectives for financial ABM, in which the agents would be designed with neuroeconomic biases and with trading or investment strategies updated by reinforcement learning. One should recall that the main argument against ABM, and indeed their main challenge, has always been the realism of the agents, although one should also consider the realism of the economic transactions. We thus argue that these recent trends should set a totally new level of realism for financial ABM. In particular, whereas early generations of financial ABM gradually increased their realism in emulating real stock markets by reproducing stylised facts during the late nineties, and whereas calibration is still in the process of being automated so that ABM may be validated on large scales of data, we expect these trends to bring several revolutionary breakthroughs, as well as the emergence and recognition of ABM as the relevant tools that they are in finance and economics.
Acknowledgment. This work was supported by the RFFI grant nr. 16-51-150007 and CNRS PRC nr. 151199, and received support from FrontCog ANR-17-EURE-0017. A preliminary preprint of this work was uploaded to arXiv [150].
References
1. Bollerslev, T.: CREATES research paper 2008, p. 49 (2008) 2. Engle, R.F.: Econometrica 50(4), 987 (1982) 3. Brownlees, C.T., Engle, R.F., Kelly, B.T.: J. Risk 14(2), 3 (2011) 4. Sbordone, A.M., Tambalotti, A., Rao, K., Walsh, K.J.: Econ. Policy Rev. 16(2) (2010) 5. Evans, G.W., Honkapohja, S.: Learning and Expectations in Macroeconomics. Princeton University Press, Princeton (2001) 6. Eusepi, S., Preston, B.: Am. Econ. Rev. 101, 2844 (2011) 7. Massaro, D.: J. Econ. Dyn. Control 37, 680 (2013) 8. Farmer, J.D., Foley, D.: Nature 460(7256), 685 (2009) 9. Grauwe, P.D.: Public Choice 144(3–4), 413 (2010) 10. Tesfatsion, L., Judd, K.L.: Handbook of Computational Economics: Agent-Based Computational Economics, vol. II. Elsevier, Amsterdam (2006) 11. Samanidou, E., Zschischang, E., Stauffer, D., Lux, T.: Rep. Prog. Phys. 70(3), 409 (2007) 12. LeBaron, B.: Building the Santa Fe Artificial Stock Market (2002) 13. Bonabeau, E.: Harvard Bus. Rev. 80(3), 109 (2002) 14. Smith, E., Farmer, D.J., Gillemot, L., Krishnamurthy, S.: Quant. Finance 3, 481 (2003) 15. Huang, W., Lehalle, C.-A., Rosenbaum, M.: J. Am. Stat. Assoc. 110, 509 (2015) 16. Mota, R., Larralde, H.: arXiv:1601.00229 (2016) 17. Macal, C.M., North, M.J.: J. Simul. 4, 151 (2010) 18. Axelrod, R.M.: The Complexity of Cooperation: Agent-Based Models of Competition and Collaboration. Princeton University Press, Princeton (1997) 19. Grimm, V., Berger, U., Bastiansen, F., Eliassen, S., Ginot, V., Giske, J., Goss-Custard, J., Grand, T., Heinz, S.K., Huse, G., et al.: Ecol. Model. 198(1), 115 (2006) 20. Bush, R.R., Mosteller, F.: Stochastic Models for Learning. Wiley, Oxford (1955) 21. Smith, J., Price, D.: Nature 246, 15 (1973) 22. Taylor, P.D., Jonker, L.B.: Math. Biosci. 40, 145 (1978) 23. Mookherjee, D., Sopher, B.: Games Econ. Behav. 7, 62 (1994) 24. Erev, I., Roth, A.E.: Am. Econ. Rev. 88, 848 (1998) 25. Erev, I., Roth, A.E.: PNAS 111, 10818 (2014) 26. Camerer, C.F., Ho, T.H.: PNAS 67, 827 (1999) 27. Fudenberg, D., Levine, D.: The Theory of Learning in Games. MIT Press, Cambridge (1998) 28. Colander, D., Howitt, P., Kirman, A., Leijonhufvud, A., Mehrling, P.: Am. Econ. Rev. 236–240 (2008) 29. Dosi, G., Fagiolo, G., Napoletano, M., Roventini, A.: J. Econ. Dyn. Control 37(8), 1598 (2013) 30. Gualdi, S., Tarzia, M., Zamponi, F., Bouchaud, J.-P.: J. Econ. Interact. Coord. 1–31 (2016) 31. Gualdi, S., Tarzia, M., Zamponi, F., Bouchaud, J.-P.: J. Econ. Dyn. Control 50, 29 (2015) 32. Westerhoff, F.H.: Jahrbücher für Nationalökonomie und Statistik 228(2), 195 (2008) 33. Xu, H.-C., Zhang, W., Xiong, X., Zhou, W.-X.: Math. Prob. Eng. 2014, 563912 (2014)
34. Boero, R., Morini, M., Sonnessa, M., Terna, P.: Agent-Based Models of the Economy, From Theories to Applications. Palgrave Macmillan, New York (2015) 35. LeBaron, B.: Agent-based computational finance. In: The Handbook of Computational Economics, vol. 2. Elsevier, Amsterdam (2005) 36. Heylighen, F.: Complexity and Self-Organization. CRC Press, Boca Raton (2008) 37. Plerou, V., Gopikrishnan, P., Stanley, H.E.: Nature 421, 130 (2003) 38. Hamill, L., Gilbert, N.: Agent-Based Modelling in Economics. Wiley, Hoboken (2016) 39. Wilcox, D., Gebbie, T.: arXiv:1408.5585 (2014) 40. Hanson, T.A.: Midwest finance association 2012 annual meetings paper (2011) 41. Bartolozzi, M.: Eur. Phys. J. B 78, 265 (2010) 42. Wah, E., Wellman, M.P.: Proceedings of the Fourteenth ACM Conference on Electronic Commerce, pp. 855–872 (2013) 43. Paddrik, M.E., Hayes, R.L., Todd, A., Yang, S.Y., Scherer, W., Beling, P.: SSRN 1932152 (2011) 44. Aloud, M., Tsang, E., Olsen, R.: Business science reference, Hershey (2013) 45. Challet, D., Marsili, M., Zhang, Y.-C.: Minority Games: Interacting Agents in Financial Markets. Oxford University Press, Oxford (2005) 46. Martino, A.D., Marsili, M.: J. Phys. A 39, 465 (2006) 47. Kyle, A.S., O. A.: Econometrica forthcoming (2016) 48. Preis, T., Golke, S., Paul, W., Schneider, J.J.: Europhys. Lett. 75(3), 510 (2006) 49. Farmer, J.D., Patelli, P., Zovko, I.I.: Proc. Natl. Acad. Sci. U.S.A. 102(6), 2254 (2005) 50. Maslov, S.: Phys. A 278(3), 571 (2000) 51. Challet, D., Stinchcombe, R.: Quant. Finance 3(3), 155 (2003) 52. Schmitt, T.A., Schäfer, R., Münnix, M.C., Guhr, T.: Europhys. Lett. 100 (2012) 53. Lux, T., Marchesi, M.: J. Theor. Appl. Finance 3, 67 (2000) 54. Cont, R.: Volatility clustering in financial markets: empirical facts and agent-based models. Springer (2007) 55. Bertella, M.A., Pires, F.R., Feng, L., Stanley, H.E.: PLoS ONE 9(1), e83488 (2014) 56. Alfi, V., Cristelli, M., Pietronero, L., Zaccaria, A.: Eur. Phys. J. B 67(3), 385 (2009) 57.
Kim, G., Markowitz, H.M.: J. Portfolio Manag. 16, 45 (1989) 58. Levy, M., Solomon, S.: Int. J. Mod. Phys. C 7, 595 (1996) 59. Levy, M., Levy, H., Solomon, S.: Econ. Lett. 45, 103 (1994) 60. Levy, M., Levy, H., Solomon, S.: J. Phys. I 5, 1087 (1995) 61. Levy, M., Solomon, S.: Int. J. Mod. Phys. C 7, 65 (1996) 62. Levy, M., Persky, N., Solomon, S.: Int. J. High Speed Comput. 8, 93 (1996) 63. Levy, M., Levy, H., Solomon, S.: Phys. A 242, 90 (1997) 64. Levy, M., Levy, H., Solomon, S.: Microscopic Simulation of Financial Markets. Academic Press, New York (2000) 65. Cont, R., Bouchaud, J.P.: Macroecon. Dyn. 4, 170 (2000) 66. Solomon, S., Weisbuch, G., de Arcangelis, L., Jan, N., Stauffer, D.: Phys. A 277(1), 239 (2000) 67. Lux, T., Marchesi, M.: Nature 397, 498 (1999) 68. Donangelo, R., Hansen, A., Sneppen, K., Souza, S.R.: Phys. A 283, 469 (2000) 69. Donangelo, R., Sneppen, K.: Phys. A 276, 572 (2000) 70. Bak, P., Norrelykke, S., Shubik, M.: Phys. Rev. E 60, 2528 (1999) 71. Bak, P., Norrelykke, S., Shubik, M.: Quant. Finance 1, 186 (2001)
72. Huang, Z.F., Solomon, S.: Eur. Phys. J. B 20, 601 (2000) 73. Lipski, J., Kutner, R.: arXiv:1310.0762 (2013) 74. Barde, S.: School of economics discussion papers 04. University of Kent (2015) 75. Potters, M., Bouchaud, J.-P.: Phys. A 299, 60 (2001) 76. Plerou, V., Gopikrishnan, P., Amaral, L.A., Meyer, M., Stanley, H.E.: Phys. Rev. E 60(6), 6519 (1999) 77. Cristelli, M.: Complexity in Financial Markets. Springer, Cham (2014) 78. Weron, R.: Int. J. Mod. Phys. C 12, 209 (2001) 79. Eisler, Z., Kertesz, J.: Eur. Phys. J. B 51, 145 (2006) 80. Mandelbrot, B.: J. Bus. 394–419 (1963) 81. Cont, R.: Quant. Finance 1, 223 (2001) 82. Bouchaud, J., Cont, R., Potters, M.: Scale invariance and beyond. In: Proceedings of CNRS Workshop on Scale Invariance. Springer, Les Houches (1997) 83. Ding, Z., Engle, R., Granger, C.: J. Empir. Finance 1, 83 (1993) 84. Lobato, I.N., Savin, N.E.: J. Bus. Econ. Stat. 16, 261 (1998) 85. Vandewalle, N., Ausloos, M.: Phys. A 246, 454 (1997) 86. Mandelbrot, B., Fisher, A., Calvet, L.: A multifractal model of asset returns. Cowles Foundation for Research and Economics (1997) 87. de Vries, C., Leuven, K.: Stylized facts of nominal exchange rate returns. Working Papers from Purdue University, Krannert School of Management Center for International Business Education and Research (CIBER) (1994) 88. Pagan, A.: J. Empir. Finance 3, 15 (1996) 89. Cont, R.: Volatility clustering in financial markets: empirical facts and agent-based models. In: Kirman, A., Teyssiere, G. (eds.) Long Memory in Economics. Springer (2005) 90. Fama, E.: J. Finance 25, 383 (1970) 91. Bera, A.K., Ivliev, S., Lillo, F.: Financial Econometrics and Empirical Market Microstructure. Springer, Cham (2015) 92. Wiesinger, J., Sornette, D., Satinover, J.: Comput. Econ. 41(4), 475 (2012) 93. Andersen, J.V., Sornette, D.: Europhys. Lett. 70(5), 697 (2005) 94. Zhang, Q.: Disentangling financial markets and social networks: models and empirical tests. Ph.D. thesis, ETH Zurich (2013) 95. Friedman, M.: Essays in Positive Economics. Chicago University Press, Chicago (1953) 96. Canova, F., Sala, L.: J. Monetary Econ. 56(4), 431 (2009) 97. Chiarella, C., Iori, G., Perello, J.: J. Econ. Dyn. Control 33, 525 (2009) 98. Leal, S.J., Napoletano, M., Roventini, A., Fagiolo, G.: J. Evol. Econ. 26, 49 (2016) 99. Fabretti, A.: J. Econ. Interact. Coord. 8, 277 (2013) 100. Axtell, R.: Center on social and economic dynamics working paper 17 (2000) 101. Gilli, M., Winker, P.: Comput. Stat. Data Anal. 42, 299 (2003) 102. Farmer, J.D., Joshi, S.: J. Econ. Behav. Organ. 49, 149 (2002) 103. Kirman, A.: Epidemics of opinion and speculative bubbles in financial markets. In: Money and Financial Markets. Macmillan, New York (1991) 104. Glimcher, P.W., Camerer, C.F., Fehr, E., Poldrack, R.A.: Neuroeconomics: Decision Making and the Brain. Academic Press, Cambridge (2009) 105. Camerer, C.: J. Econ. Lit. 51(4), 1155 (2013) 106. Martino, B.D., Doherty, J.P.O., Ray, D., Bossaerts, P., Camerer, C.: Neuron 79(6), 1222 (2013) 107. Camerer, C.: Ann. Rev. Econ. 5, 425 (2013) 108. Camerer, C.: Neuroscience, game theory, monkeys. TEDx talk (2013)
109. Kahneman, D., Tversky, A.: Econometrica 47(2), 263 (1979) 110. Frydman, C., Barberis, N., Camerer, C., Bossaerts, P., Rangel, A.: NBER working paper 18562 (2012) 111. Camerer, C.F.: Behavioral Game Theory: Experiments on Strategic Interaction. Princeton University Press, Princeton (2003) 112. Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., et al.: Science 362, 1140 (2018) 113. Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., Hubert, T., Baker, L., Lai, M., Bolton, A., et al.: Nature 550, 354 (2017) 114. Doll, B.B., Duncan, K.D., Simon, D.A., Shohamy, D.S., Daw, N.D.: Nat. Neurosci. 18, 767 (2015) 115. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016) 116. Schmidhuber, J.: Neural Netw. 61, 85 (2015) 117. Turchenko, V., Beraldi, P., Simone, F.D., Grandinetti, L.: The 6th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (2011) 118. Silver, D., Huang, A., Maddison, C.J., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., et al.: Nature 529, 484 (2016) 119. Tuyls, K., Weiss, G.: AI Mag. Fall (2012) 120. Heinrich, J., Silver, D.: AAAI Workshop (2014) 121. Heinrich, J., Silver, D.: IJCAI (2015) 122. Reed, S., Akata, Z., Yan, X., Logeswaran, L., Lee, H., Schiele, B.: ICML (2016) 123. Lerer, A., Gross, S., Fergus, R.: ICML (2016) 124. Biondo, A.E.: J. Econ. Interact. Coord. 14(3) (2018) 125. Spooner, T., Fearnley, J., Savani, R., Koukorinis, A.: Proceedings of the 17th AAMAS (2018) 126. Ganesh, S., Vadori, N., Xu, M., Zheng, H., Reddy, P., Veloso, M.: arXiv:1911.05892 (2019) 127. Lefebvre, G., Lebreton, M., Meyniel, F., Bourgeois-Gironde, S., Palminteri, S.: Nat. Hum. Behav. 1, 1 (2017) 128. Duncan, K., Doll, B.B., Daw, N.D., Shohamy, D.: Neuron 98, 645 (2018) 129.
Momennejad, I., Russek, E., Cheong, J., Botvinick, M., Daw, N.D., Gershman, S.J.: Nat. Hum. Behav. 1, 680–692 (2017) 130. Palminteri, S., Khamassi, M., Joffily, M., Coricelli, G.: Nat. Commun. 1–14 (2015) 131. Laibson, D.: Q. J. Econ. 112(2), 443 (1997) 132. The financial crisis inquiry report. Official government edition (2011) 133. Fouque, J.-P., Langsam, J.A.: Handbook on Systemic Risk. Cambridge University Press, Cambridge (2013) 134. Bikhchandani, S., Sharma, S.: Int. Monetary Fund. 47(3) (2001) 135. Bikhchandani, S., Hirshleifer, D., Welch, I.: J. Polit. Econ. 100(5), 992 (1992) 136. Fama, E.: J. Bus. 38, 34 (1965) 137. Sornette, D.: arXiv:1404.0243v1 (2014) 138. da Costa Pereira, C., Mauri, A., Tettamanzi, A.G.B.: IEEE Computer Society WIC ACM (2009) 139. Kyle, A.S.: Econometrica 53, 1315 (1985) 140. Sanford, G.J., Miller, M.H.: J. Finance 43, 617 (1988) 141. Grossman, S.J., Stiglitz, J.E.: Am. Econ. Rev. 70, 393 (1980) 142. Cason, T.N., Friedman, D.: Exp. Econ. 2, 77 (1999)
143. Evstigneev, I.V., Hens, T., Schenk-Hoppé, K.R.: Evolutionary finance. In: Handbook of Financial Markets, Dynamics and Evolution. North-Holland, Elsevier (2009) 144. Saichev, A., Malevergne, Y., Sornette, D.: Theory of Zipf’s Law and Beyond. Lecture Notes in Economics and Mathematical Systems, vol. 632. Springer, Heidelberg (2010) 145. Malevergne, Y., Saichev, A., Sornette, D.: J. Econ. Dyn. Control 37(6), 1195 (2013) 146. Hasanhodzic, J., Lo, A.W., Viola, E.: Quant. Finance 11(7), 1043 (2011) 147. Malkiel, B.G.: A Random Walk Down Wall Street: The Time-Tested Strategy for Successful Investing, 10th revised edn. W. W. Norton and Company (2012) 148. Black, F.: J. Finance 41(3), 529 (1985) 149. Lussange, J., Bourgeois-Gironde, S., Palminteri, S., Gutkin, B.: arXiv:1909.07748 (2019) 150. Lussange, J., Belianin, A., Bourgeois-Gironde, S., Gutkin, B.: arXiv:1801.08222 (2018)
Agent-Based Simulation for Testing Vehicle-On-Demand Services in Rural Areas
Marius Becherer and Achim Karduck
Furtwangen University, Robert-Gerwig-Platz 1, 78120 Furtwangen, Germany
https://www.hs-furtwangen.de
Abstract. Conventional road traffic is reaching its limits in many cities because the existing infrastructure is often not designed for the number of vehicles used in a city. This causes inefficient traffic, which is apparent as congestion and as a waste of spatial resources. New mobility concepts such as “vehicle-on-demand” can open up new possibilities to meet the challenges of contemporary mobility. This concept has already been simulated in various large-city scenarios with promising results. However, rural areas have not yet been taken into account in such simulations. Therefore, the concept “vehicle-on-demand” is implemented as part of the open-source simulation framework Simulation of Urban Mobility (SUMO). The considerations in the implementation of the vehicle-on-demand service are presented, and finally, the implemented service is evaluated with the rural area scenario of Furtwangen. The simulation results contrast with those reported for large cities. Ultimately, the concept of “vehicle-on-demand” is applicable in rural areas with the implemented service.
Keywords: Vehicle-on-demand · Agent-based simulation · Shared autonomous vehicles · Rural area · Mobility services · SUMO
© Springer Nature Switzerland AG 2021. K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 256–269, 2021. https://doi.org/10.1007/978-3-030-55190-2_20
1 Introduction
Mobility is one of the critical factors in the success of highly developed nations. Conventional road traffic, however, limits the performance of present-day traffic because most vehicles are used by a single person. The large number of vehicles in use causes traffic jams at rush hour in big cities, streets and parking spaces consume scarce spatial resources, and the average speed in cities is sometimes slower than pedestrian speed [6]. Further challenges limit the efficiency of daily transportation through conventional road traffic. The mobility concept vehicle-on-demand can overcome current issues by sharing resources, in terms of vehicles and streets, more efficiently. With vehicle-on-demand, people do not own the vehicles but use them like a taxi. Consequently, this results in a high level of individualism and flexibility. Mobility service providers such as Uber or Ola usually operate mainly in cities because the population density is higher there, and the mobility service therefore operates more efficiently regarding capacity utilization. Rural areas are often not considered, because the travel demand is lower and the waiting time is longer. However, such mobility services in rural areas will increase quality of life and might attract more people, since the infrastructural gap to cities will be reduced. The urban housing market can therefore ease up through attractive offers in the surrounding areas. Furthermore, senior citizens still have the opportunity to enjoy the benefits of mobility, even if they are not permitted to drive a car.
Although there are benefits to bringing mobility services to rural areas, current research focuses only on cities. Towns in the countryside should have the opportunity to conduct simulations to test and estimate the proper settings for operating a mobility service within the given boundaries. Currently, most vehicle-on-demand simulations are costly and difficult to set up. Consequently, communities need a way to test the performance of the vehicle-on-demand mobility concept in order to improve the quality of life and make the rural area more attractive to people. In the future, shared autonomous vehicles can decrease the cost and will become more favorable than conventional taxi services [18]. Through these factors, vehicle-on-demand will become more attractive to citizens, mobility service providers, and communities in rural areas by improving service quality and profitability.
In this work, the implementation of a module to execute vehicle-on-demand services with the SUMO simulator is delivered. In detail, the generic concept of vehicle-on-demand is presented, as well as the implementation steps of the mobility services that enable vehicle-on-demand investigations. With the support of this development, the differences between multi-agent systems in rural areas and conventional traffic concepts are investigated and evaluated. First, the related work in Sect. 2 presents previous studies and investigations in the domain of vehicle-on-demand. Then, the simulation framework in Sect. 3 offers the generic concept for vehicle-on-demand. Additionally, the development of the mobility service is explained, as well as the scenario generation. Afterward, evaluations of vehicle-on-demand services in rural areas are conducted and presented for the given scenario in Sect. 4. The implemented scenario of Furtwangen is used to determine the ideal fleet size. Finally, the ideal fleet size is compared with conventional mobility. Subsequently, the content of this work is summarized and discussed in Sect. 5. Furthermore, the results are placed in the scientific context of vehicle-on-demand, and further steps in this research domain are suggested.
2 Related Work
In the domain of vehicle-on-demand, many studies have been conducted for several cities. However, the execution of those scenarios varies considerably, which leads to different results. Vehicle-on-demand was tested in several scenarios, and it was shown that ride-sharing offers advantages regarding cost and traffic flow
improvement in New Jersey [24] as well as in New York [22]. The studies could present in general that many vehicles could be replaced by such as mobility service. However, the result varies a bit due to the different setup and experiment execution. In Lisbon, they found that 90% of the current fleet size is replaceable, with 10% of shared taxis that will fulfill the demand. In Zurich, they got the same result. However, they had really good data regarding the spatial and temporal resolution of the traffic [3]. They state that if 10% of the current fleet size changed to shared vehicles, they could provide the same level of mobility. The study that takes charging the vehicle into account as well found that 3.7 to 6.2 can substitute one vehicle [8]. Moreover, fleet management includes different aspects like location, the assignment from passenger to vehicles, vehicle routing for diverse objectives, and the rebalancing of empty vehicles [2]. While vehicles carry passengers, conducting rebalancing improves performance, like in Austin [7], whereas the study indicates a more efficient system. Another research area is the estimation to calculate a price for a shared autonomous vehicle fleet compared with other mobility concepts. Expectations about shared autonomous vehicles cost less than human-driven taxis and ridehailing services – however, more than human-driven personal vehicles and public transit services [18]. Furthermore, the study conducted by [19] estimated the cost for low-, medium, and high-prices in order to establish shared autonomous electric vehicles[19]. The cost ranges are between 29.2 ¢/mile and 88.7¢/mile. Although, there has been done much research in order to estimate the ideal fleet size, presenting several factors that affect the availability and simulation, calculate different price model for such services in cities, and further beneficial impact, however, there has not been much done for research in rural areas. 
There is a proposal to set up car-sharing in rural areas [16], but the study focuses on the interaction between cities and rural areas rather than on the behavior of such a car-sharing service within rural areas. Additionally, car-sharing differs from vehicle-on-demand services and is not the same mobility concept. Furthermore, other studies see huge potential in autonomous vehicles and vehicle-on-demand for making rural areas more accessible and more attractive to live in.
3 Simulation Framework
Popular vehicle-on-demand simulation frameworks like MATSim are limited in features in their free version. However, their major weakness is the exclusion of other transportation options. Especially in rural areas, high flexibility is required to be competitive with rural mobility service providers. The simulation framework SUMO provides many features and supports intermodal traffic, but it has no vehicle-on-demand support. For this reason, SUMO has to be extended by a feature that enables vehicle-on-demand. The implementation of the concept requires available and partially generated input data, and further, the simulation has to be adapted to vehicle-on-demand services. At
first, the generic vehicle-on-demand concept is presented by covering the key features of a scenario. Afterward, the implementation of the vehicle-on-demand services is presented in order to understand the weaknesses exposed in the evaluation as well as in the scenario implementation.

3.1 Generic Concept of Vehicle-On-Demand Simulation
There are many ways to simulate traffic, whereby different use cases lead to distinctive characteristics. Testing vehicle-on-demand services in specific areas focuses on traffic behavior; in addition, parameters such as vehicle availability and costs are analyzed. In order to determine the characteristics, several articles were examined [2–5,7,12,13,16,18–20,22,24]. Common and differing characteristics were identified in order to provide a realistic description of a scenario. For the classification, a meaningful subdivision is the distinction between data and logic, as presented in Fig. 1. The data defines the input into the simulation and belongs to a scenario, which is an application for a specific area. The logic processes the input data, and different logic requires different information in the input data.
Fig. 1. Vehicle-on-demand scenarios consist of data and logic
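The split between data and logic in Fig. 1 can be sketched as two small containers. This is only an illustration under stated assumptions: the class names, field names, the trip layout, and the file name are hypothetical and were chosen from the items this subsection names (map, traffic rules, traffic flows, fleet size, routing, assignment). They are not taken from the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# (departure time in s, start edge, destination edge); hypothetical layout
Trip = Tuple[int, str, str]


@dataclass
class ScenarioData:
    """Input of a scenario; all field names are illustrative."""
    map_file: str                  # geographical map with road courses
    traffic_rules: Dict[str, int]  # e.g. speed limit per road type
    trips: List[Trip]              # traffic flows of the passengers
    fleet_size: int                # scenario constraint


@dataclass
class ScenarioLogic:
    """Algorithms that process the input data."""
    route: Callable[[str, str], List[str]]    # routing between two edges
    assign: Callable[[Trip, List[str]], str]  # choose a vehicle for a trip


data = ScenarioData(
    map_file="area.osm",                      # hypothetical file name
    traffic_rules={"residential": 30},
    trips=[(25200, "edge_a", "edge_b")],
    fleet_size=10,
)
logic = ScenarioLogic(
    route=lambda start, dest: [start, dest],    # placeholder routing
    assign=lambda trip, vehicles: vehicles[0],  # placeholder assignment
)
```

Keeping the algorithms as exchangeable callables reflects the statement that the same input data can be processed by different logic.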
First, information about the geographical map is needed to represent the environment and road courses correctly, together with traffic rules that specify speed limits and the vehicles permitted on a road. Furthermore, the traffic flows are the routes of passengers within the map. There are different techniques for generating routes and traffic flows within the scenario. Simplified traffic flows are generated randomly, whereby only start and destination are defined. Route finding then directs the traffic onto the roads that are frequently used in everyday life. However, this approach does not provide a reliable result, and the generated traffic flows do not reflect actual everyday life. A more accurate method generates the traffic demand from statistical characteristics. Finally, there are the possibilities of partial tracking and full tracking. With partial tracking, vehicles are recorded at specific points, and algorithms generate traffic data from this smaller amount of reliable data. Full tracking captures the exact time and position of vehicles. Moreover, scenario
constraints are required; for instance, the fleet size is a factor that needs to be balanced correctly between economic and further service aspects. This data can be processed differently depending on the algorithms used. Different routing algorithms with different characteristics are suitable for route creation. Depending on the algorithm, the shortest or the fastest route is calculated. Known algorithms for this are the Dijkstra algorithm [11], the A* algorithm [15], and the Bellman-Ford algorithm [14]. In order to perform well throughout a day, the redistribution of vehicles has to be taken into account. Another aspect of efficient mobility management is the assignment of persons to vehicles, which considers different parameters and is tightly coupled to the rebalancing [9]. This generic concept of vehicle-on-demand scenarios covers the key considerations for vehicle-on-demand services and is therefore transferable to other simulation environments.

3.2 Vehicle-On-Demand Realization
The traffic simulator SUMO was originally developed for other purposes, such as the investigation of different algorithms, traffic light controls, traffic flows, traffic behavior during infrastructure changes, and other application scenarios. However, several details make it challenging to move from conventional road traffic to vehicle-on-demand services. For this reason, a mobility service is implemented to enable vehicle-on-demand services for SUMO. In preparatory work, three main challenges were identified:
– When the journey is complete, the vehicles are removed from the map
– Routes are paired with vehicles
– Routes are not always correct
The first challenge appears when a trip is complete and the vehicle is removed from the map. This happens because of the different states of vehicles. Algorithm 1 presents the pseudo implementation of the adapted vehicle state machine. The vehicle initially starts in the customer driving state since it starts with the first route of a customer. As soon as the vehicle stops, the system switches to the arrival driving state to pick up the next customer on the list. The taxi toggles between the arrival driving and the customer driving state as long as there are customer requests. After the taxis have served all customer requests, each taxi reaches its final state, arrived. This state awareness allows the vehicles to take several routes without being treated as arrived at their destination after completing the first route. The disadvantage of this approach is that logging has to be done manually to get the correct result. The second challenge deals with the issue that routes are paired with vehicles. The vehicle contains the route together with further parameters. The problem becomes apparent as soon as repeated vehicle-on-demand simulations with a different number of taxis are performed. The routes are static before the simulation starts. In this application, however, dynamic behavior is required.
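The state machine of the first challenge (Algorithm 1) can be sketched in plain Python without a simulator. This is a minimal sketch under assumptions, not the authors' mobility service: the names `State`, `Taxi`, `on_stop`, and `run` are hypothetical, every loop pass stands for one stop event per taxi, and requests are handed out in list order without any intelligent assignment.

```python
from collections import deque
from enum import Enum


class State(Enum):
    ARRIVAL_DRIVING = "arrival_driving"    # empty, on the way to a customer
    CUSTOMER_DRIVING = "customer_driving"  # carrying a customer
    ARRIVED = "arrived"                    # final state, no requests left


class Taxi:
    def __init__(self, first_request):
        # The taxi is created on the route of its first customer.
        self.state = State.CUSTOMER_DRIVING
        self.request = first_request
        self.served = []

    def on_stop(self, open_requests):
        """Advance the state machine each time the taxi comes to a stop."""
        if self.state is State.ARRIVAL_DRIVING:
            # Reached the pick-up point: drive the customer to the destination.
            self.state = State.CUSTOMER_DRIVING
        elif self.state is State.CUSTOMER_DRIVING:
            # Customer delivered: fetch the next request if there is one.
            self.served.append(self.request)
            if open_requests:
                self.request = open_requests.popleft()
                self.state = State.ARRIVAL_DRIVING
            else:
                self.request = None
                self.state = State.ARRIVED


def run(requests, fleet_size):
    """Serve all requests with `fleet_size` taxis and return the taxis."""
    open_requests = deque(requests)
    taxis = [Taxi(open_requests.popleft())
             for _ in range(min(fleet_size, len(open_requests)))]
    while any(t.state is not State.ARRIVED for t in taxis):
        for taxi in taxis:
            taxi.on_stop(open_requests)
    return taxis


taxis = run(["r1", "r2", "r3", "r4", "r5"], fleet_size=2)
```

Because a taxi only reaches `ARRIVED` when the request list is empty, it stays available after its first route, which is the behavior the adapted state machine is meant to provide.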
Algorithm 1: Pseudo implementation of vehicle state machine
foreach vehicleID in TaxiList do
    taxis[vehicleID].setState(CUSTOMER_DRIVING);
    taxis[vehicleID].assignRoute(currentPosition, customerPosition);
end
while not isTimeFinished() do
    simulate_next_step();
    foreach vehicleID in TaxiList.Stopped() do
        if taxis[vehicleID].status = ARRIVAL_DRIVING then
            taxis[vehicleID].assignRoute(customerPosition, customerDestination);
            taxis[vehicleID].setState(CUSTOMER_DRIVING);
        else if taxis[vehicleID].status = CUSTOMER_DRIVING AND currentTrips < Trips.Number then
            taxis[vehicleID].assignRoute(currentPosition, customerPosition);
            taxis[vehicleID].setState(ARRIVAL_DRIVING);
            currentTrips += 1;
        else
            taxis[vehicleID].setState(ARRIVED);
        end
    end
end

On the one hand, the number of vehicles varies, and on the other hand, unpredictable routes have to
be combined. For example, customers have a starting point and a destination, and the route between these points is calculated in advance. However, after reaching a destination, the vehicle must pick up the next customer at that customer's starting point. Taxis can thus transport different customers to different locations, and they need different routes to reach these customers. Therefore, a trip file is required as input in which only the departure time, the start, and the destination are listed. This causes two disadvantages: on the one hand, the route is calculated at simulation runtime, and on the other hand, a car can only be created if it has a route. The restriction that the route is calculated at simulation runtime could not be removed. At the beginning of the simulation, a route must be assigned in order to create the car. Finally, a stop must be assigned to the car to prevent it from being removed from the map. The stop at the destination takes the specified exit duration into account. Thirdly, the route is often not correct even though the validity of the route is checked. There are several reasons for this. The last edge can be a problem if it is not long enough or if the vehicle does not have permission to enter this edge. Therefore, several factors, such as edge length and permissions, have to be taken into account. Nevertheless, even with a route and edge length check, errors can still occur. For example, the input data from the map is not always correct. Streets and intersections may not be connected to the network. Therefore, the map is validated in advance and, if necessary, edited manually to avoid any issues. There may be other errors in the map, such as incorrect rules for a road that declare a single-lane road usable in both directions. In practice, two cars can pass
each other on a single-lane road depending on the situation. In the simulation, however, two such cars block the road. As a solution, two lanes with opposite directions can be created for the affected roads. The errors in such maps can be manifold. Further errors may arise when parts of the map are cut out of the data set. Besides, some features such as the state logging of vehicles had to be implemented as well in order to evaluate the mobility service.

3.3 Scenario Implementation
The input data are of great importance for the simulation because incorrect data can distort the result of the simulation. The generated scenario models the roads and the traffic demand of the Furtwangen area. Since exact data is not always available, approximated data is used. Consequently, the following data is required for vehicle-on-demand scenarios:
– Maps
– Traffic demand (Trips)
– Simulation configuration
– Specification
Various free map data are available, such as OpenStreetMap (OSM) [1]. In contrast to the maps as input for the program, the traffic demand requires more information. For the scenario, statistical characteristics are taken into account to generate the traffic demand [21]. Traffic parameters in combination with different variables and distribution variances generate the traffic demand more precisely. Information on the number of inhabitants, households, and further required data is collected from different sources [10,17,23]. Fig. 2 shows the departure times of the generated trips during the day. The graph shows the total number of car trips started in each hour.
Fig. 2. Chart of trips distribution
These departure times indicate morning traffic and evening traffic. These peak times are particularly important for testing vehicle-on-demand service.
This service should work not only outside peak times but also during peak traffic times, taking into account various infrastructures such as schools. Furthermore, the map, the generated traffic demand, the simulation configuration, and a configuration for the vehicle-on-demand scenario are required. The simulation configuration contains information about the input into the simulation, such as the maps. Further options concerning the runtime behavior of the simulation can be edited later. The configuration simplifies the operation of the simulator for inexperienced users.
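A simulation configuration of the kind described above can be generated programmatically. The following sketch assumes the usual SUMO configuration layout with an `input` section (`net-file`, `route-files`) and a `time` section (`begin`, `end`); the function name and both file names are hypothetical, and the separate configuration for the vehicle-on-demand scenario is not covered because its format is not given in the text.

```python
import xml.etree.ElementTree as ET


def build_sumo_config(net_file, route_file, begin=0, end=86400):
    """Return a minimal SUMO configuration as an XML string."""
    root = ET.Element("configuration")
    inputs = ET.SubElement(root, "input")
    ET.SubElement(inputs, "net-file", value=net_file)      # road network (map)
    ET.SubElement(inputs, "route-files", value=route_file)  # traffic demand
    time = ET.SubElement(root, "time")
    ET.SubElement(time, "begin", value=str(begin))          # start time in s
    ET.SubElement(time, "end", value=str(end))              # end time in s
    return ET.tostring(root, encoding="unicode")


# Hypothetical file names for a one-day scenario.
config_xml = build_sumo_config("furtwangen.net.xml", "furtwangen.rou.xml")
```

Writing the returned string to a configuration file keeps the map, the demand, and the simulated period in one place, so that a run can be repeated without retyping the options.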
4 Evaluation
In this section, the Furtwangen scenario undergoes two investigations of vehicle-on-demand services to evaluate the mobility service. The first investigation aims to identify the ideal fleet size for the rural area. This result is used for the second investigation, in which the differences between the current conventional road traffic and the vehicle-on-demand service for the Furtwangen area are determined. The generated scenario is used as the basis for the evaluation, with the same map and the generated traffic demand as implemented in Subsect. 3.3.

4.1 Determination of the Ideal Fleet Size for Vehicle-On-Demand Services in Furtwangen
Objective. Rural areas can vary enormously, which has an impact on fleet size. For the Furtwangen scenario, the optimal fleet size for vehicle-on-demand services is to be determined. Thus, a first ratio between the number of inhabitants and the fleet size for the rural area can be determined. If further investigations for rural scenarios are carried out, this result can serve as a reference for research on vehicle-on-demand services in rural areas. Besides, the determined fleet size will be considered for the second investigation.

Experimental Setup and Procedure. The experimental setup uses the generated trips. However, only the trips between 0 and 14 o'clock are simulated. In this period, 2,093 trips are carried out, and after the deduction of the last 15 min, 2,089 trips take place. The reason for the limited duration is that the execution time of the simulation reaches 40 min for a larger number of vehicles. The period includes the morning rush hour, during which approximately 450 trips per hour are executed, so it should still be a suitable setup. The first test series takes place with a fleet size of ten vehicles. In each further run, ten more vehicles are added. The simulation is initially carried out up to a fleet size of 200 vehicles. The entry and exit times for the car are 30 s each. The values arrival time, customer waiting time, journey time, and total time are recorded for the generated trips. For these times, the average and the median of the measurement series are determined. Additionally, the number of completed trips and the number of traffic jams are recorded.
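The test series and the two statistics recorded per run can be sketched as follows. The fleet sizes follow the setup above (10 to 200 vehicles in steps of ten). The journey times are made-up numbers for illustration only, not measurements from the study, and the last value mimics a taxi that was held up by a blocked road.

```python
from statistics import mean, median

# Fleet sizes of the test series: 10, 20, ..., 200 vehicles.
fleet_sizes = list(range(10, 201, 10))


def summarize(times_s):
    """Return (average, median) of recorded times in seconds."""
    return mean(times_s), median(times_s)


# Illustrative journey times in seconds (hypothetical values);
# the last entry is a single extreme outlier.
journey_times = [280, 290, 295, 300, 310, 3600]
avg, med = summarize(journey_times)
```

In this made-up series one extreme value more than doubles the average while the median stays close to the typical journey time, which is one reason to record both statistics for every run.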
Results. First of all, it is noticeable that the median and the average of the journey time differ strongly. The median increases proportionally. The average journey time also increases approximately proportionally, but the increase is very high. The delay of the taxis after a customer request is usually very high, between 90 and 120 min. Journey times also increase continuously. The number of trips does not increase proportionally to the number of traffic jams.

Evaluation. The measured times are often inconsistent. Situations occurring in the simulation, such as a stopping car, may block the road for other vehicles. There are also situations in which the simulation behaves too strictly compared to reality; in particular, a route does not change when it is blocked. Therefore, the measurement series contain times that are unusually large and pull the average upwards. The median compensates for this inaccuracy in the time measurement. The medians of the journey time, the arrival time, and the sum of both are shown in Fig. 3, which indicates that the arrival and journey times increase slowly. Especially for fleet sizes between 10 and 100, the times remain approximately constant. The complete time from the arrival of the taxi to reaching the destination is ten minutes on average. The waiting times of the customers were not taken into account for this diagram. Waiting times at the customer's location occur at peak times in the morning, when the demand for taxis cannot be met satisfactorily. The unfulfilled requests remain in the list and are removed only after processing. It takes some time to work off the morning traffic, and thus ever-larger waiting times accumulate. Looking at Fig. 4, a fluctuating number of fulfilled trips can be seen.
Fig. 3. Median of arrival and journey time with different fleet sizes
Fig. 4. Served journeys for customers with jam development
The system can perform the most journeys with a fleet size between 30 and 90 vehicles. However, there are repeated dips in the curve. This non-linear course can be an indication of missing logic. Due to the implementation, the trips are processed one after the other. Thus, different routes result for the vehicles if the fleet is increased or reduced. With an intelligent assignment of people to vehicles, efficiency would increase and clearer trends would be plotted. The diagram also shows the number of traffic jams. The curve leads to the conclusion that the increasing number of traffic jams limits the performance of the system. The reason for the increased congestion is the growing fleet size.

4.2 Comparison Between Vehicle-On-Demand Services and Conventional Road Traffic
Objective. The objective of this investigation is to compare a vehicle-on-demand service with conventional road traffic according to specific time criteria. Besides, the conventional road traffic and the vehicle-on-demand service are tested in two different scenarios, in which the journeys are carried out with different time distributions. In one scenario, a trip profile is used that simulates the daily traffic, and in the other, the trips take place in a periodic cycle.

Experimental Setup and Procedure. In the first experiment, the generated trips are simulated for an entire day from 0 to 24 o'clock. During this period, 4,160 trips are planned. The implemented mobility service provides the vehicle-on-demand service. For conventional road traffic, routes are first generated from the planned trips. The Dijkstra algorithm, which searches for the fastest route, is used for this purpose. In the second experiment, a traffic volume for Furtwangen with periodic trips is generated for one day as well. There are 180 journeys per hour, which means
that a planned journey takes place every 20 s. In both experiments, the customer trips use the same setup. During runtime, the mobility service records the journey time, waiting time, and arrival time. Additionally, the mobility service records each time step of the simulation. The vehicle-on-demand service specification has an entry and exit time of 30 s.

Results. In the first experiment, the simulation data confirmed some hypotheses. The average journey time for the vehicle-on-demand service is shorter than for conventional road traffic. As expected, the waiting times for a vehicle-on-demand service are longer, since conventional traffic does not involve waiting for one's own vehicle. The average waiting time is almost one hour. The arrival time and the journey time for shared vehicles are each between 4.5 and 5 min. As soon as the request of a customer is served, the arrival and the actual trip together take slightly more than 9 min. The exact times of the simulation can be seen in Table 1.

Table 1. Simulation results for the two test setups

Mobility concept      Vehicle-on-demand service       Conventional road traffic
                      Trip requests   Periodic        Trip requests   Periodic
                      during a day    trip request    during a day    trip request
Completed trips       4166            4301            4166            4319
Route length [m]      187658          244516          2628            2618
Arrival time [sec]    270             284             0               0
Journey time [sec]    293             306             328             329
Delay time [sec]      3297            169             0               0
Total time [sec]      563             590             328             329
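The figures in Table 1 and in the periodic setup can be cross-checked with a few lines of arithmetic. The values below are copied from the vehicle-on-demand columns of Table 1; the dictionary layout and the variable names are only illustrative.

```python
# Values copied from Table 1 (vehicle-on-demand columns), in seconds.
table = {
    "daily":    {"arrival": 270, "journey": 293, "delay": 3297, "total": 563},
    "periodic": {"arrival": 284, "journey": 306, "delay": 169,  "total": 590},
}

# Periodic demand: 180 journeys per hour.
headway_s = 3600 / 180       # seconds between two planned journeys
planned_per_day = 180 * 24   # planned periodic journeys in one day

delay_min = {}
for name, row in table.items():
    # Total time = arrival time + journey time (the delay is not included).
    assert row["arrival"] + row["journey"] == row["total"]
    # Average delay converted to whole minutes.
    delay_min[name] = round(row["delay"] / 60)
```

The check shows that the total time in the table is the sum of arrival and journey time, that the delay times correspond to about 55 min and 3 min, and that 180 journeys per hour give one journey every 20 s, or 4320 planned journeys per day.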
The periodic trips led to more journeys for both vehicle-on-demand and conventional road traffic. In conventional road traffic, almost all journeys could be carried out. Despite a shorter average distance, the journey time increased by one second. With the vehicle-on-demand service, the number of trips made varied enormously. Due to the periodic trips, more than twice as many trips could be carried out as during a regular daily routine. Arrival and journey times increased slightly. However, the average waiting time decreased from 55 min to 3 min. The total time from arrival to the customer's destination is about 10 min, which is an increase of 30 s.

Evaluation. This test shows a better performance of conventional road traffic for one day. The morning and afternoon rush hours can be managed better by conventional road traffic than by the vehicle-on-demand service. Taxis can no longer catch up with customer requests after peak hours, which subsequently leads, on average, to a
very long delay. As a result of the periodic journeys, the traffic volume remains constant, and the vehicle-on-demand service can serve more customer requests. For conventional road traffic, there are no significant variations. As the number of journeys increases, the travel time increases minimally. The average journey time is lower for the vehicle-on-demand service than for conventional road traffic, which is caused by the reduced traffic volume.
5 Conclusion
The mobility concept vehicle-on-demand has great potential in urban areas as well as in rural areas. First, the generic concept of vehicle-on-demand scenarios was presented, from which the development of a mobility service was derived that processes the data and transfers commands to the simulation framework SUMO. For the mobility service, three challenges had to be solved, and features were developed in order to simulate vehicle-on-demand realistically and to record data precisely for evaluation. The scenario and the mobility service have some limitations that affect the quality of the investigations, such as the reduced map size, the estimated traffic demand, and the lack of algorithms for more efficient operation. Under these limitations, two evaluations were carried out. In the first evaluation, the ideal fleet size for the Furtwangen scenario was found to be between 30 and 90 vehicles. However, the results vary greatly within this interval as a result of the lacking intelligence in the allocation of persons. It was also found that the current vehicle-on-demand service could not satisfy the demand of the morning rush hours. Compared with conventional road traffic, the vehicle-on-demand service could not efficiently handle the trips over the day. Therefore, the vehicle-on-demand concept is not suitable for rush hours in rural areas if the corresponding intelligence is not available. However, it is also true that, even without intelligent behavior, a small fleet of 37 vehicles can reach the customer in three minutes on average. The study thus provides some initial clues, but the mobility service cannot be conclusively evaluated. In order to improve the results of future investigations, the functionality of the mobility service and the input data of the scenario have to be optimized. In particular, testing the influence of intelligent algorithms for rural areas is essential.
Additionally, different vehicle-on-demand services and car-sharing approaches need to be simulated and tested to find a balanced solution that considers cost and availability. It will be challenging to implement a uniform mobility concept across the board. The many differences between regions, whether geographical or due to social status, increase the need for regionally adapted mobility concepts, in which vehicle-on-demand will continue to play a significant role in the future.
References
1. OpenStreetMap. www.openstreetmap.org
2. Azevedo, C.L., Katarzyna, M., Raveau, S., Soh, H., Adnan, M., Basak, K., Loganathan, H., Deshmunkh, N., Lee, D.-H., Frazzoli, E., Ben-Akiva, M.: Microsimulation of demand and supply of autonomous mobility on demand. Transp. Res. Rec.: J. Transp. Res. Board 2564(1), 21–30 (2016)
3. Boesch, P.M., Ciari, F., Axhausen, K.W.: Autonomous vehicle fleet sizes required to serve different levels of demand. Transp. Res. Rec.: J. Transp. Res. Board 2542(1), 111–119 (2016)
4. Brownell, C., Kornhauser, A.: A driverless alternative: fleet size and cost requirements for a statewide autonomous taxi network in New Jersey. Technical report
5. Burns, L.D., Jordan, W.C., Scarborough, B.A.: Transforming personal mobility. Earth Inst. 431, 432 (2013)
6. Canzler, W.: Mobilitätskonzepte der Zukunft und Elektro-Mobilität. In: Elektromobilität, pp. 39–61. Springer, Heidelberg (2010)
7. Chen, R., Levin, M.W.: Dynamic user equilibrium of mobility-on-demand system with linear programming rebalancing strategy. Transp. Res. Rec. 2673(1), 447–459 (2019)
8. Chen, T.D., Kockelman, K.M., Hanna, J.P.: Operations of a shared, autonomous, electric vehicle fleet. Implications of vehicle & charging infrastructure decisions. Transp. Res. Part A: Policy Pract. 94, 243–254 (2016)
9. Clewlow, R.R., Mishra, G.S.: Disruptive transportation: the adoption, utilization, and impacts of ride-hailing in the United States. Institute of Transportation Studies, University of California, Davis (2017)
10. Statista Research Department: Arbeitslosenquote in Deutschland im Jahresdurchschnitt von 2004 bis 2019. Technical report, Statista (2019)
11. Dijkstra, E.W.: A note on two problems in connexion with graphs. Numerische Mathematik 1(1), 269–271 (1959)
12. Fagnant, D.J., Kockelman, K.M.: The travel and environmental implications of shared autonomous vehicles, using agent-based model scenarios. Transp. Res. Part C: Emerg. Technol. 40, 1–13 (2014)
13. Fagnant, D.J., Kockelman, K.M.: Dynamic ride-sharing and fleet sizing for a system of shared autonomous vehicles in Austin, Texas. Transportation 45(1), 143–158 (2018)
14. Ford Jr, L.R.: Network flow theory. Technical report, RAND Corporation Santa Monica, CA (1956)
15. Hart, P.E., Nilsson, N.J., Raphael, B.: A formal basis for the heuristic determination of minimum cost paths. IEEE Trans. Syst. Sci. Cybern. 4(2), 100–107 (1968)
16. Illgen, S., Höck, M.: Establishing car sharing services in rural areas: a simulation-based fleet operations analysis (2018)
17. Infas: Mobilität in Deutschland - Ergebnisbericht. Technical report (2018)
18. Litman, T.: Autonomous Vehicle Implementation Predictions. Victoria Transport Policy Institute Victoria, Canada (2019)
19. Loeb, B., Kockelman, K.M.: Fleet performance and cost evaluation of a shared autonomous electric vehicle (SAEV) fleet: a case study for Austin, Texas. Transp. Res. Part A: Policy Pract. 121, 374–385 (2019)
20. Martinez, L., Crist, P.: Urban mobility system upgrade: how shared self-driving cars could change city traffic. International Transport Forum, Paris (2015)
21. Behrisch, M.: ACTIVITYGEN (2011)
22. Santi, P., Resta, G., Szell, M., Sobolevsky, S., Strogatz, S.H., Ratti, C.: Quantifying the benefits of vehicle pooling with shareability networks. Proc. Nat. Acad. Sci. 111(37), 13290–13294 (2014)
23. Statistisches Bundesamt: Gebäude und Wohnungen sowie Wohnverhältnisse der Haushalte. Zensus 2011 (2011)
24. Zachariah, J., Gao, J., Kornhauser, A., Mufti, T.: Uncongested mobility for all: a proposal for an area wide autonomous taxi system in New Jersey. Technical report (2014)
Overcrowding Detection Based on Crowd-Gathering Pattern Model
Liu Bai, Chen Wu(B), and Yiming Wang
School of Rail Transportation, Soochow University, Suzhou 215139, People’s Republic of China
[email protected], {cwu,ymwang}@suda.edu.cn
Abstract. Frequent public incidents in crowd gathering areas are causing social concern. This paper first discusses different cases of crowd gathering based on Edward Hall’s personal space theory and constructs a novel crowd gathering pattern model. Based on the model, a modified multi-column convolutional neural network is proposed for detecting overcrowding. To evaluate its effectiveness, heterogeneous multi-granularity real-time dynamic surveillance video containing different perspectives is integrated, and a new crowd gathering safety situation assessment method is applied. We finally report our real-world application of the crowd gathering safety situation assessment at a Suzhou landmark, Urban Fountain Square, and show that the method can improve the safety of crowd gathering areas.
Keywords: Computer vision · Overcrowding · Video surveillance · Accident analysis and safety · Convolutional neural network
1 Introduction
© Springer Nature Switzerland AG 2021 K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 270–284, 2021. https://doi.org/10.1007/978-3-030-55190-2_21

In recent years, public incidents in crowd gathering areas have frequently caused social concern. According to incomplete statistics, more than 150 disaster events caused by highly concentrated crowds have occurred worldwide since the year 2000. For example, on November 22, 2010 in Cambodia, a bridge began to shake because too many people had gathered on it, causing panic, crowding, and a stampede. By the 23rd, the death toll had climbed to 375, and the number of injured had reached 755. This type of public incident has common characteristics: high crowd density, long gathering time, and rapid change. This makes the prevention and study of crowd stampede accidents an urgent need in the field of safety management of smart cities. In fact, in the field of group disaster dynamics, many scholars have done a lot of research on the phenomenon of crowd gathering, especially on individual modeling. Helbing et al. repeatedly analyzed the surveillance video of the Mecca pilgrimage stampede and found two phenomena of crowd movement, a relatively static one and a sudden flow [1]. Based on this, Johansson et al. explained the critical conditions for the transition between these
two phenomena, and proposed crowd safety measures based on video analysis [2]. Moussaïd et al. started from behavioral heuristics and combined the behavioral trajectories of individuals with the movement patterns of groups to study the characteristics of crowds [3]. We found that detecting changes in crowd density is inseparable from video surveillance systems, and that the full range of perspectives and the dynamic coverage provided by interconnected video capture devices offer more data support for crowd density estimation. Therefore, more and more scholars in computer vision and other fields start from video data to find detection methods that estimate the crowd density or the number of people with higher precision. Pu et al. proposed a new crowd density estimation method based on a deep convolutional neural network (ConvNet), which accurately estimated the crowd density across scenes [4]. Fradi et al. used observations of the local density as a probability density function to generate an automatic crowd density map, which eliminated the influence of features unrelated to the underlying population density and improved the detection accuracy [5]. Grant et al. surveyed the methods used to count individuals and approximate population density, and presented research results on crowd-related behavior understanding [6]. However, in current research, the heterogeneity of the data collection process and the multi-granularity of the data itself have separated crowd situation analysis from personal motion perception. As a result, neither the type of crowd gathering pattern nor global indicators for evaluating overcrowding can be determined, and a safety early warning system cannot be established.
In view of the above problems, and in the context of the key intelligent transportation technology of “heterogeneous collaborative intelligent perception and data fusion”, this paper comprehensively considers the global crowd status and the individual movement patterns at crowd gathering places. Heterogeneous multi-granularity real-time dynamic surveillance video from different angles is integrated, a holographic model of the spatiotemporal evolution of the crowd situation is established, and a new crowd safety assessment method is proposed. The structure of this paper is as follows: Sect. 2 introduces related work on individual movement models and crowd density detection in recent years. Sect. 3 defines the crowd gathering status based on personal space theory, summarizes the crowd gathering patterns, and uses a multi-column convolutional neural network (MCNN) to calculate the density of the crowd. Sect. 4 performs personal space calculation and density fitting on video frame images to analyze the density change trend, and discusses the accuracy of the density estimation map output by the convolutional network, based on which safety evaluation indicators are obtained. In Sect. 5, we discuss the results.
L. Bai et al.
2 Related Work
2.1 Research About Pedestrian Movement Models in Crowds
Research on group disaster events focuses on pedestrian movement models that contain information about the position and state of each natural person. The main pedestrian movement models are the cellular automata model, the social force model, and the agent-based model. Among studies based on the cellular automata model, Feliciani et al. reproduced crowd behavior at maximum density by simulating a cellular automaton floor field model in [7]. Ji et al. used a new triangular mesh cellular automata model to simulate the evacuation of people at high density in [8]. This type of individual model is particularly suitable for analyzing the behavior and trajectories of crowded people in large scenes. However, when the trend of the number of people and the density change must be predicted, the preset rules cannot meet this requirement. Among studies based on the social force model, Helbing et al. first proposed the concept of a social force model in [9]. Yang et al. modified the dynamics of the social force model and compared the relationship between relevant indicators such as density and speed, showing that the social force model can accurately reflect the movement characteristics of pedestrians in [10]. Individual models of this type cannot make accurate judgments when the crowd is highly dense and people overlap with each other. Among studies based on the agent model, Tak et al. proposed a pedestrian cell transmission model to make the agent model more flexible in different situations, with significant effect, in [11]. Ben et al. combined cellular automata and agent models for environmental modeling and simulated four different evacuation scenarios, which effectively guided crowd evacuation in [12]. Was et al. also merged these two models, enabling the decision-making process to adapt to more complex environments in [13]. The essence of the pedestrian movement model is to study the spatial and temporal evolution trend of the crowd.
The input of the model comes from some simple rules and hypotheses. In practice, these inputs often rely on human experience. With the continuous development of sensor technology, real-time crowd gathering information can effectively replace human experience and become the input of a group disaster model.
2.2 Research of Overcrowding Density Estimation
At present, methods in the computer vision field for crowd density estimation and counting in video surveillance mainly follow two technical routes: model labeling based on individual extraction and segmentation, and feature extraction based on overall texture analysis. The main idea of model labeling is to directly label and count the human models in the image. Luo et al. summed the pixel values in the density map to obtain the total number of people through integration in [14]. Zhao et al. used
Overcrowding Detection Based on Crowd-Gathering Pattern Model
an ellipsoid model to perform a context scan segmentation on a human body model, and summed the results to obtain the crowd density in [15]. Ge et al. used a Bayesian method that combined a random process with a conditional labeling process to calculate the number of individuals in [16]. Rao et al. used optical flow for motion estimation and contour analysis for crowd contour detection, and obtained crowd density through clustering in [17]. The most obvious disadvantages of this type of method are that the detection accuracy is not high, too many human features are retained when marking individuals, and noise is inevitably introduced. Moreover, when the crowd is highly dense, individuals cannot be located accurately, which makes the requirements even harder to meet. The central idea of the feature extraction method is to calculate the crowd density after normalization by extracting human characteristics or transformable parameters. Nagao et al. extracted the rotational angular velocity of the human body as a transformation parameter and estimated the crowd density using the continuous wavelet transform in [18]. Kok et al. estimated the density by distinguishing the contours between the crowd and the background in [19]. Meynberg et al. distinguished between different crowd densities by applying two kinds of texture filters, bow-shaped and additive Gabor filters, to aerial image patch datasets in [20]. Zhang et al. proposed a multi-column convolutional neural network (MCNN) structure, which maps the image to its crowd density map and estimates the crowd density in [21]. By using filters with receptive fields of different sizes, the features of each CNN column can adapt to the changes in head size caused by perspective effects or image resolution, and the effect is remarkable.
With the rapid development of deep network structures, methods that extract overall texture features and then calculate the crowd density have significantly improved the accuracy of crowd density estimation. However, because they do not extract details of individual people, it is difficult for them to remove noise unrelated to people when the background texture is complex, and thus to accurately capture the effective crowd density of the site. In fact, with the widespread deployment of surveillance cameras, dynamic surveillance videos from different perspectives at high and low altitudes can meet both the requirements of model labeling for human details and the needs of feature extraction for group textures. Therefore, the research focus is gradually shifting to how to fuse the above two technical approaches.
3 System Modeling and Methodology
3.1 Definition of Crowding State
Hall’s personal space theory, which builds on the interpretation of intimate, social and public relations, defines the distances of personal space and is widely used in research on individual movement. The theory defines four personal space distances in [22]: intimate distance, personal distance, social distance and public distance. Our crowd gathering patterns follow Hall’s circular personal space hypothesis and simulate step changes to improve the predictive ability of pedestrian models. In order to directly quantify the relationship between crowd density and distance, we define a natural person as a solid circle with radius r and personal space as a hollow circle with radius R. In a crowd gathering place, fitting the relationship between crowd density and distance helps to analyze the density distribution of large crowds.

Table 1. Center-wrapping type and Unilateral arrangement type.

Category                 Pattern figure      Personal space
Center wrapping          [pattern figure]    R = R (inner layer), R = ∞ (outer layer), where R is a constant and ∞ means infinite
Unilateral arrangement   [pattern figure]    R = A ∗ x + B, where A is an undetermined coefficient and B is a constant
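As a minimal sketch of how the linear personal-space relation of the unilateral arrangement type in Table 1, R = A ∗ x + B, might be estimated from measurements, the following Python snippet fits A and B by ordinary least squares and converts the fitted radius into a local density. The function names and the sample values are hypothetical, and the conversion ρ = 1/(πR²) (one person per personal-space circle) is an assumption of this sketch, not a statement of the authors' exact procedure.

```python
import numpy as np

def fit_personal_space(x, radius):
    """Ordinary least-squares fit of R = A * x + B; returns (A, B)."""
    x = np.asarray(x, dtype=float)
    radius = np.asarray(radius, dtype=float)
    design = np.column_stack([x, np.ones_like(x)])  # columns: x, 1
    (a, b), *_ = np.linalg.lstsq(design, radius, rcond=None)
    return float(a), float(b)

def density_from_radius(radius):
    """People per square metre, assuming one person per circle of radius R."""
    return 1.0 / (np.pi * np.asarray(radius, dtype=float) ** 2)

# Hypothetical measurements: distance from the attraction edge (m) and
# observed personal-space radius (m). The data are exactly linear, so the
# fit should recover the coefficients used to generate them.
x_obs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
r_obs = 0.1 * x_obs + 0.3

A, B = fit_personal_space(x_obs, r_obs)
rho = density_from_radius(A * x_obs + B)  # density falls as distance grows
```

With real data the radii would be noisy, and the residuals of the fit give a first indication of whether a linear relation is adequate for a given venue.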
Definition I (Crowding State): Assume (1) The crowd exists in a limited gathering place, and is in a static status, which is represented by T; (2) There are n attraction points Oj (j = 1, 2, ..., n) in a limited place. If there is a crowd distribution U(Oj) with each attraction as its core, then the crowding state with the attraction as the core in this limited place is:
$$T = \bigcup_{j=1}^{n} U(O_j), \qquad \frac{\partial T}{\partial t} = 0 \tag{1}$$
3.2 Crowd Gathering Patterns
According to the above definition of crowding state, we divide the crowd gathering into two patterns: center-wrapping type and unilateral arrangement type.
In the pattern figures of Table 1, the black area represents the attraction point of the venue, the gray area represents the area where crowd activity is prohibited, the blue solid circle is an individual model, the hollow circle with a solid black line is a personal space of limited size, and the hollow circles with dotted black lines represent unlimited personal space. The center-wrapping crowd forms because the visible range of the attraction point is small, so the crowd gathers near the attraction point. The inner-layer crowd tightly surrounds the attraction point, and its personal space is limited. The outer crowd cannot reach the attraction, so it moves in the outer area and its personal space is not limited. The characteristic of this gathering type is that the crowd is densely distributed near the attraction point and then quickly disperses with distance. Consequently, the farther an individual is from the attraction point, the more that individual’s circular personal space tends to infinity. The change in the personal space of the center-wrapping crowd is given in the personal space column of Table 1, where the inner personal space is represented by the constant R and the outer personal space is unlimited, represented by ∞. The unilateral arrangement type forms because the attraction point has a wide visual range and a large horizontal extent, which creates a wide attraction edge. People can therefore line up and gather in parallel along the attraction edge, and the distribution density decreases step by step according to the principle of “closer distance, larger density”. The characteristic of this gathering type is that the circular personal space of each individual in the limited space is closely related to that individual’s distance from the attraction point, and the group gathering density is also closely related to this distance.
The change in the personal space of the unilaterally arranged crowd is given in the personal space column of Table 1, where A and B are unknown parameters.
3.3 Influence of Topographical Factors
Based on the above basic patterns, we further consider the crowd aggregation pattern affected by topographical factors, which generally shows a non-uniform distribution. Take the unilateral arrangement type as an example. In this type, the visual range of the attraction point is large, and the crowd is distributed according to the principle of “closer distance, larger density”. When steps are present, the attraction exerted by the attraction point changes, which changes the size of personal space. As shown in Table 2, the personal space of the third layer is smaller than that of the second layer: people standing on the steps have a better view, so their personal space is correspondingly smaller. When calculating the trend of personal space in this case, the angle caused by the step height needs to be considered, as shown in Table 2.

Table 2. Center-wrapping type and Unilateral arrangement type. [Pattern figures: center-wrapping; unilateral arrangement]

Rmin is the radius of the activity range of the individual nearest to the attraction point, Rmax is the radius of the activity range of the individual farthest from the attraction point, θ is the angle between the line through the centers of these two individual activity ranges and the horizontal plane, d is the size of the area where people are not allowed to stand, L is the linear distance between the center of the farthest individual activity range and the attraction point, and x is a variable between 0 and L along L. The yellow rectangle in the figure is a step, and the solid green line is the radius of the personal space without steps. α is the angle between the horizontal plane and the line connecting the center of the individual model on the step with the initial center of the individual circle, h is the step height, and Rc is the radius of the personal space with steps. So A = tan θ, B = Rmin, and the trend of personal space R is as follows:
$$\tan\theta \cdot x + R_{\min} - h \le R_c \le \tan\theta \cdot x + R_{\min} \tag{2}$$
$$\theta \le \alpha = \arctan\left[(R_c - R_{\min})/x\right] \tag{3}$$
$$R = \begin{cases} \tan\theta \cdot x + R_{\min}, & \text{no steps} \\ R_c, & \text{having steps} \end{cases} \tag{4}$$
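The piecewise definition of R and the range of Rc can be written out in a few lines of Python. This is only a sketch under stated assumptions: the function names and numeric values are hypothetical, and the density ρ = 1/(πR²) is taken from the form of the density range given later for the steps in Sect. 4.1.

```python
import math

def personal_space_radius(x, theta, r_min, r_c=None):
    """Eq. (4): R = tan(theta) * x + R_min without steps, R_c with steps."""
    if r_c is None:
        return math.tan(theta) * x + r_min
    return r_c

def step_radius_bounds(x, theta, r_min, h):
    """Eq. (2): admissible range (lower, upper) of R_c for step height h."""
    upper = math.tan(theta) * x + r_min
    return upper - h, upper

def density_bounds(x, theta, r_min, h):
    """Density range implied by the R_c range, assuming rho = 1 / (pi * R^2)."""
    r_low, r_high = step_radius_bounds(x, theta, r_min, h)
    if r_low <= 0:
        raise ValueError("step height leaves no positive personal space")
    # The largest radius gives the smallest density and vice versa.
    return 1.0 / (math.pi * r_high ** 2), 1.0 / (math.pi * r_low ** 2)

# Illustrative values only: theta = 5 degrees, R_min = 0.3 m, h = 0.15 m, x = 4 m.
theta = math.radians(5.0)
r_low, r_high = step_radius_bounds(4.0, theta, 0.3, 0.15)
rho_low, rho_high = density_bounds(4.0, theta, 0.3, 0.15)
```

The guard against a non-positive lower radius is a design choice of this sketch; close to the attraction point the step height can exceed tan θ ∗ x + Rmin, in which case the lower bound of Eq. (2) has no physical meaning.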
Fig. 1. MCNN network architecture diagram: Select one of the columns for explanation. The first column is passed through a 9 ∗ 9 filter to capture local head features of the image, then the maximum pooling layer of 2 ∗ 2. The activation function is a linear rectification function. Finally, the features are weighted and stacked by a 1 ∗ 1 filter, so that the output results are averaged and used for density classification processing.
Formula (2) and Formula (3) specify the range of Rc. In addition, according to the related design standards, the step height (h) of indoor and outdoor steps of public buildings should be no greater than 0.15 m and no less than 0.1 m.
3.4 Crowd Density Estimation Based on Crowd Gathering Patterns
In fact, if the least squares method is used to fit discrete points of different sizes of activity space to different densities, we can estimate the parameters in the above formula, and then estimate the population density at a point of attraction. If the crowd density of multiple attraction points in the venue is calculated, the crowd gathering state is finally obtained. In practical applications, the crowd density in the acquisition area mainly depends on the video acquisition network based on the internet of things. At present, according to the installation position of the equipment, it is divided into two types: a high-altitude global perspective and a low-altitude local perspective. Considering that the low-altitude camera equipment is more inclined to capture the characteristics of people and more delicately perceive the local crowd density, the crowd density estimation in this paper uses the low-altitude video data. We use a multi-column convolutional neural network model to extract head features of different sizes. The input of the model is an original image of unlimited size. The four-column parallel network containing convolution kernels of different sizes is used to extract head features of different sizes in parallel,
column by column. Finally, in the third convolution layer, a crowd density map is constructed by linearly weighting the obtained four columns of features. The density normalization process mainly depends on the Gaussian kernel function:
$$F(x) = \sum_{i=1}^{M} \delta_i * G(x, \sigma_i), \tag{5}$$
where δi represents the impulse function of each head, M is the number of heads in the image, and σi denotes the maximum head distance within a certain adaptive range (the maximum is used so that the crowd appears more dense), σi = α max(dij), where α is the weight of the adaptive range. Our experiments show that with α = 0.5 the crowd density is most consistent with the actual situation. We use the mean squared error (MSE) and the mean absolute error (MAE) to evaluate the error:
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N} |z_i - \hat{z}_i|^2, \tag{6}$$
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N} |z_i - \hat{z}_i| \tag{7}$$
where N is the number of test images, zi is the actual number of people in the i-th image and ẑi is the estimated number of people in the i-th image. Fig. 1 shows the structure of our MCNN.
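The density-map construction of Eq. (5) and the two error measures can be illustrated with a short NumPy sketch. Several points are assumptions of this sketch rather than details confirmed by the text: dij is taken as the distance from head i to its k nearest neighbouring heads, each Gaussian is normalised over the image grid so that one head contributes exactly one person, and `mse` uses the textbook mean of squared differences. All names and sample values are hypothetical.

```python
import numpy as np

def density_map(heads, shape, alpha=0.5, k=3):
    """Sum of per-head Gaussians with sigma_i = alpha * max(d_ij).

    heads: sequence of (row, col) head positions inside the image.
    shape: (height, width) of the image.
    d_ij: distances from head i to its k nearest heads (assumption).
    """
    heads = np.asarray(heads, dtype=float)
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    out = np.zeros(shape, dtype=float)
    for i, (r, c) in enumerate(heads):
        others = np.delete(heads, i, axis=0)
        if len(others) > 0:
            dist = np.sort(np.hypot(others[:, 0] - r, others[:, 1] - c))
            sigma = alpha * dist[:k].max()
        else:
            sigma = 1.0  # arbitrary fallback when only one head is present
        sigma = max(sigma, 0.5)  # half-pixel floor keeps the kernel non-degenerate
        g = np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2.0 * sigma ** 2))
        out += g / g.sum()  # each head integrates to exactly one person
    return out

def mae(actual, estimated):
    """Mean absolute error between actual and estimated counts."""
    diff = np.asarray(actual, dtype=float) - np.asarray(estimated, dtype=float)
    return float(np.mean(np.abs(diff)))

def mse(actual, estimated):
    """Mean of squared differences between actual and estimated counts."""
    diff = np.asarray(actual, dtype=float) - np.asarray(estimated, dtype=float)
    return float(np.mean(diff ** 2))

# Hypothetical example: three annotated heads in a 100 x 100 image.
dm = density_map([(20, 20), (25, 30), (60, 70)], (100, 100))
estimated_count = dm.sum()  # summing the map recovers the head count
```

Summing the pixel values of the map yields the people count, which is the quantity compared against the ground truth by `mae` and `mse`.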
4 Application and Results Analysis
4.1 Density Distribution Patterns Fitting
We chose the location of the attracting event as the attraction point O and used it as the origin to establish a spatial coordinate system. The radius r of the individual model is determined by the shoulder width of a standing person. According to the average shoulder widths of adult men and women reported in [23] and [24], a solid circle with radius r = 22 cm is used as the individual model. The crowd gathering pattern defines how the personal space size changes, so that the crowd density in the area changes with the distance from the attraction point. The fitting result is shown in Fig. 2. In Fig. 2, the origin O indicates the attraction point, the horizontal axis is the distance L from the attraction point, the vertical axis is the corresponding density ρ, d is the size of the area near the attraction point where standing is prohibited, and ρ0 is the density of the crowd closest to the attraction point. Figure 2(a) shows the crowd density distribution trend of the center-wrapping type. The solid line indicates the crowd density near the attraction point, and the dashed line indicates that the personal space is not restricted after a certain
Fig. 2. Density distribution fitting of crowd gathering pattern.
Fig. 3. The convergence process of training error of the proposed model.
distance from the attraction point, and the crowd density distribution is disordered. Figure 2(b) shows the crowd density distribution trend of the unilateral arrangement type. The density is larger closer to the center and smaller farther from it. Figure 2(c) shows the crowd density distribution trend of the special gathering pattern. The area between L1 and L2 represents the density on the steps, which is variable:
$$\rho_{Steps} \in \left[\frac{1}{\pi(\tan\theta \cdot x + R_{\min})^2},\ \frac{1}{\pi(\tan\theta \cdot x + R_{\min} - h)^2}\right] \tag{8}$$
4.2 Real-World Application: Density Estimation Analysis in Suzhou Fountain Square
Fountain Square, located next to Jinji Lake in Suzhou, Jiangsu Province, China, covers an area of 4,300 square meters and periodically hosts large-scale musical fountain shows, with a maximum flow of 35,000 people. Our crowd gathering model has been applied in practice in this square. The data come from videos recorded by cameras covering the interior of the square and the exits, and the field layout of the square has been surveyed. We selected 4176 video frames
Fig. 4. Comparison of actual and estimated numbers.
as training set samples. Over the training iterations, the mean absolute error (MAE) and mean squared error (MSE) decrease as shown in Fig. 3, indicating good convergence. The horizontal axis is the number of iterations, and the vertical axis is the error value, with MSE as the red line and MAE as the blue line. We input the video frames obtained in real time into the trained optimal model and compared the ground-truth count of each picture with the model estimate, as shown in Fig. 4. The horizontal axis is the video frame index, the vertical axis is the number of people, the blue line is the ground-truth number of people in the video frame, and the red line is the total number of people obtained by summing the pixel values of the density map output by the model. We performed a residual analysis on the estimated number of people. Fig. 5a is a histogram of the residuals between the actual and estimated numbers of people. The horizontal axis is the residual value interval, and the vertical axis is the number of residuals in each interval; the histogram is approximately normal in shape. Fig. 5b shows the distribution of the standardized residuals. About 95% of the standardized residuals lie between −2 and 2, so they also approximately follow the normal distribution. To verify this, we calculated the mean, standard deviation, standard error, and skewness shown in Table 3, which are broadly consistent with the characteristics of the normal distribution, so our estimation results are close to the actual results.

Table 3. Statistical indicators of residuals and standard residuals

Method                  Mean   Standard deviation   Standard error   Skewness
Residual                0      30.91                1.36             0.95
Standardized residual   0      1                    0.04             0.95
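The indicators reported in Table 3 are standard summary statistics and can be computed from per-frame counts as sketched below. The function name and the sample counts are hypothetical; the sketch assumes the sample standard deviation (ddof = 1), the standard error as the standard deviation divided by √N, and skewness as the third central moment divided by the 1.5th power of the second, since the text does not state which conventions were used.

```python
import numpy as np

def residual_summary(actual, estimated):
    """Mean, standard deviation, standard error and skewness of residuals."""
    res = np.asarray(actual, dtype=float) - np.asarray(estimated, dtype=float)
    centered = res - res.mean()
    sd = res.std(ddof=1)            # sample standard deviation
    m2 = np.mean(centered ** 2)     # second central moment
    m3 = np.mean(centered ** 3)     # third central moment
    standardized = centered / sd    # standardized residuals
    return {
        "mean": float(res.mean()),
        "std": float(sd),
        "std_error": float(sd / np.sqrt(res.size)),
        "skewness": float(m3 / m2 ** 1.5),
        # share of standardized residuals inside [-2, 2]
        "share_within_2": float(np.mean(np.abs(standardized) <= 2)),
    }

# Hypothetical actual and estimated people counts for four frames.
summary = residual_summary([3, 5, 7, 9], [2, 5, 8, 7])
```

For approximately normal residuals, `share_within_2` should be close to 0.95 and `skewness` close to 0, which is the check described in the text.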
Fig. 5. Residual histogram and standard residual distribution: (a) residual histogram; (b) standard residual distribution.
The output density map is represented by a color gradient. As shown in Fig. 6, the first column is the real scene of the video frame, the second column is a personal space diagram labeled with the heads, and the third column is the density estimate produced by the MCNN. When the personal space is small, the density of the crowd is correspondingly high, and the density change is reflected by the color change in the density estimation map. We experimentally compared the MCNN density estimation map and the personal space diagram on the video frame data set. Figure 7(a)(b) are the grayscale image of the output of the convolutional network and the grayscale image of the personal space diagram. Figure 7(c)(d) are the corresponding grayscale histograms, where the horizontal axis is the gray value of the grayscale image and the vertical axis is the corresponding number of pixels. We use the Bhattacharyya coefficient to estimate the similarity of the gray histograms. After averaging over multiple calculations, the coefficient is close to 0.8, which indicates that the MCNN density estimation and the personal space representation can perceive the crowd density in the area fairly accurately.
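The histogram comparison can be sketched as follows. The Bhattacharyya coefficient of two normalised histograms p and q is the sum over bins of √(p_i q_i); it equals 1 for identical histograms and 0 for histograms with no overlapping bins. The function names, the 256-bin choice and the tiny test images are assumptions of this sketch.

```python
import numpy as np

def gray_histogram(image, bins=256):
    """Normalised histogram of gray levels in [0, 255]."""
    hist, _ = np.histogram(np.asarray(image).ravel(), bins=bins, range=(0, 256))
    total = hist.sum()
    if total == 0:
        raise ValueError("image has no pixels in the range [0, 255]")
    return hist / total

def bhattacharyya_coefficient(p, q):
    """BC = sum_i sqrt(p_i * q_i) for two normalised histograms."""
    return float(np.sum(np.sqrt(np.asarray(p) * np.asarray(q))))

# Tiny hypothetical grayscale images.
dark = np.zeros((4, 4), dtype=np.uint8)          # all pixels 0
bright = np.full((4, 4), 255, dtype=np.uint8)    # all pixels 255
mixed = np.concatenate([dark[:2], bright[:2]])   # half 0, half 255

same = bhattacharyya_coefficient(gray_histogram(dark), gray_histogram(dark))
disjoint = bhattacharyya_coefficient(gray_histogram(dark), gray_histogram(bright))
partial = bhattacharyya_coefficient(gray_histogram(dark), gray_histogram(mixed))
```

In practice the two inputs would be the grayscale density estimation map and the grayscale personal space diagram of the same frame, and the coefficient would be averaged over frames.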
Fig. 6. Comparison of density estimation results: the first column is the original image, the second column is a personal space diagram, and the third column is the density estimation map after MCNN.
Fig. 7. Gray image and its similarity calculation.
In addition, based on the head positions in the video frame images at three different moments, we obtained crowd gathering states at three different levels. Figure 8(a)(b)(c) shows the crowd status at times of low, medium and high density. Once video frames from consecutive moments are available, the dynamic evolution of the crowd gathering status can be reconstructed, which is part of the follow-up work of this paper.
Fig. 8. Crowd density map of fountain square in different states.
5 Conclusion
This article discussed Hall’s personal space theory in crowded scenes, which by itself supports only qualitative analysis. To address this limitation, we explored the quantitative relationship between crowd density and distance, and summarized the basic and special patterns of crowd gathering. Our optimized MCNN was then used for density estimation, which compensates for the negative impact of the different head sizes caused by depth of field. We also analyzed how density changes with distance from the attraction point in each crowd gathering pattern. By calculating the similarity between the gray histograms of the personal space diagram and the density estimation map, we verified the accuracy with which density estimation perceives the crowd distribution. As an application evaluation, we applied the method in Suzhou Fountain Square and showed its effectiveness. In future research, we will build on this work to explore methods for counting people over a wide area, that is, how to fuse information from high-altitude and low-altitude perspectives to make crowd density estimation over a wide area more accurate.
References
1. Helbing, D., Johansson, A., Al-Abideen, H.Z.: Dynamics of crowd disasters: an empirical study. Phys. Rev. E 75(4), 046109 (2007)
2. Johansson, A., Helbing, D., Al-Abideen, H.Z., Al-Bosta, S.: From crowd dynamics to crowd safety: a video-based analysis. Adv. Complex Syst. 11(04), 497–527 (2008)
3. Moussaïd, M., Helbing, D., Theraulaz, G.: How simple rules determine pedestrian behavior and crowd disasters. Proc. Nat. Acad. Sci. 108(17), 6884–6888 (2011)
4. Pu, S., Song, T., Zhang, Y., Xie, D.: Estimation of crowd density in surveillance scenes based on deep convolutional neural network. Procedia Comput. Sci. 111, 154–159 (2017)
5. Fradi, H., Dugelay, J.L.: Towards crowd density-aware video surveillance applications. Inf. Fusion 24, 3–15 (2015)
6. Grant, J.M., Flynn, P.J.: Crowd scene understanding from video: a survey. ACM Trans. Multimed. Comput. Commun. Appl. (TOMM) 13(2), 19 (2017)
7. Feliciani, C., Nishinari, K.: An improved cellular automata model to simulate the behavior of high density crowd and validation by experimental data. Phys. A 451, 135–148 (2016)
8. Ji, J., Lu, L., Jin, Z., Wei, S., Ni, L.: A cellular automata model for high-density crowd evacuation using triangle grids. Phys. A 509, 1034–1045 (2018)
9. Helbing, D., Molnar, P.: Social force model for pedestrian dynamics. Phys. Rev. E 51(5), 4282 (1995)
10. Yang, X., Dong, H., Wang, Q., Chen, Y., Hu, X.: Guided crowd dynamics via modified social force model. Phys. A 411, 63–73 (2014)
11. Tak, S., Kim, S., Yeo, H.: Agent-based pedestrian cell transmission model for evacuation. Transp. A: Transp. Sci. 14(5–6), 484–502 (2018)
12. Ben, X., Huang, X., Zhuang, Z., Yan, R., Xu, S.: Agent-based approach for crowded pedestrian evacuation simulation. IET Intell. Transp. Syst. 7(1), 55–67 (2013)
13. Was, J., Lubas, R.: Towards realistic and effective agent-based models of crowd dynamics. Neurocomputing 146, 199–209 (2014)
14. Luo, H., Sang, J., Wu, W., Xiang, H., Xiang, Z., Zhang, Q., Wu, Z.: A high-density crowd counting method based on convolutional feature fusion. Appl. Sci. 8(12), 2367 (2018)
15. Zhao, T., Nevatia, R.: Tracking multiple humans in complex situations. IEEE Trans. Pattern Anal. Mach. Intell. 9, 1208–1221 (2004)
16. Ge, W., Collins, R.T.: Crowd density analysis with marked point processes [applications corner]. IEEE Sig. Process. Mag. 27(5), 107–123 (2010)
17. Rao, A.S., Gubbi, J., Marusic, S., Palaniswami, M.: Estimation of crowd density by clustering motion cues. Vis. Comput. 31(11), 1533–1552 (2015)
18. Nagao, K., Yanagisawa, D., Nishinari, K.: Estimation of crowd density applying wavelet transform and machine learning. Phys. A 510, 145–163 (2018)
19. Kok, V.J., Chan, C.S.: Granular-based dense crowd density estimation. Multimed. Tools Appl. 77(15), 20227–20246 (2018)
20. Meynberg, O., Cui, S., Reinartz, P.: Detection of high-density crowds in aerial images using texture classification. Remote Sens. 8(6), 470 (2016)
21. Zhang, Y., Zhou, D., Chen, S., Gao, S., Ma, Y.: Single-image crowd counting via multi-column convolutional neural network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 589–597 (2016)
22. Hall, E.T.: The Hidden Dimension, vol. 609. Doubleday, Garden City (1966)
23. Lin, Y.C., Wang, M.J.J., Wang, E.M.: The comparisons of anthropometric characteristics among four peoples in East Asia. Appl. Ergon. 35(2), 173–178 (2004)
24. Gordon, C.C., Blackwell, C.L., Bradtmiller, B., Parham, J.L., Hotzman, J., Paquette, S.P., Corner, B.D., Hodge, B.M.: 2010 anthropometric survey of US Marine Corps personnel: methods and summary statistics. Technical report, Army Natick Soldier Research Development and Engineering Center, MA (2013)
Multi-person Spatial Interaction in a Large Immersive Display Using Smartphones as Touchpads
Gyanendra Sharma and Richard J. Radke
Rensselaer Polytechnic Institute, Troy, NY, USA
Abstract. In this paper, we present a multi-user interaction interface for a large immersive space that supports simultaneous screen interactions by combining (1) user input via personal smartphones and Bluetooth microphones, (2) spatial tracking via an overhead array of Kinect sensors, and (3) WebSocket interfaces to a webpage running on the large screen. Users are automatically and dynamically assigned personal and shared screen sub-spaces based on their tracked location with respect to the screen, and use a webpage on their personal smartphone for touchpad-type input. We report user experiments using our interaction framework that involve image selection and placement tasks, with the ultimate goal of realizing display-wall environments as viable, interactive workspaces with natural multimodal interfaces.
Keywords: Spatial intelligence · Immersive spaces · Interaction design · Multi-person interaction · Smartphones
1 Introduction
Designing interactive multi-user interfaces for large-scale immersive spaces requires accommodations that go beyond conventional input mechanisms. In recent years, multi-layered modalities such as personal touchscreen devices, voice commands, and mid-air gestures have evolved as viable alternatives [1–4]. Especially in projector-based displays like the one discussed here, distant interaction via smartphone-like devices plays a pivotal role [5]. Apart from the input modes of interaction, spaces of this size and scale greatly benefit from contextualizing user locations within the space for interaction design purposes [6–8]. This is especially true for large enclosed displays such as CAVE [9,10], CAVE2 [11], CUBE [12] and CRAIVE [13]. Representing physical user locations on such screen spaces presents considerable challenges due to spatial ambiguities compared to flat display walls. In this paper, we present mechanisms for multiple users to simultaneously interact with a large immersive screen by incorporating three components: users’ physical locations obtained from external range sensors, ubiquitous input devices
© Springer Nature Switzerland AG 2021 K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 285–302, 2021. https://doi.org/10.1007/978-3-030-55190-2_22
Fig. 1. Multiple users simultaneously interacting with a large screen using their smartphones and voices, coupled with spatial information about their physical locations. This panoramic image shows a 5 m tall 360-degree display wall with a 44 m perimeter. Please refer to the video hosted at https://youtu.be/CYqXKyHTO U for detailed interaction examples.
such as smartphones and Bluetooth microphones, and automatic contextualization of personal vs. shared screen areas. As shown in Fig. 1, discrete personal interaction regions appear on two sides of a rectangular enclosed screen, where users freely move to make spatial selections and manipulate or generate relevant images. The shared screen region between the two sides can be simultaneously manipulated by multiple users to create a desired layout based on combinations of pre-selected text with user-curated images. Our method and overall architecture allow multiple users to interface with the large visually immersive space in a natural way. Integrating personal devices and voice along with spatial intelligence to define personal and shared interaction areas opens avenues to use the space for applications such as classroom learning, collaboration, and game play. We designed controlled laboratory experiments with 14 participants to test the usability, intuitiveness and comfort of this multimodal, multi-user-to-large-screen interaction interface. Based on the results, we observe that the designed mechanism is easy to use and adds a degree of fun and enjoyment for users while in the space. The rest of the paper is organized as follows. In the next section, we discuss prior work on spatial intelligence and interaction mechanisms in the context of large immersive spaces. We then introduce the overall system, consisting of our proposed multi-modal spatial interaction design along with the overall system architecture. Next we describe user studies to validate our system design for interaction and present the results. We conclude by discussing our findings and potential future work.
2 Background and Related Work
Our system is inspired by a diverse body of prior work, generally related to spatial sense-making in large immersive spaces, personal vs. shared spaces in large screens, multi-user support, and interactions using ubiquitous devices such as smartphones.
Spatial Intelligence in Immersive Spaces. Microsoft Kinects and similar 3-D sensors have been widely used for user locations or gestural interpretation in the context of various large screens [14–16] and common spaces [17]. Research has primarily been focused on developing mid-air gestures and other interaction mechanisms using methods similar to ray-casting, which require knowledge of spatial layout and users’ physical locations [18]. A unique aspect of our system is an overhead Kinect array that allows many users to be simultaneously tracked and their locations to be correlated to screen coordinates and workspaces.

Personal vs. Shared Spaces. In terms of demarcating public vs. personal spaces within large screens, Vogel and Balakrishnan [1] discussed how large displays can accommodate and transition between public and personal interaction modes based on several factors. This thread of research extends to privacy-supporting infrastructure and technologies [19,20]. Wallace et al. [21] recently studied approaches to defining personal spaces in the context of a large touch screen display, which we cannot directly incorporate in our system but inspired our design considerations.

Multi-user Support. Realizing large immersive spaces as purposeful collaboration spaces through multi-user interaction support remains an active area of research [22]. Various approaches such as visualization of group interaction [23], agile team collaboration [24], along with use cases such as board meeting scenarios [25], have been proposed. The Collaborative Newspaper by Lander et al. [26] and Wordster by Luojus et al. [27] showed how multiple users can interact at the same time with a large display. Doshi et al. presented a multi-user application for conference scheduling using digital “sticky notes” on a large screen [28].

Smartphones as Interaction Devices.
The limitations of conventional input devices for natural interactions with pervasive displays have led to several innovations, for example allowing ubiquitous devices such as smartphones to be used as interaction devices. Such touchscreen devices allow for greater flexibility and diversity in how interaction mechanisms with pervasive displays are materialized. Earlier concepts such as the one proposed by Ballagas et al. [29] have evolved towards more native web-based or standalone application-based interfaces. For instance, Baldauf et al. developed a web-based remote control to interact with public screens called ATREUS [30]. Beyond the touchscreen element of smartphones, researchers have investigated combining touch and air gestures [31], 3D interaction mechanisms [32] and using built-in flashlights for interaction [33].
3 System Design
Our system was designed and implemented in the CRAIVE-Lab, a large immersive display wall environment with a 5 m tall 360-degree front-projected screen enclosing a 12 m × 10 m walkable area [13]. The screen is equipped with 8
G. Sharma and R. J. Radke
1200 × 1920 resolution projectors, resulting in an effective 1200 × 14500 pixel display, and contains a network of 6 overhead downward-pointed Kinect sensors for visual tracking of multiple participants.

3.1 Spatial Sense-Making
Large immersive spaces have exciting potential to support simultaneous multi-user interactions. Flat 2D displays can support such functions simply by using multiple input devices with minimal consideration for physical user locations. However, to instrument large immersive environments for multi-person usage, it is necessary to demarcate personal vs. collaborative or shared sub-spaces within the context of the large screen. Contextualizing physical user locations in the space plays an important role.
Fig. 2. Users A and D are able to interact with the screen whereas users beyond 2 m distance to the screen (B and C) are in the inactive (light red) region and thus cannot interact with the screen.
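As a rough illustration of the distance rule in Fig. 2, the check below treats the walkable area as an axis-aligned 12 m × 10 m rectangle whose four edges coincide with the screen. The function names, the rectangular simplification, and the coordinate origin at one corner are assumptions made for this sketch; they are not details of the actual tracking software.

```python
from dataclasses import dataclass

ROOM_WIDTH_M = 12.0   # walkable area from Sect. 3 (assumed to lie along x)
ROOM_DEPTH_M = 10.0   # assumed to lie along y
ACTIVE_RANGE_M = 2.0  # users farther than this from the screen are inactive


@dataclass(frozen=True)
class FloorPosition:
    """Tracked (x, y) floor location in meters, origin at one room corner."""
    x: float
    y: float


def distance_to_screen(p: FloorPosition) -> float:
    """Distance from a floor position to the nearest edge of the rectangle."""
    return min(p.x, ROOM_WIDTH_M - p.x, p.y, ROOM_DEPTH_M - p.y)


def is_active(p: FloorPosition) -> bool:
    """True if the user is inside the room and within range of the screen."""
    inside = 0.0 <= p.x <= ROOM_WIDTH_M and 0.0 <= p.y <= ROOM_DEPTH_M
    return inside and distance_to_screen(p) <= ACTIVE_RANGE_M
```

Under these assumptions, a user standing at (1.0, 5.0) is active, while a user at the center of the room, (6.0, 5.0), falls in the inactive zone.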
To allow multiple users to interact with the screen at the same time, the large screen is subdivided into dynamic sub-spaces based on physical user locations. The existing ceiling-mounted Kinect tracking system returns the (x,y) location of each user in a coordinate system aligned to the rectangular floor space. Although users are tracked wherever they are in the space, we enabled display interactions only for users who are located within 2 m of the screen, as shown in Fig. 2. In this way, the center of the room acts as an inactive zone, where users can look around and decide on their next steps instead of actively participating at all times. In order to make this behavior clear to the users, we carefully calibrated the floor (x,y) positions to corresponding screen locations.

A key element of our design is a continuous visual feedback mechanism shown at the bottom of the screen when a user is in range, appearing as animated circular rings, as shown in Fig. 5b, 5c and 5d. This feedback serves a two-fold purpose. It makes users aware that their movements and physical locations are being automatically interpreted by the system, and it also allows them to adjust their movements to accomplish interactions with small sub-screens or columns on the large screen. Beyond the continuous feedback, we create discrete interaction spaces on the screen that change dynamically based on user locations. Thus, at any given time, users can see both how the system is interpreting their physical locations in real time and which column or sub-screen they are able to interact with.

3.2 Input Modes of Interaction
Early Iterations. 3-D sensing devices such as the Leap Motion have previously been instrumented to act as interaction devices, albeit in the context of 2-D large displays [34]. In previous work on supporting mid-air gestures, we experimented with the Leap Motion device as an input interaction method that leverages the underlying spatial context in the CRAIVE-Lab [35,36]. However, instrumenting the Leap Motion as a wrist-worn device to perform gestures such as grab, swipe, tap, and scroll led to user fatigue, as it required both hands to be extended outwards for a considerable duration. Moreover, the device works best when it is secured and sits on a stable support. In our initial experiments, users freely moved around the space while wearing the device on one hand and performing interaction tasks with the other, which led to shakiness and poor results.

We also explored the feasibility of using voice commands as an exclusive means of interaction but quickly discarded this approach due to discouraging results from early testing and previous recommendations on usability [37–39]. Major challenges resulted from problems with the speech transcription technology due to, e.g., variation in accents. However, instead of getting rid of the voice control entirely, we selectively allowed the users to populate the content on the screen based on their voice commands. This was designed to enable users to generate visible content instead of limiting them only to manipulating existing elements on the screen. We hoped that instead of typing on a keyboard, speaking to the attached Bluetooth microphone would be more natural.

We also instrumented a smart watch as an input mechanism. However, translating screen gestures on a very small watch surface to the large screen that spans a height of 5 m was not very appealing. Beyond the factor of novelty, we concluded that it would not serve our aim of having an intuitive interaction mechanism.

Smartphones as Input Devices. Ultimately, we developed our approach around users’ personal smartphones, with which they are deeply comfortable and familiar. In addition, as discussed in earlier sections, prior research shows that instrumenting a touch-enabled smart device as an interaction device for large screens has met with considerable success. We developed a web application that can run on any touch screen device connected to the internet. Users were provided with QR codes or a short link, which allowed them to navigate to the web page. The web page was designed to run as a trackpad, where regular touch screen gestures such as tap, swipe, scroll, double tap, pinch, drag and zoom were supported. Developing on a web platform removes the cumbersome process of having to download and install a standalone application.

3.3 System Architecture
The architecture of the overall system is shown in Fig. 3. It comprises three main components: (1) user input via smartphone and Bluetooth microphone, (2) spatial tracking via overhead Kinect sensors, and (3) the webpage running on the large immersive screen for visualization and output. All components communicate with each other in real time using the WebSocket protocol. The users’ smartphone gestures are sent via WebSocket to the web application running on the large screen, as is any voice input, which is passed through the Google speech-to-text transcription service. The user tracking system runs on a separate node, which sends the (x,y) location of all users to the screen. The web application running on the large screen receives all the data, and displays dynamic feedback and visualizations accordingly.

3.4 Overall System
Fig. 3. Overall system architecture.

Combining all the components discussed in the previous sections, we designed a multi-user spatial interaction mechanism for the large immersive space, using smartphones as input interaction devices and voice control for content generation. As shown in Fig. 5, two users can walk into the space and scan a QR code located near the entrance to launch the web application on their personal devices. The two QR codes correspond to the left and right sides of the large screen. As the users move towards their respective screens and come within the defined threshold of 2 m, the location feedback and interaction mechanisms are activated, allowing them to interact with the individual columns as they see fit. In the experiments we describe next, we populate each of the columns with images from the public Flickr API.

As users move around the space, the continuous spatial feedback appears at the bottom of the large screen and the column with which each user can interact is highlighted in bright red. Since the interaction column is tied to the spatial location of the user, it can be viewed as exclusive or personal to the user standing in front of it. A red cursor dot that appears on the spatially selected column can be moved using the web application on the phone and acts like a mouse pointer. Table 1 shows the list of supported gestures and how they translate to the big screen. An important aspect of our system is that users need not look at their phone to manipulate the screen elements (in fact, the “touchpad” webpage is just a blank screen). The idea is for the users to keep their eyes on the screen while using a finger or thumb on the phone in a comfortable arm position.

Images that users select on the left and right screens can be moved to the front screen, which supports a personal column for each user as well as a large shared usage area. Users can move their personally curated images to the shared area on the front screen. In our particular case, we designed an application in which users can simultaneously drag their personal images around the shared screen to design a simple newspaper-article-like layout.
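The message flow of Sects. 3.3 and 3.4 can be sketched roughly as follows. The JSON field names (`user`, `gesture`), the `ColumnState` class, the gesture labels, and the scroll direction are hypothetical; the paper does not specify the message format, and only a subset of the gestures in Table 1 is modeled. Here a handler on the large-screen side decodes one message and applies it to the column currently assigned to the sending user.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ColumnState:
    """Hypothetical state of one personal column on the large screen."""
    images: List[str]
    scroll_index: int = 0
    selected: Optional[str] = None
    sent_to_front: List[str] = field(default_factory=list)


def apply_gesture(column: ColumnState, gesture: str) -> None:
    """Apply one phone gesture to a column, loosely following Table 1."""
    if gesture == "tap":
        # Tap selects an image (here simplified to the scrolled-to image).
        if column.images:
            column.selected = column.images[column.scroll_index]
    elif gesture in ("swipe_up", "swipe_down"):
        # Vertical swipes scroll the personal column; direction is assumed.
        step = 1 if gesture == "swipe_up" else -1
        last = max(0, len(column.images) - 1)
        column.scroll_index = max(0, min(last, column.scroll_index + step))
    elif gesture in ("swipe_left", "swipe_right"):
        # A horizontal swipe moves the selected image to the front screen.
        if column.selected is not None:
            column.sent_to_front.append(column.selected)
            column.selected = None
    else:
        raise ValueError(f"unsupported gesture: {gesture}")


def handle_message(raw: str, columns: Dict[str, ColumnState]) -> None:
    """Decode one JSON message and route it to the sender's current column."""
    msg = json.loads(raw)
    apply_gesture(columns[msg["user"]], msg["gesture"])
```

In a real deployment the `columns` mapping would be updated continuously from the tracking data, so that each user's messages always act on the column in front of them; that part is omitted here.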
4 User Studies
We gathered 14 participants to test the usability of our overall system and to gain feedback about it. Only 3 participants had extensive prior experience with working in immersive display environments. We designed two experiments. The first experiment was designed to gain a quantitative understanding of how long individual users take to perform various tasks on the screen using our system. The second experiment was designed as a simple game, where two users simultaneously work using both their personal and shared screens to come up with a final layout. This was largely designed to understand how comfortable users felt in the space and how intuitive they felt the system to be. For this experiment, users were mostly left on their own to complete the tasks based on their understanding of how the system works.

Table 1. List of phone gestures, physical user movements, and voice inputs, and their corresponding screen results.

Phone gestures                           Screen result
Move                                     Move red pointer/drag image on shared screen
Tap                                      Select image
Swipe (Left or Right)                    Move image/s to front screen
Swipe (Up or Down)                       Scroll up/down personal column
Pinch                                    Shrink selected image
Zoom                                     Enlarge selected image
Double Tap                               Enlarge/shrink selected image
Long Tap                                 Activate/deactivate drag on shared screen

Other input                              Screen result
Move (Physical user locations)           Select different column/continuous circular visualization
Voice input (“Show me pictures of X”)    Populate column with pictures of “X”

4.1 Experiment 1
Individual users were directed to use only the left screen, where they were asked to complete tasks based on the prompts appearing on the large screen. There were 9 columns, each filled with random images, as illustrated in Fig. 4. Screen prompts would appear randomly on any of the 9 columns asking the user to complete various tasks, one after the other. We tested all the gestures and inputs by asking the user to perform tasks shown in the second column of Table 1, except for the pinch, zoom, and double tap. Each task is completed once the user performs the correct input that corresponds to the displayed prompt. For instance, if a user at a given point in time is in front of the 2nd column, a prompt might appear in the 9th column indicating “Select a picture from this column and move it to the front screen”. Then, the user would physically move until the system highlights the 9th column, and perform the corresponding scroll, tap, and swipe gestures. This would successfully complete the task and another prompt would appear on a different column, such as “Populate this column with pictures of dogs”, which would require a voice command. We recorded the time it took for the user to accomplish each task, including both the time it took to make spatial selections by moving between the columns and the time it took to successfully perform phone or voice input.

Fig. 4. Experiment 1 setup. The red dot (appearing next to the silhouette of the user) acts as a cursor to interact with the spatially selected column. The yellow text box, shown in the column that the user is interacting with, directs user tasks, in this case “Move an image from this column.”

4.2 Experiment 2
We designed this experiment to be completed in pairs. Both users had completed Experiment 1 before taking part in this experiment. Our aim was to make sure users understood all input mechanisms and were comfortable using the system freely. On the front screen, where the shared screen is located, we presented a simple layout with two short paragraphs of text and image placeholders. Each paragraph consisted of a heading indicating a recipe name and text below describing the ingredients and preparation. Each user was responsible for finding an appropriate image for “their” recipe. Initially, the users independently move along the left and right sides of the screen, selecting one or more images and moving them to their personal columns on the front screen. Then, they move to the front screen and select the most likely candidate image from the refined set of images and move it to the shared screen with the recipe. A screen prompt on the large screen notifies the user whether a correct image was selected (i.e., a picture of the dominant ingredient in the recipe, such as an avocado picture for a guacamole recipe). Once the correct image is moved to the shared screen, users can perform a long-tap gesture on their phone to activate dragging on the shared screen. This allows the users to simultaneously drag their answer images to an appropriate location, which is generally next to the corresponding text. A screen prompt notifies the user once the target image has been moved to the required location on the shared screen. When both users complete their tasks on the shared screen, the full task is complete. The steps of the process are illustrated in Fig. 5.

Fig. 5. Multiple users during Experiment 2. (a) Each user scans a QR code. (b) Users work in their personal spaces using their smartphones and/or voice control. (c) Users move to the front screen to view their curated list of images. (d) Both users manipulate the shared space to complete a full task. For better visualization, please refer to the video at https://youtu.be/CYqXKyHTO U.

Each user pair was presented with six sets of recipe “games”. Three of the recipe pairs had the correct images already placed in one of the pre-populated columns and the users had to move around, scroll the columns, and locate the correct image. The other three pairs did not have the answer images in any of the columns and this required the users to generate content on their own by verbally requesting the system to populate a blank column with images of what they thought was the main ingredient in the recipe, one of which the user had to select and move to the front screen to verify. We designed this setup to study whether users felt comfortable completing tasks based on the interaction mechanisms we designed for our display environment. We also wanted to find out if the users, most of whom had no prior experience with these kinds of spaces, found interacting with an unconventional immersive space such as this one to be fun and intuitive. Therefore, we asked the users to fill out a NASA-TLX questionnaire along with an additional questionnaire based on a 5-point Likert scale to obtain feedback on the specific spatial, gestural, and voice input mechanisms that we designed.
5 Results
On average, each of the 14 participants performed 27 tasks during Experiment 1, where each of the 5 tasks appeared at random. Users were required to perform at least 20 and at most 35 tasks depending on the randomness of the distributed tasks as well as their speed at completing them. All tasks were assigned equal probability of appearing, except for voice control tasks, which appeared less often, according to the design considerations discussed earlier. The average number of tasks per user was distributed as follows: spatial selection (7.35), scrolling image columns (4.93), selecting an image (6.43), moving images to the center screen (6.65), and populating with voice (2.36). Timing results are reported in Fig. 6.
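One way such a prompt schedule could be generated is weighted random sampling. In the sketch below, the task labels paraphrase the five actions listed above, and the relative weight of 0.4 for voice tasks is purely illustrative; the actual probabilities used in the study are not reported.

```python
import random
from typing import List, Optional

# Task labels paraphrase the five actions of Experiment 1.
TASKS = ["spatial_selection", "scroll", "select", "move_to_front", "voice"]
# Equal weights except for voice, which is down-weighted (illustrative value).
WEIGHTS = [1.0, 1.0, 1.0, 1.0, 0.4]


def sample_prompts(n: int, seed: Optional[int] = None) -> List[str]:
    """Draw n task prompts independently according to WEIGHTS."""
    rng = random.Random(seed)
    return rng.choices(TASKS, weights=WEIGHTS, k=n)
```

Passing a fixed `seed` makes a session reproducible, which can be convenient when comparing participants on the same prompt sequence.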
Fig. 6. Average and median times for users to complete each action in Experiment 1. Error bars represent standard error (σ/√N). All actions take longer than spatial selection because their completion requires spatial selection as a prerequisite.
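For reference, the kind of summary statistics plotted in Fig. 6 can be computed as in the sketch below. It uses the sample standard deviation for σ, which is an assumption, since the caption does not say whether the sample or population form was used. The timing values in the usage note are invented solely to show how a single slow trial affects the mean and the median differently.

```python
import math
import statistics
from typing import Dict, Sequence


def summarize(times: Sequence[float]) -> Dict[str, float]:
    """Return mean, median, and standard error (sigma / sqrt(N)) of times."""
    n = len(times)
    if n < 2:
        raise ValueError("need at least two samples to estimate sigma")
    sigma = statistics.stdev(times)  # sample standard deviation (assumption)
    return {
        "mean": statistics.mean(times),
        "median": statistics.median(times),
        "standard_error": sigma / math.sqrt(n),
    }
```

For the made-up sample `[2.0, 2.5, 3.0, 3.5, 20.0]`, the mean is 6.2 while the median is 3.0, which illustrates why the median can be a better indicator of typical performance when a few trials take unusually long.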
Even though we report both the average and median time for each of the actions, we believe that the median times for each of the tasks are more reflective of typical user performance. We observed many cases in which a certain user would take a lot of time to internalize one particular action, while completing other similar actions quickly. This varied significantly from one user to another and therefore led to some higher average values than expected. Unsurprisingly, voice input was the most time-consuming action, as can be seen in Fig. 6.

For Experiment 2, where multiple users worked simultaneously on their personal screens and came together on the shared screen space to complete the full task, we recorded the time of completion. Since there were 3 games for the touch-only interface and 3 for the voice interface, each pair of participants played 6 games. Out of the 21 games for each type of input (7 participant pairs × 3 games per input type), participants completed 17 of each. 4 games for each input were not completed for various reasons, typically a system crash or one of the participants taking too long to figure out the answer and giving up. The average times taken for a pair of participants to complete the touch-only game and the voice-based game were 2.31 min and 1.67 min respectively. Even though Experiment 1 revealed that voice input generally takes longer, we note that for touch input the user has to physically move and search for the correct image among a wide array of choices, while for voice input, users can quickly generate pictures of their guessed ingredient and move one to the shared screen area.

We asked participants to fill out a NASA-TLX questionnaire after completing both experiments to investigate how comfortable and usable our overall system is, and present the results in Fig. 7. We added an extra question regarding the intuitiveness of the overall system, where on the 21-point scale, a higher number indicates a higher degree of intuitiveness. Overall, participants rated their mental, physical, and temporal demand, along with effort and frustration in using the system, to be low. Performance and intuitiveness were highly rated.
In addition, users filled out another questionnaire on how well they liked or disliked particular interaction mechanisms, such as phone gestures, spatial interactions, and voice input, using a 5-point Likert scale. As shown in Fig. 8, median values for most of these components are rated very highly. The ratings were also high for whether the overall tasks were fun and enjoyable. Users also highly rated the user interface and other feedback on the screen, including the constant localization feedback. Among the 14 participants, 3 were previously familiar with the physical space. However, the interaction interface was as new to them as it was to the rest of the users. We observed that the users familiar with the large immersive space performed 25% and 30% faster than the overall average for the touch and voice games respectively. We observed that in Experiment 2, the average time for completion with voice input was less than that for the smartphone input, even though Experiment 1 revealed that voice input takes a longer time on average. This can be explained by the time-consuming search required in the phone input subtask. For the voice input task, on the other hand, once users knew the key ingredient they were quickly able to ask for valid pictures and move them to the shared screen area.
Fig. 7. Average and median values of user responses to the NASA-TLX questionnaire rated on a 21-point scale. Error bars represent standard error.
6 Discussion
Based on our observations and post-task interviews, many users appreciated the constant spatial feedback, allowing them to understand their impact on the space. Some users appreciated the automatic demarcation of personal vs. shared spaces within the scope of the same large screen. We observed many issues related to the automatic speech understanding and transcription. Non-native English speakers had more difficulty populating their columns with desired input. Thus, it was unsurprising that the average time for actions to be completed using voice was the largest, as shown in Fig. 6. Users were divided on the usefulness and comfort of voice input; one user wished he could carry out the entire task using his voice while another was completely opposed to using voice as any kind of input mechanism. Many participants gave high marks to the system’s approach of mapping the horizontal location of their screen cursor to their physical location and the vertical location to their smartphone screen. However, technical difficulties in which some users had to repeatedly refresh the webpage on their phone due to lost WebSocket connections contributed to a certain level of annoyance. One of the major usage challenges that many participants commented on was the appropriate appearance of screen prompts and other feedback at eye height. Designing user interfaces/feedback for large displays without blocking screen content is a continuing challenge for this type of research.
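The cursor mapping that participants rated highly can be sketched as follows. This is a simplified illustration under stated assumptions: the wall segment length, its pixel width, and the normalized touch coordinate are hypothetical parameters, the 9-column default comes from the Experiment 1 setup, and the 1200-pixel height is taken from the display description in Sect. 3.

```python
from typing import Tuple

SCREEN_HEIGHT_PX = 1200  # vertical resolution stated in Sect. 3


def clamp(value: float, low: float, high: float) -> float:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def column_index(user_pos_m: float, wall_length_m: float,
                 n_columns: int = 9) -> int:
    """Index of the column in front of a user standing along one wall."""
    frac = clamp(user_pos_m / wall_length_m, 0.0, 1.0)
    return min(n_columns - 1, int(frac * n_columns))


def cursor_pixel(user_pos_m: float, wall_length_m: float,
                 wall_width_px: int, touch_y: float) -> Tuple[int, int]:
    """Map a position along a wall and a phone touch to screen pixels.

    user_pos_m    -- tracked position along the wall, in meters
    wall_length_m -- physical length of that wall segment (assumed known)
    wall_width_px -- pixel width of the same segment (assumed known)
    touch_y       -- vertical touch position on the phone, 0.0 (top) to 1.0
    """
    frac_x = clamp(user_pos_m / wall_length_m, 0.0, 1.0)
    frac_y = clamp(touch_y, 0.0, 1.0)
    x_px = round(frac_x * (wall_width_px - 1))
    y_px = round(frac_y * (SCREEN_HEIGHT_PX - 1))
    return x_px, y_px
```

Because the horizontal coordinate comes only from tracking, a dropped phone connection in this scheme would freeze vertical motion while the column highlight continues to follow the user.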
Fig. 8. Average and median values of user responses to the second questionnaire on a 5-point Likert scale. Error bars represent standard error.
7 Conclusion and Future Work
In terms of overall performance, the results were very encouraging with regard to the usefulness of the overall interaction interface. Standalone methods such as cross-device platforms or voice-only methods have shown limited usability in the past [39]. However, in our case, we see that users adapt well to multi-modal inputs; touch screen and voice are used in conjunction with automatic interaction mechanisms based on spatial tracking. Large-display-rich immersive spaces such as the one presented in this work draw a significant amount of user attention. When designing interaction interfaces that do not overwhelm users, it is important to devise methods that require minimal attention. We found that allowing users to work with their own familiar smartphones was a successful approach. Furthermore, allowing users to move freely and using the voice commands selectively, only for content generation, helped users to continuously focus on the screen and the task at hand instead of having to repeatedly glance at the phone screen or manually type input commands.

The success of the multi-modal user interface presented in this work has led us to work towards building new use cases for our large-display environments. We are working towards building a language learning classroom use case, where students match language characters to images. The image selection and placement tasks discussed here can be re-purposed to support classroom activities, where the room in itself is a teaching tool, in contrast to conventional classrooms [40]. Assessed learning outcomes based on student feedback and the overall success of our interface will be important in furthering interaction design choices going forward. While we only reported 2-user studies here, our immediate next step is to accommodate 3–6 users simultaneously, to fully realize the potential of our immersive environment as a multi-user space. In addition to direct extensions of the experiments we discussed here, we are investigating how the screen space for each user can be dynamically defined based on their location rather than constrained to one side of the screen. We are also working to replace the worn Bluetooth microphones with an ambient microphone array that uses beamforming, along with the users’ known locations, to extract utterances for verbal input. Finally, we hope to conduct more systematic eye-tracking experiments to explore where the users look on the big screen and how often/under what circumstances they glance down at their phone “touchpad”.

Acknowledgments. The research in this thesis was partially supported by the US National Science Foundation under awards CNS-1229391 and IIP-1631674, and by Cognitive and Immersive Systems Laboratory, a research collaboration between Rensselaer Polytechnic Institute and IBM through IBM’s AI Horizon Network. Thanks to Devavrat Jivani for his contribution in setting up the multi-tracking system described in this work.
References 1. Vogel, D., Balakrishnan, R.: Interactive public ambient displays: transitioning from implicit to explicit, public to personal, interaction with multiple users. In: Proceedings of the 17th Annual ACM Symposium on User Interface Software and Technology, pp. 137–146. ACM (2004) 2. Malik, S., Ranjan, A., Balakrishnan, R.: Interacting with large displays from a distance with vision-tracked multi-finger gestural input. In: Proceedings of the 18th Annual ACM Symposium on User Interface Software and Technology, pp. 43–52. ACM (2005) 3. Kister, U., Reipschl¨ ager, P., Dachselt, R.: Multilens: fluent interaction with multifunctional multi-touch lenses for information visualization. In: Proceedings of the 2016 ACM International Conference on Interactive Surfaces and Spaces, pp. 139– 148. ACM (2016) 4. Bragdon, A., DeLine, R., Hinckley, K., Morris, M.R.: Code space: touch+ air gesture hybrid interactions for supporting developer meetings. In: Proceedings of the ACM International Conference on Interactive Tabletops and Surfaces, pp. 212–221. ACM (2011) 5. Langner, R., Kister, U., Dachselt, R.: Multiple coordinated views at large displays for multiple users: empirical findings on user behavior, movements, and distances. IEEE Trans. Vis. Comput. Graph. 25(1), 608–618 (2019) 6. Liu, C.: Leveraging physical human actions in large interaction spaces. In: Proceedings of the Adjunct Publication of the 27th Annual ACM Symposium on User Interface Software and Technology, pp. 9–12. ACM (2014) 7. Wolf, K., Abdelrahman, Y., Kubitza, T., Schmidt, A.: Proxemic zones of exhibits and their manipulation using floor projection. In: Proceedings of the 5th ACM International Symposium on Pervasive Displays, pp. 33–37. ACM (2016)
300
G. Sharma and R. J. Radke
8. Kister, U., Klamka, K., Tominski, C., Dachselt, R.: Grasp: combining spatiallyaware mobile devices and a display wall for graph visualization and interaction. In: Computer Graphics Forum, vol. 36, pp. 503–514. Wiley Online Library (2017) 9. Cruz-Neira, C., Leigh, J., Papka, M., Barnes, C., M. Cohen, S., Das, S., et al.: Scientist in wonderland: a report on visualization applications in the cave virtual reality environment. In: IEEE Symposium on Research Frontiers in Virtual Reality, pp. 59–66 (1993) 10. Cruz-Neira, C., Sandin, D.J., Defanti, T.A., Kenyon, R.V., Hart, J.C.: The cave: audio visual experience automatic virtual environment. Commun. ACM 35, 64–72 (1992) 11. Febretti, A., Nishimoto, A., Thigpen, T., Talandis, J., Long, L., Pirtle, J., Peterka, T., Verlo, A., Brown, M., Plepys, D., et al.: Cave2: a hybrid reality environment for immersive simulation and information analysis. In: IS&T/SPIE Electronic Imaging, pp. 864,903–864,903. International Society for Optics and Photonics (2013) 12. Rittenbruch, M., Sorensen, A., Donovan, J., Polson, D., Docherty, M., Jones, J.: The cube: a very large-scale interactive engagement space. In: Proceedings of the 2013 ACM International Conference on Interactive Tabletops and Surfaces, pp. 1–10. ACM (2013) 13. Sharma, G., Braasch, J., Radke, R.J.: Interactions in a human-scale immersive environment: the CRAIVE-Lab. In: Cross-Surface Workshop at ISS2016, Niagara Falls, Canada (2016) 14. Ackad, C., Clayphan, A., Tomitsch, M., Kay, J.: An in-the-wild study of learning mid-air gestures to browse hierarchical information at a large interactive public display. In: Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing, pp. 1227–1238. ACM (2015) 15. Yoo, S., Parker, C., Kay, J., Tomitsch, M.: To dwell or not to dwell: an evaluation of mid-air gestures for large information displays. 
In: Proceedings of the Annual Meeting of the Australian Special Interest Group for Computer Human Interaction, pp. 187–191. ACM (2015) 16. Ackad, C., Tomitsch, M., Kay, J.: Skeletons and silhouettes: comparing user representations at a gesture-based large display. In: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, pp. 2343–2347. ACM (2016) 17. Ballendat, T., Marquardt, N., Greenberg, S.: Proxemic interaction: designing for a proximity and orientation-aware environment. In: ACM International Conference on Interactive Tabletops and Surfaces, pp. 121–130. ACM (2010) 18. Kopper, R., Silva, M.G., McMahan, R.P., Bowman, D.A.: Increasing the precision of distant pointing for large high-resolution displays (2008) 19. Brudy, F., Ledo, D., Greenberg, S., Butz, A.: Is anyone looking? Mitigating shoulder surfing on public displays through awareness and protection. In: Proceedings of The International Symposium on Pervasive Displays, p. 1. ACM (2014) 20. Hawkey, K., Kellar, M., Reilly, D., Whalen, T., Inkpen, K.M.: The proximity factor: impact of distance on co-located collaboration. In: Proceedings of the 2005 International ACM SIGGROUP Conference on Supporting Group Work, pp. 31– 40. ACM (2005) 21. Wallace, J.R., Weingarten, A., Lank, E.: Subtle and personal workspace requirements for visual search tasks on public displays. In: Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, pp. 6760–6764. ACM (2017) 22. Anslow, C., Campos, P., Jorge, J.: Collaboration Meets Interactive Spaces. Springer, Heidelberg (2016)
Multi-person Spatial Interaction
301
23. von Zadow, U., Dachselt, R.: Giant: visualizing group interaction at large wall displays. In: Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, pp. 2639–2647. ACM (2017)
24. Kropp, M., Anslow, C., Mateescu, M., Burkhard, R., Vischi, D., Zahn, C.: Enhancing agile team collaboration through the use of large digital multi-touch cardwalls. In: International Conference on Agile Software Development, pp. 119–134. Springer, Cham (2017)
25. Horak, T., Kister, U., Dachselt, R.: Presenting business data: challenges during board meetings in multi-display environments. In: Proceedings of the 2016 ACM International Conference on Interactive Surfaces and Spaces, pp. 319–324. ACM (2016)
26. Lander, C., Speicher, M., Paradowski, D., Coenen, N., Biewer, S., Krüger, A.: Collaborative newspaper: exploring an adaptive scrolling algorithm in a multi-user reading scenario. In: Proceedings of the 4th International Symposium on Pervasive Displays, pp. 163–169. ACM (2015)
27. Luojus, P., Koskela, J., Ollila, K., Mäki, S.M., Kulpa-Bogossia, R., Heikkinen, T., Ojala, T.: Wordster: collaborative versus competitive gaming using interactive public displays and mobile phones. In: Proceedings of the 2nd ACM International Symposium on Pervasive Displays, pp. 109–114. ACM (2013)
28. Doshi, V., Tuteja, S., Bharadwaj, K., Tantillo, D., Marrinan, T., Patton, J., Marai, G.E.: Stickyschedule: an interactive multi-user application for conference scheduling on large-scale shared displays. In: Proceedings of the 6th ACM International Symposium on Pervasive Displays, p. 2. ACM (2017)
29. Ballagas, R., Borchers, J., Rohs, M., Sheridan, J.G.: The smart phone: a ubiquitous input device. IEEE Pervasive Comput. 1, 70–77 (2006)
30. Baldauf, M., Adegeye, F., Alt, F., Harms, J.: Your browser is the controller: advanced web-based smartphone remote controls for public screens. In: Proceedings of the 5th ACM International Symposium on Pervasive Displays, pp. 175–181. ACM (2016)
31. Chen, X., et al.: Air+ touch: interweaving touch and in-air gestures. In: Proceedings of the 27th Annual ACM Symposium on User Interface Software and Technology. ACM (2014)
32. Du, Y., Ren, H., Pan, G., Li, S.: Tilt & touch: mobile phone for 3d interaction. In: Proceedings of the 13th International Conference on Ubiquitous Computing, pp. 485–486. ACM (2011)
33. Shirazi, A.S., Winkler, C., Schmidt, A.: Flashlight interaction: a study on mobile phone interaction techniques with large displays. In: Proceedings of the 11th International Conference on Human-Computer Interaction with Mobile Devices and Services, p. 93. ACM (2009)
34. Dingler, T., Funk, M., Alt, F.: Interaction proxemics: combining physical spaces for seamless gesture interaction. In: Proceedings of the 4th International Symposium on Pervasive Displays, pp. 107–114. ACM (2015)
35. Sharma, G., Jivani, D., Radke, R.J.: Manipulating screen elements in an immersive environment with a wrist-mounted device and free body movement. In: Living Labs Workshop, CHI 2018. Montreal, Canada (2018)
36. Jivani, D., Sharma, G., Radke, R.J.: Occupant location and gesture estimation in large-scale immersive spaces. In: Living Labs Workshop, CHI 2018. Montreal, Canada (2018)
37. Nutsi, A., Koch, M.: Multi-user usability guidelines for interactive wall display applications. In: Proceedings of the 4th International Symposium on Pervasive Displays, pp. 233–234. ACM (2015)
G. Sharma and R. J. Radke
38. Nutsi, A.: Usability guidelines for co-located multi-user interaction on wall displays. In: Proceedings of the 2015 International Conference on Interactive Tabletops & Surfaces, pp. 433–438. ACM (2015)
39. Sarabadani Tafreshi, A.E., Soro, A., Tröster, G.: Automatic, gestural, voice, positional, or cross-device interaction? Comparing interaction methods to indicate topics of interest to public displays. Front. ICT 5, 20 (2018)
40. Divekar, R.R., Zhou, Y., Allen, D., Drozdal, J., Su, H.: Building human-scale intelligent immersive spaces for foreign language learning. In: iLRN 2018 Montana, p. 94 (2018)
Predicting Vehicle Passenger Stress Based on Sensory Measurements
Dario Niermann and Andreas Lüdtke
OFFIS e.V., Human Centered Design, Escherweg 2, 26121 Oldenburg, Germany
{Dario.Niermann,Andreas.Luedtke}@offis.de
Abstract. Trust and acceptance are important human factors to consider in autonomous driving. Detecting uncomfortable and stressful situations while driving could improve trust, driving quality and overall acceptance of autonomous vehicles through adaptation of driving style and user interfaces. In this paper, we test a variety of sensors that could measure the stress of vehicle passengers in real time. We propose a portable system that measures heart rate, skin conductance, sitting position, g-forces and subjective stress. Results show that correlations between self-reported, subjective stress and sensor values are significant and that a neural network model can predict stress based on the measured sensor outputs. However, the self-reported stress does not always match the sensor evidence, which demonstrates the problem of subjectivity and shows that finding one model that fits all test subjects is a challenge.
Keywords: Autonomous driving · Discomfort · Acceptance · On-the-Road · Human factors · Human-Machine-Interaction · Machine learning · Driver study
1 Introduction
We start with the hypothesis that drivers of (autonomous) vehicles experience moments of discomfort or stress while driving through traffic, because traffic conditions are occasionally slightly dangerous or unpredictable. Especially in autonomous vehicles (AVs), discomfort might occur more frequently for passengers who do not trust the autonomous system. It is therefore desirable to develop methods that lower discomfort and stress, increase trust in and acceptance of AVs, and thus result in a more comfortable and enjoyable ride. Besides static methods, such as always using a slow driving style, detecting the exact situations in which discomfort or stress builds up makes it possible to address the individual needs of the driver and to adapt only when adaptation is needed. Uncomfortable or stressful situations can be measured using two different streams of information: in-vehicle indicators about the passenger, or exterior information from surrounding traffic. Since traffic information will necessarily be recorded and interpreted by AVs, we investigate the usefulness of physiological sensor data while driving through diverse traffic conditions. If sensors can be found whose data correlate reliably with stress, one can combine this data with a context model of the surrounding traffic. This data can be mapped together with the current maneuver of the vehicle, the traffic context and the sensor recordings to predict whether the passenger was stressed in the measured maneuver. This information could be collected over time and used to improve maneuver planning to reduce discomfort. A graphical representation of this idea is given in Fig. 1. The concept is explained in more detail in [1].
© Springer Nature Switzerland AG 2021 K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 303–314, 2021. https://doi.org/10.1007/978-3-030-55190-2_23
Fig. 1. Conceptual Graph of the proposed model. Beginning from the sensory input (left), Stress is detected and connected to traffic context (middle) and driving style adaption is performed (right). GSR is an abbreviation for galvanic skin response.
In this paper, we investigate correlations between selected sensor data and self-reported stress and develop a model to predict passenger stress given the sensory inputs.
2 Definition of Stress and Selection of Physiological Sensors
Our goal is to collect moments in which a passenger perceives some kind of risk in the current driving style or traffic situation and feels uncomfortable or stressed. These are moments in which the passenger wants stress to be reduced through a change of driving style or avoidance of the situation. For example, during an overtaking maneuver the passenger perceives risk because of oncoming traffic and feels stressed. The passenger can self-report such situations while driving by pressing a button. With this method, we can measure subjective stress and assign its intensity a value between zero (no stress) and one (high stress). Following [6], we consider this stress measurable by physiological sensors. From now on, we refer to the above-mentioned subjective risk perception as ‘stress’ to make the paper more readable.
Stress has been measured with physiological sensors in several previous studies. In the medical field, [2] proposed that GSR is correlated with stress. Later, GSR was used to develop a stress detector with a reported success rate of 78% [3]. A study on brain signals with EEG, ECG and EMG, combined with GSR, was done in [4]. Skin temperature also showed some correlation with stress [5], and facial expressions have been analyzed as well [7]. Healey & Picard [6] conducted a very similar study in which participants drove through real traffic. They measured ECG, EMG, GSR, HR and respiration rate and concluded that HR and GSR correlate best with stress. The following differences from our work should be noted: they assigned situations of high and low stress based on urbanization type (e.g. city, highway, rural) and used five-minute intervals of data for their stress detection model, and real-time detection of stress was not accomplished. We found no published work that demonstrates real-time stress detection in real traffic conditions.
In general, GSR and HR correlate best with stress levels of participants. Since GSR and HR sensors are also very unobtrusive, we decided to use these two bio signals in our study. Another indicator for stress situations could be body or car movement. Therefore, we added pressure sensors onto the vehicle seat.
3 Methodology
3.1 Hardware Setup
We developed a portable set of sensors that can easily be placed in any car. A battery powers the computing components and lasts from multiple hours up to a few days. The main computing is done by a Raspberry Pi 3 Model B, to which all sensors connect via wired connections. The GSR¹ and (optical) HR² sensors were bought from Seeedstudio³ and connect to an Arduino Nano, which forwards the signals via USB B to the Raspberry Pi. The handset that is used to signal the stress level is from the company Carrera and is a simple button that is pressed down with the thumb (see Fig. 2, right). The pressure sensors are connected to the Raspberry Pi via a custom-made circuit board. They measure the air pressure inside attached silicone tubes, on which the passenger sits. The setup is shown in Fig. 2. The pressure sensors and silicone tubes are attached to a seat mat, which is placed on the seat of the study vehicle. On the bottom of the seat mat is a small pocket that holds the Raspberry Pi and the battery. A laptop can connect to the Raspberry Pi via WLAN and can be used to monitor the recorded data in real time.
Fig. 2. Display of the portable hardware setup on an office chair.
1 GSR Sensor from Seeedstudio, http://wiki.seeedstudio.com/Grove-GSR_Sensor.
2 HR Sensor from Seeedstudio, http://wiki.seeedstudio.com/Grove-Ear-clip_Heart_Rate_Sensor/.
3 Seeedstudio Grove system, http://wiki.seeedstudio.com/Grove_System.
3.2 Study Setup
The test subject sits in the passenger seat on the seat mat with the pressure sensors. The HR sensor is attached to one earlobe, and the GSR sensor to two fingers of the hand not holding the handset (to avoid accidental correlation). A driver drives the vehicle; another person in the back monitors the recorded data and serves as a contact person for possible questions, so that the driver can focus on the road. While driving, the test subject watches the traffic and presses the handset button whenever he or she feels stressed. We tested nine subjects (4 male, 5 female) with an average age of 36.3 years and a standard deviation of 12.3 years. The 50-minute route started and ended at OFFIS e.V. in Oldenburg and contained suburban areas, highways and rural roads. The driver drove slightly assertively. However, most test subjects reported that they felt safe because they trusted the driver (more on that in Sect. 6). We did not drive for the first 5 min, so that the test subject could relax and get familiar with the sensors. After the drive, the test subject filled out a questionnaire. All data was recorded anonymously. The study was approved by the ethics commission of the Carl von Ossietzky University of Oldenburg.
4 Acquired Data
For each test subject we collected the raw outputs of three pressure sensors, one heart rate sensor, one GSR sensor and the handset. A visual sample of this data is shown in Fig. 3. We cut off the first 5 min, during which no driving was done. The sampling rates are 60 Hz for the pressure sensors, 40 Hz for the GSR sensor and the handset, and 1 Hz for the HR sensor.
Fig. 3. (Left) Data samples of one test subject recorded while driving. (Right) Comparison between the raw handset values and the time delayed values. A typical handset press only lasts 1 s without time delay, which is not enough to correlate handset data to sensor data.
From the raw data, we calculated the standard deviation, minimum, maximum and derivative over small time-windows. The sizes of the windows and all used operations are shown in Table 1. The data was also smoothed to remove artifacts.
Table 1. Operations, symbols, units and time windows of used sensor data

Sensor                     Operation              Symbol                      Unit             Window size in s
HR                         –                      H                           bpm              –
                           Time derivative (TD)   ∂H/∂t                       bpm/s            1
                           Min and Max of TD      min(∂H/∂t), max(∂H/∂t)      bpm/s            10
GSR                        –                      S                           arb. units (u)   –
                           Time derivative        ∂S/∂t                       u/s              0.025
                           Standard dev. (SD)     σ(S)                        u                4
                           SD of TD               σ(∂S/∂t)                    u/s              4
                           Min and Max of TD      min(∂S/∂t), max(∂S/∂t)      u/s              10
Pressure i, i ∈ {1, 2, 3}  –                      P1, P2, P3                  arb. units (u)   –
                           Standard deviation     σ(Pi)                       u                4
                           Min and Max of SD      min(σ(Pi)), max(σ(Pi))      u                10
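The windowed operations listed in Table 1 can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the trailing-window convention, the finite-difference derivative and the toy GSR trace are all assumptions made for this example.

```python
import statistics


def time_derivative(values, rate_hz):
    """Finite-difference derivative in units per second (first sample gets 0)."""
    out = [0.0]
    for prev, cur in zip(values, values[1:]):
        out.append((cur - prev) * rate_hz)
    return out


def rolling(values, window_s, rate_hz, func):
    """Apply func to a trailing window of window_s seconds at every sample."""
    n = max(1, int(round(window_s * rate_hz)))
    return [func(values[max(0, i - n + 1): i + 1]) for i in range(len(values))]


# Toy GSR trace sampled at 40 Hz (values are made up for illustration).
rate = 40
gsr = [100.0] * 80 + [100.0 - 0.5 * k for k in range(40)] + [80.5] * 80

d_gsr = time_derivative(gsr, rate)                   # dS/dt
sd_gsr = rolling(gsr, 4, rate, statistics.pstdev)    # sigma(S), 4 s window
min_d = rolling(d_gsr, 10, rate, min)                # min(dS/dt), 10 s window
max_d = rolling(d_gsr, 10, rate, max)                # max(dS/dt), 10 s window
```

The same two helpers cover the HR and pressure rows of the table by changing the sampling rate, the window size and the reducing function.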
In addition, the handset values are manipulated such that their descent is delayed in time (see the example in Fig. 3). This is done because nearly all subjects pressed the button for less than one second, whereas physiological responses have two important properties: they need a few seconds to respond, and they need time to relax and return to their resting values. The delay applied to the handset values allows for a simpler correlation analysis and model fit, since the response and relaxation times of the measurements now fall within periods of high handset values.
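One possible way to implement this delay is sketched below, following the description in Sect. 5.1 of a 5 s maximum function followed by an exponential decay. The trailing-maximum formulation, the function name and the decay time constant `decay_tau_s` are assumptions for this example; the paper does not state the decay constant.

```python
import math


def delay_handset(raw, rate_hz, hold_s=5.0, decay_tau_s=3.0):
    """Turn short button presses into plateaus with a slow decay.

    hold_s follows the 5 s maximum function described in Sect. 5.1;
    decay_tau_s is an assumed time constant (not given in the paper).
    """
    hold_n = int(round(hold_s * rate_hz))
    decay = math.exp(-1.0 / (decay_tau_s * rate_hz))  # per-sample decay factor
    out = []
    level = 0.0
    for i in range(len(raw)):
        # Trailing maximum: a peak becomes a plateau of hold_s seconds.
        plateau = max(raw[max(0, i - hold_n + 1): i + 1])
        # Exponential relaxation once the plateau ends.
        level = max(plateau, level * decay)
        out.append(level)
    return out


# A one-second press at 40 Hz, followed by silence (toy data).
raw = [0.0] * 40 + [1.0] * 40 + [0.0] * 600
delayed = delay_handset(raw, 40)
```

In this toy trace the one-second press is stretched into a plateau that lasts until 5 s after the button is released, after which the value relaxes smoothly towards zero.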
5 Data Evaluation
5.1 Sensor Data Correlation with Self-reported Stress
On average, test subjects reported 12.5 situations in which they felt stressed. Two distinct groups are notable: one group of participants pressed the handset rarely (fewer than 5 times); the other pressed it very often (more than 20 times). In the first group, the reported stress moments are generally more clearly measurable. To validate the usefulness of the proposed sensors, it is important that the sensor signals show a response when passengers report stress. Sensor responses need to correlate with the handset values; for example, when the handset is pressed (stress), the HR sensor reports an increased value and the GSR sensor a decreased value. To analyze how discriminative our signals are, we split the datasets into two parts: relaxed and stressed. The relaxed parts are defined as the times where the handset value equals zero. The stressed parts are defined by handset values above a threshold. In Fig. 4(a), distributions of a selection of sensor data are displayed, divided into the above-mentioned parts. As one can see, the data differ markedly between test subjects. The heart rate H, for example, has not only different averages but also different relations between stressed and relaxed parts. This problem persists for all raw sensor data (i.e. GSR, HR, pressure). However, the applied operations, such as time derivatives and standard deviations, help to overcome this problem. An example can be seen on the right side of Fig. 4(a), where the relaxed parts have similar averages and standard deviations for all test subjects. Nevertheless, even for many of the applied operations, stressed and relaxed moments are not clearly separable.
Fig. 4. a) Distributions of two measures over all test subjects, divided into stressed and relaxed parts. The numbers below the boxes show how many stress intervals were found for the corresponding proband. b) T-Test values over all measurements, averaged over all test subjects.
Welch's t-test compares the averages of two independent normal distributions in relation to their variances. The test gives a measure of how different the two averages are. In Fig. 4(b), the results of this t-test between stressed and relaxed data are displayed for all measurements, averaged over all test subjects. Since not all distributions are exactly normally distributed, the t-values are not precise and are only used to give an overview. Since we use neural network models later on, we do not depend on such statistical properties. However, the results give an impression of which measurements discriminate best between stress and no stress (the bigger the t-value, the better). As one can see, GSR and all pressure sensors are very promising tools to measure stress. The HR sensor is the least significant, since no differences can be found between stressed and relaxed moments. This was also visible in a more thorough analysis: heart rate often does not respond to stress, and if it does, it rises or falls depending on the test subject. This is also visible in Fig. 4(a). Interestingly, all pressure sensors generally show better correlation than the physiological sensors. This highlights the importance of g-forces, sudden car movements and the body movement of the test subject.
Another way to analyze the data is to view the time series explicitly. To get a better overview, the data is viewed only in time windows where stress is reported. In Fig. 5, every stress time window is displayed in relative time, such that all windows overlap and start at t = 0. The window length is set to 20 s. A window is selected when handset values are bigger than 0.3. This removes some stress moments but gives a better overview and stronger sensory responses.
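For reference, the Welch t statistic used for the overview in Fig. 4(b) can be computed as in the following sketch. The helper name and the sample values are made up for illustration; only the statistic itself is the standard Welch formula.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = math.sqrt(va / len(a) + vb / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


# Made-up feature values for relaxed and stressed samples of one subject.
relaxed = [0.9, 1.1, 1.0, 1.2, 0.8, 1.0]
stressed = [1.8, 2.2, 2.0, 2.4]
t_value = welch_t(stressed, relaxed)
```

A larger absolute t-value indicates that the stressed and relaxed averages lie further apart relative to their spread, which is how the measurements are ranked in the text.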
Fig. 5. Stress windows for six measurements from two test subjects. Only times where the test subject reported stress are selected. Thin grey lines display single measurements; bold black lines display averages over these measurements. a) shows good examples, b) more challenging ones.
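The window selection behind Fig. 5 can be sketched as follows. The threshold of 0.3 and the 20 s length are taken from the text; aligning each window at the upward threshold crossing, the function names and the toy data are assumptions for this example.

```python
def stress_windows(handset, signal, rate_hz, threshold=0.3, length_s=20.0):
    """Cut fixed-length windows from signal, one per upward threshold crossing."""
    n = int(round(length_s * rate_hz))
    windows = []
    for i in range(1, len(handset)):
        onset = handset[i] > threshold and handset[i - 1] <= threshold
        if onset and i + n <= len(signal):
            windows.append(signal[i: i + n])
    return windows


def mean_window(windows):
    """Pointwise average over aligned windows (the bold line in Fig. 5)."""
    return [sum(col) / len(col) for col in zip(*windows)]


# Toy example at 2 Hz with 2 s windows so the result is easy to inspect.
handset = [0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0]
signal = list(range(12))
windows = stress_windows(handset, signal, rate_hz=2, length_s=2.0)
average = mean_window(windows)
```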
Results of this method vary greatly across test subjects. For some, the sensor data show significant responses while stress is reported. For others (mostly those with a high count of reported stress moments), the sensor data show a wide range of responses, with their average being the only viable value to detect stress. Since we want to detect stress in real time, taking the average of all stress moments is not possible. Figure 5 shows the challenge of a real-time approach: measurements show a great variety of responses during self-reported stress. A model that classifies stress based on such measurements needs to combine all available measurements to be robust against such varying responses.
The temporal development of the measurements is also analyzed. We determined that the HR response needs approximately 5 s before it starts rising; the time derivative of the HR rises and falls over a period of 20 s, reaching its maximum value after 8 s. GSR responses are very quick but take 10 s to normalize again, which is visible in the time derivative. These time delays explain our choice to also delay the handset values in time. We applied a 5 s maximum function to the handset values, such that a peak becomes a plateau. Then we added an exponential decay at the end of the plateau, simulating the slow relaxation of the sensory data. With this delay, correlations between measurements and handset values are easier to detect. The delayed handset values are used throughout the whole paper.
5.2 Classification of Stress for a Single Person
Because of the varying results across test subjects, we first develop models that are each trained on only one test subject. An approach for a model that works on all test subjects is given in Sect. 5.3. We used the open-source machine-learning library TensorFlow to create a neural network as our model. In total, the network has 36 inputs and produces one output, the stress value. It has one hidden layer with 40 units. The inputs consist of all measurements from Table 1, combined with the same measurements 5 s earlier. This allows the network to evaluate data based on time trends and to react to time-delayed responses. The minimum and maximum over 10 s time windows give the network inputs that act like a short-term memory. The hidden layer allows for complex non-linear combinations of inputs, which seem to be necessary, since a simple linear model that we also tested did not yield satisfactory results. We trained the network with backpropagation, using the Adam optimizer [8], a least mean square cost function and batch learning. As training and test labels we used categorical stress, i.e. binary values defined by a handset value threshold: if the threshold is exceeded, the stress label is 1.
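The input construction and network shape described above can be illustrated with a small standard-library sketch. It is not the TensorFlow model used in the study: the 18 features per time step are inferred from the reported 36 inputs, the tanh and sigmoid activations and the random weights are assumptions, and training with Adam is omitted entirely.

```python
import math
import random


def lagged_inputs(features, rate_hz, lag_s=5.0):
    """Concatenate each feature row with the row lag_s seconds earlier."""
    lag = int(round(lag_s * rate_hz))
    return [features[i] + features[i - lag] for i in range(lag, len(features))]


def binary_labels(handset, threshold):
    """Stress label is 1 where the (delayed) handset value exceeds threshold."""
    return [1 if h > threshold else 0 for h in handset]


def init_layer(n_in, n_out, rng):
    """Small random weights plus zero biases for one dense layer."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


def forward(x, hidden, output):
    """One hidden layer (tanh, assumed) and a sigmoid output in (0, 1)."""
    (w1, b1), (w2, b2) = hidden, output
    h = [math.tanh(sum(wi * xi for wi, xi in zip(row, x)) + b)
         for row, b in zip(w1, b1)]
    z = sum(wi * hi for wi, hi in zip(w2[0], h)) + b2[0]
    return 1.0 / (1.0 + math.exp(-z))


rng = random.Random(0)
rate = 2  # toy sampling rate in Hz
features = [[rng.random() for _ in range(18)] for _ in range(30)]
inputs = lagged_inputs(features, rate)   # 36 values per row
hidden = init_layer(36, 40, rng)         # 40 hidden units
output = init_layer(40, 1, rng)          # single stress output
prediction = forward(inputs[0], hidden, output)
```

Each input row holds the current features followed by the features from 5 s earlier, which is what lets a feed-forward network react to trends and delayed responses without any recurrent state.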
Training and test data are split randomly from the full dataset such that the proportions of positive and negative labels are equal in both sets, with the test data comprising approximately 20% of the full data. Because stress moments are sparse, we provide multiple measures to give a better understanding of the models' performance. As mentioned, we trained one neural network for each test subject. This results in 9 models, each with its own performance. Our chosen performance measures are the percentage of correctly classified labels, the percentage of zeros in the test data, and the true and false positive rates

$$R_{TP} = \frac{TP}{P}, \qquad R_{FP} = \frac{FP}{N},$$

with P being the number of labels equal to one, N the number of labels equal to zero, and TP and FP the numbers of true and false positives, respectively. The proportion of zeros in the test data is generally above 90%, so a model that only predicted zeros would already reach 90% accuracy. We therefore report both the proportion of zeros and the proportion of correctly classified data. Additionally, we report the true and false positive rates in percent, i.e. the proportion of detected stress values and of false alarms. In Fig. 6, the performance measures are displayed as distributions over all 9 models. Looking at the true positive rate, one can see that the models correctly detected approximately 60–80% of the stressed data points. This result seems promising, especially considering the low average false positive rate of 1%.
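The four performance measures can be computed as in this short sketch. The function name, dictionary keys and toy labels are illustrative assumptions; the toy data shows why accuracy alone is not informative when most labels are zero.

```python
def classification_measures(labels, predictions):
    """Accuracy, share of zero labels, true and false positive rates."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    pos = sum(labels)
    neg = len(labels) - pos
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return {
        "accuracy": correct / len(labels),
        "zeros": neg / len(labels),
        "r_tp": tp / pos if pos else float("nan"),
        "r_fp": fp / neg if neg else float("nan"),
    }


# Toy test set with 90% zeros: an all-zero predictor and a real predictor
# reach the same accuracy but differ in their true positive rate.
labels = [0] * 18 + [1, 1]
all_zero = [0] * 20
model = [0] * 17 + [1] + [1, 0]
baseline = classification_measures(labels, all_zero)
result = classification_measures(labels, model)
```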
Fig. 6. Average results of our classification models for all test subjects. One model was trained for one test subject; the resulting distributions are displayed as boxplots.
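The random split with equal label proportions described in Sect. 5.2 could look like the following sketch. The function name, the fixed seed and the per-class rounding are assumptions; the 20% test fraction is taken from the text.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with equal label proportions in both sets."""
    rng = random.Random(seed)
    train, test = [], []
    for value in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == value]
        rng.shuffle(idx)
        cut = int(round(len(idx) * test_fraction))
        test.extend(idx[:cut])
        train.extend(idx[cut:])
    return sorted(train), sorted(test)


# Toy label vector with 10% stress labels.
labels = [0] * 90 + [1] * 10
train_idx, test_idx = stratified_split(labels)
```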
The results above consider only the pointwise analysis at each time point t. Considering that the model will operate in an autonomous vehicle, where context information about the current traffic and maneuver is available, performance may increase greatly, since the models can be developed further to analyze the current context. For example, a stress moment may last for several seconds, which corresponds to a few hundred data points, given a sensor sampling rate of 50 Hz. If, for example, 60% of these points are detected by the model, this would already be enough to infer a stressed passenger during the current maneuver. An example of such detection behavior is displayed in Fig. 7 and occurs often in our data. Further smoothing over time and a more sophisticated interplay between context and stress classification could increase the detection rate of stress moments even more.
Fig. 7. Selected stress moment of a proband, as an example of prediction with true positive rate of only approximately 60%. The full plateau is not detected, but this performance could be good enough to classify a maneuver or traffic situation as stress inducing.
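The step from pointwise predictions to a decision per stress moment can be sketched as below. The 60% fraction comes from the example in the text; treating it as a fixed decision threshold, the function name and the toy sequences are assumptions for this illustration.

```python
def detected_intervals(labels, predictions, min_fraction=0.6):
    """Flag each reported stress interval whose detected share reaches min_fraction."""
    results = []
    start = None
    for i, y in enumerate(list(labels) + [0]):  # sentinel closes a final interval
        if y == 1 and start is None:
            start = i
        elif y != 1 and start is not None:
            hits = sum(predictions[start:i])
            results.append((start, i, hits / (i - start) >= min_fraction))
            start = None
    return results


# Two reported stress intervals: the first is 60% detected, the second 25%.
labels = [0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0]
predictions = [0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0]
intervals = detected_intervals(labels, predictions)
```

Each tuple holds the start index, the exclusive end index and whether the interval counts as detected, so a maneuver can be marked as stress inducing even when the full plateau is not recovered.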
Analyzing the model's predicted stress intervals in the same way as the self-reported stress reveals that the model found stronger correlations between stress and the sensory data. This is shown in Fig. 8, where measurements are again split into stressed and relaxed parts, now with the stress intervals predicted by the model added. For most measurements in this figure, the model's prediction is more separable than the test subjects' reported stress. This also holds true for many of our other measurements. This means that many of the predicted stress intervals show a stronger correlation with the sensory measurements than the self-reported stress intervals. Subjects who reported stress very often did so without strong evidence from the sensory measures. Our model, however, found only the stress intervals that indeed showed such evidence. As mentioned, it did so with extremely low false positive rates.
Fig. 8. Division of measurements into stressed and relaxed parts, with both self-reported and predicted stress used
The above-mentioned observation leads to two questions:
1. Is the subjective feeling of stress always measurable?
2. Should an implementation of a stress detection model try to fit the self-reported stress exactly, or is the stronger sensory evidence more practical?
We will try to answer these questions in Sect. 6.
5.3 General Classification Model
One model that fits all test subjects and generalizes to every user would be ideal, because such a model could be deployed directly in an application without further training. Developing such a model proved to be a challenge, and we did not find a good solution. We tried multiple network structures, including one and two hidden layers, different parameters and training methods. Results of our best model can be seen in Table 2. The true positive rate of 34% is much lower than that of the individual models (see Fig. 6). The main challenge in finding a general model appears to be the response differences between test subjects. Self-reported stress and physiological responses vary a lot between test subjects, which is difficult to capture with just one simple model. More test subjects and data or a better collection of sensors would be needed to increase performance further.

Table 2. Results of our general model trained on all test subjects.

Correctly classified in %   Proportion of zeros in %   True positives in %   False positives in %
95.4                        93.5                       34.0                  0.3
6 Discussion and Outlook
Our study results show that passengers become uncomfortable during driving, even in very common situations. Predicting moments of stress proved to be a challenging task. Physiological data is noisy and differs from person to person, and the self-report of stress is subjective in nature, so sensory evidence need not always correlate with this subjective feeling. Therefore, a robust algorithm needs to be developed to measure stress effectively. In this paper, we used a neural network model to classify stress based on multiple sensory signals. First, we trained the model on each test subject individually. The combination of many signals eliminated most of the single-sensor false positives and enabled a very low false positive rate of our models. True positive rates ranged between 45% and 100%, depending on the test subject. Additional sensors may improve classification, but note that a highly important feature of our chosen sensors is that they are very unobtrusive. One important result is the increased correlation between stress and sensory responses when the model's prediction is considered. Test subjects often reported stress when no physiological evidence could be found. Those reports were not recognized by our model, which decreases its true positive rate but increases the correlation between sensor measures and stress moments. As mentioned in Sect. 5.2, this leads to two questions, which we discuss in the following. The first question is whether the subjective feeling of stress can always be measured. Defining stress to include being uncomfortable and desiring a change of driving style makes measurements more difficult, since test subjects can report this subjective feeling under a wide range of possible emotions. It could be that the need for change does not follow from physiological stress, but just from logical thinking.
Thinking about legal issues, possible risks or simply personal preference could make the test subject uncomfortable and lead to a stress report without the subject actually being stressed from a physiological point of view. Stress that is measurable by our chosen sensors may only be a subset of all reported stress situations. This leads to the second question: should a model try to predict the self-reported stress intervals exactly, or is the stronger sensory evidence more desirable for a comfortable ride? Test subjects sometimes reported no stress even when sensory evidence was strong and often reported stress without evidence. A model that only classifies stress on strong sensory evidence could reduce vehicle adaptations to the necessary amount and adapt where passengers would not have suggested it. This could make passengers feel more comfortable subconsciously. However, this method would be problematic in situations where logical thinking makes the passenger wish for an adaptation, which would not be predicted by our model.
Our test subjects knew that a human was driving, which leads to more trust and comfort. Therefore, further studies in actual AVs should be done next. Comparing the numbers of reported stress intervals could quantify how safe the autonomous system appears to the passengers compared to a human driver. Additionally, context information about the surrounding traffic could be collected and linked to the reported stress. This opens up more possibilities for a classification model and could greatly increase stress prediction performance. Studies with well-implemented models could be done in simulators to test the possible performance increase with traffic context data. The stress prediction combined with this contextual information could be used to adapt simulated maneuvers of an AV. The reduction of stress could be quantified for many maneuvers to show how effective the adaptation is.
References
1. Niermann, D., Lüdtke, A.: Measuring driver discomfort in autonomous vehicles. In: International Conference on Intelligent Human Systems Integration, pp. 52–58. Springer, Cham (2020)
2. Storm, H., Myre, K., Rostrup, M., Stokland, O., Lien, M.D., Raeder, J.C.: Skin conductance correlates with perioperative stress. Acta Anaesthesiol. Scand. 46(7), 887–895 (2002)
3. Villarejo, M.V., Zapirain, B.G., Zorrilla, A.M.: A stress sensor based on galvanic skin response (GSR) controlled by ZigBee. Sensors 12(5), 6075–6101 (2012)
4. Minguillon, J., Perez, E., Lopez-Gordo, M., Pelayo, F., Sanchez-Carrion, M.: Portable system for real-time detection of stress level. Sensors 18(8), 2504 (2018)
5. Yamakoshi, T., Yamakoshi, K.I., Tanaka, S., Nogawa, M., Shibata, M., Sawada, Y., Rolfe, P., Hirose, Y.: A preliminary study on driver’s stress index using a new method based on differential skin temperature measurement. In: 29th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 722–725 (2007)
6. Healey, J., Picard, R.W.: Detecting stress during real-world driving tasks using physiological sensors. IEEE Trans. Intell. Transp. Syst. 6(2), 156–166 (2005)
7. Gao, H., Yüce, A., Thiran, J.P.: Detecting emotional stress from facial expressions for driving safety. In: IEEE International Conference on Image Processing (ICIP), pp. 5961–5965 (2014)
8. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Infinite Mixtures of Gaussian Process Experts with Latent Variables and its Application to Terminal Location Estimation from Multiple-Sensor Values
Ryo Hanafusa¹, Jiro Ebara², and Takeshi Okadome²
¹ Keywalker, Inc., Minato City, Japan, [email protected]
² Kwansei Gakuin University, Nishinomiya, Japan, [email protected], [email protected]
Abstract. This study proposes a probabilistic method that estimates the locations of sensor nodes (terminals) in a wireless sensor network using multi-sensor data from sensors located on the terminals and the hop counts between terminals. The proposed method uses a novel probabilistic generative model, called the infinite mixture model of Gaussian process experts with latent input variables (imGPE-LVM), that enables us to inversely infer the value of an explanatory variable from that of a response variable for a piecewise continuous function. Based on an imGPE-LVM, in which the sensor data measured by the sensors on a terminal are represented by observed variables and the location of the terminal is represented by a latent variable, the proposed method maximizes the posterior probability of the latent variable given the sensor values, using the terminal location estimated by the DV-Hop algorithm as a prior. This method yields more precise estimates of terminal locations than localization techniques that use a Gaussian process latent variable model and than those that use the DV-Hop algorithm, which is based solely on the hop counts between two terminals.
Keywords: Wireless sensor network · Location estimation · Gaussian process latent variable models · Infinite mixture of Gaussian process experts
1 Introduction
© Springer Nature Switzerland AG 2021. K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 315–330, 2021. https://doi.org/10.1007/978-3-030-55190-2_24

Recent studies have proposed several techniques for estimating the locations of terminals (sensor nodes) in a wireless sensor network that lacks a global positioning system (GPS) [6,12,17]. Such techniques generally use radio field intensities or hop counts from a base station or other terminals instead of an expensive range finder. Localization techniques based on radio field intensity, however, cannot accurately estimate terminal locations. This is because the field intensity of the radio wave from a base station at a given location may not reflect the distance from the base station to that location, owing to the multi-path fading that occurs when the radio wave reaches the location via several paths. Localization techniques using hop counts also cannot accurately estimate terminal locations, because hop counts sometimes do not reflect the distance between two terminals.

Focusing on the sensors with which a terminal is equipped, such as a thermometer or an illuminance sensor, this study proposes a method for estimating terminal locations in a wireless sensor network using the multi-dimensional sensor data acquired from the sensors on the terminal. The proposed method requires only three anchor nodes with given locations and involves no additional hardware cost.

When the spatial distributions of physical quantities are continuous, an “inverse regression” based on the Gaussian process latent variable model (GP-LVM) [9,10,16] can accurately predict the terminal locations from the values of the sensors on the terminals by maximizing the posterior probability of the latent variables that represent the terminal locations. However, the spatial distributions of physical quantities are generally discontinuous: the values measured at a sensor node may differ from those at other nodes in its neighborhood. This discontinuity causes inaccurate location estimation when the posterior probability is based on the GP-LVM, which implicitly assumes the distributions to be continuous. Furthermore, the number of regions with continuous distributions of physical quantities into which the area is divided is unknown. Likewise, Gaussian process regression for a discontinuous function cannot accurately predict target variables for new inputs, because it implicitly assumes the regression function to be continuous.
By extending the Gaussian process, Rasmussen and Ghahramani [13] introduced an infinite mixture of Gaussian process experts (imGPE) that builds a regressor for a piece-wise continuous function. (See Appendix A for a summary of the imGPE.) Just as [13] extended the Gaussian process to the imGPE for regression, this study extends the GP-LVM for “inverse regression” and defines a novel probabilistic generative model, called the infinite mixture model of Gaussian process experts with latent input variables (imGPE-LVM), that enables us to inversely infer the value of an explanatory variable from that of a response variable for a piece-wise continuous function. The imGPE-LVM partitions an environment into areas with continuous physical quantities and represents the sensor values at unknown terminal locations with one GP-LVM (an expert) per area.

Using the sensor values for physical quantities, the proposed method estimates the terminal locations by maximizing the posterior probability of the latent variables representing the terminal locations, based on the imGPE-LVM with priors for the terminal locations. As a prior, it assumes Gaussian distributions centered on the terminal locations estimated by the distance vector hop (DV-Hop) algorithm, which allows terminal locations to be calculated easily from the given locations of anchor nodes and the hop counts from the anchor nodes. This study uses only three anchor nodes, which generally produce inaccurate estimates of terminal locations with the DV-Hop algorithm. The DV-Hop algorithm is summarized in Appendix B.

The contribution of this study is twofold. First, we present a probabilistic generative model, the imGPE-LVM, for “inverse regression” on a piece-wise continuous function. Second, we propose a method for estimating terminal locations in a wireless sensor network using multi-dimensional sensor data from the sensors on a terminal. Based on an imGPE-LVM in which latent variables represent terminal locations and observed variables represent sensor values, the method estimates terminal locations by maximizing the posterior probability of the latent variables, using Gaussian distributions centered on the terminal locations estimated by the DV-Hop algorithm as a prior.
2 Related Work
Applying Gaussian processes to signal strength localization, Schwaighofer et al. [14] constructed a model that interpolates over continuous locations and directly models uncertainty based on the training data. Ferris et al. [3] extended this technique to WiFi localization by combining the GP signal strength model with graph-based tracking, allowing accurate localization in large-scale spaces.

Extending the GP-LVM with an additional likelihood model for the latent variables, the Gaussian process dynamical model (GPDM) [16] makes it possible to represent constraints on the reduction of a high-dimensional signal to a low-dimensional latent space. It models dynamic constraints via a Gaussian process mapping between consecutive data points and was specifically designed for vision-based motion tracking. Building on these models, Ferris et al. [4] introduced an extension to the case of mapping with unknown locations and proposed a technique called WiFi-SLAM, which uses the GP-LVM to build wireless signal strength maps without any location labels in the training data. In WiFi-SLAM, the high-dimensional data correspond to the signal strength information for all WiFi access points in an environment. The GP-LVM maps these signal strength measurements to a two-dimensional (2D) latent space, which is interpreted as the xy coordinates of the device.

Based on the GP-LVM and the representation of terminal locations by latent variables, we can accurately estimate the terminal locations from sensor values if the spatial distributions of the physical quantities measured by the sensors are continuous. However, the distributions of physical quantities in an environment are generally discontinuous, and the similarity of the sensor values of two terminals does not always reflect the inter-terminal distance. Hence, techniques based on GP-LVMs provide inaccurate estimates of terminal locations.
3 Proposed Method
3.1 Notation
We denote the number of terminals by $N$, the dimension of the sensor values by $D$, and the number of classes by $K$. Furthermore,
– $\mathbf{X} = \{\mathbf{x}^{(i)}\}_{i=1}^{N}$, $\mathbf{x}^{(i)} = (x_1^{(i)}, x_2^{(i)})^T \in R^2$, $i = 1, \ldots, N$, denotes the set of terminal locations; $\mathbf{Y} = \{\mathbf{y}^{(i)}\}_{i=1}^{N}$, $\mathbf{y}^{(i)} = (y_1^{(i)}, \ldots, y_D^{(i)})^T \in R^D$, $i = 1, \ldots, N$, denotes the set of sensor values on the terminals; and $\mathbf{c} = \{c^{(i)}\}_{i=1}^{N}$, where $c^{(i)} = 1, \ldots$, $i = 1, \ldots, N$, denotes the set of classes for the terminals;
– $\mathbf{X}^{(k)} = \{\mathbf{x}^{(i)} : c^{(i)} = k\}$ is the set of the locations of the terminals in class $k$, $\mathbf{y}_d^{(k)} = \{y_d^{(i)} : c^{(i)} = k\}$ is the set of the values of sensor $d$ on the terminals in class $k$, and $N_k = |\{c^{(i)} : c^{(i)} = k\}|$ is the number of terminals in class $k$;
– $\mathbf{X}_{dvh} = \{\mathbf{x}_{dvh}^{(i)}\}_{i=1}^{N}$ denotes the set of the terminal locations estimated by the DV-Hop algorithm, and $\sigma_{dvh}$ denotes $1/3 \times dvh$, where $dvh$ is the minimum distance for 1-hop counts estimated by the DV-Hop algorithm; and
– $\theta_0 = \{\theta_0^{(k)}\}_{k=1}^{K}$, $\theta_1 = \{\theta_1^{(k)}\}_{k=1}^{K}$, and $\theta_2 = \{\theta_2^{(k)}\}_{k=1}^{K}$ are the sets of the parameters of the Gaussian process regressors for the classes, $\alpha$ is the parameter of the Chinese restaurant process model, and $\phi$ is the parameter of the gating function in the infinite mixture model of experts. The last two are hyper-parameters that have their own priors.
3.2 Infinite Mixture of Gaussian Process Experts with Latent Input Variables
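In the imGPE-LVM, each data point is assumed to be generated by a Gaussian process regressor (i.e., an expert), and the number of classes is determined by maximizing the posterior under the Chinese restaurant process [11], which serves as a prior over partitions. We now give the formal definition of the imGPE-LVM. The joint posterior probability of the latent variables is expressed as follows:

$$p(\mathbf{c}, \mathbf{X}, \theta_0, \theta_1, \theta_2, \alpha, \phi \mid \mathbf{Y}, \mathbf{X}_{dvh}, \sigma_{dvh}) \propto p(\mathbf{Y} \mid \mathbf{c}, \mathbf{X}, \theta_0, \theta_1, \theta_2)\, p(\mathbf{c} \mid \mathbf{X}, \alpha, \phi)\, p(\mathbf{X} \mid \mathbf{X}_{dvh}, \sigma_{dvh})\, p(\theta_0)\, p(\theta_1)\, p(\theta_2)\, p(\alpha)\, p(\phi), \tag{1}$$

where the parameters of the distributions of the hyper-parameters are omitted. The first factor on the right side of (1) is the probability of the sensor values $\mathbf{Y}$ given the terminal locations $\mathbf{X}$ and the classes $\mathbf{c}$, and is expressed as

$$p(\mathbf{Y} \mid \mathbf{c}, \mathbf{X}, \theta_0, \theta_1, \theta_2) = \prod_{k=1}^{K} p(\mathbf{Y}^{(k)} \mid \mathbf{X}^{(k)}, \theta_0^{(k)}, \theta_1^{(k)}, \theta_2^{(k)}) = \prod_{k=1}^{K} \prod_{d=1}^{D} \mathcal{N}(\mathbf{y}_d^{(k)} \mid \mathbf{0}, \mathbf{K}^{(k)}), \tag{2}$$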
where $\mathbf{K}^{(k)}$ is the Gram matrix composed of the Gaussian kernel values of the terminal locations in class $k$. This probability represents a model in which the sensor values of the terminals belonging to a class are generated by the corresponding Gaussian process. Note that the sensor values are not independent, but they are conditionally independent given $\mathbf{X}$ and $\mathbf{c}$.
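As an illustration of (2), the following minimal Python sketch evaluates the log-likelihood of the sensor values by summing, over classes and sensor dimensions, the log-density of a zero-mean Gaussian whose covariance is the class Gram matrix. The kernel form $\theta_0 \exp(-\theta_1 \|x - x'\|^2 / 2)$ with an added diagonal term $1/\theta_2$ is an assumption, since the text states only that a Gaussian kernel is used; all function names are hypothetical.

```python
import math


def gaussian_kernel(xa, xb, theta0, theta1):
    # Assumed kernel form: theta0 * exp(-theta1 / 2 * ||xa - xb||^2).
    sq = sum((a - b) ** 2 for a, b in zip(xa, xb))
    return theta0 * math.exp(-0.5 * theta1 * sq)


def gram_matrix(X, theta0, theta1, theta2):
    # Assumption: theta2 acts as a noise precision, adding 1/theta2 on the diagonal.
    n = len(X)
    return [[gaussian_kernel(X[i], X[j], theta0, theta1)
             + (1.0 / theta2 if i == j else 0.0)
             for j in range(n)] for i in range(n)]


def cholesky(A):
    # Lower-triangular L with A = L L^T (A must be positive definite).
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def log_gaussian_zero_mean(y, L):
    # log N(y | 0, K) with K = L L^T, using forward substitution for L z = y.
    n = len(y)
    z = []
    for i in range(n):
        z.append((y[i] - sum(L[i][m] * z[m] for m in range(i))) / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * (sum(v * v for v in z) + log_det + n * math.log(2.0 * math.pi))


def log_likelihood(Y, X, c, theta):
    # theta maps a class label k to its kernel parameters (theta0, theta1, theta2).
    total = 0.0
    for k in sorted(set(c)):
        members = [i for i, ci in enumerate(c) if ci == k]
        L = cholesky(gram_matrix([X[i] for i in members], *theta[k]))
        for d in range(len(Y[0])):
            total += log_gaussian_zero_mean([Y[i][d] for i in members], L)
    return total
```

Because the product in (2) runs over classes, terminals assigned to different classes contribute independent terms, which is what lets a discontinuity between two areas be modeled without forcing a single smooth function across it.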
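The second factor represents a model of the Chinese restaurant process that depends on the similarity of inputs [13]. That is, the probability that a new input $\mathbf{x}^*$ belongs to class $k$ is expressed as

$$p(c^* = k \mid \mathbf{x}^*, \mathbf{X}, \alpha, \phi) = \frac{N}{N + \alpha}\, \frac{\sum_{i=1}^{N} \kappa_\phi(\mathbf{x}^*, \mathbf{x}^{(i)})\, I(c^{(i)}, k)}{\sum_{i=1}^{N} \kappa_\phi(\mathbf{x}^*, \mathbf{x}^{(i)})}, \tag{3}$$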
where $I$ is the function that returns 1 if its two arguments are identical and 0 otherwise, and $\kappa_\phi$ is the Gaussian kernel with variance parameter $\phi$. Note that $\phi$ is the parameter of the gating function in the mixture model of Gaussian process experts. This is because the gating function for $\mathbf{x}^*$ is the maximum posterior estimate of its class, and the posterior probability that two terminal locations belong to the same class depends on the similarity between the two locations, which is given by the value of the Gaussian kernel for the locations. (We omit the probability that a new class is produced.) The third factor is

$$p(\mathbf{X} \mid \mathbf{X}_{dvh}, \sigma_{dvh}) = \prod_{i=1}^{N} \mathcal{N}(\mathbf{x}^{(i)} \mid \mathbf{x}_{dvh}^{(i)}, \sigma_{dvh}^2 \mathbf{I}). \tag{4}$$
This is the prior of the terminal locations, which consists of independent Gaussian distributions. The means of these Gaussian distributions are the terminal locations $\mathbf{x}_{dvh}^{(i)}$ estimated by the DV-Hop algorithm. Their standard deviation $\sigma_{dvh}$ is shared by all terminal locations and equals one third of the minimum value among the distances of 1-hop counts of the terminals. With the factor 1/3, a displacement of one hop corresponds to at least three standard deviations, so locations one hop or more away from those estimated by DV-Hop receive low probability.

The other five factors are the distributions of the hyper-parameters. The parameters $\theta_0$, $\theta_1$, and $\theta_2$ of the Gaussian process regression obey Gamma distributions that are independent across the classes. The parameters $\alpha$ and $\phi$ of the gating function obey noninformative Gamma distributions. These distributions are expressed as

$$p(\theta_0 \mid a_{\theta_0}, b_{\theta_0}) = \prod_{k=1}^{K} G(\theta_0^{(k)} \mid a_{\theta_0}, b_{\theta_0}), \tag{5}$$

$$p(\theta_1 \mid a_{\theta_1}, b_{\theta_1}) = \prod_{k=1}^{K} G(\theta_1^{(k)} \mid a_{\theta_1}, b_{\theta_1}), \tag{6}$$

$$p(\theta_2 \mid a_{\theta_2}, b_{\theta_2}) = \prod_{k=1}^{K} G(\theta_2^{(k)} \mid a_{\theta_2}, b_{\theta_2}), \tag{7}$$

$$p(\alpha \mid a_\alpha, b_\alpha) = \prod_{k=1}^{K} G(\alpha \mid a_\alpha, b_\alpha), \tag{8}$$

$$p(\phi \mid a_\phi, b_\phi) = \prod_{k=1}^{K} G(\phi \mid a_\phi, b_\phi), \tag{9}$$
where $G(\theta \mid a, b)$ is a Gamma distribution. Figure 1 shows the Bayesian network for the probabilistic model assumed in our proposed method.

Based on the imGPE-LVM, we construct a probabilistic generative model containing latent variables for the terminal locations. This model also includes latent variables for parameters such as the kernel parameters and the class labels. Given a set of sensor values, the method estimates the terminal locations by maximizing the posterior probability of all the latent variables jointly, because the latent variables for the parameters and those for the class labels cannot be integrated out. We use sampling techniques to maximize the posterior, similar to [8,13].

3.3 Inference
Following [13], the posterior in this study is maximized using a Gibbs sampler [1,5] and the Hamiltonian Monte Carlo (HMC) sampling method [1,2]. The parameters of the distributions for the hyper-parameters are determined using the stochastic expectation-maximization (SEM) algorithm [1,7]. The procedure of inference is as follows:
1. Initialize the set of terminal locations $\mathbf{X}$.
2. Initialize the set of classes $\mathbf{c}$.
3. Run the Gibbs sampler for $\mathbf{c}$.
4. Run the HMC sampler for $\mathbf{X}$.
5. Run the HMC sampler for the kernel parameters $\theta_0$, $\theta_1$, and $\theta_2$ of the Gaussian processes for the classes.
6. Run the HMC sampler for the hyper-parameters $\alpha$ and $\phi$ of the Chinese restaurant process model.
7. Optimize the parameters $a_{\theta_0}$, $a_{\theta_1}$, and $a_{\theta_2}$ for the distributions of the hyper-parameters.
8. Repeat from step 3 if the exit condition is not met.
9. Select a pseudo-maximum posterior sample.
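Steps 4 to 6 rely on HMC updates. A minimal Python sketch of one HMC transition with leapfrog integration is given below. As a toy target it uses the Gaussian location prior of a single terminal, as in (4), rather than the full posterior (1). The step size, the number of leapfrog steps, the example coordinates, and all function names are illustrative assumptions, not the settings or code of this study.

```python
import math
import random


def make_dvh_prior(x_dvh, sigma_dvh):
    # Log-density (up to a constant) and gradient of N(x | x_dvh, sigma_dvh^2 I).
    def log_prob(x):
        return -0.5 * sum((xi - mi) ** 2 for xi, mi in zip(x, x_dvh)) / sigma_dvh ** 2

    def grad_log_prob(x):
        return [-(xi - mi) / sigma_dvh ** 2 for xi, mi in zip(x, x_dvh)]

    return log_prob, grad_log_prob


def hmc_step(x, log_prob, grad_log_prob, step_size, n_leapfrog, rng):
    # One HMC transition: sample a momentum, integrate, then accept or reject.
    p0 = [rng.gauss(0.0, 1.0) for _ in x]
    x_new = list(x)
    # Initial half step for the momentum.
    grad = grad_log_prob(x_new)
    p = [pi + 0.5 * step_size * gi for pi, gi in zip(p0, grad)]
    for step in range(n_leapfrog):
        # Full step for the position.
        x_new = [xi + step_size * pi for xi, pi in zip(x_new, p)]
        grad = grad_log_prob(x_new)
        # Full momentum step, except for a half step at the end of the trajectory.
        scale = step_size if step < n_leapfrog - 1 else 0.5 * step_size
        p = [pi + scale * gi for pi, gi in zip(p, grad)]
    # Hamiltonian = negative log-density + kinetic energy.
    h_old = -log_prob(x) + 0.5 * sum(v * v for v in p0)
    h_new = -log_prob(x_new) + 0.5 * sum(v * v for v in p)
    if rng.random() < math.exp(min(0.0, h_old - h_new)):
        return x_new, True
    return list(x), False


# Usage: sample one terminal location from its DV-Hop prior (hypothetical values).
rng = random.Random(0)
log_prob, grad_log_prob = make_dvh_prior(x_dvh=[3.0, 4.0], sigma_dvh=0.5)
x = [3.0, 4.0]
samples = []
accepted = 0
for _ in range(2000):
    x, ok = hmc_step(x, log_prob, grad_log_prob, 0.2, 5, rng)
    samples.append(x)
    accepted += ok
```

In the full procedure the same transition would be applied with the log of (1) as the target, holding the other variables fixed in each of steps 4 to 6.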
4 Evaluation
4.1 Experiments
In addition to the placement of terminals, several other factors, including the environment scale and the radio range, may affect the accuracy of terminal location estimation in a wireless sensor network. In evaluating the proposed method, we focused on how differences in the distributions of physical quantities affect the accuracy of the imGPE-LVM estimates. This is because the imGPE-LVM uses sensor values for physical quantities whose distributions are varied and generally discontinuous.
Fig. 1. The Bayesian network for the probabilistic model assumed in our proposed method. $N$: the number of terminals. $D$: the dimension of the sensor values. $K$: the number of classes. $\mathbf{x}^{(i)} = (x_1^{(i)}, x_2^{(i)})^T \in R^2$, $i = 1, \ldots, N$, denotes the terminal locations; $\mathbf{c} = \{c^{(i)}\}_{i=1}^{N}$, where $c^{(i)} = 1, \ldots$, $i = 1, \ldots, N$, denotes the set of classes for the terminals. $\mathbf{Y}^{(k)} = \{\mathbf{y}^{(i)} : c^{(i)} = k\}$ is the set of the sensor values in class $k$; $\mathbf{x}_{dvh}^{(i)}$, $i = 1, \ldots, N$, denotes the terminal locations estimated by the DV-Hop algorithm; and $\sigma_{dvh}$ denotes $1/3 \times dvh$, where $dvh$ is the minimum distance for 1-hop counts estimated by the DV-Hop algorithm. $\theta_0^{(k)}$, $\theta_1^{(k)}$, and $\theta_2^{(k)}$ are the parameters of the Gaussian process regressors for the classes, $\alpha$ is the parameter of the Chinese restaurant process model, and $\phi$ is the parameter of the gating function in the infinite mixture model of experts. The last two are hyper-parameters that have their own priors.
In the experiments for the evaluation of the imGPE-LVM, we measured the accuracy of the estimated terminal locations for several patterns of the distributions of physical quantities. We fixed the environment scale and the radio range, which affect the estimation accuracy of the imGPE-LVM only indirectly: they directly affect the accuracy of the locations estimated by the DV-Hop algorithm, which we use to obtain a prior. For simplicity, we used three types of sensors, a thermometer, an illuminometer, and a microphone, and placed the anchor nodes at randomly selected locations.

We evaluated the accuracy for two types of location estimation. The first is the absolute location estimation, evaluated by the average Euclidean distance between the estimated and true locations. The second is the relative position estimation, evaluated by the difference between the true geometrical configuration of the network and that of the terminals at the locations estimated by the imGPE-LVM [15]. This study also compares the accuracy of the terminal locations estimated by the imGPE-LVM with that obtained using the DV-Hop technique, which uses no sensor values. The difference measure is defined by

$$V[r_{ij}] = \frac{1}{{}_N C_2} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \left(1 - r_{ij}/r\right)^2, \tag{10}$$

$$r = \frac{1}{{}_N C_2} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} r_{ij}, \tag{11}$$

$$r_{ij} = \frac{d_{ij}}{D_{ij}}, \tag{12}$$
where $d_{ij}$ denotes the Euclidean distance between the estimated locations of terminals $i$ and $j$, and $D_{ij}$ denotes the true Euclidean distance between terminals $i$ and $j$. The difference measure $V[r_{ij}]$ is the variance of the ratios $r_{ij}$ after they are rescaled so that their average is 1; the closer $V[r_{ij}]$ is to 0, the more similar in shape the two networks are.

We used artificial data in the experiments, assuming 75 terminals placed randomly in an environment by sampling from the uniform distribution. We prepared five discontinuity patterns for the distribution of illuminance. Figures 3 and 5 show two of these patterns (Patterns 1 and 2). Figure 2 shows the distribution of sound pressure, while Fig. 4 displays the temperature distribution. We generated 30 sets of data for each illuminance pattern, and for each data set we estimated the terminal locations using the imGPE-LVM, the Gaussian process latent variable model (GP-LVM), and the DV-Hop algorithm (DV-Hop).

The given data are the set of sensor values $\mathbf{Y} = \{\mathbf{y}^{(i)}\}_{i=1}^{N}$ (each $\mathbf{y}^{(i)}$ being a three-dimensional (3D) vector composed of the sound pressure, illuminance, and temperature), the means $\mathbf{x}_{dvh}^{(i)}$ of the Gaussian priors for the locations of terminals $i$, $i = 1, \ldots, N$, and their standard deviation $\sigma_{dvh}$. We denote the set of $\mathbf{x}_{dvh}^{(i)}$ by $\mathbf{X}_{dvh}$, that is, $\mathbf{X}_{dvh} = \{\mathbf{x}_{dvh}^{(i)}\}_{i=1}^{N}$. We normalized the sensor values by dividing them by the dynamic ranges of the corresponding sensors. Further details on the experiments are given in Appendix C.1. The parameter settings used when estimating terminal locations with the imGPE-LVM are described in Appendix C.2.
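The two evaluation measures can be sketched in Python as follows: the mean Euclidean error for the absolute evaluation, and $V[r_{ij}]$ from (10)–(12) for the relative evaluation. The function names are hypothetical, and the true locations are assumed to be pairwise distinct so that $D_{ij} > 0$.

```python
import math
from itertools import combinations


def mean_absolute_error(estimated, true):
    # Average Euclidean distance between estimated and true locations.
    return sum(math.dist(e, t) for e, t in zip(estimated, true)) / len(true)


def shape_difference(estimated, true):
    # V[r_ij]: variance of the distance ratios after rescaling their mean to 1.
    ratios = [math.dist(estimated[i], estimated[j]) / math.dist(true[i], true[j])
              for i, j in combinations(range(len(true)), 2)]
    r_mean = sum(ratios) / len(ratios)
    return sum((1.0 - r / r_mean) ** 2 for r in ratios) / len(ratios)


# Usage: an estimate that is a scaled and shifted copy of the true layout.
true_xy = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
est_xy = [(1.0, 1.0), (3.0, 1.0), (1.0, 3.0)]
abs_err = mean_absolute_error(est_xy, true_xy)
rel_err = shape_difference(est_xy, true_xy)
```

In this usage example the estimate preserves the shape of the network, so the relative measure is zero up to rounding while the absolute error is not, which illustrates why the two measures are reported separately.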
4.2 Results
Figure 6 shows the evaluation of the absolute location estimation (unit: m), while Fig. 7 displays that of the relative position estimation. The vertical axis corresponds to the evaluation value and the horizontal axis to the discontinuity pattern of the illuminance distribution. In these figures, lower values indicate better performance. The points in the figures are the average values of the evaluations for the 30 sets of data, while the vertical line segments through the points are their standard deviations.

The more complicated the pattern of the illuminance distribution, the lower the average accuracy of the terminal location estimation by the GP-LVM, for both kinds of evaluation. In contrast, across all patterns of the illuminance distribution, the accuracy of the estimation by the imGPE-LVM varies less than that of the GP-LVM. Furthermore, the imGPE-LVM estimates terminal locations more accurately than the GP-LVM and DV-Hop for all patterns of the illuminance distribution.

When the true terminal locations are as shown in Fig. 11 for Pattern 1 of the illuminance distribution, the GP-LVM estimates the terminal locations as shown in Fig. 8, while the estimates of the imGPE-LVM are displayed in Figs. 9 and 10. In the terminal locations estimated by the GP-LVM, a large “groove” (a zone with no terminals) emerges around the line of discontinuity. The imGPE-LVM splits the terminals at the line of discontinuity and estimates the terminal locations accurately.
5 Discussion
This study proposes a method for estimating terminal locations in a wireless sensor network using multi-dimensional sensor data from the sensors with which a terminal is equipped. If the spatial distributions of physical quantities are continuous, maximizing the posterior probability of the latent variables based on the GP-LVM provides accurate estimates of terminal locations. However, the spatial distributions of physical quantities are generally discontinuous, and thus the technique based on the GP-LVM may estimate the terminal locations inaccurately. This is because the GP-LVM implicitly assumes the continuity of physical quantities and therefore assumes that the sensor values of a terminal resemble those of other terminals in its neighborhood. If the spatial distributions of physical quantities are discontinuous, the sensor values of a terminal may differ from those of another terminal in the neighborhood. The experimental results described in Sect. 4.2 are evidence of the inaccurate location estimation based on the GP-LVM for discontinuous distributions of physical quantities (Fig. 8).

By extending the GP-LVM, this study presents an imGPE-LVM that divides an environment into areas with continuous physical quantities and represents the sensor values at unknown terminal locations with one GP-LVM per area. Using sensor values for physical quantities, the proposed method estimates terminal locations by maximizing the posterior probability of the latent variables representing the terminal locations, based on the imGPE-LVM with priors for the terminal locations. The experimental results in Sect. 4.2 also provide evidence of accurate location estimation based on the imGPE-LVM in the case of discontinuous distributions of physical quantities (Figs. 9 and 10). Additionally, the imGPE-LVM estimates terminal locations more accurately than DV-Hop. This shows that the imGPE-LVM effectively uses the values of the sensors on the terminals in addition to the hop counts obtained from the anchor nodes.

However, the accuracy of the terminal locations estimated by the imGPE-LVM depends on the distributions of physical quantities. For uniform distributions of physical quantities, for example, the imGPE-LVM clearly produces inaccurate estimates of the terminal locations. If the distributions of the physical quantities measured by the sensors on the terminals are close to uniform, additional sensors must be installed on the terminals in order to obtain sensor values that follow non-uniform distributions of physical quantities.

This study uses the terminal locations estimated by the DV-Hop algorithm as priors. The DV-Hop algorithm cannot accurately estimate the terminal locations in environments where obstacles prevent anchor nodes from communicating directly with the terminals, because the hop counts it obtains then differ from those in an ideal environment. Thus, in such environments, the number of anchor nodes should be increased when the DV-Hop algorithm is used to obtain priors, or priors should be constructed using other techniques.

Fig. 2. The sound pressure distribution in the experiment.
Fig. 3. A pattern of the illuminance distribution in the experiment (Pattern 1).
Fig. 4. The distribution of the temperature in the experiment.
Fig. 5. Another pattern of the illuminance distribution in the experiment (Pattern 2).
Fig. 6. The evaluation of the absolute location estimation. Vertical axis: average error; horizontal axis: discontinuity pattern of the illuminance distribution (1–5); series: imGPE-LVM, GP-LVM, DV-Hop.
Fig. 7. The evaluation of the relative position estimation. Vertical axis: average normalized variance $V[r_{ij}]$; horizontal axis: discontinuity pattern of the illuminance distribution (1–5); series: imGPE-LVM, GP-LVM, DV-Hop.
Fig. 8. An example of the terminal locations estimated by GP-LVM.
Fig. 9. An example of the terminal locations estimated by imGPE-LVM.
Fig. 10. An example of the terminal locations estimated by imGPE-LVM. Estimated classes are represented by colors.
Fig. 11. An example of the true terminal locations.
6 Summary
In this study, we proposed a method for estimating terminal locations in a wireless sensor network that uses multi-dimensional sensor data from the sensors with which a terminal is equipped. The proposed method uses only three anchor nodes with given locations. By extending the GP-LVM, we presented an imGPE-LVM that divides an environment into areas with continuous physical quantities and represents the sensor values at unknown terminal locations with one GP-LVM per area. Using sensor values for physical quantities, the proposed method estimates the terminal locations by maximizing the posterior probability of the latent variables representing the terminal locations, based on the imGPE-LVM with priors for the terminal locations. As a prior, the model assumes Gaussian distributions centered on the terminal locations estimated by the distance vector hop (DV-Hop) algorithm, which allows terminal locations to be calculated easily from the given locations of anchor nodes and the hop counts from the anchor nodes.

The following can be concluded from the evaluation of the proposed method: (1) a more complicated pattern of the illuminance distribution results in lower average accuracy of the terminal location estimation by the GP-LVM, (2) across all patterns of the illuminance distribution, the estimation accuracy of the imGPE-LVM varies less than that of the GP-LVM, and (3) the imGPE-LVM estimates the terminal locations more accurately than the GP-LVM and DV-Hop for all patterns of the illuminance distribution.
Appendix
A Infinite Mixture of Gaussian Process Experts
In the infinite mixture of Gaussian process experts [13], the input space is (probabilistically) divided by a gating function into regions within which specific separate experts make predictions. With Gaussian process (GP) models as experts, the model has the additional advantage that the computation for each expert is cubic only in the number of data points in its region, rather than in the entire dataset. Each GP expert learns different characteristics of the function (e.g., length scales and noise variances).

Let $y \in R$ be an output variable, let $x \in R$ be an input variable, and let $c$ be the discrete indicator variable assigning a data point to an expert. The joint probability of the output data $\mathbf{y} = \{y^{(i)}\}_{i=1}^{N}$ and the corresponding indicators $\mathbf{c} = \{c^{(i)}\}_{i=1}^{N}$ given the input data $\mathbf{x} = \{x^{(i)}\}_{i=1}^{N}$ is expressed by

$$p(\mathbf{y}, \mathbf{c} \mid \mathbf{x}, \theta) = \left[\prod_{j} p\left(\{y^{(i)} : c^{(i)} = j\} \mid \{x^{(i)} : c^{(i)} = j\}, \theta^{(j)}\right)\right] p(\mathbf{c} \mid \mathbf{x}, \phi), \tag{13}$$

where $\theta = \{\theta^{(j)}\}$ denotes the set of the parameters of the Gaussian process regressor for each of the experts. In inference, the posterior of $\mathbf{c}$ given the input data $\mathbf{x}$ and the output data $\mathbf{y}$ is calculated by Gibbs sampling. Once the expert to which each data point is assigned has been determined by the maximum posterior estimate of $\mathbf{c}$, we obtain a Gaussian process regression for each class. For a new input, we output the expectation of the predictions produced by the experts with respect to the posterior distribution of the classes.
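A minimal Python sketch of an input-dependent gating probability of this kind, written in the form of (3) in Sect. 3.2, is given below. The kernel form $\exp(-\|x - x'\|^2 / (2\phi))$, the example values, and the function names are assumptions made for illustration.

```python
import math


def kappa(xa, xb, phi):
    # Gaussian kernel with variance parameter phi (assumed form).
    sq = sum((a - b) ** 2 for a, b in zip(xa, xb))
    return math.exp(-sq / (2.0 * phi))


def class_probabilities(x_new, X, c, alpha, phi):
    # Probability that x_new joins each existing class: the kernel-weighted
    # share of class members near x_new, scaled by N / (N + alpha).
    n = len(X)
    weights = [kappa(x_new, xi, phi) for xi in X]
    total = sum(weights)
    probs = {}
    for k in sorted(set(c)):
        in_class = sum(w for w, ci in zip(weights, c) if ci == k)
        probs[k] = (n / (n + alpha)) * in_class / total
    return probs


# Usage: two nearby points in class 1 and one distant point in class 2.
X = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
c = [1, 1, 2]
probs = class_probabilities((0.0, 0.05), X, c, alpha=1.0, phi=0.5)
```

The returned probabilities sum to $N/(N+\alpha)$; the remaining mass $\alpha/(N+\alpha)$ would correspond to the probability of opening a new class. For a very small $\phi$ and distant points, all kernel values can underflow to zero, so a practical implementation would need to guard the division.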
B Distance Vector Hop
The distance vector hop (DV-Hop) algorithm [12,17] estimates terminal locations using the hop count between a terminal and an anchor node with known location in a wireless sensor network. It consists of the following stages:
1. Initially, all anchors transmit their locations to the other terminals in the network. The messages are propagated hop by hop, and each message includes a hop counter. Each node maintains an information table on the anchor nodes and records the minimum number of hops that separates it from each anchor.
2. When an anchor node receives a message from another anchor, it estimates the average distance per hop from the locations of the two anchors and the hop counter, and returns this value to the network as a correction factor. On receiving the correction factor, each unknown terminal computes its distance to each anchor node from the average hop distance and its minimum hop count to that anchor.
3. The DV-Hop algorithm uses a multilateration method to calculate the location of each unknown terminal from the distances to the anchor nodes obtained in stage 2.
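The three stages can be sketched in Python as follows. This is a simplified, centralized illustration rather than the distributed protocol: hop counts are obtained by breadth-first search over a connectivity graph derived from a radio range, each unknown terminal takes its correction factor from the anchor with the fewest hops to it, and the multilateration is a linearized least-squares solve. These choices, the function names, and the requirements that the network be connected and the anchors not be collinear are assumptions of this sketch.

```python
import math
from collections import deque


def hop_counts(positions, source, radio_range):
    # Stage 1: minimum hop count from `source` to every reachable node (BFS).
    hops = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in range(len(positions)):
            if v not in hops and math.dist(positions[u], positions[v]) <= radio_range:
                hops[v] = hops[u] + 1
                queue.append(v)
    return hops


def multilaterate(anchor_xy, dists):
    # Stage 3: subtract the last anchor's circle equation from the others and
    # solve the resulting linear system by least squares (normal equations).
    (xm, ym), dm = anchor_xy[-1], dists[-1]
    rows, rhs = [], []
    for (xi, yi), di in zip(anchor_xy[:-1], dists[:-1]):
        rows.append((2.0 * (xi - xm), 2.0 * (yi - ym)))
        rhs.append(xi ** 2 - xm ** 2 + yi ** 2 - ym ** 2 + dm ** 2 - di ** 2)
    s11 = sum(a * a for a, _ in rows)
    s12 = sum(a * b for a, b in rows)
    s22 = sum(b * b for _, b in rows)
    t1 = sum(a * r for (a, _), r in zip(rows, rhs))
    t2 = sum(b * r for (_, b), r in zip(rows, rhs))
    det = s11 * s22 - s12 * s12  # zero if the anchors are collinear
    return ((s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det)


def dv_hop(positions, anchors, radio_range):
    # `positions` holds true coordinates; they are used only to derive
    # connectivity and the known anchor locations.
    hops = {a: hop_counts(positions, a, radio_range) for a in anchors}
    # Stage 2: average distance per hop (correction factor) of each anchor.
    hop_size = {}
    for a in anchors:
        others = [b for b in anchors if b != a]
        hop_size[a] = (sum(math.dist(positions[a], positions[b]) for b in others)
                       / sum(hops[a][b] for b in others))
    estimates = {}
    for node in range(len(positions)):
        if node in anchors:
            continue
        nearest = min(anchors, key=lambda a: hops[a][node])
        dists = [hop_size[nearest] * hops[a][node] for a in anchors]
        estimates[node] = multilaterate([positions[a] for a in anchors], dists)
    return estimates


# Usage: a 3 x 3 grid with unit spacing and anchors at three corners.
grid = [(x, y) for y in range(3) for x in range(3)]
estimates = dv_hop(grid, anchors=[0, 2, 6], radio_range=1.0)
```

Because the distances are products of hop counts and an average hop length, the estimates are only as good as the assumption that hop counts are proportional to distance, which is the weakness discussed in Sect. 1.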
C Details on Experiment
C.1 Experimental Conditions
1. The scale of the environment is assumed to be 10.00 m. That is, the environment is an area of 10.00 m × 10.00 m.
2. The wireless communicable range is assumed to be $10.00 \times \sqrt{2}/3$ m. This setting prevents a terminal from communicating directly with all other terminals, and also avoids producing terminals that cannot communicate with any other terminal when the number of terminals is small.
3. The three terminals used as anchor nodes are randomly selected from the terminals in the environment that are at least $10.00 \times \sqrt{2}/2$ m away from each other. This constraint on the selection of the three anchors causes the DV-Hop to use the locations of the three and the sensor values on them.
4. We used three kinds of sensors: a microphone, a thermometer, and an illuminometer. Referring to the specifications of the ADMP504 (microphone), the LM35D (thermometer), and the PICMD01-1 (illuminometer), we assumed that their dynamic ranges are 20–120 (dB) for the microphone, 0–100 (°C) for the thermometer, and 0–1500 (lux) for the illuminometer, with errors of ±1 (dB) for the microphone, ±1 (°C) for the thermometer, and ±5 (lux) for the illuminometer.
5. As sensor values, we used the values sampled from the distributions of physical quantities at the true terminal locations, with added Gaussian noise of mean 0 and standard deviation equal to one third of the sensor error.

C.2 Parameter Settings in Inference in imGPE-LVM
In Sect. 3.3, we described the inference in the imGPE-LVM. This appendix describes the parameter settings in the experiment.

In sampling, the sample size is 20,000 and the burn-in size is 2,000. The initial locations of the terminals $\mathbf{x}^{(i)}$, $i = 1, \ldots, N$, are the locations estimated by DV-Hop, which are denoted by $\mathbf{x}_{dvh}^{(i)}$, $i = 1, \ldots, N$. The initial classes of the experts $c^{(i)}$, $i = 1, \ldots, N$, are set to 0. The initial values of the parameters of the Gaussian process regression for class 0 are $\theta_0^{(0)} = 1.00$, $\theta_1^{(0)} = 1.00$, and $\theta_2^{(0)} = 1.00 \times 10^{2}$. The initial values of the hyper-parameters $\alpha$ and $\phi$ of the Chinese restaurant process are $\alpha = 1.00$ and $\phi = 2.50 \times 10^{-3}$. We set the kernel parameters so that class 0 is removed during sampling.

In sampling by the Hamiltonian Monte Carlo method, (1) for the terminal locations $\mathbf{x}^{(i)}$, $i = 1, \ldots, N$, the step size is set to $1.00 \times 10^{-2}$ and the number of leapfrog steps is set to 10; (2) for the kernel parameters $\theta_0$, $\theta_1$, and $\theta_2$ of the Gaussian process regression, the step size is set to $1.00 \times 10^{-2}$ and the number of leapfrog steps is set to 2; (3) for the hyper-parameter $\alpha$ of the Chinese restaurant process, the step size is set to $1.00 \times 10^{-3}$ and the number of leapfrog steps is set to 2; and (4) for the hyper-parameter $\phi$ of the Chinese restaurant process, the step size is set to $1.00 \times 10^{-2}$ and the number of leapfrog steps is set to 2.

The parameters $a_\alpha$, $b_\alpha$, $a_\phi$, and $b_\phi$ of the distributions of the hyper-parameters are fixed to $a_\alpha = 1.00 \times 10^{-2}$, $b_\alpha = 1.00 \times 10^{2}$, $a_\phi = 1.00 \times 10^{-2}$, and $b_\phi = 1.00 \times 10^{2}$, because the hyper-parameters $\alpha$ and $\phi$ obey the noninformative Gamma distribution. The parameters $b_{\theta_0}$, $b_{\theta_1}$, and $b_{\theta_2}$ are fixed to 1.00. The initial values of the parameters $a_{\theta_0}$, $a_{\theta_1}$, and $a_{\theta_2}$ are set to $a_{\theta_0} = 1.00 \times 10^{1}$, $a_{\theta_1} = 1.00 \times 10^{1}$, and $a_{\theta_2} = 1.00$.
References
1. Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, Heidelberg (2006)
2. Duane, S., Kennedy, A.D., Pendleton, B.J., Roweth, D.: Hybrid Monte Carlo. Phys. Lett. B 195(2), 216–222 (1987)
3. Ferris, B., Hähnel, D., Fox, D.: Gaussian processes for signal strength-based location estimation. In: Proceedings of Robotics: Science and Systems (2006)
4. Ferris, B., Fox, D., Lawrence, N.: WiFi-SLAM using Gaussian process latent variable models. In: Proceedings of the 16th International Joint Conference on Artificial Intelligence, pp. 2480–2485 (2007)
5. Geman, S., Geman, D.: Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell. (6), 721–741 (1984)
6. Mistry, H.P., Mistry, N.H.: RSSI based localization scheme in wireless sensor networks: a survey. In: Proceedings of the 5th International Conference on Advanced Computing and Communication Technologies, pp. 647–652 (2015)
7. Iwata, T., Yamada, T., Sakurai, Y., Ueda, N.: Sequential modeling of topic dynamics with multiple timescales. ACM Trans. Knowl. Discov. Data 5(4), 19 (2012)
8. Iwata, T., Duvenaud, D., Ghahramani, Z.: Warped mixtures for nonparametric cluster shapes. In: Proceedings of the 29th Conference on Uncertainty in Artificial Intelligence, pp. 311–319 (2013)
9. Lawrence, N.D.: Gaussian process latent variable models for visualization of high dimensional data. In: Advances in Neural Information Processing Systems, pp. 329–336 (2004)
10. Lawrence, N.: Probabilistic non-linear principal component analysis with Gaussian process latent variable models. J. Mach. Learn. Res. 6, 1783–1816 (2005)
11. Murphy, K.P.: Machine Learning: A Probabilistic Perspective. The MIT Press, Cambridge (2012)
12. Niculescu, D., Nath, B.: DV based positioning in ad hoc networks. Telecommun. Syst. 22(1–4), 267–280 (2003)
13. Rasmussen, C.E., Ghahramani, Z.: Infinite mixtures of Gaussian process experts. In: Advances in Neural Information Processing Systems, pp. 881–888 (2002)
14. Schwaighofer, A., Grigoras, M., Tresp, V., Hoffmann, C.: GPPS: a Gaussian process positioning system for cellular networks. In: Advances in Neural Information Processing Systems, pp. 579–586 (2003)
15. Takizawa, Y., Takashima, Y., Adachi, N.: Self-Organizing Localization for wireless sensor networks based on neighbor topology. In: Proceedings of the Seventh International Conference on Mobile Ubiquitous Computing, Systems, Services and Technologies, pp. 102–108 (2013)
16. Wang, J., Hertzmann, A., Blei, D.M.: Gaussian process dynamical models. In: Advances in Neural Information Processing Systems, pp. 1441–1448 (2006)
17. Zhou, Z., Xiao, M., Liu, L., Chen, Y., Lv, J.: An improved DV-HOP localization algorithm. In: Proceedings of the Second International Symposium on Information Science and Engineering, pp. 598–602 (2009)
Flying Sensor Network Optimization Using Bee Intelligence for Internet of Things
Abdu Salam1, Qaisar Javaid1, Gohar Ali2, Fahad Ahmad1, Masood Ahmad3(B), and Ishtiaq Wahid3
1 Department of Computer Science and Software Engineering, International Islamic University Islamabad, Islamabad, Pakistan [email protected]
2 Department of Information System and Technology, Sur University, Sur, Oman
3 Department of Computer Science, Abdul Wali Khan University Mardan, Mardan, Pakistan [email protected]
Abstract. A flying sensor network (FSN) for the Internet of Things (IoT) consists of flying nodes, called sensors, and ground segments. The flying nodes may be operated manually or may be automated devices. The flying segment of the IoT differs in nature from an ordinary mobile sensor network. The flying speed and diverse directions of the nodes make it harder to route the sensor information in the desired way. The data may be collected on the basis of contact opportunities, in which case timely delivery may not be guaranteed. To ensure the desired operation of the FSN, the delivery of data to the base station, deployed either in the air or on the ground segment, must be ensured in an efficient manner. In this paper, the mating intelligence of bees is used to ensure the delivery of data. Energy consumption is reduced by reducing the number of control messages and the transmission of redundant information, which increases the network lifetime. Simulation is conducted to evaluate the performance of the proposed scheme. The simulation results show that the proposed scheme outperforms the existing schemes under consideration.
Keywords: Flying sensor networks · Bee intelligence · IoT · FSN · UAV · Routing · WSN · Internet of things
1 Introduction
© Springer Nature Switzerland AG 2021 K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 331–339, 2021. https://doi.org/10.1007/978-3-030-55190-2_25
The Internet of Things [1] is the interconnection of different objects, including devices, humans, animals, buildings, and movable objects such as vehicles, trains, and airplanes, that are able to communicate with each other over a wireless or wired network without the assistance of human or computer interaction. These objects are able to share information with each other for a specific purpose. The objects may be deployed on the ground or in the air, and they may play the role of a sensor, a router, or both. The air segment may consist of sensors or objects that fly in the air without a pilot.
A flying sensor network [2] is a group of nodes flying in the air that are interconnected with each other and able to communicate information. The sensors may play the role of a router, a sensor, or both. The characteristics of these sensors differ from those of the sensors in a conventional wireless sensor network (WSN). As in a WSN, the battery power, communication range, and memory of the nodes are limited, and flying may consume additional energy. Many issues, such as architecture and routing, must be addressed for the successful implementation of an FSN. Conventional WSN protocols may not be directly applicable to an FSN.
Routing is the process of finding the shortest path for data to travel from the source to the base station. In WSNs, most protocols, such as LEACH [3], assume stationary nodes. Mobility issues are addressed in protocols such as CBRRR [4], but the nature of mobility is different in an FSN. Hence, routing information from source to destination needs careful consideration.
In this paper, bee mating optimization [5] is used to address the routing issue in FSN. The bees in the network are divided into two types: onlooker bees and employed bees. In the first phase, the problem is formulated as a dynamic optimization problem. The remaining energy, node connectivity, speed, communication workload, and direction of the nodes are considered during the shortest path calculation. In the second phase, the bee mating algorithm for routing in FSN is proposed. Once the shortest path has been calculated by an onlooker bee, the information is broadcast to all its neighbors. The employed bees are then able to transmit data on the shortest path. The proposed scheme is validated through simulation. The simulation results show that the proposed scheme outperforms state-of-the-art routing schemes in FSN.
The rest of the paper is organized as follows. In Sect. 2, recent research on routing in FSN is reviewed. The proposed bee mating intelligence for routing in FSN is presented in Sect. 3. The experimental results obtained during simulation are discussed in Sect. 4. The paper is concluded in Sect. 5.
2 Literature Review
An extended version of OLSR, known as the predictive-OLSR (P-OLSR) protocol, was proposed in [6]. This scheme predicts the quality of links before routing data to the base station, using GPS information to estimate the link quality. The authors report that a test bed was used to validate their proposal. Topology changes are adjusted automatically when routing data from source to destination. Findings: The frequent topology changes in a high-speed FSN increase the number of control messages, and thus the network lifetime may decrease. The QoS may also be compromised by the large number of control packets.
A protocol named TDR-FASNet was proposed in [7]; it aims to differentiate the data at each cluster. The QoS requirement is addressed at the cluster level, and software-defined networking technology is used to control the cluster.
The QoS requirement of delay-sensitive data can be guaranteed with traffic-differentiated routing. Weights are assigned to each traffic flow based on how important quick delivery is for that flow. The reliability of transmission is also predicted from the link quality and the link's potential for forwarding packets to the destination. Findings: The mobility of nodes is very high in FSN, and TDR-FASNet ignores it. Topology changes are very frequent in FSN, and this issue is not adequately addressed.
Events that need special treatment can be transmitted on quality-of-service shortest paths. This issue was highlighted in [4]. Packets that cannot tolerate delay are transmitted quickly on the shortest paths, while normal packets are sent via cluster heads. The cluster heads collect data from all their members, and data aggregation is applied to prevent the same packets from being delivered to the base station multiple times. Findings: The clusters are formed randomly, and in some scenarios the proposed scheme may perform poorly. Malicious nodes may forward their own packets by presenting them as real-time data.
The authors of [8] proposed a fuzzy-logic-based algorithm for WSN. The algorithm is designed to balance node coverage and energy efficiency. The cluster heads are selected based on their expected future energy consumption and their location. In order to prolong the network lifetime, unequal clusters are formed. Findings: The mobility of the nodes and their future positions are not considered, so unstable clusters may form, which will decrease the lifetime of the network. The node degree is ignored during cluster head selection, so coverage may be calculated inaccurately.
3 Proposed Bee Optimization Algorithm for Flying Sensor Network
In this article, the flying sensor network is considered a graph G(V, E). Here, the vertices V represent the nodes and the edges E represent the communication links between these nodes. The problem is to find the shortest path from a source node to the destination when a node wants to share its data with the base station. Calculating the shortest path from a node to the destination may consume a large amount of energy and time. Because of limited resources, it is not feasible for every node to calculate the shortest path. The task of finding the shortest path is therefore assigned to some nodes in the network. Whenever a node wants to communicate data to the base station, the shortest route information is obtained from these designated nodes.
In this research, the bees are divided into two categories: onlooker bees and employed bees. Here, the bees represent the nodes in the network. The responsibility of an onlooker bee is to calculate the shortest path from its neighbors to the base station. An employed bee obtains the shortest route information from the onlooker bee in its neighborhood and sends data to the base station on that path. The nodes/bees having more nectar are considered the onlooker bees. The amount of nectar is calculated based on the remaining energy, node degree and relative mobility, and can be represented by:

$$v_{nectar} = E_v + D_v + S_v + Di_v \tag{1}$$
Here $v_{nectar}$ is the amount of nectar in a node, $E_v$ represents the remaining energy of the node, $D_v$ is the number of nodes in its neighborhood, $S_v$ is the speed of the node and $Di_v$ is the direction of the node. The onlooker bees must be selected from each part of the network. The Euclidean distance between nodes is taken into consideration while electing the onlooker bees, so that the selected onlooker bees are at approximately the same distance from each other. In particular, the problem is stated by the following equation [4]:

$$\text{Minimize } F(weight, c) = \sum_{i=1}^{n} \sum_{j=1}^{k} weight_{ij}\,(v_{nectar} - c_j)^2 \tag{2}$$

subject to

$$\sum_{j=1}^{k} weight_{ij} = 1, \quad i = 1, \ldots, n$$

$$weight_{ij} = 0 \text{ or } 1, \quad i = 1, \ldots, n, \; j = 1, \ldots, k$$

In the above equation, $n$ is the total number of nodes in the FSN, $k$ is the total number of onlooker bees (known or unknown), $x_i$ ($i = 1, \ldots, n$) represents the current location of node $i$ in the FSN, and $c_j$ ($j = 1, \ldots, k$) represents an average value used to check the fitness of a node for the role of onlooker bee. The following equation can be used to calculate $c_j$:

$$c_j = \frac{1}{N_j} \sum_{i=1}^{n} weight_{ij}\, x_i \tag{3}$$
where $N_j$ is the number of neighbors of the $j$th onlooker bee and $weight_{ij}$ is the relationship weight of node $v_i$ with onlooker bee $j$: if node $i$ is a neighbor of bee $j$, the value of $weight_{ij}$ is 1, and it is 0 otherwise. Once a set of onlooker bees has been selected, the onlooker bees are responsible for calculating the shortest paths for all their neighbors (the employed bees). Because of the frequent topology changes in the network, the paths are updated after some interval and the calculated paths are broadcast to the neighbors. The working mechanism of the onlooker bee selection is presented below in the form of an algorithm.
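Equations (1) to (3) can be illustrated with a minimal Python sketch. It assumes that the quantity being clustered for node i is a single scalar value (here called `values[i]`), that the four terms of Eq. (1) are summed without normalization exactly as printed, and that an onlooker bee without neighbors gets an average of 0. The function names and the sample numbers are hypothetical and are not taken from the paper.

```python
def nectar(energy, degree, speed, direction):
    """Eq. (1): unweighted sum of the four node attributes."""
    return energy + degree + speed + direction


def centroid(values, weights, j):
    """Eq. (3): average of the values of the nodes assigned to onlooker bee j."""
    members = [values[i] for i in range(len(values)) if weights[i][j] == 1]
    if not members:          # assumption: an empty neighborhood averages to 0
        return 0.0
    return sum(members) / len(members)


def fitness(values, weights):
    """Eq. (2): sum of squared deviations of each node from its bee's average."""
    k = len(weights[0])
    centroids = [centroid(values, weights, j) for j in range(k)]
    return sum(
        weights[i][j] * (values[i] - centroids[j]) ** 2
        for i in range(len(values))
        for j in range(k)
    )


# Hypothetical example: four nodes, two onlooker bees.
values = [10.0, 12.0, 30.0, 34.0]
# weights[i][j] = 1 if node i is a neighbor of bee j; each row sums to 1.
weights = [[1, 0], [1, 0], [0, 1], [0, 1]]
print(fitness(values, weights))
```

A lower value of `fitness` means that the nodes assigned to each onlooker bee are closer to that bee's average, which is the sense in which Eq. (2) is minimized.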
Algorithm 3.01 Pseudo code of BeeFSN
1. Variable initialization
2. do
3.   Selection of a set of onlooker bees (OLB) having higher weights among all nodes using Eq. 1
4.   Checking the fitness of OLB using Eq. 2
5.   while (bee != null)
6.   do
       1. Selection of new bees
       2. Nectar computation
       3. Select bees in a greedy fashion
       4. end while
   while (i != max)

The above algorithm operates in the following way. A set of onlooker bees/nodes (OLB) is selected from all the bees/nodes (n) in the FSN. In the initial phase, the nodes are selected for the role of onlooker bee on the basis of their weights calculated using Eq. 1. The fitness of the selected onlooker bees is evaluated using Eq. 2. If the test using Eq. 2 is satisfied, the nodes are assigned the role of onlooker bees for some interval. Otherwise, another randomly selected set of onlooker bees is evaluated, and this process continues until an optimal set of onlooker bees is found. In order to distribute the load equally in the network, the onlooker bees are relieved after some interval and another set of bees is selected from the employed bees to perform the role of onlooker bees.
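The selection loop described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the paper does not say how the Eq. 2 test is decided, so the sketch takes it as a caller-supplied predicate `is_fit`; the function `select_onlookers`, the helper `well_spread`, the 40-unit spacing threshold, and the sample data are all hypothetical.

```python
import random
from itertools import combinations


def select_onlookers(nectar, k, is_fit, max_iter=100, seed=0):
    """Pick k onlooker bees: try the k highest-nectar nodes first, then
    random candidate sets, until is_fit accepts one or max_iter is reached."""
    rng = random.Random(seed)
    nodes = list(nectar)
    # Initial phase: greedy choice by nectar value (Eq. 1).
    top_k = sorted(nodes, key=nectar.get, reverse=True)[:k]
    for attempt in range(max_iter):
        candidate = top_k if attempt == 0 else rng.sample(nodes, k)
        if is_fit(candidate):        # stand-in for the Eq. 2 test
            return candidate
    return None                      # no acceptable set found


# Hypothetical data: nectar per node and a one-dimensional position per node.
nectar = {"a": 9.0, "b": 8.5, "c": 3.0, "d": 7.0}
positions = {"a": 0.0, "b": 1.0, "c": 50.0, "d": 100.0}


def well_spread(candidate, min_gap=40.0):
    """Assumed fitness test: every pair of onlooker bees is at least min_gap apart."""
    return all(
        abs(positions[u] - positions[v]) >= min_gap
        for u, v in combinations(candidate, 2)
    )


chosen = select_onlookers(nectar, 2, well_spread)
print(chosen)
```

In this example the two highest-nectar nodes are adjacent, so the greedy first candidate fails the spacing test and a randomly drawn set is accepted instead, mirroring the fallback step in the description above. Periodic re-selection to balance the load would call the function again with updated nectar values.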
4 Experiment Evaluation
In this research, the performance of the proposed scheme is validated through a series of simulations. The simulation experiments are carried out in EstiNet 9.0. The size of the FSN varies between experiments, and the performance is also tested for different mobility models. The network settings are kept constant when evaluating the performance of the different protocols/algorithms: the traffic conditions, environment and mobility models are the same during the execution of all algorithms/protocols. Line charts are used to present the performance of all schemes studied. The results of the simulation experiments are used to evaluate the stability and efficiency of the FSN. The simulator can scale the FSN up to 200 nodes; an overflow error is generated when the number of nodes in the FSN exceeds 200. Therefore, the number of nodes in the different experiments is 50 and 100. The nodes are 100 m away from each other after initial deployment. The packet size is set to 1200 bytes. The simulation area is set to 1000 m × 1000 m. The maximum simulation time is set to 500 s.
Fig. 1. Throughput of 100 nodes network at 50 m/s mobility
4.1 Performance Metrics
In this subsection, the performance of BeeFSN, P-OLSR [6], TDR-FASNet [7] and EAUCD [3] is evaluated in terms of throughput and end-to-end delay. The nodes in the FSN move according to the random waypoint (RWP) mobility model, and the speed of the nodes is random. The network size is taken as a parameter. In another setting, 200 nodes are deployed and the results are obtained when the nodes are moving at 10 km/h under the group mobility model. Throughput is the number of packets transmitted from one location to another in a specific time interval. The end-to-end delay is the time from when a node transmits a packet until the destination node receives that packet.
4.2 Simulation Results
In this subsection, the throughput of BeeFSN is compared with that of the state-of-the-art P-OLSR, TDR-FASNet and EAUCD. The results are presented for networks of different sizes. The mobility speed also varies between tests, and the mobility model is either group mobility or random waypoint. Likewise, the average end-to-end delay of the protocols under consideration is presented in the form of graphs. In Fig. 1, nodes moving at a speed of 50 m/s are evaluated. The overall result of BeeFSN with vehicle motion is better as the number of nodes increases. P-OLSR also gives satisfactory performance. The top line on the graph shows that packets to the destination are processed more efficiently with the BeeFSN protocol. The honey-bee-inspired routing approach achieves high throughput because of its efficient path calculation mechanism. Figure 2 shows the performance of BeeFSN compared to the other protocols when the size of the network is 100 nodes moving at 50 m/s. In this figure, the performance of all schemes is the same for the first 20 s of the simulation. After 20 s, BeeFSN gives the maximum throughput for the rest of the simulation. In the first few seconds its performance is not satisfactory, because BeeFSN is still producing its initial population; as time goes on, the performance of BeeFSN increases due to the efficient nature of the bees.
Fig. 2. Throughput of 100 nodes network at 100 m/s mobility
Figure 3 shows the end-to-end delay of the network with respect to network size. BeeFSN performs well in large-scale networks. The graph shows that EAUCD has very poor performance when the network size increases. P-OLSR and Bee-adhoc also perform poorly compared to BeeFSN. This shows that BeeFSN is a very good solution for routing packets in an efficient manner.
Fig. 3. End-to-end delay with respect to network size
The end-to-end delay of BeeFSN, P-OLSR, TDR-FASNet and EAUCD for a 100-node network at different speeds is depicted in Fig. 4. BeeFSN has low delay when the size of the network becomes large. When the size of the network is small, BeeFSN gives a balanced average end-to-end delay.
Fig. 4. End to end delay of 100 nodes network
5 Conclusion
This research addresses the routing issue in flying sensor networks. In a large-scale network, scalability and mobility may increase the number of control packets. To increase the lifetime of the FSN, bee mating optimization is utilized to choose the most appropriate nodes for the role of shortest path calculation. The number of control packets in the network is decreased because the shortest path information is obtained from the onlooker bees. The employed bees transmit the data to the base station on the designated paths. The proposed scheme is validated on the basis of simulation. The simulation results show that the proposed BeeFSN outperforms existing state-of-the-art routing schemes. In future, the work may be extended to include the ground segments as well as the air segments of the IoT. The performance will be checked in the presence of stationary as well as highly mobile sensor nodes.
References
1. https://www.i-scoop.eu/internet-of-things-guide/iot-trends-2017/
2. Sang-JoYoo, J.P., Kim, S., Shrestha, A.: Flying path optimization in UAV-assisted IoT sensor networks. ICT Exp. 2(3), 140–144 (2016)
3. Ahmad, M., Habib, M., et al.: Energy aware uniform cluster head distribution technique in wireless sensor networks. IJCSNS Int. J. Comput. Sci. Netw. Secur. 10(10), 97–101 (2010)
4. Ahmad, M., Shafi, I., Ikram, A.A.: Cluster based randomized re-routing for special events in mobile wireless sensor networks. Arch. Des. Sci. 65(7) (2012)
5. Ahmad, M., Ikram, A.A., Wahid, I., et al.: Honey bee algorithm based clustering in MANETs. Int. J. Distrib. Sensor Netw. 13(6) (2017)
6. Rosati, S., Kruzelecki, K., Heitz, G., Dario, F., Rimoldi, B.: Dynamic routing for flying ad hoc networks. IEEE Trans. Veh. Technol. 65(3), 1690–1700 (2016)
7. Qi, W., Kong, X., Guo, L.: A traffic differentiated routing algorithm in FASNet with SDN cluster controller. J. Franklin Inst., December 2017, in press
8. Mazumdar, N., Om, H.: Distributed fuzzy logic based energy-aware and coverage preserving unequal clustering algorithm for wireless sensor networks. Int. J. Commun. Syst. 30(13), e3283 (2017)
A Quantum Model for Decision Support in a Sensor Network
Shahram Payandeh(B)
Network Robotics and Sensing Laboratory, School of Engineering Science, Simon Fraser University, Burnaby, BC, Canada [email protected]
Abstract. This paper presents a preliminary overview of a model which can be used as a part of a decision support system when fusing data from multiple sensing environments. Combining information from various sensing modalities has been a subject of research for many years. A number of methodologies have been designed and developed for handling and integrating uncertainties when deciding how to combine the sensed information from various sources. In general, determining a suitable framework remains an open problem. For example, a method based on Dempster-Shafer evidence theory has been used as an approach for information fusion in a given decision-making environment. Data fusion from a sensor network plays an increasingly significant role in people tracking and activity recognition. This paper explores a quantum model as a part of a decision-making process in the context of multi-sensor information fusion. The paper shows how the framework of a quantum model can be used to handle uncertainties, and how the uncertainties from one sensor in a network that shares a common source can be mapped and viewed with respect to the other sensors. The paper presents basic definitions and relationships associated with the decision-making process and the quantum model formulation in the presence of uncertainties.
Keywords: Sensor network · Sensor space · Uncertainties · Hilbert space · Quantum model · Decision support
© Springer Nature Switzerland AG 2021 K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 340–352, 2021. https://doi.org/10.1007/978-3-030-55190-2_26
1 Introduction
Integration of a sensor network consists of handling two levels of fusion, namely data and information [1]. Processing of the raw data from sensors can be considered the lowest level, or the data fusion level. Information fusion can be considered an abstraction of the processed data. Processing of raw sensory data can further be classified as: a) data association, b) state estimation and c) decision fusion. Data association is the process of referencing a set of weighted measurements that correspond to the observation of an event, e.g. target tracking [2]. Examples of techniques for data association are Nearest Neighbors, K-Means and probabilistic approaches. State estimation can be considered a part of raw data processing when a set of measurements/observations does not correspond to the actual description of the state, which then needs to be estimated (e.g. estimating position and/or velocity). Examples are the Kalman filter and the particle filter.
Decision fusion is the notion of encapsulating and inferring from all the information collected at the previous levels and combining it with the propagated uncertainties in order to define a weighted action/decision list. Information fusion from multiple sensors has been a subject of research and development for a few decades. In recent years, the detection and classification of moving objects has gained popularity in the fields of autonomous cars and surveillance for activity recognition. For example, [3, 4] use multiple sensors with different modalities to detect/classify objects surrounding a vehicle. General approaches have utilized a linear estimator combined with a classical optimization method. Data fusion from multiple cameras offers a rich sensing modality that can be used for both global tracking of objects and their classification. In some cases, the sensed information obtained through cameras is combined with other 1D or 2D detection methods to further enhance and complement the image processing and classification. For example, [5] proposed several object tracking schemes for complex surgical scenes. It explores state estimation schemes for both Gaussian (e.g. Kalman filter) and non-Gaussian (e.g. particle filter) frameworks. Similar frameworks have been followed for tracking people across a camera network [6]. The Bayesian framework has also been explored for the development of efficient tracking schemes across multi-modal sensory inputs [7, 8]. In general, a decision on how to track objects is computed based on how targets have arrived at the current instance, using the observations obtained through multiple sources. Decision-making approaches are typically based on hierarchical inferences about the events and activities produced by detected objects.
For example, the Bayesian method offers an approach for combining conditional estimation for the presence of the tracked object based on the probability of the presence of the event and the likelihood of its observation. The framework can be cast in a sequential scheme where the degree of belief in a hypothesis is updated on the arrival of new information [8]. Another decision-making approach is based on Dempster-Shafer (DS) evidence theory, which is a generalization of the Bayesian theory [9, 10]. It provides a formalism that can be used to represent incomplete knowledge, update beliefs, and combine evidence while representing uncertainty explicitly. Dempster-Shafer evidence theory is widely used in various applications of data fusion in a sensor network, e.g. [11]. In the following, we present an overview of DS [9, 10] and its interpretation in the context of multiple sensor fusion in order to give the reader a sense of comparison with the proposed quantum model approach.
In Dempster-Shafer (DS) theory each fact has a degree of support between 0 and 1 (0 meaning no support for the fact and 1 meaning full support for the fact). Let a set of possible conclusions be a mutually exclusive and exhaustive set of all possible outcomes given as $\Psi = (\psi_1, \psi_2, \cdots, \psi_n)$, where at least one of the $\psi_i$ must be true. DS theory is concerned with pieces of evidence which support subsets of outcomes in $\Psi$, which can be represented in its power set (or frame of discernment). For example, for a three-member set we can write: $(\emptyset, \psi_1, \psi_2, \psi_3, (\psi_1, \psi_2), (\psi_1, \psi_3), (\psi_2, \psi_3), (\psi_1, \psi_2, \psi_3))$. The empty set $\emptyset$ has a probability of 0, and each of the other elements in the power set has a probability between 0 and 1. Another notion used in DS theory is the mass function $m(A)$, where $A$ is a member of the power set. The mass function is equal to the portion of all evidence that supports the element $A$ of the power set. Each $m(A)$ is between 0 and 1, and the sum of all $m(A)$ is 1.
In order to illustrate the basic idea of the DS theory, let us suppose that we have three sensors $(s_1, s_2, s_3)$ in a room where a single person is present. Let us assume that only one sensor can detect the event (i.e. the presence and the movement of the person). We can write the list of possible outcomes as $\Psi = (s_1, s_2, s_3)$, and its power set can be written as $(\emptyset, \{s_1\}, \{s_2\}, \{s_3\}, \{s_1, s_2\}, \{s_1, s_3\}, \{s_2, s_3\}, \{s_1, s_2, s_3\})$. The probability of $\{s_1, s_2, s_3\}$ is 1.0 since one of the measurements must be true. Let us now assume that after observation (sensing), the following masses are assigned to the elements $A$ of the power set: $(\{\emptyset; 0\}, \{s_1; 0.2\}, \{s_2; 0.1\}, \{s_3; 0.2\}, \{(s_1, s_2); 0.1\}, \{(s_1, s_3); 0.2\}, \{(s_2, s_3); 0.1\}, \{(s_1, s_2, s_3); 0.1\})$.
In DS theory, the belief $\beta$ in a member $A$ of the power set is defined as the sum of the masses of the elements which are subsets of $A$, including $A$ itself. It represents the evidence we have for $A$ directly. For example, $\beta(\{s_1\}) = m(\{s_1\}) = 0.2$, and $\beta(\{s_1, s_2\}) = m(\{s_1\}) + m(\{s_2\}) + m(\{s_1, s_2\}) = 0.2 + 0.1 + 0.1 = 0.4$. The plausibility of $A$, $\pi(A)$, is defined as the sum of all the masses of the sets that intersect with the set $A$. For the above example we can write: $\pi(\{s_1, s_2\}) = m(s_1) + m(s_2) + m(s_1, s_2) + m(s_1, s_3) + m(s_2, s_3) + m(s_1, s_2, s_3) = 0.8$. The certainty which we can assign to a given subset $A$ is defined by what is referred to as the belief interval, i.e. $[\beta(A), \pi(A)]$. In our example, the belief interval of $\{s_1, s_2\}$ is $[0.4, 0.8]$. In DS theory, the probability of $A$, $p(A)$, falls somewhere between $\beta(A)$ and $\pi(A)$. So $p(A)$ cannot be less than the belief value and can have a maximum value equal to the plausibility.
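The belief and plausibility computations in the three-sensor example can be reproduced with a short Python sketch. The mass values are those given in the text; representing each element of the power set as a `frozenset` and the function names `belief` and `plausibility` are choices made here for illustration only.

```python
# Masses from the three-sensor example (the empty set has mass 0 and is omitted).
masses = {
    frozenset({"s1"}): 0.2,
    frozenset({"s2"}): 0.1,
    frozenset({"s3"}): 0.2,
    frozenset({"s1", "s2"}): 0.1,
    frozenset({"s1", "s3"}): 0.2,
    frozenset({"s2", "s3"}): 0.1,
    frozenset({"s1", "s2", "s3"}): 0.1,
}


def belief(a, masses):
    """Sum of the masses of all sets that are subsets of a (including a itself)."""
    return sum(m for b, m in masses.items() if b <= a)


def plausibility(a, masses):
    """Sum of the masses of all sets that intersect a."""
    return sum(m for b, m in masses.items() if b & a)


a = frozenset({"s1", "s2"})
# Belief interval of {s1, s2}; rounded to hide floating-point noise.
print(round(belief(a, masses), 3), round(plausibility(a, masses), 3))
```

For the set {s1, s2} this gives the belief interval [0.4, 0.8] stated in the text, up to floating-point rounding.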
As can be seen from the above, a small difference between belief and plausibility shows that we are certain about our belief, while a significant difference shows that we are uncertain about it. To further illustrate the DS theory, another overview example is presented in the Appendix [21]. In general, the question of how to fuse sensory information remains an open challenge [12, 13]. Part of this challenge stems from practical situations where interference exists between various sensing modalities. As a result, the information interpreted from the sensed data includes the effects of various external disturbances, sometimes with a wide range of uncertainties. Conflict resolution is one of the main challenges for data fusion in a sensor network [14, 15], and many approaches have been proposed to address it. This paper explores an approach based on a quantum model (inspired by [16–18]), which can be used as a part of the decision support system in a sensor information network. In general, quantum theory can offer an alternative modelling approach when dealing with decision making across various sources of measurements from a shared and common environment [20]. The paper is organized as follows: Sect. 2 presents some background material. Section 3 presents an example of multiple sensor analysis in the context of the proposed framework. Section 4 illustrates how Feynman rules for path diagrams can be used as a part of the proposed analysis. Finally, Sect. 5 presents some concluding remarks.
2 Background Material
Nearly nine decades ago, two different sets of axioms were proposed in the area of probability theory. One formulation is based on the Kolmogorov axioms (Kolmogorov, 1933/1950) and the other is based on the Von Neumann axioms (Von Neumann, 1932/1955). The former organized the principles underlying the classical probabilistic applications, and the latter was based on the probabilistic interpretation of the laws underlying quantum mechanics. At the conceptual level, a key difference is that classical probability theory relies on a set-theoretic representation, whereas quantum theory uses a vector space representation. The objective of this paper is to present an overview of how uncertainties in the sensed information from multiple sources can be cast in the context of a quantum theory model. This association can further allow the development of a formal framework for fusing the sensed information as a part of the decision-making process. Here, an illustrative example is used for a better conceptualization of the proposed framework. Suppose we have a sensor that has a view of a common monitoring area. The sensor measures the possible deviation of the tracked object along a common coordinate system, and a decision must be made on this possible deviation (these measured coordinates can be generalized into a multi-dimensional feature space associated with the object and the tracked information). Both classical and quantum models cast the problem as one of assigning probability values to events: for example, what is the probability of sensing an event such as the movement of an object along a single axis or multiple axes? A single event (or elementary event) represents one unique outcome that can be observed by a sensor. In the classical framework, rules for combining events obey the logic of set theory, such as the distributive axiom. In the quantum model of decision making, events are not modelled as sets but rather as subspaces.
In the quantum model, a set of basis vectors $|e_i\rangle$ is defined in Hilbert space. These basis vectors are associated with elementary outcomes (subspaces) such as elementary events or observations. For example, $|e_1\rangle$ can correspond to a measure associated with the change in movement of the tracked object to the left or to the right. The basis vectors corresponding to elementary outcomes are orthogonal to each other, which captures the mutual exclusiveness of events. One of the main distinctions between the classical and quantum models is that the subspaces do not follow the distributive axiom of set theory. In the quantum model, the state, e.g. $|x\rangle$, is represented by a unit-length vector in the n-dimensional vector space (e.g. an n-dimensional feature space [21]). The state vector can then be mapped into event probabilities in measurement space. In the quantum model, this mapping is an indirect one. That is to say, the state is projected onto the subspace corresponding to an event (e.g. a measure or feature), and the squared length of this projection is equal to the event probability. The description of the state of the tracked object in its measured or computed feature space can be written as [16, 17]:

$$|x\rangle = \omega_1 |e_1\rangle + \omega_2 |e_2\rangle + \cdots + \omega_n |e_n\rangle, \tag{1}$$

where $\omega_i$ is referred to as the probability amplitude associated with the subspace $|e_i\rangle$. The description of Eq. (1) is also referred to as the superposition of the state (Fig. 1).
Fig. 1. Example description of the state $|x\rangle$ with respect to the measurement ray coordinates (subspaces) $|e_1\rangle$, $|e_2\rangle$ and $|e_3\rangle$.
Given a measurement or computed feature vector $|e_i\rangle$, there exists a projection operator $\pi_{e_i}$ that projects the state $|x\rangle$ onto the corresponding subspace for $|e_i\rangle$. The probability of the event along this ray is equal to the squared length of this projection:

\[ p(e_i) = \left\| \pi_{e_i} |x\rangle \right\|^2, \quad \text{or} \quad p(e_i) = |\langle e_i | x \rangle|^2. \tag{2} \]

As an illustrative example, let the projections of the state vector along $|e_i\rangle$ for $i = 1, 2, 3$ be 0.696, 0.696 and 0.174, respectively:

\[ |x\rangle = (0.696) \cdot |e_1\rangle + (0.696) \cdot |e_2\rangle + (0.174) \cdot |e_3\rangle. \tag{3} \]

In the above equation, it can be seen that the state $|x\rangle$ is projected onto the subspace corresponding to each event. The squared length of this projection equals the event probability. For example, the probability of measurement along $|e_1\rangle$ is $p(e_1) = |0.696|^2 = 0.4844$. Given the description of the subspaces

\[ |e_1\rangle \to \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad |e_2\rangle \to \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad |e_3\rangle \to \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}, \]

the projection of the state onto each of the subspaces can be obtained through the inner product operation. For example,

\[ \langle e_1 | x \rangle = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix} \cdot \begin{bmatrix} 0.696 \\ 0.696 \\ 0.174 \end{bmatrix} = 0.696. \]
This inner product, e.g. $\langle e_1 | x \rangle$, is also referred to as the transition amplitude from state $|x\rangle$ to state $|e_1\rangle$. It can also be interpreted as the amplitude for transitioning from the initial state to the state in which $|e_1\rangle$ is measured. The probability of an individual measure along each of the subspaces can now be defined by projecting the state vector onto the ray representing that event and then squaring the magnitude of this projection. For example, the probability of measuring along $|e_1\rangle$ is obtained by projecting the state vector $|x\rangle$ onto the ray corresponding to $|e_1\rangle$ and squaring the length of this projection:

\[ |e_1\rangle\langle e_1 | x \rangle = 0.696 \cdot |e_1\rangle, \qquad \left\| 0.696 \cdot |e_1\rangle \right\|^2 = |0.696|^2 \cdot \left\| |e_1\rangle \right\|^2 = 0.4844. \]

An important analogy has been noted in the literature relating the classical and quantum models of decision making, e.g. see [16, 17, 19]. Essentially, both the classical and the quantum models provide a rule for updating the state based on observations, for example, estimating the next position of the tracked object based on the current measures of its motion. Suppose a sensor measurement is taken, or a feature is observed or concluded to be true, and we now want to determine the probability of the next event after this observation. Classical theory uses the notion of conditional probability to change the original probability associated with the movement of the tracked object. This is done by obtaining the joint probability of the movement of the tracked object and its associated observed measure and dividing it by the probability of the measure. In the quantum model, the current description of the state is changed through its observation. This is done by projecting the state onto the corresponding measurement subspace and then dividing this projection by its length. For example, if $|e_1\rangle$ is chosen as the subspace of measurement (or one of the feature definitions), then the state is changed from $|x\rangle \to |e_1\rangle$. The normalization step guarantees that the revised state has a length of one. This is called the collapse of the state vector onto the subspace corresponding to the observed event [18].
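The projection rule of Eq. (2) and the collapse of the state can be illustrated with a minimal sketch. It assumes real-valued amplitudes and uses the state of Eq. (3); the helper names `inner`, `event_probability` and `collapse` are illustrative and are not taken from the cited literature.

```python
import math

# Amplitudes of |x> in the basis |e1>, |e2>, |e3>, taken from Eq. (3).
x = [0.696, 0.696, 0.174]

# Orthonormal basis vectors |e1>, |e2>, |e3>.
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def inner(u, v):
    """Inner product <u|v> for real-valued vectors (an assumption of this sketch)."""
    return sum(a * b for a, b in zip(u, v))


def event_probability(e, state):
    """Squared length of the projection of `state` onto the ray spanned by unit vector `e`."""
    return inner(e, state) ** 2


def collapse(e, state):
    """Project `state` onto the ray spanned by `e`, then renormalize to unit length."""
    amplitude = inner(e, state)
    projection = [amplitude * c for c in e]
    length = math.sqrt(inner(projection, projection))
    return [c / length for c in projection]


probabilities = [event_probability(e, x) for e in basis]
revised_state = collapse(basis[0], x)  # state after observing the event |e1>
```

With the rounded amplitudes of Eq. (3), the three probabilities sum to approximately one rather than exactly one, which reflects the rounding of the example values.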
3 Sensor Network Model

The framework for multi-sensor probability assignment is cast here in the context of the incompatibility of the measures. Suppose that we have two sensors that are used to measure and compute common information about an object. The question is how to predict and assign a probability of the measure or computed information from one sensor to the other. For example, one sensor assigns a high probability to the change in the measure of motion along an axis, and we want to assign a probability to this measure of motion with respect to the other sensor. Here we are assigning probabilities to ordered sequences of measures between the sensors. This model of the uncertainties in the incompatibility of the measurements between the sensors is represented in the form of a unitary transformation.

Using the above illustrative example, suppose $|e_1^\#\rangle, |e_2^\#\rangle, |e_3^\#\rangle$ are the basis for measurements in the measure/feature space associated with the second sensor. The state $|x\rangle$ of the tracked object can be defined in either of the bases of the two spaces, namely $|e_1\rangle, |e_2\rangle, |e_3\rangle$ or $|e_1^\#\rangle, |e_2^\#\rangle, |e_3^\#\rangle$. In this setting, the transition between basis states follows the principle that the probability of the transition $|e_1\rangle \to |e_1^\#\rangle$ equals the probability of the transition in the opposite direction, $|e_1^\#\rangle \to |e_1\rangle$. In other words,

\[ |\langle e_1^\# | e_1 \rangle|^2 = |\langle e_1 | e_1^\# \rangle|^2, \]

which states that the probability of transitioning between the basis states is equal to the squared magnitude of their inner product. This magnitude is the same regardless of the direction. In quantum theory, this is referred to as the law of reciprocity.

3.1 Order Effects

The order in which the sensed information from one sensor is related to the sensed information obtained from the other sensor can be interchanged. To illustrate this, let us assume that we have two sensors, and both are observing a common event. Let us compute the probability when relating information from one sensor, e.g. sensor 2, when it is measuring the event $|e_3^\#\rangle$ of the object state, to the sensed information from the other sensor, e.g. sensor 1, when it is measuring the event $|e_1\rangle$. The path diagram for this order, whose probability we want to compute, can be defined as:

\[ |x\rangle \to |e_3^\#\rangle \to |e_1\rangle. \tag{4} \]

The probability of the measurement of sensor 2 is obtained by projecting the initial state of the object, i.e. $|x\rangle$, onto the ray spanned by $|e_3^\#\rangle$ and squaring its length. This projection equals $\langle e_3^\# | x \rangle \cdot |e_3^\#\rangle$. The inner product between $|e_3^\#\rangle$ and $|x\rangle$ is computed as:

\[ \langle e_3^\# | x \rangle = \begin{bmatrix} 0.25 & -0.43 & 0.87 \end{bmatrix} \cdot \begin{bmatrix} 0.696 \\ 0.696 \\ 0.174 \end{bmatrix} = 0.0261, \]

which results in the probability of this first event being $|\langle e_3^\# | x \rangle|^2 = 0.00068$. This intermediate state, which is conditioned on the initial observation, now needs to be revised based on the observation of sensor 1. This is obtained by projecting $|e_3^\#\rangle$ onto $|e_1\rangle$ and squaring its length. This projection is $\langle e_1 | e_3^\# \rangle \cdot |e_1\rangle$. The inner product between $|e_1\rangle$ and $|e_3^\#\rangle$ is computed as:

\[ \langle e_1 | e_3^\# \rangle = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix} \cdot \begin{bmatrix} 0.25 \\ -0.43 \\ 0.87 \end{bmatrix} = 0.25. \]

The corresponding probability is $|\langle e_1 | e_3^\# \rangle|^2 = 0.0625$, which results in the path probability

\[ p\left(|x\rangle \to |e_3^\#\rangle \to |e_1\rangle\right) = |\langle e_3^\# | x \rangle|^2 \, |\langle e_1 | e_3^\# \rangle|^2 = (0.00068)(0.0625) \approx 4.3 \times 10^{-5}. \]

In order to further illustrate the notion of order effects, we can follow a similar approach for computing the probability of the reversed order. In the above example, this means determining the probability that sensor 1 measures the state of the tracked object along the ray represented by $|e_1\rangle$ and that the measurement of sensor 2 is then predicted along the ray $|e_3^\#\rangle$. This results in what is referred to as the probability of the reverse path, i.e.

\[ |x\rangle \to |e_1\rangle \to |e_3^\#\rangle. \tag{5} \]
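The two orderings of Eqs. (4) and (5) can be compared numerically with a short sketch. It assumes real-valued vectors and uses the rounded values of the illustrative example; the function name `path_probability` is hypothetical and chosen only for this illustration.

```python
# State of the tracked object, Eq. (3), and the two rays used in Eqs. (4) and (5).
x = [0.696, 0.696, 0.174]
e1 = [1.0, 0.0, 0.0]            # ray |e1> of sensor 1
e3_sharp = [0.25, -0.43, 0.87]  # ray |e3#> of sensor 2 (rounded example values)


def inner(u, v):
    """Inner product <u|v> for real-valued vectors."""
    return sum(a * b for a, b in zip(u, v))


def path_probability(state, *rays):
    """Product of the squared transition amplitudes along state -> rays[0] -> rays[1] -> ..."""
    probability = 1.0
    current = state
    for ray in rays:
        probability *= inner(ray, current) ** 2
        current = ray  # the state collapses onto the measured ray
    return probability


p_sensor2_first = path_probability(x, e3_sharp, e1)  # path of Eq. (4)
p_sensor1_first = path_probability(x, e1, e3_sharp)  # path of Eq. (5)
```

Because the first transition amplitude differs between the two paths while the second one is shared (by reciprocity), the two orderings generally yield different probabilities.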
Compared with the previous definition of the path, the resulting probability is obtained as the probability of the first event (a transition from $|x\rangle$ to $|e_1\rangle$) times the probability of the second event given the first (a transition from $|e_1\rangle$ to $|e_3^\#\rangle$), which is given by $|\langle e_1 | x \rangle|^2 \, |\langle e_3^\# | e_1 \rangle|^2$. In the above, the inner product $\langle e_1 | x \rangle = 0.696$ gives the probability of 0.484 and the inner product $\langle e_3^\# | e_1 \rangle = 0.25$ results in the probability of 0.0625. The probability of the path of Eq. (5) is 0.030, which is much greater than that of the path defined in Eq. (4).

3.2 Unitary Transformation

The transition from the states $|e_1^\#\rangle, |e_2^\#\rangle, |e_3^\#\rangle$ to $|e_1\rangle, |e_2\rangle, |e_3\rangle$ is possible, for example, by forming a 3 × 3 matrix:

\[ U_{e^\# e} = \begin{bmatrix} \langle e_1 | e_1^\# \rangle & \langle e_1 | e_2^\# \rangle & \langle e_1 | e_3^\# \rangle \\ \langle e_2 | e_1^\# \rangle & \langle e_2 | e_2^\# \rangle & \langle e_2 | e_3^\# \rangle \\ \langle e_3 | e_1^\# \rangle & \langle e_3 | e_2^\# \rangle & \langle e_3 | e_3^\# \rangle \end{bmatrix}, \tag{6} \]

where, for our illustrative example, it can be written as:

\[ U_{e^\# e} = \begin{bmatrix} 0.87 & -0.43 & 0.25 \\ 0.5 & 0.75 & -0.43 \\ 0.0 & 0.5 & 0.87 \end{bmatrix}. \]

In the above matrix, each of the entries can be interpreted as the amplitude for transitioning from the column state to the row state. For example, the entry in row 3, column 2 is the amplitude for transitioning from $|e_2^\#\rangle$ to $|e_3\rangle$. The transition in the reverse direction is possible by forming the Hermitian transpose of Eq. (6), $U_{e^\# e}^{\dagger} = U_{e e^\#}$, where $U_{e^\# e}^{\dagger} \cdot U_{e^\# e} = I$ ($I$ is the identity matrix). The squared magnitude of an entry in the above unitary transition matrix $U_{e^\# e}$ is defined as the probability of the transition associated with that entry. The squared magnitudes within each column must sum to unity, since a transition must occur to one of the row states from a given column state. As an observation, a Markov model also requires transition matrices whose columns sum to unity. However, the law of reciprocity requires that the squared magnitudes of the rows must also add up to one. This restriction associated with the quantum model is referred to as double stochasticity [16].
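The unitarity and double stochasticity properties can be checked numerically for the example matrix. The following is a minimal sketch under the assumption of real-valued entries, so that the Hermitian transpose reduces to the ordinary transpose; the helper names are illustrative. Since the entries are rounded to two decimals, the conditions hold only approximately.

```python
# Transition matrix of the illustrative example (entries rounded to two decimals).
U = [
    [0.87, -0.43, 0.25],
    [0.50, 0.75, -0.43],
    [0.00, 0.50, 0.87],
]


def transpose(m):
    """Transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


# For a real matrix the Hermitian transpose is the transpose.
U_dagger = transpose(U)
gram = matmul(U_dagger, U)  # should be close to the identity matrix

# Matrix of transition probabilities: squared magnitude of each amplitude.
P = [[u * u for u in row] for row in U]
column_sums = [sum(col) for col in zip(*P)]  # each close to 1
row_sums = [sum(row) for row in P]           # each close to 1 (double stochasticity)
```

A Markov transition matrix would only need the column sums to equal one; the additional constraint on the row sums is what distinguishes the quantum model here.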
4 Feynman's Path Diagram

The quantum probability representation of order effects can further be described through Feynman's path diagram and the associated rules for combining the probability values [16, 18]. The path diagram provides a way to visualize the flow of quantum probability and its assignment, starting from an initial state and transitioning through multiple intermediate states until arriving at the final state. The path diagram definition is somewhat analogous to the Markov model representation. In a Markov model, the transition to any intermediate state is represented by a conditional probability. For example, the transition from the state $|e_1\rangle$ to an end state $|e_2\rangle$ is described by the conditional probability $p(e_2 | e_1)$. In the quantum model, this probability is equal to the squared magnitude of the inner product between the states, $p(e_2 | e_1) = |\langle e_2 | e_1 \rangle|^2$.

For a single path starting from a state $|x\rangle$, making a transition to an intermediate state $|e_1\rangle$, then to $|e_2^\#\rangle$, and then ending up at $|e_3\rangle$, the single path diagram can be written as $|x\rangle \to |e_1\rangle \to |e_2^\#\rangle \to |e_3\rangle$. Feynman's first rule states that the amplitude for this single path is the product of the amplitudes for each of the transitions along the path. The second rule states that the amplitude for the transition from a beginning state $|x\rangle$ to a final state $|e_3\rangle$, passing through multiple intermediate states along different paths, equals the sum of the path amplitudes. In the literature, this is referred to as the law of total amplitude (an analogy has also been made to the law of total probability for the Markov model). In Feynman's rules, the notion of probability is replaced by that of amplitude, and the magnitude of the amplitude must be squared in order to determine the probability.

We now consider the third rule, which suggests a computational procedure for a system starting, for example, from an initial state $|x\rangle$ and terminating at some final state, where we would like to determine the probability at an intermediate state. For this case, let us assume that we are able to measure all of the intermediate states associated with the second sensor, e.g. the states $|e_1^\#\rangle, |e_2^\#\rangle, |e_3^\#\rangle$, which are along the path of measurement from $|x\rangle$ to $|e_3\rangle$. Let us say that the tracked object is in a given state and we ask the question: what is the probability of the measurement/computation of $|e_3\rangle$ in sensor 1, given a probability measurement/computation of sensor 2? To resolve this path, we can first compute the probability of starting from the initial state $|x\rangle$ and ending at each one of the intermediate states $|e_1^\#\rangle, |e_2^\#\rangle, |e_3^\#\rangle$ that we measure. Then we compute the probability of starting from each one of the intermediate states $|e_1^\#\rangle, |e_2^\#\rangle, |e_3^\#\rangle$ and ending at the final state $|e_3\rangle$. We then compute the total probability along each of the single paths by multiplying the two previously computed probabilities for each intermediate state. The final probability is the sum of all the resultant probabilities along each of the paths going through the intermediate states.

To illustrate the above rules, let us consider the probability of a single path going through a sequence of intermediate measures. What is the probability of sensor 2 measuring the state of the tracked object $|x\rangle$ along the ray $|e_2^\#\rangle$ and sensor 1 then measuring along $|e_3\rangle$? In other words, what is the probability of the path $|x\rangle \to |e_2^\#\rangle \to |e_3\rangle$? According to Feynman's first rule, we compute the product of the amplitudes $\langle e_3 | e_2^\# \rangle \langle e_2^\# | x \rangle$, and the total probability is equal to the squared magnitude, $|\langle e_3 | e_2^\# \rangle \langle e_2^\# | x \rangle|^2 = |(0.5)(0.310)|^2 = 0.024$. For this single path analysis, and given the state of the object $|x\rangle$, we can also compute the probability of sensor 1 measuring along the ray $|e_3\rangle$ followed by sensor 2 measuring along the ray $|e_2^\#\rangle$, that is, the probability of the path $|x\rangle \to |e_3\rangle \to |e_2^\#\rangle$. Following a similar computation, we obtain the probability of this path as $|\langle e_2^\# | e_3 \rangle \langle e_3 | x \rangle|^2 = |(0.5)(0.174)|^2 = 0.0076$.

Let us now analyze the multi-path setup of this illustrative example (Fig. 2). Two different probability conditions are considered in this case. The first condition is when the probability of the measure for sensor 1 is considered without considering the probability ranking of sensor 2, and the second condition is when the probability of sensor 1 considers the probability ranking of sensor 2. For this example, we examine the probability of the measure for sensor 1 along the ray $|e_3\rangle$ as the termination state.
Fig. 2. Path diagram starting with the state $|x\rangle$, mapped to the probabilistic measurements of sensor 2, which are then mapped to the final probabilistic measurements of sensor 1.
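For the first condition, we have three independent paths starting from $|x\rangle$ and ending at $|e_3\rangle$. One is $|x\rangle \to |e_1^\#\rangle \to |e_3\rangle$ with the amplitude $\langle e_3 | e_1^\# \rangle \langle e_1^\# | x \rangle = 0$, a second is $|x\rangle \to |e_2^\#\rangle \to |e_3\rangle$ with the amplitude $\langle e_3 | e_2^\# \rangle \langle e_2^\# | x \rangle = 0.155$, and a third is $|x\rangle \to |e_3^\#\rangle \to |e_3\rangle$ with the amplitude $\langle e_3 | e_3^\# \rangle \langle e_3^\# | x \rangle = (0.87)(0.0261) = 0.023$. From Feynman's second rule, the probability amplitude for starting at $|x\rangle$, transitioning through all the paths going through the intermediate states and finally ending up at $|e_3\rangle$ equals the sum of the amplitudes:

\[ \langle e_3 | x \rangle = \langle e_3 | e_1^\# \rangle \langle e_1^\# | x \rangle + \langle e_3 | e_2^\# \rangle \langle e_2^\# | x \rangle + \langle e_3 | e_3^\# \rangle \langle e_3^\# | x \rangle. \]

The probability equals the squared magnitude of this sum:

\[ |\langle e_3 | x \rangle|^2 = |0 + 0.155 + 0.023|^2 = 0.032. \]

For the second condition, the probability of the measurement of sensor 1 is computed by including the probability of the measure of sensor 2. According to Feynman's third rule, the probability of starting at $|x\rangle$ and ending at $|e_3\rangle$ after first resolving either $|e_1^\#\rangle$, $|e_2^\#\rangle$ or $|e_3^\#\rangle$ equals the sum of three path probabilities. First path:

\[ |x\rangle \to |e_1^\#\rangle \to |e_3\rangle: \quad |\langle e_3 | e_1^\# \rangle \langle e_1^\# | x \rangle|^2 = 0. \]

The next path:

\[ |x\rangle \to |e_2^\#\rangle \to |e_3\rangle: \quad |\langle e_3 | e_2^\# \rangle \langle e_2^\# | x \rangle|^2 = 0.024. \]

And,

\[ |x\rangle \to |e_3^\#\rangle \to |e_3\rangle: \quad |\langle e_3 | e_3^\# \rangle \langle e_3^\# | x \rangle|^2 = 0.0005. \]

This results in the total probability of:

\[ |\langle e_3 | e_1^\# \rangle \langle e_1^\# | x \rangle|^2 + |\langle e_3 | e_2^\# \rangle \langle e_2^\# | x \rangle|^2 + |\langle e_3 | e_3^\# \rangle \langle e_3^\# | x \rangle|^2 = 0 + 0.024 + 0.0005 = 0.0245. \]

For this illustrative example, the total probability for condition 1 is greater than the total probability for condition 2.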
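The difference between the second and third rules can be sketched in a few lines. The sketch assumes real-valued amplitudes and takes the sensor 2 basis vectors as the columns of the transition matrix of Eq. (6); the variable names are illustrative only.

```python
# State of the tracked object, Eq. (3), and the final ray |e3> of sensor 1.
x = [0.696, 0.696, 0.174]
e3 = [0.0, 0.0, 1.0]

# Sensor 2 basis |e1#>, |e2#>, |e3#>: the columns of the matrix in Eq. (6).
sensor2_basis = [
    [0.87, 0.50, 0.00],
    [-0.43, 0.75, 0.50],
    [0.25, -0.43, 0.87],
]


def inner(u, v):
    """Inner product <u|v> for real-valued vectors."""
    return sum(a * b for a, b in zip(u, v))


# First rule: the amplitude of a single path is the product of its transition amplitudes.
path_amplitudes = [inner(e3, s) * inner(s, x) for s in sensor2_basis]

# Second rule (intermediate states not resolved): sum the amplitudes, then square.
p_unresolved = sum(path_amplitudes) ** 2

# Third rule (intermediate states resolved): square each amplitude, then sum.
p_resolved = sum(a ** 2 for a in path_amplitudes)

# The gap between the two conditions is the interference (cross) term.
interference = p_unresolved - p_resolved
```

In this example all non-zero path amplitudes have the same sign, so the cross term is positive and the unresolved condition gives the larger probability; with amplitudes of opposite sign the ordering could reverse.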
5 Conclusions

The analytical foundation of quantum theory has been gaining widespread attention in a variety of applications. The fusion of information obtained from data through various measures associated with a variety of sensing modalities has been a challenging area of research. The decision-making process, when faced with an unreliable, noisy measure of information, also introduces an additional level of complexity. Inspired by other related works, we present some elementary observations on how the quantum model can be utilized in managing uncertainties in a sensor network. The framework offers an approach to the decision-making process when the probability of measurement (or information) can be mapped between the sensors. The unitary matrix, which models the double stochasticity between the sensors, is fundamental in such a representation. Further interpretation and investigation of the properties of such a matrix in the context of the sensor network are the subject of future research.
Appendix

This appendix summarizes another overview example of the Dempster-Shafer theory of evidence based on the view given by L. Zadeh [22]. Let us consider a case where there is a monitoring room (Rm1) with five sensors which take independent measurements of a common event. Let the set Rm1 = {1(8), 2(7), 3(10), 4(12), 5(15)} be a sensed representation of this room with the associated sensors. Now one can ask: what fraction of the sensed information is between 9 and 13 units of measure, inclusive? In other words, for what fraction of the sensors does sensor(i) ∈ Q, i = 1, ..., 5, hold, where the question is the set Q = [9, 13]? By counting those i's which satisfy the condition, the answer is 2/5.

Let us now consider a similar example, except that the sensed information of the i-th sensor is not known with high certainty. For example, the sensed information of sensor(1) can be in the interval [6–10]. As a result, the new set can be defined as: Rm2 = {1[6–10], 2[5–13], 3[7–13], 4[14–16], 5[15–17]}. In the case of sensor(1), the interval [6–10] means that the sensed information of sensor(1) is known to be an element of the set {6, 7, 8, 9, 10}, i.e. the set of possible values. In this case, the data associated with the sensors are possibility distributions of the sensor values. A similar question can now be asked with regard to the possibility distributions. For example, the question Q is now: which sensors can satisfy Q = [9, 13]? We can see that it is possible that sensor(1) ∈ Q; in the case of 4, it is not possible that sensor(4) ∈ Q; and in the case of 3, it is certain (or necessary) that sensor(3) ∈ Q. The original question can be stated as: what fraction of the sensor measurements are between 9 and 13, inclusive? The answer (Ans) to this question (Q) is now cast into two parts according to DS theory: one relating to the certainty (or necessity) (N) and the other relating to the possibility (Π), i.e. Ans(Q) = (N(Q); Π(Q)). Referring to the example of sensors tracking an object in the room, for the set defined by Rm2, we can write:

Ans[9, 13] = (N([9, 13]) = 2/5; Π([9, 13]) = 3/5).

A two-part answer of this form, in terms of a certainty α and a possibility β, is characteristic of an answer based on incomplete information. The first part is what is referred to as the measure of belief in Dempster-Shafer theory, and the second is the measure of plausibility.
References

1. Castanedo, F.: A review of data fusion techniques. Sci. World J. (2013). https://doi.org/10.1155/2013/704504
2. Hall, D.L., Llinas, J.: An introduction to multisensor data fusion. Proc. IEEE 85(1), 6–23 (1997)
3. Gohring, D., Wang, M., Schnurmacher, M., Ganjineh, T.: Radar/Lidar sensor fusion for car-following on highways. In: Proceedings of the 5th International Conference on Automation, Robotics and Applications (2011)
4. Kunz, F., Nuss, D., Wiest, J., Deusch, H., Reuter, S., Gritscneder, F., et al.: Autonomous driving at Ulm University: a modular, robust, and sensor independent fusion approach. In: Proceedings of IEEE Intelligent Vehicles Symposium (2015)
5. Payandeh, S.: Visual Tracking in Conventional Minimally Invasive Surgery. CRC Press, Boca Raton (2016)
6. Hou, L., Wan, W., Hwang, J., et al.: Human tracking over camera networks: a review. EURASIP J. Adv. Sig. Process. Article number 43 (2017)
7. Hoseinneshad, R., Vo, B., Vo, B., Suter, D.: Bayesian integration of audio and visual information for multi-target tracking using a CB-member filter. In: IEEE International Conference on Acoustic, Speech and Signal Processing (2011)
8. Payandeh, S.: On importance sampling in sequential Bayesian tracking of elderly. In: Proceedings of IEEE Systems Conference, pp. 1–6 (2016). https://doi.org/10.1109/syscon.2016.7490545
9. Dempster, A.P.: Upper and lower probabilities induced by multivalued mapping. Ann. Math. Stat. 38, 325–339 (1967)
10. Shafer, G.: A Mathematical Theory of Evidence. Princeton University Press, Princeton (1976)
11. Persz, A., Tabia, H., Declercq, D., Zanotti, A.: Using the conflict in Dempster-Shafer evidence theory as a rejection criterion in classifier output combination for 3D human action recognition. Image Vis. Comput. 55(2), 149–157 (2016)
12. Gravina, R., Alinia, P., Ghasemzadeh, H., Fortino, G.: Multi-sensor fusion in body sensor networks: state-of-the-art and research challenges. Inf. Fusion 35, 68–80 (2017)
13. Garcia, F., Martin, D., Escalera, A., Armingol, J.: Sensor fusion methodology for vehicle detection. IEEE Intell. Transp. Syst. Mag. 9(1), 123–133 (2017)
14. Dallil, A., Oussalah, M., Ouldali, A.: Sensor fusion and target tracking using evidential data association. IEEE Sens. J. 13(1), 285–293 (2013)
15. Zao, Y., Jia, R., Shi, P.: A novel combination method for conflicting evidence based on inconsistent measurements. Inf. Sci. 367–369, 125–142 (2016)
16. Busemeyer, J., Bruza, P.: Quantum Models of Cognition and Decision. Cambridge University Press, Cambridge (2012)
17. Yearsley, J., Busemeyer, J.: Quantum cognition and decision theories: a tutorial. J. Math. Psychol. 74, 99–116 (2016)
18. Feynman, R.: The Feynman Lectures on Physics, vol. III. https://www.feynmanlectures.caltech.edu/
19. Ashtiani, M., Azgomi, M.: A survey of quantum-like approaches to decision making and cognition. Math. Soc. Sci. 75, 49–80 (2015)
20. He, Z., Jiang, W.: Quantum mechanical approach to modeling reliability of sensor reports. IEEE Sens. Lett. 1(4), 1–4 (2017)
21. Schuld, M., Killoran, N.: Quantum machine learning in feature Hilbert spaces. Phys. Rev. Lett. 122(1), 040504 (2019)
22. Zadeh, L.A.: A simple view of the Dempster-Shafer theory of evidence and its implication for the rule of combination. AI Mag. 7(2), 85–90 (1986)
Design and Implementation of a Flexible Platform for Remote Monitoring of Environmental Variables

Francisco de Izaguirre(B), Maite Gil, Marco Rolón, Nicolás Pérez, and Pablo Monzón

Facultad de Ingeniería, Udelar, Montevideo, Uruguay
{francisco.de.izaguirre,maite.gil.prandi,marco.rolon,monzon}@fing.edu.uy, [email protected]
https://www.fing.edu.uy/
Abstract. The advancement of technology makes remote management and monitoring of environmental systems increasingly close, accessible and necessary. For several applications, it is important to have equipment that gathers data and information in a friendly, remote and easy-to-access platform. The system that was developed consists of modular telemetry equipment and a server. Its fundamental characteristics are a flexible architecture, energy autonomy and remote communication. It is designed and built from independent modules, which provide versatility and allow the hardware to be reused in multiple applications. It enables rapid development of a prototype to test an idea. The Arduino platform was used to develop the modules because it is an open software and hardware platform whose philosophy is aligned with the purpose of this project. The server, developed using open software, has databases for storage and a web service for data display and control. The versatility of the equipment was tested in two particular applications: monitoring of beehive sound for environmental pollution control, and determination of the amount of phosphorus and nitrogen present in the water of river courses.
Keywords: Data acquisition · Environmental monitoring · Sensor networks

1 Introduction
For some time now, real-time data acquisition has been fundamental in our lives. There are many cases where this kind of technology is used, from weather applications [1,2] to checking whether more eggs are needed in the fridge [3]. It is a growing area that is developing very fast to satisfy our daily needs. For academic uses, real-time data acquisition is also a standard requirement. As an example, our group works on acquiring the sound of a beehive together with some environmental data, such as temperature and humidity [4,5], and also on river monitoring to study how much phosphorus and nitrogen are concentrated in the water, and what the pH level is [6,7]. These two situations have the same core: both are systems that acquire some specific data, store it and send it to a server, where an AI, and later an operator or a researcher, can interact with the data.

© Springer Nature Switzerland AG 2021. K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 353–363, 2021. https://doi.org/10.1007/978-3-030-55190-2_27

The present work is about a system that can be mostly reused. The system is divided into three kinds of modules: Power Supply Module (Subsect. 2.1), Main Module (Subsect. 2.1), and Acquisition Modules (Sect. 3); and a remote Server (Subsect. 2.2). The first two are reused in every solution. In brief, the Main Module takes care of the scheduling of the system and the bi-directional communication with the server; the Power Supply Module manages renewable energy sources and stores energy in a battery. The acquisition modules are developed and combined for each particular application. Given a new problem, it is only necessary to develop, or modify, the acquisition modules. The modules are interconnected by a seven-pin unified power and communication bus, based on the standard I2C bus. As no standard communication protocol over I2C was available that covers the needs of the system, one was developed by our team. This protocol provides plug-and-play devices, robustness against failures, autodiscovery, checksum verification of the messages, query identification, scalability and transparency.

The article also presents two use cases that were successfully implemented. The first one is about monitoring beehives. For this application, the solution has two other modules apart from the Main Module and the Power Supply Module. One acquires weight, temperature and humidity. The other acquires the sound of the hive, with up to four channels. With these measurements, it is expected to learn the hive activity and correlate it with the presence of pesticides. The other application is about river monitoring.
Here, two acquisition modules were also developed: one to determine the phosphorus or nitrogen levels, and one to measure the water pH. This type of system facilitates the acquisition of large amounts of information with which pattern recognition and big data algorithms can be trained, allowing a better understanding of the world around us. In particular, regarding the monitoring of beehives, the aim is to characterise the sound of the beehive in order to identify regular phases (the songs of queens, the moment just before the division of the hive, foreign attacks, etc.) and distinguish them from unexpected activities [8–11], potentially associated with the presence of agro-toxins within the hive range. Regarding the monitoring of river courses [12–15], equipment was developed that allows the phosphorus detection experiment to be carried out on site, so that only the obtained value is transmitted. The massive acquisition of data, in this case, is expected to allow improvements at the agricultural, industrial and academic levels, especially in the fight against cyanobacteria through the early detection of phosphorus spills in water.

The article is organised as follows. Sect. 2 presents the general aspects of the system, along with a detailed description. In Sect. 3 two already developed applications are presented, and finally, in Sect. 4, the conclusions are drawn.
2 Architecture
The main concept of the system is to be flexible and modular. It therefore needs an architecture that accomplishes these objectives, based on independent and replaceable modules, together with a power and data bus that all modules must comply with. The system allows remote storage of the acquired measurements, communication via the cellular network, autonomous field operation and specific measurement modules that depend on the application. Fig. 1 shows the data acquisition system: the device itself and the server. The server receives the transmission over the cellular network and stores the data, so that the users can access it through the Internet.

Fig. 1. System block diagram.
2.1 Device
The device is the field side of the system. It is divided into modules interconnected by a defined bus. These are described next.

Architecture Definition. The architecture defines the structure and functioning of the intended system. In this case, the architecture is designed to achieve a flexible, modular and reusable system. The device is responsible for acquiring signals, which can be scalars or vectors. Once the acquisition has been made, it processes the data obtained, saves it locally, and sends it, totally or partially, to the server. It must be possible to modify the operating parameters of the equipment both locally and remotely. In order to perform the previous tasks, the device is divided into three types of modules: main module, power supply module, and acquisition modules.

Main module: responsible for centralising data and uploading it to the server.

Power supply module: as the name implies, it is responsible for the energy management of the system.
Fig. 2. Developed device with its modules and solar panels.
Last but not least, the acquisition modules: responsible for taking measurements according to the application.

The complete device can be seen in Fig. 2. Each module has its own microprocessor. As explained next, as long as the modules respect the interconnection bus, they do not need to use the same platform. In the examples described in Sect. 3, different Arduino boards were used.

Bus Definition. The modules defined in the system need to communicate with each other; therefore, a communication bus is necessary. The implemented bus is I2C (Inter-Integrated Circuit), a serial and synchronous protocol that works in master-slave mode [16]. I2C has two lines: SDA for data and SCL for the serial clock. To interconnect the modules, a unified data and power supply bus is implemented. Four pins are supplied for power (0 V, 3.3 V, 5 V and 12 V) and two for communication through I2C.

Local Protocol Definition. A communication protocol is a set of rules defined so that two or more devices can communicate and understand each other. The need to create a protocol for the exchange of data between modules arose during the development of the system, in a constructive and iterative way. It is based on the characteristics and limitations of the device and the I2C bus. The protocol over the I2C bus is one of the fundamental aspects of the system. It provides independence from the application implemented in the project, and a high level of flexibility and versatility when adding or removing modules.

Power Supply Module. The power supply module, shown in Fig. 3, is in charge of providing power to the device and acts in the system as an I2C slave module. It is composed of a solar panel, a battery and a controller board. The controller is a high-efficiency, medium-power solar energy management module, designed for standard 18 V solar panels and 12 V lead-acid batteries. It runs an MPPT (maximum power point tracking) algorithm, which helps improve the energy efficiency of the system and thus prolongs the useful life of the battery, as well as keeping the device on for a longer time in the absence of solar energy.
Fig. 3. Power supply module
Main Module. The main module, shown in Fig. 4, is in charge of managing the system: timing, alerting on the loss of connection with the modules, local storage of the data obtained, transmission of data to the server, and the remote reception of commands. To solve the remote communication, a module was required that would allow connection to the Internet through the cellular network. For this purpose, GPRS was chosen as the network technology and the SIM800 module as the modem. At this time, GPRS is the most widespread cellular network technology in our location, even in places that are difficult to access. Among the technologies deployed throughout the national territory, it is the one with the lowest energy consumption. Moreover, robust and inexpensive modules are available for it. During system initialisation, a scan of the modules connected to the bus is performed. Following this data request, the main module, which is the bus master, has a profile of each slave module assembled. It then has a schedule to follow and the necessary information to exchange data through I2C.

2.2 Server
The server is responsible for receiving the information sent by the main module, storing it and displaying it in a web interface. It also allows the user to send commands to the main module to configure the auxiliary modules, force measurements, or obtain the current configuration. The system is intended to be free and replicable. Therefore, the operating system used is Linux. Among the wide range of Linux distributions, Fedora Server was chosen.
F. de Izaguirre et al.
Fig. 4. Main module
Communication Protocol. One of the great challenges of this project was to establish a communication channel through which the user can send commands to the devices. The issue arises mainly because cellular devices change their IP address each time they connect to the network. Therefore, the server cannot initiate communication with the devices on its own. The MQTT protocol, which is widely used in IoT solutions, solves this issue. The main feature of this connection-oriented protocol is that it implements a publish/subscribe model. This decouples what happens on the server from the devices; in particular, a publisher does not have to know who is going to receive its messages. A remote device therefore needs no information about the recipients of its messages, or about when they receive them. Likewise, whoever communicates with the device does not need to know its address. Because the protocol is connection-oriented, the server can send messages to the devices once the module has established a connection with the server.
Databases. There are many server solutions for IoT projects. As a rule, they receive, or suggest sending, the data packaged as JSON. This strategy was adopted here, which facilitates compatibility with existing systems. A natural consequence is to use a database that supports these documents. MongoDB, a document database with no fixed schema that is suited to JSON storage, is available for this purpose. Part of its versatility is that no previous structure is required: the JSON received in each message can contain different fields and a different number of fields. All the messages that arrive from the devices are stored in this database. The messages can be measurements, alerts, or confirmations of configuration changes.
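The decoupling described above can be illustrated with a small in-memory stand-in for a broker. This is only a sketch under stated assumptions: `TinyBroker`, the topic names and the JSON fields are hypothetical and are not taken from the project, and a real deployment would rely on an MQTT broker and client library, with the device opening the connection.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Tuple

Handler = Callable[[str, str], None]


class TinyBroker:
    """In-memory stand-in for an MQTT broker (illustration only)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        # Register interest in a topic; no address of any peer is needed.
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, payload: str) -> None:
        # Deliver to whoever subscribed; the publisher never sees the receivers.
        for handler in self._subscribers[topic]:
            handler(topic, payload)


broker = TinyBroker()
server_inbox: List[Tuple[str, str]] = []
device_inbox: List[Tuple[str, str]] = []

# The server listens for measurements; the device listens for its commands.
broker.subscribe("station01/measures", lambda t, p: server_inbox.append((t, p)))
broker.subscribe("station01/commands", lambda t, p: device_inbox.append((t, p)))

# Device to server, then server to device, both addressed only by topic.
broker.publish("station01/measures", '{"id": "station01", "temperature": 21.5}')
broker.publish("station01/commands", '{"cmd": "force_measure"}')
```

Neither side refers to an IP address: both only name a topic, which is why a changing device address does not prevent the server from reaching the device.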
For graphic representation, a database specific to time series was preferred. Among these databases, InfluxDB is one of the most used solutions, since it is optimised for this use. It is open source and developed by InfluxData.
User Interface. For the web interface, three web services are presented. They have complementary characteristics.
1. MongoKU. This web service displays the MongoDB databases in a browser, which is useful for viewing the messages received: not only those that contain measurements, but also alerts and properties. The user can search for messages that contain a certain key. In particular, messages can be filtered by whether they are alerts, by timestamp, or by ID.
2. Grafana. Grafana has integration support for a wide range of databases and applications. It presents a clean and simple graphical interface, which allows multiple ways of representing the data, and even allows actions to be taken on them. Grafana supports databases other than InfluxDB, for example SQL databases such as PostgreSQL. The data can be exported to CSV, and scripts can be run to take actions or analyse the data.
3. Chronograf. Chronograf is developed by InfluxData. It is free software, and allows InfluxDB databases to be managed from the same web service. However, it is compatible only with InfluxDB databases. Like Grafana, it offers different representations of the data, export to CSV, and actions on the data. This is the web service selected for the graphic representation of the data, in order to provide a single working environment for the user.
Linux Service Definition. So far, the software needed for the proposed solution has been described. To tie these components together into a working server, a systemd service was developed.
This service runs a Python script that establishes a connection with the Mosquitto broker and subscribes to all the project channels. It also establishes connections with the MongoDB and InfluxDB servers. When a message is received, its raw data is first stored in MongoDB. The message is then parsed, tagged and stored in InfluxDB. The script run by the service should be designed and modified for each particular solution. Notably, the script can be written in any language that runs on Linux.
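The store, parse and tag sequence can be sketched as follows. This is a minimal illustration, not the project's script: the message keys (`id`, `module`, `timestamp`), the measurement name and the two Python lists that stand in for MongoDB and InfluxDB are all assumptions, and the output follows the general shape of InfluxDB line protocol without handling escaping or timestamp precision.

```python
import json
from typing import Dict, List, Tuple

RESERVED = ("id", "module", "timestamp")  # hypothetical message keys


def parse_message(raw: str) -> Tuple[Dict[str, str], Dict[str, float], int]:
    """Split a JSON message into tags, numeric fields and a timestamp."""
    msg = json.loads(raw)
    tags = {"id": str(msg["id"]), "module": str(msg["module"])}
    fields = {
        key: float(value)
        for key, value in msg.items()
        if key not in RESERVED
        and isinstance(value, (int, float))
        and not isinstance(value, bool)
    }
    return tags, fields, int(msg["timestamp"])


def to_line_protocol(measurement: str, tags: Dict[str, str],
                     fields: Dict[str, float], timestamp: int) -> str:
    """Build 'measurement,tags fields timestamp' (no escaping handled)."""
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_part} {field_part} {timestamp}"


def handle_message(raw: str, raw_store: List[dict],
                   series_store: List[str]) -> None:
    """Store the raw message first, then the parsed and tagged series point."""
    raw_store.append(json.loads(raw))           # stand-in for MongoDB
    tags, fields, ts = parse_message(raw)
    if fields:                                   # alerts carry no numeric fields
        series_store.append(to_line_protocol("measures", tags, fields, ts))


raw_store: List[dict] = []
series_store: List[str] = []
handle_message(
    '{"id": "hive01", "module": "scalar", "timestamp": 1600000000, '
    '"temperature": 34.5, "humidity": 60}',
    raw_store, series_store)
```

Keeping the raw document separately from the parsed point mirrors the two-database design: messages with unexpected fields are still retained even when nothing can be plotted from them.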
3 Applications
So far, two prototypes have been implemented, oriented to two specific applications: monitoring of beehives and monitoring of phosphorus and nitrogen levels in surface waters. Because the system is modular and reusable, the same Main Module and Power Supply Module were used for both applications.
3.1 Monitoring of Beehives
The use of pesticides is common in the agro-industry as a way to eradicate pests. Their extensive use in monocultures seeks to enhance production yield at the expense of, among other things, the health of beehives. Hives are affected both by the contaminated food that bees collect and bring into the hive and by the toxic residues that remain in their bodies [18,19]. Two modules were developed to characterise the beehive: the Scalar Acquisition Module and the Vectorial Acquisition Module. The Scalar Acquisition Module measures the relative humidity, temperature and weight of the hive. The Vectorial Acquisition Module handles the audio measurement. Figure 5 shows both modules.
Fig. 5. Acquisition modules: weight, humidity and temperature (left); sound (right).
The initial stage of this work is to monitor beehives and collect large amounts of data, and then to perform pattern recognition on the data as a means to understand and predict the hive's habits [8–11]. The ultimate goal of this application is to determine the correlation between the hive's behaviour and pesticide residues applied in its area of influence. Detailed aspects of this matter can be found in [4,5].
3.2 Monitoring of Phosphorus and Nitrogen Levels in Surface Waters
Due to the contamination of river courses, the need has increased for a system that allows remote monitoring of relevant water quality parameters, particularly those related to the growth of cyanobacteria [17]. Two modules were developed to address this: the Analytical Acquisition Module and the Auxiliary Acquisition Module. Both use a scale laboratory for mixing reagents and sample water, which is shown in Fig. 6. The Analytical Acquisition Module performs a calibration routine and a measurement routine for both phosphorus and nitrate levels. The Auxiliary Acquisition Module uses the lab to measure temperature and pH. Both modules control the pumps for external water sampling. Since optical absorption measures are
performed using perishable reagents, the lab must be refilled once a week. More details on this topic can be found in [6,7]. The massive acquisition of data allows pattern recognition and signal processing techniques to be used for early detection of phosphorus spills in water, in order to prevent cyanobacterial proliferation [12–15].
Fig. 6. Scale laboratory for mixing reagents and sample water
4 Conclusions
An open, generic, modular, low-cost and robust system was designed for remote monitoring applications. Since the equipment has communication features, a network of devices can be spread to monitor a whole area of interest, such as a river course, a lagoon or a crop field. To illustrate the versatility and functionality of the system, two concrete applications were described: the characterisation of a beehive's behaviour, enabling future work on pesticide contamination, and water quality monitoring with emphasis on the variables relevant to the growth of cyanobacteria. The physical design is robust, which allows the device to be deployed in the field. The flexibility of the system stands out, and makes the device suitable for other applications simply by replacing the acquisition modules. At present, LoRa integration is being carried out. This will enhance the performance and expand the field of research. As the last stage of this project, it is expected that the agro-industry and academic researchers will benefit from AI through massive real-time data acquisition.
References
1. Gouda, K.C., Preetham, V.R., Shanmukha Swamy, M.N.: Microcontroller based real time weather monitoring device with GSM. Int. J. Sci. Eng. Technol. Res. (IJSETR) 3(7), 1960–1963 (2014)
2. Kamble, S.B., Rao, P.R.P., Pingalkar, A.S., Chayal, G.S.: IoT based weather monitoring system. Int. J. Adv. Res. Innov. Ideas Educ. (IJARIIE) 3(2), 2886–2991 (2017)
3. Sangole, M.K., Nasikkar, B.S., Kulkarni, D.V., Kakuste, G.K.: Smart refrigerator using internet of things (IoT). Int. J. Adv. Res. Ideas Innov. Technol. 3(1), 842–846 (2017)
4. Pérez, P., Jesús, F., Pérez, C., Niell, S., Draper, A., Obrusnik, N., Zinemanas, P., Mendoza, Y., Carrasco, L., Monzón, P.: Continuous monitoring of beehives sound for environmental pollution control. Ecol. Eng. (2016). https://doi.org/10.1016/j.ecoleng.2016.01.082
5. Draper, A., Obrusnik, N., Zinemanas, P., Monzón, P., Pérez, N.: Design and implementation of a remote monitoring system to detect contamination in beehives. ChileCon (2015)
6. Gonzalez, P., Pérez, N., Knochen, M.: Low cost analyzer for the determination of phosphorus based on open-source hardware and pulsed flows. Química Nova (2016)
7. Knochen, M., Roth, G., González, P., Pérez, N., del Castillo, M., Monzón, P.: Desarrollo de un analizador químico in situ para aguas superficiales. Congreso de Agua Ambiente y Energía, AUGM (2019)
8. Maes, A.M.: The scaffolded sound beehive. In: Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, IJCAI (2015)
9. Nolasco, I., Benetos, E.: To bee or not to bee: investigating machine learning approaches for beehive sound recognition. Detection and Classification of Acoustic Scenes and Events (2018). arXiv:1811.06016
10. Nolasco, I., Terenzi, A., Cecchi, S., Orcioni, S., Bear, H.L., Benetos, E.: Audio-based identification of beehive states. In: International Conference on Acoustics, Speech, and Signal Processing, ICASSP (2019). arXiv:1811.06330
11. Terenzi, A., Cecchi, S., Orcioni, S., Piazza, F.: Features extraction applied to the analysis of the sounds emitted by honey bees in a beehive. In: International Symposium on Image and Signal Processing and Analysis, ISPA (2019). https://doi.org/10.1109/ISPA.2019.8868934
12. McElhiney, J., Lawton, L.A.: Detection of the cyanobacterial hepatotoxins microcystins. Toxicol. Appl. Pharmacol. 203(3), 219–230 (2005). https://doi.org/10.1016/j.taap.2004.06.002
13. Bertone, E., Burford, M.A., Hamilton, D.P.: Fluorescence probes for real-time remote cyanobacteria monitoring: a review of challenges and opportunities. Water Res. 141(15), 152–162 (2018). https://doi.org/10.1016/j.watres.2018.05.001
14. Samantaray, A., Yang, B., Dietz, J.E., Min, B.-C.: Algae detection using computer vision and deep learning. In: International Symposium on Image and Signal Processing and Analysis, ISPA (2019). arXiv:1811.10847
15. Cremella, B., Huot, Y., Bonilla, S.: Interpretation of total phytoplankton and cyanobacteria fluorescence from cross-calibrated fluorometers, including sensitivity to turbidity and colored dissolved organic matter. In: Association for the Sciences of Limnology and Oceanography, Limnology and Oceanography: Methods (2018). https://doi.org/10.1002/lom3.10290
16. NXP Semiconductors: I2C-bus specification and user manual (2014)
17. Mukhopadhyay, S.C., Mason, A. (eds.): Smart Sensors for Real-Time Water Quality Monitoring. Springer, Berlin (2013)
18. Zacepins, A., Brusbardis, V., Meitalovs, J., Stalidzans, E.: Challenges in the development of precision beekeeping. Biosyst. Eng. 130, 60–71 (2015)
19. Naggara, Y.A., Codlingb, G., Vogtb, A., Monaa, E.N.M., Seifa, A., Giesyb, J.: Organophosphorus insecticides in honey, pollen and bees (Apis mellifera L.) and their potential hazard to bee colonies in Egypt. Ecotoxicol. Environ. Saf. 114, 1–8 (2015)
Adaptable Embedding Algorithm to Secure Stream Data in the Wireless Sensor Network
Mohammad Amanul Islam
School of Computer Science and Technology, Xidian University, Xi'an 710071, China
[email protected]
Abstract. Digital watermarking is an established way to protect data integrity in wireless sensor networks, but embedding a watermark requires an extensive review of the data payload. Since the sensor data itself can be watermarked, careful manipulation of its binary sequence can attain a higher embedding capacity without introducing additional storage overhead. Watermark embedding can also avoid the threat of leaving freed space in the designated data fields of a predefined payload of a sensor data packet. This paper therefore proposes a novel data integrity protection method based on an adaptable and distributed approach to watermark embedding, which identifies the distinct lengths of data of the same primitive data type. Substantial experiments indicate the resilience of the system in confirming data accuracy and transmission efficiency.
Keywords: Bits · Data stream · Digital watermarking · Wireless sensor network
1 Introduction
© Springer Nature Switzerland AG 2021
K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 364–385, 2021. https://doi.org/10.1007/978-3-030-55190-2_28
Protecting streaming data is crucial nowadays for ensuring highly trustworthy data. The importance of securing the data and critical infrastructure related to sensor networks is also highlighted in the Research and Development Challenges for Critical National Infrastructure [1], which recommends research initiatives on developing a proactive and predictive security posture. To protect the copyright information and content integrity of multimedia digital works (images, audio, video, etc.), digital watermarking [19] technology has already been widely explored [2]. Prior works mostly focused on cryptographic algorithms and digital signatures [3–6]. Homomorphic encryption techniques are used as an application of cryptography [7] for en-route data verification, allowing direct aggregation of encrypted data. Another scheme, by R. Hasan et al. [8], relies on both encryption and an incremental chained signature to protect
sensory data. However, the computational cost of these schemes is high because they execute a large number of complex instructions (e.g., multiplication, modular exponentiation). Therefore, watermark embedding has lately been used increasingly to protect sensor data against possible malicious attacks and especially to provide data integrity [9–13]. Data trustworthiness can be evaluated from the digital watermark and the actions performed on the data. The key idea of watermarking is to hide information (the watermark) related to the data within the data itself, which ensures that the watermark stays with the data [17] and remains in the data after any decryption process [33]. Thus, assuring data trustworthiness and prioritizing the application of watermarking technology are becoming an active area of research. Compared to traditional encryption and digital signature-based methods, watermarking has the following advantages [14,15]: (i) it consumes little energy because creating, embedding and extracting a watermark are lightweight computations; (ii) watermark messages interact directly with the data without introducing additional overhead for network parameters or the storage capacity of sensor nodes; (iii) the protection of an encryption technique is lost once the data is decrypted, whereas a watermark always guarantees data security, since it is an inseparable part of the host carrier; (iv) digital watermarking significantly reduces the end-to-end delay caused by the encryption process. Previous research focused on inserting secret messages as inline metadata [16]; finding the channel for data embedding with or without the data; assuring secure and efficient transmission of the watermark [17]; and reducing the cost of the security operation. However, the embedding capacity and the threats of leaving freed space after partial use require more investigation.
Hence, the characteristics of sensor data and its payload require an extensive analysis to overcome the aforementioned gaps. This paper addresses the problem of low embedding capacity and removes the need to leave freed space (in bits) in the data field when hiding a secret watermark in it. The proposed scheme embeds the watermark in the unused space of the segmented data fields of the payload, that is, the space that remains after the data has been inserted into the fields. Thus, the watermark in this scheme is distributed together with the data across the data fields of a pre-defined payload of a data packet. The data watermark expresses the interrelation of the data based on their arrival time and their order within the data payload. Each source node encodes a different part of a secret message as a watermark chunk with each data item, and inserts the encoded data into the data field. Hence, decoding each watermark chunk requires processing all the data fields of each data packet in a particular flow, i.e., data stream. A non-arbitrary value of a certain length is allotted to each bit-shift window. Since each sensor data item uses this value to shift its bits after determining its actual bit width, it is called a bit-shift value. The actual bit width is considered an adaptable property of sensor data. Therefore, a sensor node on the data flow encodes a variable length of watermark
chunk along with a variable length of data. The scheme confines the shifting process so that the shifted bits do not exceed the length of the data field. Thus, the original data does not lose any bits while it is being shifted, and the shifted value used as the watermark carrier suffers no data degradation. Furthermore, the security scheme is able to detect errors caused by access violations to any data field as a result of attacks. The contributions of this paper include:
– introducing the adaptable properties of sensor data;
– removing the threat of leaving freed bits in the data storage and maximizing its utilization;
– introducing an adaptable embedding technique for the data watermark, in which parts of a watermark are distributed in distinct lengths with the data, extending the embedding capacity;
– the design and analysis of the security properties, which are composed of the bit-shift value, the arrival time of the data and the indices that signify the order of the data;
– an efficient technique to retrieve the watermark from the watermarked data based on a regression and correlation approach;
– an experimental assessment using synthetic data that reflects its bit-shift characteristics, and an evaluation of the performance of sensor network parameters.
The rest of the paper is organized as follows: Sect. 2 presents the related work. Section 3 introduces the proposed framework, the functional stages of provenance initiation and the retrieval process. Section 4 evaluates the system and analyzes the experimental results of the proposed framework. Section 5 presents the discussion. Finally, Sect. 6 concludes the paper. For convenience, Table 1 lists the important notations used in the rest of this paper.
2 Related Work
In wireless sensor network security, data hiding or embedding techniques for ensuring data trustworthiness have already gained attention as a well-known technology [20–22,31]. Previous research has also resolved the difficulties of adding large secret messages in a self-describing binary format [32]. Inserting extra messages in different transmission channels has also been investigated extensively. Therefore, embedding the watermark is not an overhead in today's increasingly dynamic sensor environments. Guo et al. [20] proposed a fragile data hiding method to verify the integrity of streaming data. This scheme divides the data into groups; secret messages are chained across the groups and embedded directly into the LSBs of the data. Data deletion can be correctly detected thanks to the chaining. Wang et al. [23] proposed another fragile embedding algorithm with dynamic grouping of data and a chaining approach. In this scheme, the data is first converted into characters, blank characters are located as the embedding scope, and the data is then divided into groups dynamically. Finally, the hash of each group of data is generated as their own
fragile secret data. Although the above schemes attempt to solve data forgery and integrity problems, they affect the data accuracy and embedding capacity. Xingming Sun et al. [18] proposed utilizing fixed redundant bits of the data fields for watermarking. This scheme verifies data integrity based on the accuracy of the secret data received from the data field. The method is convenient in terms of the computational cost of its security scheme. Moreover, Baowei Wang et al. [24] proposed a dual chaining watermarking (DCW) approach based on exploring the binary string of the data. Although it applies fragile watermarking to the data in a chained way using dynamic groups, the watermark in this scheme is also embedded at a fixed position in each group of data, where the freed bits are redundant [18]. This method is convenient in terms of generating its fragile watermark. However, the characteristics of the data and the data field are not examined explicitly in these two methods. Moreover, much proficient research on active timing-based data hiding [25–28] has already been done to protect the network flow. Watermark insertion has also been treated as data provenance. Sultana et al. [17] proposed an inter-packet delay (IPD) based provenance transmission, in which each bit of a secret pseudo-noise (PN) code is encoded in the delay between consecutive data packets to secure the sensor data stream. However, this scheme is limited in encoding large secret messages because the secret information is inserted into the delays between consecutive data packets in a particular flow. The difference between the active timing and IPD based methods is that the active timing scheme encodes a single secret message over the IPDs in a particular data flow, whereas the IPD method allows multiple nodes to encode secret messages over the same set of IPDs.
The proposed method differs significantly from the above approaches in several aspects: computing the scope for embedding the watermark with the data by exploring the lengths of the data and the data field; distributing the watermark without leaving any freed bits in the formatted data; ensuring minimum confidentiality before transmission; and assuring integrity through the interrelation between the disseminated watermark chunks and the order of the data stream among the data fields. It can also be regarded as a reversible data hiding technique because both the secret watermark and the original form of the data can be distinguished [29,30]. Moreover, J. Feng et al. [21] proposed imposing additional information as an encrypted signature during data acquisition. In that case, secret data is selected to identify tradeoffs between the accuracy and the strength of the proof of authorship. In the proposed method, by contrast, the SN authorizes each pre-shared data-specific key with the arrival time of the data and its position index in an ordered data flow. Hence, these parameters also play an important role in the key synchronization process, in which the applied bit-shift value is diagnosed to confirm a successful recovery of the data and the watermark.
3 Proposed Framework
3.1 Overview
In this work, the proposed methodology follows a distributed approach that embeds watermark chunks of variable length into sensor data of variable length in a particular flow. The adaptable embedding algorithm (AEA) introduces a framework that queries the length of each sensor input using several bit-shift windows. Each bit-shift window is mapped to a distinct bit-shift value that is used to shift the bits of each sensor value. Thus, a set of data of variable lengths embeds watermark chunks of several lengths into the vacated bits of the shifted code of the data. In this framework, a data-specific secret key $hk_i$ is used to create an encrypted $\varepsilon(dw_i)$ before transmission. Since the bit-shift value is one of the key parameters for encoding and decoding the watermark, each data source includes $\lambda_i$ in the $hk_i$ generation process. Thus, the sink node (SN) not only uses the pre-shared key set $K$ to decrypt the received $\varepsilon(DW)$, but also uses it to retrieve the applied bit-shift value through a key synchronization procedure. Therefore, the SN must hold a set of pre-shared data-specific keys $\{\widehat{hk}_1, \widehat{hk}_2, \ldots, \widehat{hk}_n\}$ and knowledge of the bit-shift values $\{\lambda_1, \lambda_2, \ldots, \lambda_{|L|}\}$ to complete the whole recovery procedure. Before determining the received watermark using the recovered bit-shift values, the SN substantiates their correctness using a linear regression and correlation-based method. Following Fig. 1, the detailed procedures for embedding and decoding the watermark based on the AEA can be specified as:
(i) initialize the framework with multiple distinct bit-shift windows;
(ii) execute the shifting process according to bit-shift windowing;
(iii) watermark generation and embedding process;
(iv) prepared security initiatives before transmission;
(v) data decryption and watermark decoding.

Table 1. Important notations
Notation              Description
$\varepsilon$         Encrypt
$D$                   Decrypt
$pk_i$                Single data packet
$df_i$                Single data field
$\Delta_j$            Single bit-shift window
$\Delta^{WS}_j$       Size of $\Delta_j$
$d_i$                 Single data
$d'_i$                Shifted code of $d_i$
$vs_i$                Vacated space on $d'_i$
$dw_i$                Watermarked form of $d_i$
$\lambda_i$           Single bit-shift value
$\gamma_i$            Size of a $vs_i$
$W$                   Watermark
$wc_i$                A watermark chunk
$hd_i$                Single hashed $d_i$
$hk_i$                Single secret key for $d_i$
$\widehat{hk}_i$      Pre-shared secret key
$a\lambda_i$          Applied bit-shift value
$l$                   Lower limit of a $\Delta_i$
$u$                   Upper limit of a $\Delta_i$

Fig. 1. Embedding process at a source node and decoding process at a sink node

3.2 Framework Initialization with Multiple Distinct Bit-Shift Windows
Initially, a node $n_i$ constructs the payload structure of its packet with a certain number of data fields of a primitive data type. Because the data type is common, each $df_i$ has the same bit width ($L$ bits) as storage space for the data. The payload of a data packet is the sum of the storage sizes of all its data fields, $payload(pk_i) = \sum_{i=1}^{n} df_i(L)$. The AEA uses each $df_i$ to transport the sensor data. According to Algorithm 1, the AEA framework performs a query through several bit-shift windows $\Delta S$ to determine the actual bit width $b_a$ of each $d_i$, where $b_l \le b_a \le b_u$, and finds which bit-shift window $\Delta_i \in \Delta S$ it fits. Each $\Delta_i$ is specified with a lower limit $b_l$ and an upper limit $b_u$. Thus, each $\Delta_i \in \Delta S$ can be stated as a particular partition of the total length ($L$ bits) of a $df_i$:
$$\Delta S = \{\Delta_1[l_1, u_1], \Delta_2[l_2, u_2], \ldots, \Delta_{|\Delta S|}[l_{|\Delta S|}, u_{|\Delta S|}]\}$$
where $L = \sum_{j=1}^{|\Delta S|} \Delta^{WS}_j$ and the lower limit $l$ of $\Delta_{i+1}$ is $l_{i+1} = u_i + 1$. When a sensor data $d_i$ is detected as a member of a certain $\Delta_i$ based on its $b_a$, where $b_a \le L$, an appropriate bit-shift value $\lambda_i = L - b_a$ is returned for that $d_i$, so that its bits are shifted by $\lambda_i$ positions. A node $n_i$ may also generate the bit-shift windows offline, given the length of each data field of a data packet.
Definition 1. A bit-shift window $\Delta_i$ is a particular range of typical bit widths (e.g., 0 to 7 bits, 8 to 15 bits) within the entire bit width (e.g., 32 bits for an integer) of a primitive-type data field. Thus, several distinct unsigned ranges (e.g., 0 to 255, 256 to 65535) are formed according to the typical bit widths.
3.3 Shifting Process Based on Bit-Shift Windowing
At this stage, the $\lambda_i \in \lambda S$ selected for each $d_i$ is applied to the data with a left-shift operation. Note that the partitioned bit-shift windows per $df_i$ are the same for any $d_i$; thus the bit-shift value selected for any $d_{i+1}$ may be the same as $\lambda_i$ or distinct. Here, $\lambda_i > 0$ if $d_i \in \Delta_i$, and $\lambda_i = 0$ if $d_i \notin \Delta_i$, which indicates data that is not eligible to be watermarked. The shifted bits are restricted to stay within the length of the data field. The $\lambda_i$ allotted to each $\Delta_j$ is constructed by subtracting the size of the bit-shift window $\Delta^{WS}_j$ (the bit width of a range) from the total length of each $df_i$. Limiting the shifted binary digits to the length of a data field prevents bits from being shifted off the end while a selected $\lambda_i$ is applied. Thus, the maximum and minimum bit widths of a $\lambda_i$ in a particular $\Delta_j$ are given by Eqs. (1) and (2), respectively:
$$\lambda_i^{MAX} = L - \Delta^{WS}_i \tag{1}$$
when the actual bit width of $d_i$, $b_i$, is $2^0 \le b_i < 2^1$, and
$$\lambda_i^{MIN} = L - \sum_{j=1}^{M} \Delta^{WS}_j = L - (L - 1) \tag{2}$$
when the actual bit width of $d_i$, $b_i$, is $2^{L-2} \le b_i < 2^{L-1}$, where $\Delta^{WS}_j = f(\Delta_j) = (u - l) + 1$ and $M$ is the number of bit-shift windows.
Each $d_i$ performs a sequential query through the bit-shift windows to select a $\lambda_i$. For a certain $d_i$, a simplified left-shift operation with $\lambda_i$ can be defined as $d'_i = d_i \ll \lambda_i$, also written $d_i^{\lambda_i}$, where the superscript indicates that the bit-shift value has been applied to the respective $d_i$. Thus a shifted data set $D' = \{d'_1, d'_2, \ldots, d'_n\}$ is generated, where each $d'_i \in D'$ opens up $\gamma_i$ freed bits in its vacated space $vs_i$. The set of vacated spaces for the data items is $\{vs_1, vs_2, \ldots, vs_n\}$ ($0 \le i \le n$), where $n$ is the number of data fields and $vs_i = \gamma_i$ is the length of the freed bits in the vacated space of the $i$-th data field. The total size of the vacated space, $N = \sum_{i=1}^{n} \gamma_i$, is used for embedding the watermark in a distributed manner.
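The window query and the left shift can be sketched as follows. This is a simplified illustration under stated assumptions: it uses one window per bit width (as the branch structure of Algorithm 1 suggests) rather than coarser windows, the field width of 32 bits and the sample readings are hypothetical, and the function names are not from the paper.

```python
from typing import List, Tuple

L = 32  # assumed width of one data field, in bits


def bit_shift_value(d: int, length: int = L) -> int:
    """Return lambda = L - b_a, where b_a is the actual bit width of d."""
    if d < 0 or d >= (1 << length):
        raise ValueError("value does not fit in the data field")
    return length - d.bit_length()


def shift_into_field(d: int, length: int = L) -> Tuple[int, int]:
    """Left-shift d by its bit-shift value; return (shifted value, lambda)."""
    lam = bit_shift_value(d, length)
    return d << lam, lam


def total_vacated_space(data: List[int], length: int = L) -> int:
    """N = sum of gamma_i, the freed low-order bits over all data fields."""
    return sum(bit_shift_value(d, length) for d in data)


readings = [200, 1, 70000]             # hypothetical sensor values
shifted, lam = shift_into_field(200)   # 200 needs 8 bits, so 24 bits are freed
n_total = total_vacated_space(readings)
```

Because the shift never exceeds `L - b_a`, the most significant bit of the datum lands at most on the top bit of the field, so a right shift by the same amount returns the original value.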
3.4 Watermark Generation and Embedding Process
Assume the data source records the sensor data with their arrival times. Following the data payload structure with $n$ data fields per $pk_i$, $n_i$ initiates $n$ sensor data $\{d_1, d_2, \ldots, d_n\}$ for transmission. At this stage, the AEA framework follows the steps below to watermark the data.
Step 1: Accumulate the $n$ selected bit-shift values $\{\lambda_1, \lambda_2, \ldots, \lambda_n\}$ for a certain data set; take the $n$ sensor data $\{d_1, d_2, \ldots, d_n\}$ and their arrival times $\{t_1, t_2, \ldots, t_n\}$ to generate a hash message of size $N$ for each $d_i$ based on a one-way hash function [34]:
$$hd_i = H(t_i, d_i, N), \quad (0 \le i \le n)$$
where $t_i$ is the arrival time of the $i$-th data $d_i$. Then, a secret watermark $W$ is calculated as
$$W = groupXOR(hd_1 \oplus hd_2 \oplus \ldots \oplus hd_n)$$
in which $\oplus$ denotes the XOR [35,36] operation. Fig. 2 shows the watermark formation structure and demonstrates the segregation of a single watermark into several chunks.
Fig. 2. Generation of data watermark
Step 2: The watermark $W$ is divided into several chunks $\{wc_1[\lambda_1], wc_2[\lambda_2], \ldots, wc_n[\lambda_n]\}$, where the size of each $wc_i$ is determined by the selected $\lambda_i$. Here, $\lambda_i = \gamma_i = wc_i^{MAX}$ is the maximum limit (in bits) of a watermark chunk to embed in a $d'_i \in D'$. The bits of $wc_i$ are selected by the index $j$ of $W[j]$, where the $j$ values for $wc_{i+1}$ start after the last $j$ value used by $wc_i$. Each $wc_i$ is then added into the $\gamma_i$ space of $d'_i$.
Step 3: Embedding the watermark chunks into the data set includes the following procedures:
1. Generate the watermark information and distribute it as watermark chunks according to Step 1 and Step 2;
2. Insert the bits belonging to $wc_i$ into the $\gamma_i$ freed bits (the vacated space) available on $d'_i$, and denote the watermarked data as $dw_i$. A simplified construction of $dw_i$ can be stated as $dw_i = d'_i \mid wc_i$.
The size of the watermark chunk is not pre-determined in the watermark embedding algorithm. However, a data source can also initiate the watermark and its embedding process for a certain data set offline, based on knowledge of the data field length and the bit-shift windowing. Figure 3 and Algorithm 1 describe a simplified watermark embedding approach. In this method, the watermark information is encoded in binary form. To simplify the description of the algorithm, $p$ is used as a position counter; at the end of each loop it gives the position index of the next watermark bit to insert. At the end of the embedding process, the aggregate pool of watermarked data can be stated as $DW = \{dw_1, dw_2, \ldots, dw_n\}$.
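Steps 1 and 2 can be sketched as follows. The paper does not say which one-way hash [34] is used, so SHAKE-256 from Python's `hashlib` is an assumption chosen only because it can produce exactly $N$ bits; the input encoding, function names and sample values are likewise hypothetical.

```python
import hashlib
from functools import reduce
from typing import List


def hash_to_bits(t: int, d: int, n_bits: int) -> int:
    """Hash (t, d) to an n_bits-wide integer (SHAKE-256 is an assumed choice)."""
    n_bytes = (n_bits + 7) // 8
    digest = hashlib.shake_256(f"{t}|{d}".encode()).digest(n_bytes)
    # Drop the surplus low bits so the result has at most n_bits bits.
    return int.from_bytes(digest, "big") >> (8 * n_bytes - n_bits)


def make_watermark(times: List[int], data: List[int], n_bits: int) -> int:
    """W = hd_1 XOR hd_2 XOR ... XOR hd_n."""
    hashes = [hash_to_bits(t, d, n_bits) for t, d in zip(times, data)]
    return reduce(lambda a, b: a ^ b, hashes, 0)


def split_chunks(w: int, n_bits: int, lambdas: List[int]) -> List[str]:
    """Cut the N-bit watermark into consecutive chunks of lambda_i bits."""
    bits = format(w, "b").zfill(n_bits)
    chunks, pos = [], 0
    for lam in lambdas:
        chunks.append(bits[pos:pos + lam])
        pos += lam
    return chunks


times = [1000, 1001, 1002]      # hypothetical arrival times
data = [200, 1, 70000]          # hypothetical sensor values
lambdas = [24, 31, 15]          # bit-shift values for a 32-bit field
N = sum(lambdas)                # total vacated space, here 70 bits
W = make_watermark(times, data, N)
chunks = split_chunks(W, N, lambdas)
```

Each chunk starts where the previous one ended, which matches the rule that the index $j$ for $wc_{i+1}$ continues after the last index used by $wc_i$.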
Algorithm 1. Watermark Embedding
Input: n number of data set and data fields, the length of data field L
Output: the encrypted watermarked data set $\varepsilon(DW)$
Initialisation: $p \leftarrow 0$, $k \leftarrow 0$
for $i = 1$ to $n$ do
    $hd_i \leftarrow H(t_i, d_i, N)$
end for
$W \leftarrow groupXOR(hd_1 \oplus hd_2 \oplus \ldots \oplus hd_n)$
while $j \le n$ do
    $tmp0 \leftarrow d_j$
    if $tmp0 \ge 0$ and $tmp0 < 2^0$ then
        $\lambda[i] \leftarrow L$
        $i \leftarrow i + 1$
    end if
    if $tmp0 \ge 2^0$ and $tmp0 < 2^1$ then
        $\lambda[i] \leftarrow L - 1$
        $i \leftarrow i + 1$
    end if
    ...
    if $tmp0 \ge 2^{L-z}$ and $tmp0 < 2^{L-(z-1)}$ then
        $\lambda[i] \leftarrow L - (z - 1)$
        $i \leftarrow i + 1$
    end if
end while
for $i = 0$ to $(L - \lambda[k]) - 1$ do
    $tmp1 \leftarrow d_1[i]$
    $d_1[i + \lambda[k]] \leftarrow tmp1$
    $d_1[i] \leftarrow 0$
end for
for $i = 0$ to $(\lambda[k] - 1)$ do
    $d_1[i] \leftarrow W[i]$
    $p \leftarrow p + 1$
end for
for $i = 0$ to $(L - \lambda[k + 1]) - 1$ do
    $tmp1 \leftarrow d_2[i]$
    $d_2[i + \lambda[k + 1]] \leftarrow tmp1$
    $d_2[i] \leftarrow 0$
end for
for $i = 0$ to $(\lambda[k + 1] - 1)$ do
    $d_2[i] \leftarrow W[p]$
    $p \leftarrow p + 1$
end for
...
for $i = 0$ to $(L - \lambda[k + (n - 1)]) - 1$ do
    $tmp1 \leftarrow d_n[i]$
    $d_n[i + \lambda[k + (n - 1)]] \leftarrow tmp1$
    $d_n[i] \leftarrow 0$
end for
for $i = 0$ to $(\lambda[k + (n - 1)] - 1)$ do
    $d_n[i] \leftarrow W[p]$
    $p \leftarrow p + 1$
end for
Encrypt($DW$)
Send($\varepsilon(DW)$)
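Steps 1 to 3 and Algorithm 1 can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the authors' C/C++ implementation: SHA-256 stands in for the chaos-based hash H of [34], the watermark length N is taken to be 256 bits, each field is treated as an unsigned integer whose freed low-order bits receive the chunk, and the helper names (`hash_bits`, `bit_shift_value`, `embed`) are hypothetical.

```python
import hashlib
from functools import reduce

L = 32   # data-field length in bits (32-bit fields, as in the experiment)
N = 256  # watermark length in bits; assumed equal to the SHA-256 digest size


def hash_bits(t: int, d: int) -> int:
    """Stand-in for H(t_i, d_i, N): SHA-256 over arrival time and value."""
    digest = hashlib.sha256(f"{t}:{d}".encode()).digest()
    return int.from_bytes(digest, "big")


def bit_shift_value(d: int) -> int:
    """Single bit partition: lambda_i = L - (actual bit width of d_i)."""
    return L - d.bit_length()


def embed(data, times):
    """Return (watermarked fields, bit-shift values, watermark W)."""
    # Step 1: W is the XOR of all per-item hashes.
    W = reduce(lambda a, b: a ^ b,
               (hash_bits(t, d) for t, d in zip(times, data)))
    lams = [bit_shift_value(d) for d in data]
    assert sum(lams) <= N, "sketch assumes the chunks fit inside W"
    watermarked, p = [], 0  # p counts the watermark bits consumed so far
    for d, lam in zip(data, lams):
        # Step 2: take the next lam bits of W, starting from its top bit.
        chunk = (W >> (N - p - lam)) & ((1 << lam) - 1)
        # Step 3: shift the data by lam bits and place the chunk in the gap.
        watermarked.append((d << lam) | chunk)
        p += lam
    return watermarked, lams, W
```

For example, the value 5 occupies 3 bits, so `bit_shift_value(5)` frees 29 bits of a 32-bit field for its watermark chunk, and shifting the watermarked field right by 29 bits returns the original value.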
Fig. 3. Watermark embedding structure
3.5 Prepared Security Initiatives Before Transmission
In this section, the security approach involved in the proposed methodology is described explicitly. Before initiating the transmission, each watermarked data $dw_i$ is encrypted with an XOR encryption algorithm [35,36]. The AEA method applies a message transformation function enc to encrypt each $dw_i$ with a key $hk_i$ specific to each formatted $d_i$, and thus an encrypted watermarked message $\varepsilon(dw_i)$ is computed as

$\varepsilon(dw_i) = enc(dw_i; hk_i)$

where $hk_i$ is a secret key for each $dw_i$. The AEA scheme also uses the one-way hash function [34] in its $hk_i$ generation procedure as

$hk_i = genHK(i, t_i, \lambda_i)$

where $t_i$, $\lambda_i$ and $i$ are the arrival time, the selected bit-shift value and the position index of the ith data $d_i$ respectively, and thus a set of secret keys is created for n data items. The encryption operation is performed such that the indices of $hk_i$ and $dw_j$ are the same, i.e., $i == j$. Furthermore, the SN will regenerate $hk_i$ as $hk_i^{new}$ with a reciprocal relation between the received $\hat{hk}_i$ and the knowledge of the bit-shift value $\hat{\lambda}_i$, which is a key synchronization policy to determine whether they are related or not. If they relate, the matched $\hat{\lambda}_i$ will be regarded as the exact applied bit-shift value, which is used for further recovery of the embedded watermark and the original form of the data.

3.6 Data Decryption and Watermark Decoding
In this section, the SN extracts the received ciphered data set $\varepsilon(\hat{DW}) = \{\varepsilon(\hat{dw}_1), \varepsilon(\hat{dw}_2), \ldots, \varepsilon(\hat{dw}_n)\}$ from all the data fields of a data packet. Then, it performs a decryption procedure with a set of pre-shared data-specific keys $\{\hat{hk}_1, \hat{hk}_2, \ldots, \hat{hk}_n\}$. Here, the decrypt function dec uses each ordered $\hat{hk}_i$ as an input for each received $\varepsilon(\hat{dw}_i)$ to recover the decrypted watermarked data $D(\hat{dw}_i)$ as

$D(\hat{dw}_i) = dec(\varepsilon(\hat{dw}_i); \hat{hk}_i)$

At the SN, the decoding algorithm examines a set of watermarked messages $\hat{DW} = \{\hat{dw}_1, \hat{dw}_2, \ldots, \hat{dw}_n\}$ after the extraction and decryption process. Figure 4 illustrates the streamlined structure of watermark decoding and its verification at the SN. In the proposed methodology, the SN uses the applied bit-shift values that correspond to each watermarked data as a recovery mechanism. Finally, the SN verifies the integrity of the data by comparing the retrieved watermark and the pre-shared original watermark. Thus, the AEA scheme performs two key recovery procedures, bit-shift value recovery and correctness exploration, before initiating the watermark decoding and its verification process.

Fig. 4. Watermark decoding structure

Bit-Shift Value Recovery: The SN attempts to recover the applied bit-shift value $a\lambda_i$ from each of the well-ordered received secret keys $\{\hat{hk}_1, \hat{hk}_2, \ldots, \hat{hk}_n\}$, which carry the information about $a\lambda_i == \lambda_i$. This recovery process is based on a key synchronization procedure. In this function, $hk_i$ is regenerated as $hk_i^{new}$, which involves the position index numbers of all the ordered $\{\hat{dw}_1, \hat{dw}_2, \ldots, \hat{dw}_n\}$ and a set of pre-shared knowledge about the bit-shift values $\{\hat{\lambda}_1, \hat{\lambda}_2, \ldots, \hat{\lambda}_{|L|}\}$. Here, the procedure for generating $hk_i^{new}$ is the same as the one followed by the data source $n_i$, where $hk_i^{new}$ is composed of $t_i$, $i$ and $\hat{\lambda}_i$. For any particular $\hat{dw}_i$, if its index value and the pre-shared arrival time $t_i$ find an appropriate $\hat{\lambda}_i$, then a correct $hk_i^{new}$ will be synchronized to its conformed $\hat{hk}_i$ as $hk_i^{new} == \hat{hk}_i$, and the matched $\hat{\lambda}_i$ is returned as $a\lambda_i$. Although the value of $a\lambda_i$ is normally a non-zero integer, it can be 0, which means that no watermark is embedded with that received sensor value. Such a condition exists if $d_i \notin \Delta_j$ or $b_i = L$ for any $d_i$ during embedding.
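The key generation, XOR encryption and key synchronization described above can be sketched as follows. This is a minimal illustration under assumptions rather than the scheme's actual implementation: SHA-256 truncated to the field length stands in for the hash used by genHK, the candidate set of bit-shift values is simply 0 to L, and the function names are hypothetical.

```python
import hashlib

L = 32  # data-field length in bits


def gen_hk(i: int, t: int, lam: int) -> int:
    """Stand-in for genHK(i, t_i, lambda_i): a hash truncated to L bits."""
    digest = hashlib.sha256(f"{i}:{t}:{lam}".encode()).digest()
    return int.from_bytes(digest[: L // 8], "big")


def xor_crypt(value: int, key: int) -> int:
    """XOR encryption; applying it twice with the same key decrypts."""
    return value ^ key


def recover_shift(i: int, t: int, hk_received: int, candidates=range(L + 1)):
    """Key synchronization: regenerate the key for each pre-shared candidate
    bit-shift value and return the one whose key matches the received key."""
    for lam in candidates:
        if gen_hk(i, t, lam) == hk_received:
            return lam
    return None  # no candidate relates to the received key
```

In this sketch the sink never receives the bit-shift value itself; it learns it only because exactly one candidate regenerates the received key for the given index and arrival time.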
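Correctness Exploration: To determine whether the recovered $a\lambda_i$ is accurate, the SN performs a linear regression and correlation procedure between $\hat{hk}_i$ and $a\lambda_i$, where the regression line is determined by solving Eq. (3):

$\hat{hk}_i = \alpha (a\lambda_i) + \beta$    (3)

$\alpha = \dfrac{(\sum a\lambda_i)(\sum \hat{hk}_i) - n \sum (a\lambda_i \cdot \hat{hk}_i)}{(\sum a\lambda_i)^2 - n \sum a\lambda_i^2}$    (4)

$\beta = \dfrac{(\sum a\lambda_i)(\sum a\lambda_i \cdot \hat{hk}_i) - (\sum \hat{hk}_i)(\sum a\lambda_i^2)}{(\sum a\lambda_i)^2 - n \sum a\lambda_i^2}$    (5)

To determine the correctness of the association between $\hat{hk}_i$ and $a\lambda_i$, the coefficient of correlation $Cr_i$ is computed as follows:

$Cr_i = \left(\dfrac{\sigma_{a\lambda_i}}{\sigma_{\hat{hk}_i}}\right) \cdot \alpha$    (6)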
Algorithm 2. Watermark Decoding
Input: n number of watermarked data $\hat{dw}_i$ and data fields, the length of data field L.
Output: Integrity confirmation result.
Initialisation: $m \leftarrow 0$
$p \leftarrow a\lambda[m] - 1$
for $i = a\lambda[m] - 1$ to $0$ do
    $\hat{W}[p] \leftarrow \hat{dw}_1[i]$
    $\hat{dw}_1[i] \leftarrow 0$
    $p \leftarrow p - 1$
end for
for $i = a\lambda[m]$ to $L$ do
    $tmp0 \leftarrow \hat{dw}_1[i]$
    $d_1[i - a\lambda[m]] \leftarrow tmp0$
    $d_1[i] \leftarrow 0$
end for
$p \leftarrow (a\lambda[m] + a\lambda[m + 1]) - 1$
for $i = a\lambda[m + 1] - 1$ to $0$ do
    $\hat{W}[p] \leftarrow \hat{dw}_2[i]$
    $\hat{dw}_2[i] \leftarrow 0$
    $p \leftarrow p - 1$
end for
for $i = a\lambda[m + 1]$ to $L$ do
    $tmp0 \leftarrow \hat{dw}_2[i]$
    $d_2[i - a\lambda[m + 1]] \leftarrow tmp0$
    $d_2[i] \leftarrow 0$
end for
...
$p \leftarrow (a\lambda[m] + a\lambda[m + 1] + \ldots + a\lambda[m + (n - 1)]) - 1$
for $i = a\lambda[m + (n - 1)] - 1$ to $0$ do
    $\hat{W}[p] \leftarrow \hat{dw}_n[i]$
    $\hat{dw}_n[i] \leftarrow 0$
    $p \leftarrow p - 1$
end for
for $i = a\lambda[m + (n - 1)]$ to $L$ do
    $tmp0 \leftarrow \hat{dw}_n[i]$
    $d_n[i - a\lambda[m + (n - 1)]] \leftarrow tmp0$
    $d_n[i] \leftarrow 0$
end for
for $i = 1$ to $n$ do
    $hd_i \leftarrow H(t_i, d_i)$
end for
$W \leftarrow groupXOR(hd_1 \oplus hd_2 \oplus \ldots \oplus hd_n)$
if $\hat{W}$ is not equal to $W$ then
    Integrity verification failed
else
    Integrity verified successfully
end if
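The decoding loop of Algorithm 2 can be sketched in Python as below. The sketch rests on several assumptions that are not taken from the paper: SHA-256 stands in for H, the watermark is 256 bits long, chunks were taken from the top bit of W downward and stored in the low-order bits of each field, and only the embedded prefix of W is compared (the total embedded length is assumed not to exceed N). The names `hash_bits` and `decode` are hypothetical.

```python
import hashlib
from functools import reduce

N = 256  # assumed watermark length in bits (SHA-256 digest size)


def hash_bits(t: int, d: int) -> int:
    """Stand-in for H(t_i, d_i): SHA-256 over arrival time and value."""
    digest = hashlib.sha256(f"{t}:{d}".encode()).digest()
    return int.from_bytes(digest, "big")


def decode(watermarked, shifts, times):
    """Split each field into data and chunk, then verify integrity.

    Returns (recovered data, True if the extracted watermark matches)."""
    data, w_hat, p = [], 0, 0
    for dw, lam in zip(watermarked, shifts):
        chunk = dw & ((1 << lam) - 1)   # low lam bits carry the chunk
        w_hat = (w_hat << lam) | chunk  # append chunk to extracted watermark
        p += lam                        # p tracks how many bits were read
        data.append(dw >> lam)          # right shift recovers the data
    # Recompute the watermark from the recovered data, as the source did.
    W = reduce(lambda a, b: a ^ b,
               (hash_bits(t, d) for t, d in zip(times, data)))
    return data, w_hat == (W >> (N - p))  # compare the embedded prefix only
```

Flipping any embedded chunk bit changes the extracted watermark but not the recomputed one, so the comparison fails and the tampering is reported.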
Here, $\sigma_{a\lambda_i}$ and $\sigma_{\hat{hk}_i}$ are the standard deviations of the recovered $a\lambda_i$ and $\hat{hk}_i$ respectively. The deviation values of both $a\lambda_i$ and $\hat{hk}_i$ are determined as follows:

$\sigma_{a\lambda_i} = \sqrt{\dfrac{\sum (a\lambda_i - \mu_{a\lambda_i})^2}{n-1}} = \sqrt{\dfrac{(\sum a\lambda_i^2) - \frac{1}{n}(\sum a\lambda_i)^2}{n-1}}$    (7)

$\sigma_{\hat{hk}_i} = \sqrt{\dfrac{\sum (\hat{hk}_i - \mu_{\hat{hk}_i})^2}{n-1}} = \sqrt{\dfrac{(\sum \hat{hk}_i^2) - \frac{1}{n}(\sum \hat{hk}_i)^2}{n-1}}$    (8)

Here, $\mu_{a\lambda_i}$ and $\mu_{\hat{hk}_i}$ are the mean values of $a\lambda_i$ and $\hat{hk}_i$ respectively. Thus, by substituting the deviation values from Eqs. (7) and (8) and the value of $\alpha$ from Eq. (4) into Eq. (6), we obtain $Cr_i \approx 1$, which indicates a strong correlation between $a\lambda_i$ and $\hat{hk}_i$. Thus, the SN computes a set of applied bit-shift values $\{a\lambda_1, a\lambda_2, \ldots, a\lambda_n\}$ and applies them to each element of $\hat{DW}$ according to Algorithm 2. The program calculates the received watermark message as $\hat{W}$, where the binary digits belonging to each $\hat{wc}_i$ are extracted from $\hat{dw}_i[L]$. Here, the binary digits of $\hat{wc}_i$ are collected from $\hat{dw}_i[a\lambda_i - 1]$ down to $\hat{dw}_i[0]$, and the position of each bit in $\hat{W}[N]$ is tracked by a position counter p. All the bits in $[\hat{dw}_i[a\lambda_i], \hat{dw}_i[L - 1]]$ are then shifted right by $a\lambda_i$ bits to recover the received data as $d_i$. The SN recalculates the watermark as $W$ according to the same rule followed by the data source and compares $W$ and $\hat{W}$ to determine whether the data integrity has been violated.
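As a worked check of Eqs. (4) to (8), the following Python sketch computes the regression coefficients and the correlation coefficient from two equal-length numeric sequences. The sequences stand for the recovered bit-shift values and the keys interpreted as numbers; the function name and any sample values are illustrative assumptions, not data from the experiment.

```python
import math


def regression_and_correlation(x, y):
    """Return (alpha, beta, Cr) for the line y = alpha * x + beta.

    x plays the role of the recovered bit-shift values and y the role of
    the received keys treated as numbers."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    sxy = sum(a * b for a, b in zip(x, y))
    denom = sx ** 2 - n * sxx
    alpha = (sx * sy - n * sxy) / denom          # Eq. (4)
    beta = (sx * sxy - sy * sxx) / denom         # Eq. (5)
    sd_x = math.sqrt((sxx - sx ** 2 / n) / (n - 1))  # Eq. (7)
    sd_y = math.sqrt((syy - sy ** 2 / n) / (n - 1))  # Eq. (8)
    cr = (sd_x / sd_y) * alpha                   # Eq. (6)
    return alpha, beta, cr
```

For perfectly linear inputs such as x = [1, 2, 3, 4] and y = [3, 5, 7, 9], the function returns a slope of 2, an intercept of 1 and a correlation coefficient of 1, which is the limiting case of the strong correlation described above.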
4 System Evaluation and Result Analysis

4.1 Experimental Result
To evaluate the performance of the AEA framework, the sketched algorithms were implemented in C/C++. The results obtained throughout the experiment were analyzed using MATLAB R2016b. In the experiment, the payload of a data packet was designed with 8 int32 (i.e., 32-bit integer) data fields, giving a 256-bit payload. Figure 5 illustrates the data packet as composed of several segmented data fields that are used to transport the watermarked data. As discussed earlier, the proposed method examines the bit-width of data to identify the distinct length of the sensed data by querying several bit-shift windows. The bit-shift windows are explored as several partitions of the total length of a data field. Hence, the AEA considers two types of partition scheme to diagnose the length of a data field in the experiment: single bit partition and multiple bits partition (i.e., $\Delta_j^{WS} > 1$ bit). Although Algorithm 1 illustrates the single bit partitioned segment, a quantifiable multiple bits partition is given in Table 2, partially along with single bit partitions. Here, the coverage bit-width for any bit-shift window was observed from 8 bits to 28 bits, where $\Delta_1$ to $\Delta_3$ indicate the partitioned segments that range over multiple bits. Moreover, each bit-shift window is mapped to a certain bit-shift value, which is used in this experiment to choose the length of the watermark chunk for each sensor data after verifying its bit-width. However, according to the proposed framework, a bit-shift window can take any bit-width in the range [1, 32] over an int32 data field.

Fig. 5. The data fields according to 256 bits payload length of a data packet

Table 2. Relationship between bit-shift values and bit-shift windows

Bit-shift windows   Range of values         Covered bit-width   Set bit-shift values
Δ1                  2^0 to 2^7              8                   24
Δ2                  2^7 + 1 to 2^15         16                  16
Δ3                  2^15 + 1 to 2^23        24                  8
Δ4                  2^23 + 1 to 2^24        25                  7
Δ5                  2^24 + 1 to 2^25        26                  6
Δ6                  2^25 + 1 to 2^26        27                  5
Δ7                  2^26 + 1 to 2^27        28                  4

Figures 6 and 7 both show the adaptable properties of the AEA framework. In Fig. 6, the plots demonstrate the relationship between the size (i.e., length) of the bit-shift windows and the bit-shift values for the types of partition that the AEA scheme follows. A linear relation between the measured actual bit width of each data per data field and the embedding volume can also be seen in this figure. For further investigation of the bit-shift values, we created around 10 partitioned segments. It can be noted that if any sensor value selects its bit-shift value before checking the 8th bit-shift window of the single bit partition, the length of the bit-shift value should be at least 1 bit greater than that of the first bit-shift window of the multiple bits partition. Thus, verifying the bit-width with the single bit partition always has a higher chance of embedding a larger watermark chunk.

Fig. 6. Demonstrate the relationship between the size (i.e., length) of bit-shift windows and the bit-shift values over a certain number of partitioned segments combining both single bit and multiple bits partitions

Fig. 7. Demonstrate the relation between the actual size (i.e., bit width) of data and the embedding capacity with both single bit and multiple bits partitions for a particular data set over certain number of data fields

Figure 7 illustrates the robustness as well as the adaptability of the proposed method by analyzing the size of the information embedded into the 8 data fields of a 256-bit data payload for several data lengths. Each data has been examined with both a single bit and a multiple bits partitioned window of a 32-bit data field. In line with Fig. 7, Table 3 shows the anticipated embedding volume for data of average length ≤12 bits in every data field. It has been observed that the watermark covers 62.5% of the given payload with the single bit partition, whereas it covers 50% with the multiple bits partition. Moreover, the embedding volume for data of average length ≤8 bits can consume 81.2% of the memory for the watermark with a single bit partitioned segment, which is only 6.2% greater than with the multiple bits partition. A linear relation between the various data lengths and the embedding volume can also be seen in the above figure. To obtain the possible maximum and minimum length of a secret watermark chunk, each data should satisfy the conditions defined in Eqs. (1) and (2) respectively.

Table 3. Watermark embedding capacity in 256 bits data payload with different types of partition segments

Types of partition   Size of data (avg.)   Embedding capacity (avg.)
Single bits          12 bits               62.5%
                     10 bits               68.75%
                     6 bits                81.2%
Multiple bits        12 bits               50%
                     10 bits               50%
                     6 bits                75%

avg. = average
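The percentages in Table 3 follow directly from the bit-shift values in Table 2 and the 8 x 32-bit payload. The short Python sketch below reproduces them; the function names are hypothetical, and the multiple bits lookup simply picks the first window in Table 2 whose covered bit-width is large enough, which is an assumption about how the windows are queried.

```python
FIELD_BITS = 32   # int32 data field
FIELDS = 8        # fields per 256-bit payload

# (covered bit-width, set bit-shift value) pairs taken from Table 2
MULTI_BIT_WINDOWS = [(8, 24), (16, 16), (24, 8), (25, 7),
                     (26, 6), (27, 5), (28, 4)]


def shift_single(width: int) -> int:
    """Single bit partition: every unused bit is freed for the watermark."""
    return FIELD_BITS - width


def shift_multi(width: int) -> int:
    """Multiple bits partition: use the first window that covers the data."""
    for covered, shift in MULTI_BIT_WINDOWS:
        if width <= covered:
            return shift
    return 0  # data wider than every window: nothing is embedded


def capacity_percent(shift: int) -> float:
    """Share of the payload used by the watermark when every field
    frees the same number of bits."""
    return 100.0 * shift * FIELDS / (FIELD_BITS * FIELDS)


for width in (12, 10, 6):
    print(width,
          capacity_percent(shift_single(width)),
          capacity_percent(shift_multi(width)))
```

With 12-bit data, for instance, the single bit partition frees 20 bits per field (160 of 256 bits, 62.5%), while the multiple bits partition falls into the 16-bit window and frees only 16 bits per field (50%).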
Fig. 8. Demonstrate the watermark embedding capacity into distinct length (in bits) of the data payload of a data packet
Moreover, in Fig. 8, the scalability and the adaptability of the embedding scheme have been explored based on the distinct length of the data payload. The illustration indicates that watermark embedding capacity can be enlarged by increasing the data payload and the segmented data fields of a data packet.
4.2 Complexity Analysis
Complexity in Embedding Step: Assume that an N-bit data watermark is split into n watermark chunks to be attached to n sensor data. A data source $n_i$ examines the actual bit width of each of the n data items to query the bit-shift values and generates a watermark before the embedding process. Thus, the cost of Algorithm 1 can be computed as $O((n + 1) + (n + 1)) + O(n(n + 1)) \cong O(n^2 + 3n + 2) \cong O(n^2)$.

Complexity in Retrieval Step: The SN performs the retrieval process on the n watermarked data obtained after the extraction and decryption process at the end of receiving a data packet. The proposed scheme executes the retrieval process in two key steps: recover $a\lambda_i$, and apply it to the associated $\hat{dw}_i$ to separate the data and the watermark. Thus the decoding complexity of Algorithm 2 is $O((n+1)+(n+1)+n(n+1)+n(n+1)) \cong O(2n^2 + 4n + 2) \cong O(n^2)$.
5 Discussions
Cost Analysis: To analyze the cost of the proposed scheme, the approach of adding a watermark into the fixed redundant bits [18] available among the data fields has been examined. In this methodology, a fixed length of redundant bits $r_i$ is reserved among the n data fields while n data are processed through them. This space is usually used to add blank characters or to embed other information. Thus, assuming n data fields $df_i$ in a given payload of a $pk_i$, the total storage capacity for the watermark W can be calculated as $length(W) = \sum_{i=1}^{n} r_i$, where $r_i$ is fixed, and thus the embedding capacity is limited. In this case, increasing the embedding capacity depends on either increasing the number of data fields or increasing the size of a data field.
Fig. 9. Identify the unused freed bits as a threat scope
Furthermore, it can also be observed in Fig. 9 that in the previous method the size of $r_i$ (in this case 4 bits) varies between the data fields and could be larger than the fixed length. Hence, any data field with a large $r_i$ might leave some redundant bits unused because the length for watermarking is limited, which might become a threat. An adversary can create ambiguous data by altering such unused redundant bits. In contrast, the AEA scheme does not initially fix the storage within a $df_i$; instead, it examines the bit-width of $d_i$ and, based on its length, shifts the binary sequence of $d_i$ by a certain $\lambda_i = L - b_a$ according to bit-shift windowing, and then distributes a variable-length $wc_i$ into the $\gamma_i$ freed bits of the shifted code of $d_i$. Thus, the maximum storage for the watermark under the same assumption can be calculated as $length(W) = \sum_{i=1}^{n} \gamma_i$, where $0 < \gamma_i \le L$ due to selecting a distinct $\lambda_i$ for any certain $d_i$. Thus, the AEA method does not leave any unused freed bits exposed to threat in the data field and maximizes the utilization of the data field. Table 4 evaluates both the AEA and the fixed redundant bits schemes in terms of storage volume with a 2-byte (16-bit) data field. It shows that the fixed redundant bits approach reserves 4 redundant bits for a watermark chunk and the storage volume for the sensor data is assumed to be ≤12 bits, whereas the sensor data size was 7 bits. In contrast, the AEA scheme attains 9 bits of freed space for embedding a watermark chunk with the same size of data in the data field, due to its bit-shift windowing characteristics. Assuming n = 8 data fields in the given payload with the above measurements, the fixed redundant bits approach utilizes only 25% of the data payload, whereas the AEA achieves 56.25%. Moreover, if the length of a data item were (12 + 1) bits, the AEA scheme could still attain a minimum of 3 bits for watermarking due to its adaptability.
The computational cost of the proposed method for computing the group XOR over the hashed sensor data to generate the secret watermark, together with its distribution approach, is roughly equivalent to that of the method built on fixed redundant bits. In addition, the AEA scheme performs secret key generation and watermarked data encryption. Moreover, it does not require any extra storage; rather, it maximizes the utilization of the storage available in the data fields by diagnosing them with a bit-shift windowing approach. However, the proposed framework does require a secure mechanism for distributing the arrival times of the data and the secret keys to the designated recipient.

Table 4. Cost comparison between the AEA and fixed redundant bits (FRB) approach

Data properties                       The AEA scheme   FRB scheme [18]
Size of data field                    16 bits          16 bits
Attained bits space for data          7 bits           12 bits
Available bits for watermark chunk    9 bits           4 bits
Bits on threat (left unused)          0                5 bits
6 Conclusion
In this paper, we address a novel problem: the limit imposed on the flow watermark when it is embedded with the data into the data fields, and the threat that arises from leaving freed space unused in the data fields. The proposed framework verifies the authenticity and integrity of data, besides ensuring the quality and safety of data with a tamper-proof trace of unauthorized access. It compares the bits required by the data with the length of its primitive data type and detects non-occupied bits for embedding a variable-length watermark chunk with the data in a distributed approach. The experiment was conducted by inspecting data payload utilization and embedding capacity, which indicates the scalability as well as the adaptability of this scheme. Additionally, the AEA scheme illustrates its robustness in removing the threat caused by freed space in the payload segments, which eventually increases the embedding capacity. To extend the robustness of this scheme, we expect to experiment further on randomizing the position of the segregated watermark and on optimizing the computational cost to maintain ideal energy consumption. Additionally, it opens the prospect of improving the security of data by analyzing the data type and the required bits of data in binary form while it resides in the data fields.
References

1. Maglaras, L., Ferrag, M.A., Derhab, A., Mukherjee, M., Janicke, H., Rallis, S.: Threats, Protection and Attribution of Cyber Attacks on Critical Infrastructures. arXiv preprint arXiv:1901.03899 (2019)
2. Van Schyndel, R.G., Tirkel, A.Z., Osborne, C.F.: A digital watermark. In: Proceedings of 1st International Conference on Image Processing, vol. 2, pp. 86–90. IEEE (1994)
3. Malan, D.J., Welsh, M., Smith, M.D.: A public-key infrastructure for key distribution in TinyOS based on elliptic curve cryptography. In: Proceedings of 2004 First Annual IEEE Communications Society Conference on Sensor and Ad Hoc Communications and Networks, IEEE SECON 2004, pp. 71–80. IEEE (2004)
4. Venugopalan, R., Ganesan, P., Peddabachagari, P., Dean, A., Mueller, F., Sichitiu, M.: Encryption overhead in embedded systems and sensor network nodes: modeling and analysis. In: Proceedings of the 2003 International Conference on Compilers, Architecture and Synthesis for Embedded Systems, pp. 188–197. ACM (2003)
5. Karlof, C., Sastry, N., Wagner, D.: TinySec: a link layer security architecture for wireless sensor networks. In: Proceedings of the 2nd International Conference on Embedded Networked Sensor Systems, pp. 162–175. ACM (2004)
6. Przydatek, B., Song, D., Perrig, A.: SIA: secure information aggregation in sensor networks. In: Proceedings of the 1st International Conference on Embedded Networked Sensor Systems, pp. 255–265. ACM (2003)
7. Yi, X., Paulet, R., Bertino, E.: Homomorphic Encryption and Applications, vol. 3, pp. 1–126. Springer, Heidelberg (2014)
8. Hasan, R., Sion, R., Winslett, M.: The case of the fake Picasso: preventing history forgery with secure provenance. In: 7th USENIX Conference on File and Storage Technologies (FAST 2009). USENIX Association, San Francisco, February 2009. https://www.usenix.org/conference/fast09/technical-sessions/presentation/hasan
9. Albath, J., Madria, S.: Practical algorithm for data security (PADS) in wireless sensor networks. In: Proceedings of the 6th ACM International Workshop on Data Engineering for Wireless and Mobile Access, pp. 9–16. ACM (2007)
10. Chen, M., He, Y., Lagendijk, R.L.: A fragile watermark error detection scheme for wireless video communications. IEEE Trans. Multimed. 7(2), 201–211 (2005)
11. Lu, C.-S., Liao, H.-Y.M., Chen, L.-H.: Multipurpose audio watermarking. In: Proceedings of 15th International Conference on Pattern Recognition, ICPR-2000, vol. 3, pp. 282–285. IEEE (2000)
12. Nikolaidis, N., Pitas, I.: Robust image watermarking in the spatial domain. Sig. Process. 66(3), 385–403 (1998)
13. Sion, R., Atallah, M., Prabhakar, S.: Resilient rights protection for sensor streams. In: Proceedings of the Thirtieth International Conference on Very Large Data Bases - Volume 30, VLDB Endowment, pp. 732–743 (2004)
14. Zhang, G., Kou, L., Zhang, L., Liu, C., Da, Q., Sun, J.: A new digital watermarking method for data integrity protection in the perception layer of IoT. Secur. Commun. Netw. 2017, 12 (2017)
15. Fei, C., Kundur, D., Kwong, R.H.: Analysis and design of secure watermark-based authentication systems. IEEE Trans. Inf. Forensics Secur. 1(1), 43–55 (2006)
16. Liang, X., Shetty, S., Tosh, D., Kamhoua, C., Kwiat, K., Njilla, L.: ProvChain: a blockchain-based data provenance architecture in cloud environment with enhanced privacy and availability. In: Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp. 468–477. IEEE Press (2017)
17. Sultana, S., Shehab, M., Bertino, E.: Secure provenance transmission for streaming data. IEEE Trans. Knowl. Data Eng. 25(8), 1890–1903 (2013)
18. Sun, X., Su, J., Wang, B., Liu, Q.: Digital watermarking method for data integrity protection in wireless sensor networks. Int. J. Secur. Appl. 7(4), 407–416 (2013)
19. Cox, I.J., Miller, M.L.: Electronic watermarking: the first 50 years. In: Proceedings of 2001 IEEE Fourth Workshop on Multimedia Signal Processing (Cat. No. 01TH8564), pp. 225–230. IEEE (2001)
20. Guo, H., Li, Y., Jajodia, S.: Chaining watermarks for detecting malicious modifications to streaming data. Inf. Sci. 177(1), 281–298 (2007)
21. Ren, Y., Shen, J., Wang, J., Xu, J., Fang, L.: Security data auditing based on multifunction digital watermark for multimedia file in cloud storage. Int. J. Multimed. Ubiquit. Eng. 9(9), 231–240 (2014)
22. Zhang, W., Liu, Y., Das, S.K., De, P.: Secure data aggregation in wireless sensor networks: a watermark based authentication supportive approach. Pervasive Mob. Comput. 4(5), 658–680 (2008)
23. Wang, B., Sun, X., Ruan, Z., Ren, H.: Multi-mark: multiple watermarking method for privacy data protection in wireless sensor networks. Inf. Technol. J. 10(4), 833–840 (2011)
24. Wang, B., Kong, W., Li, W., Xiong, N.N.: A dual-chaining watermark scheme for data integrity protection in Internet of Things. CMC-Comput. Mater. Continua 58(3), 679–695 (2019)
25. Collins, J., Agaian, S.: Trends toward real-time network data steganography. arXiv preprint arXiv:1604.02778 (2016)
26. Houmansadr, A., Kiyavash, N., Borisov, N.: Multi-flow attack resistant watermarks for network flows. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 1497–1500 (2009)
27. Wendzel, S., Zander, S., Fechner, B., Herdin, C.: Pattern-based survey and categorization of network covert channel techniques. In: ACM Computing Surveys (CSUR), vol. 47, no. 3, p. 50. ACM (2015)
28. Wang, X., Chen, S., Jajodia, S.: Network flow watermarking attack on low-latency anonymous communication systems. In: Proceedings of 2007 IEEE Symposium on Security and Privacy, SP 2007, pp. 116–130. IEEE (2007)
29. Sarkar, T., Sanyal, S.: Steganalysis: detecting LSB steganographic techniques. arXiv preprint arXiv:1405.5119 (2014)
30. An, L., Gao, X., Li, X., Tao, D., Deng, C., Li, J.: Robust reversible watermarking via clustering and enhanced pixel-wise masking. IEEE Trans. Image Process. 21(8), 3598–3611 (2012)
31. Thakur, H., Singh, G.: A privacy and ownership protection digital data techniques: comparison and survey. Indian J. Sci. Technol. 9(47), 1–12 (2016). https://doi.org/10.17485/ijst/2016/v9i47/106904
32. Hall, S.R., Allen, F.H., Brown, I.D.: The crystallographic information file (CIF): a new standard archive file for crystallography. In: Acta Crystallographica, pp. 185–206 (1991)
33. Mishra, S., Dastidar, A.: Hybrid image encryption and decryption using cryptography and watermarking technique for high security applications. In: 2018 International Conference on Current Trends towards Converging Technologies (ICCTCT), pp. 1–5. IEEE (2018)
34. Jizhi, W., Shujiang, X., Min, T., Yinglong, W.: The analysis for a chaos-based one-way hash algorithm. In: Proceedings of 2010 International Conference on Electrical and Control Engineering, pp. 4790–4793. IEEE (2010)
35. Huo, F., Gong, G.: XOR encryption versus phase encryption, an in-depth analysis. IEEE Trans. Electromagn. Compat. 57(4), 903–911 (2015)
36. Protopopescu, V.A., Santoro, R.T., Tolliver, J.S.: Fast and secure encryption-decryption method based on chaotic dynamics. U.S. Patent 5479513A, 26 December 1995
Intelligent Monitoring System of Environmental Biovariables in Poultry Farms

Gabriela Chiluisa-Velasco, Johana Lagla-Quinaluisa, David Rivas-Lalaleo, and Marcelo Alvarez-Veintimilla

Carrera Ingenieria Electrónica e Instrumentación, Universidad de las Fuerzas Armadas ESPE, Sangolqui, Ecuador
{gbchiluisa1,jelagla1,drrivas,rmalvarez}@espe.edu.ec
Abstract. Modern technologies in poultry farming overcome many limitations of traditional methods; they help to reduce labor costs and increase productivity. In Ecuador, most poultry farms have only modest systems for monitoring variations in temperature, humidity and gas concentrations. These variations are caused by the generation of chicken manure in closed environments, which creates a stressful atmosphere that affects the health of broilers during the breeding stage and leads to a palpable loss of money and productivity. This project presents an intelligent monitoring system composed of a star-type sensor network for monitoring environmental variables in the poultry farm. Long Range Technology (LoRa) is used for the system communication; information is gathered and stored in a cloud database and then processed to be visualized through historical trends and alarm reports, creating an affordable and easily interpretable solution.

Keywords: Sensor network · LoRa · Poultry farms · Environmental parameters · Machine learning

1 Introduction
The Food and Agriculture Organization reports that the number of undernourished people in the world rose over the three years up to 2017, reaching 821 million [1], which underlines the growing demand for the production and consumption of proteins and other foods [2]. Consequently, poultry meat has become one of the proteins most consumed by humans, because it is a nutrient-dense, high-protein, low-saturated-fat and low-cholesterol food [3]. In Ecuador, to satisfy local demand, the commercialization of broilers has grown significantly in the last four years according to the Latin American Poultry Association (ILP) [4]. A clear example of this agroindustry is the group of poultry farms located in Cotopaxi province, with around 10,000 birds per farm, raised to produce eggs and meat. This production volume has created the need to measure and monitor relevant variables inside the farm for the correct growth and development of the birds [5].

© Springer Nature Switzerland AG 2021. K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 386–399, 2021. https://doi.org/10.1007/978-3-030-55190-2_29
Normal growth of broilers depends on how the farm is designed, taking into consideration environmental parameters and conditions [6] such as humidity, temperature and gases, which are known as environmental biovariables. Together with the right amount of food, these conditions result in healthy broilers with ideal weight. These factors are crucial to increase production [7,8]. Poor air quality inside the chicken coop results from the manure produced by the chickens, which releases volatile organic compounds (VOCs) into the atmosphere [9,10]. This promotes the propagation and transmission of pathogenic agents and creates a stressful environment that affects the birds' health during their breeding stage. Consequently, the mortality rate increases [11]. The high cost of implementing a monitoring system capable of recording variations in temperature, humidity and gases is considered a limitation [12–15]. Modern technologies overcome these limitations of traditional methods, reducing labor costs and increasing productivity [16,17]. The monitoring and control of variables in agro-industries has improved thanks to the implementation of point-to-point protocols using technologies such as General Packet Radio Service (GPRS) and the well-known Internet of Things (IoT) [18], supported by microcontrollers and Programmable Logic Controllers (PLC) at distances no longer than 100 m [19]. The poultry sector, which is considered one of the neediest sectors, currently makes use of common wireless communication systems such as Bluetooth, GPRS and Global System for Mobile Communications (GSM) together with the IoT [20,21], with a range of up to 200 m, to monitor the different conditions of the birds and reduce losses in the production of meat and eggs [22,23]. For sizeable installations with a larger production volume and distances longer than 200 m, some of these solutions might not be enough.

Some technologies have provided solutions for long distances, which allows the monitoring of many systems to be extended. This work offers a system for long-range monitoring of environmental biovariables inside the chicken coop, using a star-type sensor network based on Long Range Technology (LoRa) for wireless communication between the nodes. The collected information is stored in a cloud database and processed by intelligent algorithms, to be visualized through historical trends and through event and predictive alarms. With this intelligent monitoring system, it is expected to measure environmental biovariables in poultry farms with an error of less than 2% with respect to standard equipment, with a coverage distance of around 300 m. This article is organized as follows: Sect. 2 presents the methodology used to develop the proposed system. In Sect. 3 the experimentation is carried out and the results of the implemented system are presented. Section 4 presents conclusions. Finally, Sect. 5 presents future work.
2 Methodology
Poultry farms are located in areas with sufficient amount of water to meet the needs of broilers in addition to respecting the distances enforced by authorities in relation to other farms, urban centers, swampy areas, garbage dumps, wetlands and lakes. The infrastructure and area must have perimeter enclosure in
G. Chiluisa-Velasco et al.
order to maintain isolation, preventing and controlling the access of people and animals from outside the farm. The roofs are made of insulating material, which protects broilers from cold, sun and rain. The floor is made of smooth concrete to facilitate cleaning, disinfection and total hygiene of the surface. Welded wire mesh is used in the windows to keep out birds and predators. The wall design depends on the region where the farm is located; in the Ecuadorian highlands, walls must be 3 m tall to protect broilers from air flow, according to the good practice handbook for poultry farms published in 2016 by the Ministry of Agriculture, Livestock, Aquaculture and Fisheries (MAGAP) [24]. The correct growth and development of broilers are important factors for the productivity and profitability of poultry farms. Variables such as temperature, humidity and gas concentration should be monitored 24 h a day, 7 days a week, allowing operations staff to make decisions based on indicators and historical trends. This work proposes a system that monitors environmental biovariables, treating those inside the chicken coop as the most important. The system has several stages. In data acquisition, the biovariable values are collected by sensor nodes that form a network. The communication system is supported by LoRa, which handles wireless communication between the sensor nodes and sends all the information through a physical gateway to a microcomputer to create a cloud database. This database is processed with intelligent algorithms that include filters and classifiers. Data must be displayed in scales of temperature, humidity and gases respectively. The system has a Human Machine Interface (HMI) for both remote and local access. The remote HMI is available on a web server and the local HMI is installed in the chicken coop.
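As an illustration of how an event alarm could be derived from the monitored values, the sketch below compares a reading with the temperature and humidity ranges listed in Table 1. It is not the authors' implementation: the function names and message format are hypothetical, and gas limits are left out because they are not fully legible in this text.

```python
# Ranges transcribed from Table 1 (temperature in °C, relative humidity in %).
TEMPERATURE_RANGES = [
    (range(1, 3), (32.0, 34.0)),  # weeks 1-2
    (range(3, 5), (26.0, 30.0)),  # weeks 3-4
    (range(5, 8), (18.0, 24.0)),  # weeks 5-7
]
HUMIDITY_RANGE = (50.0, 70.0)     # weeks 1-7


def temperature_range(week: int) -> tuple:
    """Return the (low, high) temperature limits for a given week of growth."""
    for weeks, limits in TEMPERATURE_RANGES:
        if week in weeks:
            return limits
    raise ValueError("Table 1 only covers weeks 1 to 7")


def check_reading(week: int, temperature_c: float, humidity_pct: float) -> list:
    """Return one alarm message per variable that falls outside its range."""
    alarms = []
    low, high = temperature_range(week)
    if not low <= temperature_c <= high:
        alarms.append(f"temperature {temperature_c} °C outside {low}-{high} °C")
    low, high = HUMIDITY_RANGE
    if not low <= humidity_pct <= high:
        alarms.append(f"humidity {humidity_pct}% outside {low}-{high}%")
    return alarms


# Hypothetical reading in week 6: too warm and too humid, so two alarms are raised.
alarms = check_reading(week=6, temperature_c=27.0, humidity_pct=75.0)
```

Keeping the limits in a small lookup table means the same check can be reused for every node and updated if the recommended ranges change.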
Data is visualized through stored historical trends and alarm reports that the operator can access, as shown in Fig. 1.
Data Acquisition: According to the temperature, humidity and gas concentration levels indicated in Table 1, the MQ135 sensor is selected for CO2 concentration (accuracy 2 ppm, resolution 0.1 ppm) and NH3 (precision 1.5 ppm, resolution 0.1 ppm); the DHT22 is a digital temperature and humidity sensor with a precision of 0.5 °C and a resolution of 0.1 °C for temperature, and a precision of ±2% RH and a resolution of 0.1% RH for humidity.
Network Topology: Because of the architecture used by the LoRa protocol, a star-type topology is used: the first star is made up of the sensor nodes along with a gateway, and the second star is made up of the gateway and a network server that avoids information traffic. New nodes are easy to add, and if any node fails, the network continues to work normally.
Communication System: It consists of a LoRa module for IoT networks, with a frequency of 433 MHz, power up to 600 dmips and −148 dBm sensitivity,
Intelligent Monitoring System of Biovariables
Fig. 1. Network diagram. Data acquisition: composed of sensor nodes; Communication System: LoRa wireless communication that works together with the server network called The Things Network; Storage System: Influx cloud database; Processing: filters to reduce error and classifiers used for machine learning; Visualization: Human Machine Interface (HMI).
Table 1. Biovariable parameters for broiler growth from 1 to 7 weeks.

| Variable    | Levels         | Weeks |
|-------------|----------------|-------|
| Temperature | 32 °C to 34 °C | 1–2   |
|             | 26 °C to 30 °C | 3–4   |
|             | 18 °C to 24 °C | 5–7   |
| Humidity    | 50% to 70%     | 1–7   |
CO2 Gas
0 Ns where Ns is the number of support vectors.
On Mistakes We Made in Prior Computational Psychiatry Data
To determine the parameters of the SVM, we utilize a sequential minimal optimization algorithm [27]. The regularization coefficient is set to 1.
Decision Trees
Decision trees [28], like SVMs, partition the feature space into regions corresponding to classes. However, the regions are not necessarily connected, and the decision boundaries are piecewise linear functions. The technique associates the whole feature space with the root node of a tree and, in the case of the C4.5 algorithm utilized in this study, recursively partitions the space by choosing the feature that provides the highest information gain. The partitioning stops when the minimal number Nmin of samples per node of a decision tree is reached (we utilize Nmin = 2). In the pruning phase, based on the estimate of the classification error (using a confidence level here set to 0.25), the complexity of the model may be reduced and its generalization capacity thus improved [29].
Random Forests
The random forest classifier utilizes an ensemble of unpruned trees [30]. Classification is performed by combining the classes predicted by the ensemble members. Unlike the C4.5 algorithm, in each node of a tree only a random subset of the features is considered for partitioning. In this study we utilize ensembles with 100 members and consider int(log2 k) + 1 random features for each split, where k is the number of features.
Evaluation of Classifier's Performance
The classification accuracy is evaluated through a cross-validation procedure: the dataset is split into K subsets of approximately equal size; K − 1 subsets are used to determine a classification model and the remaining subset is used to evaluate the classifier. This procedure is repeated K times so that the classifier is evaluated on each subset. In this study, we used K = 10. The classification performance is assessed through overall accuracy (the percentage of correctly classified samples) and the area under the ROC curve (AUC). The overall accuracy of useful classifiers in two-class problems ranges from 50% to 100%.
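The evaluation procedure described above can be sketched in plain Python. This is only an illustration under stated assumptions: the study used Weka, whereas the class-mean classifier, the toy data, and the choice to pool held-out scores across folds before computing AUC are all assumptions made here. AUC is computed through its rank interpretation, the fraction of (patient, control) pairs that are ordered correctly.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and deal them into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def fit_class_means(xs, ys):
    """Stand-in classifier (not one from the paper): the mean of one feature per class."""
    mean0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    mean1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    return mean0, mean1


def decision_score(model, x):
    """Positive when x lies closer to the class-1 mean than to the class-0 mean."""
    mean0, mean1 = model
    return abs(x - mean0) - abs(x - mean1)


def cross_validate(xs, ys, k=10, seed=0):
    """Train on K - 1 folds, score the held-out fold, and repeat for every fold."""
    scores = [0.0] * len(xs)
    for test_fold in k_fold_indices(len(xs), k, seed):
        held_out = set(test_fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit_class_means([xs[i] for i in train], [ys[i] for i in train])
        for i in test_fold:
            scores[i] = decision_score(model, xs[i])
    predictions = [1 if s > 0 else 0 for s in scores]
    accuracy = sum(p == y for p, y in zip(predictions, ys)) / len(ys)
    return accuracy, auc(scores, ys)


# Hypothetical, well-separated toy data: 10 controls (label 0) and 10 patients (label 1).
features = [i / 10 for i in range(10)] + [2 + i / 10 for i in range(10)]
labels = [0] * 10 + [1] * 10
accuracy, area = cross_validate(features, labels, k=10)
```

Because the model is refitted inside every fold, no held-out sample influences the parameters used to score it, which is the property the K-fold procedure is meant to guarantee.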
The ROC curve is created by plotting the true positive rate (the proportion of samples with depression that are detected as such) against the false positive rate (the proportion of controls incorrectly detected as having depression). AUC ranges from 0.500 (for a classifier that randomly guesses a class) to 1.000 (for an ideal classifier).
Feature Extraction
To reduce the dimensionality of the feature set and decorrelate the features, we utilize PCA [31] to obtain principal components (PCs). PCA centers the original features by subtracting the sample means and calculates the sample covariance matrix of the centered data. The eigenvalue decomposition of the sample covariance matrix yields eigenvectors that are used to linearly project the original feature vectors x into the uncorrelated transformed feature vectors z = [z1,…,zk]. We utilize m < k features z1,…,zm corresponding
M. Čukić et al.
to the m largest eigenvalues of the sample covariance matrix. We define the percentage of the variance explained by the first m PCs as

\[ \text{Explained Variance} = \frac{\sum_{i=1}^{m} \hat{\sigma}^2_{z_i}}{\sum_{i=1}^{k} \hat{\sigma}^2_{x_i}} \cdot 100\% \tag{9} \]

where $\hat{\sigma}^2$ denotes the sample variance of the corresponding variable.
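Equation (9) can be illustrated with a small worked example. The sketch below assumes only two features (k = 2), so the eigenvectors of the 2 × 2 sample covariance matrix can be written in closed form as a rotation; the data and function names are hypothetical and the code is not taken from the study.

```python
import math


def sample_variance(values):
    """Unbiased sample variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def sample_covariance(a, b):
    """Unbiased sample covariance of two equally long sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (len(a) - 1)


def principal_components_2d(x1, x2):
    """Project two centered features onto the eigenvectors of their covariance matrix."""
    # Angle of the major axis: tan(2 * angle) = 2 * cov / (var1 - var2).
    angle = 0.5 * math.atan2(2 * sample_covariance(x1, x2),
                             sample_variance(x1) - sample_variance(x2))
    c, s = math.cos(angle), math.sin(angle)
    m1, m2 = sum(x1) / len(x1), sum(x2) / len(x2)
    z1 = [(u - m1) * c + (v - m2) * s for u, v in zip(x1, x2)]
    z2 = [-(u - m1) * s + (v - m2) * c for u, v in zip(x1, x2)]
    return [z1, z2]  # z1 carries the larger variance


def explained_variance_percent(z, x, m):
    """Eq. (9): variance of the first m PCs over the total variance of the features."""
    numerator = sum(sample_variance(zi) for zi in z[:m])
    denominator = sum(sample_variance(xi) for xi in x)
    return numerator / denominator * 100.0


# Hypothetical feature values for five recordings.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]
z = principal_components_2d(x1, x2)
first_pc = explained_variance_percent(z, [x1, x2], m=1)  # about 90.0
```

Here both features have variance 2.5 and covariance 2, so the eigenvalues are 4.5 and 0.5 and the first PC explains 4.5 / 5 = 90% of the variance; using both PCs always gives 100%.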
3 Results
Both HFD and SampEn showed a significant difference between patients with depression and healthy controls. Since this paper is focused on classification results, we do not report those differences here (they belong to the domain of physiological complexity). After decorrelating HFD and SampEn, we used them as features for classification, and we also used PCA to decrease the dimensionality of the problem. Table 1 shows the classification results of different classifiers that use various numbers of principal components. The principal components were computed on a dataset containing HFD and SampEn features and were normalized to have zero mean and unit standard deviation. The variance of the features explained by the corresponding principal components is also shown.
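The normalization to zero mean and unit standard deviation mentioned above amounts to a z-score transformation. A minimal sketch follows, assuming the sample standard deviation (division by n − 1) is meant; the text does not say which estimator was used, and the input values are hypothetical.

```python
import math


def standardize(values):
    """Rescale values to zero mean and unit sample standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))
    if std == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(v - mean) / std for v in values]


# Hypothetical principal component scores.
scaled = standardize([2.0, 4.0, 6.0])  # [-1.0, 0.0, 1.0]
```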
4 Discussion
The major finding of our experiment was that the extraction of non-linear features linked to the complexity of resting-state EEG signals can lead to high separation between signals recorded from patients diagnosed with depression and healthy controls. In line with previous findings [32, 33] about measures used for the characterization of EEG, HFD showed increased complexity of EEG recorded from patients with depression in comparison to healthy controls. We showed that SampEn also effectively discriminates these two groups of EEG signals. Our aim in applying different methods for distinguishing depressed from healthy subjects was not to improve the accuracy of classifiers but to show that with sufficiently good features (i.e., nonlinear measures) a good separation is possible. Further, by using PCA, we strove to demonstrate that such measures may be dependent on each other; hence, good classification results are possible with a small number of extracted principal components. We therefore demonstrated separability of the data with all methods utilized. Note that we examined classification methods with a range of underlying paradigms and complexity; the methods belong to statistics and machine learning. Even the simplest methods, such as logistic regression (widely accepted in the medical community although not a classification method in the strict sense), provided excellent classification accuracy. Proper feature extraction is therefore of utmost importance. Our results show that using only the first principal component, it is possible to achieve a classification accuracy of up to 95.12% (Table 1). The best performance was achieved using the Naïve Bayes method (in this case, when only one feature is utilized,
Table 1. Classification results for different classifiers and different number of principal components of the features from SampEn and HFD sets (Čukić et al. [13])

| Number of principal components | 1 | | 2 | | 3 | | 10 | |
|---|---|---|---|---|---|---|---|---|
| Explained variance | 87.53% | | 94.25% | | 95.54% | | 98.86% | |
| Classifier | Accuracy | AUC (b) | Accuracy | AUC | Accuracy | AUC | Accuracy | AUC |
| Multilayer perceptron | 92.68% | 0.983 | 92.68% | 0.943 | 92.68% | 0.950 | 95.12% | 0.994 |
| Logistic regression | 90.24% | 0.981 | 90.24% | 0.950 | 90.24% | 0.945 | 95.12% | 0.929 |
| SVM (a) with linear kernel | 85.37% | 0.857 | 82.92% | 0.833 | 85.37% | 0.857 | 90.24% | 0.905 |
| SVM with polynomial kernel (p = 2) | 73.17% | 0.738 | 68.29% | 0.690 | 85.37% | 0.857 | 95.12% | 0.952 |
| Decision tree | 92.68% | 0.894 | 92.68% | 0.894 | 90.24% | 0.933 | 90.24% | 0.933 |
| Random forest | 87.80% | 0.981 | 87.80% | 0.981 | 95.12% | 0.987 | 95.12% | 0.996 |
| Naïve Bayes | 95.12% | 0.983 | 97.56% | 0.981 | 95.12% | 0.988 | 95.12% | 0.988 |
| Average accuracy | 88.15% | | 87.45% | | 90.59% | | 93.73% | |

(a) SVM: Support Vector Machines. (b) AUC: area under the curve (related to Receiver Operating Characteristic, ROC, curves).
the assumption of feature independence is automatically satisfied). The classification accuracy generally increases with the number of principal components used at classifiers’ inputs (the average accuracy of all classifiers is 88.15% with 1 and 93.73% with 10 principal components used). Since the data are close to linearly separable, SVM with linear kernel resulted in relatively high accuracy (85.37%) when only one PC is used. The accuracy further increased to 90.24% with 10 PCs. This can be in part explained by Cover’s theorem that data in higher dimensional spaces tend to be more likely linearly separable. It is interesting to observe that with a small number of PCs, linear SVM outperformed SVMs with the polynomial kernel; our hypothesis is that in such cases the inherent non-linearity of polynomial SVMs cannot be fully exploited. Also, this is an indication that the choice of the kernel for SVM may play a significant role in the classification performance of the model. The random forest method benefited from a larger number of utilized principal components, since the method is based on randomly
choosing one from a set of available features to split a decision tree node (when the number of PCs used is small, the set of available features is small). It is important to remember that the goal of our study was not to evaluate the optimal classification accuracy of the discussed classifiers, but to demonstrate the discriminative power of non-linear features. Hence, we utilized the default parameters of the classification algorithms from the Weka software and did not try to optimize them [34]. The generalization properties of the classifiers were measured using a standard 10-fold cross-validation technique. Since this was only a pilot study, the results need to be repeated on a larger data set prior to making a final conclusion about class separability and the potential application of the classification techniques for diagnostic purposes. When we compare our methodology with other publications, the majority of previously published work was done on small to modest samples, our study included (although we stated it was a pilot study). Ahmadlou performed a classification task with the idea of comparing two algorithms for calculating the fractal dimension, Higuchi's and Katz's [32]. Esteller [16] showed that Katz's fractal dimension (KFD) is more robust to noise than Petrosian's and Higuchi's algorithms, but others showed [35, 36] that KFD depends on the sampling frequency, amplitude and waveform, which is a disadvantage in analyzing biosignals. The disadvantage of HFD is, according to Esteller [16], that it is more sensitive to noise than KFD. The sample in Ahmadlou's study comprised 12 non-medicated MDD patients and 12 healthy controls. Subsequent studies had bigger but still modest samples [37–40]. Puthankattil had 30 patients with depression and 30 controls, 60 in total, as did Faust and Bairy, and Acharya opted to use just 15 + 15. Hosseinifard [41] analyzed a sample of 90 persons (45 D + 45 HC), and Bachmann analyzed 33 D + 30 HC [42].
Liao and colleagues had the same sample size as Ahmadlou [43]. From data mining we know that models can be truly tested only on bigger samples; small samples lead to so-called ‘unwarranted optimism’ or ‘inflated optimism’ (accuracy too high to be true), quite apart from the widespread practice in published papers of not performing proper validation. Among the studies we mentioned, only Hosseinifard succeeded in having non-medicated patients diagnosed with depression, and the Bachmann group chose to analyze a sample consisting of female participants only. As for sampling frequency, in contrast to Hosseinifard and our own research (1 kHz), all other researchers used a sampling rate of 256 Hz for the raw EEG. Also, Hosseinifard was the only one (along with our team) to analyze the traces from all 19 electrodes, while others opted to use just 1 (Bachmann), 4 (Puthankattil, Faust, Acharya, Bairy), or 7 (Ahmadlou), claiming that this would be enough for detection and that clinicians would most probably prefer single-electrode detection (Bachmann). From our results it is clear that all the electrodes contribute to the overall result, as shown in Fig. 1. Therefore, we strongly recommend that researchers analyze all the traces they have. They can deal with dimensionality later, but at this stage we think it is important to include all the electrodes. The issue with sub-bands is that, although they have been in use for a very long time, no one has yet confirmed their physiological meaning [7]; why not use the broad-band signal? It is also worth mentioning that in classical sub-banding the higher frequency content is totally abandoned and does not contribute to the analysis. From the above-mentioned studies
Fig. 1. Absolute values of principal component loads for the first 10 principal components. Each row indicates the coefficients multiplying the corresponding non-linear feature in order to generate a principal component. From Čukić et al. [13].
performing the depression classification task, that was part of the analysis only in our group's work and in Hosseinifard's. We based our decision on previous work of Goldberger, Peng, Lipsitz, Pincus and others, who repeatedly showed that the non-filtered signal is the most information-rich when performing any kind of nonlinear analysis [6, 18]. The complex intrinsic dynamics of an electrophysiological signal can be destroyed by too much preprocessing, and that is why we think this is an important topic. It is also interesting how researchers handle feature extraction and feature selection; in our case HFD and SampEn were the former and PCA the latter. While feature extraction means creating the features, the task of feature selection is to remove features that are irrelevant or redundant. In the case of the Ahmadlou group, the features were fractal dimensions calculated by two distinct algorithms, and ANOVA was used to select those which were relevant, meaning able to differentiate between groups [11]. Puthankattil and team extracted 12 features from a prior eight-level multiresolution decomposition by discrete wavelet transform to create the feature, i.e. wavelet entropy [37], while Faust used five different entropies as features and Student's t-test to evaluate them [38]. Hosseinifard and colleagues used spectral power together with HFD, correlation dimension and largest Lyapunov exponent (LLE) as features for EEG [41]. Acharya applied 15 different measures, both spectral and nonlinear, for feature extraction: fractal dimension (Higuchi fractal dimension, HFD), largest Lyapunov exponent (LLE), sample entropy (SampEn), DFA, Hurst's exponent (H), higher order spectra features (weighted centre of bispectrum, W_Bx, W_By), bispectrum phase entropy (EntPh), normalized bispectral entropy (Ent1) and normalized bispectral squared entropies (Ent2, Ent3), and recurrence quantification analysis parameters (determinism (DET), entropy
(ENTR), laminarity (LAM) and recurrence times (T2)). They were all ranked using the t value. After that, the authors formulated a Depression Diagnosis Index taking into account only LAM, W_By and SampEn [39]. Hosseinifard used a genetic algorithm (GA) for feature selection. Bachmann used the spectral measure SASI, but also calculated Higuchi fractal dimension (HFD), detrended fluctuation analysis (DFA) and Lempel-Ziv complexity as features [42]. In their paper, Liao [43] proposed a method based on scalp EEG and robust spectral-spatial EEG feature extraction based on the kernel eigen-filter-bank common spatial pattern (KEFB-CSP). They first filter the multi-channel EEG signals (traces from 30 electrodes) of each sub-band from the original sensor space to a new space where the new signals (i.e., CSPs) are optimal for the classification between MDD and healthy controls, and finally apply kernel principal component analysis (kernel PCA) to transform the vector containing the CSPs from all frequency sub-bands into a lower-dimensional feature vector called KEFB-CSP. Some of the studies we mentioned have very low reproducibility; for example, Bairy did not even mention what algorithm they used for calculating the fractal dimension [40]. Ahmadlou used Enhanced Probabilistic Neural Networks (EPNN), Puthankattil used an artificial feedforward neural network, and Hosseinifard used K-nearest neighbors (KNN), Linear Discriminant Analysis (LDA) and Logistic Regression (LR) classifiers. Faust used Gaussian Mixture Model (GMM), Decision Trees (DT), K-nearest neighbors (KNN), Naïve Bayes Classifier (NBC), Probabilistic Neural Networks (PNN), Fuzzy Sugeno Classifier (FSC) and Support Vector Machines (SVM), while Acharya used SVM only. It is notable that many researchers, like Acharya [39], did not even report the method of validation. Bachmann (2018) used logistic regression with leave-one-out cross-validation. Ten-fold cross-validation was also used in our work.
Based on HFD (which outperformed KFD), the Ahmadlou group obtained a high accuracy of 91.3%. Puthankattil's artificial neural networks reached an accuracy of 98.11% (normal and depression signals). Hosseinifard reported the best classification accuracy in the alpha band for LDA and LR, both reaching 73.3% (the worst were KNN in delta and beta and LDA in delta, with 66.6%). The best accuracy in the experiment was obtained by the LR and LDA classifiers. Their conclusion was that ‘nonlinear features give much better results in the classification of depressed patients and normal subjects’, in contrast to the classical ones. They also concluded that depression patients and controls differ in the alpha band more than in other bands, especially in the left hemisphere [41]. The accuracy was 99.5%. Acharya obtained an accuracy higher than 98%. Faust [38] applied ten-fold stratified cross-validation; the accuracy was 99.5%, sensitivity 99.2%, and specificity 99.7%. Contrary to Hosseinifard, they claim that the EEG signals from the right part of the brain discriminate the depressive persons better [38]. Bairy reported an accuracy of 93.8% [40]. We cannot say whether internal or external validation was performed. Liao [43] achieved 80% accuracy with their KEFB-CSP, while Bachmann [42] obtained a maximal accuracy of 85% with HFD and DFA, and also with HFD and LZC, and a maximum of 77% with only one nonlinear measure. Average accuracy among classifiers obtained in our work ranged from 90.24% to 97.56%. Among the two measures, SampEn had better performance.
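Accuracy, sensitivity and specificity, as quoted for several of the studies above, all follow from the four cells of a confusion matrix. The sketch below uses hypothetical counts (30 patients and 30 controls); it does not reproduce the results of any cited study.

```python
def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Accuracy, sensitivity and specificity from confusion-matrix counts.

    tp / fn: patients detected / missed; tn / fp: controls cleared / falsely flagged.
    """
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # 1 - false positive rate
    }


# Hypothetical outcome: 28 of 30 patients detected, 29 of 30 controls cleared.
metrics = classification_metrics(tp=28, fn=2, tn=29, fp=1)
```

Reporting all three values matters because a single accuracy figure can hide an imbalance between missed patients and falsely flagged controls.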
Our comparison shows that although the above-mentioned publications used various combinations of features and machine learning models, they overall reached high accuracy in classifying depressive and healthy participants based on their resting-state EEG. Although their direct comparison is challenging, the common denominator of all the presented studies is the sequence of methodological steps inevitable in this kind of research, in which certain features previously shown to be characteristic of depression were used to feed classifiers. It is possible, based on the same nonlinear measures calculated from the resting-state EEG, to differentiate between episode and remission of depression [14]. Predicting clinical outcomes or relapses (for example, after incomplete remission in recurrent depression) would be of great significance in clinical psychiatry. A group of authors elucidated the risks and pitfalls and recommended techniques for improving model reliability and validity in future research [44–46]. All of these authors noted that neuroimaging researchers who start to develop such predictive models are typically unaware of some considerations needed to accurately assess model performance and avoid inflated predictions (called ‘optimism’) [44]. The common characteristics of that kind of research are: classification accuracy is typically 80–90% overall; the sample size is typically small to modest (fewer than 50–100 participants); and the samples are usually gathered at a single site. Support vector machines (SVM) and their variants are popular, but the use of embedded regularization frameworks is recommended, at least the least absolute shrinkage and selection operator (LASSO) [46]. Leave-one-out and k-fold cross-validation are also popular procedures for model evaluation, and the generalization capability of a model is typically untested on an independent sample [46]. For model evaluation, the Vapnik-Chervonenkis dimension should be used [29].
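To give a feel for what an embedded regularization such as LASSO does, the sketch below applies the soft-thresholding operator. It relies on a textbook special case, stated here as an assumption: for an orthonormal design matrix and the objective ½‖y − Xb‖² + λ‖b‖₁, each LASSO coefficient equals the least-squares coefficient shrunk toward zero by λ. The coefficient values are hypothetical.

```python
def soft_threshold(value: float, penalty: float) -> float:
    """Shrink a coefficient toward zero by `penalty`; small coefficients become exactly 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


# Hypothetical least-squares coefficients for four predictors.
least_squares = [2.0, -0.3, 0.8, -1.5]
lasso = [soft_threshold(c, penalty=0.5) for c in least_squares]
```

The second coefficient is set exactly to zero, which is how the penalty removes weak predictors from the model and thereby limits its complexity.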
A common denominator of the majority of studies is a lack of external validation, or even contamination between training and testing samples. For the sake of methodology, we mention here a study that demonstrated impeccable machine learning methodology in every aspect, in the task of predicting responders to medication in MDD [47]. Their algorithms were all written in R. Generalization is, as we know, the ability of a model that was trained on one dataset to predict patterns in another dataset. When we test generalizability, we are basically testing whether or not a classification is effective in an independent population. When developing the model, one should be aware of nuisance variables. For example, nonlinear measures can differ because some of them change with age [6] or are characteristic of gender [48]. It can turn out that the algorithm actually learns to recognize that particular dataset with all its characteristics. Overfitting happens when ‘a developed model perfectly describes the entire aspects of the training data (including all underlying relationships and associated noise), resulting in fitting error to asymptotically become zero’ [46]. For that reason, the model will be unable to predict what we want on unseen (test) data. We must collect more data or establish collaborative projects in which data can be gathered in numbers that are not achievable for a single site; a certain standardization of the protocol and a decision to share can improve the whole endeavor greatly. Examples of great collaborative projects are RDoC, STAR*D, IMAGEN, etc. Also, co-recording with fMRI and MEG could be a solution [45]. Another line of research is developing
wireless EEG caps (Epoch, ENOBIO Neuroelectrics, iMotions, just to mention some) which can be used for research in the environment without restraining the patient, and even for monitoring recovery from severe episodes. Large-scale imaging campaigns and the collection of general population data are the prerequisite for translating those research findings to clinics. The first project to implement 4P (Prediction, Prevention, Personalization and Participation) is the SWELL project, part of the Dutch national ICT program COMMIT (between 2011 and 2016 in the Netherlands, Leiden University). Many clinical centers (like AMC, Amsterdam) are striving to anonymize their data and make it available, since they are aware of this problem. Whelan and Garavan stated that (an unwarranted) ‘optimism increases as a function of the decreasing number of participants and the increasing number of predictor variables in the model (model appears better as sample size decreases)’ [44]. In the same publication they wrote on the importance of keeping training and test subsets completely separate; ‘any cross-contamination will result in optimism’ [44]. The theory of data mining is clear: all machine learning models work best on bigger samples. Use a repository to test your developed model on an unseen cohort. Data mining is the art of finding meaning in supposedly meaningless data [49]. A minimum of ten cases per predictor is a common [50], although not universal, recommendation [51]. Optimism can also be lowered by introducing a regularization term [52]. Using prior information to constrain model complexity through Bayesian approaches is also recommended. Bootstrapping is another helpful method [53], as is cross-validation [54]. Cross-validation tests the ability of the model to generalize and involves separating the data into subsets. Both Kohavi and Ng described the technique [34, 55, 56].
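The rule of thumb of ten cases per predictor can be turned into a quick budget check for the sample sizes quoted earlier in this discussion. The sketch below assumes that "cases" refers to the smaller of the two groups, which is one common reading of the rule and not something the text specifies; the function name is hypothetical.

```python
def max_predictors(cases_in_smaller_group: int, cases_per_predictor: int = 10) -> int:
    """Largest number of predictors the rule of thumb would allow."""
    if cases_per_predictor <= 0:
        raise ValueError("cases_per_predictor must be positive")
    return cases_in_smaller_group // cases_per_predictor


# Group sizes (patients, controls) as quoted in the discussion above.
group_sizes = {"Ahmadlou": (12, 12), "Hosseinifard": (45, 45), "Bachmann": (33, 30)}
budget = {study: max_predictors(min(sizes)) for study, sizes in group_sizes.items()}
```

Under this reading, samples of this size support only a handful of predictors (one, four and three, respectively), which is one way to see why feature reduction and larger cohorts are both stressed here.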
Ng also stated that ‘…optimism becomes unreliable as the probability of overfitting to the test data increases with multiple comparisons’ [56]. Added to this long list of potential problems of misinterpretation and of compromising the method through procedural mistakes is the publicly expressed concern of both clinical professionals and regulatory bodies about the ethical problems of clinical AI applications. Several legislative efforts are ongoing, and the UK government leads in this (Topol Review, 2019). What is more, many doctors are concerned that they will become redundant, although it has already been shown that General AI (as opposed to Narrow AI) is still not available, and they do not object to many already effective applications that do not explicitly state their use of AI (but are called expert systems instead). In their study elaborating on the future role of clinicians in medicine, Ahuja [57] demonstrates that the human must still stay in the loop and make the final decisions in treatment management. Still, there are many questions we have yet to answer. For example, instead of asking ‘is the algorithm working?’ we should ask ‘is the algorithm also useful?’. Or, to reframe it: by becoming a buzzword, machine learning has elicited uninformed expectations as well as unjustifiable fears. In conclusion, Ahuja stated that AI-based systems will augment physicians and are unlikely to replace the traditional physician-patient relationship, since such applications can help physicians do exactly that: spend more time with their patients.
5 Conclusions
To conclude, we can suggest several levels of standardization in this noble endeavor to offer decision support solutions to everyday psychiatry:
• Firstly, the experts in machine learning should establish and maintain high standards in publishing. Practices like registering the study prior to the research are not mandatory at the moment, and certain basic requirements must be met in order to give us reliable results. Stages like internal and external validation, as well as precautions against contamination of the data, should be a must. If journals enforced those minimal requirements before the submission of manuscripts, the situation would change for the better.
• When we get a new piece of equipment in the laboratory, it is natural that members of the team first must learn how to use it, how to calibrate it and how to recognize mistakes that can happen during the process. They must learn how that equipment works, what the limitations of the method are and, in general, what it is capable of. The same applies to machine learning practices; we must help our students learn how to use it, but for that they have to understand it first and rely on its basic postulates.
• Different disciplines need to develop their own standards for how to apply certain methods and how to interpret the results of every machine learning method in that particular field. How we test the accuracy will vary from one discipline to another, but the methodology must be defined in order to avoid unwarranted optimism. This is the only way researchers, professors, reviewers and editors can pursue a decent level of reproducibility in the field.
• Lastly, education in the field of machine learning needs to cover some broader aspects. There is a lot to be done. We first learn algorithms and how to use tools, but students need to learn more about the particular practical applications, and to examine and understand them in an appropriate way.
Rather than fearing that various AI applications will make them redundant in the near future, modern psychiatrists have to learn to embrace those data-driven solutions, because they will always be in the loop. Such solutions can significantly improve their accuracy, saving time, money and other resources for the whole system.
Acknowledgments. Part of this work was supported by the RISEWISE Project H2020-MSCA-RISE-2015-690874.
References
1. Rush, A.J., et al.: Acute and long-term outcomes in depressed outpatients requiring one or several treatment steps: a STAR*D report. Am. J. Psychiatry 163, 1905–1917 (2006)
2. Berinsky, A., Huber, G., Lenz, G.: Evaluating online labor markets for experimental research: Amazon.com’s Mechanical Turk. Polit. Anal. 20, 351–368 (2012)
3. World Health Organization. Depression and other common mental disorders (2017). http://apps.who.int/iris/bitstream/10665/254610/1/WHO-MSD-MER-2017.2-eng.pdf
4. World Health Organisation (2017). http://apps.who.int/iris/bitstream/handle/10665/254610/WHO-MSD-MER-2017.2eng.pdf;jsessionid=CDB26E156DD7DC804DF23ADC24B59B90?sequence=1
5. Klonowski, W.: From conformons to human brains: an informal overview of nonlinear dynamics and its applications in biomedicine. Nonlinear Biomed. Phys. 1(1), 5 (2007)
6. Goldberger, A.L., Peng, C.K., Lipsitz, L.A.: What is physiologic complexity and how does it change with aging and disease? Neurobiol. Aging 23, 23–26 (2002)
7. Bacar et al 2001
8. De la Torre-Luque, A., Bornas, X.: Complexity and irregularity in the brain oscillations of depressive patients: a systematic review. Neuropsychiatry (London) 5, 466–477 (2017)
9. Kwaasteniet, B.D., et al.: Relation between structural and functional connectivity in major depressive disorder. Biol. Psychiatry 74(1), 40–47 (2013)
10. Kim, D., et al.: Disturbed resting state EEG synchronization in bipolar disorder: a graph-theoretic analysis. NeuroImage: Clin. 2, 414–423 (2013)
11. Higuchi, T.: Approach to an irregular time series on the basis of the fractal theory. Physica D 31(2), 277–283 (1988)
12. Richman, J.S., Moorman, J.R.: Physiological time-series analysis using approximate entropy and sample entropy. Am. J. Physiol. Heart Circ. Physiol. 278(6), H2039–H2049 (2000)
13. Čukić, M., Stokić, M., Simić, S., Pokrajac, D.: The successful discrimination of depression from EEG could be attributed to proper feature extraction and not to a particular classification method. Cogn. Neurodyn. Springer, Heidelberg (2020). https://doi.org/10.1007/s11571-020-09581-x
14. Čukić, M., Stokić, M., Radenković, S., Ljubisavljević, M., Simić, S., Savić, D.: Nonlinear analysis of EEG complexity in episode and remission phase of recurrent depression. Int. J. Res. Methods Psychiatry/IJMPR e1816 (2019). https://doi.org/10.1002/mpr.1816
15. Eke, A., Herman, P., Kocsis, L., Kozak, L.R.: Fractal characterization of complexity in temporal physiological signals. Physiol. Meas. 23, R1–38 (2002)
16. Esteller, R., Vachtsevanos, G., Echauz, J., Litt, B.: A comparison of waveform fractal dimension algorithms. IEEE Trans. Circ. Syst. I: Fundam. Theory Appl. 48(2), 177–183 (2001). https://doi.org/10.1109/81.904882
17. Spasić, S., et al.: Fractal analysis of rat brain activity after injury. Med. Biol. Eng. Comput. 43(3), 345–348 (2005)
18. Pincus, S.M., Goldberger, A.L.: Physiological time-series analysis: what does regularity quantify? Am. J. Physiol. 266(4 Pt 2), H1643–H1656 (1994)
19. Costa, M., Cygankiewicz, I., Zareba, W., Lobodzinski, S.: Multiscale complexity analysis of heart rate dynamics in heart failure: preliminary findings from the MUSIC study. Comput. Cardiol. 33, 101–103 (2006)
20. John, G.H., Langley, P.: Estimating continuous distributions in Bayesian classifiers. In: 11th Conference Uncertainty Artificial Intelligence, San Mateo, pp. 338–345 (1995)
21. Bishop, C.: Neural Networks for Pattern Recognition, pp. 116–160. Oxford University Press, Oxford (1995)
22. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques, 2nd edn., pp. 90–97. Elsevier Inc. (2005)
23. Cox, D.R.: The regression analysis of binary sequences (with discussion). J. Roy. Stat. Soc. B 20, 215–242 (1958)
24. Friedman, J.H., et al.: Additive logistic regression: a statistical view of boosting (with discussion). Ann. Statist. 28(2), 337–407 (2000)
On Mistakes We Made in Prior Computational Psychiatry Data
509
25. Haykin, S.: Neural Networks and Learning Machines, 3rd edn. Pearson Education, pp. 122– 218 (2009) 26. Kecman, V.: Learning and Soft Computing: Support Vector Machines, Neural Networks and Fuzzy Logic Models, p. 172. A Bradford Book, The MIT Press, Cambridge, Massachusetts (2001) 27. Platt, V.: Fast training of support vector machines using sequential minimal optimization. In: Schölkopf, B., Burges, C.J.C. (ed.) Advances in Kernel Methods - Support Vector Learning, pp. 41–65. MIT Press, Cambridge (1998) 28. Quinlan, R.: Programs for Machine Learning, pp. 17–45. Morgan Kaufmann Publishers, San Mateo (1993) 29. Vapnik, V.: Statistical Learning Theory, p. 42. Wiley, New York (1988). Ch. 10 30. Breiman, V.: Random forests. Mach. Learn. 45(1), 5–32 (2001) 31. Jolliffe, V.: Principal Component Analysis, 2nd edn., pp. 10–150. Springer, New York (2002) 32. Ahmadlou, M., et al.: Fractal analysis of frontal brain in major depressive disorder. Int. J. Psychophysiol. 8(2), 206–211 (2012) 33. Bachmann, M., Lass, J., Suhhova, A., Hinrikus, H.: Spectral asymmetry and Higuchi’s fractal dimension of depression electroencephalogram. Comput. Math. Methods Med. 251638 (2013). https://doi.org/10.1155/2013/251638 34. Unnikrishnan, P., et al.: Development of health parameter model for risk prediction of CVD Using SVM. Comput. Math. Methods Med. 2016, ID 3016245 (2016) 35. Raghavendra, B.S., Narayana Dutt, D.: A note on fractal dimensions of biomedical waveforms. Comput. Biol. Med. 39(11), 1006–1012 (2009). https://doi.org/10.1016/j.compbiomed.2009. 08.001 36. Castiglioni, P.: What is wrong with Katz’s method? Comments on: a note on fractal dimensions of biomedical waveforms. Comput. Biol.Med. 40, 950–952 (2010). https://doi.org/10.1016/ j.compbiomed.2010.10.001. Puthankattil et al., 2012 37. Puthankattil, S.D., Joseph, P.: Classification of EEG signals in normal and depression conditions by ANN using RWE and signal entropy. J. Mech. Med. Biol. 12(4), 1240019 (2012) 38. 
Oliver, F., Chuan, P., Ang, A., Puthankattil, S.D., Joseph, P.K.: Depression diagnosis support system based on EEG signal entropies. J. Mech. Med. Biol. 14(3), 53 (2014). https://doi.org/ 10.1142/s0219519414500353. 53 39. Acharya, U.R., et al.: A novel depression diagnosis index using nonlinear features in EEG signals. Eur. Neurol. 74(1–2), 79–83 (2015) 40. Bairy, V., et al.: Automated classification of depression electroencephalographic signals using discrete cosine transform and nonlinear dynamics. J. Med. Imaging Health Inform. 5(3), 1–6 (2015) 41. Hosseinifard, B., et al.: Classifying depression patients and normal subjects using machine learning techniques and nonlinear features from EEG signal. Comput. Methods Programs Biomed. 109(3), 339–345 (2013) 42. Bachmann, M., Päeske, L., Kalev, K., et al.: Methods for classifying depression in single channel EEG using linear and nonlinear signal analysis. Comput. Methods Programs Biomed. 155(11–17), 34 (2018) 43. Liao, S.C., Wu, C.T., Huang, H.C., Cheng, W.T., Liu, Y.H.: Major depression detection from EEG signals using kernel eigen-filter-bank common spatial patterns. Sensors 17, 1385 (2017). https://doi.org/10.3390/s17061385.mohammadi. et al., 2015 Pincus, 1998 44. Whelan, R., Garavan, H.: When optimism hurts: inflated predictions in psychiatric neuroimaging. techniques and methods. Biol. Psychiatry 75(9), 746–748 (2014) 45. Gillan, C.M., Whelan, R.: What big data can do for treatment in psychiatry. Curr. Opin. Behav. Sci. 18, 34–42 (2017)
510
ˇ c et al. M. Cuki´
46. Yahata, N., Kasai, K., Kawato, M.: Computational neuroscience approach to biomarkers and treatments for mental disorders. Psychiatry Clin. Neurosci. PCN Front. Rev. (2016). https:// doi.org/10.1111/pcn.12502.tomoki Tokuda et al., 2018 Cho et al., 2019 47. Chekroud, A.M., Zotti, R.J., Shehzad, Z., Gueorguieva, R., Johnson, M.K., Trivedi, M.H., Cannon, T.D., Krystal, J.H., Corlett, P.R.: Cross-trial prediction of treatment outcome in depression: a machine learning approach. Lancet Psychiatry 2, 243–250 (2016) 48. Ahmadi, K., et al.: Brain activity of women is more fractal than men. Neurosci. Lett. 535, 7–11 (2013). Marquand et al., 2016 49. Peter, F.: Machine Learning. Cambridge University Press, Cambridge (2014). ISBN 978-1107-42222-3 50. Peduzzi, P., Concarto, J., Kemper, E., Holford, T., Feinstein, A.: A simulation study of the number of events per variable in logistic regression analysis. J. Clin. Epidemiol. 49, 1373–1379 (1996) 51. Vittinghoff, E., McCulloch, C.: Relaxing the rule of ten events per variable in logistic and Cox regression. Am. J. Epidemiol. 165, 710–718 (2007) 52. Moons, K., Donders, A., Steyerberg, E., Harrelli, F.: Penalized maximum likelihood estimation to directly adjust diagnostic and prognostic prediction models for overoptimism: a clinical example. J. Clin. Epidemiol. 57, 1262–1270 (2004) 53. Efron, B., Tibshirani, R.J.: An Introduction to the Bootstrap, vol. 57. Chapman & Hall, London (1993) 54. Efron, B., Tibshirani, R.J.: Improvement on cross-validation. The 632+ bootrstrap method. J. Am. Stat. Assoc. 92, 548–560 (1997) 55. Kohavi, R.: A study of cross-validation and bootstrap for accuracy estimation and model selection. In: Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, vol. 2, pp. 1137–1143 (1995) 56. Ng, A.Y.: Preventing overfitting of cross-validation data. Presented at the 14th International Conference on Machine Learning (ICML) (1997). 
http://robotics.stanford.edu/~ang/papers/ cv-final.pdf 57. Ahuja, S.A.: The impact of artificial intelligence in medicine on the future role of the physician. PeerJ 7, e7702 (2019). http://doi.org/10.7717/peerj.7702
Machine Learning Strategies to Distinguish Oral Cancer from Periodontitis Using Salivary Metabolites
Eden Romm5, Jeremy Li4, Valentina L. Kouznetsova1,2, and Igor F. Tsigelny1,2,3,5(B)
1 San Diego Supercomputer Center, University of California, La Jolla, CA 92093, USA
[email protected]
2 Moores Cancer Center, University of California, La Jolla, CA 92037, USA
3 Department of Neurosciences, University of California, La Jolla, CA 92093, USA
4 MAP Program, University of California, La Jolla, CA 92093, USA
5 CureMatch Inc., San Diego, CA 92121, USA
Abstract. Development of non-invasive diagnostic tests that can immediately yield guidance in clinics, reducing the number of samples sent for laboratory testing, has the potential to revolutionize medical diagnostics by improving patient outcomes, doctors' workloads, and healthcare costs. It is difficult to imagine a testing method less invasive than measurement of salivary metabolites by swab or spit. Coupling data from this convenient measurement to an automated decision-making engine can provide clinicians with immediate feedback on the status of their patients. The aim of this research is to lay the foundation for a system which, when employed in dental practices, can help to stratify between patients with periodontitis and oral cancers and illuminate metabolic networks important to both diseases. We built machine learning models trained on QSAR descriptors of metabolites whose concentrations changed drastically between the two diseases and used these same metabolites to illuminate networks important in the development of each disease. The Neural Network developed in TensorFlow performed best, achieving 81.29% classification accuracy between metabolites of periodontitis and those of oral cancers. We compared the effects of two attribute selection methods, ranking by Correlation and by Information Gain coefficients, on the accuracy of the models and applied principal component analysis to the data for dimensionality reduction before training. Models trained on attributes ranked by Information Gain coefficients regularly outperformed those ranked by Correlation coefficients across machine learning methods while relying on fewer principal components.
Keywords: Oral cancer · Periodontitis · Machine learning · Deep learning · Diagnostics · Metabolomics · Metabolic networks · Biomarkers
E. Romm, J. Li and V. L. Kouznetsova—Contributed equally to this work. © Springer Nature Switzerland AG 2021 K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 511–526, 2021. https://doi.org/10.1007/978-3-030-55190-2_38
1 Introduction
Technologies running on artificial intelligence (AI) based decision-making engines have the potential to transform the landscape of medical diagnostics. When provided with quality data, they can identify trends indicative of disease states without the need for human intervention. However, the fear that these computational technologies will replace the doctor is unfounded. On the contrary, artificially intelligent diagnostic technologies will help doctors by reducing the number of unnecessary tests and appointments they must handle, allowing them to spend more time on cases which require their attention. Several machine learning (ML) models have been developed for the diagnosis of diseases, mainly cancers. Currently, the most effective applications of machine learning in diagnostics use convolutional neural networks for medical image identification, detecting tumours or other abnormalities in tissues [1] or blood [2]. Li et al. [1] used a small, ingestible capsule to take endoscopic images and a Support Vector Machine modelling architecture to identify tumours in the digestive tract with 92.4% accuracy. Interestingly, the prediction accuracy is invariant to changes in illumination, which is often difficult to achieve in image recognition tasks and is especially important for a classification performed on images taken in such an inconsistent, turbulent, and poorly lit region. Vansant and colleagues [2] developed an image recognition machine for more therapeutically relevant diagnosis of samples from 5 common cancer types (Prostate, Breast, NSCLC, SCLC, and Bladder), breaking them up into 9 clusters using a combination of machine learning and statistical techniques.
Image recognition tasks in cancer diagnostics have been so successful largely for three reasons: publicly available data sets are abundant and new data is relatively easy to produce, the methods are generally non-invasive, and there is a plethora of algorithms optimized for different image recognition tasks. We see an opportunity for metabolic approaches to achieve similar success because they have much in common with image recognition diagnostic approaches. Public sources are rich in information on metabolites, their concentrations, and their changes in various diseases [3–6]. Measurement of metabolites in an individual requires at most a blood draw, but usually either a swab of saliva, as is the case for our model, or a small amount of urine, whereas a tumour biopsy may require sedation and surgery. ML-based classification algorithms, on which metabolic diagnostic tools rely, are some of the most common and well-developed applications of artificial intelligence. We feel that monitoring metabolic changes using AI technologies can greatly contribute to increased survival rates in cancers and other major diseases by simplifying the screening and monitoring of such diseases.
Metabolites are the molecular products of metabolism in every living organism. Current oncological metabolomics research is focused on the discovery of metabolic biomarkers of cancer. The American Cancer Society estimates that approximately 53 000 individuals are to be diagnosed with oral cancer (OC) by the end of 2019, resulting in as many as 10 860 deaths in the United States [8]. A metabolomics-based approach can help to lower this death count in future years by providing a non-invasive method to identify this form of cancer earlier. Examining changes in the molecular products of a cancer's metabolic processes also has potential to be exploited in monitoring disease progression, as metabolites are the by-products of all processes occurring in the protein networks of the body and its many tissues. Analysis of the different stages of oral cancer through metabolic indicators can help to tell how far the cancer has developed. Early stages of oral cancer, when there is slight or no evidence of a tumour, can be classified as carcinoma in situ: the cancer cells are proliferating abnormally within the oral epithelium only. Identification of OC at this early stage can have enormously positive consequences for survival rates [9]. Late stages of oral cancer are those in which the tumour has spread thoroughly in the mouth and/or to surrounding tissues or organs [10]. Identification of OC at this late stage can help to pick treatments which are known to be more effective for well-developed carcinomas and which take effect more quickly [10]. Metabolic analysis can be carried out on saliva samples from oral cancer patients in late or early stages and integrated with data from healthy patients to identify OC's biomarkers or metabolites [11]. Salivary metabolomic profiling is a promising direction for OC diagnosis and monitoring [12]. Ishikawa and colleagues studied profiles of salivary metabolites using capillary electrophoresis–mass spectrometry (CE-TOF/MS) and identified 23 metabolites which exhibited significant fold changes in OC [13]. Amongst the most affected pathways outlined in the paper were choline, polyamine, and glycolysis [13]. Metabolites used to characterize these pathways were analysed in our study. Much of the difficulty in analysing metabolic data comes from a combination of the sheer number of data points that need to be considered and the inherent instability of measurements in this form of data. Researchers have looked to computer-based approaches to tackle these issues, integrating an ever-growing number of data points to identify functionally significant patterns even in noisy systems.
There have been several computer-based approaches to the identification of oral cancers, none of which use metabolic data for the classification. One attempt, by Chuang and colleagues, which required only 4 Single Nucleotide Polymorphisms (SNPs) of genes contributing to DNA damage repair mechanisms, achieved 64.2% accuracy on a set of 238 samples of oral cancer and controls [14]. That including more than 4 SNPs would benefit the model goes almost without question. Shams and co-authors designed one of the more successful approaches: a model which could tell patients with oral cancer apart from those without it with 96% accuracy, using a deep neural network (DNN) trained on the RNA expression data of 82 patients, 51 with oral cancer and 31 without the disease [15]. These are very promising results even with the class imbalance in mind. The disadvantage of this approach is that it requires measurement of RNA expression data, which may be expensive and time consuming while also more invasive than a metabolomic approach would be. Despite the numerous studies on oral cancer detection and biomarker discovery based on saliva metabolomes, further saliva-based metabolomic profiling may yield new putative markers due to the ever-changing, dynamic landscape of salivary metabolomes. Thus, there is a compelling need to find markers of OC which exhibit greater stability in their concentrations. Biological pathways related to the disease can be identified through the analysis of metabolites, allowing for greater specificity in selection of treatments for the disease. The goal of this research is to analyse metabolite sets of different oral diseases, show their distinguishing and common features, and create a machine-learning (ML) model that can distinguish between different forms of oral disease.
2 Methods
2.1 Approach Overview
The programs used for metabolic analysis are MetaboAnalyst 4.0 [16], E-Dragon [17], Waikato Environment for Knowledge Analysis [18], TensorFlow [19], Chemical Translation Service [20], and OpenBabel [21]. The flowchart of methods is shown in Fig. 1.
Fig. 1. The general scheme of the project. We began by collecting data on fold changes of metabolites in the two diseases using a maximum p-value cut-off of 0.05 and excluding those metabolites whose concentration fold changes fall in the range 0.7 to 1.3. There are two arms to the process from this point. The first is the machine learning portion. It begins with pre-processing, which includes normalization and principal component analysis. The prepared data is then used to train several models. The second, lower arm is meant to elucidate the metabolic pathways to which these metabolites belong, and so those which are implicated in each disease. The pathways that this approach suggests are related to the diseases are then compared to the literature to demonstrate that the findings made using MetaboAnalyst are consistent with experimental data on the diseases.
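The selection step at the start of this scheme can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the `Measurement` record, its field names, and the toy values are hypothetical, and the removal of metabolites that pass the cut-offs in both diseases follows the description given in the Results.

```python
from dataclasses import dataclass


# Hypothetical record layout; the field names are illustrative, not from the paper.
@dataclass(frozen=True)
class Measurement:
    metabolite: str
    fold_change: float  # disease concentration relative to healthy concentration
    p_value: float


P_MAX = 0.05                # maximum p-value stated in the text
FC_LOW, FC_HIGH = 0.7, 1.3  # fold changes inside this range are excluded


def significant(measurements):
    """Return names of metabolites passing the p-value and fold-change cut-offs."""
    return {
        m.metabolite
        for m in measurements
        if m.p_value <= P_MAX and not (FC_LOW <= m.fold_change <= FC_HIGH)
    }


def disease_specific(oc, perio):
    """Drop metabolites that pass the cut-offs in both diseases."""
    oc_sig, perio_sig = significant(oc), significant(perio)
    shared = oc_sig & perio_sig
    return oc_sig - shared, perio_sig - shared


# Toy values for illustration only (not data from the study).
oc_toy = [Measurement("choline", 1.8, 0.01),
          Measurement("taurine", 1.1, 0.02),   # fold change too small
          Measurement("proline", 1.6, 0.03)]
perio_toy = [Measurement("proline", 0.5, 0.01),
             Measurement("glutathione", 0.6, 0.04),
             Measurement("valine", 2.0, 0.20)]  # p-value too large
oc_only, perio_only = disease_specific(oc_toy, perio_toy)
```

In this toy input, proline passes the cut-offs in both lists and is therefore dropped from both, which leaves one disease-specific metabolite on each side.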
2.2 Metabolite Selection
We collected metabolites representative of oral cancers [4, 5] and periodontal disease [5–7] from public sources. A three-step cut-off was used to ensure quality of data, significance of the metabolites used, and pertinence to discrimination between the two diseases. We required a p-value of 0.05 or smaller to make sure the measurement is statistically significant. We defined a significant fold change as one which is either less than 0.7 or greater than 1.3 with respect to the concentration of the respective metabolite in healthy individuals. Pertinence to discrimination between the two diseases was ensured by removing any metabolites that appeared in both diseases. This led to a set containing 91 metabolites whose concentration changes are significant in Oral Cancers (OC) and 89 metabolites whose concentration changes are indicative of Periodontal Disease (PD). All 180 metabolites are used to analyse metabolic pathways involved in the diseases and to indicate relevant genetic networks. Constraints of descriptor calculation forced the removal of certain metabolites from the machine learning, leaving 76 metabolites of OC and 80 of PD, or 156 of the 180 metabolites, to be used in the training of the model.
2.3 MetaboAnalyst 4.0
MetaboAnalyst 4.0 [16] is a program for statistical, functional, and integrative analysis of metabolomics data. It consists of four general categories: (1) exploratory statistical analysis, (2) functional enrichment analysis, (3) data integration and systems biology (biomarker analysis, pathway analysis, and network explorer), and (4) data processing. It accepts a large variety of metabolomics data input types, such as a list of gene/compound names, Kyoto Encyclopaedia of Genes and Genomes (KEGG) ID orthologues [22], or Human Metabolome Database (HMDB) index numbers [23], to support integrative analysis with transcriptomics or metagenomics.
2.4 Chemical Translation Service
Chemical Translation Service [20] is a web-based server that performs batch conversions of the most common compound identifiers, including CAS, CHEBI, compound formulas, Human Metabolome Database (HMDB), InChI, InChIKey, IUPAC name, KEGG, LipidMaps, PubChem CID+SID, SMILES, and chemical synonyms.
2.5 OpenBabel
OpenBabel [21] converts between over 110 chemical file formats. The need to represent such a wide variety of chemical and molecular data requires a library that implements a wide range of cheminformatics algorithms, from partial charge assignment and aromaticity detection to bond order perception and canonicalization. We used it to convert InChI codes into SMILES.
2.6 E-Dragon
E-Dragon [17] is the electronic remote version of the software DRAGON, an application for the calculation of molecular descriptors developed by the Milano Chemometrics and QSAR Research Group of Prof. R. Todeschini. It converts SMILES strings into molecular descriptors, which can then be used to evaluate molecular structure–activity or structure–property relationships, as well as for similarity analysis and high-throughput screening of molecule databases.
2.7 Waikato Environment for Knowledge Analysis (WEKA)
WEKA [18] is an ML workbench that supports many popular algorithms, pre-processing techniques, and statistical analysis tools. We employed the attribute selection techniques Information Gain ranking (IG) and Correlation ranking (Corr), a host of modelling architectures including Logistic Regression, and principal component analysis to isolate the observed variance patterns and thereby reduce noise. Accuracy was the metric used to evaluate model utility.
2.8 TensorFlow
TensorFlow [19], accessed through the Python programming language, is a package for applying deep-learning methods that offers multiple levels of abstraction. The high-level Keras API was used to build and train models.
3 Results
Our final set of metabolites selected for further study contained 80 metabolites of PD and 76 of OC after removal of metabolites whose concentration fold changes between healthy and disease states were between 0.7 and 1.3, whose p-values were greater than 0.05, and which appeared in both diseases. We studied the pathways related to OC and PD, which could exhibit diagnostic potential or provide information on disease progression, using MetaboAnalyst.
3.1 Metabolic Pathways
Oral Cancer Metabolic Pathways. We elucidated a set of metabolic pathways related to oral cancers using our set of salivary metabolites (Fig. 2(a)). The analysis of these metabolic pathways was initially described in our article [24].
Aminoacyl-tRNA Biosynthesis. This pathway is involved in many cancers. Aminoacyl-tRNA synthetases are associated with cancer [25].
Arginine and Proline Metabolism. This pathway is related to the biosynthesis of the amino acids arginine and proline from glutamate. The pathways linking arginine, glutamate, and proline are bidirectional. Thus, the net utilization or production of these amino acids is highly dependent on cell type and developmental stage. Altered proline metabolism has been linked to metastasis formation in breast cancer [26, 27].
D-glutamine and D-glutamate Metabolism. Cancer cells can reprogram glucose metabolism towards aerobic glycolysis instead of oxidative phosphorylation [28]. An alteration in this pathway can be indicative of a cancer.
Fig. 2. Metabolic pathways involved in oral cancer (a) and periodontitis (b).
Alanine, Aspartate, and Glutamate Metabolism. Some cancer cell lines display a strong dependence on glutamine even though it is a non-essential amino acid that can be synthesized from glucose. In many cancer cells, glutamine is the primary mitochondrial substrate and is required for maintenance of mitochondrial membrane potential and integrity and for the support of the NADPH production needed for redox control and macromolecular synthesis [29]. Alanine secretion in cancer cell lines supports cancer development [30].
Phenylalanine, Tyrosine, and Tryptophan Biosynthesis. Tryptophan metabolism via the serotonin pathway is involved in breast cancer development [31].
Glycine, Serine, and Threonine Metabolism. Alterations in the biosynthesis of serine and glycine are linked to the growth of cancer cells [32], and alterations of serine and threonine metabolism were found in cancers [33].
Periodontitis Metabolic Pathways. We elucidated a set of metabolic pathways related to periodontitis using salivary metabolites of the disease (Fig. 2(b)).
Arginine and Proline Metabolism. This pathway is also important in OC (see above). At the same time, it is involved in the mechanism of periodontitis. For example, degradation of arginine and other amino acids occurs in the periodontal pockets by asaccharolytic anaerobic Gram-positive rods [34]. Arginine metabolism to nitric oxide plays a role in periodontitis [35].
Taurine and Hypotaurine Metabolism. Taurine and hypotaurine improve the antioxidant state in chronic periodontitis [36, 37].
Glutathione Metabolism. Glutathione is a known antioxidant. Periodontal patients have a diminished level of this molecule in saliva, which can be one of the reasons leading to less reactive oxygen species (ROS) in the area [38].
Biosynthesis of Unsaturated Fatty Acids. As shown by El-Sharkawy and colleagues, polyunsaturated fatty acids (PUFAs), including docosahexaenoic acid (DHA) and eicosapentaenoic acid (EPA), can have therapeutic anti-inflammatory and protective actions in periodontitis, counteracting the inflammation process [39].
3.2 Machine-Learning Classifiers
Quantitative characterization of the metabolites used in our study was accomplished using QSAR descriptors calculated with E-Dragon. The resulting feature set contained more than 4000 features representing each metabolite. The descriptor set was normalized in Python before the application of two attribute selection methods, leading to two separate sets. The first was made by calculating the information gain (IG) coefficient of each column with respect to the output class in WEKA and ordering the descriptors by their IG values. The second set was made by the same method, but using the correlation coefficient between descriptor and output class to rank the QSAR descriptors' importance. Further subsets of these two larger sets were made using a variety of threshold values for the respective ranking method. Threshold values for the coefficients of Corr and IG which led to the most accurate models are presented in Table 1.
Table 1. Number of features and corresponding threshold value pertaining to each attribute evaluation method in the Neural Network models.
Attribute ranker    Number of features    Threshold value
Correlation         681                   0.115
Information gain    914                   0.3
Principal component analysis (PCA) was then applied to transform each set. PCA allows the models to find minima more quickly while preserving much of the variance present in their data (variance covered by components set to 0.99). The principal components were then normalized and ranked by the same attribute selection method used to determine which features are used to construct them. Components were removed one at a time, from least informative or correlative to most, to determine the component set which leads to the most accurate model. Figure 3 illustrates the accuracy of the most accurate modelling architecture, Neural Network with IG feature selection, as each component is removed.
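The attribute-ranking step (scoring each descriptor against the class, ordering by score, and keeping those at or above a threshold) can be sketched as follows. This is an illustrative sketch rather than WEKA's implementation: the median split used to discretize a numeric descriptor is an assumption made for brevity, and the descriptor names and values are hypothetical. Only the thresholds 0.3 and 0.115 come from Table 1.

```python
import math
from collections import Counter
from statistics import median


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(values, labels):
    """Information gain of a numeric feature after a median split.

    Assumption: a single median split stands in for the supervised
    discretization that a full attribute evaluator would apply.
    """
    cut = median(values)
    bins = [v > cut for v in values]
    conditional = 0.0
    for b in set(bins):
        subset = [y for flag, y in zip(bins, labels) if flag == b]
        conditional += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - conditional


def abs_pearson(values, labels):
    """Absolute Pearson correlation between a feature and a 0/1 class label."""
    n = len(values)
    mx, my = sum(values) / n, sum(labels) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(values, labels))
    sx = math.sqrt(sum((x - mx) ** 2 for x in values))
    sy = math.sqrt(sum((y - my) ** 2 for y in labels))
    return abs(cov / (sx * sy)) if sx and sy else 0.0


def select(features, labels, scorer, threshold):
    """Rank features by score (descending) and keep those at or above the threshold."""
    scored = sorted(((scorer(col, labels), name) for name, col in features.items()),
                    reverse=True)
    return [name for score, name in scored if score >= threshold]


# Toy descriptors (hypothetical names and values); label 1 = OC, 0 = PD.
labels = [1, 1, 1, 0, 0, 0]
features = {
    "desc_a": [0.9, 0.8, 0.7, 0.2, 0.1, 0.3],  # separates the two classes
    "desc_b": [0.5, 0.1, 0.9, 0.5, 0.1, 0.9],  # carries no class information
}
ig_kept = select(features, labels, info_gain, 0.3)        # IG threshold from Table 1
corr_kept = select(features, labels, abs_pearson, 0.115)  # Corr threshold from Table 1
```

Both rankers keep only the class-separating toy descriptor here. On real descriptor sets the two scores can disagree, which is consistent with the different feature counts reported for the two rankers.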
[Figure 3: plot of Accuracy /% (axis range 50 to 85) against Number of Principal Components /Count (axis range 0 to 35).]
Fig. 3. The accuracy changes as a function of the number of highest ranked principal components used for model training for the modelling method which achieved the highest overall accuracy. Peak accuracy, 81.28%, was achieved using the 23 most informative components and is labelled in yellow.
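The search behind this curve, in which the lowest-ranked component is dropped one at a time and the accuracy is recorded at each step, can be sketched as below. The `evaluate` callable is a placeholder for whatever accuracy estimate is used (for example a cross-validated accuracy), and the toy accuracy values are made up purely so that the example runs.

```python
def best_component_count(ranked_components, evaluate):
    """Drop the lowest-ranked component one at a time and track accuracy.

    ranked_components: component ids ordered from most to least informative.
    evaluate: callable taking a list of component ids and returning an
        accuracy estimate for a model trained on those components.
    Returns (best_k, best_accuracy, curve), where curve maps k -> accuracy.
    """
    curve = {}
    for k in range(len(ranked_components), 0, -1):
        curve[k] = evaluate(ranked_components[:k])
    # Ties are resolved in favour of the smaller component set.
    best_k = max(curve, key=lambda k: (curve[k], -k))
    return best_k, curve[best_k], curve


# Stand-in scorer for illustration only: a made-up accuracy curve that peaks
# at 3 components and does not improve monotonically with more components.
def toy_evaluate(components):
    made_up = {1: 60.0, 2: 70.0, 3: 75.0, 4: 72.0, 5: 74.0}
    return made_up[len(components)]


best_k, best_acc, curve = best_component_count(
    ["pc1", "pc2", "pc3", "pc4", "pc5"], toy_evaluate)
```

Recording the whole curve rather than stopping at the first drop in accuracy matters here, because a lower-ranked component can still help after an intermediate one has hurt.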
It should be noted that although the components are theoretically ordered by their informative contribution to the problem, not every component contributes positively to the accuracy. This can be due to noise in the data, network topology, or other unforeseen reasons, and is typically true for all modelling schemes. Many modelling architectures were investigated, with Logistic Regression in WEKA and a four-layered Neural Network in TensorFlow achieving the highest accuracies in 10-fold cross validation (Fig. 4).
[Figure 4: bar chart of Accuracy /% for ZeroR, Logistic Regression, and Neural Network under each attribute selection method; the values shown in the chart are listed below.]
Model                  Correlation    Information Gain
ZeroR                  51.28          51.28
Logistic Regression    76.29          78.21
Neural Network         79.48          81.28
Fig. 4. The highest accuracies achieved using the two attribute selection methods, ranking by Correlation (left) and Information Gain (right). ZeroR is the baseline: it always predicts the majority class, so its accuracy shows what can be achieved without using any features.
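The ZeroR baseline value can be checked directly from the class sizes reported in the text (80 PD and 76 OC metabolites). ZeroR ignores all features and predicts the most frequent class, so its accuracy is simply the majority-class share. The helper below is a small sketch written for this check, not part of WEKA.

```python
from collections import Counter


def zero_r_accuracy(labels):
    """Accuracy (%) of always predicting the most frequent class."""
    counts = Counter(labels)
    return 100.0 * max(counts.values()) / len(labels)


# Class sizes reported in the text: 80 PD and 76 OC metabolites.
labels = ["PD"] * 80 + ["OC"] * 76
baseline = zero_r_accuracy(labels)
print(f"{baseline:.2f}")  # 51.28
```

The result, 80/156 ≈ 51.28%, matches the ZeroR value shown for both attribute selection methods, which is expected because the baseline does not depend on the features.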
The most accurate results by both modelling techniques were achieved using IG ordered data sets. IG based models not only outperformed those built on the Corr ordered sets, but also required fewer principal components to achieve higher accuracies (Table 2).
Table 2. The number of principal components used to build the most accurate model under each architecture.
Modelling method       Correlation     Information gain
Logistic regression    21 (76.29%)     17 (78.21%)
Neural network         30 (79.48%)     23 (81.28%)
Less accurate modelling architectures, which we did not include, also tended to demonstrate a higher accuracy when trained on IG ranked components than on Corr ranked components. This is an interesting and sensible result, considering that the IG attribute evaluator is meant to rank input features by their ability to classify, while the Corr attribute evaluator is not specific to classification or regression problems.
3.2.1 Neural Network Parameter Tuning in Deep-Learning Models
We investigated tuning parameters for the more accurate modelling approach, the Neural Network trained on components filtered through IG, and how the models behave under various parametric manipulations. We took the two most accurate models in this category, M1 and M2, whose parameters are listed in Table 3, and changed a single parameter at a time to see whether there were any apparent regularities. We list these observations to streamline the machine learning process, hoping to add some predictability to model selection and tuning. The current approach, based around the “No Free Lunch” understanding that is often treated as an axiom, is in essence “guess and check”. This is not only unsatisfying but also wastes time, decreasing productivity and slowing the rate of progress. Finding regularities in machine learning approaches will help move towards more efficient development and deployment of models.
Table 3. Parameters of the two most accurate models, M1 and M2, developed using the Neural Network with the 23 principal components with highest Information Gain.
Model  Layers        Activation function by layer  Optimizer  Epochs  Components  Accuracy  Deviation
M1     23, 16, 8, 1  Relu, sigmoid, tanh, sigmoid  adagrad    150     23          81.29     8.9
M2     23, 18, 9, 1  Relu, sigmoid, tanh, sigmoid  adagrad    125     23          80.13     7.16
The first notable observation was that all our most accurate models had a sigmoid layer, followed by a layer with an arbitrary activation function, followed by a sigmoid output layer (Table 3). We call this “the sigmoid sandwich” (Fig. 5). The accuracy was routinely higher for models built using the sigmoid sandwich, especially for those which were built using the adagrad optimizer. Training the model with the pair, sigmoid sandwich and adagrad, is necessary to achieve the highest possible accuracy for this prediction with this data (Table 4). No other set of parameters was able to reach similar results, and multiple modelling architectures which exhibit this feature were able to achieve accuracies over 80%. We explored the effects of three common optimizers, adam, adagrad, and stochastic gradient descent on the accuracy of models built using the parameters of M1 and M2. We found adagrad to be the most effective optimizer for increasing accuracy in our models.
Fig. 5. Depicting the Sigmoid Sandwich which is composed of two sigmoid activation layers which house another layer with an arbitrary activation function between them.
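As an illustration of the layer arrangement described above, the following is a minimal NumPy sketch of a forward pass through the M1 stack from Table 3. It assumes that the "Layers" column gives the width of each layer and that the input has 23 components; the weights are random placeholders, not the trained model, and the helper names (`init_params`, `forward`, `has_sigmoid_sandwich`) are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

ACTIVATIONS = {"relu": relu, "sigmoid": sigmoid, "tanh": np.tanh}

# Layer widths and activations copied from row M1 of Table 3
# (assumption: each entry is the number of units in that layer).
M1_LAYERS = [(23, "relu"), (16, "sigmoid"), (8, "tanh"), (1, "sigmoid")]

def init_params(n_inputs, layers, seed=0):
    """Random weights for illustration only (not the trained model)."""
    rng = np.random.default_rng(seed)
    params, fan_in = [], n_inputs
    for width, _ in layers:
        weights = rng.normal(0.0, 0.1, size=(fan_in, width))
        params.append((weights, np.zeros(width)))
        fan_in = width
    return params

def forward(x, layers, params):
    """Propagate a batch x of shape (n_samples, n_inputs) through the stack."""
    out = x
    for (_, act), (w, b) in zip(layers, params):
        out = ACTIVATIONS[act](out @ w + b)
    return out

def has_sigmoid_sandwich(layers):
    """True if the output layer and the layer two positions before it are sigmoid."""
    names = [act for _, act in layers]
    return len(names) >= 3 and names[-1] == "sigmoid" and names[-3] == "sigmoid"

params = init_params(23, M1_LAYERS)
probs = forward(np.zeros((4, 23)), M1_LAYERS, params)  # one probability per sample
```

The `has_sigmoid_sandwich` check only encodes the structural pattern named in the text; it says nothing about whether a given model will reach the reported accuracy.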
Table 4. Accuracies and deviations for different models. The table shows a variety of models that we trained en route to developing the most accurate ones. Models with a sigmoid output layer one layer removed from another sigmoid layer (the sigmoid sandwich), trained with the adagrad optimizer, perform best.

Model  Layers        Activation function by layer  Optimizer  Accuracy  Deviation
M1     23, 16, 8, 1  Relu, sigmoid, tanh, sigmoid  adagrad    81.29     8.9
M2     23, 18, 9, 1  Relu, sigmoid, tanh, sigmoid  adagrad    80.13     7.16
M3     16, 16, 7, 1  Relu, sigmoid, relu, sigmoid  adagrad    76.67     8.67
M4     16, 8, 1      Relu, relu, sigmoid           adagrad    76.03     7.09
M5     16, 8, 1      Relu, relu, sigmoid           adam       74.49     7.69
M6     16, 8, 4, 1   Relu, sigmoid, relu, sigmoid  adam       73.53     9.57
M7     16, 4, 1      Sigmoid, sigmoid, relu        adagrad    63.14     7.56
It did this quickly, finding the error minima in 125 to 150 iterations (Table 3). adagrad was the most effective optimizer in nearly all cases; it performed best regardless of the complexity of layers, number of input components, etc., but only when the sigmoid sandwich was present (Fig. 6). We will investigate further whether pairing of the sigmoid sandwich and adagrad is useful in other contexts as we complete later projects. Our final observation had to do with the error rate as a function of the number of layers used to build the model. It became apparent during model training that peak accuracy is achieved using 4 layers. For M1 and M2, accuracy increases up to four layers, and begins decreasing at 5 layers (Fig. 7). We found this interesting because deeper learning is often understood to be better, increasing accuracy and thus making models more powerful. It is important to understand that this is not always the case. The best model is not the most complicated one, or the one which relies on the most advanced techniques, but the one which most accurately maps the topology of your problem. For our prediction, four layers is just right.
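For readers unfamiliar with the optimizer discussed here, the sketch below shows the standard Adagrad update on a toy quadratic loss. It is a generic illustration of the rule, not the training code used for M1 and M2; the learning rate, the loss, and the function name `adagrad_step` are assumptions chosen for the example.

```python
import numpy as np

def adagrad_step(theta, grad, accum, lr=1.0, eps=1e-8):
    """One Adagrad update: each parameter gets its own shrinking step size."""
    accum = accum + grad ** 2                        # running sum of squared gradients
    theta = theta - lr * grad / (np.sqrt(accum) + eps)
    return theta, accum

# Toy loss 0.5 * ||theta||^2, whose gradient is simply theta.
start = np.array([5.0, -3.0])
theta, accum = start.copy(), np.zeros_like(start)
for _ in range(150):                                 # 150 iterations, as for M1 in Table 3
    theta, accum = adagrad_step(theta, theta, accum)
```

Because the accumulated squared gradients only grow, the effective step size of each parameter decreases over time, which is the property that distinguishes Adagrad from plain stochastic gradient descent.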
Fig. 6. Peak accuracies achieved under each optimizer for the M1 and M2 modelling parameters (y-axis: accuracy, %). Plotted values: Adagrad, M1 81.29 and M2 80.13; SGD, M1 78.13 and M2 78.17; Adam, M1 76.29 and M2 77.
Fig. 7. Accuracy changes at different numbers of layers (x-axis: number of layers; y-axis: accuracy, %; series: M1 and M2). The highest accuracy is at 4 layers for both models.
4 Conclusion

We developed a machine-learning-based classifier for the recognition of oral cancer, as opposed to periodontal disease, based on saliva metabolites with 81.28% accuracy. This approach can pave the way towards the diagnosis of such cancers from salivary metabolites, a much less invasive method than the current state of the art. It could also be used for many other cancers. We also investigated the impact of various parametric changes on the accuracy of our models. We hope to contribute towards the production of a test that will allow dentists to diagnose the disease quickly and without having to take tissue samples, and to bring some clarity to the process of developing a machine learning model.

Conflict of Interest. There is no conflict of interest to declare.
Smart Guide System for Blind People by Means of Stereoscopic Vision

Jesús Jaime Moreno Escobar1(✉), Oswaldo Morales Matamoros1, Ricardo Tejeida Padilla2, Jhonatan Castañón Martínez1, and Mario Mendieta López1

1 Escuela Superior de Ingeniería Mecánica y Eléctrica, Zacatenco, Instituto Politécnico Nacional, Mexico City, Mexico
[email protected]
2 Escuela Superior de Turismo, Instituto Politécnico Nacional, Mexico City, Mexico

Abstract. The main goal of this work is to develop a system that complements the detection tasks of the white cane by locating objects or obstacles that are hollow, that exceed its detection range, or that are aerial, by means of a stereoscopy system implemented on an embedded Raspberry Pi board. The work consists of obtaining three-dimensional information through two webcams; to represent it, a disparity map filtered by Weighted Least Squares is calculated, and an algorithm based on the averages of three different regions is proposed. In this way, the system makes a decision, which is transmitted to the user through connected headphones. The system alerts the user to avoid collisions, with voice commands indicating the best route around obstacles. The system is mounted on a 3D-printed base with a compartment for a battery, which powers the Raspberry Pi.

Keywords: Computer vision systems · 3D/stereoscopic vision · Embedded Real time system

1 Introduction
In Mexico, 1,292,201 people suffer from blindness, according to a study conducted by the National Institute of Statistics and Geography (INEGI) [1]. The statistics from this census show (Fig. 1) that vision loss is the second leading cause of disability in Mexico. The sense of sight is vitally important for all daily activities; its deficiency or absence imposes a physical and social limitation on the person affected, which makes full integration into the community and its related tasks impossible. The tool most frequently used by people with this condition is the white cane; however, its design is limited in locating objects/obstacles that are hollow, that exceed its detection range, or that are aerial. Using information acquired from a disparity map, it is possible to cover a greater detection range by creating a complement for the white cane.

© Springer Nature Switzerland AG 2021
K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 527–544, 2021. https://doi.org/10.1007/978-3-030-55190-2_39

Fig. 1. Disability statistics in Mexico.

That is why the proposed project is based on the creation and implementation of a stereoscopy system, a tool that complements the white cane when its detection range is exceeded. To do this, we use two webcams connected to a development board. Stereoscopy uses epipolar geometry to model this type of vision, in which information from two different planes is captured at the same instant; to achieve this, the proposed system uses parallel and real-time processing [6]. The system builds a depth map using block matching to obtain information on the distance of objects, and uses a weighted least squares filter to obtain information with less noise. The prototype is worn at chest height, held in place by crossed straps and backpack buckles.

Thus, the objective of this work is the design and construction of an intelligent guide system that complements the tasks of the white cane by locating objects/obstacles that are hollow, beyond its detection range, or aerial, and by indicating routes to avoid them. To this end, existing typhlotechnology systems are analyzed in Sect. 2 to identify opportunities in fields they have not explored and so improve the techniques used in obstacle recognition. In Sect. 3, the elements necessary to create a disparity map in a stereoscopic system are determined in order to obtain three-dimensional information about a scene, and a disparity map is calculated to obtain a graphic representation of the proximity of objects in the scene. The methodology of the proposed design is shown in Sect. 4, where the algorithm is designed to avoid collisions with obstacles using the information in the disparity map. Finally, in Sect. 5, the experimental results of the system used by a volunteer are analyzed in order to check its operating conditions, and the proposed work is compared with other related works.
2 Typhlological Devices

The word typhlotechnology comes from the Greek tiflo, which means blind, and technology, which can be defined as the set of knowledge of an industrial art that allows the creation of artifacts or of processes to produce them [4]. It comprises the set of techniques, knowledge, and resources that help people who are blind or visually impaired to achieve personal autonomy and full social, labor, and educational integration. Typhlotechnology devices are divided into two categories: typhlological devices, which are low-tech, and typhlotechnical devices, which have operating autonomy or help to access information from a computer.

2.1 Low Tech
The most common device is the white cane, which dates back to the 1930s. Although there are several versions of its inventor and origin, one of the most widely accepted is that it was invented by the Argentine José Mario Fallotico [3], to whom the invention is attributed but not the patent, since the latter was obtained by George A. Benham, president of the Illinois Lions Club (USA), who proposed it as a way for blind people to be publicly identified. It is a tool for people who are blind or visually impaired; as a rule it is painted white with a red stripe (see Fig. 2), and it should reach the user's sternum, so its length varies with the user's height. The white cane is held by the handle and placed diagonally toward the floor, so that the tip is approximately one meter from the person; this gives the user an obstacle detection area.
Fig. 2. White folding cane.
On the other hand, the first recorded instance of systematic training of dogs to help blind people occurred in 1780 at the Les Quinze-Vingts hospital for blind people in Paris. Later, in 1788, Josef Riesinger, a blind sieve-maker from Vienna, trained a Spitz (Fig. 3), a watershed that showed the canine ability to guide a human being.
Fig. 3. Representation of the first guide dog.
2.2 With Operating Autonomy
Electronic navigation aids for visually impaired people have grown from adding sensors to white canes to integrating cameras in eyeglasses. These tools try to improve on the white cane and, in some cases, to replace it. Vision replacement systems can be further classified into Electronic Travel Aids (ETAs), electronic guidance devices, and position locating devices. As the name implies, ETAs are devices that help a visually impaired user to travel. ETAs are based on a set of sensors, not only cameras, capable of sensing the environment and identifying possible obstacles along the way. As a result, they provide feedback (through various sensory modalities) to take the user to a particular place or to alert them to an obstacle.

Among ETA devices is the work proposed by Sez et al. in [5], which focuses on the detection of aerial objects, understood in that project as objects such as branches and awnings and, in general, all objects that are impossible to detect with a cane or with the help of a guide dog, thereby supporting those tools. Since the software is designed for the Android operating system, a cell phone is required to use it, and for correct use the phone must be hung from the user's neck with a strap (Fig. 4).

Fig. 4. Use of Smart Phone for the system proposed by Sez et al. in [5].

Another navigation assistance system for the mobility of people with visual impairment was proposed by Xiao et al. in [7]. It integrates real-time location technologies based on events interpreted from social networks with a GPS sensor, pedometer, camera, IMU, and RGB-D sensor, and uses headphones and a vibrating device for system-user interaction.
3 Preliminary Steps to Establish the Proposal

3.1 Capture and Axial Rectification of the Images
Implementing stereoscopy requires a system with two cameras that capture a scene at the same time. With conventional programming, tasks are handled sequentially, so the photographs are taken at different instants. This is why parallel computing, implemented through the POSIX standard, is used: the program is divided into threads, two of which perform the task of taking photos simultaneously. Comparing the cameras under both approaches, with conventional programming the LEDs that indicate the state of each camera (on/off) turn on one after the other, three seconds apart, whereas with threaded programming these indicators turn on at the same time. With conventional programming the photos are also taken at different times, which produces a difference between the images.

To capture simultaneous images using Python threads, the initialization of each camera is assigned to a variable; this assignment takes a USB port number, and in this project the values 0 and 1 are used. This step initializes the cameras. The next step is to define two functions, one for each camera, which are later assigned to the threads. Each function receives one argument, a Queue element, known in programming as a queue, of type LIFO (last in, first out). The function captures an image and assigns it to a variable, which is placed in the queue with the Queue.put method; this process runs in an infinite loop, so that photos stop being taken only when the program stops. The two threads are created with the POSIX standard and are given the capture function and the queue, since the Queue element is used to extract information from a thread. The last step is the creation of a main function, in which the threads are started with the start method and the values they produce are extracted and assigned to a variable for the next process. Figure 5 shows the process used to capture images using parallel computing.

Fig. 5. Flowchart of image capture using threads.

Stereoscopic systems have to be calibrated in order to obtain the intrinsic parameters of the cameras used in the system, as well as their distortion coefficients, so that the 3D environment can be reconstructed from 2D images in a way that approaches reality. Camera calibration yields the following parameters: the vector of vectors of the calibration pattern points, the 3 × 3 camera matrix, the rotation vector, and the translation vector. OpenCV provides dedicated functions to calibrate the cameras and obtain these parameters. The general flowchart shown in Fig. 6 is therefore proposed.
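The threaded capture step described in this subsection can be sketched with the Python standard library as follows. This is a minimal illustration, not the authors' program: the cameras are replaced by a hypothetical `make_fake_camera` stand-in so the example runs without hardware, and a stop event is added so the loop can end cleanly (the text describes an infinite loop). In the real system each `read_frame` would presumably wrap a `cv2.VideoCapture(0)` or `cv2.VideoCapture(1)` object and its `read()` method.

```python
import queue
import threading
import time

def capture_loop(read_frame, out_queue, stop_event):
    """Grab frames and queue them; always queues at least one frame."""
    while True:
        out_queue.put(read_frame())
        if stop_event.is_set():
            break

def make_fake_camera(name):
    """Hypothetical stand-in for a webcam: returns (camera name, timestamp)."""
    def read_frame():
        time.sleep(0.01)  # simulate the time needed to grab a frame
        return (name, time.monotonic())
    return read_frame

def grab_stereo_pair(duration=0.1):
    stop = threading.Event()
    # LifoQueue: get() returns the most recent frame first (LIFO, as in the text).
    q_left, q_right = queue.LifoQueue(), queue.LifoQueue()
    workers = [
        threading.Thread(target=capture_loop,
                         args=(make_fake_camera("left"), q_left, stop)),
        threading.Thread(target=capture_loop,
                         args=(make_fake_camera("right"), q_right, stop)),
    ]
    for worker in workers:
        worker.start()
    time.sleep(duration)   # let both cameras run in parallel for a while
    stop.set()
    for worker in workers:
        worker.join()
    return q_left.get(), q_right.get()

left, right = grab_stereo_pair()
```

Note that `queue.Queue` in Python is first in, first out; `queue.LifoQueue` is used here because the text specifies a LIFO queue, which also means the newest frame is the one handed to the next processing stage.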
Fig. 6. Flowchart of the image calibration.
The calibration stage is performed independently of the application, and the images it reads are taken specifically to obtain the desired parameters: they are captured with the image capture program and show an irregular chessboard of dimensions 9 × 6. Once the parameters have been obtained, these images are no longer needed in the main program. Once calibration is complete, it does not need to be repeated as long as the system cameras are not replaced; otherwise, the cameras must be recalibrated. The output variables of this calibration process are:

ret: the left and right rotation vectors, given by Eqs. 1 and 2, respectively.

$$
\mathrm{ret}_l = \begin{bmatrix} 0.0163901336 \\ -0.85345944 \\ 0.0093444958 \\ -0.002803522 \\ 64.4986675 \end{bmatrix} \tag{1}
$$

$$
\mathrm{ret}_r = \begin{bmatrix} 0.0164729316 \\ -1.15816876 \\ 0.0130564647 \\ -0.0116020930 \\ 44.1493876 \end{bmatrix} \tag{2}
$$

K: the camera matrices of the left and right cameras, of dimensions 3 × 3, given by Eqs. 3 and 4, respectively.

$$
K_l = \begin{bmatrix} 1440.24816 & 0.00000000 & 501.586786 \\ 0.00000000 & 1443.15894 & 493.623155 \\ 0.00000000 & 0.00000000 & 1.00000000 \end{bmatrix} \tag{3}
$$

$$
K_r = \begin{bmatrix} 1447.39686 & 0.00000000 & 499.525286 \\ 0.00000000 & 1451.67547 & 488.456999 \\ 0.00000000 & 0.00000000 & 1.00000000 \end{bmatrix} \tag{4}
$$

3.2 Estimation of the Disparity Map
Since the rectified images are used in this project (with α = 1), the next step is to create the disparity maps; two are created, one for the right side and one for the left side. The parameters for calculating the disparity map, as well as other important variables, are set by means of the cv2.StereoSGBM_create() function. This function is a modification of the algorithm of Hirschmuller [2], which uses the SGM (Semiglobal Matching) method; the underlying idea is pixel-wise matching based on mutual information and the approximation of a global 2D smoothness constraint by combining several one-dimensional constraints. The parameters are assigned so that the disparity map is calculated by blocks; the function in the main program uses the following parameters:

– Minimum disparity: −1. The minimum disparity value, normally zero, but rectification algorithms can shift the images, so it must be adjusted accordingly.
– Maximum disparity: 32. Must be greater than zero and divisible by sixteen.
– Uniqueness ratio: 15. The margin, in percent, by which the best (minimum) computed cost value must beat the second-best value for the match to be considered correct.
– Speckle window size: 200. The maximum size of smooth disparity regions in which noise speckles are considered and invalidated.
– Speckle range: 32. The maximum disparity variation within each connected component.
– Prefilter threshold: 23. The cut-off value for prefiltered image pixels. The algorithm first calculates the x derivative at each pixel and clips its value to the interval [−λ, λ].
– Maximum difference allowed: 11. The maximum allowed difference, in pixels, in the left-right disparity check; entering a negative number disables the check.

Setting all of these parameters gives greater control over the disparity map; if the quality of the map is not so important, only the Maximum Disparity and Block Size parameters need to be set. The disparity maps obtained in the first instance are shown in Fig. 7. Individually, they give results that need an intermediate refinement or rectification step before subsequent processes, since none of the maps correctly shows the silhouette of any object contained in the image (Fig. 7(c) and Fig. 7(d)). In order to obtain refined disparity maps, a weighted least squares (WLS) filter is applied to both maps.
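The parameter list above can be expressed as keyword arguments for `cv2.StereoSGBM_create`. The sketch below is illustrative only: the mapping from the parameter names in the text to the OpenCV keyword names is an assumption based on their descriptions, the block size is not stated in the text and is therefore left at the OpenCV default, and the helper names `check_sgbm_params` and `build_left_matcher` are hypothetical.

```python
# Values taken from the parameter list in the text; keyword names are those
# accepted by OpenCV's cv2.StereoSGBM_create (the name mapping is an assumption).
SGBM_PARAMS = {
    "minDisparity": -1,        # Minimum disparity
    "numDisparities": 32,      # Maximum disparity (must be > 0 and divisible by 16)
    "uniquenessRatio": 15,     # Uniqueness ratio, in percent
    "speckleWindowSize": 200,  # Speckle window size
    "speckleRange": 32,        # Speckle range
    "preFilterCap": 23,        # Prefilter threshold
    "disp12MaxDiff": 11,       # Maximum difference allowed in the left-right check
}

def check_sgbm_params(params):
    """Validate the one constraint stated explicitly in the text."""
    n = params["numDisparities"]
    if n <= 0 or n % 16 != 0:
        raise ValueError("numDisparities must be positive and divisible by 16")
    return True

def build_left_matcher(params=SGBM_PARAMS):
    """Create the left matcher; requires OpenCV to be installed."""
    import cv2  # imported lazily so the parameter check runs without OpenCV
    check_sgbm_params(params)
    return cv2.StereoSGBM_create(**params)
```

With OpenCV available, `build_left_matcher().compute(left_gray, right_gray)` would return the raw left disparity map, where `left_gray` and `right_gray` stand for the rectified grayscale views.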
Fig. 7. Stereoscopic images. (a–b) Original View and (c–d) Disparity Maps.
3.3 Filtering and Rectifying the Disparity Map
The disparity map filter is based on the WLS filter in its Fast Global Smoother form, which is much faster than traditional WLS filter implementations, with optional use of a confidence measure based on the consistency of the left and right disparity maps in order to obtain hole-free disparity maps. This filter also has adjustable parameters; in this project, λ = 80000 and σ = 1.5 are used. The disparity maps calculated in Fig. 7 are rectified; the result for the left view is shown in Fig. 8, in which the details of the three-dimensional scene can be better appreciated. The same applies to the right view. Note that the original view is used to guide the filtering process, and that the disparity map is automatically scaled up to match the resolution of the original view.

Fig. 8. Rectified disparity map.

Rectification and filtering do not remove all background noise, so two additional steps are performed: a morphological transformation and a thresholding of the disparity map. The morphological transformation applied is Closing, which closes black dots. Although such dots are not visible in Fig. 8, because they may be only a few pixels scattered across the map, the system would include them in subsequent processes, affecting correct decision making. Morphological transformations are simple operations based on the shape of the image. They need two inputs: the original image and a structuring element, or kernel, which determines the nature of the operation. To close the dots, dilation is applied, followed by erosion, which reduces background noise and results in Fig. 9. Figure 10 gives a better view of the difference between the filtered map and the map after adding the Closing transformation; the red ellipses mark regions on the map with the filter only. In the left part, the black regions disappear and become gray regions; in the upper right part, the corners are softened; and in the lower right part, both effects occur. The result is a visually more detailed disparity map of the scene.

Neither calibration nor the WLS filter removes all the noise, since the operations that build the disparity map generate noise because of their processing speed. To solve this problem, a variant of binary thresholding is used. In binary thresholding, each pixel of the image becomes 0 or 255 depending on the established intervals; here the same principle is applied, except that pixels equal to or greater than 224 are set to zero and the rest keep their values. The minimum detection distance on the disparity map is 1 m: objects or people one meter away have pixel values below 224, and values above this were found to be noise.
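The thresholding rule described above (values of 224 or more are set to zero, all other values are kept) can be sketched in NumPy as follows. The function name `suppress_near_noise` and the small demo array are illustrative assumptions, and an 8-bit disparity map is assumed.

```python
import numpy as np

def suppress_near_noise(disparity, cutoff=224):
    """Set pixels >= cutoff to 0 and keep every other value unchanged."""
    disparity = np.asarray(disparity)
    return np.where(disparity >= cutoff, 0, disparity).astype(disparity.dtype)

# Tiny 8-bit example: values 224, 240 and 255 are treated as noise.
demo = np.array([[0, 100, 223],
                 [224, 240, 255]], dtype=np.uint8)
cleaned = suppress_near_noise(demo)
```

For 8-bit images, the same effect should be obtainable with OpenCV's `cv2.threshold` using the `cv2.THRESH_TOZERO_INV` mode and a threshold of 223, since that mode zeroes every pixel strictly above the threshold.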
4 Proposal: Intelligent Guidance System for Blind People
The fundamental proposal of this work is a Smart Guide System for Blind People (SGSBP). To determine whether any obstacle is present in front of the user, the disparity map, which is 240 pixels high and 288 pixels wide, is segmented into three sections: central, left and right. The central section spans pixels 0 to 240 in height and pixels 109 to 179 in width; this section was estimated by taking the dimensions of a door and observing how many pixels it occupied at a distance of two meters.
Smart Guide System for Blind People
Fig. 9. Disparity map with filter and closing points.
Fig. 10. Comparison of filtered disparity maps (left) and rectified (right).
The right and left sections have the same dimensions and are used to determine which section offers the greater possibility of passage; the dimensions of these sections are shown in Table 1.
Table 1. Segmentation of the disparity map.
Section   Width (pixels)   Height (pixels)
Central   109 to 179       0 to 240
Left      30 to 178        0 to 240
Right     179 to 288       0 to 240
Fig. 11. Region without disparity map information.
The original disparity map is shown in Fig. 11, in which a region without information can be observed, extending from pixel (0, 240) to pixel (30, 240). This loss of information occurs because the depth map is calculated by block matching and some information is not available in both images. This region is highlighted with a red box. Figure 12 shows the segmentation performed; for this, a variable holding the coordinates of the regions of interest is used, and the NumPy library facilitates their selection thanks to its array handling. Once the disparity map is segmented, the average of each section is calculated. These averages are used because a disparity map represents how near or far objects are: nearby objects have values close to 255, while objects far from the camera have values close to 0. The average of each section therefore indicates which section is most likely to be passable, since its value will be closer to 0. A decision is then made with a series of conditionals; the decision-making process is shown in Fig. 13. This stage has intermediate steps, namely the creation of counters, since the process is performed 10 times before a decision is made, because the disparity map sometimes contains noise that can cause an incorrect decision. Implementing these counters reduces the error. To determine that there is no obstacle, the average should be less than 120, which is the average value for 3 m without obstacles in the scene captured by the disparity maps. If the three averages are greater than this threshold, a counter is incremented, and if this counter is greater than the other counters, the system announces whether or not passage is possible.
Fig. 12. Segmentation of the disparity map.
Fig. 13. Segmentation diagram of the disparity map.
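The averaging and counter-based decision described in this section can be sketched as follows. This is a minimal sketch, not the authors' code: the column ranges are taken from Table 1 as printed, the threshold of 120 and the 10 repetitions come from the text, and the per-frame rule in `vote` (prefer the central section, otherwise the freer side) is a hypothetical reading of the conditionals in Fig. 13. Plain lists are used so that the sketch has no dependencies; with NumPy a section is simply a column slice of the array.

```python
from collections import Counter

HEIGHT, WIDTH = 240, 288
# Column ranges as printed in Table 1 (every section spans all rows).
SECTIONS = {"central": (109, 179), "left": (30, 178), "right": (179, 288)}
FREE_THRESHOLD = 120  # from the text: a mean below 120 means about 3 m free


def section_means(disparity):
    """Mean disparity of each section; `disparity` is a list of pixel rows."""
    means = {}
    for name, (start, stop) in SECTIONS.items():
        values = [v for row in disparity for v in row[start:stop]]
        means[name] = sum(values) / len(values)
    return means


def vote(means):
    """One per-frame vote (hypothetical rule): prefer the central section."""
    if means["central"] < FREE_THRESHOLD:
        return "central"
    side = min(("left", "right"), key=lambda name: means[name])
    return side if means[side] < FREE_THRESHOLD else "blocked"


def decide(frames):
    """Majority vote over a batch of frames, to damp noisy disparity maps."""
    counters = Counter(vote(section_means(frame)) for frame in frames)
    return counters.most_common(1)[0][0]


def make_frame(central, left, right):
    """Synthetic frame with a constant value in each band of columns."""
    row = [0] * WIDTH
    row[30:109] = [left] * (109 - 30)
    row[109:179] = [central] * (179 - 109)
    row[179:288] = [right] * (288 - 179)
    return [row] * HEIGHT  # rows are shared, which is fine for read-only use


clear = make_frame(central=50, left=200, right=200)
blocked = make_frame(central=200, left=200, right=200)
# Ten frames per decision, as in the text; two noisy frames are outvoted.
decision = decide([clear] * 8 + [blocked] * 2)
```

Voting over ten frames means a few noisy disparity maps cannot flip the announced direction on their own.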
5
Experimental Results
5.1
Initial Conditions
In this section, an experiment is carried out with 10 different people to quantify the experience of using the system. The test consists of going through a circuit 10 times while blindfolded; its purpose is to test the efficiency of the system, which aims to avoid collisions with obstacles. Likewise, different functional tests are performed to check the system's parameters; these tests are aimed at obtaining information about the system and its limitations. Stereoscopy is a technique for obtaining three-dimensional information by calculating the disparity between two cameras, which yields a representation of the proximity of objects in a scene. The objective of the present experiment is to guide a person from one point to another without any collision, indicating to SGSBP users by means of an audio aid the best route to avoid unwanted contact with any object. The materials necessary for this experiment are:
– Red scarf,
– Volunteer,
– Smart Guide System (this proposal),
– Headphones with 3.5 mm jack, and
– School chairs.
In Fig. 14(a), the materials used in the experiment can be seen; the scarf covers the user's eyes, and the system is placed on the user's torso, with the fasteners pulled to adjust it to each person.
5.2
Functional Test
To carry out the experiment, it was necessary to prepare the circuit (Fig. 14(b)). The school chairs are placed randomly; it is important that users do not know the location of the obstacles. Before starting the experiment, it is essential to calibrate the cameras; the radial and tangential distortion were corrected using a board with black and white squares. At the beginning of the experiment, the system (SGSBP) is placed at chest level, which gives the best operating range for this system, and the scarf is placed over the eyes so that the person cannot see (Fig. 14(c)). To start the experiment, the system is powered on to initialize the program, and the user then has to complete the circuit. Figure 14(d) shows a user of the system traveling the circuit; the experiment is intended to demonstrate the efficiency and viability of the system. The experiment verified that the system avoided collisions. The algorithm is based on an average of regions; consequently, it only indicates the best route to avoid colliding with an obstacle, not the best route of travel. That is, on specific occasions it indicated a route that avoided contact with the chairs but diverted the user from the course of the circuit. When conducting the
Fig. 14. (a) Materials required in this experiment, (b) Circuit of the experiment, (c) Preparation for the experiment and (d) Volunteer performing the circuit.
experiment with people without visual impairment, the participants are not adapted to an environment without sight, so some people did not walk in a straight line because they lacked the sense of sight. The results shown in Fig. 15 indicate that stature influences the use of the system because of the cameras: a user taller than 1.75 m loses information due to the image capture range. That is, the camera obtains less information from an object as it approaches, and when the camera is placed on a user taller than this height, it loses information below the torso more easily than with shorter users.
Fig. 15. Collisions.
Moreover, the results obtained in this experiment show that 92% of the users of the system avoided the obstacles entirely, while 8% had some type of collision. This is due to the near point of the system, which is 30 cm: when the system is turned on, an object less than 30 cm away is not detected. Likewise, height is a parameter that modifies the user experience, and it was observed that people taller than 175 cm obtain a better experience when wearing the system below the default zone of use.
5.3
Comparison with Related Works
The present work is implemented in an embedded system, i.e. a device for a specific use, so that the hardware and software resources are fully exploited. SGSBP uses disparity maps to obtain three-dimensional information, which improves the results obtained, since this technique yields more detailed information. The proposed algorithm uses the calculation of averages, unlike the histograms used in SMAUDVMCC. By using two webcams to create a stereoscopic system, it is possible to control the camera parameters; that is, there is no limitation imposed by the manufacturers of 3D sensors. The Mechatronic System to Help Users with Visual Disabilities While Walking and Running (SMAUDVMCC), the Aerial Obstacle Detection with 3-D Mobile Devices (DAODM), and the SGSBP are not all embedded: the first two use embedded systems, namely the Raspberry Pi 3 B+ and the Olimex A20-OLinuXino-MICRO, respectively, and the last two use general-purpose devices, such as a smartphone and a laptop. Nevertheless, they have similar features (Table 2): the technique they use is computer vision, they use cameras, and they are outdoor systems. The SGSBP combines the functionalities of each system, with the aim of helping users reduce their dependence.
Table 2. Comparison with related works.
Name of the   Outdoor  Detection of  Detection of  Detection of  Detection  Embedded system
project       use      aerial        terrestrial   holes         of lines   (dedicated)
                       obstacles     obstacles     (potholes)
SMAUDVMCC     Yes      No            No            No            Yes        Yes
DAODM 3-D     Yes      Yes           No            No            No         No
MNAPDV        Yes      No            No            No            No         No
SGSBP         Yes      Yes           Yes           Yes           Yes        Yes
6
Conclusions
In the present work, an algorithm was proposed to detect obstacles (people, tables, desks, posts, etc.) and evade them with simple and concise indications. The SGSBP was tested with 10 people of different sexes and statures. The contributions made, as well as possible future work, allow further development of the SGSBP based on the current status of the investigation. In addition, this guide system was implemented on a Raspberry Pi 3B+ development board and complements the white cane in detection tasks, locating objects or obstacles that are hollow or aerial and lie beyond the cane's detection range, through a stereoscopy system implemented in an embedded system. The technique used to obtain information in this system is based on stereoscopy, calculating a disparity map and using image processing to obtain more detailed information. In addition, the results of an experiment with 10 test subjects were analyzed, in which 92% of the users avoided all obstacles in a circuit. It was observed that the height of the users is important when using the system, since people taller than 178 cm had some collisions, in addition to a loss of information in the lower part of the body; for a better experience, these users should wear the system below the chest. It is concluded that the proposed algorithm is efficient at avoiding obstacles and that the system is sensitive to the height of the users. Possible future work includes:
– Implementing a GPS module with voice instructions, so as to indicate travel routes while avoiding obstacles using the disparity map information.
– Exploiting the fact that disparity maps used for depth calculation make the morphology of objects in a scene visible, so that object recognition can be implemented with this information without using an object recognition system based on cascade filters.
– Implementing the algorithm in an embedded system with greater processing capacity, achieving an optimal response by adding more than one iteration.
– Improving the object detection algorithm.
– Porting the entire application to a language that can execute the instructions faster than Python; it is proposed to move the application to C++.
Acknowledgment.
This article is supported by the National Polytechnic Institute (Instituto Politécnico Nacional) of Mexico by means of Project No. 20190046 granted by the Secretariat of Research and Postgraduate (Secretaría de Investigación y Posgrado), National Council of Science and Technology of Mexico (CONACyT). The research described in this work was carried out at the Superior School of Mechanical and Electrical Engineering (Escuela Superior de Ingeniería Mecánica y Eléctrica), Campus Zacatenco. It should be noted that the results of this work were carried out by Bachelor Degree students Jhonatan Castañón Martínez and Mario Mendieta López.
References
1. Instituto Nacional de Estadística y Geografía. Archivo situacionista hispano (2013). https://www.inegi.org.mx/temas/discapacidad/
2. Hirschmuller, H.: Stereo processing by semiglobal matching and mutual information. IEEE Trans. Pattern Anal. Mach. Intell. 30(2), 328–341 (2008)
3. Rojas Olaya, C.R.: Implementación de un bastón detector de obstáculos elevados para personas invidentes. Technical report, Universidad de los Llanos (2016)
4. Sánchez, J.C.: La tecnología. Editorial Díaz de Santos, S.A. (2012)
5. Sáez, J.M., Escolano, F., Lozano, M.A.: Aerial obstacle detection with 3-D mobile devices. IEEE J. Biomed. Health Inform. 19(1), 74–80 (2015)
6. Shah, M.: Fundamentals of Computer Vision. RA-MA S.A. Editorial y Publicaciones (1997)
7. Xiao, J., Joseph, S.L., Zhang, X., Li, B., Li, X., Zhang, J.: An assistive navigation framework for the visually impaired. IEEE Trans. Hum.-Mach. Syst. 45(5), 635–640 (2015)
An IoMT System for Healthcare Emergency Scenarios
Tomás Jerónimo1, Bruno Silva1,2(B), and Nuno Pombo1
1 Instituto de Telecomunicações, Departamento de Informática, Universidade da Beira Interior, Rua Marquês d’Ávila e Bolama, 6201-001 Covilhã, Portugal [email protected]
2 Universidade Europeia, IADE, Av. D. Carlos I, 4, 1200-649 Lisbon, Portugal
Abstract. Today, almost every country around the globe has a national emergency service. These services typically offer, through telephone calls, access to medical services that range from advice to on-the-spot response to hazardous situations. These emergency call systems have become very popular, and the proliferation of mobile devices and mobile networks has also facilitated the use of these services. Emergency responders face several challenges when trying to give an accurate and, above all, rapid response. One of the main problems is the collection of the citizen’s information, location and context. This is a crucial process for understanding the severity of a citizen’s state and effectively determining what response should be given. This paper presents an IoMT solution for medical emergency call scenarios, proposing a ubiquitous approach to patient data collection and response. The solution consists of a mobile application for emergency calls that collects patient location and health data from a wearable device (smartwatch) and a web application for emergency responders that interacts with patients with the aid of voice analysis and recognition. The proposed IoMT system and integrated solutions were validated, in terms of both features and communication, through a series of experiments on real devices over a Wi-Fi network.
Keywords: IoMT · Decision support systems · ehealth · Emergency · Mobile application · Sensors
1
Introduction
© Springer Nature Switzerland AG 2021 K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 545–558, 2021. https://doi.org/10.1007/978-3-030-55190-2_40
The efficacy of emergency medical services is challenging for each person involved: the caller urgently seeks immediate help, whereas the emergency dispatcher needs to obtain timely and comprehensible information to make meaningful life-and-death decisions with confidence [3]. On the one hand, decision making in emergency medical services is always interconnected, and thus decisions taken in one step may affect decisions in subsequent steps. On the other hand, time is crucial, since every moment of delay can significantly reduce
T. Jerónimo et al.
a victim’s chances of survival. In addition, the victim may be overcome by stress, concern, or uncertainty, and thus their actions are affected by these emotions, which may compromise the emergency dispatcher’s perception of the real situation. Moreover, professionals engaged in emergency medical services work intensely, and their actions, if wrong, may cost lives. For that reason, in recent years computerised systems have been increasingly adopted to support human decision making in emergency medical services by providing healthcare personnel such as physicians and nurses, and/or patients, with timely, additional knowledge [16]. These systems aim to collect person-specific information, to filter and analyse it intelligently, and to provide, desirably, useful and timely clinical feedback. Firstly, the data may be obtained from a multitude of sources, including actuators and sensors. Secondly, the capability to transmit information to either mobile or remote devices and/or systems is a must. Thirdly, data processing and analysis may require either large cloud storage or high computational performance due, for example, to the introduction of complex artificial intelligence models [8]. In line with this, the Internet of Medical Things (IoMT) is a booming concept, since it bridges the gap between the physical and digital worlds. It is more than an assortment of medical devices and applications that connect to clinical systems through online computer networks. In fact, IoMT may empower clinical systems with the ability to manage invaluable additional data and to provide enhanced clinical insight to support decision making [14]. Anytime, anywhere connectivity combined with intelligence may boost clinical systems to meet the above-mentioned requirements. In other words, IoMT is not a limiting factor in rethinking health; rather, it can be adopted as a new trend for disrupting current health practices.
However, clinical systems face additional challenges when applied to the emergency field. On the one hand, the emergency dispatcher needs to rapidly recognise the caller’s life-threatening condition. On the other hand, call-takers prioritise establishing the location of the emergency, whereas callers frequently prioritise describing the who, what, and when of the emergency [7]. Moreover, the desideratum of adopting a holistic outcome-based approach for emergency medical services, which should be designed as a methodology that details all decisions and clinical reports related to a caller, is understudied and/or undeveloped [1]. With this in mind, we propose a computerised emergency system based on IoMT principles, aiming to empower the emergency dispatcher with the collected information on all previous events involving the caller. The main contributions of this work are highlighted as follows:
– Implementation of voice recognition methods aiming to enhance the context-aware information in emergency scenarios;
– Real-life scenario experiments to support the validation of the proposed system;
– Proposal of a reliable IoMT system based on ubiquitous devices (mobile and wearable) and web services.
The remainder of this paper is organized as follows. Section 2 presents a review of the related work on emergency systems and IoMT solutions. In Sect. 3, the IoMT system and architecture are presented. Section 4 presents the system evaluation based on several experiments. Finally, Sect. 5 concludes the paper and presents future work.
2
Related Work
Recent trends such as interoperability and intelligent systems have replaced the old-fashioned models for delivering emergency medical systems. In [4], the authors proposed an Android-based application integrated with the Philippines’ 911 emergency service. This system not only provides the data immediately needed by the dispatcher to respond to an emergency without asking the caller directly, but also obtains the caller’s location by means of geolocation. In addition, in [12] the authors highlighted the description of the location as a critical feature, since it was observed that place reformulations may arise from misspeaking or recipient recognition. Congruently, it makes little difference whether the ambulance takes a very long time to arrive or goes to the wrong location. Similarly, in [18] the authors analysed a random selection of emergency calls and found that linguistic variations in the way the scripted sentences of a protocol are delivered can have an impact on the efficiency with which call-takers process emergency calls. In [19], the authors proposed an information system focused on car crash emergency and trauma response. The main goal was to combine data from 911 communications, ambulance response, and trauma data systems as the cornerstone of clinical decision making. In [6], the authors proposed an ontology to collect performance metrics on the emergency information system, including the celerity of operator dispatching and ambulance arrival, and the sufficiency of healthcare personnel on duty at the receiving hospital. In [23], the authors presented a framework for the design of emergency information systems. This framework addressed first responders and decision-making support for command and control personnel. Moreover, the authors in [2] implemented a criteria-based dispatch protocol for assessing calls, guiding decisions about the emergency level and determining the appropriate responses based on a machine learning framework developed by a private company.
In [9], it is concluded that providing medical dispatchers with visual information from the location might improve their understanding of the caller’s scenario and might enhance communication and their ability to guide the medical support. In [13], the authors conducted a three-year examination of patients who frequently call 911, in terms of diagnosis, complaints, and reasons for calling the emergency service. Selected participants received twice-weekly in-home visits by healthcare personnel, who delivered health education coaching and performed routine health screenings. The observed results revealed that study participation led to improved quality of life and reduced emergency episodes. Despite the important advances in recent years, the accuracy of emergency medical systems is susceptible to being affected by the absence of vital information on a multiplicity of topics, as described in Table 1.
Table 1. Risks and mitigation proposals raised by the absence of vital information in emergency medical systems.
Absence of: | Risks | Mitigation proposals
Precise location | – Disparities in response time – Delays that may impact victim outcomes | – Use of the GPS existing in the victim’s mobile device (when applicable) – Use of visual information
Appropriate language | – Slight variations in phrasing and delivery may escalate into serious communication difficulties [18] | – Use of the present perfect rather than the past tense
Context-aware data | – Proneness to cause insufficient coordination – Inadequate clinical practices | – Provide clinical collection information – Promote periodic in-home visits to deliver health education coaching, and to perform health screenings
Adequacy of system requirements | – Lack of readiness for emergencies – Uncertain quality of care | – Use of development frameworks – Use of metrics for system evaluation
Furthermore, given the expected ageing of the population during the next decades, services capable of providing flexible applications that enable the elderly to communicate their needs to others in reasonable time are promising. In line with this, it is also important to adapt emergency medical systems and services to the ageing population in order to accommodate multiple possible cases, from help with physical conditions to life-threatening cases, on a 24 h basis [20,25]. On the other hand, IoMT may support health providers by reducing various monotonous tasks and/or sending alerts in case of emergency [24]. In [11], the authors proposed an IoMT environment for healthcare purposes based on a variety of sensors (e.g. accelerometer, temperature) and wearable devices (e.g. smartwatch). Congruently, in [10] an IoMT system is proposed to support the clinical diagnosis process. In [15], the authors highlighted the energy drain observed in the tiny, resource-constrained devices in IoMT, proposing an ON-OFF algorithm to cope with that challenge. Similarly, the authors in [17], also focusing on energy expenditure, implemented an IoMT system that employs autonomous mobile chargers equipped with wireless energy transfer technology to support sensor nodes’ recharging requests. In [5], the authors designed an IoMT system to predict failures in medical devices before they occur, so that a maintenance strategy may be defined in advance, which may lead to a reduction in cost and effort. In contrast, the authors in [22] proposed an IoMT system to collect additional clinical information from patients and thus enhance medical decision making. Moreover, in [21] the authors developed an IoMT system combined with a cloud to monitor elderly people in their daily life activities.
3
Methods
This section presents the implementation and development methodology used in the construction of the IoMT system for medical emergency scenarios. This
solution’s main goal is to respond to and facilitate the communication between users and the emergency services, regardless of their context or location. The following subsections elaborate on the conceptual design of the IoMT system, the technologies used, and the implementation and development details.
3.1
System Architecture Design
The system architecture is presented in Fig. 1. It consists of a smartphone application, a smartwatch service and a web solution for health professionals. All these solutions are connected to the Internet and communicate through a cloud computing architecture. Both the smartphone and the smartwatch communicate with the cloud services through Wi-Fi or mobile communication. Therefore, this system is able to monitor users’ health status and location and send this information in real time to healthcare professionals. In a medical emergency scenario in which the user is unable to contact the emergency services unaided, the system will contact them automatically. The IoMT solution also includes a voice recognition application that enables faster reaction and action from the health professional. This application can also be used by a family member or through a third-party client, such as a medical emergency service operator.
Fig. 1. System architecture of the proposed IoMT system
The central component of the system architecture is the cloud service module, which is responsible for receiving, validating and delivering messages. These web services act as a data pipeline between users and healthcare professionals. The data are stored on a remote server and database that is also connected to the cloud service module.
3.2
Mobile Applications
The workflow diagram of the mobile application and its integration with the IoMT ecosystem is depicted in Fig. 2. The system includes two user mobile applications: 1) a smartphone application with a simple and intuitive user interface,
only for contacting the caregiver; and 2) a smartwatch service, with no user interface, for enhancing the context and location services, but especially to collect and transmit health data in real time.
Fig. 2. Activity diagram depicting the integration of the mobile application and smartwatch with the cloud web server
The smartwatch application is the cornerstone of continuous data collection and delivery, which may include the user’s heart rate and body temperature. These data are acquired through sensors integrated in the smart device and stored directly in the remote database. This enables the detection of abnormal values that may trigger an emergency request and thus automatically initiate remote communication between the user and the caregiver. The smartphone application, shown in Fig. 3, allows the user to log in to a personalized account and to create and edit personal and health information (Fig. 3 b)). This information, such as the user’s age, health condition and medical history,
is of great value to the caregiver in medical emergency scenarios. Otherwise, these data would need to be obtained during the emergency call, and collecting them accurately in a stressful scenario is an enormously time-consuming task. The mobile application may also provide the user’s location in real time, taking advantage of the GPS included in the majority of modern smart mobile devices.
Fig. 3. Mobile application main and edit user data activities
The application presents an easy and intuitive interface, as may be seen in Fig. 3 a). In an emergency scenario, the user only needs to press the button in the center of the application to start an emergency call. Every call triggers two actions: first, a request to the cloud server, notifying it that an emergency call has been initiated, and second, the call itself (using mobile communications) to the predefined cell number.
3.3
Caregiver Web Application
The web application used by emergency operators or informal caregivers (friends or family members) is presented in Fig. 4. On the left side, real-time user, context and location information is presented, whereas the opposite side encompasses stored health records and the operator’s ability to create additional records. The ultimate goal of this application is to provide, in real time, the
most relevant and personalized information according to each emergency scenario. The system aims (1) to reduce the time that an operator requires to evaluate the conditions, (2) to prioritize the emergency call and (3) to deploy the appropriate measures, such as rescue teams.
Fig. 4. Caregiver web application screen presenting the user real time data and results from the voice recognition algorithm
Figure 5 presents the workflow diagram representing the actions triggered when an emergency call is initiated. The caregiver application immediately shows the user’s heart rate, body temperature and real-time location. This provides the operator with basic context and location information about the user’s situation and health status before starting the conversation and the evaluation of the emergency scenario. Moreover, during the call, the operator’s voice is processed using a speech recognition algorithm developed for this purpose. The main goal of the voice recognition system is to provide previous health, context and location information about the user. When the emergency call ends, the operator is required to give a summary of the situation, which is sent to the cloud server and added to the user’s records.
3.4
Voice Processing and Identification
The main goal of the voice processing feature is to automatically collect and present to the caregiver the victim’s existing stored health records. For instance, if the caller is not in a condition to answer questions and the caregiver asks about known allergies, the system will automatically fetch this information based on past episodes. During the call, the operator’s voice must be processed to detect requests for new information. For this, a speech-to-text JavaScript API was used to convert the operator’s voice into text that can be processed by the application. Figure 6 presents a visual representation of the voice recognition algorithm. After the
Fig. 5. Activity diagram depicting the process of user data in the cloud web server
operator’s voice is detected, the input is processed and converted to text. The voice inputs are converted into transcripts for as long as voice input is being detected. When voice input is no longer detected, a final transcript is created from the collection of all the converted transcripts. The final transcript is then divided into words, and each word is compared against a dictionary of keywords. Each keyword in the dictionary is linked to the victim’s health records, and when a match occurs the keyword ID is added to an array. All entries in that array are used to compose the caregiver’s interface with the victim’s structured health information. Finally, the array is cleared.
Fig. 6. Architecture for voice processing algorithm
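The keyword-matching stage of this algorithm can be sketched as follows. The sketch is written in Python for illustration only, whereas the paper reports using a JavaScript speech-to-text API; the keyword dictionary, the record IDs and the function names are all hypothetical, the speech-to-text step is assumed to have already produced the partial transcripts, and skipping duplicate matches is an added assumption.

```python
import re

# Hypothetical dictionary: keyword -> ID of the linked health-record field.
KEYWORDS = {
    "allergies": "allergy_records",
    "breathing": "respiratory_records",
    "medication": "medication_records",
}


def match_keywords(partial_transcripts, keywords=KEYWORDS):
    """Return the record IDs whose keywords occur in the final transcript."""
    # The final transcript is the collection of all partial transcripts.
    final_transcript = " ".join(partial_transcripts)
    # Divide it into lowercase words, ignoring punctuation.
    words = re.findall(r"[a-z']+", final_transcript.lower())
    matched = []
    for word in words:
        record_id = keywords.get(word)
        # Assumption: a record already matched is not added twice.
        if record_id is not None and record_id not in matched:
            matched.append(record_id)
    return matched


def compose_interface(matched_ids, health_records):
    """Collect the stored entries to display, then clear the match array."""
    shown = {rid: health_records.get(rid, []) for rid in matched_ids}
    matched_ids.clear()  # the array is emptied once the interface is built
    return shown


ids = match_keywords(["Are you having trouble breathing?",
                      "Do you have any allergies?"])
panel = compose_interface(list(ids), {"allergy_records": ["penicillin"]})
```

Exact word matching, as used here, would miss inflected forms such as "allergy"; whether the original system handles them is not stated in the text.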
4
Performance Evaluation
The integrated solutions described above, which implement the IoMT system for healthcare emergency scenarios, were deployed in a controlled pilot scenario with real users. The main goal was to evaluate the viability of the system, especially the performance of the voice recognition algorithm. The evaluation included five male users, five male caregivers, five female users and five female caregivers, between 19 and 23 years old. The aim was to evaluate the solution with several personas who contribute different data, in particular different voice tones. The pilot consisted of two different emergency scenarios: 1) an emergency call where the user is having a breathing disorder; and 2) an emergency call where the user is having muscle pains. Two scripts were created, one per scenario, to simulate the caregiver’s behaviour in the two emergency calls. With each script two main variables were measured: 1) the number of keywords detected by the algorithm; and 2) the number of sentences that were converted entirely without error. Each script was read ten times by each participant, and the observed results are presented in the following graphs. In the first scenario the objective was to simulate an emergency call where the user is having some trouble breathing. The following script was used in this scenario: “Emergency service, how can I help you? Are you having trouble breathing? Try sitting in a comfortable position. Do you have any allergies? The emergency services were sent to your location.” The results of this test are presented in Fig. 7:
[Bar chart of successes and failures. Keywords: 200 success, 0 failure. Sentences: 300 success, 200 failure. Count axis: 0 to 300 in steps of 50.]
Fig. 7. First script performance evaluation results
The results of this test show that every keyword was successfully detected in every reading of the script and that 60% of the sentences were converted entirely without error. During the readings it was also observed that the majority of the errors occurred at the beginning of each sentence. This suggests that there was a delay between the detection of voice input and the start of its processing. Ambient noise was also a major factor in the processing of the voice input: it delayed the process and generated some errors.
The second script aimed at simulating a situation where the user was feeling some muscle pain, which could be connected to his history of diabetes. Each pilot participant read the script ten times. The following script was used in this scenario: “Emergency services, how can I help you? Is the pain you are feeling a muscle pain? Do you feel tired? Do you have previous records of diabetes? What is your blood type? How tall are you? How much do you weight? The emergency services were sent to your location.” The results of these readings are presented in Fig. 8:
[Bar chart of successes and failures. Key words: 400 success, 0 failure. Sentences: 530 success, 170 failure. Count axis: 0 to 500 in steps of 100.]
Fig. 8. Second script performance evaluation results
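The sentence-level success rates quoted for the two scripts (60% and 76%) can be checked against the counts shown in Figs. 7 and 8. The short sketch below assumes the bar labels read as 300 successes and 200 failures for the first script and 530 successes and 170 failures for the second; the function name `success_rate` is illustrative only.

```python
def success_rate(successes, failures):
    """Return the percentage of successful conversions."""
    total = successes + failures
    if total == 0:
        raise ValueError("no observations recorded")
    return 100.0 * successes / total


# Counts as read from the bar labels of Figs. 7 and 8 (an assumption).
script_1 = success_rate(300, 200)
script_2 = success_rate(530, 170)
print(f"script 1: {script_1:.1f}%  script 2: {script_2:.1f}%")
```

Under that reading, the second value is roughly 75.7%, which agrees with the rounded 76% given in the text.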
Once again, the algorithm identified every keyword in the script, and it converted 76% of the sentences without any error. A delay between the detection of voice input and the start of its processing was again observed, with most errors occurring at the beginning of sentences, and ambient noise again caused some delay in processing. The smartphone application and the smartwatch service were also tested, in particular their communication and integration with the cloud services. The smartphone solution has one notable drawback: it depends on WiFi or mobile communications to place the emergency call. Apart from that, all interactions with the server completed successfully. If the smartphone GPS is switched off, the real-time location of the user can also be collected from the smartwatch. However, the location is less precise inside or near buildings. In these situations, the last known position was sent to the server. Overall, the results achieved were satisfactory and indicate that the solution is reliable. There are two major limitations in this study that could be addressed in future research. First, complementary experiments with a larger number of users are needed to evaluate the correct operation of the ecosystem, considering user satisfaction and usability. Second, software improvements are needed to reduce the delay between voice recognition and input processing and to handle ambient noise better.
5 Conclusion and Future Work
This paper presented an IoMT system for healthcare emergency scenarios that combines wearable solutions for emergency response with a web application for caregivers, integrated through a cloud service. The solution consists of a mobile application for emergency calls, a smartwatch service for real-time data collection and a caregiver web application for responding to emergency calls. A controlled pilot was also implemented and described. The early-stage experiments demonstrated the feasibility of the proposal. As future work, more wearable sensors, such as shimmers (Electromyography (EMG) and Galvanic Skin Response (GSR) sensors), will be integrated into this solution for more comprehensive monitoring. Finally, complementary experiments with a larger number of users should be carried out to evaluate the correct operation of the ecosystem, considering user satisfaction and usability.
Acknowledgments. This work is funded by FCT/MCTES through national funds and, when applicable, co-funded by EU funds under the project UIDB/EEA/50008/2020.
References 1. Aringhieri, R., Bruni, M.E., Khodaparasti, S., van Essen, J.T.: Emergency medical services and beyond: addressing new challenges through a wide literature review. Comput. Oper. Res. 78, 349–368 (2017) 2. Blomberg, S.N., Folke, F., Ersbøll, A.K., Christensen, H.C., Torp-Pedersen, C., Sayre, M.R., Counts, C.R., Lippert, F.K.: Machine learning as a supportive tool to recognize cardiac arrest in emergency calls. Resuscitation 138, 322–329 (2019) 3. Ecker, H., Lindacher, F., Dressen, J., Wingen, S., Hamacher, S., B¨ ottiger, B.W., Wetsch, W.A.: Accuracy of automatic geolocalization of smartphone location during emergency calls - a pilot study. Resuscitation 146, 5–12 (2020) 4. Edillo, S.B., Garrote, P.J.E., Domingo, L.C.C., Malapit, A.G., Fabito, B.S.: A mobile based emergency reporting application for the Philippine national police emergency hotline 911: a case for the development of i911. In: 2017 Tenth International Conference on Mobile Computing and Ubiquitous Network (ICMU), pp. 1–4, October 2017 5. Farhat, J., Shamayleh, A., Al-Nashash, H.: Medical equipment efficient failure management in IoT environment. In: 2018 Advances in Science and Engineering Technology International Conferences (ASET), pp. 1–5, February 2018 6. Horan, T.A., Marich, M., Schooley, B.: Time-critical information services: analysis and workshop findings on technology, organizational, and policy dimensions to emergency response and related e-governmental services. In: Proceedings of the 2006 International Conference on Digital Government Research, DG.O 2006, pp. 115–123. Digital Government Society of North America (2006) 7. Imbens-Bailey, A., McCabe, A.: The discourse of distress: a narrative analysis of emergency calls to 911. Lang. Commun. 20(3), 275–296 (2000) 8. In´ acio, P.R.M., Duarte, A., Fazendeiro, P., Pombo, N. (eds.): 5th EAI International Conference on IoT Technologies for HealthCare. Springer, Cham (2020)
An IoMT System for Healthcare Emergency Scenarios
557
9. Linderoth, G., Møller, T.P., Folke, F., Lippert, F.K., Østergaard, D.: Medical dispatchers’ perception of visual information in real out-of-hospital cardiac arrest: a qualitative interview study. Scand. J. Trauma Resusc. Emerg. Med. 27(1), 8 (2019) 10. Lu, S., Wang, A., Jing, S., Shan, T., Zhang, X., Guo, Y., Liu, Y.: A study on service-oriented smart medical systems combined with key algorithms in the IoT environment. China Commun. 16(9), 235–249 (2019) 11. Maria, A.R., Sever, P., George, S.: MIoT applications for wearable technologies used for health monitoring. In: 2018 10th International Conference on Electronics, Computers and Artificial Intelligence (ECAI), pp. 1–4, June 2018 12. Nattrass, R., Watermeyer, J., Robson, C., Penn, C.: Local expertise and landmarks in place reformulations during emergency medical calls. J. Pragmat. 120, 73–87 (2017) 13. Nejtek, V.A., Aryal, S., Talari, D., Wang, H., O’Neill, L.: A pilot mobile integrated healthcare program for frequent utilizers of emergency department services. Am. J. Emerg. Med. 35(11), 1702–1705 (2017) 14. Pinho, A., Pombo, N., Silva, B.M., Bousson, K., Garcia, N.: Towards an accurate sleep apnea detection based on ECG signal: the quintessential of a wise feature selection. Appl. Soft Comput. 83, 105568 (2019) 15. Pirbhulal, S., Wu, W., Mukhopadhyay, S.C., Li, G.: A medical-IoT based framework for ehealth care. In: 2018 International Symposium in Sensing and Instrumentation in IoT Era (ISSI), pp. 1–4, September 2018 16. Pombo, N., Garcia, N., Bousson, K., Felizardo, V.: Machine learning approaches to automated medical decision support systems, pp. 1653–1673. IGI Global, Hershey (2017) 17. Rajasekaran, M., Yassine, A., Hossain, M.S., Alhamid, M.F., Guizani, M.: Autonomous monitoring in healthcare environment: reward-based energy charging mechanism for IoMT wireless sensing nodes. Future Gener. Comput. Syst. 98, 565–576 (2019) 18. 
Riou, M., Ball, S., Williams, T.A., Whiteside, A., O’Halloran, K.L., Bray, J., Perkins, G.D., Smith, K., Cameron, P., Fatovich, D.M., Inoue, M., Bailey, P., Brink, D., Finn, J.: ‘Tell me exactly what’s happened’: when linguistic choices affect the efficiency of emergency calls for cardiac arrest. Resuscitation 117, 58–65 (2017) 19. Schooley, B., Horan, T.A., Marich, M., Hilton, B., Noamani, A.: Integrated patient health information systems to improve traffic crash emergency response and treatment. In: 2009 42nd Hawaii International Conference on System Sciences, pp. 1–10, January 2009 20. Sukkird, V., Shirahada, K.: Technology challenges to healthcare service innovation in aging Asia: case of value co-creation in emergency medical support system. Technol. Soc. 43, 122–128 (2015) 21. Syed, L., Jabeen, S., Manimala, S., Alsaeedi, A.: Smart healthcare framework for ambient assisted living using IoMT and big data analytics techniques. Future Gener. Comput. Syst. 101, 136–151 (2019) 22. Tamgno, J.K., Diallo, N.R., Lishou, C.: IoT-based medical control system. In: 2018 20th International Conference on Advanced Communication Technology (ICACT), p. 1, February 2018 23. Turoff, M., Chumer, M., Walle, B., Yao, X.: The design of a dynamic emergency response management information system (DERMIS). J. Inf. Technol. Theory Appl. 5, 3 (2003)
558
T. Jer´ onimo et al.
24. Uddin, M.S., Alam, J.B., Banu, S.: Real time patient monitoring system based on Internet of Things. In: 2017 4th International Conference on Advances in Electrical Engineering (ICAEE), pp. 516–521, September 2017 25. Xu, L., Pombo, N.: Human behavior prediction though noninvasive and privacypreserving Internet of Things (IoT) assisted monitoring. In: 2019 IEEE 5th World Forum on Internet of Things (WF-IoT), pp. 773–777, April 2019
Introducing Time-Delays to Analyze Driver Reaction Times When Using a Powered Wheelchair
David Sanders1, Malik Haddad1(B), Martin Langner2, Peter Omoarebun1, John Chiverton1, Mohamed Hassan3, Shikun Zhou1, and Boriana Vatchova4
1 Faculty of Technology, University of Portsmouth, Portsmouth PO1 3DJ, UK, {david.sanders,malik.haddad}@port.ac.uk
2 Chailey Heritage Foundation, North Chailey BN8 4E, UK
3 School of Chemical Engineering, University of Southampton, Southampton SO17 1BJ, UK
4 Bulgarian Academy of Sciences, Akad.G.Bonchev Str., Bl.2, 1113 Sofia, Bulgaria
Abstract. This paper investigates the introduction of time-delays into wheelchair driving. Two dissimilar ways in which wheelchair drivers interact are compared. Users were observed as they drove their wheelchairs with and without time-delays. Tests took place first with a computer system and sensors providing assistance and then without any assistance. Time-delays were introduced between the wheelchair joystick and the motor controller. As delays became longer, drivers found it more difficult to drive. With shorter time-delays or in simpler environments, less assistance was needed from the computer system and sensors. In more complicated environments, or if time-delays were longer, more assistance was needed and driving was better if the computer and sensors assisted. That suggests that varying the level of sensor support could be helpful, depending on the complexity of the environment or the difficulties being experienced by the drivers.
Keywords: Ultra-sonic · Sensor · Time-delay · Wheelchair · Intelligent
© Springer Nature Switzerland AG 2021 K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 559–570, 2021. https://doi.org/10.1007/978-3-030-55190-2_41
1 Introduction
Time-delays are investigated in this paper and their effect on the performance of wheelchair drivers is discussed. Many things can affect reaction times (ReTis), for example: age, gender, personality type, tiredness, distraction, physical fitness, alcohol and whether stimuli are auditory or visual [1]. Specifically, Parkinson’s sufferers have a longer ReTi [2]. Numerous dissimilar environments are considered. In each case, drivers used a joystick to control their chair and could use ultrasonic sensors to assist them. They completed a series of tasks with and without sensors to help them. The next Section provides some background to give context to the work and then the sensors and wheelchair hardware are described. After that, the testing is presented. Finally, the paper ends with some discussion, conclusions and future work. A significant deduction is that drivers perform better in simpler situations without sensors to assist [3–6]. Failures increased noticeably if the sensors were not used or if time-delays were increased.
2 Background
Time delays can be created by slower reactions. Slower reactions add distortion to controller commands and feed-back [7] and that can reduce performance [8]. Some approaches to control with delays are described in [9, 10]. ReTi is defined in [1] as the time for a message to travel from a sensor (e.g.: eyes) to the brain and then to an actuator (e.g.: a muscle in your leg). Neurons transmit messages to and from the brain and spinal cord. Numerous issues affect ReTi [14] and slower ReTis can cause accidents during driving [14]. People suffering from akinesia have appreciably longer ReTis [11, 12], and a lesion in the right basal ganglia will cause ReTi to get longer [13, 14]. The information flow can be represented as: Neuron Stimulus sent to Spinal-Cord – then – from Brain to Actuator to Neuron and on to Response. Sensory neurons convert stimuli to electro-chemical signals. The signal journeys through the nervous system to the motor neurons. The motor neurons cause muscles to change shape or glands to secrete. Factors that affect ReTi are:
• Age. ReTi gets shorter until the late twenties, then gets longer slowly until the 60s, when ReTi gets longer more quickly.
• Left vs Right. ReTis in left-handed people are often faster.
• Practice. Following errors, ReTis get longer. Training shortens ReTi and improves accuracy.
• Errors. If a task is new, then ReTis are less consistent.
• Physical Tiredness. ReTi slows when tired.
• Mental Tiredness. Sleep deprivation or sleepiness lengthens ReTi and causes people to miss stimuli.
• Distraction. ReTi increases with distractions. Additionally, ReTis get longer when verbal tasks are being given.
• Warning. ReTis are quicker if a warning is given.
• Alcohol. Drink slows ReTi because muscle activation becomes slower.
• Finger Tremble. Fingers tremble and ReTis are quicker if a reaction is happening on a ‘downswing’.
• Personality. Anxious personalities and extroverts have faster ReTis. Neurotics and Schizophrenics have slower ReTis.
• Exercise. Fitter people have quicker ReTis.
• Threats or Stress. Making a person anxious can make reactions faster.
• Stimulant. Caffeine makes ReTi faster and smokers refraining from smoking have faster ReTis. Some drugs make ReTi faster.
• Learning Disorders. People with language and reading difficulties tend to have slower ReTis.
• Injury to the brain. Brain injury can slow down ReTi and concussions and headaches can reduce performance.
• Illness. Minor upper respiratory tract infection can slow ReTi.
Work described here was mainly interested in time delays because of aging, illness, learning disorders and brain injury but also considered effects of errors, practice and fatigue. Powered wheelchairs are often controlled with a joystick [15] but other transducers can be used: pointers [16, 17], switches [18, 19], or virtual reality headsets [20]. Controllers interface lower current inputs to higher current actuators that drive motors connected to wheels. Variances in wheels and different gradients and surfaces can cause veer [18, 19] and time-delays [9, 21–23] can occur because of longer ReTis [14]. The drivers react to disturbances and correct wheelchair direction and speed.
3 The System
Ultrasonics are simple and robust [24] and, in this work, 40 kHz transmitter/receiver pairs were mounted at the front of the wheelchair to provide a basic ultrasound image of the environment. A joystick controlled the electrical current to DC servo amplifiers and motors on a BobCat II wheelchair. A computer was inserted between the joystick and the servo amplifiers [25]. The ultrasonics sensed objects around the wheelchair and the computer modified the user control signals in the light of the ultrasonic image. The system is described in [4, 6, 26, 27]. The computer controlled the wheelchair but considered input from the joystick and the sensors. The computer quizzed the sensors and adjusted wheelchair direction. Joystick data could go straight to the wheelchair controller so that the wheelchair would react directly to input from the joystick. Software was assembled as described in [28] with three levels (Servo, Strategic and Supervisory [29, 30]). These rules were applied: a trajectory was only adjusted if necessary; movements were smooth and controlled; the driver remained in overall control.
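One way to picture how the three rules might shape a correction is the sketch below. It is not the control software used in the trials: the function name `assist`, the two-sensor layout, the thresholds and the gains are all hypothetical and chosen only to make the rules concrete.

```python
def assist(joy_speed, joy_turn, left_cm, right_cm,
           prev_turn=0.0, safe_cm=60.0, max_correction=0.3, max_step=0.1):
    """Return (speed, turn) after an optional sensor-based correction.

    turn runs from -1.0 (full left) to +1.0 (full right); left_cm and
    right_cm are hypothetical ranges from two front ultrasonic sensors.
    """
    speed, turn = joy_speed, joy_turn
    nearest = min(left_cm, right_cm)
    if nearest < safe_cm:
        # Rule 1: the trajectory is adjusted only when an object is close.
        urgency = 1.0 - nearest / safe_cm          # 0.0 far away, 1.0 touching
        away = 1.0 if left_cm < right_cm else -1.0  # steer away from nearer side
        # Rule 3: the correction is bounded so the joystick still dominates.
        turn += away * max_correction * urgency
        speed *= 1.0 - 0.5 * urgency
    # Rule 2: limit the change per control cycle to keep movement smooth.
    step = max(-max_step, min(max_step, turn - prev_turn))
    turn = prev_turn + step
    return speed, max(-1.0, min(1.0, turn))


print(assist(1.0, 0.0, 200.0, 200.0))  # clear path: joystick command unchanged
print(assist(1.0, 0.0, 30.0, 200.0))   # object on the left: slow down, veer right
```

Bounding the correction rather than replacing the joystick command is one simple way to keep the driver in overall control; the system described in the cited papers may well use a different mechanism.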
4 Trials
Trials took place to:
• Compare joint control (a mix of human and computer control) with human-only control when a variety of time-delays were introduced to represent various ReTis.
• Record the number of successful tasks and failures with different time-delays and in different conditions.
• Record any improvements achieved when using the system to assist, especially when time-delays were introduced.
• Record the time taken to complete tasks with the sensors and without them as time-delays increased and gaps reduced.
• Record the smallest gap that human users could safely drive through as the time-delay was lengthened, both with the sensors and without them.
Wheelchair drivers quickly learnt how the systems worked and responded. There were eight clusters of trials for each driving route: four without any automatic assistance from the sensors and four with automatic assistance from the sensors. Obstacle courses were created for each trial in a variety of environments:
INSIDE LABORATORY: Two objects on a flat floor with upright walls.
INSIDE SIMPLE CORRIDOR: Upright walls with flat and some sloping surfaces with some objects. No doorways.
INSIDE COMPLICATED CORRIDOR: Doorways and upright walls, with flat and sloping surfaces. Some radiators and door surrounds. Numerous obstacles.
OUTSIDE: More complex environment with various flat and sloping surfaces and vertical edges. Various natural obstacles and objects.
A clear explanation that included risks and benefits was provided to participating volunteers. There were 15 female and 36 male participants. The 51 volunteers were 18–51 years old (SD 4.8, Mean 22). Trials were repeated because human performance varied. By repeating trials, drivers could learn and were able to perform at their best in the time available. As trials were repeated, time-delays became longer. Volunteers tried to beat their best performances. The numbers of failed and successful runs were recorded. A successful run was collision free. A failure included one or more collisions. The initial set of trials used routes with objects set 90 cm apart (10 cm wider than the powered wheelchair). Then trials were repeated with thinner gaps. If a trial was successful with a thinner gap then the driver made at least one more attempt at the other test (with or without the sensor system assisting them). If they were successful again then another attempt was made with the original setup. Trial routes started from a standing start at a set starting position. Gaps were verified by two researchers using a ruler and measure.
Figure 1 shows where the delays were introduced (h2). The velocity command to the motors (v1) could be delayed so that signal (vr) was delayed. h was the total time-delay, that is, h2 (the forward delay) and h1 (the backward delay). Figure 2 shows Indoor Complicated Corridor Three. Arrows show the route for the wheelchairs. Shaded blocks show the objects in the path of the powered wheelchairs. That route also had two double-doorways. At each, one door was open and one was kept shut, so a chair had to zig-zag to pass them successfully. A camera on the chair observed and recorded trials. Figure 3 shows a scene from the camera. The pictures show a successful trial run with a delay of 2.1 s.
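A fixed forward delay of the kind described (h2, between the joystick and the motor controller) can be emulated with a first-in, first-out buffer. The sketch below is an illustration only: the class name `DelayLine`, the sampling period and the neutral command are assumptions, since the paper does not say how the delay was implemented.

```python
from collections import deque


class DelayLine:
    """Hold joystick commands for a fixed time before releasing them."""

    def __init__(self, delay_s, sample_period_s, neutral=(0.0, 0.0)):
        # Number of control cycles that make up the requested delay.
        steps = round(delay_s / sample_period_s)
        # Pre-fill with the neutral command so the chair stays still at first.
        self._buffer = deque([neutral] * steps)

    def push(self, command):
        """Store the newest command and return the one issued delay_s ago."""
        self._buffer.append(command)
        return self._buffer.popleft()


# A 0.3 s delay sampled every 0.1 s: each command emerges three cycles later.
line = DelayLine(delay_s=0.3, sample_period_s=0.1)
for cmd in [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]:
    print(line.push(cmd))
```

With `delay_s=0` the buffer is empty and every command passes straight through, which corresponds to the trials without an added delay.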
Fig. 1. Delays in the system. Based on the system in [10].
Fig. 2. Complicated corridor one.
5 Results
The wheelchair automatically avoided obstacles when the assistive computer systems were connected. Some chaotic factors affected the results: variations in wheel position, slope or floor surface, or the trailing casters, could send a chair off the desired path.
Fig. 3. Camera view when moving through an indoor complicated corridor.
Fig. 4. Average of the best time taken to complete a route.
5.1 Operation With and Without Sensors
Figure 4 displays the average of the best times to finish a variety of routes. The average time to finish successful runs is shown on the vertical scale. Simpler environments, for example empty corridors and the laboratory, are to the left in each graph. The results show that drivers completed the simpler routes more quickly when they did not have any assistance from the sensors and computer system. More complex routes, for example outside routes and complicated corridors, are shown to the right. Wheelchair users finished the more complex routes faster when the sensor and computer system were connected and working. The lower graph shows the average of the fastest times when a 1 s delay was introduced. Each time a test took place, gaps were reduced by 0.5 cm. The thinnest set of gaps that a driver successfully navigated through was recorded, together with the number of failed and successful runs. Drivers completed courses with thinner gaps when utilizing the computer and sensors. Figure 5 shows the average improvement in cm when using the sensors and microcomputer. The graph at the top is without any time-delay and the graph at the bottom is with a 1 s delay. As simpler environments were changed into more complex environments, or as gaps became thinner, drivers found it trickier to judge the width between obstacles and more difficult to pass through the thinner gaps. Drivers relied more on the sensor and computer systems, and they successfully drove through thinner gaps when the microcomputer and sensors were being used. Gradients, hills and surfaces had a tendency to turn the chairs, and sensors became more useful in those cases. The microcomputer consistently corrected wheelchair angles, and as time-delays increased the effect became more noticeable. Wheelchairs were driven faster through thinner gaps with assistance from the sensors, especially when time-delays increased. Figure 5 shows the gaps achieved with and without the sensors engaged, without any delay (top) and with a delay of one second (bottom).
Fig. 5. Reducing gap widths with a time-delay of 1.5 s (bottom).
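The gap-narrowing procedure (start at 90 cm, reduce by 0.5 cm after each test, record the thinnest gap passed) can be sketched as a simple loop. This is a simplification that ignores the repeat attempts described in the trial protocol. The function name `smallest_gap` and the callable `attempt`, which stands in for one driven run and returns True when it is collision free, are hypothetical; the 80 cm lower bound is the chair width implied by the statement that a 90 cm gap is 10 cm wider than the wheelchair.

```python
def smallest_gap(attempt, start_cm=90.0, step_cm=0.5, min_cm=80.0):
    """Narrow the gap until a run fails; return the thinnest gap passed."""
    best = None
    gap = start_cm
    while gap >= min_cm:
        if not attempt(gap):
            break            # first collision ends the sequence
        best = gap           # this gap was driven through without collision
        gap -= step_cm       # narrow the gap for the next test
    return best


# Stand-in for a driver who manages any gap of 84 cm or more.
print(smallest_gap(lambda gap_cm: gap_cm >= 84.0))
```

Running the same loop once with assistance and once without would give the two values whose difference is the improvement plotted in Fig. 5.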
Fig. 6. Comparing average number of failed and successful attempts with 1 s delay
5.2 Times to Complete Courses
As gaps became thinner and time-delays increased, the fastest time to complete routes and tasks was logged. With wider gaps, drivers completed courses faster without sensors; as gaps became thinner, wheelchair drivers completed routes and tasks faster with the sensors and the microcomputer.
5.3 Failure Rates
Figure 6 displays failed and successful attempts with and without the sensors and the microcomputer assisting drivers. The x axis lists the different environments. The average numbers of failed and successful trials are shown to the left. The center bar shows the percentage of failed attempts. The right shows the difference between failures when being assisted and when not. The top bar chart shows the results without any delay and the bottom bar chart shows the results with 1 s delays.
6 Discussion and Conclusions
In simpler situations, wheelchair drivers performed faster without sensors assisting them, but in more complicated environments or with longer time delays they were quicker with sensors assisting them. With wider gaps or in simpler situations, drivers consistently performed faster without help. As gaps reduced, environments became more complicated or time delays increased, drivers found driving more difficult and the sensors became more and more useful. As gaps reduced, drivers using the assistive systems were consistently quicker than drivers by themselves. Time-delays were only introduced between the joystick and controller. Delays could be introduced elsewhere. Further statistical analysis could be conducted and delay compensation could be investigated. A delay could have been introduced in two places but it was only introduced in displaying the camera view to the tele-operator. The system needs to be retested with a delay after the joystick and before transmitting the movement instructions to the mobile robot, as results may then be significantly different. In any real system a delay would probably be present in both if it was present in one. An implication of the results is that sensors should not be used in freely navigable regions with good views but should be reserved for more complicated situations. Intelligence [31–34], input devices [17] and force sensing [35] could be included. Modelling [36–39] and decision making [39–44] are now being investigated for future application.
Acknowledgment. Research in this paper was funded by EPSRC grant EP/S005927/1 and supported by The Chailey Heritage Foundation and the University of Portsmouth.
References
1. Draper, I.T., Johns, R.: The disordered movement in parkinsonism and the effect of drug treatment. Johns Hopkins Hosp 115, 465–480 (1964)
2. Sanders, D., Langner, M., Bausch, N., Huang, Y., Khaustov, S.A., Simandjuntak, S.: Improving human-machine interaction for a powered wheelchair driver by using variable-switches and sensors that reduce wheelchair-veer. In: Intelligent Systems and Applications. Advances Intelligent System Computing, vol. 1038, pp. 1173–1191 (2019)
3. Sanders, D., Okonor, O.M., Langner, M., Hassan Sayed, M., Khaustov, S.A., Omoarebun, P.O.: Using a simple expert system to assist a powered wheelchair user. In: Intelligent Systems and Applications. Advances Intelligent System Computing, vol. 1037, pp. 662–379. Springer, Heidelberg (2019)
4. Sanders, D.A., Tewkesbury, G.E., Parchizadeh, H., Robertson, J., Omoarebun, P.O., Malik, M.: Learning to drive with and without intelligent computer systems and sensors to assist. In: Arai, K., Kapoor, S., Bhatia, R. (eds.) IntelliSys 2018. AISC, vol. 868, pp. 1171–1181. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-01054-6_81
5. Sanders, D., Gegov, A., Haddad, M., Ikwan, F., Wiltshire, D., Tan, Y.C.: A rule-based expert system to decide on direction and speed of a powered wheelchair. In: Intelligent Systems and Applications: vol. 1. Advances Intelligent System Computing, vol. 868, pp. 822–838. Springer, Heidelberg (2019)
6. Fiorini, P., Oboe, R.: Internet-based telerobotics: problems and approaches. In: Proceedings of ICAR 1997, Monterey, CA, vol. 681, pp. 765–770. Teleoperation of robots with delay (1997)
7. Richard, N.: Time delay systems: an overview of some recent advances and open problems. Automatica 39, 1667–1694 (2003)
8. Lawrence, D.A.: Stability and transparency in bilateral teleoperation. IEEE Trans. Robot. Autom. 9(5) (2003)
9. Kim, W., Hannaford, B., Bejczy, A.: Force reflection and shared compliant control in operating telemanipulators with time delay. IEEE Trans. Robot. Autom. 8(2), 176–185 (1992)
10. Slawinski, E., Mut, V., Postigo, J.F.: Teleoperation of robots with time-varying delay. Robotica 2006(24), 673–681 (2006)
11. Brumlik, J., Boshes, B.: The mechanism of bradykinesia in parkinsonism. Neurol. (Minneap) 16, 337–344 (1966)
12. Joubert, M., Barbeau, A.: Akinesia in Parkinson’s disease. Progress in Neurogenetics. Excerpta Medica Foundation, pp. 366–376 (1969)
13. Wiesendanger, M., Schneider, P., Villoz, J.P.: Electromyographic analysis of a rapid volitional movement. Am. J. Phys. Med. 48, 17–24 (1969)
14. Kosinski, R.J.: A Literature Review on RT Kinds of RT Experiments (2012). https://www.semanticscholar.org/paper/A-Literature-Review-on-Reaction-Time-Kinds-of-Time-Kosinski
15. Sanders, D., Gegov, A., Tewkesbury, G., Khusainov, R.: Sharing driving between a vehicle driver and a sensor system using trust-factors to set control gains. In: Intelligent Systems and Applications: vol 1. Advance Intelligent Systems Computing, vol. 868, pp. 1182–1195. Springer, Heidelberg (2019)
16. Sanders, D.A.: Comparing speed to complete progressively more difficult robot paths between human drivers and humans with sensor systems to assist. Assembly Automation. AA-08-057 (2009)
17. Sanders, D.A., Urwin-Wright, S., Tewkesbury, G.E., et al.: Pointer device for thin-film transistor and cathode ray tube computer screens. Electron. Lett. 41(16), 894–896 (2005)
18. Stott, I.J., Sanders, D.A.: New powered wheelchair systems for the rehabilitation of some severely disabled users. Int. J. Rehabil. Res. 23(3), 149–153 (2000)
19. Sanders, D.A.: Controlling the direction of “walkie” type forklifts and pallet jacks on sloping ground. Assemb. Auto. 28(4), 317–324 (2008)
20. Stott, I.J., Sanders, D.A.: The use of virtual reality to train powered wheelchair users and test new wheelchair systems. Int. J. Rehabil. Res. 23(4), 321–326 (2000)
21. Anderson, R.J., Spong, M.: Bilateral control of teleoperators with time delay. IEEE Trans. Autom. Control 34(5), 494–501 (1989)
22. Chen, J.Y., Haas, E.C., Barnes, M.J.: Human performance issues and user interface design for teleoperated robots. IEEE Trans. Syst. Man Cybern. Part C: Apps Rev. 37, 1231–1245 (2007)
23. Sanders, D.A., Langner, M., Gegov, A.E., Ndzi, D., Sanders, H., Tewkesbury, G.E.: Driver performance and their perception of system time lags when completing robot tasks. In: Proceedings of 9th International Conference Human Systems Interaction, pp. 236–242. IEEE (2016)
24. Gao, W., Hinders, M.: Robot sonar backscatter algorithm for automatically distinguishing walls, fences, and hedges. Int. J. Robot. Res. 25(2), 135–145 (2006)
25. Sanders, D.A., Baldwin, A.: X-by-wire technology. Total Vehicle Technology: Challenging Current Thinking, pp. 3–12 (2001)
26. Sanders, D.A.: Comparing ability to complete simple nuclear rescue or maintenance tasks with a robot for a human driver and a human with a sensor system to assist. In: Advanced Robotics, vol. 8011 (2009)
27. Sanders, D.: Analysis of failure rates with a robot between a human driver and a human with a sensor system to assist. Robotica (2009)
28. Sanders, D.: Microprocessing and microprogramming. 38, 833 (1993)
29. Sanders, D.A.: The modification of pre-planned manipulator paths to improve the gross motions associated with the pick and place task. Robotica 13, 77–85 (1995)
30. Tewkesbury, G.E., Sanders, D.A.: A new robot command library which includes simulation. Ind’. Robot. Int’ J. 26(1), 39–48 (1999)
31. Sanders, D.A., Stott, I.J.: A new prototype intelligent mobility system to assist powered wheelchair users. Ind’. Robot. Int’. J. 26(6), 466–475 (1999)
32. Haddad, M., Sanders, D., Bausch, N., Tewkesbury, G., Gegov, A., Hassan Sayed, M.: Learning to make intelligent decisions using an Expert System for the intelligent selection of either PROMETHEE II or the Analytical Hierarchy Process. In: Intelligent Systems and Applications: vol 1. Advances Intelligent Systems Computing, vol. 868, pp. 1303–1316 (2019)
33. Sanders, D.A., Robinson, D.C., Hassan, M., Haddad, M., Gegov, A., Ahmed, N.: Making decisions about saving energy in compressed air systems using ambient intelligence and artificial intelligence. In: Arai, K., Kapoor, S., Bhatia, R. (eds.) IntelliSys 2018. AISC, vol. 869, pp. 1229–1236. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-01057-7_92
34. Sanders, D.: A pointer device for TFT display screens that determines position by detecting colours on the display using a colour sensor and an Artificial Neural Network. Displays. DISPLA-D-08-00006 (2009)
35. Sanders, D.: Force sensing. Ind. Rob. Int’. J. 34(4), 268–268 (2007)
36. Sanders, D.A.: Recognising shipbuilding parts using ANNs and Fourier Descriptors. IMechE Part B: J. Eng. Manf. JEM1382 (2009)
37. Sanders, D.: Real time geometric modelling using models in an actuator space and Cartesian space. J. Robot. Syst. 12(1), 19–28 (1995)
38. Sanders, D.A.: Progress in machine intelligence. Ind. Robot – Int. J. 35(6), 485–487 (2008)
39. Sanders, D., Wang, Q., Bausch, N., Huang, Y., Khaustov, S.A., Popov, I.: A method to produce minimal real time geometric representations of moving obstacles. Intelligent Systems and Applications: vol 1. Advances Intelligent Systems Computing, vol. 868, pp. 881–892. Springer, Heidelberg (2019)
570
D. Sanders et al.
40. Haddad, M.J.M., Sanders, D., Gegov, A., Hassan Sayed, M., Huang, Y., Al-Mosawi, M.: Combining multiple criteria decision making with vector manipulation to decide on the direction for a powered wheelchair. Intelligent Systems and Applications. Advances Intelligent Systems Computing, vol. 1037, pp. 680–693. Springer, Heidelberg (2019) 41. Haddad, M.J.M., Sanders, D., Bausch, N.: Selecting a robust decision making method to evaluate employee performance. Int. J. Manag. Decis. Making 18(4), 333–351 (2019) 42. Haddad, M.J.M., Sanders, D., Tewkesbury, G.E.: Selecting a discrete Multiple Criteria Decision Making method to decide on a corporate relocation. Arch. Bus. Res. 7(5), 48–67 (2019) 43. Haddad, M.J.M., Sanders, D.: Selecting a best compromise direction for a powered wheelchair using PROMETHEE. IEEE Trans. Neural Syst. Rehab. Eng. 27(2), 228–235 (2019) 44. Haddad, M.J.M., Sanders, D., Tewkesbury, G., Gegov, A., Hassan Sayed, M., Ikwan, F.C.: Initial results from using preference ranking organization METHods for enrichment of evaluations to help steer a powered wheelchair. Intelligent Systems and Applications. Advances Intelligent Systems Computing, vol. 1037, pp. 648–661. Springer, Heidelberg (2019)
Intelligent Control and HCI for a Powered Wheelchair Using a Simple Expert System and Ultrasonic Sensors

David Sanders1, Malik Haddad1(B), Peter Omoarebun1, Favour Ikwan1, John Chiverton1, Shikun Zhou1, Ian Rogers2, and Boriana Vatchova3

1 University of Portsmouth, Portsmouth PO1 3DJ, UK
{david.sanders,malik.haddad}@port.ac.uk
2 Gems Sensors & Controls, Lennox Road, Basingstoke RG22 4AW, UK
3 Bulgarian Academy of Sciences, 1113 Sofia Akad.G.Bonchev Str., bl.2, Sofia, Bulgaria
Abstract. Intelligent control and human computer interaction are investigated for a powered wheelchair using a simple expert system and ultrasonic sensors. The aim is to make driving easier. Signals from sensors and joysticks are interpreted. The interpreted signals are mixed so that the systems collaborate with the human driver to improve their control over direction and speed. Ultrasonic sensors identify hazards and the system suggests a safer speed and direction. Results are presented from drivers completing a series of timed routes using joysticks to control wheelchairs, both with and without a microcomputer and sensors assisting them. Recently published systems are used to contrast and compare results. The new system described in this paper consistently performed better. An additional result appears to be that the amount of support from the microcomputer and sensors should be altered depending on surroundings and situations. The research is part of a bigger research project to improve mobility and enhance the quality of life of disabled powered wheelchair users by increasing their self-reliance and self-confidence.

Keywords: Expert · System · Disabled · Smart · Wheelchair · Sensor · Ultrasonic
© Springer Nature Switzerland AG 2021
K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 571–583, 2021. https://doi.org/10.1007/978-3-030-55190-2_42

1 Introduction
Interest in enhanced quality of life is rising as modern medical treatment improves survival rates and life expectancy increases [1]. Smart wheelchairs can help enhance that quality of life. They have sensors and use cognitive techniques developed during research into mobile robots, but they tend not to act as autonomously. Instead they try to extend or complement the abilities of a disabled driver. For disabled and aging people who cannot walk, smart wheelchairs can provide significant benefits and enhance quality of life by maintaining mobility, broadening (and continuing) social and community activities, and conserving energy and strength. But because powered wheelchairs can be difficult to drive, some automation can be helpful [2]. George Klein invented the first powered wheelchair (with the National Research Council of Canada) to assist quadriplegics injured during World War Two [3]. Smart wheelchairs have often been thought of as mobile robots with an added seat, but the difference between powered wheelchairs and robots is that the powered wheelchair becomes an extension of its human driver. The disabled driver sits on it and needs to be comfortable driving it (and sitting in it) [4].
Disability systems, including intelligent powered wheelchairs, need to accommodate many different types of disability. The World Health Organization suggests that 15% of the population of the world live with a disability, and between 2% and 4% experience considerable problems in practice. Global estimates of disability are rising as the world population ages and as improvements are made to the assessment and measurement of disability [5].
This paper describes a simple expert system [6, 7] to control a powered wheelchair [8]. Ultrasonics [9] identify obstacles and propose more suitable and safer speeds and directions. Systems in use now [10–13] rely on driver experience and their ability to see. Work described in this paper interprets sensor and joystick signals and then uses an expert system to make driving easier for a disabled user. The way humans interact might diminish effectiveness [10–15] and some ways of improving interaction are considered, especially if disturbances are present because of differences between wheels or because of hills, slopes and surfaces [16]. Algorithms mix data from a joystick with data from ultrasonic sensors. Drivers show a desired direction using a joystick and the wheelchair tends to move in that direction. Drivers react to disturbances and revise their desired direction if necessary. The intelligent system described in this paper processed data from a joystick and from sensors and used that information to assist the disabled driver.
Powered wheelchairs have often been guided with a joystick [10–18], although there are other devices: switches [19], pointers [20, 21] or virtual reality transducers [22]. Drivers need to drive safely and avoid obstacles (people, other wheelchairs or vehicles) [23, 24]. Local sensing has been used with powered wheelchairs, for example laser or light [25], ultrasonics [10–18] or infra-red [26], and position sensing has used gyro, tilt, acoustic or odometry [27]. GPS [28] does not work well inside or when shielded but Assisted GPS [29] can be helpful. Vision [30–36] uses more processing power but, as the cost of computing power has reduced [37], vision systems have become more and more popular. The human driver still tends to provide the best data about the environment and what is to be achieved. Because they are cost effective, simple and robust [10–17, 38–40], ultrasonics were chosen for the work described here.
2 The Wheelchair Systems
Smart wheelchairs usually comprise a standard powered wheelchair with sensors and a computer added, or a mobile robot base with a seat mounted on top. In this work a powered wheelchair with large driving wheels and trailing casters was used. Cameras could be attached to the front of the wheelchair in between the driving wheels. Pairs of ultrasonic transducers could be attached: two above the driving wheels (one above each) and a third in the center at the front. Receiver/transmitter pairs were attached at the front of the wheelchair [8, 14]. They transmitted a one millisecond pulse of sound. That pulse reflected back from obstacles and then ranges were calculated from the times taken for pulses to return. In that way, any obstacles in front of the wheelchair could be sensed.
Links between joysticks and wheelchairs were disconnected. Instead, a microcomputer handled control in one of three modes:
• Ultrasonics interrogated by the microcomputer, which modified the wheelchair bearing using either:
  – The new algorithms.
  – Algorithms published recently and used for comparison.
• Wheelchair driven directly by the joystick.
The code structure is shown in Fig. 1. Rules were: apply smooth and controlled movements; user remains in overall control; only modify the trajectory of the wheelchair when necessary. An imaginary potential field was created around obstacles [36].
Fig. 1. Code structure. [Block diagram: Initialise (C); High level code (C); SPI controller; Transmitter control; UART interface; ADC control. Three of the low-level blocks carry the label (Assembly).]
Longer sound pulses contain more energy and can detect objects further away. Sound waves travel at approximately 330 m/s. If a pulse is three milliseconds long then the length of the pulse is about one meter. The minimum range is then about half a meter, because an echo from a nearer object returns while the pulse is still being transmitted. Shorter ranges were required and pulse lengths of 50 µs, 100 µs, 500 µs and 1 ms were considered. Pulse lengths were switched automatically by a “range finder”. If the “range finder” did not detect anything then it incremented pulse lengths so that range extended. Ultrasonics were relatively noisy and misreadings were filtered out. Volumes in front of the wheelchair were placed into an array of three sections: close, near and far away. If an object was detected then distance to the object was classified as close, near or far away. Ultrasonic transmitter/receiver pairs were attached to the front of the wheelchair so that their beams enclosed the volume in front of the wheelchair. The transducers are represented in Fig. 2.
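The ranging arithmetic in this paragraph can be sketched in C, the language the paper reports for its high-level code. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the microsecond units and the rule that a detection leaves the pulse length unchanged are hypothetical; only the 330 m/s figure and the four candidate pulse lengths come from the text.

```c
#include <assert.h>
#include <stddef.h>

/* Approximate speed of sound quoted in the text. */
#define SPEED_OF_SOUND_M_PER_S 330.0

/* Candidate pulse lengths from the text, in microseconds. */
static const unsigned PULSE_LENGTHS_US[] = {50u, 100u, 500u, 1000u};
#define NUM_PULSE_LENGTHS (sizeof PULSE_LENGTHS_US / sizeof PULSE_LENGTHS_US[0])

/* One-way range in meters from a round-trip echo time in microseconds. */
double echo_range_m(double round_trip_us)
{
    return SPEED_OF_SOUND_M_PER_S * (round_trip_us * 1e-6) / 2.0;
}

/* Shortest usable range for a pulse: half the spatial length of the pulse. */
double min_range_m(unsigned pulse_us)
{
    return SPEED_OF_SOUND_M_PER_S * (pulse_us * 1e-6) / 2.0;
}

/* Pulse length in microseconds for a table index. */
unsigned pulse_length_us(size_t index)
{
    assert(index < NUM_PULSE_LENGTHS);
    return PULSE_LENGTHS_US[index];
}

/* Hypothetical "range finder" step: with no echo, move to the next longer
   pulse (if any) so that range extends; with an echo, keep the current
   pulse. The behaviour after a detection is an assumption. */
size_t next_pulse_index(size_t current, int echo_detected)
{
    assert(current < NUM_PULSE_LENGTHS);
    if (!echo_detected && current + 1 < NUM_PULSE_LENGTHS) {
        return current + 1;
    }
    return current;
}
```

For a three millisecond pulse, `min_range_m(3000)` evaluates to roughly half a meter, which agrees with the figure quoted in the text.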
Fig. 2. Three sensors and the array representing the volume ahead.
Elements in the grid that contained an obstacle were increased by a high amount, for example, 5. The other elements were decremented by a smaller number, for example, 1. Elements had a maximum value of fifteen and a minimum value of zero. Objects within an element of the grid prompted an increase in value within that element. A haphazard reading within other elements would increase them briefly, but those readings reduced every time the system updated. If an object relocated to another element then the old grid element reduced in value and the new element increased. Consistent measures of distance were acquired within half a second. Figure 3 shows the structure of the object detection process.
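The update rule for the sensor grid can be sketched as follows. This is an illustrative reading rather than the authors' code: the 3 × 3 layout (three sensors by three range zones), the `NO_ECHO` marker and all function names are assumptions, and the "smaller number" applied to the other elements is treated as a decrement because the text says stray readings reduce on every update.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_SENSORS 3    /* three transducer pairs, as in Fig. 2 */
#define NUM_ZONES   3    /* close, near, far away */
#define CELL_MIN    0
#define CELL_MAX    15
#define HIT_STEP    5    /* "high amount" example from the text */
#define DECAY_STEP  1    /* "smaller number" example from the text */
#define NO_ECHO     (-1) /* hypothetical marker: sensor detected nothing */

/* Keep a cell inside the stated range of zero to fifteen. */
static int clamp_cell(int value)
{
    if (value < CELL_MIN) return CELL_MIN;
    if (value > CELL_MAX) return CELL_MAX;
    return value;
}

/* One update pass: the zone reported by each sensor grows by HIT_STEP,
   every other zone for that sensor shrinks by DECAY_STEP. */
void grid_update(int grid[NUM_SENSORS][NUM_ZONES],
                 const int hit_zone[NUM_SENSORS])
{
    assert(grid != NULL && hit_zone != NULL);
    for (int s = 0; s < NUM_SENSORS; ++s) {
        for (int z = 0; z < NUM_ZONES; ++z) {
            int step = (hit_zone[s] == z) ? HIT_STEP : -DECAY_STEP;
            grid[s][z] = clamp_cell(grid[s][z] + step);
        }
    }
}
```

With these example steps, three consecutive detections in the same element saturate it at fifteen, while a single stray reading of 5 drains away over five later updates.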
3 Algorithms to Interpret the Joystick
The Penny & Giles joystick on the wheelchair contained two potentiometers providing two voltages. The voltages denoted the position of the joystick. They were read by an Analogue to Digital Converter and then converted from Cartesian co-ordinates to polar co-ordinates, |J|∠θ, where |J| represented desired speed and θ represented desired direction. Standard C libraries were used for mathematical functions. Joystick magnitude and angle were calculated using:
argument = JS0/JS1; /* opposite/adjacent (ATAN) */    (1)
bearing = atan(argument); /* angle of joystick in radians */    (2)
Magnitude = sqrt((JS1 * JS1) + (JS0 * JS0));    (3)
Fig. 3. Structure of the histogrammic object detection process.
Where JS1 and JS0 are Cartesian co-ordinates. The sector occupied by the joystick was calculated using Magnitude and Angle. Position and confidence were denoted within an array of pairs of values:
– Joystick position for a desired speed was represented by “Magnitude”.
– Confidence that a joystick was within a sector was represented by “Angle Confidence”.
If joysticks were held stationary then the grid element associated with their position increased and the other elements reduced. That element quickly increased in value. Shaking hands on the joystick increased other grid elements for a moment but they then reduced every time the system updated. If the joystick was moved to a new grid element then the new grid element increased and the previous grid element reduced. Joystick position corresponded to a histogram and the histogram element with the largest value signified the preferred bearing. Figure 4 shows a joystick histogram. Position and angle of the joystick were tested by a module named JSArray. JSArray determined the zone inhabited by the joystick. The element corresponding to “angle confidence” (Aconf) increased by 40. Other Aconf elements reduced by 20. Histogram elements reduced in value rapidly but increased in value more gradually.
Fig. 4. Representation of joystick using histograms.
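The joystick histogram can be sketched in the same way as the sensor grid. The step sizes of 40 and 20 are the Aconf values quoted in the text; the number of sectors (seven, matching the joystick map sectors named later in the paper), the ceiling of 255 and the function names are assumptions made only for illustration.

```c
#include <assert.h>

#define NUM_SECTORS 7   /* assumption: one element per joystick-map sector */
#define ACONF_UP    40  /* increase for the occupied sector (from the text) */
#define ACONF_DOWN  20  /* decrease for every other sector (from the text) */
#define ACONF_MIN   0
#define ACONF_MAX   255 /* hypothetical ceiling; not given in the text */

/* One update: raise confidence in the occupied sector, lower the rest. */
void aconf_update(int aconf[NUM_SECTORS], int occupied)
{
    assert(occupied >= 0 && occupied < NUM_SECTORS);
    for (int i = 0; i < NUM_SECTORS; ++i) {
        if (i == occupied) {
            aconf[i] += ACONF_UP;
            if (aconf[i] > ACONF_MAX) aconf[i] = ACONF_MAX;
        } else {
            aconf[i] -= ACONF_DOWN;
            if (aconf[i] < ACONF_MIN) aconf[i] = ACONF_MIN;
        }
    }
}

/* Preferred bearing: index of the largest histogram element.
   Ties resolve to the lowest index. */
int preferred_sector(const int aconf[NUM_SECTORS])
{
    int best = 0;
    for (int i = 1; i < NUM_SECTORS; ++i) {
        if (aconf[i] > aconf[best]) best = i;
    }
    return best;
}
```

Because the preferred sector is the histogram maximum, a brief excursion into a neighbouring sector (a shaking hand, for instance) does not immediately change the preferred bearing; the new sector has to be held for several updates before it overtakes the old one.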
4 Expert System
Expert knowledge was provided by skilled wheelchair drivers and experienced rehabilitation engineers [37–44]. The system had to work in real time [45–50] in order to help disabled drivers. The two real-time inputs were from the sensor system and the joystick. Speed and direction were provided by the wheelchair driver and data about the environment and obstacles were provided by the sensors. A module called “Expert-Sensor” assessed sensor data and suggested new bearings if required to avoid collisions. If data from the different sources disagreed then an expert called “FuzzMix” considered both inputs and decided on the outputs to be sent to the motor controllers. “JoyMon” deciphered what the wheelchair driver was trying to make the wheelchair do.
The whole system was made up of “GapOrDoor”, “Expert-Sensor”, “JoyMon” and “FuzzMix”. Control effort was distributed between the joystick and sensors by “FuzzMix”, which coordinated the sensor recommendation and the joystick recommendation and resolved any conflicts between them. “GapOrDoor”, “JoyMon” and “Expert-Sensor” provided guidance. “Prox-Stop” was a failsafe function to stop the chair and “FuzzMix” could over-ride all the other inputs using “Prox-Stop”.
“FuzzMix” mixed joystick confidence and sensor data values. If joystick confidence was high then the position of the joystick was taken to reflect the wishes of the wheelchair driver correctly. In that case, the sensors had less effect. If confidence was low then it was necessary for the wheelchair to avoid an obstacle [46, 47]. Joystick consistency and position were checked by “JoyMon”. If the joystick position was held steady then that established the wishes of the wheelchair driver. If the joystick was moving randomly then the driver was not in control or was unsure. In that case, the system depended more on the sensors for steering the chair.
“Expert-Sensor” employed knowledge about the sensors. It built a grid to contain data from the sensors and suggested potential maneuvers that would steer the wheelchair safely and avoid collisions. “Expert-Sensor” did not take the driver into account. “GapOrDoor” was the obstacle avoidance function. “GapOrDoor” used data from “Expert-Sensor” but could be overridden by “FuzzMix”. Joystick data was merged with sensor data so that:
Vout(right) = InJoy(right) − Dist(left)    (4)
Vout(left) = InJoy(left) − Dist(right)    (5)
Where: InJoy was the input joystick voltage, Vout was the controller output voltage, and Dist was the range to the nearest obstacle. InJoy, Vout and Dist were vectors having two values, right wheel and left wheel. “GapOrDoor” could turn the wheelchair away from the nearest object, smoothly slow down the wheelchair as it got close to an obstacle and center the wheelchair between objects, for example the surrounding frame of a doorway. “FuzzMix” allocated control to the joystick or the sensors subject to the environment, situation and desires of the wheelchair user. The relationships were:
• All joystick, no sensors.
• All sensors, no joystick.
• Something in between.
“FuzzMix” assessed inputs and algorithms allocated control between joystick and sensors. Algorithms used distance-functions to create target values for left and right voltages. The distance-functions were:
RightTarg = 2.5 * InstRange[0] + 100;    (6)
LeftTarg = 2.5 * InstRange[1] + 100;    (7)
Where: InstRange[] = instantaneous distance to obstacle. Data from sensors was changed into a form matching Target (ADC) data. The position of the joystick was obtained from a joystick map divided into sectors: Forward, Stop, Reverse, Right Turn, Left Turn, Right Spin and Left Spin. A rule set within “Expert-Sensor” was extracted from the mapping and used to search Sensor-Byte for objects so that “Expert-Sensor” could suggest action.
Software was downloaded to a micro-computer on the wheelchair. Software was written in a mix of low- and high-level languages. The programs were compiled and loaded into on-board non-volatile memory. Systems were then tested in a variety of situations and environments. Algorithms were predictable and fast. If “Joystick” and “Expert-Sensor” signified “forward”, then a bearing was set to drive straight ahead. Sensors continued to be interrogated to find distances to obstacles and speed was decreased if a wheelchair moved close to an obstacle. If the joystick requested a “Turn” then a different algorithm was used.
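Equations (4)–(7) translate almost directly into C. The sketch below is a literal transcription for illustration only: the `WheelPair` struct and the function names are hypothetical, and Dist is assumed to be already expressed in the same units as the joystick signal, since the paper does not state how it was scaled.

```c
#include <assert.h>
#include <stddef.h>

/* Two-valued vector used for InJoy, Vout, Dist and the targets. */
typedef struct {
    double right;
    double left;
} WheelPair;

/* Eqs. (4)-(5): each wheel's output is its joystick input minus the
   obstacle term from the opposite side. */
WheelPair mix_joystick_and_sensors(WheelPair in_joy, WheelPair dist)
{
    WheelPair v_out;
    v_out.right = in_joy.right - dist.left;
    v_out.left = in_joy.left - dist.right;
    return v_out;
}

/* Eqs. (6)-(7): target values from the instantaneous ranges.
   inst_range[0] feeds the right target, inst_range[1] the left. */
WheelPair distance_targets(const double inst_range[2])
{
    WheelPair target;
    assert(inst_range != NULL);
    target.right = 2.5 * inst_range[0] + 100.0;
    target.left = 2.5 * inst_range[1] + 100.0;
    return target;
}
```

Read this way, Eqs. (4)–(5) cross-couple the two sides: the obstacle term on the left reduces the right wheel's output and the term on the right reduces the left wheel's output. The targets of Eqs. (6)–(7) grow linearly with range from a base value of 100.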
5 Experiments
Experiments involved driving wheelchairs through a variety of situations and environments. The response of the systems was acceptable and the wheelchair safely drove along corridors and aligned with the centers of door gaps with the controlling joystick held in a static (pushed forward) position. Paths of wheelchairs indicated that “Expert-Sensor” suggested suitable changes to speeds and directions. The algorithms successfully avoided obstacles. The systems were not intended to replace wheelchair drivers but to help them steer their chairs. Drivers swiftly learnt to drive.
Sets of experimental runs took place to assess speed with and without assistance from sensors, using an existing system [3, 9] as a control check and then using the new systems described in this paper. Time taken to successfully finish experimental runs was recorded when a human driver was:
• Driving without any assistance,
• Assisted by existing previously published systems,
• Assisted by systems described in this paper.
Drivers drove their wheelchairs with the sensor systems helping them and researchers recorded times using digital laboratory clocks and stop watches. Experiments were to:
• Observe operation.
• Measure time taken by human drivers driving by themselves.
• Measure time taken with sensors and expert systems helping.
• Measure improvement.
• Capture suggestions and comments.
Experiments were conducted without any assistance. Then experiments were repeated with assistance (using the sensors). For each experiment, an obstacle course was set up and drivers had to handle the following:
LABORATORIES: Vertical walls. Objects on floor.
EMPTY CORRIDORS: Flat and sloping surfaces. Vertical walls. Door gaps. Obstacles in staggered arrangements.
COMPLICATED CORRIDORS: Flat and sloping surfaces. Vertical walls. Door gaps. Items on walls (e.g. radiators). Obstacles offset.
OUTSIDE ENVIRONMENTS: More complicated environments with different flat and sloping surfaces. Vertical and sloping edges. Human beings present. Various objects.
Whenever possible, experiments were performed more than once. Wheelchair drivers could repeat experiments as often as they liked so that they learnt how the system(s) behaved and could perform at their best. Experiments were thought to be fun and were well-liked. Experiments encouraged some competition and wheelchair drivers tried to beat their best times and beat other wheelchair drivers.
Experiments compared the speed of wheelchair drivers when being assisted and when driving by themselves in a variety of situations and environments. When a fast time was recorded then the driver made at least one more attempt at the other experiment (with/without sensors to assist) to check the result was not just because they had learnt more about how the systems worked.
6 Results
Figure 5 shows some results. The systems described in this paper were compared with existing systems [30, 31]. Drivers driving with assistance were compared to the same driver driving without assistance. Average best time to complete various courses is shown on the vertical scale. Drivers driving without using any sensors to assist are on the left in Fig. 5. Drivers using the recently published systems [3, 30] are in the center and the improved systems described in this paper are on the right. The different testing routes are listed on the horizontal scale.
Fig. 5. Results from experiments. [Bar chart. Vertical axis: time in seconds (0 to 25). Horizontal axis: LABORATORY, EMPTY CORRIDOR 1, EMPTY CORRIDOR 2, COMPLICATED CORRIDOR 1, COMPLICATED CORRIDOR 2, COMPLICATED CORRIDOR 3, OUTSIDE 1. Legend: driver alone (tele-operation); basic assistance; improved assistance.]
On average, the new systems were quicker. In simpler empty corridors and laboratories, wheelchair drivers finished their driving tasks more quickly when they did not have any assistance from the sensors or computers. In trickier outside environments and complicated corridors, wheelchair drivers completed their driving tasks more quickly with assistance from the sensors and computers. As gaps became narrower or environments became more complicated, wheelchair drivers took longer. Wheelchair drivers sometimes had to slow down or stop and reverse to avoid collisions. Drivers performed better with assistance in more complicated experiments.
The new system presented in this paper reliably changed wheelchair directions and speeds and out-performed the recently published systems.
7 Discussion and Conclusions
Results from experiments were placed into two sets: with sensors assisting and without. Pairing removed a lot of randomness and variability. Results were statistically significant. Paired samples tests showed that driving was significantly different at p < 0.05 (a difference this large would be expected less than 5% of the time if assistance had no effect). The methods and systems described here were significantly better than recently published systems. The new systems performed every test more quickly than recently published systems. Current work has now moved on to investigate new applications [51–55] and decision making [56–61].

Acknowledgment. Research in this paper was funded by EPSRC grant EP/S005927/1 and supported by The Chailey Heritage Foundation and the University of Portsmouth.
References
1. Kawaguchi, A., Noda, Y., Sato, Y., Kondo, Y., Terashima, K.: A mechatronics vision for smart wheelchairs. In: Proceedings of the 4th International Conference on Assistive Technologies, pp. 145–150, April 2008
2. Leaman, J., La, H.M.: A comprehensive review of smart wheelchairs: past, present, and future. IEEE Trans. Hum.-Mach. Syst. 47(4), 486–499 (2017)
3. Yukselir, M., Scarrow, K., Wilson, P., Cowan, T.: The brains behind the electric wheelchair, one of Canada’s ‘great artifacts’. The Globe and Mail, 27 August 2012
4. Carlson, T., Millan, J.D.R.: Brain-controlled wheelchairs: a robotic architecture. IEEE Robot. Auto. Mag. 20, 65–73 (2013)
5. Joshi, M.K., Gupta, M.V., Gosavi, M.M., Wagh, M.S.: A multifunctional smart wheelchair. Int. J. Adv. Res. Electron. Commun. Eng. 4(5), 1281–1284 (2015)
6. Sanders, D., Hudson, A.: A specific blackboard expert system to simulate and automate the design of high recirculation airlift reactors. Math. Comput. Simul. 53(1–2), 41–65 (2000)
7. Sanders, D., Okonor, O., Langner, M., et al.: Using a simple expert system to assist a powered wheelchair user. Advances in Intelligent Systems and Computing, vol. 1037, pp. 662–379. Springer, Heidelberg (2019)
8. Sanders, D.: Comparing speed to complete progressively more difficult mobile robot paths between human tele-operators and humans with sensor-systems to assist. Assem. Autom. 29(3), 230–248 (2009)
9. Rahiman, M., Zakaria, Z., Rahim, R., et al.: Ultrasonic tomography imaging simulation of two-phase homogeneous flow. Sens. Rev. 29(3), 266–276 (2009)
10. Sanders, D.: Controlling the direction of “walkie” type forklifts and pallet jacks on sloping ground. Assem. Autom. 28(4), 317–324 (2008)
11. Sanders, D., Tewkesbury, G., Parchizadeh, H., et al.: Learning to drive with and without intelligent computer systems and sensors to assist. Advances Intelligent Systems Computing, vol. 868, pp. 1171–1181. Springer, Heidelberg (2019)
12. Sanders, D.A., Gegov, A., Haddad, M., Ikwan, F., Wiltshire, D., Tan, Y.C.: A rule-based expert system to decide on direction and speed of a powered wheelchair. In: Arai, K., Kapoor, S., Bhatia, R. (eds.) IntelliSys 2018. AISC, vol. 868, pp. 822–838. Springer, Cham (2019)
13. Sanders, D.A., Gegov, A., Tewkesbury, G.E., Khusainov, R.: Sharing driving between a vehicle driver and a sensor system using trust-factors to set control gains. In: Arai, K., Kapoor, S., Bhatia, R. (eds.) IntelliSys 2018. AISC, vol. 868, pp. 1182–1195. Springer, Cham (2019)
14. Sanders, D.A.: Comparing ability to complete simple tele-operated rescue or maintenance mobile robot tasks with and without a sensor system. Sens. Rev. 30(1), 40–50 (2010)
15. Sanders, D.A.: Analysis of the effects of time delay on the tele-operation of a mobile robot in various modes of operation. Ind. Robot: Int. J. 36(6), 570–584 (2009)
16. Stott, I., Sanders, D.: New powered wheelchair systems for the rehabilitation of some severely disabled users. Int. J. Rehabil. Res. 23(3), 149–153 (2000)
17. Sanders, D., Stott, I.: A new prototype intelligent mobility system to assist powered wheelchair users. Ind. Robot 26(6), 466–475 (1999)
18. Sanders, D.A., Baldwin, A.: X-by-wire technology. In: Total Vehicle Technology: Challenging Current Thinking, pp. 3–12 (2001)
19. Sanders, D., Langner, M., Bausch, N., et al.: Improving human-machine interaction for a powered wheelchair driver by using variable-switches and sensors that reduce wheelchair-veer. Adv. Intell. Syst. Comput. 1038, 1173–1191 (2019)
20. Sanders, D.A., Tewkesbury, G.E.: A pointer device for TFT display screens that determines position by detecting colours on the display using a colour sensor and an ANN. Displays 30(2), 84–96 (2009)
21. Sanders, D.A., Urwin-Wright, S.D., Tewkesbury, G.E., et al.: Pointer device for thin-film transistor and cathode ray tube computer screens. Electron. Lett. 41(16), 894–896 (2005)
22. Stott, I., Sanders, D.: The use of virtual reality to train powered wheelchair users and test new wheelchair systems. Int. J. Rehabil. Res. 23(4), 321–326 (2000)
23. Goodwin, M.J., Sanders, D.A., Poland, G.A., et al.: Navigational assistance for disabled wheelchair-users. J. Syst. Arch. 43(1–5), 73–79 (1997)
24. Sanders, D.A.: The modification of pre-planned manipulator paths to improve the gross motions associated with the pick-and-place task. Robotica 13(Part 1), 77–85 (1995)
25. Larsson, J., Broxvall, M., Saffiotti, A.: Laser-based corridor detection for reactive navigation. Ind. Robot: Int. J. 35(1), 69–79 (2008)
26. Lee, S.: Use of infrared light reflecting landmarks for localization. Ind. Robot 36(2), 138–145 (2009)
27. Horn, O., Kreutner, M.: Smart wheelchair perception using odometry, ultrasound sensors, and camera. Robotica 27, 303–310 (2009)
28. Milanes, V., Naranjo, J.E., Gonzalez, C., et al.: Autonomous vehicle based in cooperative GPS and inertial systems. Robotica 26, 627–633 (2008)
29. Lim, D., Lee, S., Cho, D.: Design of an assisted GPS receiver and its performance analysis. In: IEEE Symposium on Circulation & Systems, pp. 1742–1745 (2007)
30. Bloss, R.: Vision and robotics team up at the 2007 show. Ind. Robot: Int. J. 35(1), 19–26 (2008)
31. Sanders, D., Tan, Y., Rogers, I., Tewkesbury, G.: An expert system for automatic design-for-assembly. Assem. Autom. 29(4), 378–388 (2009)
32. Sanders, D.: Environmental sensors and networks of sensors. Sens. Rev. 28(4), 273–274 (2008)
33. Hopper, D.: The long perspective for robotic vision. Assem. Autom. 29(2), 122–126 (2009)
34. Sanders, D., Lambert, G., Graham-Jones, J., et al.: A robotic welding system using image processing techniques and a CAD model to provide information to a multi-intelligent decision module. Assem. Autom. 30(4), 323–332 (2010)
35. Sanders, D., Graham-Jones, J., Gegov, A.: Improving ability of tele-operators to complete progressively more difficult mobile robot paths using simple expert systems and ultrasonic sensors. Ind. Robot: Int. J. 37(5), 431–440 (2010)
36. Sanders, D., Lambert, G., Pevy, L.: Pre-locating corners in images in order to improve the extraction of Fourier descriptors and subsequent recognition of shipbuilding parts. IMechE Part B 223(9), 1217–1223 (2009)
37. Sanders, D.: Progress in machine intelligence. Ind. Robot 35(6), 485–487 (2008)
38. Sanders, D.: Recognizing shipbuilding parts using artificial neural networks and Fourier descriptors. Proc. Inst. Mech. Eng. Part B 223(3), 337–342 (2009)
39. Sanders, D., Stott, I.: Analysis of failure rates with a tele-operated mobile robot between a human tele-operator and a human with a sensor system to assist. Robotica 30(6), 973–988 (2012)
40. Sanders, D., Langner, M., Tewkesbury, G.: Improving wheelchair-driving using a sensor system to control wheelchair-veer and variable-switches as an alternative to digital-switches or joysticks. Ind. Robot: Int. J. 37(2), 157–167 (2010)
41. Khatib, O.: Real-time obstacle avoidance for manipulators and mobile robots. In: Proceedings of IEEE International Conference on Robotics and Automation, vol. 5, no. 1, pp. 90–98 (1986)
42. Hudson, A., Sanders, D., Golding, H., et al.: Aspects of an expert design system for the wastewater treatment industry. J. Syst. Archit. 43(1–5), 59–65 (1997)
43. Sanders, D.: Introducing AI into MEMS can lead us to brain-computer interfaces and superhuman intelligence (invited viewpoint review paper). Assem. Autom. 29(4), 309–312 (2009)
44. Sanders, D.: Ambient-intelligence, rapid-prototyping and where real people might fit into factories of the future. Assem. Autom. 29(3), 205–208 (2009)
45. Sanders, D., Haynes, B., Tewkesbury, G., et al.: The addition of neural networks to the inner feedback path in order to improve on the use of pre-trained feed forward estimators. Math. Comput. Simul. 41(5–6), 461–472 (1996)
46. Sanders, D.: Perception in robotics. Ind. Robot 26(2), 90–92 (1999)
47. Sanders, D.: System Specification 2. Microprocess. Microprogram. 38(1–5), 833–833 (1993)
48. Sanders, D., Harris, P., Mazharsolook, E.: Image modelling in real-time using spheres and simple polyhedra. In: 4th International Conference on Image Processing and Its Applications, vol. 354, pp. 433–436 (1992)
49. Sanders, D., Hudson, A., Tewkesbury, G.: Automating the design of high-recirculation airlift reactors using a blackboard framework. Expert Syst. Appl. 18(3), 231–245 (2000)
50. Sanders, D.: Real time geometric modelling using models in an actuator space and Cartesian space. J. Robot. Syst. 12(1), 19–28 (1995)
51. Fahimi, F., Nataraj, C., Ashrafiuon, H.: Real-time obstacle avoidance for multiple mobile robots. Robotica 27, 189–198 (2009)
52. Tewkesbury, G., Sanders, D.: The automatic programming of production machinery for deflashing plastic parts. In: Advances in Manufacturing Technology VIII, pp. 279–283 (1994)
53. Tewkesbury, G., Sanders, D.: The use of distributed intelligence within advanced production machinery for design applications. In: Total Vehicle Technology: Challenging Current Thinking, pp. 255–262 (2001)
54. Sanders, D.A., Robinson, D.C., Hassan, M., Haddad, M., Gegov, A., Ahmed, N.: Making decisions about saving energy in compressed air systems using ambient intelligence and artificial intelligence. In: Arai, K., Kapoor, S., Bhatia, R. (eds.) IntelliSys 2018. AISC, vol. 869, pp. 1229–1236. Springer, Cham (2019)
55. Sanders, D., Wang, Q., Bausch, N., Huang, Ya., Khaustov, S., Popov, I.: A method to produce minimal real time geometric representations of moving obstacles. In: Arai, K., Kapoor, S., Bhatia, R. (eds.) IntelliSys 2018. AISC, vol. 868, pp. 881–892. Springer, Cham (2019)
56. Haddad, M., Sanders, D., Gegov, A., et al.: Combining multiple criteria decision making with vector manipulation to decide on the direction for a powered wheelchair. In: Advances Intelligent Systems Computing, vol. 1037, pp. 680–693. Springer, Heidelberg (2019)
57. Haddad, M., Sanders, D., Bausch, N.: Selecting a robust decision making method to evaluate employee performance. Int. J. Manag. Decis. Making 18(4), 333–351 (2019)
58. Haddad, M., Sanders, D., Tewkesbury, G.: Selecting a discrete multiple criteria decision making method to decide on a corporate relocation. Arch. Bus. Res. 7(5), 48–67 (2019) 59. Haddad, M., Sanders, D.: Selecting a best compromise direction for a powered wheelchair using PROMETHEE. IEEE Trans. Neural Syst. Rehabil. Eng. 27(2), 228–235 (2019) 60. Haddad, M., Sanders, D., Tewkesbury, G., et al.: Initial results from using preference ranking organization METHods for enrichment of evaluations to help steer a powered wheelchair. Adv. Intell. Syst. Comput. 1037, 648–661 (2019) 61. Haddad, M., Sanders, D., Bausch, N., Tewkesbury, G., Gegov, A., Hassan, M.: Learning to make intelligent decisions using an expert system for the intelligent selection of either PROMETHEE II or the analytical hierarchy process. In: Arai, K., Kapoor, S., Bhatia, R. (eds.) IntelliSys 2018. AISC, vol. 868, pp. 1303–1316. Springer, Cham (2019)
Intelligent System to Analyze Data About Powered Wheelchair Drivers
Malik Haddad1(B), David Sanders1, Martin Langner2, Mohamad Thabet1, Peter Omoarebun1, Alexander Gegov1, Nils Bausch1, and Khaled Giasin1
1 University of Portsmouth, Portsmouth PO1 3DJ, UK
{malik.haddad,david.sanders}@port.ac.uk
2 Chailey Heritage Foundation, North Chailey BN8 4EF, UK
Abstract. The research presented in this paper creates an intelligent system that collects powered wheelchair users’ driving session data. The intelligent system is based on a Python programming platform. A program is created that collects data for future analysis. The collected data covers driving session details, the ability of a driver to operate a wheelchair, and the type of input devices used to operate a powered wheelchair. Data is collected on a Raspberry Pi microcomputer and is sent after each session via email. Data is placed in the body of the emails and in an attached file, and is saved in the microcomputer memory. Modifications to the system are made to meet the confidentiality and privacy concerns of potential users. Data will be used for future analysis and will be considered as a training data set to teach an intelligent system to predict future path patterns for different wheelchair users. In addition, data will be used to analyze the ability of a user to drive a wheelchair, monitor users’ development from one session to another, compare the progress of various users with similar disabilities and identify the most appropriate input device for each user and path.
Keywords: Analysis · Disabled · Intelligent system · Data collection · Python programming language · Powered wheelchair · Raspberry Pi
1 Introduction
Research presented in this paper is part of a larger research project at the University of Portsmouth [1]. The main aims are to use AI to enhance mobility and improve the quality of life of disabled powered wheelchair users by increasing their self-confidence and reliance. Demographic figures estimated the population of the United Kingdom (UK) in 2020 at above 67 million individuals. A Family Resources Survey in Great Britain, conducted by the Department for Work and Pensions in 2016/17, claimed that around 13.9 million individuals in the UK population suffered from some kind of disability, and that 51% of them had a mobility problem [2]. Assistive devices often compensate for a loss of ability and on-going deterioration in individual mobility over time while decreasing dependency on carers [3].
People using powered wheelchairs can lack the required cognitive, motor or sensory ability because of disease, disability affecting their hands, arms or shoulders, or even more extensive disabilities [4]. People with visual impairment or people who are not able to take any avoiding action by themselves are potential new users for the new systems described in this paper. The new systems can assist new powered wheelchair users with the navigation of their wheelchairs.
A smart wheelchair is a powered wheelchair with computers and sensors attached to it. A disabled user specifies a desired speed and bearing by placing or moving a transducer into a position, for example using switches or a joystick. The powered wheelchair then tends to progress along the specified route at the requested speed. A user can then make adjustments to avoid objects. Figure 1 shows an example of a wheelchair used in this research.
© Springer Nature Switzerland AG 2021 K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 584–593, 2021. https://doi.org/10.1007/978-3-030-55190-2_43
Fig. 1. Example of a wheelchair considered during the research.
The large wheels shown in Fig. 1 provided motion and direction. Each large wheel was driven independently by a separate motor. Powered wheelchair users often attend driving sessions to learn how to use input devices that control their wheelchair. Wheelchair users navigate by varying electrical current to the individual motors. Joysticks are normally used to control the velocity (direction and speed) of a powered wheelchair. If users were unable to use their fingers or hands or lacked the necessary
dexterity, head or chin switches, foot controls or puff/sip tubes could be considered as other input options.
Considerable research has studied the steering and navigation of powered wheelchairs [5–17]. The systems used were often local, and little work has been done to improve mobility using more global approaches. Research has considered collision avoidance [18–23] using sensors that delivered a more local sense of the surroundings of the wheelchair [24]. Research has often suggested an initial wheelchair path that can be locally adjusted if obstructions are present, but such approaches have rarely been successfully applied to help disabled wheelchair users. The new intelligent systems under consideration and described here could rapidly predict potential desired routes.
In this paper, a technique that gathers data from users’ driving sessions is described, together with the nature of the data gathered. A Raspberry Pi microcomputer was fitted between user input devices and powered wheelchair motors to intercept and collect the required data. A Python program was created and installed on the Raspberry Pi to gather the required data. The collected data was stored in the Raspberry Pi memory and, at the end of each driving session, was emailed to a specified email address. Data was sent in the body of the emails and in a Comma Separated Value (CSV) file. A graphical representation of the data in the form of a pie-chart was created and attached to the emails. Data will be considered for future analysis and will be used as a training data set to train an intelligent system that will predict future route patterns for different wheelchair users. In addition, data will be studied to measure the ability of a user to operate their wheelchair, monitor users’ progress from one session to another, compare the progress of different users with the same type of disability and identify the most suitable input device for each user and route.
In the work described in this paper, Raspberry Pi microcomputers were considered because of their small physical size, low cost, upgradeability, replicability and simplicity [12]. The program installed on the Raspberry Pi is described in Sect. 2. Section 3 describes modifications to the program to meet privacy and confidentiality concerns. Section 4 is a discussion about how the collected data will be used for analysis. Section 5 presents conclusions drawn from the current system and Sect. 6 presents some future work.
2 The Python Program
The aim of the project was to create a new system that gathered users’ driving session data using a simple, robust and low-cost architecture. Raspberry Pi microcomputers were chosen because they are generally regarded as simple and robust computers at a reasonable price. They were first launched in 2012 with an option of two models: A and B. In 2014, Model B+ was launched with hardware improvements over the previous models [25]. The research conducted in this paper considered a Raspberry Pi model 3B+. The Python programming language was used to create a program that gathered the required driving session data for future analysis. The program was installed onto a Raspberry Pi. The Raspberry Pi digitized the users’ desired input. The desired direction supplied by the user was saved in the Raspberry Pi memory.
A first step towards creating the new program was to create a new User Interface (UI) to simplify the interaction with helpers and carers. The UI requested a helper/carer to give the following details at the beginning of a driving session, as shown in Fig. 2:
• User name.
• Name of the input device used.
• Any medication that could impair the user’s driving ability.
• When the last dose (of medication) was administered (hh:mm).
• When the next dose (of medication) will be administered (hh:mm).
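The details requested by the UI can be held in a small record before a session starts. The following is a minimal sketch only, not the authors' program: the names `SessionDetails`, `parse_hhmm` and `read_session_details`, and the dictionary keys, are hypothetical, and the hh:mm times are assumed to use a 24-hour clock.

```python
from dataclasses import dataclass
from datetime import datetime, time


def parse_hhmm(text: str) -> time:
    """Parse a 24-hour hh:mm string; raises ValueError if it is malformed."""
    return datetime.strptime(text.strip(), "%H:%M").time()


@dataclass
class SessionDetails:
    """Details a helper/carer supplies at the start of a session (hypothetical layout)."""
    user_name: str
    input_device: str
    medication: str
    last_dose: time
    next_dose: time


def read_session_details(answers: dict) -> SessionDetails:
    """Build a validated record from the raw text typed into the UI."""
    return SessionDetails(
        user_name=answers["user_name"].strip(),
        input_device=answers["input_device"].strip(),
        medication=answers["medication"].strip(),
        last_dose=parse_hhmm(answers["last_dose"]),
        next_dose=parse_hhmm(answers["next_dose"]),
    )
```

Validating the two times when they are entered means a mistyped value is caught before the session begins rather than when the data is analyzed later.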
Fig. 2. User interface for the new program.
An emergency exit function was provided in case anything went out of control. If an emergency exit was requested, the Raspberry Pi would switch off power to the wheelchair motors, save and email all the details of the driving session and state that an emergency exit was requested, with the date/time of the request. An End of Session exit function was created that switched off power to the wheelchair motors and exited the program. If an End of Session exit was requested, the program would save and email all the details of the driving session, state that no emergency exit was requested, and exit. The emails contained the following data:
• Name of the user.
• Session start time.
• Type of input device used.
• Name of any medication that could impair user ability.
• Time when the user was last administered that medication.
• Time of the next dose of that medication.
• Duration of moving backward in seconds.
• Duration of moving forward in seconds.
• Duration of moving to the left in seconds.
• Duration of moving to the right in seconds.
• Duration when no switch was pressed.
• Order of switch presses.
• Number of times each switch was pressed.
• Whether an emergency exit was required.
• Session end time.
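The duration, order and count fields listed above could be derived from a time-stamped log of switch states. The sketch below is an illustration under stated assumptions, not the authors' code: the event format `(time_s, state)`, the function names and the two-column CSV layout are all hypothetical, and the real file shown in Fig. 3 may be laid out differently.

```python
import csv
import io
from collections import Counter

DIRECTIONS = ("backward", "forward", "left", "right", "none")


def summarise_presses(events, session_end):
    """Summarise (time_s, state) samples that are ordered by time.

    Each state is assumed to last until the next event, or until
    session_end for the final event. Returns per-state durations,
    per-switch press counts and the order of presses.
    """
    durations, counts, order = Counter(), Counter(), []
    following = events[1:] + [(session_end, None)]
    for (start, state), (stop, _) in zip(events, following):
        durations[state] += stop - start
        if state != "none":  # "none" means no switch was pressed
            counts[state] += 1
            order.append(state)
    return durations, counts, order


def session_to_csv(details, durations, counts, order):
    """Render one session as two-column CSV text (hypothetical layout)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["field", "value"])
    for key, value in details.items():
        writer.writerow([key, value])
    for name in DIRECTIONS:
        writer.writerow([f"duration_{name}_s", durations.get(name, 0)])
    for name in DIRECTIONS[:-1]:
        writer.writerow([f"presses_{name}", counts.get(name, 0)])
    writer.writerow(["press_order", " > ".join(order)])
    return buffer.getvalue()
```

Keeping the summary step separate from the CSV step means the same durations and counts can also be placed in the body of an email or passed to a plotting routine.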
Figure 3 shows a screenshot of the CSV file attached to the emails, including some sample data. Figure 4 shows a graphical display (pie-chart) of data obtained from the switches, which provided a simple visual summary of the session. In this example three switches were considered (Red, Yellow and Blue).
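An end-of-session email with the data in the body and a CSV attachment could be assembled with Python's standard `email` package. This is a hedged sketch only: the function name `build_session_email`, the subject wording and the attachment file name are assumptions, and the message is built here without being sent.

```python
from email.message import EmailMessage


def build_session_email(sender, recipient, user_name, body_text, csv_text,
                        emergency_exit):
    """Build (but do not send) an end-of-session email.

    The subject states whether an emergency exit was requested; the
    session data goes in the body and again as a CSV attachment.
    """
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    status = "EMERGENCY EXIT" if emergency_exit else "normal end of session"
    message["Subject"] = f"Driving session for {user_name} ({status})"
    message.set_content(body_text)
    # Hypothetical attachment name; the text is attached as text/csv.
    message.add_attachment(csv_text, subtype="csv", filename="session.csv")
    return message
```

A message built this way could be handed to `smtplib.SMTP.send_message`; the mail server, port and credentials depend on the deployment and are not specified in the paper.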
Fig. 3. Example of CSV file showing driving session data.
[Pie chart with Yellow, Blue and Red segments; segment labels 28%, 45% and 27%.]
Fig. 4. Example of graphical representation (pie-chart) showing data obtained from switches.
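The segment sizes in a chart of this kind are each switch's share of the total. A minimal sketch, assuming the shares are computed from per-switch totals (press durations or press counts; the paper does not say which) and using the hypothetical name `switch_shares`:

```python
def switch_shares(totals: dict) -> dict:
    """Return each switch's share of the session as a percentage.

    `totals` maps a switch name to its total (for example seconds
    pressed). An empty or all-zero session yields 0.0 for every switch.
    """
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {name: 0.0 for name in totals}
    return {name: round(100.0 * value / grand_total, 1)
            for name, value in totals.items()}
```

The resulting values could be passed to a plotting library such as matplotlib (`pyplot.pie`) to draw the chart; which library the authors used is not stated.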
3 Modification to the New System
To meet the privacy policy and confidentiality concerns of potential users, the new system was subjected to several modifications. The email function presented in the previous section was disabled and all data were stored in the microcomputer memory. Data were retrieved from the microcomputer memory on a monthly basis. A further modification to the new system was made to interface two IR head switches, as shown in Fig. 5. The modified system was mounted onto a powered wheelchair used by a student at Chailey Heritage School, as shown in Fig. 6. Guardian consent was acquired. The collected data will be considered for analysis and will be used as a training set for an intelligent system to predict future route patterns. In addition, data will be used to analyze the ability of the user to operate their wheelchair, monitor the user’s progress from one session to another and compare the progress of different users with the same type of disability.
Fig. 5. Modified new system interfacing to IR head switches.
4 Discussion
Research presented in this paper is part of a larger research project being conducted by the authors at the University of Portsmouth [1] that aims to improve mobility and enhance the quality of life of disabled powered wheelchair users by increasing their self-confidence and reliance. This research used a Raspberry Pi to store users’ driving session data and email data to a predefined email address for future reference and analysis. The new system was clinically trialled and results showed that it successfully stored session data and emailed data to a specified email address.
Fig. 6. Powered wheelchair using the modified new system.
Data received will be used as a training set to train an intelligent system to predict routing patterns for different wheelchair users, assess users’ progress from one session to another and compare the progress of different users with similar types of disability. Data will be used to identify how different types of disability could affect user progress, which types of medication could affect driving ability and how, a “best time in the day” for each user to conduct a driving session, and how driving ability was affected by the duration of driving. In addition, data will be used to estimate the time spent travelling by a user in each session, monitor user ability factors and the need for assistance, and analyze which type of input device best suited each user and different route patterns.
5 Conclusions
This paper presented a new system that gathered users’ driving session data and could be used as an interface between any user input device and powered wheelchair motors. The work will apply simple and computationally inexpensive AI software. A Python program was created and installed on a Raspberry Pi to gather users’ driving session data and save and email collected data to a predefined email address. A CSV file was created and attached to the emails containing driving session data. A graphical representation of the session data (pie-chart) was created and attached to the emails to improve understanding of any patterns. Modifications to the new system were applied to meet confidentiality and privacy concerns of collaborators and users.
The authors are building a friendlier user interface for the new system that could simplify the operation of a powered wheelchair and gather subjective feedback about driving sessions from helpers and carers. The authors are currently studying ways to apply machine learning and other AI techniques to improve this research. Results showed that the new systems performed satisfactorily.
6 Future Work
Future work will consider the application of machine learning techniques to the collected data set to train an intelligent system to predict driving patterns, assess progress in driving skills and analyze ability factors denoting the ability of a user to operate their wheelchair. Future work will also consider adding a new function to the Python program to indicate whether an intervention by the helper/carer or by a collision avoidance device occurred during a driving session. This will be used to distinguish who is operating the device and to provide information regarding driving ability and the need for assistance. Data collected will also be linked to driving and environmental situations and the targeted activity. Finally, some decision making systems will be applied to the data [26–28]. Systems are being clinically trialled at Chailey Heritage School as part of the bigger project [1].
Acknowledgment. Research in this paper was funded by EPSRC grant EP/S005927/1 and supported by The Chailey Heritage Foundation and the University of Portsmouth.
References
1. Sanders, D., Gegov, A.: Using artificial intelligence to share control of a powered-wheelchair between a wheelchair user and an intelligent sensor system. EPSRC Project 2019–2022 (2018)
2. Department for work and pensions (2017). https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/692771/family-resources-survey-201617.pdf. Accessed 10 Jan 2020
3. Disability facts and figures [internet]. http://odi.dwp.gov.uk/disability-statistics-and-research/disability-facts-and-figures.php. Accessed 2 Dec 2019
4. Sanders, D.A.: Non-model-based control of a wheeled vehicle pulling two trailers to provide early powered mobility and driving experiences. IEEE Trans. Neural Syst. Rehabil. Eng. 26(1), 96–104 (2018)
5. Parhi, D.R., Singh, M.K.: Rule-based hybrid neural network for navigation of a wheelchair. Proc. IMechE Part B J. Eng. Manuf. 224, 11103–11117 (2009)
6. Sanders, D.A., Gegov, A., Ndzi, D.: Knowledge-based expert system using a set of rules to assist a tele-operated mobile robot. In: Bi, Y., Kapoor, S., Bhatia, R. (eds.) Studies in Computational Intelligence, vol. 751, pp. 371–392. Springer, Cham (2018)
7. Sanders, D.A., et al.: Rule-based system to assist a tele-operator with driving a mobile robot. In: Lecture Notes in Networks and Systems, vol. 16, pp. 599–615. Springer (2018)
8. Parhi, D.R., et al.: The stable and precise motion control for multiple wheelchairs. Appl. Soft Comput. 9(2), 477–487 (2009)
9. Nguyen, V., et al.: Strategies for human - machine interface in an intelligent wheelchair. In: 35th Annual International Conference of IEEE Engineering in Medicine & Biology Society Conference Proceedings (EMBC), Osaka, Japan, pp. 3638–3641 (2013)
10. Sanders, D., Langner, M., Bausch, N., Huang, Y., Khaustov, S.A., Simandjuntak, S.: Improving human-machine interaction for a powered wheelchair driver by using variable-switches and sensors that reduce wheelchair-veer. In: Bi, Y., Bhatia, R., Kapoor, S. (eds.) Intelligent Systems and Applications. Advances in Intelligent Systems and Computing, vol. 1038, pp. 1173–1191. Springer, Cham (2019)
11. Okonor, O.M., Gegov, A., Adda, M., Sanders, D., Haddad, M.J.M., Tewkesbury, G.: Intelligent approach to minimizing power consumption in a cloud-based system collecting sensor data and monitoring the status of powered wheelchairs. In: Bi, Y., Bhatia, R., Kapoor, S. (eds.) Intelligent Systems and Applications. Advances in Intelligent Systems and Computing, vol. 1037, pp. 694–710. Springer, Cham (2019)
12. Sanders, D., Okonor, O.M., Langner, M., Hassan Sayed, M., Khaustov, S.A., Omoarebun, P.O.: Using a simple expert system to assist a powered wheelchair user. In: Bi, Y., Bhatia, R., Kapoor, S. (eds.) Intelligent Systems and Applications. Advances in Intelligent Systems and Computing, vol. 1037, pp. 662–679. Springer, Cham (2019)
13. Bausch, N., Shilling, P., Sanders, D., Haddad, M.J.M., Okonor, O.M., Tewkesbury, G.: Indoor location and collision feedback for a powered wheelchair system using machine learning. In: IEEE SAI Intelligent Systems Conference, London, United Kingdom. Advances in Intelligent Systems and Computing, vol. 1, pp. 721–739. Springer (2019)
14. Tewkesbury, G., Sanders, D., Haddad, M.J.M., Bausch, N., Gegov, A., Okonor, O.M.: Task programming methodology for powered wheelchairs. In: 2019 IEEE SAI Intelligent Systems Conference, London, United Kingdom. Advances in Intelligent Systems and Computing, vol. 1, pp. 711–720. Springer (2019)
15. Sanders, D., Tewkesbury, G., Parchizadeh, H., Robertson, J.J., Omoarebun, P.O., Malik, M.: Learning to drive with and without intelligent computer systems and sensors to assist. In: Arai, K., Kapoor, S., Bhatia, R. (eds.) Advances in Intelligent Systems and Computing, vol. 868, pp. 1171–1181. Springer, Cham (2019)
16. Sanders, D., Gegov, A., Haddad, M., Ikwan, F., Wiltshire, D., Tan, Y.C.: A rule-based expert system to decide on direction and speed of a powered wheelchair. In: Arai, K., Kapoor, S., Bhatia, R. (eds.) Advances in Intelligent Systems and Computing, vol. 868, pp. 822–838. Springer, Cham (2019)
17. Sanders, D., Gegov, A., Tewkesbury, G., Khusainov, R.: Sharing driving between a vehicle driver and a sensor system using trust-factors to set control gains. In: Arai, K., Kapoor, S., Bhatia, R. (eds.) Advances in Intelligent Systems and Computing, vol. 868, pp. 1182–1195. Springer, Cham (2019)
18. Sanders, D.A., et al.: Results from investigating powered wheelchair users learning to drive with varying levels of sensor support. In: Proceedings of the SAI Intelligent System, London, U.K, pp. 241–245 (2017)
19. Haddad, M.J.M., Sanders, D., Gegov, A., Hassan Sayed, M., Huang, Y., Al-Mosawi, M.: Combining multiple criteria decision making with vector manipulation to decide on the direction for a powered wheelchair. In: Bi, Y., Bhatia, R., Kapoor, S. (eds.) Advances in Intelligent Systems and Computing, vol. 1037, pp. 680–693. Springer, Cham (2019)
20. Haddad, M., Sanders, D., Bausch, N., Tewkesbury, G., Gegov, A., Hassan Sayed, M.: Learning to make intelligent decisions using an Expert System for the intelligent selection of either PROMETHEE II or the Analytical Hierarchy Process. In: Arai, K., Kapoor, S., Bhatia, R. (eds.) Advances in Intelligent Systems and Computing, vol. 868, pp. 1303–1316. Springer, Cham (2019)
21. Haddad, M.J.M., Sanders, D.: Selecting a best compromise direction for a powered wheelchair using PROMETHEE. IEEE Trans. Neural Syst. Rehabil. Eng. 27(2), 228–235 (2019)
22. Haddad, M.J.M., Sanders, D., Tewkesbury, G., Gegov, A., Hassan Sayed, M., Ikwan, F.C.: Initial results from using Preference Ranking Organization METHods for Enrichment of Evaluations to help steer a powered wheelchair. In: Bi, Y., Bhatia, R., Kapoor, S. (eds.) Advances in Intelligent Systems and Computing, vol. 1037, pp. 648–661. Springer, Cham (2019)
23. Sanders, D., Wang, Q., Bausch, N., Huang, Y., Khaustov, S.A., Popov, I.: A method to produce minimal real time geometric representations of moving obstacles. In: Arai, K., Kapoor, S., Bhatia, R. (eds.) Advances in Intelligent Systems and Computing, vol. 868, pp. 881–892. Springer, Cham (2019)
24. Song, K.T., Chen, C.C.: Application of asymmetric mapping for wheelchair navigation using ultrasonic sensors. J. Intell. Wheelchair Syst. 17(3), 243–264 (1996)
25. Sachdeva, P., Katchii, S.: A review paper on Raspberry Pi. Int. J. Curr. Eng. Technol. 4(6), 3818–3819 (2014)
26. Haddad, M.J.M., Sanders, D., Bausch, N.: Selecting a robust decision making method to evaluate employee performance. Int. J. Manag. Decis. Making 18(4), 333–351 (2019)
27. Haddad, M.J.M., Sanders, D., Tewkesbury, G.: Selecting a discrete Multiple Criteria Decision Making method to decide on a corporate relocation. Arch. Bus. Res. 7(5), 48–67 (2019)
28. Sanders, D., Robinson, D.C., Hassan Sayed, M., Haddad, M.J.M., Gegov, A., Ahmed, N.: Making decisions about saving energy in compressed air systems using Ambient Intelligence and Artificial Intelligence. In: Arai, K., Kapoor, S., Bhatia, R. (eds.) Intelligent Systems and Applications: Proceedings of the 2018 Intelligent Systems Conference (IntelliSys) Volume 2. Advances in Intelligent Systems and Computing, vol. 869, pp. 1229–1236. Springer, Cham (2019)
Intelligent Control of the Steering for a Powered Wheelchair Using a Microcomputer
Malik Haddad1(B), David Sanders1, Martin Langner2, Nils Bausch1, Mohamad Thabet1, Alexander Gegov1, Giles Tewkesbury1, and Favour Ikwan1
1 University of Portsmouth, Portsmouth PO1 3DJ, UK
[email protected]
2 Chailey Heritage Foundation, North Chailey BN8 4EF, UK
Abstract. The research presented in this paper describes a new architecture for controlling powered wheelchairs. A Raspberry Pi microcomputer is considered to assist in controlling direction. A Raspberry Pi is introduced between user input switches and powered wheelchair motors to create an intelligent Human Computer Interface (HCI). An electronic circuit is designed that consists of an ultrasonic sensor array and a set of control relays. The sensors deliver information about obstructions in the environment surrounding the wheelchair. The Python programming language was used to create a program that digitized the output of the user switches and assessed information provided by the ultrasonic sensor array. The program was installed on a Raspberry Pi and the Raspberry Pi controlled power delivered to the motors. Tests were conducted and results showed that the new system successfully assisted a wheelchair user in avoiding obstacles. The new architecture can be used to intelligently interface any input device or sensor system to a powered wheelchair.
Keywords: Direction · Avoidance · Python · Wheelchair · Raspberry Pi · Steer
1 Introduction
Research described here is part of a bigger research project to improve mobility and enhance the quality of life of disabled powered wheelchair users by increasing their self-confidence and reliance [1]. A new architecture is described that acquires steering inputs from users and intelligently combines them with sensor readings to assist a disabled user to drive safely. The World Health Organization’s (WHO) report on disability stated that about 15% of the world population were suffering from some type of disability and that 2–4% of them experience significant difficulties in mobility. These numbers were higher than previous WHO estimates due to population ageing, the rapid spread of chronic diseases, and improvements in modern medical treatment [2].
In many cases, people with disabilities struggle with daily manoeuvring tasks and can be dependent on helpers and carers for other daily activities [3]. Disabled users of powered wheelchairs may not have the required ability or sufficient mobility due to hand, finger, shoulder, arm or more widespread incapability, and may not have adequate lower limb strength to propel a manual wheelchair [4]. An intelligent driving system could ease their mobility problem and reduce their dependency on others.
A smart wheelchair is a combination of a powered wheelchair with an intelligent sensor system and the computer algorithms needed to make immediate intelligent decisions for path planning and collision avoidance [3]. A driver supplies speed and direction using an input device, for example levers, switches or joysticks. The wheelchair then starts to move at the desired speed in the chosen direction. A user can then make adjustments to avoid obstacles. The preferred direction is combined with sensor output [5–17] to help the driver. Many researchers have investigated navigation and steering [18–22], and obstacle avoidance [23] has been considered using local sensors [24], but these systems have barely moved out of the laboratory to assist disabled wheelchair users. A new architecture is described here that uses sensors to effectively assist with driving. The new system responds quickly to objects and aims to prevent a user from driving in the direction of an obstacle.
Raspberry Pi microcomputers have often been seen as reliable, powerful microcomputers with small physical size and low cost. Two models were launched in 2012: A and B. In 2014, Model B+ was launched with numerous enhancements based on users’ feedback [25]. The research conducted in this paper used a Raspberry Pi model B+. An ultrasonic sensor array was considered to detect objects surrounding a wheelchair. Powered wheelchair users provided their desired direction using binary switches. The Python programming language was used to create a program that acquired the users’ desired direction and analyzed information provided by the ultrasonic sensor array. The program was installed on a Raspberry Pi.
© Springer Nature Switzerland AG 2021 K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 594–603, 2021. https://doi.org/10.1007/978-3-030-55190-2_44
The Raspberry Pi and the electronic circuit converted user input from the switches into digital logic levels. A compromise was made between the desires of a driver and the distance to nearby obstacles. If no obstacle was detected in the direction supplied by the user, the Raspberry Pi activated a set of specific relays that provided power to the wheelchair motor responsible for driving the wheelchair in the chosen direction. If an obstacle was detected in that direction, the Raspberry Pi switched off power to the motor responsible for driving the wheelchair in that direction and so avoided the obstacle. The detection range could be digitally tuned by the intelligent controller from 2 cm to 500 cm. New courses were simulated before testing in the laboratory using ultrasonic sensor arrays, Fig. 1.
Many researchers have considered sensors to assist wheelchair users with avoiding items safely [26]: infrared sensors [27], ultrasonics [28], and structured lighting [29]. Global methods could not provide coordinated output when used inside buildings [30], and local systems provided more successful results: gyroscopes, odometers, tilt sensors and ultrasonics [31–33]. Cameras are becoming affordable but the processing of the data can be complicated. Computers are becoming affordable and more powerful [30], so cameras are often used. Despite that, a disabled driver can often provide the best data about what is required, but visual impairment or a disability may still reduce that ability [34]. In the work presented in this paper, ultrasonic sensors were considered due to their simplicity, robustness, and low cost [32].
Section 2 briefly explains the sensors and Sect. 3 presents the intelligent controller and HCI created to interface to the sensors. Section 4 presents results and Sect. 5 presents discussion and conclusions.
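The rule described above, powering the motors only when the requested direction is clear, can be sketched as a pure function. This is an illustration under stated assumptions rather than the authors' program: the sensor names, the `DIRECTION_SENSORS` mapping, the 50 cm default threshold and the decision to leave reverse ungated are all hypothetical; only the three-front, one-left, one-right layout and the 2 cm to 500 cm tuning range come from the paper.

```python
# Which sensors guard which requested direction (hypothetical mapping,
# based on three forward-facing sensors and one on each side).
DIRECTION_SENSORS = {
    "forward": ("front_left", "front_centre", "front_right"),
    "left": ("left",),
    "right": ("right",),
    "backward": (),  # assumption: no rear sensor, so reverse is not gated
}


def allow_motion(direction, distances_cm, threshold_cm=50.0):
    """Return True if the motors may be powered for `direction`.

    `distances_cm` maps a sensor name to the distance of the nearest
    detected object. Motion is refused when any sensor guarding the
    requested direction reports an object at or inside the threshold.
    """
    if not 2.0 <= threshold_cm <= 500.0:
        raise ValueError("threshold must lie within the 2 cm to 500 cm range")
    sensors = DIRECTION_SENSORS.get(direction)
    if sensors is None:  # no switch pressed, or an unknown request
        return False
    return all(distances_cm[name] > threshold_cm for name in sensors)
```

Returning a simple yes/no keeps the safety decision separate from the relay wiring, so the same check can be exercised without any hardware attached.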
Fig. 1. Powered wheelchairs considered for the research.
2 Sensors
Sensors considered in this research were the same as those used in [35]. Sensors were placed under the footrests [36]. The distance to a detected object was determined by measuring the flight time required for a pulse to be sent and reflected back to a receiver(s) [37]. Figure 2 shows the envelope of the ultrasonic sensor and a potential grid that can be created. If no object was detected, the range could be digitally increased until an object was detected. Warnings could then be generated regarding obstructions ahead.
HC SR04 ultrasonic sensors were studied and tested with different objects to create polar plots. The physical architecture of these sensors prevented them from suffering from side lobe interference. Objects with different shapes and sizes were considered. The sensors detected objects ranging from small cylinders with diameters of less than 0.65 cm at 15 cm away to larger objects 5 m away. Figures 3, 4, 5 and 6 show polar plots for different sensors detecting different objects at different distances.
Five HC SR04 ultrasonic sensors were arranged as an array that was used to assess the surroundings of the wheelchair (3 were mounted to the front, 1 to the right and 1 to the left). The sensors provided the distance to the nearest object in their detection zone. The Python programming language was used to create a program that stopped the wheelchair from moving in the direction of an object if it was detected at a close distance. The range could be digitally tuned in the program from 2 cm to 500 cm.
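The flight-time measurement converts to distance by halving the round trip: distance = speed of sound × echo time / 2. A minimal sketch follows, assuming a speed of sound of about 343 m/s (dry air at roughly 20 °C) and treating the 2 cm to 500 cm span as the usable range; the function names are hypothetical and no hardware access is shown.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumed value for dry air at about 20 °C


def echo_to_distance_cm(echo_seconds, speed=SPEED_OF_SOUND_M_PER_S):
    """Convert a round-trip echo time into a one-way distance in cm."""
    if echo_seconds < 0:
        raise ValueError("echo time cannot be negative")
    # The pulse travels to the object and back, hence the division by 2.
    return speed * echo_seconds / 2.0 * 100.0


def within_range(distance_cm, min_cm=2.0, max_cm=500.0):
    """Return the distance if it is inside the usable range, else None."""
    return distance_cm if min_cm <= distance_cm <= max_cm else None
```

For example, an echo time of 10 ms corresponds to about 171.5 cm under the assumed speed of sound.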
Intelligent Control of the Steering for a Powered Wheelchair
[Figure labels: sample beam pattern with adjacent, intermediate and furthest zones; axes marked in metres, 0 m to 2 m ahead and 1 m to either side.]
Fig. 2. The envelope for the ultrasonic sensor.
[Polar plot: angular axis from 0 to 180 degrees; radial axis from 0 to 0.1 meters.]
Fig. 3. Polar plot for HC SR04 ultrasonic sensor detecting 0.639 cm cylindrical metal object.
3 Intelligent Controller and Human Computer Interface (HCI)
An electronic circuit was created to digitize the output of the users’ switches. A Raspberry Pi microcomputer was introduced between the users’ switches and the wheelchair motors. The Raspberry Pi could be used as an intelligent controller and interface to any type of input device such as a joystick, chin, blow or head switch, or an EEG helmet. User switches were connected to the Raspberry Pi using a 9 pin D-connector. The following stages were used to digitize the output from the users’ switches in order to create the intelligent controller and HCI:
Fig. 4. Polar plot for HC SR04 ultrasonic sensor detecting 2.188 cm cylindrical metal object.
Fig. 5. Polar plot for HC SR04 ultrasonic sensor detecting (4 × 8) cm rectangular plastic object.
Fig. 6. Polar plot for HC SR04 ultrasonic sensor detecting (5.5 × 8.2) cm rectangular plastic object.
1. Voltage drop (12 V → 5 V): The switches operated two 12 V DC motors. The Raspberry Pi operated at 5 V DC. The output from the input switches was reduced to 5 V DC using a CD4050BE hex buffer.
2. Python program 1: A program was created to acquire the output from the switches.
3. Common collector circuit: Common collector circuits were created to operate the 5 V DC relays. Figure 7 shows the circuit diagram of the common collector circuit used. Figure 8 shows a prototype of the circuits used.
4. Python program 2: A program that controlled the wheelchair and provided an intelligent HCI.
Fig. 7. Common collector circuit diagram.
Fig. 8. Common collector circuit considered for the new system.
A flywheel diode was used to provide a path for the current when the coil was switched off. This prevented a voltage spike from occurring and damaging switch contacts or destroying switching transistors. The architecture described in this section successfully interfaced input switches and intelligently controlled the operation of the switches to accurately transfer the desires of the wheelchair users to the wheelchair motors. The input from the switches (representing the wishes of the user) was passed on to a basic intelligent system contained within the Raspberry Pi. The software is written in Python. The information from the switches will then be combined with the input from the sensors to produce a new speed and direction. The new trajectory will be safer and avoid collision. A simple system has been created to test scenarios and to gauge what might be possible. Once testing of the basic system is completed, more intelligence will be added.
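The combination of switch inputs and sensor readings described above could look something like the sketch below. It is not the authors' program: the function name, the direction labels and the 50 cm default threshold are hypothetical, and hardware access (GPIO pins, relays) is deliberately left out so that only the decision logic is shown.

```python
# Illustrative decision logic only; names and the default threshold are
# assumptions. Real relay switching would be done through the GPIO pins.
def relay_commands(switches, distances_cm, threshold_cm=50.0):
    """Decide which direction relays to energise.

    switches maps a direction to True when the user presses that switch.
    distances_cm maps a direction to the nearest detected object (cm);
    a missing entry or None means nothing was detected in that direction.
    """
    if not 2 <= threshold_cm <= 500:
        raise ValueError("threshold must lie between 2 cm and 500 cm")
    commands = {}
    for direction, pressed in switches.items():
        distance = distances_cm.get(direction)
        obstacle_close = distance is not None and distance <= threshold_cm
        # Power is supplied only if the user asks for the direction
        # and no obstacle is within the threshold in that direction.
        commands[direction] = bool(pressed) and not obstacle_close
    return commands


# Hypothetical situation: forward is blocked at 30 cm, right is clear.
example = relay_commands(
    {"forward": True, "left": False, "right": True},
    {"forward": 30.0, "right": 120.0},
)
```

In this example the forward relay stays off because an object is closer than the threshold, while the right relay is energised.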
4 Results
The work presented in this paper considered an ultrasonic sensor array to assess the surroundings of a powered wheelchair, digitized the output of users’ switches and helped wheelchair users to drive their wheelchairs safely. A driver provided a desired direction by pressing a switch, which triggered a specific 5 V DC relay that controlled the power supplied to the wheelchair motor responsible for driving the wheelchair in the desired direction. Binary switches were used as an interface between a disabled user and their wheelchair to adjust bearing and speed. The digitization of the output of the binary switches combined with data from a sensor system could enhance driving by decreasing the number of collisions. Human drivers often use their own skill to drive their powered wheelchair, but sensors delivered more repeatable and accurate readings, and could cover for any lack of ability or awareness. The new system blended human driving skill with autonomy, with the system becoming involved only if necessary. When moving through complex or changing environments, the sensors could provide better outcomes that could lead to safer decisions. Disabled wheelchair drivers could successfully drive their wheelchairs using their input device and the system avoided obstacles. The sensors ensured that the wheelchair was safe as it moved.
5 Discussion and Conclusions
The architecture presented in this paper is part of a larger research project concerned with improving mobility and enhancing the quality of life of disabled powered wheelchair users by increasing their self-confidence and self-reliance [1]. This paper presented a system that could interface any user input device to a powered wheelchair. The new system might reduce the need for helpers and introduce more independence. The electrical current that drove the motors was created by intelligently mixing the output from the sensor readings and the desired direction provided by a disabled driver. In an environment with few obstacles, or if the obstacle(s) were far apart, a driver would not need any support. If there were many obstacles in the way or there were obstructions close to the wheelchair, then the new sensor system was able to reduce the weight of the inputs from joysticks or switches to prevent a collision from happening.
The authors are currently studying ways to apply Machine Learning and other computationally inexpensive AI techniques to improve this research. Future work will aim at collecting user data to predict driving patterns, assess progress in driving skills and analyze ability factors denoting the ability of a user to operate their wheelchair. Specifically, some decision-making algorithms will be investigated [38–40]. Results showed that the new system performed accurately. Systems and methods are being clinically trialed at Chailey Heritage School as part of a larger EPSRC research project [1].
Acknowledgment. Research in this paper was funded by EPSRC grant EP/S005927/1 and supported by The Chailey Heritage Foundation and the University of Portsmouth.
References
1. Sanders, D., Gegov, A.: Using artificial intelligence to share control of a powered-wheelchair between a wheelchair user and an intelligent sensor system. EPSRC Project 2019–2022 (2018)
2. Joshi, M.K., Gupta, M.V., Gosavi, M.M., Wagh, M.S.: A multifunctional smart wheelchair. Int. J. Adv. Res. Electron. Commun. Eng. 4(5), 1281–1284 (2015)
3. Leaman, J., La, H.M.: A comprehensive review of smart wheelchairs: past, present, and future. IEEE Trans. Hum.-Mach. Syst. 47(4), 486–499 (2017)
4. Sanders, D.A.: Non-model-based control of a wheeled vehicle pulling two trailers to provide early powered mobility and driving experiences. IEEE Trans. Neural Syst. Rehabil. Eng. 26(1), 96–104 (2018)
5. Parhi, D.R., Singh, M.K.: Rule-based hybrid neural network for navigation of a wheelchair. Proc. IMechE Part B J. Eng. Manuf. 224, 11103–11117 (2009)
6. Sanders, D.A., Gegov, A., Ndzi, D.: Knowledge-based expert system using a set of rules to assist a tele-operated mobile robot. In: Bi, Y., Kapoor, S., Bhatia, R. (eds.) Studies in Computational Intelligence 2018, vol. 751, pp. 371–392. Springer, Cham (2018)
7. Sanders, D.A., et al.: Rule-based system to assist a tele-operator with driving a mobile robot. In: Lecture Notes in Networks and Systems, vol. 16, pp. 599–615. Springer (2018)
8. Haddad, M., Sanders, D., Gegov, A., Hassan Sayed, M., Huang, Y., Al-Mosawi, M.: Combining multiple criteria decision making with vector manipulation to decide on the direction for a powered wheelchair. In: Bi, Y., Bhatia, R., Kapoor, S. (eds.) Advances in Intelligent Systems and Computing, vol. 1037, pp. 680–693. Springer, Cham (2019)
9. Haddad, M., Sanders, D.: Selecting a best compromise direction for a powered wheelchair using PROMETHEE. IEEE Trans. Neural Syst. Rehabil. Eng. 27(2), 228–235 (2019)
10. Haddad, M., Sanders, D., Tewkesbury, G., Gegov, A., Hassan Sayed, M., Ikwan, F.C.: Initial results from using Preference Ranking Organization METHods for Enrichment of Evaluations to help steer a powered wheelchair. In: Bi, Y., Bhatia, R., Kapoor, S. (eds.) Advances in Intelligent Systems and Computing, vol. 1037, pp. 648–661. Springer, Cham (2019)
11. Sanders, D., Tewkesbury, G., Parchizadeh, H., Robertson, J.J., Omoarebun, P.O., Malik, M.: Learning to drive with and without intelligent computer systems and sensors to assist. In: Arai, K., Kapoor, S., Bhatia, R. (eds.) Advances in Intelligent Systems and Computing, vol. 868, pp. 1171–1181. Springer, Cham (2019)
12. Sanders, D., Gegov, A., Haddad, M., Ikwan, F., Wiltshire, D., Tan, Y.C.: A rule-based expert system to decide on direction and speed of a powered wheelchair. In: Arai, K., Kapoor, S., Bhatia, R. (eds.) Advances in Intelligent Systems and Computing, vol. 868, pp. 822–838. Springer, Cham (2019)
13. Haddad, M., Sanders, D., Bausch, N., Tewkesbury, G., Gegov, A., Hassan Sayed, M.: Learning to make intelligent decisions using an Expert System for the intelligent selection of either PROMETHEE II or the Analytical Hierarchy Process. In: Arai, K., Kapoor, S., Bhatia, R. (eds.) Advances in Intelligent Systems and Computing, vol. 868, pp. 1303–1316. Springer, Cham (2019)
14. Sanders, D., Gegov, A., Tewkesbury, G., Khusainov, R.: Sharing driving between a vehicle driver and a sensor system using trust-factors to set control gains. In: Arai, K., Kapoor, S., Bhatia, R. (eds.) Advances in Intelligent Systems and Computing, vol. 868, pp. 1182–1195. Springer, Cham (2019)
15. Sanders, D., Langner, M., Bausch, N., Huang, Y., Khaustov, S.A., Simandjuntak, S.: Improving human-machine interaction for a powered wheelchair driver by using variable-switches and sensors that reduce wheelchair-veer. In: Bi, Y., Bhatia, R., Kapoor, S. (eds.) Advances in Intelligent Systems and Computing, vol. 1038, pp. 1173–1191. Springer, Cham (2019)
16. Okonor, O.M., Gegov, A., Adda, M., Sanders, D., Haddad, M., Tewkesbury, G.: Intelligent approach to minimizing power consumption in a cloud-based system collecting sensor data and monitoring the status of powered wheelchairs. In: Bi, Y., Bhatia, R., Kapoor, S. (eds.) Advances in Intelligent Systems and Computing, vol. 1037, pp. 694–710. Springer, Cham (2019)
17. Sanders, D., Okonor, O.M., Langner, M., Hassan Sayed, M., Khaustov, S.A., Omoarebun, P.O.: Using a simple expert system to assist a powered wheelchair user. In: Bi, Y., Bhatia, R., Kapoor, S. (eds.) Advances in Intelligent Systems and Computing, vol. 1037, pp. 662–679. Springer, Cham (2019)
18. Parhi, D.R., et al.: The stable and precise motion control for multiple wheelchairs. Appl. Soft Comput. 9(2), 477–487 (2009)
19. Nguyen, V., et al.: Strategies for human-machine interface in an intelligent wheelchair. In: 35th Annual International Conference of IEEE Engineering in Medicine & Biology Society Conference Proceedings, (EMBC), Osaka, Japan, pp. 3638–3641 (2013)
20. Tewkesbury, G., Sanders, D., Haddad, M., Bausch, N., Gegov, A., Okonor, O.M.: Task programming methodology for powered wheelchairs. In: 2019 IEEE SAI Intelligent Systems Conference, London, United Kingdom. Advances in Intelligent Systems and Computing, vol. 1, pp. 711–720. Springer (2019)
21. Sanders, D., Wang, Q., Bausch, N., Huang, Y., Khaustov, S.A., Popov, I.: A method to produce minimal real time geometric representations of moving obstacles. In: Arai, K., Kapoor, S., Bhatia, R. (eds.) Advances in Intelligent Systems and Computing, vol. 868, pp. 881–892. Springer, Cham (2019)
22. Bausch, N., Shilling, P., Sanders, D., Haddad, M., Okonor, O.M., Tewkesbury, G.: Indoor location and collision feedback for a powered wheelchair system using machine learning. In: 2019 IEEE SAI Intelligent Systems Conference. Advances in Intelligent Systems and Computing, London, United Kingdom, vol. 1, pp. 721–739. Springer (2019)
23. Sanders, D.A., et al.: Results from investigating powered wheelchair users learning to drive with varying levels of sensor support. In: Proceedings of the SAI Intelligent System, London, U.K, pp. 241–245 (2017)
24. Song, K.T., Chen, C.C.: Application of asymmetric mapping for wheelchair navigation using ultrasonic sensors. J. Intell. Wheelchair Syst. 17(3), 243–264 (1996)
25. Sachdeva, P., Katchii, S.: A review paper on Raspberry Pi. Int. J. Curr. Eng. Technol. 4(6), 3818–3819 (2014)
26. Sanders, D., Langner, M., Tewkesbury, G.: Improving wheelchair-driving using a sensor system to control wheelchair-veer and variable-switches as an alternative to digital-switches or joysticks. Ind. Robot Int. J. 37(2), 151–167 (2010)
27. Lee, S.: Use of infrared light reflecting landmarks for localization. Ind. Robot Int. J. 36(2), 138–145 (2009)
28. Sanders, D., Stott, I.: A new prototype intelligent mobility system to assist powered wheelchair users. Ind. Robot 26(6), 466–475 (2009)
29. Larsson, J., Broxvall, M., Saffiotti, A.: Laser-based corridor detection for reactive navigation. Ind. Robot Int. J. 35(1), 69–79 (2008)
30. Milanes, V., Naranjo, J., Gonzalez, C.: Autonomous vehicle based in cooperative GPS and inertial systems. Robotica 26, 627–633 (2008)
31. Sanders, D.A.: Controlling the direction of walkie type forklifts and pallet jacks on sloping ground. Assem. Autom. 28(4), 317–324 (2008)
32. Sanders, D.: Recognizing shipbuilding parts using artificial neural networks and Fourier descriptors. Proc. Inst. Mech. Eng. Part B J. Eng. Manuf. 223(3), 337–342 (2009)
33. Chang, Y.C., Yamamoto, Y.: On-line path planning strategy integrated with collision and dead-lock avoidance schemes for wheeled wheelchair in indoor environments. Ind. Robot Int. J. 35(5), 421–434 (2008)
34. Sanders, D.: Comparing speed to complete progressively more difficult mobile robot paths between human tele-operators and humans with sensor-systems to assist. Assem. Autom. 29(3), 230–248 (2009)
35. Sanders, D.A., Bausch, N.: Improving steering of a powered wheelchair using an expert system to interpret hand tremor. In: Proceedings of Intelligent Wheelchair and Applications, Part II (ICIRA 2015), vol. 9245, pp. 460–471 (2015)
36. Sanders, D.A.: Using self-reliance factors to decide how to share control between human powered wheelchair drivers and ultrasonic sensors. IEEE Trans. Neural Syst. Rehabil. Eng. 25(8), 1221–1229 (2017)
37. Sanders, D.A., et al.: Tele-operator performance and their perception of system time lags when completing mobile robot tasks. In: Proceedings of the 9th International Conference on Human Systems Interaction, pp. 236–242 (2016)
38. Haddad, M., Sanders, D., Bausch, N.: Selecting a robust decision making method to evaluate employee performance. Int. J. Manag. Decis. Making 18(4), 333–351 (2019)
39. Haddad, M., Sanders, D., Tewkesbury, G.: Selecting a discrete Multiple Criteria Decision Making method to decide on a corporate relocation. Arch. Bus. Res. 7(5), 48–67 (2019)
40. Sanders, D., Robinson, D.C., Hassan Sayed, M., Haddad, M., Gegov, A., Ahmed, N.: Making decisions about saving energy in compressed air systems using ambient intelligence and artificial intelligence. In: Arai, K., Kapoor, S., Bhatia, R. (eds.) Advances in Intelligent Systems and Computing, vol. 869, pp. 1229–1236. Springer, Cham (2019)
Intelligent Risk Prediction of Storage Tank Leakage Using an Ishikawa Diagram with Probability and Impact Analysis
Favour Ikwan1, David Sanders1, Malik Haddad1(B), Mohamed Hassan2, Peter Omoarebun1, Mohamad Thabet3, Giles Tewkesbury1, and Branislav Vuksanovic3
1 School of Mechanical and Design Engineering, University of Portsmouth, Portsmouth, UK
2 School of Chemical Engineering, University of Southampton, Southampton SO17 1BJ, UK
3 School of Energy and Electronic Engineering, University of Portsmouth, Portsmouth, UK
Abstract. Intelligent probability and impact analysis are used with an Ishikawa diagram. Causes of tank leakage events are identified. Causes were ranked and weights assigned to show their relative importance in the diagram. A Risk Score for each category of causes is identified using probability and impact analysis. The application is explored to predict the risk of leakage in a storage tank. That risk can be mixed with real time data to create an intelligent system. Various methods can be used to predict future system states centred upon an analysis of trends within historic or past data. A simple human computer interface is presented to display the results by overlaying ‘Fail’ or ‘Warning’ states on a schematic of a storage tank. Important information can be flagged alongside conditions. As an example, a surface graph, representing the storage tank condition over a ten-week period is displayed. A continuing deterioration in the score connected with “lack of operating procedures” is presented.
Keywords: Ishikawa · Petroleum · Risk analysis · Storage tank
1 Introduction
This paper describes the application of an Ishikawa diagram to predict the leakage risk within a storage tank by ranking and assigning weights to causes to show their relative importance. Probability and impact analysis were used to calculate a Risk-Score for individual categories of causes. The application was further explored to predict the risk of leakage in a storage tank. That risk could be mixed with real time data to create an intelligent system. Fires and explosions in tanks containing crude oil are recurrent accidents within oil terminals, petroleum refineries and petroleum storage facilities. They often result in human casualties, environmental pollution, and economic loss [1]. In order to minimize these accidents, risks needed to be minimized and controlled. Attention to the safety of industrial plants and to accidental releases within petroleum processing facilities has increased [2]. Risk could be defined as the possibility that someone or something is badly affected by a hazard and danger as a result of an unsafe situation or because of a potentially damaging or undesirable event [3]. Risk could also be defined as a measure of hazard severity, or a measure of the probability and the severity of damage. A simple human computer interface is presented to display the results. An Ishikawa diagram (also called a “cause and effect” [4] or “Fishbone” diagram) was named after Kaoru Ishikawa, who originally developed the diagram in the 1960s [5]. It could be used as a tool to identify root cause(s) for specific problems. It also afforded an organized way of considering causes and effects that created or contributed to top level effects. The diagram aided in identifying possible causes for a problem that might not otherwise be considered by focusing on the categories and considering alternative causes. In addition, the schema would be utilized to ascertain sub-cause risk, cause risk and global-risk [6]. Continuous data received by a model would be processed together with human and environmental factors and then converted into real time risk. Such a real-time depiction of risk (current and cumulative) could help in making intelligent decisions using critical information and insight. There were numerous methods to show information in a complete, inclusive and user-friendly way. Specifically, “design for usability” and “human factors” were reviewed [7] and a main dashboard was represented using an Excel® spreadsheet.
© Springer Nature Switzerland AG 2021 K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 604–616, 2021. https://doi.org/10.1007/978-3-030-55190-2_45
2 Construction of an Ishikawa Diagram
The Ishikawa diagram was created according to the logic scheme in Fig. 1.
Fig. 1. A logic scheme for implementation.
Problems, accidents, risks or consequences were identified and analyzed. The secondary and main-causes of the problem were established.
The diagram was created and assessed against the following conditions [8]:
• “Impact” (I) and “Probability of occurrence” (P) were derived and risk was depicted by the equation R = P * I.
• Causes were depicted by possibility, frequency of occurrence or probability.
• Main-causes were regarded as effects (of second order or secondary).
• There was an objective with an operational motive.
• Causes of secondary effects were illustrated by named side-effects, and were subject to identical conditions to main-causes.
• The same conditions as for problem identification were fulfilled to identify main and secondary-causes.
The distribution of causes and sub-causes could be in a preference order or could be random. Risk was analyzed after accepting the diagram. A course of action for risk causes was established along with the global-risk for the effect (characterized event).
3 Analysis of “Leak in a Storage Tank” Diagram
Accidents could also result in stock devaluation, lawsuits or company bankruptcy [9]. Risk is an uncertain condition or event that has a negative or positive effect on one or more objectives. The problem “leakage in a storage tank” was an undesirable event which could lead to a negative effect. The risk allotted to the event was called “risk of leakage in a storage tank”. The risk reflected both the probability of the effect occurring and the impact of the effect. Risk was expressed on a scale from 0 to 3 or as an integer (1, 2, 3). A simplified Ishikawa diagram is shown in Fig. 2. The Ishikawa diagram was reduced for ease of explanation in this paper. Some main and secondary-causes were deleted to simplify the diagram.
Fig. 2. A simplified Ishikawa diagram for the risk of leakage in a storage tank.
Figure 2 is characterized by three main-causes (outfilling, catastrophic tank rupture and crack formation leak) and nine secondary-causes. The representation on the diagram axis had the main-causes situated above and to the left of the axis (outfilling), and below and to the right of the axis (catastrophic tank rupture and crack formation leak). Main and secondary-causes represented causes relevant to the process. Secondary-causes were selected to be close to the portrayal of the risk.
4 Codification of Causes
Cause codification was crucial during risk analysis when making use of an Ishikawa diagram as it allowed for a simpler depiction of causes and operation [10]. Codification was formulated on:
• belonging to the right or left of the diagram;
• internal (endogenous) or external (exogenous) cause distribution;
• frequency of occurrence;
• grouping of code to be more representative;
• option to convert codification if criteria change.
Table 1 shows a simplified codification of the causes shown in Fig. 2.

Table 1. Codification of causes and sub-causes.
| Issue | Cause | Subcause | Code |
|---|---|---|---|
| 1 | Outfilling | | A1 |
| 1.1 | | Operator failure | A11 |
| 1.2 | | Valve shut off response | A12 |
| 1.3 | | Sensor level failure | A13 |
| 1.4 | | Acoustic signal failure | A14 |
| 2 | Catastrophic tank rupture | | Z1 |
| 2.1 | | Tank breaking | Z11 |
| 2.2 | | Reinforcement breaking | Z12 |
| 3 | Crack formation leak | | Z2 |
| 3.1 | | Corrosion | Z21 |
| 3.2 | | Insufficient revisions | Z22 |
| 3.3 | | Operator failure | Z23 |
5 Determining Risk of Leakage in a Storage Tank
A leakage in a storage tank was the global-risk (Rg) in this model. Global-risk was shaped by the risk of yielding a main-cause and was a weighted-sum of the main-cause risks. Leakage in a storage tank, Rg, depicted the weighted-sum of risks from the categories shown above and to the left (Ra), and below and to the right of the axis (Rz).

$$R_g = P_a R_a + P_z R_z \tag{1}$$

Where the total of left and right category weights should equal one.

$$P_a + P_z = 1 \tag{2}$$

Each risk category was a weighted-sum of main risk causes allocated to right or left.

$$R_a = \sum_{i=1}^{n} P_i R_{ai}; \qquad \sum_{i=1}^{n} P_i = 1 \tag{3}$$

Rai were the main-causes allocated to the left of the diagram axis and Rzj were the main-causes allocated to the right of the axis.

$$R_z = \sum_{j=1}^{n} P_j R_{zj}; \qquad \sum_{j=1}^{n} P_j = 1 \tag{4}$$

Each main-cause risk was the weighted-sum of its secondary-cause risks:

$$R_{ai} = \sum_{k} P_{ik} R_{aik}; \qquad \sum_{k} P_{ik} = 1 \tag{5}$$

Where Raik represented the secondary-cause risks, which established main-causes on the left:

$$R_{zj} = \sum_{l} P_{jl} R_{zjl}; \qquad \sum_{l} P_{jl} = 1 \tag{6}$$
Where Rzjl described the secondary-cause risks, which established the main-causes on the right. Global-risk was determined from direct formalizations or extracted from tables based on:
• Secondary-cause risks were determined (Raik and Pik; Rzjl and Pjl);
• Main-cause risks were determined from weighted-sums of secondary-cause risks and evaluated using Rai and Pi and Rzj and Pj;
• Risk categories (Ra and Rz) were determined and evaluated for global-risk (Pa and Pz);
• Global-risk (Rg) was determined.
Weights were determined and the weights assessment was depicted in weights tables as shown in Tables 2 and 3. Settlement of causes in the matrix of weights illustrated a different way of presenting weights, benefitting from a disclosure about any direct relationships between secondary and main-causes. In order to determine the global-risk (the risk of leakage in a storage tank), the following were determined:
• risks of secondary-causes (Raik and Rzjl);
• risks of main-causes (Rai and Rzj);
• risk categories by secondary-causes (Ra and Rz);
• global-risk (Rg).

Table 2. Main and secondary weights.
| Issue | Code (main) | Code (sec.) | Weights of sec. causes | Weight control | Weight of main causes | Weight control |
|---|---|---|---|---|---|---|
| 1 | A1 | | | | 1 | 1 |
| 1.1 | | A11 | 0.4 | | | |
| 1.2 | | A12 | 0.3 | | | |
| 1.3 | | A13 | 0.2 | | | |
| 1.4 | | A14 | 0.1 | 1 | | |
| 2 | Z1 | | | | 0.5 | |
| 2.1 | | Z11 | 0.55 | | | |
| 2.2 | | Z12 | 0.45 | 1 | | |
| 3 | Z2 | | | | 0.5 | 1 |
| 3.1 | | Z21 | 0.44 | | | |
| 3.2 | | Z22 | 0.38 | | | |
| 3.3 | | Z23 | 0.18 | 1 | | |
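The weighted-sum structure of Eqs. (1) to (6), together with the “weight control” columns of Table 2, could be sketched as follows. The nested layout and the function names are assumptions made for illustration. The leaf risks used here are a uniform placeholder value rather than results from the paper, chosen because a uniform risk should pass through every level unchanged when each group of weights sums to one.

```python
import math

# Weights from Table 2, arranged as category -> main-cause -> secondary-cause.
# The nested layout itself is an illustrative choice, not the authors' format.
TREE = {
    "A": (0.44, {"A1": (1.0, {"A11": 0.4, "A12": 0.3, "A13": 0.2, "A14": 0.1})}),
    "Z": (0.56, {
        "Z1": (0.5, {"Z11": 0.55, "Z12": 0.45}),
        "Z2": (0.5, {"Z21": 0.44, "Z22": 0.38, "Z23": 0.18}),
    }),
}


def check_weights(weights):
    """Weight control: every group of weights must sum to one."""
    if not math.isclose(sum(weights), 1.0, abs_tol=1e-9):
        raise ValueError("weights do not sum to 1: %r" % (weights,))


def global_risk(tree, leaf_risk):
    """Evaluate Rg as nested weighted sums of secondary-cause risks."""
    check_weights([p for p, _ in tree.values()])            # Eq. (2)
    rg = 0.0
    for category_weight, mains in tree.values():
        check_weights([p for p, _ in mains.values()])       # Eqs. (3), (4)
        r_category = 0.0
        for main_weight, secondaries in mains.values():
            check_weights(list(secondaries.values()))       # Eqs. (5), (6)
            r_main = sum(w * leaf_risk[c] for c, w in secondaries.items())
            r_category += main_weight * r_main
        rg += category_weight * r_category                  # Eq. (1)
    return rg


# Placeholder leaf risks (not from the paper): every secondary-cause at 0.5.
codes = [c for _, mains in TREE.values() for _, sec in mains.values() for c in sec]
uniform = {code: 0.5 for code in codes}
rg_uniform = global_risk(TREE, uniform)
```

With these placeholder values the result stays at 0.5, which is a quick way to confirm that no weight group has been mistyped.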
The following calculations were conducted to apply an Ishikawa diagram in the case of “risk of leakage in a storage tank”:
Risks of Secondary-Causes: These were determined using R = P * I, so that risk R equalled the probability of an event occurring (P) multiplied by the consequences or impact of it occurring (I). Impact and probabilities of occurrence were calculated using methods described in [6] and [10] and are listed in Table 4. Considering the frequency of occurrence of the secondary-causes, probabilities could be similar for different categories or groups of causes or for an entire group of causes, but the impact was not likely to be the same [10]. For simplicity, equal probabilities were used for the secondary-causes of each main-cause. Impact was evaluated separately for every secondary-cause.
Determining Risk for Main-Causes: This was based on relationships between secondary-cause risks and their weights in determining a main-cause as shown in Tables 3 and 4.
Ra1 = Pa11 ∗ Ra11 + Pa12 ∗ Ra12 + Pa13 ∗ Ra13 + Pa14 ∗ Ra14.
Rz1 = Pz11 ∗ Rz11 + Pz12 ∗ Rz12.
Rz2 = Pz21 ∗ Rz21 + Pz22 ∗ Rz22 + Pz23 ∗ Rz23.
Therefore,
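Table 3. The matrix of secondary and main cause weights and categories.
| Secondary causes | Main cause A1 | Main cause Z1 | Main cause Z2 |
|---|---|---|---|
| A11 | 0.4 | | |
| A12 | 0.3 | | |
| A13 | 0.2 | | |
| A14 | 0.1 | | |
| Z11 | | 0.55 | |
| Z12 | | 0.45 | |
| Z21 | | | 0.44 |
| Z22 | | | 0.38 |
| Z23 | | | 0.18 |
| Weight control (secondary weights) | 1 | 1 | 1 |
| Left category A (main-cause weight) | 1 | | |
| Right category Z (main-cause weight) | | 0.5 | 0.5 |
| Weight control (main-cause weights) | 1 | 1 (Z1 + Z2) | |
| Effect weight | 0.44 (category A) | 0.56 (category Z) | |
| Weight control (effect weights) | 1 | | |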
Table 4. Impact, probability and risk evaluated for causes.
| Current issue | Cause | Probability | Impact | Risk |
|---|---|---|---|---|
| 1 | A11 | 0.75 | 0.35 | 0.26 |
| 2 | A12 | 0.75 | 0.28 | 0.21 |
| 3 | A13 | 0.75 | 0.46 | 0.34 |
| 4 | A14 | 0.75 | 0.15 | 0.11 |
| 5 | Z11 | 0.43 | 1 | 0.43 |
| 6 | Z12 | 0.43 | 0.82 | 0.35 |
| 7 | Z21 | 0.55 | 0.72 | 0.39 |
| 8 | Z22 | 0.55 | 0.44 | 0.24 |
| 9 | Z23 | 0.55 | 0.56 | 0.30 |
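The relation R = P * I behind Table 4 could be reproduced with a short Python sketch such as the one below. Truncating each product to two decimal places is an assumption inferred from the listed values (for example, 0.55 × 0.72 = 0.396 is listed as 0.39). The paper does not state how the figures were rounded, and the names used here are illustrative.

```python
from decimal import Decimal, ROUND_DOWN

# Probability and impact per secondary-cause, copied from Table 4.
TABLE_4 = {
    "A11": ("0.75", "0.35"), "A12": ("0.75", "0.28"),
    "A13": ("0.75", "0.46"), "A14": ("0.75", "0.15"),
    "Z11": ("0.43", "1"),    "Z12": ("0.43", "0.82"),
    "Z21": ("0.55", "0.72"), "Z22": ("0.55", "0.44"),
    "Z23": ("0.55", "0.56"),
}


def risk(probability, impact):
    """R = P * I, truncated to two decimals (assumed rounding rule)."""
    product = Decimal(probability) * Decimal(impact)
    return product.quantize(Decimal("0.01"), rounding=ROUND_DOWN)


RISKS = {code: risk(p, i) for code, (p, i) in TABLE_4.items()}
```

Decimal arithmetic is used instead of floats so that the truncation step is not disturbed by binary rounding of values such as 0.345.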
Ra1 = (0.4 ∗ 0.26) + (0.3 ∗ 0.21) + (0.2 ∗ 0.34) + (0.1 ∗ 0.11) = 0.10 + 0.063 + 0.068 + 0.011 = 0.242.
Rz1 = (0.55 ∗ 0.43) + (0.45 ∗ 0.35) = 0.236 + 0.157 = 0.393.
Rz2 = (0.44 ∗ 0.39) + (0.38 ∗ 0.24) + (0.18 ∗ 0.30) = 0.171 + 0.091 + 0.054 = 0.316.
Determining Secondary-Cause Categories: This was based on the definition of the weighted-sum of the risks of secondary-causes for a particular category:
Ra = p1 ∗ Ra1.
Ra = 1 ∗ 0.242 = 0.242.
Rz = p1 ∗ Rz1 + p2 ∗ Rz2.
Rz = (0.5 ∗ 0.393) + (0.5 ∗ 0.316) = 0.196 + 0.158 = 0.354.
Determining Global-Risk: This was based on a definition of risk weighted-sums of cause categories:
Rg = (0.44 ∗ 0.242) + (0.56 ∗ 0.324) = 0.106 + 0.18 = 0.287.
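The arithmetic above can be checked with a short sketch, given here only as a cross-check with illustrative variable names. It carries the Table 4 risks through the weighted sums without truncating intermediate terms, so its figures differ slightly from the printed ones (about 0.246 rather than 0.242 for Ra1, for instance). It also evaluates the printed global-risk expression exactly as written, which uses 0.324 for Rz although the category step gives 0.354. The values reported by the authors are left unchanged in the text.

```python
# Cross-check sketch: (weight, risk) pairs from Tables 2 and 4.
SECONDARY = {
    "A1": [(0.4, 0.26), (0.3, 0.21), (0.2, 0.34), (0.1, 0.11)],
    "Z1": [(0.55, 0.43), (0.45, 0.35)],
    "Z2": [(0.44, 0.39), (0.38, 0.24), (0.18, 0.30)],
}


def weighted_sum(pairs):
    """Sum of weight * risk over (weight, risk) pairs."""
    return sum(weight * value for weight, value in pairs)


# Main-cause risks without intermediate truncation.
main = {code: weighted_sum(pairs) for code, pairs in SECONDARY.items()}

# Category risks and global-risk carried through from those values.
ra = weighted_sum([(1.0, main["A1"])])
rz = weighted_sum([(0.5, main["Z1"]), (0.5, main["Z2"])])
rg = weighted_sum([(0.44, ra), (0.56, rz)])

# The printed global-risk expression, evaluated with the printed inputs.
rg_printed = weighted_sum([(0.44, 0.242), (0.56, 0.324)])
```

Evaluating the printed expression gives roughly 0.288, which agrees with the reported 0.287 once truncated to three decimals.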
6 Interpretation of Results
On a scale of risk with three intensities (SLIGHT, MODERATE, IMPORTANT), a value of 0.287 for the global-risk of “leak in storage tank” positioned it as a “MODERATE” risk (1.18 on a scale of 0 to 3). Risks and main-cause categories can frame the event represented by the risk as shown in Table 5.

Table 5. Vulnerabilities table.
| Issue | Cause | Code | Risk value | Risk area |
|---|---|---|---|---|
| 1 | Outfilling | A1 | 0.242 | Minor |
| 2 | Catastrophic tank rupture | Z1 | 0.393 (1.62) | Major |
| 3 | Crack formation leak | Z2 | 0.316 (1.30) | Medium |
| 4 | Left category | A | 0.242 (1) | Minor |
| 5 | Right category | Z | 0.324 (1.33) | Medium |
Risk treatment measures were considered. The vulnerability of the organization to this threat (leak in a storage tank) was determined. The left category had SLIGHT vulnerability for the causes and the right category had MODERATE vulnerability for the causes. The vulnerability of the organization with regard to the secondary-causes was:
• SLIGHT for outfilling;
• IMPORTANT for catastrophic tank rupture;
• MODERATE for crack formation leak.
Other methods could have been used to interpret risk values. For example, calculated values could be compared with a fixed acceptable level [10]. Assuming that the acceptance level (Rp) for “risk of leakage in a storage tank” was 0.20 (SLIGHT, 0.82), the value of global-risk (0.287; 1.18, MODERATE) was compared with it. If Rg < Rp, then the risk is ignored; if Rg > Rp, then the risk must be promptly treated. Considering the example in this paper, Rg = 0.287 (1.18) > Rp = 0.2 (0.82). Therefore, treatment measures were required. The necessity of measures is shown in Table 6.

Table 6. Necessity of measures.
| Issue | Cause | Code | Situation | Necessity of measures |
|---|---|---|---|---|
| 1 | Outfilling | A1 | 0.242 > 0.2 | Yes |
| 2 | Catastrophic tank rupture | Z1 | 0.393 > 0.2 | Yes |
| 3 | Crack formation leak | Z2 | 0.316 > 0.2 | Yes |
| 4 | Left category | A | 0.242 > 0.2 | Yes |
| 5 | Right category | Z | 0.324 > 0.2 | Yes |
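The comparison against an acceptance level that underlies Table 6 could be written as a small check. The function and dictionary names below are illustrative. Treating a risk exactly equal to Rp as acceptable is an assumption, since the text only covers the cases Rg < Rp and Rg > Rp.

```python
ACCEPTANCE_LEVEL = 0.20  # Rp, as assumed in the text

# Risk values as reported in Tables 5 and 6, plus the global-risk.
REPORTED = {
    "A1": 0.242, "Z1": 0.393, "Z2": 0.316,
    "A": 0.242, "Z": 0.324, "Rg": 0.287,
}


def needs_treatment(risk_value, acceptance=ACCEPTANCE_LEVEL):
    """True when the risk exceeds the acceptance level.

    A risk equal to the acceptance level is treated as acceptable here,
    which is an assumption not stated in the paper.
    """
    return risk_value > acceptance


necessity = {code: needs_treatment(value) for code, value in REPORTED.items()}
```

Every reported value exceeds 0.2, so each entry is flagged for treatment, matching the “Yes” column of Table 6.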
7 Hierarchy of Causes
A model of hierarchy could be created by weighting towards the value of the main-cause to present the cause contribution to the organization’s vulnerability [8]. The hierarchy of causes for the case of risk is shown in Table 7.

Table 7. Table showing the hierarchy of causes.
| Issue | Cause | Code | Size | Hierarchy |
|---|---|---|---|---|
| 1 | Catastrophic tank rupture | Z1 | 0.393 | 1 |
| 2 | Crack formation leak | Z2 | 0.316 | 2 |
| 3 | Outfilling | A1 | 0.242 | 3 |
Weights of causes were established. The biggest value was 0.393 in the case analyzed. Weights for other causes were determined by multiplying weighting value (Mp) by the normalized risk value. Weighted values of secondary-cause risks are in Table 8.
Table 8. Table of weighted values.
| Issue | Cause | Code | Size | Weighted value |
|---|---|---|---|---|
| 1 | Catastrophic tank rupture | Z1 | 0.393 | 10 |
| 2 | Crack formation leak | Z2 | 0.316 | 8.04 |
| 3 | Outfilling | A1 | 0.242 | 6.15 |
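The ranking in Table 7 and the weighted values in Table 8 appear to follow from dividing each risk by the largest risk and scaling the result. The sketch below reproduces that reading. The scale of 10 for Mp is inferred from Table 8 rather than stated in the text, and the function names are illustrative.

```python
# Main-cause risks as reported in Table 7.
MAIN_RISKS = {"A1": 0.242, "Z1": 0.393, "Z2": 0.316}


def hierarchy(risks):
    """Codes ordered from the largest risk to the smallest."""
    return sorted(risks, key=risks.get, reverse=True)


def weighted_values(risks, scale=10.0):
    """Scale each risk so that the largest maps to `scale` (assumed Mp = 10)."""
    largest = max(risks.values())
    return {code: scale * value / largest for code, value in risks.items()}


ranking = hierarchy(MAIN_RISKS)
weights = weighted_values(MAIN_RISKS)
```

The computed values agree with Table 8 to within 0.01; the small difference for A1 suggests the printed figure was truncated rather than rounded.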
8 Real Time Data
Continuous data received by a model could be processed together with human and environmental factors and translated into real time insight and risk awareness. Representing current and cumulative risk in real time could help in making decisions by providing some critical insight and information. Dynamic Risk Modelling combines the effect of technical conditions and human decisions and allows real-time data to be made available and linked with decisions and risk awareness. It could also provide an insight into the effect of hazards when controls or barriers fail or change. Predictive tools could be used with a variety of methods to predict future states of systems based on an analysis of historic trends. From these trends, a next expected value could be determined.
9 Human Computer Interface
Fixed roof tanks as shown in Fig. 3 are employed to store various refined products, including heavy fuel oils and volatile material. A spreadsheet was used to present the main dashboard as it met the requirement of making the objective visible. However, this was not the most systematic way of presenting information. Efficiency was attained by overlaying “Fail” or “Warning” situations onto a representation of a storage tank with pertinent information highlighted alongside the fail or warning situations. As an example, the surface graph, representing the storage tank condition over ten days (see Fig. 4), shows a steady reduction in the score associated with “lack of operating procedures”. Ten traits were scored between ‘0’ and ‘9’ against a set of criteria to deliver a regular and consistent approach to capturing and nominating trait conditions. Green, Amber and Red colors were assigned as follows: Good = Green (7–9), Warning = Amber (4–6), and Fail = Red (0–3). The pattern showing the ‘States’ (shown in Fig. 4 at the top of the surface-graph towards week 4) illustrated a change in ‘State’ (that is, a change in color) when the level of the trait fell to 3. An operator observing a steady decline in a trait level(s) could reasonably predict that the level could continue dropping until a dangerous level is reached.
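The mapping from a trait score to a state and color described above could be expressed as a small function. The score bands are taken from the text, while the function name and the returned tuple format are illustrative choices.

```python
def trait_state(score):
    """Map a trait score (integer 0 to 9) to a (state, color) pair."""
    if not isinstance(score, int) or not 0 <= score <= 9:
        raise ValueError("score must be an integer from 0 to 9")
    if score >= 7:
        return ("Good", "Green")      # scores 7 to 9
    if score >= 4:
        return ("Warning", "Amber")   # scores 4 to 6
    return ("Fail", "Red")            # scores 0 to 3


# States for every possible score, e.g. for coloring a dashboard cell.
states = {score: trait_state(score) for score in range(10)}
```

A score of 3 is the first value to fall in the Fail band, consistent with the change of state reported when the trait level fell to 3.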
F. Ikwan et al.
Fig. 3. Fixed roof tank.
Fig. 4. Storage tank ten (10) days surface plot.
It could not be guaranteed that humans would be able to recognize a dire situation and intervene. Instead, an intelligent system might recognize one or more reducing levels and draw attention to the likelihood of further reduction until dangerous levels were reached. This ability to predict relies on an analysis of trends within the real-time data. That is, future states could be predicted from historic states by projecting the trend forward. The Excel® linear regression function could provide a straightforward way of predicting future states for traits, based on current real-time data.
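The trend projection described above can be sketched in a few lines. This is only an illustration in Python rather than Excel: the score history is hypothetical, the function names are invented for the example, and treating fractional forecasts below 4 as Red is an assumption, since the text defines the color bands for whole-number scores only.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def state(score):
    """Map a trait score to the bands in the text: Green 7-9, Amber 4-6, Red 0-3."""
    if score >= 7:
        return "Green"
    if score >= 4:
        return "Amber"  # assumption: fractional scores between bands fall to the lower band
    return "Red"


def forecast(scores, steps_ahead):
    """Project the fitted trend forward and clamp the result to the 0-9 scale."""
    days = list(range(len(scores)))
    slope, intercept = fit_line(days, scores)
    predicted = slope * (len(scores) - 1 + steps_ahead) + intercept
    return max(0.0, min(9.0, predicted))


# Hypothetical ten-day history for one trait (not data from the paper).
history = [9, 9, 8, 8, 7, 7, 6, 6, 5, 5]
predicted_score = forecast(history, steps_ahead=3)
print(state(history[-1]), "->", state(predicted_score))
```

With a history like this, the trait is still in the Warning band today but the projected score falls into the Fail band, which is the kind of early alert the text argues an intelligent system could raise.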
Intelligent Risk Prediction of Storage Tank Leakage
10 Discussion and Conclusion
An example of risk evaluation was presented for storage facilities because leakage of crude oil from a storage tank might lead to catastrophic accidents. Risks and consequences were identified and an Ishikawa diagram was created. Main and secondary-causes were finalized. The problem “leakage in a storage tank” was an undesirable event. The risk assigned to the event represented not only the dynamic state, which depended on the probability of an effect occurring, but also the potential impact of an event. Risk was expressed on a scale of three levels. Codification of causes was completed because it was important in risk analysis and allowed easier operation and representation of the causes. Leakage in a storage tank was the global-risk. It was conditioned by the risk of the main-causes occurring and was represented by the weighted-sum of the main-causes. The risks of main and secondary-causes, cause categories and global-risk were determined. This allowed treatment measures to be structured for vulnerable areas. The method involved evaluation of the impact of the causes, weights and probabilities, and helped in understanding the essence of risk analysis, risk treatment measures and risk values. On a three-level scale of risk (SLIGHT, MODERATE, IMPORTANT), the vulnerability with regard to the secondary-causes was SLIGHT for outfilling, IMPORTANT for catastrophic tank rupture and MODERATE for crack formation leaks. Risk values obtained were compared with an acceptance level value in order to determine whether the risk should be neglected or not. Weights for causes were determined by multiplying their weighting value with the risk. Risk analysis helped to identify and manage risk, and decision making helped to understand these risks, take appropriate actions and minimize impact in the case of disasters [11].
A model capable of receiving continuous data can process it together with environmental and human factors and translate it into real-time risk awareness and insight. A real-time representation of current and cumulative risk could aid decision makers with critical information and insight. An example using a surface graph to represent the storage tank condition over a ten-week period showed the steady reduction in the score associated with “lack of operating procedures”. Research is now moving on to consider Multicriteria Decision Making Systems [12–15].
References
1. Daqing, W., Peng, Z., Liqiong, C.: Fuzzy fault tree analysis for fire and explosion of crude oil tanks. J. Loss Prev. Process Ind. 26, 1390–1398 (2013)
2. Marhavilas, P.K., Koulouriotis, D., Gemeni, V.: Risk analysis and assessment methodologies in the work sites: on a review, classification and comparative study of the scientific literature of the period 2000–2009. J. Loss Prev. Process Ind. 1924, 477–523 (2011)
3. José, L., Carmen, G.-C., Cristina, G.-G., Piedad, B.: Risk analysis of a fuel storage terminal using HAZOP and FTA. Int. J. Environ. Res. Public Health 14, 705 (2017)
4. Juran, J.M.: Juran’s Quality Handbook, 5th edn. McGraw-Hill, New York (1999)
5. Watson, G.: The legacy of Ishikawa. Qual. Prog. 37(4), 54–57 (2004)
6. Ciocoiu, C.N.: Managementul riscului. Teorii, practici, metodologii. ASE, Bucharest (2008)
7. Institute of Ergonomics & Human Factors. http://www.ergonomics.org.uk/. Accessed 15 June 2018
8. Ilie, G.: De la management la guvernare prin risc. UTI Press & Detectiv, Bucharest (2009)
9. James, C., Cheng-Chung, L.: A study of storage tank accidents. J. Loss Prev. Process Ind. 19(1), 51–59 (2006)
10. Ilie, G., Carmen, N.C.: Application of fishbone diagram to determine the risk of an event with multiple causes. Manag. Res. Pract. 2(1), 1–22 (2010)
11. Ikwan, F.: Reducing energy losses and alleviating risk in petroleum engineering using decision making and alarm systems. J. Comput. Syst. Eng. 422–429 (2018)
12. Haddad, M., Sanders, D., Gegov, A., Hassan Sayed, M., Huang, Y., Al-Mosawi, M.: Combining multiple criteria decision making with vector manipulation to decide on the direction for a powered wheelchair. In: Intelligent Systems and Applications, Advances in Intelligent Systems and Computing, vol. 1037, pp. 680–693. Springer (2019)
13. Haddad, M., Sanders, D., Bausch, N.: Selecting a robust decision making method to evaluate employee performance. Int. J. Manag. Decis. Making 18(4), 333–351 (2019)
14. Haddad, M., Sanders, D., Tewkesbury, G.: Selecting a discrete Multiple Criteria Decision Making method to decide on a corporate relocation. Arch. Bus. Res. 7(5), 48–67 (2019)
15. Haddad, M., Sanders, D.: Selecting a best compromise direction for a powered wheelchair using PROMETHEE. IEEE Trans. Neural Syst. Rehabil. Eng. 27(2), 228–235 (2019)
Use of the Analytical Hierarchy Process to Determine the Steering Direction for a Powered Wheelchair
Malik Haddad(B), David Sanders, Mohamad Thabet, Alexander Gegov, Favour Ikwan, Peter Omoarebun, Giles Tewkesbury, and Mohamed Hassan
University of Portsmouth, Portsmouth PO1 3DJ, UK
[email protected]
Abstract. The Analytical Hierarchy Process (AHP) is utilized to propose a driving course for a powered-wheelchair. A safe route for a wheelchair is proposed by a decision-making system that aims to avoid obstacles. Two ultrasonic transceivers are fitted onto a wheelchair. The area in front of a wheelchair is segmented to left and right zones. The system inputs are distance to an object from the midpoint of the chair, distance to an object from the left of the chair and distance to an object from the right of the chair. The resulting route is a blend between a provided direction from a user’s input device and a proposed direction from the decision-making system that steers a powered-wheelchair to safely avoid obstacles in the way of the wheelchair. The system helps a disabled user to navigate their wheelchair by deciding on a direction that is a compromise between a direction provided by the sensors and a direction desired by the driver. Sensitivity analysis investigates the effects of risk and uncertainty on the resulting directions. An appropriate direction is identified but a human driver can over-ride the decision if necessary.
Keywords: Analytical Hierarchy Process · AHP · Wheelchair · Direction
© Springer Nature Switzerland AG 2021 K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 617–630, 2021. https://doi.org/10.1007/978-3-030-55190-2_46
1 Introduction
A Multi Criteria Decision Making (MCDM) approach is presented that helps with the driving of a powered-wheelchair. Ultrasonic sensors are considered to provide information about the surroundings of a wheelchair. The new decision-making system then assists disabled drivers to drive safely. A driver provides a desired route and sensors generate an alternative route. Intelligent mixing of the two routes generates a new route. The desired route is modified depending on sensor readings [1–11] to help disabled drivers.
Extensive research has been conducted on powered-wheelchair driving and navigation [12–17]. Outcomes have often been local, without global application. Approaches to avoid obstacles have been studied [18] that considered sensors providing local information [19].
The use of MCDM with sensors is presented. It can successfully drive the motors connected to the driving wheels of a powered wheelchair. The architecture promptly detects obstacles ahead, manages to turn in the desired direction specified by a human driver, and avoids obstacles. A best compromise route is provided that evades obstructions. A joystick controls direction and speed and then an MCDM system modifies the route if required. The desires of a driver are weighted against distance to nearby obstacles.
The wheelchair considered in this research had two large driving wheels, each separately connected to a driving motor. Direction and speed were achieved by providing the required power to each motor. A driver was able to drive their chair by varying electrical current to the two motors.
Many researchers presented systems used to avoid obstacles [20], for example: infrared [21], ultrasonics [22] and structured lighting [23]. Global systems provided uncoordinated outcomes inside buildings [24] but local systems were more successful, including: ultrasonics [25–28], gyroscopes, odometers or tilt sensors. Ultrasonic sensors were used because they were simple, affordable and reliable [28]. The sensors used in this research were the same as the sensors described in [29]. They were fitted under the footrests of the wheelchair [30]. Distances to obstacles were measured by calculating the flight time required for pulses to be reflected back from objects [30]. If no obstacle was sensed then detection ranges could be tuned by increasing the length of the ultrasonic pulses until an object was sensed.
The area in front of a wheelchair was segmented into a left zone and a right zone. A grid was then placed onto them. The grid consisted of three components: VERY CLOSE, CLOSE and FAR-OFF. Transceiver envelopes overlapped so that a center column in the grid signified both left and right sensors detecting an object. Any obstacle in front of the wheelchair was labeled as either VERY CLOSE, CLOSE or FAR-OFF.
Section 2 describes MCDM. Then Sect. 3 explains testing undertaken and presents some results. A short discussion is included in Sect. 4 and some conclusions are presented in Sect. 5.
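The flight-time measurement and the three-band grid mentioned in the introduction can be illustrated with a short Python sketch. The speed of sound and the two band thresholds below are assumptions chosen for the example; the paper does not state the values used on the wheelchair.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at roughly 20 degrees C


def echo_distance(flight_time_s):
    """Distance to the reflecting object; the pulse travels out and back."""
    return SPEED_OF_SOUND_M_S * flight_time_s / 2.0


def grid_label(distance_m, very_close_m=0.5, close_m=1.5):
    """Label a distance with one of the three grid components.

    The thresholds are hypothetical placeholders, not values from the paper.
    """
    if distance_m < very_close_m:
        return "VERY CLOSE"
    if distance_m < close_m:
        return "CLOSE"
    return "FAR-OFF"


for flight_time in (0.002, 0.006, 0.010):  # example round-trip times in seconds
    d = echo_distance(flight_time)
    print(f"{flight_time * 1000:.0f} ms -> {d:.3f} m -> {grid_label(d)}")
```

Halving the round-trip time is the only physics involved; everything else is a lookup against whatever band limits suit the chair and its environment.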
2 MCDM and AHP
MCDM methods are often considered reliable decision-making methods that produce a systematic and straightforward approach to improve objectivity and generate results with reasonable satisfaction [31]. MCDM methods help in solving real-world problems with multiple conflicting criteria, and they produce a suitable solution even when faced with several possible scenarios that incorporate risk, uncertainty or fuzziness [32, 33]. This is the first time this type of method has been applied to a powered-wheelchair application.
The Analytical Hierarchy Process (AHP) is a method based on applying pairwise judgments amongst choices with respect to each criterion and then coherently aggregating them to provide an overall weighting for each choice with respect to all criteria. Criteria weights indicate their relative significance. AHP uses the eigenvalue method [34], where a consistent matrix with known priorities is created and a comparison of choices i and j is given by $p_i / p_j$. The comparison matrix is multiplied by the priority vector p:

$$A \, p = n \, p \qquad (1)$$

Where:
p: Priority vector.
n: Matrix dimension.
A: Comparison matrix.
Ishizaka and Labib [35] provided the following steps for a decision-making process that used AHP:
• Modelling of a problem.
• Conduct pairwise comparisons.
• Identify qualitative and quantitative judgment scales.
• Evaluation of local weights.
• Check for consistency.
• Evaluation of global weights using (2).

$$P_i = \sum_j w_j \, l_{ij} \qquad (2)$$

Where:
$P_i$: global weight of the choice i.
$w_j$: weight of the criterion j.
$l_{ij}$: local weight.
• Sensitivity analysis.
AHP aims to help decision makers reach a suitable decision that corresponds to the global goal of the problem as well as their aims and their understanding of the problem. The next section describes testing.
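Equations (1) and (2) can be checked numerically with a small sketch. The priorities, local weights and criteria weights below are hypothetical illustration values, and power iteration is used here only as one simple way of finding the principal eigenvector; the paper does not say which numerical procedure was applied.

```python
def priority_vector(matrix, iterations=100):
    """Approximate the principal eigenvector by power iteration, normalized to sum to 1."""
    n = len(matrix)
    p = [1.0 / n] * n
    for _ in range(iterations):
        q = [sum(matrix[i][j] * p[j] for j in range(n)) for i in range(n)]
        total = sum(q)
        p = [value / total for value in q]
    return p


def principal_eigenvalue(matrix, p):
    """Estimate the eigenvalue as the mean ratio (A p)_i / p_i."""
    n = len(matrix)
    ap = [sum(matrix[i][j] * p[j] for j in range(n)) for i in range(n)]
    return sum(ap[i] / p[i] for i in range(n)) / n


def global_weights(local_weights, criteria_weights):
    """Eq. (2): P_i = sum_j w_j * l_ij, one row of local weights per choice."""
    return [sum(w * l for w, l in zip(criteria_weights, row)) for row in local_weights]


# A consistent comparison matrix built from known (hypothetical) priorities: a_ij = p_i / p_j.
known = [0.5, 0.3, 0.2]
A = [[pi / pj for pj in known] for pi in known]

p = priority_vector(A)
lam = principal_eigenvalue(A, p)       # equals the matrix dimension n for a consistent matrix
example = global_weights([[0.5, 0.25], [0.5, 0.75]], [0.4, 0.6])
print(p, lam, example)
```

For a perfectly consistent matrix the recovered vector equals the known priorities and the eigenvalue equals n, which is exactly what Eq. (1) states. With inconsistent judgments the eigenvalue exceeds n, which is the basis of the consistency check listed in the steps above.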
3 Testing
The new system inputs were: Distance to an object from the midpoint of the chair (Dc), Distance to an object from the left of the chair (Dl), Distance to an object from the right of the chair (Dr). If no object was sensed in any direction then Distance was set to one. Three setups are described as a powered-wheelchair moved in a setting with some boxes as obstructions.
• Setup 1: No boxes in the surrounding (Point A in Fig. 1).
• Setup 2: Box spotted to the right (Point B in Fig. 1).
• Setup 3: Box spotted to the left (Point C in Fig. 1).
Three choices were weighted: Move-Forward, Move-Right and Move-Left. Each choice was given a weight and these are shown in Table 1 as a “decision matrix”.
Fig. 1. Powered wheelchair driving through an environment with some cardboard boxes as obstacles.
Table 1. Decision matrix for powered wheelchair.
Choice               Criteria
                     Dl       Dc       Dr
Move left (A1)       0.5      0.25     0.167
Move forward (A2)    0.333    0.5      0.333
Move right (A3)      0.167    0.25     0.5
The area in front of a wheelchair was segmented into left and right zones. If no object was detected, Dl, Dc and Dr were fixed to 1. If the right transducer sensed an object and the left transducer did not, then Dl was fixed to 1 and Dc and Dr were evaluated using (3) and (4).

$$D_c = D \cos(\alpha) \qquad (3)$$

$$D_r = D \sin(\alpha) \qquad (4)$$

Where: D = Distance between the wheelchair and an object; α = Angle where the object was sensed.
If the left transducer sensed an object and the right transducer did not, then Dr was fixed to 1 and Dc and Dl were evaluated using (3) and (5).

$$D_l = D \sin(\alpha) \qquad (5)$$
Where: D = Distance between the wheelchair and an object; α = Angle where the object was sensed.
Setup 1 (point A in Fig. 1): The powered-wheelchair began moving, nothing was sensed, Distances were all fixed to 1. AHP was employed with three choices and three criteria being considered. AHP yielded a ranking of choices: A2 > (A1 = A3). Global weights for choices were: A1 = 0.306, A2 = 0.389 and A3 = 0.306. The global weights were expressed as vector magnitudes as shown in Fig. 2 and the overall route yielded from AHP was evaluated using vector algebra and shown as a thick black line in Fig. 3.
Fig. 2. Global weights of choices expressed as vector magnitudes, setup 1: no boxes were sensed. (Plot title: Global Weights Expressed as Vector Magnitudes; vectors: Move left 90, Move forward, Move right 90.)
Fig. 3. Overall route yielded from AHP for setup 1. (Plot label: Overall route from AHP.)
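The setup 1 figures can be reproduced with the following sketch. Two things in it are assumptions rather than statements from the paper: the criteria weights are taken to be the three distances divided by their sum, and the vector convention places Move left on the negative x axis, Move right on the positive x axis and Move forward on the positive y axis, with the route angle measured from the positive x axis.

```python
import math

# Local weights from Table 1: one row per choice, columns ordered (Dl, Dc, Dr).
DECISION_MATRIX = {
    "A1": (0.5, 0.25, 0.167),    # Move left
    "A2": (0.333, 0.5, 0.333),   # Move forward
    "A3": (0.167, 0.25, 0.5),    # Move right
}


def criteria_weights(dl, dc, dr):
    """Assumption: each criterion weight is its distance divided by the sum of distances."""
    total = dl + dc + dr
    return (dl / total, dc / total, dr / total)


def global_weights(weights):
    """Eq. (2): P_i = sum_j w_j * l_ij."""
    return {choice: sum(w * l for w, l in zip(weights, local))
            for choice, local in DECISION_MATRIX.items()}


def overall_route(gw):
    """Add the three choice vectors; return (magnitude, angle in degrees from the +x axis)."""
    x = gw["A3"] - gw["A1"]   # right minus left
    y = gw["A2"]              # forward
    return math.hypot(x, y), math.degrees(math.atan2(y, x))


setup1 = global_weights(criteria_weights(1.0, 1.0, 1.0))
magnitude, angle_deg = overall_route(setup1)
print({k: round(v, 3) for k, v in setup1.items()}, round(angle_deg, 1))
```

Under these assumptions the rounded global weights agree with the reported 0.306, 0.389 and 0.306, and the left and right vectors cancel so the resultant points straight ahead.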
Sensitivity analysis was applied to assess the stability of AHP results if risk and uncertainty would affect criteria weights. The smallest changes needed to alter the results were evaluated. Table 2 shows the results where N/F stands for a non-feasible value where ±100% modification to a criterion value did not alter the result.
Table 2. Minimum percentage change needed in criteria weights to alter the outcome of AHP, setup 1, no boxes were sensed.
Criterion name    Minimum percentage change
Dl                ±0.1%
Dc                N/F
Dr                ±0.1%
Sensitivity analysis revealed that a 0.1% modification in Dl or Dr could alter the route of the powered-wheelchair. A 0.1% rise in Dl or a 0.1% reduction in Dr caused the wheelchair to drive ahead with a small angle to the left; a 0.1% reduction in Dl or a 0.1% rise in Dr made the wheelchair drive ahead with a small angle to the right.
Setup 2 (Point B in Fig. 1): The powered-wheelchair moved ahead, where a box was positioned to the right of the wheelchair. Sensors detected the box shown at point B in Fig. 1. The obstacle was 0.5 m away at an angle of 45° as shown in Fig. 4.
Fig. 4. Setup 2: box sensed to the right.
Dl was fixed to 1 since the left transducer detected nothing. Dc and Dr were evaluated using (3) and (4). AHP was employed with three choices and three criteria being considered. AHP yielded a ranking of choices: A1 > A2 > A3 . Global weights of choices were: A1 = 0.379, A2 = 0.368 and A3 = 0.253. The global weights were expressed as vector magnitudes as shown in Fig. 5 and the overall route yielded from AHP was evaluated using vector algebra and shown as a thick black line in Fig. 6. Sensitivity analysis was applied to assess the stability of AHP results if risk and uncertainty would affect criteria weights. The smallest changes needed to alter the results were evaluated. Results are shown in Table 3.
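As a worked illustration of Eqs. (3) to (5), the sketch below converts one detection into the three criterion distances. The function name and the side argument are invented for the example; the fixed value of 1 for the side that senses nothing follows the text.

```python
import math


def criterion_distances(side, distance_m, angle_deg):
    """Return (Dl, Dc, Dr) for a single detected object.

    side is "left" or "right", naming the transducer that sensed the object.
    """
    alpha = math.radians(angle_deg)
    dc = distance_m * math.cos(alpha)        # Eq. (3)
    lateral = distance_m * math.sin(alpha)   # Eq. (4) or Eq. (5)
    if side == "right":
        return 1.0, dc, lateral              # Dl fixed to 1
    if side == "left":
        return lateral, dc, 1.0              # Dr fixed to 1
    raise ValueError("side must be 'left' or 'right'")


setup2 = criterion_distances("right", 0.5, 45.0)   # box 0.5 m away at 45 degrees
setup3 = criterion_distances("left", 0.2, 30.0)    # box 0.2 m away at 30 degrees
print([round(v, 3) for v in setup2], [round(v, 3) for v in setup3])
```

For the box in setup 2 this gives Dc and Dr of about 0.354 with Dl held at 1, so the left criterion dominates and the ranking favors Move left.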
Fig. 5. Global weights of choices expressed as vector magnitudes, setup 2: box sensed to the right.
Fig. 6. Overall route yielded from AHP for setup 2. (Plot label: Overall route from AHP.)
Table 3. The minimum percentage change needed in criteria weights to alter the outcome of AHP, setup 2: 1 box sensed to the right.
Criterion name    Minimum percentage change
Dl                −5.81%
Dc                19.32%
Dr                21.59%
Sensitivity analysis identified the effect of change in Dr, Dc and Dl on the overall route of the wheelchair. A 5.81% reduction in Dl, a 19.32% rise in Dc or a 21.59% rise in Dr caused the wheelchair to drive ahead and left with an angle of 134°.
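The idea behind this kind of sensitivity analysis can be sketched as a simple search: change one criterion weight by a growing percentage, rescale the others so the weights still sum to one, and stop when the top-ranked choice changes. The paper does not describe its exact procedure or what counts as an altered outcome, so the rescaling rule, the step size and the setup 2 weights used here (normalized distances) are all assumptions, and the thresholds this sketch finds should not be expected to match Table 3.

```python
# Local weights from Table 1: rows A1 (left), A2 (forward), A3 (right); columns Dl, Dc, Dr.
LOCAL = [(0.5, 0.25, 0.167), (0.333, 0.5, 0.333), (0.167, 0.25, 0.5)]


def best_choice(weights):
    """Index of the choice with the largest global weight."""
    scores = [sum(w * l for w, l in zip(weights, row)) for row in LOCAL]
    return scores.index(max(scores))


def perturb(weights, index, percent):
    """Scale one weight by percent and rescale the others proportionally (assumed rule)."""
    changed = weights[index] * (1.0 + percent / 100.0)
    scale = (1.0 - changed) / (1.0 - weights[index])
    return [changed if j == index else w * scale for j, w in enumerate(weights)]


def minimum_change(weights, index, step=0.01, limit=100.0):
    """Smallest percentage change (of either sign) that changes the top-ranked choice."""
    baseline = best_choice(weights)
    for k in range(1, int(limit / step) + 1):
        for percent in (k * step, -k * step):
            if best_choice(perturb(weights, index, percent)) != baseline:
                return percent
    return None  # corresponds to N/F in the tables


# Setup 2 weights, assuming they are the distances (1, 0.354, 0.354) divided by their sum.
distances = [1.0, 0.3536, 0.3536]
setup2_weights = [d / sum(distances) for d in distances]
change_dl = minimum_change(setup2_weights, 0)
print(change_dl)
```

The sketch agrees with the text qualitatively: Move left stays on top until the weight on Dl is reduced by a few percent, after which Move forward takes over.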
Setup 3 (Point C in Fig. 1): The powered-wheelchair moved ahead where a box was positioned to the left of the wheelchair. Sensors detected the box at point C in Fig. 1. The obstacle was 0.2 m away at an angle of 30° as shown in Fig. 7.
Fig. 7. Setup 3: box sensed to the left.
Since the right transducer detected nothing Dr was fixed to 1, Dc and Dl were evaluated using (3) and (5). AHP was employed with three choices and three criteria being considered. AHP yielded a ranking of choices: A3 > A2 > A1 . Global weights of choices were: A1 = 0.204, A2 = 0.368 and A3 = 0.44. The global weights were expressed as vector magnitudes as shown in Fig. 8 and the overall route yielded from AHP was evaluated using vector algebra and shown as a black line in Fig. 9.
Fig. 8. Global weights of choices expressed as vector magnitudes, setup 3: box sensed to the left.
Sensitivity analysis was applied to assess the stability of AHP results if risk and uncertainty would affect criteria weights. The smallest changes needed to alter the results were evaluated. Results are shown in Table 4 where N/F stands for a non-feasible value where ±100% modification to a criterion value did not alter the result.
Fig. 9. Overall route yielded from AHP for setup 3. (Plot label: Overall route from AHP.)
Table 4. Minimum percentage change in criteria weights needed to change the outcome of AHP, setup 3: 1 box sensed to the left.
Criterion name    Minimum percentage change
Dl                N/F
Dc                N/F
Dr                25.7%
Sensitivity analysis identified the effect of change in Dr, Dc and Dl on the overall route of the wheelchair. A 400.01% rise in Dl, a 188.97% rise in Dc or a 25.7% rise in Dr caused the wheelchair to drive right and forward with an angle of 46°.
4 Discussion
The new system presented a new approach to mixing the desired direction provided by a wheelchair driver with output from ultrasonic sensors. Powered-wheelchair drivers controlled their chairs using joysticks and sensors adjusted their route if required. Sensor systems provided a safe route for the wheelchair. The output that controlled the wheelchair motors was a function of the output from the decision-making system and the weighted desire of the user. Ccomd (the overall control command) was derived using (6):

$$C_{comd} = \left( G_h \, |J| + k_t \, C_{sens} \right) \qquad (6)$$

Where Gh|J| is the joystick command, Csens is the decision-making output and kt was an increasing variable (increasing over time) so the user can override the decision-making output.
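A minimal sketch of the mixing in Eq. (6) is given below. It treats both the joystick command and the decision-making output as two-dimensional vectors (x to the right, y forward), which is an assumption suggested by the vector plots rather than something the equation states, and the gain values are hypothetical.

```python
def mix_commands(joystick, sensor_route, g_h, k_t):
    """Eq. (6) applied per component: C_comd = G_h * J + k_t * C_sens."""
    return (g_h * joystick[0] + k_t * sensor_route[0],
            g_h * joystick[1] + k_t * sensor_route[1])


# Hypothetical example: the user pushes hard to the left while the AHP route for
# setup 1 points straight ahead with magnitude 0.389.
user_command = (-1.0, 0.0)
ahp_route = (0.0, 0.389)
overall = mix_commands(user_command, ahp_route, g_h=1.0, k_t=1.0)
print(overall)
```

The relative size of the two gains decides how strongly the sensors bend the user's request; how Gh and kt are scheduled over time is left to the paper's description.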
Figure 10 shows the overall route for the wheelchair when mixing the output from AHP for setup 1 when the user wanted to move to the left with a high speed. The line shown in black was the AHP output, the line shown in grey was the user desire and the line shown in red was the overall actual route and velocity.
Fig. 10. Overall route of the wheelchair after mixing AHP route for setup 1 and user desired route. (Legend: Overall route, AHP route, User route.)
Figure 11 shows the overall wheelchair route for setup 2, when the user wanted to move forward at a low speed. Figure 12 shows the overall route of the wheelchair for setup 3 when the user wanted to move to the left at a moderate speed.
Fig. 11. Overall route of the wheelchair after mixing AHP route for setup 2 and user desired route. (Legend: Overall route, AHP route, User route.)
Fig. 12. Overall route of the wheelchair after mixing AHP route for setup 3 and user desired route. (Legend: Overall route, AHP route, User route.)
5 Conclusions
The research described in this paper effectively used AHP to avoid collisions. Mathematically inexpensive, effective and safe outcomes were accomplished. The new system supported wheelchair drivers as obstructions appeared in their path, and wheelchairs were driven safely round them. The work could bring some independence and decrease the need for helpers. The authors are now considering new methods for the system to analyze additional inputs by integrating various AI techniques [2, 29, 36, 37]. The general idea is that multiple AI techniques can be applied to their maximum advantage. MCDM could not consider all situations so neuro, reinforcement or neuro-fuzzy learning could deliver efficient outcomes. These algorithms will be explored. Systems attempted to avoid obstacles but if a driver continuously indicated that they desired to drive in a specific direction, for example by holding a joystick in a fixed position, then the driver's wishes could overrule the decision-making system. The chair drove as anticipated by the driver if nothing was sensed. The authors are currently applying the AHP and Preference Ranking Organization MEthod for Enrichment of Evaluation (PROMETHEE) method to other problems. A framework for the intelligent selection of MCDM methods has been created [38]. The authors are applying different MCDM methods to different types of problems [39–41]. Future work will consider a bigger set of choices to consider 360° around the chair. Uncertainty will be captured using probability functions, fuzzy set theory and percentages. Results show that the decision-making system worked properly. Systems will be clinically trialed at Chailey Heritage as part of an EPSRC funded project [42]. Research is now examining path modification [43], force sensing [44] and contrasting accomplishments both with the sensors and without sensors [45].
Acknowledgment. Research in this paper was funded by EPSRC grant EP/S005927/1 and supported by The Chailey Heritage Foundation and the University of Portsmouth.
References 1. Parhi, D.R., Singh, M.K.: Rule-based hybrid neural network for navigation of a wheelchair. Proc. IMechE Part B J. Eng. Manuf. 224, 11103–11117 (2009) 2. Sanders, D.A., Gegov, A., Ndzi, D.: Knowledge-based expert system using a set of rules to assist a tele-operated mobile robot. In: Bi, Y., Kapoor, S., Bhatia, R. (eds.) Studies in Computational Intelligence, vol. 751, pp. 371–392. Springer (2018) 3. Sanders, D.A., et al.: Rule-based system to assist a tele-operator with driving a mobile robot. In: Lecture Notes in Networks and Systems, vol. 16, pp. 599–615. Springer (2018) 4. Sanders, D., Langner, M., Bausch, N., Huang, Y., Khaustov, S.A., Simandjuntak, S.: Improving human-machine interaction for a powered wheelchair driver by using variable-switches and sensors that reduce wheelchair-veer. In: Bi, Y., Bhatia, R., Kapoor, S. (eds.) Advances in Intelligent Systems and Computing, vol. 1038, pp. 1173–1191. Springer, Cham (2019) 5. Okonor, O.M., Gegov, A., Adda, M., Sanders, D., Haddad, M.J.M., Tewkesbury, G.: Intelligent approach to minimizing power consumption in a cloud-based system collecting sensor data and monitoring the status of powered wheelchairs. In: Bi, Y., Bhatia, R., Kapoor, S. (eds.) Advances in Intelligent Systems and Computing, vol. 1037, pp. 694–710. Springer, Cham (2019) 6. Sanders, D., Okonor, O.M., Langner, M., Hassan Sayed, M., Khaustov, S.A., Omoarebun, P.O.: Using a simple expert system to assist a powered wheelchair user. In: Bi, Y., Bhatia, R., Kapoor, S. (eds.) Advances in Intelligent Systems and Computing, vol. 1037, pp. 662–679. Springer, Cham (2019) 7. Bausch, N., Shilling, P., Sanders, D., Haddad, M.J.M., Okonor, O.M., Tewkesbury, G.: Indoor location and collision feedback for a powered wheelchair system using machine learning. In: 2019 IEEE SAI Intelligent Systems Conference. Advances in Intelligent Systems and Computing, London, United Kingdom, 5 September 2019, vol. 1, pp. 721–739. Springer (2019) 8. 
Tewkesbury, G., Sanders, D., Haddad, M.J.M., Bausch, N., Gegov, A., Okonor, O.M.: Task programming methodology for powered wheelchairs. In: 2019 IEEE SAI Intelligent Systems Conference. Advances in Intelligent Systems and Computing, London, United Kingdom, 5 September 2019, vol. 1, pp. 711–720. Springer (2019) 9. Sanders, D., Tewkesbury, G., Parchizadeh, H., Robertson, J.J., Omoarebun, P.O., Malik, M.: Learning to drive with and without intelligent computer systems and sensors to assist. In: Arai, K., Kapoor, S., Bhatia, R. (eds.) Advances in Intelligent Systems and Computing, vol. 868, pp. 1171–1181. Springer, Cham (2019) 10. Sanders, D., Gegov, A., Haddad, M., Ikwan, F., Wiltshire, D., Tan, Y.C.: A rule-based expert system to decide on direction and speed of a powered wheelchair. In: Arai, K., Kapoor, S., Bhatia, R. (eds.) Advances in Intelligent Systems and Computing, vol. 868, pp. 822–838. Springer, Cham (2019) 11. Haddad, M., Sanders, D., Bausch, N., Tewkesbury, G., Gegov, A., Hassan Sayed, M.: Learning to make intelligent decisions using an Expert System for the intelligent selection of either PROMETHEE II or the Analytical Hierarchy Process. In: Arai, K., Kapoor, S., Bhatia, R. (eds.) Advances in Intelligent Systems and Computing, vol. 868, pp. 1303–1316. Springer, Cham (2019) 12. Parhi, D.R., et al.: The stable and precise motion control for multiple wheelchairs. Appl. Soft Comput. 9(2), 477–487 (2009)
13. Nguyen, V., et al.: Strategies for human - machine interface in an intelligent wheelchair. In: 35th Annual International Conference of IEEE Engineering in Medicine & Biology Society Conference Proceedings, (EMBC), Osaka, Japan, pp. 3638–3641 (2013) 14. Haddad, M.J.M., Sanders, D., Gegov, A., Hassan Sayed, M., Huang, Y., Al-Mosawi, M.: Combining multiple criteria decision making with vector manipulation to decide on the direction for a powered wheelchair. In: Bi, Y., Bhatia, R., Kapoor, S. (eds.) Advances in Intelligent Systems and Computing, vol. 1037, pp. 680–693. Springer, Cham (2019) 15. Haddad, M.J.M., Sanders, D., Tewkesbury, G., Gegov, A., Hassan Sayed, M., Ikwan, F.C.: Initial results from using Preference Ranking Organization METHods for Enrichment of Evaluations to help steer a powered wheelchair. In: Bi, Y., Bhatia, R., Kapoor, S. (eds.) Advances in Intelligent Systems and Computing, vol. 1037, pp. 648–661. Springer, Cham (2019) 16. Sanders, D., Wang, Q., Bausch, N., Huang, Y., Khaustov, S.A., Popov, I.: A method to produce minimal real time geometric representations of moving obstacles. In: Arai, K., Kapoor, S., Bhatia, R. (eds.) Advances in Intelligent Systems and Computing, vol. 868, pp. 881–892. Springer, Cham (2019) 17. Sanders, D., Gegov, A., Tewkesbury, G., Khusainov, R.: Sharing driving between a vehicle driver and a sensor system using trust-factors to set control gains. In: Arai, K., Kapoor, S., Bhatia, R. (eds.) Advances in Intelligent Systems and Computing, vol. 868, pp. 1182–1195. Springer, Cham (2019) 18. Sanders, D.A., et al.: Results from investigating powered wheelchair users learning to drive with varying levels of sensor support. In: Proceedings of the SAI Intelligent System, London, U.K. (2017) 19. Song, K.T., Chen, C.C.: Application of asymmetric mapping for wheelchair navigation using ultrasonic sensors. J. Intell. Wheelchair Syst. 17(3), 243–264 (1996) 20. 
Sanders, D., Langner, M., Tewkesbury, G.: Improving wheelchair- driving using a sensor system to control wheelchair-veer and variable-switches as an alternative to digital-switches or joysticks. Ind. Robot Int. J. 37(2), 151–167 (2010) 21. Lee, S.: Use of infrared light reflecting landmarks for localization. Ind. Robot Int. J. 36(2), 138–145 (2009) 22. Sanders, D., Stott, I.: A new prototype intelligent mobility system to assist powered wheelchair users. Ind. Robot 26(6), 466–475 (2009) 23. Larsson, J., Broxvall, M., Saffiotti, A.: Laser-based corridor detection for reactive navigation. Ind. Robot Int. J. 35(1), 69–79 (2008) 24. Milanes, V., Naranjo, J., Gonzalez, C.: Autonomous vehicle based in cooperative GPS and inertial systems. Robotica 26, 627–633 (2008) 25. Sanders, D.A.: Controlling the direction of walkie type forklifts and pallet jacks on sloping ground. Assem. Autom. 28(4), 317–324 (2008) 26. Sanders, D.: Recognizing shipbuilding parts using artificial neural networks and Fourier descriptors. Proc. Inst. Mech. Eng. Part B J. Eng. Manuf. 223(3), 337–342 (2009) 27. Chang, Y.C., Yamamoto, Y.: On-line path planning strategy integrated with collision and dead-lock avoidance schemes for wheeled wheelchair in indoor environments. Ind. Robot Int. J. 35(5), 421–434 (2008) 28. Sanders, D.: Comparing speed to complete progressively more difficult mobile robot paths between human tele-operators and humans with sensor-systems to assist. Assem. Autom. 29(3), 230–248 (2009) 29. Ishizaka, A., Siraj, S.: Are multi-criteria decision-making tools useful? An experimental comparative study of three methods. EJOR 264, 462–471 (2018) 30. Sanders, D.A.: Using self-reliance factors to decide how to share control between human powered wheelchair drivers and ultrasonic sensors. IEEE Trans. Neural Syst. Rehabil. Eng. 25(8), 1221–1229 (2017)
31. Sanders, D.A., et al.: Tele-operator performance and their perception of system time lags when completing mobile robot tasks. In: Proceedings of the 9th International Conference on Human Systems Interaction, pp. 236–242 (2016) 32. Raju, K., Kumar, D.: Irrigation planning using genetic algorithms. Water Resour. Manag. 18, 163–176 (2004) 33. Haddad, M., Sanders, D., Bausch, N.: Selecting a robust decision making method to evaluate employee performance. Int. J. Manag. Decis. Making 8(4), 333–351 (2019) 34. Saaty, T.L.: Decision making with the analytic hierarchy process. Int. J. Serv. Sci. 1(1), 83–98 (2008) 35. Ishizaka, A., Labib, A.: Analytic hierarchy process and expert choice: benefits and limitations. Or Insight 22(4), 201–220 (2009) 36. Gegov, A., Gobalakrishnan, N., Sanders, D.A.: Rule base compression in fuzzy systems by filtration of non-monotonic rules. J. Intell. Fuzzy Syst. 27(4), 2029–2043 (2014) 37. Sanders, D.A., et al.: A rule-based expert system to decide on direction and speed of a powered wheelchair. In: IEEE Proceedings of the SAI Conference on Intelligent Systems, London, U.K., pp. 426–433 (2018) 38. Haddad, M., Sanders, D.: The behavior of three discrete multiple criteria decision making methods in the presence of uncertainty. Oper. Res. Perspect., to be published 39. Haddad, M.J.M., Sanders, D., Bausch, N.: Selecting a robust decision making method to evaluate employee performance. Int. J. Manag. Decis. Making 18(4), 333–351 (2019) 40. Haddad, M.J.M., Sanders, D.: Selecting a best compromise direction for a powered wheelchair using PROMETHEE. IEEE Trans. Neural Syst. Rehabil. Eng. 27(2), 228–235 (2019). https:// doi.org/10.1109/TNSRE.2019.2892587 41. Sanders, D., Robinson, D.C., Hassan Sayed, M., Haddad, M.J.M., Gegov, A., Ahmed, N.: Making decisions about saving energy in compressed air systems using ambient intelligence and artificial intelligence. In: Arai, K., Kapoor, S., Bhatia, R. (eds.) 
Advances in Intelligent Systems and Computing, vol. 869, pp. 1229–1236. Springer, Cham (2019) 42. Sanders, D., Gegov, A.: Using artificial intelligence to share control of a powered-wheelchair between a wheelchair user and an intelligent sensor system. EPSRC project 2019–2022 (2018) 43. Sanders, D.A.: The modification of pre-planned manipulator paths to improve the gross motions associated with the pick and place task. Robotica 13, 77–85 (1995) 44. Sanders, D.A.: Viewpoint - force sensing. Ind. Robot 34, 177 (2007) 45. Sanders, D.: Comparing ability to complete simple tele-operated rescue or maintenance mobile-robot tasks with and without a sensor system. Sens. Rev. 30(1), 40–50 (2010)
Methodology of Displaying Surveillance Area of CCTV Camera on the Map for Immediate Response in Border Defense Military System
Hyungheon Kim1, Taewoo Kim2, and Youngkyun Cha1
1 Korea University, Seoul, Korea
2 Innodep Technology Laboratory, Seoul, Korea
Abstract. This paper presents a methodology for displaying on a map the geomagnetic direction of cameras installed in various parts of a city. An ordinary camera has no sensor that can measure the geomagnetic direction, so it does not know which way it is facing. Consequently, unlike its position, its direction and the region it is viewing cannot be drawn on the map. This paper therefore proposes a methodology for acquiring the direction from operator feedback: the camera is set to several pan, tilt and zoom values, and for each setting the operator indicates which area of the map corresponds to the scene shown by the camera. We establish a camera environment model for parameter acquisition and present the results of testing this model in a laboratory environment.
Keywords: Geomagnetic direction · Camera direction · Operator feedback · GIS
1 Introduction
Due to the low birth rate, the number of soldiers in Korea continues to decrease, and the Korean government is developing a video analysis system for the defense of national boundaries. The defense boundary system developed in our project is composed of several sub-systems. The first is the edge camera, which captures images and analyzes simple movements on the device itself. The second is the VMS (Video Management System), which saves the video from the cameras and distributes it to the other systems. The third is a video analytics server that detects enemies using deep learning, and the last is a collaboration platform, which helps the agent decide what to do and what to look at by controlling cameras and providing relevant information when events occur. The technique proposed in this paper is applied to a VMS accompanied by a GIS (Geographic Information System) that displays the location of cameras on the map. A typical GIS solution, however, only displays the location of the camera; it cannot show on the map which direction the camera is pointing or which area it is currently monitoring, because ordinary cameras do not have sensors that report the geomagnetic direction in which they are looking.
© Springer Nature Switzerland AG 2021 K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 631–637, 2021. https://doi.org/10.1007/978-3-030-55190-2_47
National boundary surveillance is a matter of defense, so operators need to be aware of and respond to a situation as quickly as possible. When the video analytics engine detects abnormal behavior, indicating the location of that behavior on the map would greatly help the response. For this purpose, this paper introduces a technique for displaying the camera's monitored area on a map in a conventional VMS without adopting special geomagnetic sensors. Related studies include estimating the direction of an object seen by the camera through image analysis [1] and estimating the speed of a moving object using optical flow [2]. The study in [3] calculates the camera's monitored area using the effective angle of view of the camera, and calculates the camera positions and the number of cameras needed to cover the surveillance area closely. The research in [3] can be usefully combined with our work. However, none of the papers published so far addresses estimation of the geomagnetic direction of an existing camera. Section 2 of this paper describes the proposed methodology and the related geometry, symbols and terms. Section 3 describes an experiment set up to implement the methodology of Sect. 2 and the results of applying it. The conclusion is presented in Sect. 4.
2 Methodology
2.1 Background
The target system consists of a camera that monitors borders, a VMS that receives and stores videos from the camera, and a GIS that displays the location of the installed camera on a map. It is assumed that when pan, tilt and zoom values are specified, the camera can move directly to the corresponding direction, and conversely that the current pan, tilt and zoom values can be read by querying the camera. The problem is that each camera does not know its own geomagnetic direction, as shown in Fig. 1. Since a camera is not equipped with a geomagnetic sensor, it faces a different geomagnetic direction depending on the direction in which it was installed. The task is therefore twofold: to obtain the area the camera is monitoring when its pan, tilt and zoom values are given, and conversely to obtain the pan, tilt and zoom values when a desired area is given. Figure 2 shows the geometry of this situation. Here α represents the horizontal FOV (Field Of View) of the camera and β represents the vertical FOV. The h in the figure is the height of the camera, r is the horizontal distance from the camera's installation position to the center of the camera image, and H_L is the distance from the camera center point on the map to the lowest point on the map. H_U indicates the distance from the camera center point on the map to the highest point on the map.

Fig. 1. The discrepancies of the geomagnetic directions when installing same cameras

Fig. 2. The geometries regarding camera monitoring

2.2 Proposed Method
The proposed method shows the user an image taken at zoom value 1 with an arbitrary pan and tilt setting, and then obtains feedback from the user on which area of the map the monitored area corresponds to. The user indicates the center point of the camera image on the map and gives W, the horizontal width passing through the center point. The other two values given by the user are H_L and H_U. Using these values, the system can calculate the unknown camera height h, the horizontal FOV α and the vertical FOV β from Eqs. (1) to (3), where θ is the tilt angle.

$$h = r\tan\theta \tag{1}$$
$$\alpha = 2\tan^{-1}\left(\frac{W}{2R}\right) \tag{2}$$
$$\beta = \cos^{-1}\left(\frac{(r-H_L)^2+(r+H_U)^2+2h^2-(H_L+H_U)^2}{2\sqrt{(r-H_L)^2+h^2}\,\sqrt{(r+H_U)^2+h^2}}\right) \tag{3}$$
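The calibration step of Eqs. (1) to (3) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `calibrate` and its parameter names are hypothetical, and the symbol R, which the text does not define, is assumed here to be the slant distance from the camera to the image center point, sqrt(r² + h²).

```python
import math


def calibrate(r, tilt_deg, width, h_lower, h_upper):
    """Estimate camera height and FOVs from one operator feedback sample.

    r        -- horizontal distance from camera to image center on the map (m)
    tilt_deg -- tilt angle theta in degrees
    width    -- W, horizontal width through the center point (m)
    h_lower  -- H_L, distance from center point to the lowest point (m)
    h_upper  -- H_U, distance from center point to the highest point (m)
    Returns (h, alpha_deg, beta_deg).
    """
    theta = math.radians(tilt_deg)
    h = r * math.tan(theta)                          # Eq. (1)

    # Assumption: R is the camera-to-center slant distance.
    slant = math.hypot(r, h)
    alpha = 2.0 * math.atan(width / (2.0 * slant))   # Eq. (2)

    # Eq. (3): law of cosines in the triangle camera / near edge / far edge.
    near_sq = (r - h_lower) ** 2 + h ** 2
    far_sq = (r + h_upper) ** 2 + h ** 2
    cos_beta = (near_sq + far_sq - (h_lower + h_upper) ** 2) / (
        2.0 * math.sqrt(near_sq) * math.sqrt(far_sq)
    )
    cos_beta = max(-1.0, min(1.0, cos_beta))         # guard against rounding
    beta = math.acos(cos_beta)

    return h, math.degrees(alpha), math.degrees(beta)
```

With the tilt and r of point 1 in Table 1 (24.81 degrees and 3.46 m), Eq. (1) gives a height close to the tabulated 1.60 m. The α values of Table 2 are not claimed to be reproduced, since they depend on how R and the zoom factor were actually handled.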
This process is repeated for several pan and tilt values, and all results are recorded. Because the information about the monitored area comes from human estimates, it may deviate from point to point; the system therefore records all of the samples and refers to the α, β and h values of the recorded point closest to the point requested by the user. Then, when the zoom value is 1 and an arbitrary pan value ϕ and tilt value θ are set, the monitored area can be calculated using the following equations.

$$\text{Geomagnetic azimuth direction} = \phi + \phi_0 \tag{4}$$
$$r = h\cot\theta \tag{5}$$
$$W = 2R\tan\frac{\alpha}{2} \tag{6}$$
$$H_L = r - h\tan\left(90^\circ - \theta - \frac{\alpha}{2}\right) \tag{7}$$
$$H_U = r + h\tan\left(90^\circ - \theta + \frac{\alpha}{2}\right) \tag{8}$$

When the zoom value Z is not 1, the parameters of the monitored area can be obtained from the following equations. Here the value of r is the same as in Eq. (5).

$$W_Z = W/Z \tag{9}$$
$$H_{LZ} = H_L/Z \tag{10}$$
$$H_{UZ} = H_U/Z \tag{11}$$
By using the above methodology, it is possible to convert between pan, tilt, zoom and parameters related to the monitored area using equations from (4) to (11) and to implement a system that can display the monitored area on the map as shown below (Fig. 3).
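The conversion from pan, tilt and zoom to the monitored area might be sketched as follows. The function `monitored_area` and its dictionary keys are hypothetical names. Two assumptions are made and should be read as such: R is taken to be the slant distance sqrt(r² + h²), and the near and far edges are placed by intersecting the lower and upper view rays (tilt ± β/2) with the ground, which matches the triangle used in Eq. (3) but is not a literal transcription of Eqs. (7) and (8) as printed.

```python
import math


def monitored_area(pan_deg, tilt_deg, zoom, h, alpha_deg, beta_deg, phi0_deg):
    """Return azimuth and ground footprint for one pan/tilt/zoom setting."""
    theta = math.radians(tilt_deg)
    half_alpha = math.radians(alpha_deg) / 2.0
    half_beta = math.radians(beta_deg) / 2.0
    if theta - half_beta <= 0.0 or theta + half_beta >= math.pi / 2.0:
        raise ValueError("both edge rays must hit the ground in front of the camera")

    azimuth = (pan_deg + phi0_deg) % 360.0        # Eq. (4) as printed
    r = h / math.tan(theta)                       # Eq. (5): r = h cot(theta)
    slant = math.hypot(r, h)                      # assumed meaning of R
    width = 2.0 * slant * math.tan(half_alpha)    # Eq. (6)

    # Assumption: ray geometry consistent with Eq. (3), using beta.
    near = h / math.tan(theta + half_beta)        # ground distance of lower ray
    far = h / math.tan(theta - half_beta)         # ground distance of upper ray
    h_lower = r - near
    h_upper = far - r

    # Eqs. (9)-(11): scale the footprint by the zoom value.
    return {
        "azimuth": azimuth,
        "r": r,
        "W": width / zoom,
        "HL": h_lower / zoom,
        "HU": h_upper / zoom,
    }
```

The guard at the top rejects settings where the upper ray never reaches the ground or the lower ray points behind the camera, cases the equations in the text do not discuss.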
3 Implementation
The implementation environment was built in an office. CCTV (Closed-Circuit Television) cameras are normally installed on high poles and look down from above. Because installing such a pole in the office would be too expensive, we placed the camera on an office desk and treated the ceiling as the ground. Numbered Post-it notes were attached to various points on the office ceiling, and the coordinates of each point were measured and recorded, as shown in Fig. 4. The parameters for each point are summarized in Table 1. As the table shows, the calculated camera height h varies from point to point because the user's feedback is not accurate. In the table, ϕ0 is the difference between the direction of the x axis on the map and the pan value of the camera. Table 2 shows the α and β values obtained from user input at different zoom values for points 1 and 8. As the table shows, the α and β values differ slightly, so the system uses their average. In the proposed system, when the user wants to move the camera to a specific point on the map for more accurate PTZ (Pan, Tilt, Zoom) control, the h, α and β values used to obtain the parameters for that point are those of the recorded position closest to the requested point. We used this approach to control the PTZ for a test point; there was some error, but the accuracy was sufficient to display the monitored area on the map.

Fig. 3. The map of camera indicating monitored region

Fig. 4. The implemented situation overview (upper left), the coordinates and positions of the Post-it notes (lower left), the pictures of the office and a numbered Post-it note (right pane)

Table 1. Simulation environment parameters
Point   Pan (deg)  Tilt (deg)  X (m)  Y (m)  r (m)  h (m)  ϕ0 (deg)
1       355.95     24.81       7.65   4.95   3.46   1.60   281.87
2       105.77     21.92       2.25   1.35   4.45   1.79   282.29
3       67.06      19.95       2.02   4.72   5.61   2.03   280.65
4       18.89      12.87       1.35   12.6   12.21  2.79   262.91
6       108.33     32.09       4.05   1.35   2.66   1.67   282.51
7       265.02     46.33       7.65   1.35   0.98   1.03   280.88
8       45.51      25.28       4.27   4.95   4.11   1.94   279.44
Camera  -          -           6.70   1.62   -      -      -

Table 2. Simulation results parameters
Point  Pan (deg)  Tilt (deg)  Zoom  W (m)  H_L (m)  H_U (m)  α (deg)  β (deg)
1      355.95     24.81       3     1.50   0.90     1.35     72.32    139.26
8      45.51      25.28       3     2.25   1.12     2.70     91.64    141.15
8      45.51      25.28       7     0.75   0.67     0.90     72.82    135.30
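The nearest-reference lookup described above can be illustrated with the values of Table 1. The sketch below is a hedged illustration with hypothetical names (`REFERENCES`, `nearest_reference`, `pan_tilt_for`). The sign convention is an assumption inferred from the table rows: the tabulated ϕ0 agrees with the pan value minus the map angle measured counter-clockwise from the x axis, so the pan is recovered as map angle plus ϕ0 modulo 360. The tilt is obtained by inverting Eq. (1).

```python
import math

CAMERA_XY = (6.70, 1.62)  # camera position on the map, from Table 1

# (point, pan_deg, tilt_deg, x_m, y_m, r_m, h_m, phi0_deg), copied from Table 1
REFERENCES = [
    (1, 355.95, 24.81, 7.65, 4.95, 3.46, 1.60, 281.87),
    (2, 105.77, 21.92, 2.25, 1.35, 4.45, 1.79, 282.29),
    (3, 67.06, 19.95, 2.02, 4.72, 5.61, 2.03, 280.65),
    (4, 18.89, 12.87, 1.35, 12.6, 12.21, 2.79, 262.91),
    (6, 108.33, 32.09, 4.05, 1.35, 2.66, 1.67, 282.51),
    (7, 265.02, 46.33, 7.65, 1.35, 0.98, 1.03, 280.88),
    (8, 45.51, 25.28, 4.27, 4.95, 4.11, 1.94, 279.44),
]


def nearest_reference(x, y):
    """Return the recorded sample whose map position is closest to (x, y)."""
    return min(REFERENCES, key=lambda ref: math.hypot(ref[3] - x, ref[4] - y))


def pan_tilt_for(x, y):
    """Estimate the pan and tilt (degrees) that point the camera at (x, y)."""
    ref = nearest_reference(x, y)
    h, phi0 = ref[6], ref[7]
    dx, dy = x - CAMERA_XY[0], y - CAMERA_XY[1]
    r = math.hypot(dx, dy)
    map_angle = math.degrees(math.atan2(dy, dx)) % 360.0
    pan = (map_angle + phi0) % 360.0          # assumed sign convention
    tilt = math.degrees(math.atan2(h, r))     # inverse of Eq. (1)
    return ref[0], pan, tilt
```

Requesting the map position of point 7 or point 4 returns a pan within a few hundredths of a degree of the tabulated value, which is expected because ϕ0 was derived from those same samples; for positions between the samples the error depends on how much h and ϕ0 vary from point to point.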
4 Conclusion
In this paper, we implemented a system that shows the camera video to the user and asks the user to indicate the viewed area on the map. The system then uses this information either to move the camera to a point when the user later clicks on a location of interest on the map, or to determine which area of the map the camera is currently monitoring. Because the accuracy of user feedback is generally not high, the system uses the most appropriate of the stored reference values. This makes it possible to indicate each camera's monitored region on the map. The system proposed in this paper is expected to contribute to the effective defense of the national boundaries.
Acknowledgments. This work was supported by Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (2017-0-00250, Intelligent Defense Boundary Surveillance Technology Using Collaborative Reinforced Learning of Embedded Edge Camera and Image Analysis).
References
1. Gupta, P., Rathore, M., Purohit, G.N.: Detection of direction deviation of vehicle using CCTV cameras. In: 2014 IEEE International Conference on Recent Advances and Innovations in Engineering, Jaipur, India. IEEE (2014)
2. Shibata, M., Makino, T., Ito, M.: Target distance measurement based on camera moving direction estimated with optical flow. In: 2008 IEEE International Workshop on Advanced Motion Control, Trento, Italy, pp. 62–67. IEEE (2008)
3. Hsiao, Y., Shih, K., Chen, Y.: On full-view area coverage by rotatable cameras in wireless camera sensor networks. In: 2017 IEEE International Conference on Advanced Information Networking and Applications, Taipei, Taiwan, pp. 260–265. IEEE (2017)
Detecting Control Flow Similarities Using Machine Learning Techniques
André Schäfer and Wolfram Amme
Friedrich-Schiller-University, 07743 Jena, Germany
{andre.schaefer,wolfram.amme}@uni-jena.de
Abstract. This work presents methods that allow control flow paths to be compared. The intended use cases for these methods are vulnerability and bug detection. Existing work has always compared whole control flow graphs with each other to achieve these goals. However, vulnerabilities or bugs can be hidden in completely different contexts, i.e. in different parts of a program. This work therefore deals with the extraction, coding and comparison of control flow paths, because the path along which the instructions of a vulnerability or bug are executed is always similar.
Keywords: CFG · Control flow graph · Comparison · Vulnerability detection

1 Introduction and Motivation
The project's goal is to be able to search for and recognize similar control flows in a high-level control flow graph. Applying the techniques to be developed should enable the detection of vulnerabilities and bugs in programs. Vulnerabilities and bugs can be found in many programs, but not always in the same functions or program parts. Similar bugs or vulnerabilities often have similar control flow paths. For this reason, we concentrate on designing techniques that recognize or identify similar control flow paths rather than complete functions, as other techniques do. Programs with comparable semantics often have similar control flows. For example, a program that finds a certain value in an array usually uses a loop to pass through the array and checks the value with an If condition. Of course, any other statements, assignments or calls can be executed in addition to these, but the loop and the condition will always be found.

In order to analyze control paths for similarity, a control flow graph must first be generated. The analysis is to be performed for the Java programming language. A control flow graph could be generated from the source code, but this would limit the number of available test programs. To avoid this problem, the control flow graph is generated from a compiled program, i.e. from Java bytecode. In later work, further types of control flow graphs will be tested. It would also be possible to use an abstract syntax tree enriched with control flow information, such as the abstract syntax tree of Crelier [3]. The abstract syntax tree used by Crelier has the character of a control flow graph, because it integrates information about the dominator relation into the tree and specifies the order of execution. For a node n in the tree, the subtree described by the left successor node is always executed before the subtree described by the right successor node. Furthermore, the left and right subtrees are executed before n itself.

In a control flow graph generated from bytecode, there are only If-, Assign-, Invoke-, Goto-, Switch-, Throw- and Return-statements. All old and new language constructs of Java are mapped to these commands, so the variance is not as large as in the original source code. Furthermore, identifier names are replaced, since they are not important for the similarity analysis of control paths. A further advantage is that software for which the source code is not publicly available can also be analyzed. In our procedure, we first create intra-procedural control flow graphs for the functions contained in the program, which we then use to derive the control flow paths necessary to describe the program parts we are looking for. These control flow paths are then encoded using the machine learning method autoencoder and can then be checked for similarity using the Euclidean distance between vectors.
© Springer Nature Switzerland AG 2021 K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 638–646, 2021. https://doi.org/10.1007/978-3-030-55190-2_48
2 Related Work and Techniques Already Used
Many works deal with the search for vulnerabilities and bugs. In the following we briefly discuss the methods developed in recent years. In the work of David and Yahav [4], for example, similar functions are searched for in binaries. For this purpose the control flow graph is divided into basic blocks, and similar blocks are then searched for by rewrite operations. In the work of Pewny et al. [7], the control flow graph of a binary is also divided into basic blocks. A signature is created from each block using the instructions it contains, and similar signatures are then searched for in a codebase. These two methods did not yet use machine learning. A more recent work by Xu et al. [9] tries to detect vulnerabilities and bugs based on an encoding of the metadata of the basic blocks. The properties of a basic block, such as the number of instructions and the types of instructions used, as well as meta information such as the predecessors and successors of basic blocks, are used to encode the basic blocks. The basic blocks are encoded individually and then combined per function into a common measure using a simple summation.

All of the methods listed so far attempt to find vulnerabilities and bugs by searching for similar control flow graphs. However, bugs or vulnerabilities could also be embedded in a completely different context. To handle such situations, the method presented in this paper does not use control flow graphs to find similar program parts, but instead uses control flow paths that describe the program code being searched for. The use of paths to describe program parts is not new; it was first used in the works of Alon et al. [1,2] for source code detection and the derivation of function names. In these works, paths within the abstract syntax tree of a program are used to find similar functions in the source code. A function is described by the set of paths that link the leaves contained in the abstract syntax tree of the function.
3 Description of Our Methods
The Soot framework [8] was used to create a control flow graph from bytecode. Soot is a Java-based framework for manipulating and optimizing Java bytecode. To create the control flow graphs, every class file in a Jar file was loaded and processed using Soot, so that a control flow graph could be generated for each function in each class file. Such control flow graphs contain only If-, Assign-, Invoke-, Goto-, Switch-, Throw- and Return-instructions. Based on these graphs, an algorithm was developed that determines control flow paths and represents them in the form:

    assign(i0=r1.java io InputStream.read(byte[]))|if(i0!=-1)|invoke(r19.java io OutputStream.write(r20,0,i0))|goto

The corresponding source code and an excerpt of the control flow graph can be found in Listing 1.1 and in Fig. 1. However, only the encoding of the loop is shown here. At present, this algorithm determines all possible paths from the beginning of the graph to each end. The content of loops is extracted and encoded separately to avoid a path explosion.

    public String crc(InputStream input) {
        if (input == null) return "" + System.nanoTime();
        CRC32 crc = new CRC32();
        byte[] buffer = new byte[4096];
        try {
            while (true) {
                int length = input.read(buffer);
                if (length == -1) break;
                crc.update(buffer, 0, length);
            }
        } catch (Exception ex) {
            try {
                input.close();
            } catch (Exception ignored) {
            }
        }
        return Long.toString(crc.getValue());
    }

Listing 1.1. Source code of the crc function in the class JniGenSharedLibraryLoader in the version 1.6.1
Fig. 1. Part of the control flow graph of the crc function in the class JniGenSharedLibraryLoader of the gdx-jnigen jar in the version 1.6.1
In this form, the sequential processing of successive statements can be seen, with the statements separated from each other by |. This example shows a loop that reads data from an InputStream and writes it out again to an OutputStream. This continues until i0 is −1, that is, until nothing more can be read. The path consists of an Assign followed by an If condition, an Invoke and a final Goto. Once the control flow paths have been extracted and stored, coding can begin. To allow comparisons and to identify advantages and disadvantages, the control flow paths were coded in two different ways. Encoding 1 encodes each statement within a control flow path separately, whereas encoding 2 encodes the complete control flow path as a whole. The pre-processing of the control flow paths is the same for both encodings: all characters or strings are encoded using one hot encoding, and the same string is always encoded with the same number. This yields a vector representation of the control flow paths, for example:

    assign(i0=r1.java io InputStream.read(byte[]))|...

    assign = 11    ( = 195    i0 = 3    = = 21    r1 = 17    . = 73
    java io InputStream = 112    read = 25    byte = 71    [ = 139    ] = 272    ) = 13

    [11 195 3 21 17 73 112 73 25 195 71 139 272 13 13 ...]
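The pre-processing step can be sketched as a small tokenizer plus a vocabulary that always assigns the same number to the same string. This is an illustrative sketch, not the authors' code: the names `tokenize`, `Vocabulary` and `encode_statement` are hypothetical, the type name is written with underscores (`java_io_InputStream`) on the assumption that it forms a single token, and the numbers are assigned in order of first appearance, so they will not match the IDs printed above. The padding to 20 positions follows the precondition stated in Sect. 3.1.

```python
import re

# Identifiers, integer literals, two-character operators, then any other symbol.
TOKEN_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*|-?\d+|!=|==|[^\sA-Za-z0-9_]")
MAX_TOKENS = 20  # maximum tokens per statement; shorter ones are zero-padded


def tokenize(statement):
    """Split one statement of a control flow path into tokens."""
    return TOKEN_RE.findall(statement)


class Vocabulary:
    """Maps each distinct token to a stable positive integer (0 is padding)."""

    def __init__(self):
        self.ids = {}

    def id_of(self, token):
        if token not in self.ids:
            self.ids[token] = len(self.ids) + 1
        return self.ids[token]


def encode_statement(statement, vocab):
    """Return a fixed-length list of token numbers for one statement."""
    tokens = tokenize(statement)
    if len(tokens) > MAX_TOKENS:
        raise ValueError("statement has more than %d tokens" % MAX_TOKENS)
    ids = [vocab.id_of(token) for token in tokens]
    return ids + [0] * (MAX_TOKENS - len(ids))


def encode_path(path, vocab):
    """Encode every statement of a '|'-separated control flow path."""
    return [encode_statement(statement, vocab) for statement in path.split("|")]
```

As in the printed example, repeated tokens such as `(`, `.` and `)` receive the same number wherever they occur, so the statement above yields 15 numbers followed by five zeros.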
Afterwards, the machine learning method autoencoder is used. An autoencoder [6] is a neural network with N input and N output neurons and a hidden layer with fewer than N neurons. The network is trained using backpropagation, with each input simultaneously given as the expected output. Since the network has fewer inner neurons available, it learns a more compact representation of the data. An overview of the procedure is shown in Fig. 2. Autoencoders are used to reduce dimensions and remove noise, and exactly these two properties are needed when searching for similar control flows. A smaller dimension increases the speed of checking and requires less memory for storage. Removing or softening small differences is also important, so that similar control flows are located in similar places in the vector space. This property is very useful when these coded control flow paths are used to train further applications with machine learning techniques. In addition, the coding process generates vectors of the same length, which makes it possible to calculate distances between vectors in Euclidean space.
Fig. 2. System overview
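To make the idea concrete, the following sketch trains a very small autoencoder with plain stochastic gradient descent. It is deliberately simplified and is an assumption throughout: it is linear (no activation functions), uses toy 4-dimensional inputs scaled to the range 0 to 1 rather than token vectors, and the names `train_linear_autoencoder`, `encode` and `reconstruction_error` are hypothetical. The network architecture and training setup used in the paper are not specified beyond the layer sizes.

```python
import random


def train_linear_autoencoder(data, hidden, epochs=300, lr=0.05, seed=0):
    """Train x -> code -> x with squared error; returns (encoder, decoder)."""
    rng = random.Random(seed)
    n = len(data[0])
    enc = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(hidden)]
    dec = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(n)]
    for _ in range(epochs):
        for x in data:
            code = [sum(w * xi for w, xi in zip(row, x)) for row in enc]
            out = [sum(w * c for w, c in zip(row, code)) for row in dec]
            err = [o - xi for o, xi in zip(out, x)]   # the input is the target
            # Backpropagate the error to the hidden code before updating dec.
            grad_code = [sum(err[i] * dec[i][j] for i in range(n))
                         for j in range(hidden)]
            for i in range(n):
                for j in range(hidden):
                    dec[i][j] -= lr * err[i] * code[j]
            for j in range(hidden):
                for k in range(n):
                    enc[j][k] -= lr * grad_code[j] * x[k]
    return enc, dec


def encode(enc, x):
    """Map an input vector to its lower-dimensional code."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in enc]


def reconstruction_error(enc, dec, data):
    """Mean squared reconstruction error over the data set."""
    total = 0.0
    for x in data:
        code = encode(enc, x)
        out = [sum(w * c for w, c in zip(row, code)) for row in dec]
        total += sum((o - xi) ** 2 for o, xi in zip(out, x))
    return total / len(data)
```

After training, `encode` returns a vector with as many entries as there are hidden neurons, and these shorter vectors are what would be compared by Euclidean distance.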
3.1 Encoding 1 - Individual Encoding of Each Instruction
With encoding 1, each statement of a control flow path is encoded individually. As a precondition, an instruction can consist of at most 20 characters or character strings. If there are fewer, the remaining positions are filled with zeros. An autoencoder is then used to learn a code of length 12 for each statement. After the learning process is completed, all paths are translated with the help of the trained autoencoder. If the user now wants to search for control flow paths, individual instructions within a control flow path can be searched for by calculating the distance measure for each coded instruction.
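The per-instruction search described above might look as follows. This is a sketch under assumptions: the function names are hypothetical, the vectors stand for the coded instructions produced by the trained autoencoder (short toy vectors are enough to show the logic), and the order of the matched instructions within the candidate path is not enforced, which the text leaves open.

```python
import math


def has_counterpart(query_vec, candidate_vecs, threshold):
    """True if some instruction of the candidate path is close enough."""
    return any(math.dist(query_vec, c) <= threshold for c in candidate_vecs)


def path_matches(query_path, candidate_path, threshold):
    """A candidate path matches if every searched instruction has a counterpart.

    Both paths are sequences of equally long instruction vectors.
    """
    return all(has_counterpart(q, candidate_path, threshold) for q in query_path)
```

For example, `path_matches([(0.0, 0.0), (1.0, 1.0)], [(0.1, 0.0), (5.0, 5.0), (1.0, 1.1)], 0.22)` holds because both searched instructions have a neighbour within the threshold, while additional unrelated instructions in the candidate path do not prevent the match.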
Search for:

    assign(i2=i19) | if(i2!=0) | invoke(i0=r1.java io InputStream.read(byte[]))

Simplified example:

    assign(i1=i3)|assign(i1=".exe")|if(i1!=0)|assign(i14=25)|assign(i15=".pdf")|invoke(i0=r1.java io InputStreamReader.read(char[]))

By calculating the Euclidean distance between the vectors, similarly coded instructions can be found. If similar counterparts are found within a control flow path for all of the searched statements, a similar control flow path has been found.

3.2 Encoding 2 - Control Flow Path Encoded as an Entirety
With encoding 2, a complete control flow path is one hot encoded into a single vector and then encoded as a whole using an autoencoder. The preconditions are that an instruction can consist of at most 20 characters or character strings and that a complete path can consist of at most 250 instructions. All remaining positions are filled with zeros. As a result, every control flow path is one hot encoded into a vector of size 5000 (250 × 20). An autoencoder with 2000 inner neurons is then trained. After the training process, all control flow paths are encoded using the autoencoder. Each coded control flow path then has a vector length of 2000, and similarities can be searched for using the Euclidean distance measure.
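The fixed-size layout of encoding 2 can be sketched as follows. The function name `flatten_path` is hypothetical, and the input is assumed to be a list of per-instruction number lists as produced by the pre-processing step. Note that in the paper the distance is computed on the 2000-dimensional autoencoder codes; the raw 5000-dimensional vectors are compared here only to keep the example self-contained.

```python
import math

MAX_TOKENS = 20        # positions per instruction
MAX_STATEMENTS = 250   # instructions per path
VECTOR_SIZE = MAX_TOKENS * MAX_STATEMENTS  # 5000


def flatten_path(statements):
    """Lay out a path as one zero-padded vector of length 5000."""
    if len(statements) > MAX_STATEMENTS:
        raise ValueError("path has too many instructions")
    flat = []
    for ids in statements:
        if len(ids) > MAX_TOKENS:
            raise ValueError("instruction has too many tokens")
        flat.extend(ids)
        flat.extend([0] * (MAX_TOKENS - len(ids)))   # pad this instruction
    flat.extend([0] * (VECTOR_SIZE - len(flat)))     # pad the rest of the path
    return flat


def distance(path_a, path_b):
    """Euclidean distance between two flattened paths."""
    return math.dist(flatten_path(path_a), flatten_path(path_b))
```

Because every instruction starts at a multiple of 20, two paths that differ in one token differ in exactly one position of the flattened vector, whereas an inserted instruction shifts all following instructions by 20 positions.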
4 Implementation and First Results
In order to test the two presented methods against each other, similar control flow paths were selected by hand from the jar gdx-jnigen (https://github.com/libgdx/libgdx/tree/master/extensions/gdx-jnigen) of the open source framework libgdx in version 1.6.1. As an example, the source code of the crc function can be found in Listing 1.1. The relevant part of the function is the while-true loop. In this loop something is read, it is checked whether the read was successful, and if so a function is called. A part of the control flow graph generated for the crc function is shown in Fig. 1. The crucial instructions are marked in red and by rounded boxes. A total of 5 similar loops within the Jar could be identified by hand. The 5 loops are represented as paths as follows:

1. write function
    assign(i0=r1.java io InputStream.read(byte[]))|if(i0!=-1)|invoke(r19.java io OutputStream.write(r20,0,i0))|goto
2. crc function
    assign(i1=r1.java io InputStream.read(byte[]))|if(i1!=-1)|invoke(r2.java util zip CRC32.update(r3,0,i1))|goto
3. extractLibrary function
    assign(i0=r35.java io InputStream.read(byte[]))|if(i0!=-1)|invoke(r5.java io FileOutputStream.write(r6,0,i0))|goto
4. readString function
    assign(i0=r21.java io InputStreamReader.read(char[]))|if(i0!=-1)|invoke(r2.java lang StringBuilder.append(r22,0,i0))|goto
5. readBytes function
    assign($i3=i0+i5)|assign($i2=i1-i5)|assign(i6=r2.java io InputStream.read(byte[],int,int))|if(i6>0)|assign(i5=i5+i6)|goto

As can be seen, the paths are not completely identical; there are larger and smaller differences. readBytes, for example, has two more Assigns at the beginning of the path. Furthermore, it makes no function call before the Goto but performs an Assign instead. readString uses InputStreamReader instead of InputStream, which the other functions use. There are more differences than these, but the paths nevertheless have a certain obvious similarity. To perform a test with the 5 determined paths using the approaches from Sect. 3, the complete gdx-jnigen jar was pre-processed, i.e. all paths were extracted. These paths were pre-processed using one hot encoding, and an autoencoder was then trained for each of the approaches from Sect. 3.1 and Sect. 3.2. The path of the write function was searched for:

    assign(i0=r1.java io InputStream.read(byte[]))|if(i0!=-1)|invoke(r19.java io OutputStream.write(r20,0,i0))|goto
Table 1. Evaluation of the test run
Method     Similarity measure  Found/tp (similarity)                                                          Not found/fn  More found/fp
Sect. 3.1  0.22                crc, extractLibrary, readString                                                readBytes     4
Sect. 3.1  0.31                crc, extractLibrary, readBytes, readString                                     0             11
Sect. 3.2  0.059               crc (0.006), extractLibrary (0.0001), readString (0.056)                       readBytes     0
Sect. 3.2  0.085               crc (0.006), extractLibrary (0.0001), readBytes (0.076), readString (0.056)    0             50
The results are shown in Table 1. Since the method of Sect. 3.2 compares the whole path, the similarity measure between the searched path and each found path can be given. With the method of Sect. 3.1, each statement is compared individually, and a path is considered found only if all statements are below the similarity threshold; for this reason no single value can be given for a path. As expected, readBytes is the least similar to write, and this is reflected in the values of the similarity measures. Comparing the two methods at the same distance threshold makes no sense, because the methods work too differently. The methods are therefore best compared by the numbers of similar control flows found, i.e. true positives (tp), and of false positives (fp). If the method of Sect. 3.2 is to find all control flows, then 50 additional functions with similar control flows are also found. This is because readBytes is quite different from the others. However, if it is acceptable that readBytes is not found, then the method of Sect. 3.2 finds no other functions with similar control paths, which means that the number of false positives is 0. The method of Sect. 3.1, on the other hand, finds 11 false positives at a forced true positive rate of 100%. If readBytes is again omitted, only 4 false positives are found. From this it can be concluded that the method of Sect. 3.1 puts more emphasis on the individual elements of the path and less on the path as a whole. The method of Sect. 3.2, on the other hand, is more forgiving of small deviations and generalizes more.
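The effect of the threshold for the method of Sect. 3.2 can be reproduced from the distances reported in Table 1. The helper names below are hypothetical, and the false positive counts are not modelled because the distances of the other paths in the jar are not given.

```python
# Distances to the searched write path, as reported in Table 1 for Sect. 3.2.
WRITE_DISTANCES = {
    "crc": 0.006,
    "extractLibrary": 0.0001,
    "readString": 0.056,
    "readBytes": 0.076,
}


def found_under(distances, threshold):
    """Names of the hand-selected paths whose distance is within the threshold."""
    return sorted(name for name, d in distances.items() if d <= threshold)


def true_positive_rate(distances, threshold):
    """Share of the hand-selected similar paths that are found."""
    return len(found_under(distances, threshold)) / len(distances)
```

With the threshold 0.059 the call `found_under(WRITE_DISTANCES, 0.059)` returns crc, extractLibrary and readString, and raising the threshold to 0.085 adds readBytes, which is the point at which Table 1 reports 50 false positives.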
5 Conclusion and Further Work
In summary, a comparison between control flow paths is possible with the methods from Sect. 3.1 and Sect. 3.2. The method from Sect. 3.2 even gives an exact distance measure. In future work, these methods will be tested and optimized on larger scenarios and a much larger data base. The generated codings are well suited to serve as input for further machine learning methods, so further features can be added to increase the accuracy. The goal of this work was to determine similarities between control flows using machine learning and to confirm that the approach works in principle, and this could be confirmed. There are some very interesting applications for comparing control flow paths. For example, one could search for the critical paths of malware in other programs in order to find further, possibly mutated, malware. However, for these methods to be used effectively for such a purpose, a way to determine the important or critical paths of programs automatically is needed. Malware will therefore be analyzed in further work in order to find such a way. A first naive improvement could be random walks, as demonstrated in the work of DeFreez et al. [5]. Furthermore, only intraprocedural paths have been used in the procedures described so far. This is a disadvantage for malware detection, which is why the procedure must be extended to include interprocedural control flow graphs.
References
1. Alon, U., Brody, S., Levy, O., Yahav, E.: Code2seq: generating sequences from structured representations of code, p. 22 (2019)
2. Alon, U., Zilberstein, M., Levy, O., Yahav, E.: Code2vec: learning distributed representations of code. Proc. ACM Program. Lang. 3(POPL), 1–29 (2019)
3. Crelier, R.: OP2: A Portable Oberon–2 Compiler, p. 10
4. David, Y., Yahav, E.: Tracelet-based code search in executables. In: Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation - PLDI 2014, Edinburgh, United Kingdom, pp. 349–360. ACM Press (2013)
5. DeFreez, D., Thakur, A.V., Rubio-González, C.: Path-based function embedding and its application to error-handling specification mining. In: Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering - ESEC/FSE 2018, Lake Buena Vista, FL, USA, pp. 423–433. ACM Press (2018)
6. Le, L., Patterson, A., White, M.: Supervised autoencoders: improving generalization performance with unsupervised regularizers, p. 11
7. Pewny, J., Schuster, F., Bernhard, L., Holz, T., Rossow, C.: Leveraging semantic signatures for bug search in binary programs. In: Proceedings of the 30th Annual Computer Security Applications Conference, ACSAC 2014, New Orleans, Louisiana, pp. 406–415. ACM Press (2014)
8. Vallée-Rai, R., Co, P., Gagnon, E., Hendren, L., Lam, P., Sundaresan, V.: Soot: a Java bytecode optimization framework. In: CASCON First Decade High Impact Papers, CASCON 2010, Toronto, Ontario, Canada, pp. 214–224. ACM Press (2010)
9. Xu, X., Liu, C., Feng, Q., Yin, H., Song, L., Song, D.: Neural network-based graph embedding for cross-platform binary code similarity detection. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security CCS 2017, pp. 363–376 (2017). arXiv: 1708.06525
Key to Artificial Intelligence (AI)
Bernhard Heiden1,2(B) and Bianca Tonino-Heiden2
1 University of Applied Sciences, Villach, Austria [email protected]
2 University of Graz, Graz, Austria
http://www.cuas.at
Abstract. This paper bridges the Quantum Mechanical (QM) observer-reality relationship with regard to time and decision processes. A key to intelligence is given by at least three universal relation sketches from nature: (a) The solution of the observer-reality relationship is given as a shift from potential to emergent reality by means of realisation through the observing or the observer. (b) The decision process of emergence leads to the irreversibility of the time arrow, which can be related to directed or undirected graphs, relating back to QM systems. (c) The featuring, as an intelligence process, of the bidirectional or strongly coupled system relates to a basic intelligence system consisting of question and answer, related to communication processes as their intrinsic key process. As a general conclusion, an outer evolutive circle keys up the enclosing intelligently.
Keywords: General Artificial Intelligence · AI · Time · Time-process · Feature · Feedback-process · Graph-theory · Observer-influence · Information · Intelligence · Informational-Knot-Matrix-Method · IKMM
1 Introduction
In this work, the foundations of Artificial Intelligence (AI) are reviewed and examined for their common inherent features, leading to theoretical viewpoints and, as a consequence, to practical applications. Different theories are available, each more or less suitable for a specific application. Whereas linear theories are simple to understand, nonlinear theories are mostly difficult to handle and distracting because of the multiplicity of results. Hence a common method, until the beginning and throughout most of the last century, was to “linearise” phenomena by snipping out a small part of “reality” and regarding it as linear, because it is then analytically solvable. This led to a first success story of analytical programs like that of Russell [1], leading to the conclusion that everything in the world can be explained logically. Though not regarded as overwhelmingly successful, important parts from this area remain. There are at least two famous successors of Russell: Spencer Brown and Ludwig Wittgenstein. Spencer Brown originated a new arithmetic theory staying in the realm of deducibility [2]. Wittgenstein was a pre-thinker of natural language processing, emphasizing the fact that natural language is sufficient for intelligent processes, also with respect to machines (compare also [3]). In the 1970s the Prolog programming language was implemented (see e.g. [4,5]), leading to the recognition that logic programming seems to be more efficient for programming information processes, as is state of the art in AI.
With respect to this work, after its purpose is clarified, a deeper background and its significance for the later derived conclusions are given, trying to bridge fundamental and widely different disciplines. This is done by introducing the Informational-Knot-Matrix-Method (IKMM), whose properties can be applied to diverse application fields like Graph Theory (GT), Quantum Mechanics (QM) and others. Three consequences are deduced: (a) the observer-reality relationship, (b) the time arrow implications for information, and (c) the featuring properties of higher order information processes for incomplete knowledge systems. Finally, a Key to AI is depicted symbolically.
© Springer Nature Switzerland AG 2021 K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 647–656, 2021. https://doi.org/10.1007/978-3-030-55190-2_49
2 Purpose
The purpose of this article is to make plausible how some special fundamental natural properties, inherent in self-reference, lead to AI, intelligence, and self-aware processes, and how these are linked together with these key properties as a core. This leads to applications that are inherently linked to information processes, and those have as a consequence special “features” that transcend them and make them increasingly useful in the global order, ordering and self-ordering processes.
3 Background and Significance
AI is embedded in a series of disciplines. It grew out of the digital age, beginning with the development of modern computer systems, which are based on the von Neumann architecture and which were developed in a framework around the Macy Conferences [6] and especially Norbert Wiener [7]. In recent years the importance of information has been increasing, from information theory by Shannon [8] to the current discussion of AI development and its basic problems, discussed e.g. in [9]. One major problem can be regarded as that of the difference between natural and artificial intelligence, in particular with regard to the notion of consciousness [10]. A main part of the mind-body discussion since Descartes is nowadays that there exists at least the phenomenal consciousness, which makes the minimal difference between AI and human body intelligence. But there are also arguments against this, which deny the existence of consciousness. In this context our approach can be regarded as functionalistic and hypercritical, meaning that the physical realm is dominating anyhow, leading to a human machine-mechanism interpretation according to a biocybernetic machine concept. Humans are, or at least can be, very smart machines. Understanding this can help us to make ourselves better machines by means of AI. Therefore it is of eminent importance to understand the continuum of natural and artificial intelligence, which is based on information flows.
4 Informational-Knot-Matrix-Method
For the purpose of the analysis, a new method is briefly introduced in its basics, which leads immediately to the intended conclusions. In physics and chemical engineering, block diagrams can describe mass and energy balances, which can be translated into matrices and thus into balance equations in matrix form. This equation set can be solved, yielding the steady-state solution or, by introducing the differential quotients, the solution for e.g. time-dependent cases [11]. But what happens when the process of balance equations is additionally applied to information processes? In this case, there exists no balance condition in the ordinary sense, but these can be regarded and depicted as an information flow (compare also [12]). To solve mass, energy, and information equations in one systematic way, this paper addresses the geometry-algebra transformation problem by introducing the Informational-Knot-Matrix-Method (IKMM) for generally finding the equations described in a block-diagram schema of an informational or control system. The illustration is given for a simple back-coupled system, leading to surprising results with the property of the emergence of a new dimension of freedom.
4.1 Balancing in Chemical Engineering by Means of Computer Systems
In chemical engineering, there exists a method for providing balance equations of chemical devices (e.g. mixer, splitter, distillation apparatus, etc.) by means of transformation [11]. By means of this equation-based method, as opposed to a sequential method, all equations of a system are solved simultaneously, which can be done with a computer algebra program, for example. The matrix equation for such an equation system can be written according to Eq. (1), where K is the Knot-Matrix, U is the Stream-Matrix, V is the Analysis-Matrix, and Z is the Formation-Matrix.
$$K \cdot U \cdot V = Z \tag{1}$$
4.2 Informational-Knot-Matrix-Method (IKMM) for ‘Stationary’ Control Systems
The method above from chemical engineering balance modelling has to be adapted for control or information systems by means of several premises:
1. The Knot-Matrix that consists of 0/1 has to be replaced by the implementation of transfer functions Gi (i = 1..n) of the informational or control system block diagram.
2. The knots that contain information streams instead of ‘material’ streams have to contain the net sign of the intended direction of the arrows in the block diagram. This is then the Informational-Knot-Matrix K.
The rest is analogous to the Knot-Matrix-Method of chemical engineering systems (see [11]). The information above the block connector lines represents the stream variables. In this case, they are not material streams but signal streams. Contrary to material streams, they do not add in value when they are added in a knot, and they do not become less when a knot is divided into parts, as the information, respectively the signal, does not change by this operation. As a remark, the Informational-Knot-Matrix-Method can also be described in the notion of Graph Theory (compare also [13]). Here an informational system can be depicted with a block diagram according to Fig. 1, which denotes a “directed graph.” According to [13, p. 8f] this is then also a swirl in the case of directed graphs and a circle or circular for undirected graphs.
4.3 Informational-Knot-Matrix-Method (IKMM) - Degrees of Freedom
The first question that arises is what it means that an information stream is splitting. At first, it can be stated that when the balance is made around such a knot, everything that goes into the knot also goes out of the knot. This means that the information stays constant, even if a splitting occurs. That means the system produces a redundancy, i.e. there is no additional information gained, but something different. There is also no additional degree of freedom, at first sight. But it makes a difference for sure. It seems that this redundancy information is decisive, as it is quasi a pre-decision or an environment for a decision. That is not nothing, but it is something abstracted from ‘reality’ or, in the case of a control or information system, from effectiveness. It can be concluded that it is the prerequisite of a ‘new’ or innovative decision. With respect to an informationally connected social system, it could be regarded as ‘downloading’, as denoted e.g. by Scharmer [14] in the theory U. With regard to any network, there are two apparent possibilities: firstly, a new branch forms, or secondly, a feedback loop is formed. In each case, something new can emerge, just because of the mirror property. But let us look at the difference to ‘material’ streams. Why is it different there? Material streams split, and they necessarily get less. Informational streams split and they stay the same. This fundamental difference is because they have a different meaning. If we were to duplicate the property of information splitting in material streams, then we would have to copy the stream with a twin of it. We would need a material source, which means that a creation process would have to take place. That is also valid vice versa in the informational process. The mirror information is created. But by definition, the same information is not new information, as it is the same. That means that the notion of information is different with regard to the creation of information or information itself. In other words, information is back-coupled with an earlier version of itself. Or, with the fifth main sentence of thermodynamics [15], it is of higher order compared to an earlier stage of itself. Concluding with the theory U of Otto Scharmer very clearly, the information knot division or the information knot multiplication is the creation of a new cycle of innovation, beginning with mirroring, i.e. observing and downloading or copying this information. When this cycle leads to a feedback loop, then new information is really created (0/1) (compare also [16]) or, to formulate it more stringently, as a reality. The observer creates reality. Here the importance of Berkeley’s “esse est percipi” [17, p. 26 §3] becomes apparent.
4.4 Calculation Example
The example according to Fig. 1 shall be solved with the IKMM introduced above. For this, (1) the Informational-Knot-Matrix K of Fig. 1 shall be formed, (2) a degree of freedom analysis shall be done, and (3) the transition (compare e.g. [18]) or transfer function G(s) of the whole system, G(s) = V(s)/U(s), shall be calculated as a function of G1(s) and G2(s).
[Figure 1: block diagram with the labels G(s), U(s), X1(s), G1(s), V(s), G2(s), X2(s) and the signs + and - at the summing knot]
Fig. 1. Feed-back loop of first order
Solution. In Fig. 1 knots and streams are identified. First there exist three knots: (1) U(s), X1(s), X2(s), (2) X1(s), G1(s), V(s), and (3) V(s), G2(s), X2(s). There are also four streams: (1) U(s), (2) X1(s), (3) X2(s), and (4) V(s). The knots for the system in Fig. 1 are depicted in Fig. 2.
Fig. 2. Knot analysis of the system in Fig. 1
The knots put into the Informational-Knot-Matrix K together with the Stream-Vector U give Eq. (2). The equation that follows from the first row in the Information/Control-Knot-Matrix multiplied by the Stream-Vector is identical with the “signal-balance” around the first knot in Fig. 2: U(s) − X1(s) − X2(s) + 0 = 0.
$$
\underbrace{\begin{pmatrix} 1 & -1 & -1 & 0 \\ 0 & G_1(s) & 0 & -1 \\ 0 & 0 & -1 & G_2(s) \end{pmatrix}}_{\text{Informational-Knot-Matrix } (K)\text{: rows = knots, columns = streams}}
\cdot
\underbrace{\begin{pmatrix} U(s) \\ X_1(s) \\ X_2(s) \\ V(s) \end{pmatrix}}_{\text{Stream-Vector } (U)}
= 0 \tag{2}
$$
When now another equation is implemented, that of the whole transfer function of the system U(s) · G(s) = V(s), then there results a fourth knot U(s), G(s), V(s). If the system in Eq. (2) is supplemented accordingly, then Eq. (3) follows for the equation system of the streams.
$$
K \cdot U =
\begin{pmatrix} G(s) & 0 & 0 & -1 \\ 1 & -1 & -1 & 0 \\ 0 & G_1(s) & 0 & -1 \\ 0 & 0 & -1 & G_2(s) \end{pmatrix}
\cdot
\begin{pmatrix} U(s) \\ X_1(s) \\ X_2(s) \\ V(s) \end{pmatrix}
= 0 \tag{3}
$$
Now the determinant of the matrix K is solved according to Eq. (4).
$$|K| = G(s) - G_1(s) + G(s) \cdot G_1(s) \cdot G_2(s) = 0 \tag{4}$$
Solved for G, Eq. (5) results.
$$G(s) = \frac{G_1(s)}{G_1(s) \cdot G_2(s) + 1} \tag{5}$$
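As a check on Eqs. (4) and (5), the determinant can be expanded along the first row of the 4 × 4 matrix of Eq. (3), whose rows are G, 0, 0, −1; 1, −1, −1, 0; 0, G1, 0, −1; and 0, 0, −1, G2. This is a worked sketch added for illustration; the minors M11 and M14 are notation introduced here and do not appear in the original derivation, and the argument (s) is dropped for brevity.

$$
\begin{aligned}
|K| &= G \cdot |M_{11}| + |M_{14}|, \\
|M_{11}| &= \begin{vmatrix} -1 & -1 & 0 \\ G_1 & 0 & -1 \\ 0 & -1 & G_2 \end{vmatrix} = 1 + G_1 G_2, \\
|M_{14}| &= \begin{vmatrix} 1 & -1 & -1 \\ 0 & G_1 & 0 \\ 0 & 0 & -1 \end{vmatrix} = -G_1, \\
|K| &= G \,(1 + G_1 G_2) - G_1 = 0
\quad\Longrightarrow\quad
G = \frac{G_1}{G_1 G_2 + 1}.
\end{aligned}
$$

The term with M14 enters with a positive sign because the matrix element −1 in the first row, fourth column is multiplied by the cofactor sign (−1)^(1+4) = −1.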
In Eq. (5) it can be seen that the outer equations are dependent on the inner ones. When the determinant of K is set to 0, then the rank reduces at least by 1. That means that there are maximally three possible degrees of freedom. A function with three independent variables as in Eq. (5) fulfills this. Each variable corresponds to a degree of freedom. For the modelling it is of importance that these are independent of each other, or at least that they can be regarded as such in an approximation. This means fundamentally that they are ‘discriminable’, i.e. that they can be distinguished and that they are sufficiently different. This also means that they have an individual “Eigenleben” (something like “living on their own”). The individual equations are obtained by evaluating Eq. (3) into Eq. (6):
$$
K \cdot U =
\begin{pmatrix} G(s) \cdot U(s) - V(s) \\ U(s) - X_1(s) - X_2(s) \\ G_1(s) \cdot X_1(s) - V(s) \\ G_2(s) \cdot V(s) - X_2(s) \end{pmatrix}
= 0 \tag{6}
$$
In general, the equation system K · U = 0 can be solved numerically or algebraically, e.g. with a Computer-Algebra program. This results in four equations for seven unknowns G(s), G1(s), G2(s), U(s), V(s), X1(s), X2(s). The system of equations has four degrees of freedom. If the function contains the single functions, the number of degrees of freedom is reduced by the feedback. This leads to the determinant of K being equal to 0. In this case, it is also identical with G(s) = G1(s)/(G1(s)·G2(s)+1). It can be seen from this that the feedback reduces the degree of freedom by one and this increases the internal degree of freedom of the system. The four solution equations are
$$
G(s) = \frac{G_1(s)}{G_1(s) \cdot G_2(s) + 1}, \quad
V(s) = \frac{G_1(s) \cdot U(s)}{G_1(s) \cdot G_2(s) + 1}, \quad
X_1(s) = \frac{U(s)}{G_1(s) \cdot G_2(s) + 1}, \quad
X_2(s) = \frac{G_1(s) \cdot G_2(s) \cdot U(s)}{G_1(s) \cdot G_2(s) + 1}.
$$
The degree of freedom can, therefore, be interpreted as corresponding to the number of different parameters in the solution vector. This is three for the first solution and four for the remaining ones. The degree of freedom of the symbolic solution, in this case, is also four because a system of equations is solved with four unknowns. Obviously, such a system increases the system order, and the nonlinearity increases. One could say then that the system makes an emergent order jump (compare also [15]). The order changes from an outer order to an inner order, a self-order. This can be regarded as the basic principle of all self-organizing (SO) and hence intelligent systems.
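The four solution equations can be checked numerically against the four balance rows of the extended system (G·U − V, U − X1 − X2, G1·X1 − V, G2·V − X2). The following is a minimal sketch under stated assumptions: the concrete transfer functions G1(s) = 1/(s + 1) and G2(s) = 0.5, the evaluation point s = 2j and all function names are hypothetical choices made here, not part of the original example.

```python
def closed_loop(g1, g2, u):
    """Return G, V, X1, X2 from the four solution equations."""
    denom = g1 * g2 + 1
    return g1 / denom, g1 * u / denom, u / denom, g1 * g2 * u / denom

def balance_residuals(g, g1, g2, u, x1, x2, v):
    """Left-hand sides of the four balance rows; all should vanish."""
    return (g * u - v, u - x1 - x2, g1 * x1 - v, g2 * v - x2)

def det(m):
    """Determinant by Laplace expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, a in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * a * det(minor)
    return total

# Hypothetical transfer functions evaluated at one complex frequency.
s = 2j
g1 = 1 / (s + 1)
g2 = 0.5
u = 1.0

g, v, x1, x2 = closed_loop(g1, g2, u)
res = balance_residuals(g, g1, g2, u, x1, x2, v)

# Extended Informational-Knot-Matrix with the closed-loop G inserted.
k = [
    [g, 0, 0, -1],
    [1, -1, -1, 0],
    [0, g1, 0, -1],
    [0, 0, -1, g2],
]
det_k = det(k)

print(max(abs(r) for r in res), abs(det_k))
```

With the closed-loop G inserted, both the residuals and the determinant vanish up to rounding, which matches the statement that the determinant condition and the transfer function of the feedback loop are the same relation.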
5 Key to AI
[Figure 3: ring diagram with the labels “Unbewusstsein / Unconsciousness, potential-space, locked-area” and “Bewusstsein (consciousness), reality-space, unlock-area”]
Fig. 3. Key to AI - the observing ring of reality generation unlocks the enclosing part, analogously to a full body scanner.
When we generalize the IKMM method, especially in the point that an increase or decrease in freedom arises in the informational balance equations, this means that multiple or non-linear solutions emerge in the case of the back-coupling process. (a) Applied to a QM system (compare e.g. [19–21]): according to Ruhnau, with regard to the observer-system relationship, the influencing of quantum mechanical systems by the observer or observing system is still not clear by its nature. Following the deduction given above of the freedom increase or decrease as a direct consequence of feedback or its absence, observation, or also redundancy or information multiplication, is a result of dimensional emergence, and hence freedom generation, from an outer towards an inner or self control. This process is depicted in Fig. 3 as the Key to AI, the Key to intelligent processes of nature. The outer becomes an inner, thereby creating it emergently. With regard to QM systems, the observing creates reality from a potential nexus, which is possible by the Schrödinger equation, to a realisation through an observer decision. This can be consciousness or any other back-coupling process, e.g. an AI system, as well. In any case, the unlock-area is created by the informational observing process. For QM systems there is, e.g., the Pauli Principle, which postulates that different QM states, like e.g. the spin of electrons, can be linked to each other by means of entanglement. The Einstein-Podolsky-Rosen experiment [19] then related these experiments to the question of synchronicity and of the velocity of light as the highest speed in the universe. The experiments allow in principle for both. Synchronicity means de facto that time does not exist for this system. From the above it could be concluded that time, in an entangled QM state, has not yet emerged from the potential space, and hence is in the locked-area or unobserved state. At the moment a measurement by a device or an observer in general, or a back-coupling, takes place, reality, and with it the irreversibility of time, emerges. This leads to (b): time is an observable like others that can emerge by realisation. In fact, the unidirectionality of time is in question. This can be seen as a directed graph, and a directed graph is a graph whose realisation is given a direction, in opposition to the undirectedness of Schrödinger’s ψ-function.
That means the direction of the time arrow is a result of the world being observed with regard to time in the “normal case.” Deviations from this may result in different times in consciousness according to, e.g., the “Zeit-Gestalten” (Time-Shapes) of Ruhnau, which form consciousness, or in equivalent states of AIs. Finally, (c) it can be concluded that intelligent or observing systems that develop order or higher order functions self-organisationally can be regarded as a back-coupled or feedback system, which creates potential order by means of bidirectionality. This means that, first, for an intelligent system in general and an AI system in particular, reality has to be generated by means of observation. There has to be a flow, followed by a back flow. This is a bidirectional informational flow that can also be described with graph, process, or system theory as strongly coupled. This process is intrinsic in modern informational systems. E.g. in Prolog there is a reality-generating AI process by means of communication: questioning and answering. Ask yourself which companies nowadays use these features. Which come to your mind? Hence this is the core and key feature of AI: a strong informational coupling of information by means of observer-actor systems.
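In graph-theoretic terms, the coupling described above can be illustrated with a small sketch. It assumes that “strongly coupled” corresponds to strong connectivity of a directed graph (every node reaches every other node), and it models the three knots of the feedback loop as nodes; both the modelling and the names `reachable` and `is_strongly_connected` are assumptions made here for illustration.

```python
def reachable(graph, start):
    """Return the set of nodes reachable from start (depth-first search)."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def is_strongly_connected(graph):
    """True if every node can reach every other node along directed edges."""
    nodes = set(graph) | {n for targets in graph.values() for n in targets}
    return all(reachable(graph, n) == nodes for n in nodes)

# Feedback loop: knot1 -X1-> knot2 -V-> knot3 -X2-> knot1 (flow and back flow).
feedback_loop = {"knot1": ["knot2"], "knot2": ["knot3"], "knot3": ["knot1"]}
# The same chain without the back flow from knot3 to knot1.
open_chain = {"knot1": ["knot2"], "knot2": ["knot3"], "knot3": []}

print(is_strongly_connected(feedback_loop))
print(is_strongly_connected(open_chain))
```

Only the graph with the back flow is strongly connected in this sense; removing the feedback edge leaves a directed chain without a cycle.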
References
1. Whitehead, A.N., Russell, B., Gödel, K.: Principia Mathematica, 10th edn. Suhrkamp Verlag (2018). 169 p
2. Spencer-Brown, G.: Laws of Form. Bohmeier Joh. (2008). 216 p. ISBN 3890945805
3. Wittgenstein, L.: Philosophical Investigations (in German: Philosophische Untersuchungen), 9th edn. Suhrkamp Verlag AG (2019). 300 p. ISBN 3518223720
4. Warren, D.H., Pereira, L.M., Pereira, F.: Prolog the language and its implementation compared with LISP (2015)
5. Sterling, L., Shapiro, E.: The Art of Prolog. MIT Press, Cambridge (1994)
6. Pias, C., Vogl, J. (eds.): Cybernetics - Kybernetik, The Macy - Conferences 1946. diaphanes, Zürich, Berlin (2004)
7. Wiener, N.: Kybernetik: Regelung und Nachrichtenübertragung im Lebewesen und in der Maschine. Cybernetics or control and communication in the animal and the machine (deutscher Originaltext). Econ Verlag (1963). 287 p
8. Weaver, W., Shannon, C.E.: Mathematical Theory of Communication. Combined Academic Publishers (1963). 144 p. ISBN 0252725484
9. Floridi, L. (ed.): Philosophy of Computing and Information. Automatic Press/VIP, United States of America and United Kingdom (2008). 204 p. ISBN 8792130097
10. Metzinger, T., et al.: Consciousness - Contributions of Present Philosophy (in German: Bewusstsein: Beiträge aus der Gegenwartsphilosophie), p. 792. Mentis/Schöningh (1996). ISBN 3506755137
11. Schnitzer, H.: Grundlagen der Stoff- und Energiebilanzierung. Vieweg Verlag, Braunschweig (1991)
12. Heiden, B., Tonino-Heiden, B., Decleva, M.: Towards a Wittgensteinean ladder for the virtual classroom. In: Innovationskongress 2019, Villach, Austria (2019)
13. Läuchli, P.: Algorithmische Graphentheorie (Programm Praxis) (German Edition). Birkhäuser (1991). ISBN 3764326638
14. Scharmer, O.: Theorie U. Von der Zukunft her führen: Presencing als soziale Technik. 2. Aufl. Carl-Auer Verlag (2009)
15. Heiden, B., Leitner, U.: Additive manufacturing – a system theoretic approach. In: Drstvenšek, I., et al. (eds.) ICAT 2018, Maribor, 10–11 October, pp. 136–139. Interesansa - zavod, Ljubljana (2018). ISBN 978-961-288-789-6
16. Heiden, B.: Wirtschaftliche Industrie 4.0 Entscheidungen - mit Beispielen - Praxis der Wertschöpfung. Akademiker Verlag, Saarbrücken (2016)
17. Berkeley, G.: Eine Abhandlung über die Prinzipien der menschlichen Erkenntnis. Felix Meiner Verlag, Hamburg (1979)
18. Große, N., Schorn, W.: Taschenbuch der praktischen Regelungstechnik. Carl Hanser Verlag, München Wien (2006)
19. Einstein, A., Podolsky, B., Rosen, N.: Can quantum-mechanical description of reality be considered complete? Phys. Rev. 47, 777–779 (1935)
20. Schrödinger, E.: Space-Time Structure. Cambridge University Press, Cambridge, New York (1950)
21. Ruhnau, E.: Zeit-Gestalt und Beobachter Betrachtungen zum tertium datur des Bewusstseins. In: Bewusstsein. Beiträge aus der Gegenwartsphilosophie, pp. 201–220. Mentis/Schöningh (1996). ISBN 3506755137
US Traffic Sign Recognition Using CNNs
W. Shannon Brown, Kaushik Roy(B), and Xiaohong Yuan
Department of Computer Science, North Carolina A&T State University, Greensboro, NC 27405, USA [email protected], [email protected]
Abstract. Traffic sign recognition is the technology that gives a vehicle the ability to recognize everyday traffic signs placed along the road. Detection methods are usually classified as color-based, shape-based, and learning-based methods. Recently, Convolutional Neural Networks (CNNs) have become a popular solution to image recognition problems. Thanks to their quick execution and high recognition performance, CNNs have greatly enhanced many computer vision tasks. In this paper, we propose a traffic sign recognition system by applying CNN architectures to the LISA traffic sign dataset.
Keywords: Deep learning · Convolutional Neural Networks · Traffic sign recognition
1 Purpose
Driving is a part of many people’s daily routine. Whether you are commuting to work or running errands, vehicles are an integral part of getting these tasks accomplished. Operating a vehicle requires many decisions. We need to control how fast or slow the vehicle is going while navigating in accordance with the rules of the road [1–3]. We interpret these rules by looking at the traffic signs that we see on the roads around us. With the many innovations around transportation, such as cars that park autonomously and lane detection, we are on the verge of a new frontier of how we will move throughout the world. It is expected that autonomous vehicles will be on our everyday roads as soon as the year 2021. Researchers have been working to improve the performance of autonomous vehicles [4]. The better the accuracy we can achieve in recognizing traffic signs, the better the autonomous systems that we will be able to create. We can assume that better autonomous systems will equate to more of an opportunity for the public to adapt to this new frontier.
2 Background
Published approaches to traffic sign recognition range from color-based techniques and shape-based techniques to machine learning based methods [1–6]. Deep Neural Networks are trending in the industry when it comes to pattern recognition and computer vision research [5, 6]. Color-based approaches are very common for this problem, using segmentation on features such as RGB. The shape-based method is another popular approach to the traffic sign recognition problem. One of the biggest challenges in this field of research is the lack of publicly available traffic sign datasets. Most of the publicly available datasets focus on European traffic signs. In this research, we utilize the LISA traffic sign dataset, which contains a set of videos and annotated frames containing US traffic signs [7]. This paper applies deep learning approaches to improve traffic sign recognition performance.
© Springer Nature Switzerland AG 2021 K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 657–661, 2021. https://doi.org/10.1007/978-3-030-55190-2_50
3 Method
In this work, we propose a traffic sign recognition system using a modified VGG net architecture [8]. This model classifies traffic sign images through a Convolutional Neural Network (CNN) [5]. We apply two different CNN models, namely CNN Model V1 and CNN Model V2.
3.1 Dataset Used and Preprocessing
This paper focuses on the LISA traffic sign dataset [7], as mentioned earlier. We chose this dataset because, although it is small, it is the largest that contains all United States traffic signs. The LISA dataset contains sign samples with different resolution and image distortions that were extracted from 1-s along with 47 classes. The traffic sign samples are annotated RGB images of varying sizes. During the pre-processing stage, all the samples are downsampled or upsampled to 32 × 32 pixels and then converted to grayscale images to eliminate color information and focus on shape features.
3.2 CNN Models Used
First, we apply CNN Model V1, which utilizes 3 convolutional layers and 1 dense layer with the ReLU activation function. We started with this architecture to get a basic understanding of how to use the TensorFlow library on our chosen datasets. Because this was our first attempt at implementing a CNN, we did not expect to achieve promising performance. Next, as illustrated in Fig. 1, we apply CNN Model V2 to address some of the issues that occurred in our first model. Given the low results of our first run, we considered which adjustments could yield higher accuracy in the next model. We looked over our model and found two key features that needed to be changed. First, as mentioned before, we were using ReLU in the final layer of Model V1. ReLU is not useful for classification because we want an output within a certain constraint, whether the problem is binary or multiclass, whereas ReLU output ranges from zero to infinity. Second, in CNN Model V1 we used a binary loss function, which did not align with the problem we were trying to solve: we have 47 classes in this problem, while a binary problem only works with two classes. We therefore switched to a sparse categorical loss function. This function compares the distribution of the predictions (the activations in the output layer, one for each class) with the true distribution, where the probability of the true class is set to 1 and that of the other classes to 0. Put differently, the true class is represented as a one-hot encoded vector, and the closer the model’s outputs are to that vector, the lower the loss. Model V2 uses three convolutional layers and two dense layers instead of one. We attribute the spike in test accuracy to using softmax, which is well suited to classification problems, instead of ReLU as the final activation function.
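The difference between a ReLU and a softmax output layer, and the behaviour of the loss, can be sketched in a few lines of plain Python. This is an illustration only, not the TensorFlow code used in the experiments; it assumes that the “sparse categorical loss” is sparse categorical cross-entropy (integer class labels), and the four example logits are hypothetical values (the paper uses 47 classes).

```python
import math

def relu(logits):
    """ReLU keeps positive values unbounded, so outputs are not probabilities."""
    return [max(0.0, z) for z in logits]

def softmax(logits):
    """Softmax maps logits to values in [0, 1] that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_categorical_crossentropy(true_class, probs, eps=1e-12):
    """Negative log-probability assigned to the integer-encoded true class."""
    return -math.log(max(probs[true_class], eps))

# Hypothetical output-layer activations for a 4-class toy problem.
logits = [2.0, -1.0, 0.5, 3.0]

relu_out = relu(logits)
probs = softmax(logits)

# The loss is low when the true class has the highest probability (class 3)
# and high when the true class received a low probability (class 1).
loss_good = sparse_categorical_crossentropy(3, probs)
loss_bad = sparse_categorical_crossentropy(1, probs)

print(relu_out, sum(relu_out))
print(probs, sum(probs))
print(loss_good, loss_bad)
```

Picking out the probability of the true class by its index is equivalent to comparing the prediction with a one-hot vector, which is why integer labels and the one-hot description in the text lead to the same loss value.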
Fig. 1. CNN Model V2 architecture.
4 Results
For the CNN V1 model, 80% of the samples were used for training and 20% for testing. We achieved a 24% testing accuracy on the LISA dataset. We attribute these poor results to using only one dense layer and to the ReLU activation function in that layer, which is not well suited to classifying inputs. For Model V2, again 80% of the samples were used for training and 20% for testing. Figure 2 illustrates the results: we achieved a 99% testing accuracy on the LISA dataset. Table 1 shows the results obtained using the two CNN models employed in this research. We trained our V2 model over 100 epochs, where we observed the highest training accuracy of 100% at the 65th epoch and 94% validation accuracy at the 44th epoch.
Fig. 2. Training and test accuracies of CNN V2 model for different epochs.
Table 1. Model results on the LISA dataset.
Version    LISA
V1         24%
V2         99%
To test the accuracy of our best model, CNN Model V2, under noisy conditions, we first add salt & pepper noise to the LISA dataset. We still observed 99% accuracy on the dataset where all images were grayscaled. On the test where we used RGB color data we attained 98% accuracy. We assume the 1% drop exists because of color similarities among different classes. Next, we added Gaussian noise to the dataset and applied our CNN V2 model. We again observed 99% accuracy on the dataset where all images were grayscaled. On the test where we used RGB color data we attained 99% accuracy. Here we assume that the color, or lack thereof, does not make a difference when using Gaussian noise. Table 2 shows the results achieved on noisy data.
Table 2. Results on noisy data.
Noise                     LISA
Salt & Pepper             99%
Salt & Pepper (Color)     98%
Gaussian                  99%
Gaussian (Color)          99%
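The two noise types used in the robustness test can be sketched as follows. The paper does not state how the noise was generated or with which parameters, so the noise amount (5%), the standard deviation (10 gray levels), the function names and the uniform test image are all assumptions made for illustration; the sketch operates on a 32 × 32 grayscale image stored as nested lists.

```python
import random

def add_salt_and_pepper(image, amount, rng):
    """Set a random fraction of pixels to white (salt) or black (pepper)."""
    noisy = [row[:] for row in image]  # copy so the input stays unchanged
    for r in range(len(noisy)):
        for c in range(len(noisy[r])):
            if rng.random() < amount:
                noisy[r][c] = 255 if rng.random() < 0.5 else 0
    return noisy

def add_gaussian(image, sigma, rng):
    """Add zero-mean Gaussian noise to every pixel and clip to [0, 255]."""
    return [
        [min(255, max(0, round(p + rng.gauss(0.0, sigma)))) for p in row]
        for row in image
    ]

rng = random.Random(0)  # fixed seed for reproducibility
image = [[128] * 32 for _ in range(32)]  # hypothetical uniform gray image

sp_image = add_salt_and_pepper(image, amount=0.05, rng=rng)
gauss_image = add_gaussian(image, sigma=10.0, rng=rng)

changed = sum(p != 128 for row in sp_image for p in row)
print(changed, "of", 32 * 32, "pixels changed by salt & pepper noise")
```

Salt & pepper noise replaces a few pixels with extreme values, while Gaussian noise perturbs every pixel slightly; for RGB images the same functions could be applied per channel.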
5 Conclusion
In this paper, we present a CNN approach to classifying traffic signs on the LISA traffic sign dataset. We use the TensorFlow library to test different CNN models to see which layers give the best test accuracy. Our first goal was to classify traffic signs successfully; our second was to see whether we could achieve high accuracy when noise was added to the dataset. We achieved 99% accuracy with our CNN Model V2. When we added noise to our images, we maintained the 99% accuracy that we had with the non-noisy dataset. To the best of our knowledge, no other scientific paper uses salt & pepper or Gaussian noise in traffic sign classification.
Acknowledgments. We would like to acknowledge the support from the Center of Excellence in Cybersecurity Research, Education and Outreach, North Carolina A&T State University.
References 1. Ruta, A., Porikli, F., Watanabe, S.: In-vehicle camera traffic sign detection and recognition. Mach. Vis. Appl. 22(2), 359–375 (2016) 2. Ruta, A., Li, Y., Liu, X.: Real-time traffic sign recognition from video by class-specific discriminative features. Pattern Recogn. 43(1), 416–430 (2010) 3. Berkaya, S., Gundu, H., Ozsen, O., Akinlar, C., Gunal, S.: On circular traffic sign detection and recognition. Expert Syst. Appl. 48, 67–75 (2016) 4. Makaremi, M., Lacaule, C., Mohammad-Djafari, A.: Deep learning and artificial intelligence for the determination of the cervical vertebra maturation degree from lateral radiography. Entropy 21(12), 1–24 (2019) 5. Arcos-García, Á., Alvarez-García, J.A., Soria-Morillo, L.M.: Deep neural network for traffic sign recognition systems: an analysis of spatial transformers and stochastic optimisation methods. Neural Netw. 99, 158–165 (2018) 6. Lim, K., Hong, Y., Choi, Y., Byun, H.: Real-time traffic sign recognition based on a general purpose GPU and deep-learning. Plos One 12(3), 1–22 (2017) 7. Møgelmose, A., Trivedi, M., Moeslund, T.: Vision based traffic sign detection and analysis for intelligent driver assistance systems: perspectives and survey. IEEE Trans. Intell. Transp. Syst. 13(4), 1484–1497 (2012) 8. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: International Conference on Learning Representations (2015)
Grasping Unknown Objects Using Convolutional Neural Networks
Pranav Krishna Prasad, Benjamin Staehle, Igor Chernov, and Wolfgang Ertel
Institute of Artificial Intelligence, Ravensburg-Weingarten University, 88250 Weingarten, Germany
{Pranav.KrishnaPrasad,chernovi,ertel}@hs-weingarten.de, [email protected]
Abstract. Robotic grasping has been a prevailing problem ever since humans began creating robots to execute human-like tasks. The problems are usually due to the involvement of moving parts and sensors. Inaccuracy in sensor data usually leads to unexpected results. Researchers have used a variety of sensors for improving manipulation tasks in robots. We focus specifically on grasping unknown objects using mobile service robots. An approach using convolutional neural networks to generate grasp points in a scene using RGBD sensor data is proposed. Two convolutional neural networks that perform grasp detection in a top down scenario are evaluated, enhanced and compared in a more general scenario. Experiments are performed in a simulated environment as well as the real world. The results are used to understand how the difference in sensor data can affect grasping and enhancements are made to overcome these effects and to optimize the solution.
Keywords: Convolutional neural networks · Mobile service robots · Grasping unknown objects
1 Introduction
1.1 Background
Grasping objects is an important part of most tasks performed by service robots, and it is still considered an unsolved problem in generic situations. Grasping is usually a problem because of moving components and errors in sensor data, in this case from the RGBD sensor. RGBD sensors are depth sensing devices that work in association with an RGB camera and augment the conventional image with per-pixel depth information (the distance to the sensor). Various environmental factors, such as lighting and object transparency, affect RGBD sensor data. This is an important problem because most algorithms involving RGBD sensors perform feature extraction and similar kinds of data processing. Therefore, predictions can vary significantly. Using convolutional neural networks can improve the prediction accuracy significantly, since deep learning techniques can identify patterns in data where classical algorithms fail. Previous works on this topic are mentioned below.
© Springer Nature Switzerland AG 2021. K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 662–672, 2021. https://doi.org/10.1007/978-3-030-55190-2_51
There exist many approaches toward grasping unknown objects. The oldest is the classical approach, where the manipulator is pre-programmed with the object position and the trajectory to reach the object. Machine learning techniques have been tried on robot manipulation tasks since as far back as the 1990s, but recent developments in machine learning algorithms have improved the results of applying them to robotic manipulation tasks. In 2010, Bohg and Kragic used a vision based grasp predictor trained with supervised learning on several image examples [4]. In 2014, Bezak, Bozek and Nikitin trained a deep learning network that develops a hand-object contact model to achieve successful grasping [5]. In 2015, Lenz et al. created a deep learning neural network for vision based robotic grasp detection [6]. In 2017, Levine et al. used a deep learning technique trained with extensive data collection to learn hand-eye coordination [7]. There are many researchers using similar vision based learning for grasping objects. In 2018, Bousmalis et al. used simulation based training with a dataset of over 25,000 trials to achieve successful grasping [8]. Bone et al. used a wrist mounted camera to capture images of the target object from different angles to get a 3D vision based model to predict successful grasps [9]. Chernov used machine learning to fit cuboids into objects to calculate 3D grasps [10]. There is also the problem of stability and trajectory planning when it comes to manipulation. When the same grasping goal is executed twice, there is always a chance that the inverse kinematics solver calculates a different trajectory, which could lead to failure. Therefore, it is always better to predict multiple grasps for every object.
In this case, even if one grasp fails, a different grasp might succeed, thus increasing the chances of success.
1.2 Purpose
The motivation of this work is to evaluate the grasping capabilities of a service robot using unknown objects. Therefore, we compare and enhance the performance of two existing neural networks, trained to generate grasps on a top view of an unknown scene, on a mobile service robot which has an angular view of a scene. The networks are implemented in two different approaches and the performance is compared in both cases. Also, the two approaches provide information on how grasping in 3D space can be optimized. Learning from this information would provide clarity and ideas on how to further improve and optimize robot grasping in 3D space. There is also a focus on bench-marking and using standard experiments to evaluate and compare grasping neural networks. Further information on the performance and conclusions are detailed in the Method section.
2 Method
The initial stage of the work is to implement the Generative Grasping CNN [1] and the RCNN Multi Grasp [2] on a mobile service robot, namely a Tiago platform by PAL Robotics. We have developed a ROS wrapper for each of the networks that will be available on our GitHub¹ group. Both neural networks used in this work are trained on the Cornell Grasping Dataset [3]. The experiments were conducted using a set of objects required in the RoboCup@Home German Open 2019 tournament² that do not overlap with the training dataset. In our implementation of both neural networks, the grasp predictions outside the object’s bounding box were filtered out using the YOLO object detection [11] framework as a region of interest discriminator. The RoboCup objects in the simulated environment are shown categorically in Fig. 1.
Fig. 1. Simulated RoboCup objects listed categorically.
2.1 Grasping Neural Networks
In this section we will highlight the details of the two neural networks used in this work. Details will include the differences, functionalities and implementation of the networks.
Generative Grasping CNN
The network used in [1] is a simple six-layered neural network that uses a 300 × 300 pixel depth image as the input layer. It performs a pixel-wise grasp detection on a given depth image. The output consists of a grasp point, grasp quality, grasp angle and gripper width for every pixel. This output is called a GraspMap (Fig. 2). A simple mathematical representation of the neural network is shown in Eq. 1.
\[ M_\theta(I) = (Q_\theta, \Phi_\theta, W_\theta) \tag{1} \]
¹ https://github.com/iki-wgt
² https://github.com/RoboCupAtHome/GermanOpen2019/tree/master/objects
With M being the neural network, θ the weights, Qθ the grasp quality, Φθ the grasp angle and Wθ the gripper width. As this approach is derived from a top-down scenario, the grasp angle only contains information about the roll axis of the gripper. The grasp quality parameter is assigned a value between 0 and 1. From the output, every pixel with a grasp quality larger than a desired threshold is considered. Once the pixels are selected, a 3D grasping pose has to be computed using the point cloud. In our baseline attempt this was done by simply mapping the 2D pixel to its nearest neighbor in the 3D point cloud. Every pixel in the grasp map contains the parameters of the grasp representation (Eq. 2). The network uses feed-forward and back propagation during training. The weights are optimized using gradient descent during back propagation. The meta-parameters of the neural networks are given in the configuration files provided with the code.
\[ g = (s, \phi, w, q) \tag{2} \]
with s the grasp pose, φ the grasp angle, w the gripper width and q the grasp quality.
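The thresholding and 2D-to-3D lookup described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' implementation: the function names, the default threshold of 0.5 and the organized (H, W, 3) point cloud layout are all hypothetical, and the fallback searches for the nearest valid pixel in image space, which is one possible reading of "nearest neighbor".

```python
import numpy as np

def select_grasps(quality, angle, width, threshold=0.5):
    """Return (row, col, angle, width, quality) for pixels above `threshold`, best first.

    `quality`, `angle` and `width` are (H, W) arrays standing in for the
    three output maps of Eq. 1; `threshold` is an assumed value.
    """
    rows, cols = np.nonzero(quality > threshold)
    order = np.argsort(-quality[rows, cols])  # descending grasp quality
    return [
        (int(r), int(c), float(angle[r, c]), float(width[r, c]), float(quality[r, c]))
        for r, c in zip(rows[order], cols[order])
    ]

def pixel_to_point(cloud, row, col):
    """Look up the 3D point for a pixel in an organized (H, W, 3) point cloud.

    If the point at (row, col) is invalid (NaN or inf), fall back to the
    nearest pixel that has a valid point.
    """
    if np.all(np.isfinite(cloud[row, col])):
        return cloud[row, col]
    valid_rows, valid_cols = np.nonzero(np.all(np.isfinite(cloud), axis=2))
    if valid_rows.size == 0:
        raise ValueError("point cloud contains no valid points")
    d2 = (valid_rows - row) ** 2 + (valid_cols - col) ** 2
    i = int(np.argmin(d2))
    return cloud[valid_rows[i], valid_cols[i]]
```

Returning all pixels above the threshold, rather than only the best one, matches the earlier argument that several grasp candidates per object increase the chance of success.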
Fig. 2. Grasp map of a scene generated by the Generative Grasping CNN
RCNN Multi-grasp
The network from [2] was originally developed as an object detection network that was trained on the Cornell grasping dataset [3] to produce grasp rectangles instead of bounding boxes (Fig. 3). This network has multiple feature extraction layers, object classification layers and loss function layers that work together to generate grasp rectangles. This network performs a discrete sampling of grasp candidates as opposed to the pixel-wise grasp detection used in the Generative Grasping CNN. A mathematical grasp representation is shown in Eq. 3.
\[ g = \{x, y, \theta, w, h\} \tag{3} \]
With g being the grasp, (x, y) the 2D grasp pose on the image plane, θ the grasp angle, w the gripper width and h the height of the grasp rectangle. The input layer uses only 2D RGB data and the grasp detection is done without any depth or 3D information. Once grasp rectangles are generated, the implementation orients the gripper in line with the rectangle. The resulting 2D grasp pose is converted into 3D using the same method as above.
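To make the rectangle representation of Eq. 3 concrete, the following sketch computes the four image-plane corners of a grasp rectangle. The function name is hypothetical, θ is assumed to be in radians and measured from the image x axis, and h is treated as the rectangle's extent perpendicular to w; none of these conventions are stated explicitly in the text.

```python
import math

def grasp_rectangle_corners(x, y, theta, w, h):
    """Corners of a grasp rectangle centred at (x, y) and rotated by theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    corners = []
    # Visit the four corners of the axis-aligned rectangle, then rotate each
    # offset by theta and translate it to the rectangle centre.
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)):
        corners.append((x + dx * c - dy * s, y + dx * s + dy * c))
    return corners
```

For theta = 0 the result is simply the axis-aligned box of width w and height h around (x, y).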
Fig. 3. Grasp predictions by the RCNN Multi Grasp
2.2 Implementation
This section covers the enhanced implementation approaches used in this work. Enhancements are necessary because the two neural networks used in this work were originally created for top-down scenarios only. The enhancements enable the implementation on a mobile service robot to perform grasping tasks in the 3D world.
Approach 1 - Object-Based Grasp Decisions
In the first approach, a simple object-based decision is made. In terms of service robots, there are two types of grasps, namely Front/Side grasps and Top-Down grasps. In this approach, the type of grasp is distinguished using thresholds based on object dimensions. Once the grasp type is identified, end effector orientations are calculated based on grasp type and grasp angle.
Approach 2 - Surface Normal Approach
The second approach is more generic and object independent. The surface normal at every detected grasp point is extracted and the end effector orientation is calculated in the direction of the surface normal. This way the implementation becomes object independent. Let $(x_n, y_n, z_n)$ be the extracted surface normal vector. A 4 × 4 matrix M (Eq. 4) is created using the surface normal and two vectors orthogonal to the surface normal. This matrix represents a coordinate system whose axes are these three vectors.
\[
M = \begin{bmatrix}
x_n & 1 + \frac{x_n^2}{z_n^2} & 0 & 0 \\
y_n & 0 & 1 + \frac{y_n^2}{z_n^2} & 0 \\
z_n & \left(1 + \frac{x_n^2}{z_n^2}\right)\left(\frac{-x_n}{z_n}\right) & \left(1 + \frac{y_n^2}{z_n^2}\right)\left(\frac{-y_n}{z_n}\right) & 0 \\
0 & 0 & 0 & 1
\end{bmatrix} \tag{4}
\]
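As an illustration of building a coordinate frame from a surface normal, the sketch below uses cross products rather than the closed-form entries of Eq. 4. This is a swapped-in, commonly used construction and not the authors' formula; it is shown because it yields unit-length, mutually orthogonal axes and does not divide by $z_n$. All function names are hypothetical.

```python
import math

def normalize(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    if n == 0.0:
        raise ValueError("zero-length vector")
    return tuple(c / n for c in v)

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )

def frame_from_normal(normal):
    """Return unit axes (n, u, v): n along `normal`, u and v spanning the tangent plane."""
    n = normalize(normal)
    # Pick a helper axis that is not nearly parallel to n, so the cross
    # product below cannot degenerate to the zero vector.
    helper = (1.0, 0.0, 0.0) if abs(n[0]) < 0.9 else (0.0, 1.0, 0.0)
    u = normalize(cross(helper, n))
    v = cross(n, u)  # already unit length because n and u are orthonormal
    return n, u, v
```

The three returned vectors can be placed as the columns of the upper-left 3 × 3 block of a homogeneous matrix, in the same role that the columns of M play in Eq. 4.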
Next the matrix is multiplied by a rotation matrix R (Eq. 5), which represents a rotation about the surface normal by an angle α, where α is the grasp angle provided by the neural network for the corresponding grasp point. Then the rotated matrix $M_R$ (Eq. 6) is converted into a quaternion $Q_M$ (Eq. 7). Let $t = 1 - \cos\alpha$,
\[
R = \begin{bmatrix}
t x_n^2 + \cos\alpha & t x_n y_n - z_n \sin\alpha & t x_n z_n + y_n \sin\alpha & 0 \\
t x_n y_n + z_n \sin\alpha & t y_n^2 + \cos\alpha & t y_n z_n - x_n \sin\alpha & 0 \\
t x_n z_n - y_n \sin\alpha & t y_n z_n + x_n \sin\alpha & t z_n^2 + \cos\alpha & 0 \\
0 & 0 & 0 & 1
\end{bmatrix} \tag{5}
\]
\[ M_R = R \, M \tag{6} \]
\[
Q_M = \begin{bmatrix}
(M_R[2,1] - M_R[1,2]) \,/\, \left(2\sqrt{1 + M_R[0,0] + M_R[1,1] + M_R[2,2]}\right) \\
(M_R[0,2] - M_R[2,0]) \,/\, \left(2\sqrt{1 + M_R[0,0] + M_R[1,1] + M_R[2,2]}\right) \\
(M_R[1,0] - M_R[0,1]) \,/\, \left(2\sqrt{1 + M_R[0,0] + M_R[1,1] + M_R[2,2]}\right) \\
\sqrt{1 + M_R[0,0] + M_R[1,1] + M_R[2,2]} \,/\, 2
\end{bmatrix} \tag{7}
\]
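The rotation about the normal and the matrix-to-quaternion step can be sketched as follows. The first function builds the 3 × 3 rotation block of Eq. 5 for a unit axis; the second is the standard conversion from a proper rotation matrix to a quaternion in (x, y, z, w) order, valid when 1 + trace is positive. Function names are hypothetical, and the sketch operates on a pure rotation matrix rather than on the authors' $M_R$.

```python
import math

def axis_angle_matrix(axis, alpha):
    """3x3 rotation about the unit vector `axis` by `alpha` radians (cf. Eq. 5)."""
    x, y, z = axis
    c, s = math.cos(alpha), math.sin(alpha)
    t = 1.0 - c
    return [
        [t * x * x + c,     t * x * y - z * s, t * x * z + y * s],
        [t * x * y + z * s, t * y * y + c,     t * y * z - x * s],
        [t * x * z - y * s, t * y * z + x * s, t * z * z + c],
    ]

def matrix_to_quaternion(m):
    """Quaternion (x, y, z, w) of a proper rotation matrix with 1 + trace > 0."""
    trace = m[0][0] + m[1][1] + m[2][2]
    if 1.0 + trace <= 1e-12:
        # Rotations close to 180 degrees need a different branch, omitted here.
        raise ValueError("1 + trace is not positive; use another conversion branch")
    r = math.sqrt(1.0 + trace)
    return (
        (m[2][1] - m[1][2]) / (2.0 * r),
        (m[0][2] - m[2][0]) / (2.0 * r),
        (m[1][0] - m[0][1]) / (2.0 * r),
        r / 2.0,
    )
```

As a sanity check, a rotation of α about a unit axis n converts to the quaternion (n sin(α/2), cos(α/2)), which is the familiar axis-angle form.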
Next a quaternion $Q_{AA}$ is calculated using only the surface normal and angle α (Eq. 8). The two calculated quaternions are multiplied to get the final gripper orientation $Q_g$ (Eq. 9).
\[
Q_{AA} = \operatorname{normalize} \begin{bmatrix} x_n \\ y_n \\ z_n \\ \cos 2\alpha \end{bmatrix} \tag{8}
\]
\[ Q_g = Q_M * Q_{AA} \tag{9} \]
2.3 Experiments and Analysis
The experiments are performed on the robot simulation using both neural networks, each implemented using both approaches, and the relevant results are recorded. All experiments were performed in a tabletop scene with the robot placed in front of the table. Experiments were performed on 21 different objects with various shapes and sizes. The evaluation metric used here is Force Closure, meaning that if an object restricts the gripper from completely closing, then the grasp is considered successful. The experiments include 5 trials per object, per network and per approach (420 trials in total). Although the object positions were chosen at random, they are consistent for all objects. The recorded results are analyzed and conclusions are drawn. Finally, both neural networks are implemented on the real robot using the better performing approach. Real robot experiments are performed on 14 different objects with 7 trials per object and per network (196 trials in total), and the relevant results are recorded. Further, the objects are grouped into different categories according to their shape and appearance to make performance comparisons for particular cases. The real world objects used in the experiments are shown categorically in Fig. 4.
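The trial counts quoted above follow from multiplying the experimental factors, and the per-group success rates reported later are simple ratios. The sketch below shows both; the function names and the (group, success) record layout are assumptions made for illustration, and no measured outcomes are included.

```python
from collections import defaultdict

def total_trials(objects, trials_per_object, networks=1, approaches=1):
    """Number of grasp attempts for a full factorial experiment."""
    return objects * trials_per_object * networks * approaches

def success_rate_by_group(trials):
    """Map each object group to its fraction of successful grasps.

    `trials` is an iterable of (group, success) pairs, where `success` is
    True when the object kept the gripper from closing completely.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [successes, attempts]
    for group, success in trials:
        counts[group][0] += int(success)
        counts[group][1] += 1
    return {group: s / n for group, (s, n) in counts.items()}

print(total_trials(21, 5, networks=2, approaches=2))  # simulation: 420
print(total_trials(14, 7, networks=2))                # real robot: 196
```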
Fig. 4. Real world objects listed categorically
3 Results
This section consists of results based on object groups and a comparison of both neural networks in the two different approaches. Further, the two approaches are also compared with each other and conclusions are drawn based on these comparisons. The objects used in the simulated and real world experiments are classified into groups in order to make comparisons and conclusions. The simulated objects are divided into four groups: Cylindrical objects, Cuboid objects, Irregular objects and Spherical objects. The real world objects are divided into five groups: Cylindrical objects, Cuboid objects, Irregular objects, Transparent objects and Difficult objects. The group of difficult objects contains objects that are small and typically challenging to grasp. The conclusion also contains a proposal on how to further improve the results. Figures 5, 6 and 7 show the comparative results based on grouped objects for all six experimental cases.
Fig. 5. Success rates based on grouped objects for Approach 1
Fig. 6. Success rates based on grouped objects for Approach 2
Approach 1 vs Approach 2. In the first approach, the grasp type is decided based on object dimensions. Although this approach is not generic, it is a good solution for grasping small and irregular objects, because small objects are easier to grasp in a straight top-down manner, since other orientations could be singular or could lead to collision. Moreover, since Approach 2 uses surface normals, irregular objects can have bad surface normals that could lead to bad gripper orientations. The results based on grouped objects in Fig. 5 and 6 show clearly that both neural networks perform better for irregular objects in Approach 1 than in Approach 2. The major problem with Approach 1 occurs when detecting grasps for large objects. Since Approach 1 only uses standard front grasp and top-down grasp orientations, grasping fails when grasps are detected with an offset from the center of large objects. This problem is resolved in the second approach by using surface normals to create orientations that align the gripper with the surface normal. The surface normal approach only solves this problem for cylindrical and spherical objects, not for cuboid objects. It can be seen from the results based on grouped objects in Fig. 5 and 6 that the performance on cylindrical and spherical objects is better in Approach 2 than in Approach 1.
Fig. 7. Success rates based on grouped objects for real robot experiments
Generative Grasping CNN vs RCNN Multi Grasp. The major difference between the Generative Grasping CNN and the RCNN Multi Grasp is that the former predicts grasps from depth images and the latter uses RGB images. This section covers how the inputs of the neural networks affect the grasping performance in specific cases. The Generative Grasping CNN is incapable of predicting grasps for transparent objects, while the RCNN Multi Grasp is capable of doing so. This is because transparent objects do not reflect enough light from the active depth camera and are hence not clearly visible in the depth image and point cloud produced by the RGB-D sensor, whereas they are clearly visible in RGB images. Predicting grasps for small objects poses a similar problem, since small objects are represented by an insufficient number of pixels on the depth image and
this also greatly depends on lighting conditions. But when it comes to RGB images, small objects are clear and predictions are mostly independent of lighting. As a result, the RCNN Multi Grasp performs better on small objects than the Generative Grasping CNN. It is clear from the comparative results that the RCNN Multi Grasp network performs better, in terms of grasp detection, than the Generative Grasping CNN in most cases. The only major advantage of using depth images is that the grasp poses are predicted in 3D.
4 Conclusions and Future Work
The conclusion of this work is drawn from the experimental results and comparisons explained in the results section. The initial conclusion is that the RCNN Multi Grasp, which uses RGB images to predict grasps, outperforms the Generative Grasping CNN in most cases. This is specifically noticeable in the cases of transparent, small and irregular objects. The results based on grouped objects show categorically the specific cases in which the two approaches thrive. Both approaches have positives and negatives. Based on this, it would be possible to improve the results further by combining the two approaches. In this new approach, the algorithm would initially distinguish the grasp type based on the object dimensions; if the grasp type is a front/side grasp, the surface normal approach would be used, and for small objects that need top-down grasps the first approach would be used. Implementing this new approach would solve most of the problems that exist when implementing the two approaches individually. Further, the neural networks were trained on the Cornell Grasping Dataset, which is a comparatively small dataset. Training the neural networks on larger datasets that consist of more objects and more instances per object, like the Jacquard Dataset, could also improve the results further.
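The combined approach proposed above amounts to a small dispatch rule, sketched below. Everything specific here is hypothetical: the paper does not give threshold values or state which dimension is compared, so the 0.08 m height threshold, the enum and the strategy labels are placeholders chosen only to make the control flow concrete.

```python
from enum import Enum

class GraspType(Enum):
    TOP_DOWN = "top-down"
    FRONT_SIDE = "front/side"

def classify_grasp_type(object_height_m, max_top_down_height_m=0.08):
    """Pick a grasp type from object dimensions (assumed rule: small objects go top-down)."""
    if object_height_m <= max_top_down_height_m:
        return GraspType.TOP_DOWN
    return GraspType.FRONT_SIDE

def orientation_strategy(grasp_type):
    """Choose how the gripper orientation is computed for a given grasp type."""
    if grasp_type is GraspType.FRONT_SIDE:
        return "surface_normal"  # Approach 2: align the gripper with the surface normal
    return "fixed_top_down"      # Approach 1: standard top-down orientation
```

In a full system the returned label would select between the orientation routines of Approach 1 and Approach 2 for each detected grasp point.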
References
1. Morrison, D., Corke, P., Leitner, J.: Closing the loop for robotic grasping: a real-time, generative grasp synthesis approach. In: Robotics: Science and Systems - RSS (2018)
2. Chu, F.-J., Ruinian, X., Vela, P.A.: Real-world multi-object, multi-grasp detection. IEEE Robot. Autom. Lett. 3(4), 3355–3362 (2018)
3. Cornell University, Robot Learning Laboratory, Cornell Grasping Dataset (2009). http://pr.cs.cornell.edu/grasping/rect_data/data.php
4. Bohg, J., Kragic, D.: Learning grasping points with shape context. Robot. Auton. Syst. 58(4), 362–377 (2010)
5. Bezak, P., Bozek, P., Nikitin, Y.: Advanced robotic grasping systems using deep learning. In: Modelling of Mechanical and Mechatronic Systems, MMaMS 2014 (2014)
6. Lenz, I., Saxena, A., Lee, H.: Deep learning for detecting robotic grasps. Int. J. Robot. Res. 34(4–5), 705–724 (2015)
7. Levine, S., Pastor, P., Krizhevsky, A.: Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection. Int. J. Robot. Res. 37(4–5), 421–436 (2017)
8. Bousmalis, K., Irpan, A., Wohlhart, P.: Using simulation and domain adaptation to improve efficiency of deep robotic grasping. In: IEEE International Conference (2017)
9. Bone, G.M., Lambert, A., Edwards, M.: Automated modeling and robotic grasping of unknown three-dimensional objects. In: IEEE International Conference (2018)
10. Chernov, I., Ertel, W.: Generating optimal gripper orientation for robotic grasping of unknown objects using neural network. In: Federated AI for Robotics Workshop (FAIR), IJCAI-ECAI-2018, Stockholm (2018)
11. Bjelonic, M.: YOLO ROS: real-time object detection for ROS (2018). https://github.com/leggedrobotics/darknet_ros
A Proposed Technology IoT Based Ecosystem for Tackling the Marine Beach Litter Problem
Stavros T. Ponis
School of Mechanical Engineering, National Technical University Athens, Heroon Polytechniou 9, Zografos, 15780 Athens, Greece
[email protected]
Abstract. The increasing demand and recalcitrant nature of plastic materials combined with waste mismanagement have been responsible for the progressive accumulation of plastic in marine ecosystems and subsequent multifaceted deleterious environmental and socioeconomic effects. The situation is even more dire in the case of Beach Marine Litter, since over 80% of marine pollution comes from anthropogenic and land-based activities. This paper proposes an innovative technology IoT-based ecosystem utilizing the cost effectiveness and flexibility of Low Power Wide Area Networks and sensor technologies for monitoring pollution indicators and waste generation activities in beach areas in an attempt to provide a useful technology tool for the authorities responsible for managing and maintenance of beach areas. At the same time, and in order to raise the awareness regarding the problem and enhance citizen’s engagement and conscious recycling behavior, the proposed research introduces an innovative ‘Serious Game’ for uniting all the stakeholders responsible for keeping the beach clean, under the same vision and objective, which is to protect valuable natural resources and promote the sustainable development of coastal cities. Keywords: Circular economy · Marine littering · Internet of things · Unmanned aerial vehicles · Wireless sensor networks · Gamification · Serious games
© Springer Nature Switzerland AG 2021 K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 673–678, 2021. https://doi.org/10.1007/978-3-030-55190-2_52
1 Background
Marine pollution is a serious and rapidly increasing problem at a global scale, as millions of tons of waste end up each year in the oceans, causing multiple negative environmental, economic, health and aesthetic impacts. This harsh reality demonstrates that the harmonization of waste management with the basic rules of the Circular Economy (CE) is still at an embryonic level. We are, therefore, daily witnessing an inefficient economy unable to exploit millions of tons of plastic, which, instead of returning a large part of their value back to the economy, end up polluting the world’s beaches and causing immeasurable and long-term damage. In recent years, the reduction of waste in the marine environment has been recognized as a primary challenge for the preservation of the ecosystem and human health. Marine plastics constitute the majority of marine litter [1, 2] and therefore lie at the core of political action as expressed through the EU
Marine Strategy Framework Directive [3], the European Strategy for Plastics in a Circular Economy [4] and the EU waste management legislation framework in general. The problem of pollution of beaches and coastlines, a.k.a. Marine Beach Litter (MBL), is at the epicenter of the EU’s marine pollution strategy and at the core of the proposed research presented in this paper. The significance of the problem is further magnified in the case of Greece, which occupies the 12th place in the list of countries with the longest coastline on the planet (13,676 km). The importance of the proposed research becomes even greater if one considers the impact of MBL on Greece’s tourism industry, which is struggling to offer high added value services through thousands of organized beaches and to retain their excellent reputation for cleanliness and good maintenance, as indicated by the 519 Greek beaches awarded the Blue Flag for 2018, ranking second behind Spain in the relevant list [5]. A relatively recent report by the EU and the Joint Research Centre [6], based on the collection of 355,671 waste items from 276 beaches in Europe, found that plastic represented 84% of the total marine litter items found on European beaches in 2016, half of which were disposable plastics. The report concludes with the need for taking drastic measures to tackle the MBL problem. That is exactly where the proposed research sets its vision and aspirations. Specifically, it aims to provide an array of interconnected technological solutions made available to local authorities, supporting them in their battle against MBL through enhanced waste traceability and increased citizen awareness, leading to their active involvement.
Indeed, lack of public awareness of marine pollution and best practices for waste disposal is a major global problem, further intensified in the case of Greece which ranks – hand-in-hand with Malta- at the bottom among European Union Member States in waste management and recycling, according to data provided by the Technical Chamber of Greece. For that reason, Greece has been granted a five-year extension to achieve the goal of recycling 50% of municipal waste, due in 2020 [7]. Unfortunately, to date, existing efforts seem inadequate and ineffective, as campaigns to raise awareness of marine pollution and people’s accountability are usually limited by high costs. At the same time, it is now obvious that the EU will not be able to cope with the challenge of marine pollution in an effective, substantial and cost-effective manner without the increased and continuous involvement of citizens. In this direction, the research presented in this paper proposes a cost-effective awareness-raising programme, fueled by an integrated technological toolset based on Industry 4.0 technologies, that creates a positive experience for citizens in order to create engagement conditions, maintain their continued involvement with recycling and ultimately reducing marine pollution and MBL in particular. The need for such a programme becomes even more intense if one considers that the anthropogenic activity of tourists populating the coasts of the Mediterranean Sea, creates a 40% increase in waste generation, 95% of which are made of plastic according to a recent WWF survey [8].
2 Purpose Our research proposes an integrated technological solution that combines a set of best of breed Industry 4.0 technologies. The introduction of the term ‘Industry 4.0’ can be
traced back at Hannover Fair in 2011 [9], when Professor Wolfgang Wahlster, Director and CEO of the German Research Center for Artificial Intelligence (AI), addressed the opening ceremony audience. Since then, Industry 4.0 has been discussed and studied under different names in various countries, such as “Advanced Manufacturing Partnership (AMP)” in the USA, “Made in China 2025”, “La Nouvelle France Industrielle” in France or “Factories of the Future (FoF)”, a program launched by the European Commission in 2014. In the proposed research, each technology used plays a significant role in shaping a technological ecosystem entirely dedicated to combating MBL. In the epicenter of this technology ecosystem, lies a Wireless Sensor Network (WSN) based on the basic principles of IoT technology. Its objective is to sense, collect and transmit pollution related data such as water quality and/or waste generation activity related data, such as waste optical identification streaming information. Sensors will be integrated in the area surrounding the beach, in strategically picked locations, safeguarding maximum communication range and reliability. This ground-based sensor network will be complemented by a pair of unmanned vessels (Unmanned Aerial Vehicle and Unmanned Undersea Vehicle) equipped with integrated sensors, audiovisual equipment and wireless communication capability for the extraction of data from sensors and visual scanning of the supervised area. Sensor data will be aggregated on a cloud analytics platform responsible for the centralized control, monitoring, processing and reporting of results to the system’s super user, usually a local authority organization and/or all the other interested stakeholders, e.g. central government or NGOs providing voluntary waste collection and beach maintenance services. 
At the same time, an additional web platform will be made available, providing networking functionality between beaches participating in the program and two-way communication with citizens through a gamified mobile application. This gamification web platform processes information received from the cloud analytics tools, integrates it with user-entered information from the mobile app and is responsible for informing beach authorities and citizens about the ranking of each participating beach in the ecosystem based on its cleanliness and maintenance services. In that way, by creating an ecosystem and a gamified environment, the participating beaches compete with each other on a daily basis. The winning beach among those participating in this ‘Serious Game’ receives an award in the form of a certificate for cleanliness and first class maintenance services, which, in addition to the visibility and prestige it entails, can also be associated with monetary incentives for the local authority that manages the beach (after negotiations with the central government). The ‘Serious Game’ proposed in this research plays a central role in the adoption, acceptance and sustainability of the technology ecosystem, aiming to a) attract local authorities and the beaches they manage to become members of the ecosystem and b) attract citizens, who are the main actuators in the ecosystem, by giving and receiving information about the state of the beach in real time, e.g. uploading a photograph of accumulated plastics that must be retrieved before drifting and entering the sea from gases or sea currents, and by participating in information/training programs in relation to recycling/circular economy issues and responsible recycling practices. The proposed technology ecosystem aims to raise awareness and meaningful participation of citizens in the raging war against marine pollution from plastic and to escalate this effort through
the dissemination of the message at European level by tourists who will occasionally participate in the ecosystem during their holiday season. The ecosystem described above is presented in Fig. 1.
Fig. 1. The technology ecosystem of the proposed research
3 Method
The successful implementation of the proposed research can contribute significantly to the reduction of discarded plastics on Greek beaches and to the upgrading of the services offered by the tourism companies of the region in which each beach is located. The success of the proposed research depends, to a significant extent, on the methodological approach followed during its implementation, which focuses on four main axes:
Fig. 2. Proposed research – methodology plan
a) research on current Scientific Excellence, b) development of the technological systems of the project, c) control and testing of functionality and interoperability of integrated systems and d) pilot application and evaluation of results. The research methodology is further detailed in Fig. 2.
4 Conclusions
This paper aims to contribute to the design and development of innovative and more effective technology systems for supporting EU efforts towards the achievement of the 2030 objectives for sustainable development and the Paris agreement to undertake ambitious efforts to combat climate change and adapt to its effects. In doing so, it aligns with the directive, as expressed in the European Union’s strategy on plastics for a circular economy, of promoting awareness-raising actions, enhancing citizens’ engagement and joining forces with local governments towards the solution of the plastic marine littering problem. The research presented in this paper is based on a concise methodology and implementation plan and proposes the development of a technology ecosystem utilizing the power of IoT, Low Power Wide Area sensor networks and cloud analytics. The proposed system is capable of monitoring plastic waste generation activities in beach areas, while at the same time introducing an innovative ‘Serious Game’ for uniting all the stakeholders responsible for keeping the beach clean under the same vision and objective, which is to protect valuable natural resources and promote the sustainable development of coastal cities.
Acknowledgments. This research has been co-financed by the European Union and Greek national funds through the Operational Program Competitiveness, Entrepreneurship and Innovation, under the call RESEARCH – CREATE – INNOVATE (project code: T1EDK-05095).
References
1. Galgani, F., Fleet, D., Van Franeker, J.A., Katsanevakis, S., Maes, T., Mouat, J., Oosterbaan, L., Poitou, I., Hanke, G., Thompson, R., Amato, E.: Marine Strategy Framework - Report on the identification of descriptors for the Good Environmental Status of European Seas regarding marine litter under the Marine Strategy Framework Directive. Office for Official Publications of the European Communities (2010)
2. Watts, A.J., Porter, A., Hembrow, N., Sharpe, J., Galloway, T.S., Lewis, C.: Through the sands of time: beach litter trends from nine cleaned North Cornish beaches. Environ. Pollut. 228, 416–424 (2017)
3. European Union. https://eur-lex.europa.eu/. Marine Strategy Framework Directive MSFD, Official Journal of the European Union. Directive 2008/56/EC of the European Parliament and Council. Accessed 4 Jan 2020
4. European Union. https://ec.europa.eu/environment/circular-economy/pdf/plastics-strategybrochure.pdf. A European strategy for plastics in a circular economy. Accessed 4 Jan 2020
5. Hellenic Society for the Protection of Nature (Ελληνική Εταιρία Προστασίας της Φύσης). https://www.eepf.gr. Accessed 20 May 2019
6. Addamo, A.M., Laroche, P., Hanke, G.: Top Marine Beach Litter Items in Europe. EUR 29249 EN, Publications Office of the European Union, Luxembourg (2017). https://doi.org/10.2760/496717. JRC108181. ISBN 978-92-79-87711-7
7. Chrysopoulos, P.: Greece Ranks Last in EU in Waste Management, Recycling, Greek Reporter (2019)
8. Hellenic Society for the Protection of Nature. https://wwf.fi/mediabank/11094.pdf. Out of the Plastic Trap - Saving the Mediterranean from Plastic Pollution. Accessed 4 Jan 2020
9. Efthymiou, O., Ponis, S.T.: Current status of Industry 4.0 in material handling automation and in-house logistics. Int. J. Ind. Manuf. Eng. 13(10), 1370–1373 (2019)
Machine Learning Algorithms for Preventing IoT Cybersecurity Attacks
Steve Chesney(B), Kaushik Roy, and Sajad Khorsandroo
Applied Science and Technology, Department of Computer Science, N.C. A&T State University, Greensboro, NC 27411, USA
[email protected], {kroy,skhorsandroo}@ncat.edu
Abstract. The goal of this paper is to understand the effectiveness of machine learning (ML) algorithms in combatting IoT-related cyber-attacks, with a focus on Denial of Service (DoS) attacks. This paper also explores the overall vulnerabilities of IoT devices to cyber-attacks, and it investigates other datasets that can be used for IoT cyber-defense analysis using ML techniques. Finally, this paper presents an evaluation of the CICDDoS2019 dataset using the Logistic Regression (LR) algorithm. With this algorithm, a prediction accuracy of 0.997 was achieved.
Keywords: Cybersecurity · Internet of Things · Machine learning (ML) · Supervised Learning · Unsupervised Learning · Reinforcement Learning (RL) · Logistic Regression (LR) · DDoS · Botnet
1 Purpose
In recent years there has been a proliferation of devices that use the Internet for communications. These devices have been given the name IoT (Internet of Things) because they extend the traditional form-factor of computing devices to forms that are more amenable to diverse use-cases. IoT devices encompass a wide range of devices, such as sensors, actuators, smart-devices, RFID-devices, etc. [1], and they are all connected to the Internet for the purposes of collecting and transmitting data. The number of IoT devices connected to the Internet has grown rapidly: by 2015 there were over 4.9 billion connected devices, and by 2020 there will be 25 billion. This growth has brought about an enormous amount of data and network traffic. This data can be classified as Big Data because of the volume of data that has been generated (terabytes and greater), the variety of data (NetFlow, pcap, etc.) and the velocity (rate of change of collection) of the data. Although these devices have brought about major advances, they have also brought about new challenges: they are the new frontier for cyber-attacks. IoT devices are vulnerable to cyber-attacks because they are heterogeneous (different types, different methods of communication and different types of data being transmitted), they are numerous (in the billions), they have limited computing resources and they normally operate on the edge of computer networks.
© Springer Nature Switzerland AG 2021 K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 679–686, 2021. https://doi.org/10.1007/978-3-030-55190-2_53
Distributed Denial of Service (DDoS) attacks are one example of the type of cyber-attacks that IoT devices will need to be protected against. Since traditional cybersecurity methods are ineffective for IoT devices, ML-enabled solutions are the new means of protecting them. Of the many ML algorithms that are available, the Logistic Regression (LR) algorithm was applied in this project to the CICDDoS2019 [9] dataset. This dataset has been made publicly available for research and testing by the University of New Brunswick and the Canadian Institute of Cybersecurity (CIC). The rest of this paper is organized as follows: Sect. 1 discusses the purpose of this research; Sect. 2 provides the background and the significance of this study; Sect. 3 outlines the research methodology that has been used; Sect. 4 gives the results, along with a discussion of the results; and Sect. 5 provides the conclusion, followed by the acknowledgments and the references used.
2 Background/Significance
2.1 The Current State of IoT Devices
IoT devices have various architectures and communications mechanisms. There are basically two IoT device types: consumer and industrial. Both wired and wireless IoT devices exist, but there is a lack of industry standardization amongst communications protocols. This lack of standardization exposes users, the devices themselves, their networks and the data which they process to risk. Consumer IoT devices exist in the forms of smartphones, smartwatches, smart-appliances, smart-homes, etc. Industrial IoT (IIoT) devices are used in industries such as medical, transportation, retail, military, automotive and critical infrastructures (CI), to name a few. According to [1], Wireless Sensor Networks (WSNs) provide one of the underlying communications infrastructures for Internet of Things (IoT) devices. These networks are susceptible to cybersecurity attacks because IoT devices are resource constrained. This section surveys the different cyber-attacks targeted against IoT devices which are supported by WSNs. It also examines how machine learning (ML) can be used to protect these IoT assets. Cañedo and Skjellum state in [2] that IoT devices exist in two primary categories: edge IoT devices and gateway IoT devices. Gateway IoT devices are reviewed for enhancing IoT security because this is where IoT network traffic is aggregated and connected to the Internet. There is a richer set of traffic patterns and data at the gateway level, as opposed to the edge, because at the edge only traffic from a small set of devices is captured. Meneghello et al. state in [3] that there are four main IoT wireless communication protocols: ZigBee, Bluetooth Low Energy (BLE), 6LoWPAN and LoRaWAN. Attacks against IoT devices via these protocols can be easily carried out because of the lack of cybersecurity awareness, culture and expertise.
Lastly, there are three main operational levels of IoT protocols: Information, Access and Functional.
2.2 IoT Device Cyber-Attack Types
IoT devices provide enhanced user experiences through immediate access to data and information; however, with these new devices there are also new vulnerabilities to cybersecurity attacks. Cybersecurity is the applied science of computing which focuses upon
preventing attacks and access breaches to compute resources, applications, data and networks. Accordingly, there is a taxonomy of IoT security attacks that target the four protocols noted in [3]. They are categorized as Edge Layer, Access/Middleware Layer and Application Layer attacks. There are security vulnerabilities for each IoT communication protocol type and operational level. The protocol-level attacks are categorized as Key-related Attacks, Denial of Service Attacks on the Data Plane, Denial of Service Attacks on the Device, Replay Attacks and Attacks to the Privacy of Communications. According to [1], DDoS attacks overwhelm needed network resources, such as a service, with bogus requests so that the device eventually fails, preventing legitimate requests from being fulfilled. DDoS attacks are orchestrated from botnet (robot network) attacks, where a rogue compute node takes over several zombie compute nodes and floods a needed network resource. IoT devices are very susceptible to DDoS attacks in that they are on the edge of the network and cannot run traditional security software. They are often very small and have limited compute resources. They are also too heterogeneous for traditional security software to address their cyber-protection needs. Lu and Xu in [1] show the areas that need attention to enhance IoT security: Cloud Service Security, 5G Mobile Network Security and Quality of Service Design. Further, IoT devices can no longer be seen as trivial devices that can be compromised without impact to society. This paper provides a taxonomy of IoT cyber-attacks. Mamdouh et al. [4] categorize the types of cyber-attacks that are used against IoT devices into three main categories: Goal-Oriented Attacks, Performance-Oriented Attacks and Layer-Oriented Attacks. Within these categories are different attack types, such as Denial of Service (DoS) Attacks, Man in the Middle Attacks, Selective Forwarding Attacks and IoT Device Vulnerability Attacks.
This listing of attack types is then mapped to the ML algorithms that combat them most effectively, such as Support Vector Machine (SVM) and Artificial Neural Network (ANN). Meneghello et al. in [3] also provide a listing of the current security mechanisms used to protect IoT devices at each protocol level. The key findings in that paper are the security vulnerabilities that are listed for each IoT communications protocol and operational level that these attack types target. The vulnerabilities are the ease with which information can be leaked from side signals (Edge Layer), the ease with which sniffing, injections and redirections can occur (Access/Middleware Layer) and the susceptibility to data corruption/manipulation (Application Layer).
2.3 Machine Learning-Enabled Solutions for IoT Devices
The authors of [5] state that ML-based solutions can enhance the four areas below, where increased protection is greatly needed:
1. Authentication - helps IoT devices determine the sources of communication and focuses on identity-based attacks (spoofing and Sybil attacks).
2. Access control - prevents unauthorized users from accessing IoT resources.
3. Secure offloading - allows IoT devices to use the compute and storage resources of backbone network servers and edge network resources.
4. Malware detection - protects IoT devices from viruses, worms and Trojan Horse attacks, which can deplete power, inhibit network performance and cause data leakages.
Further, for these major areas of attack, the following ML algorithm types have proven effective:
1. Supervised Learning - SVM, K-Nearest Neighbor (KNN), Neural Networks (NNs), Deep Neural Networks (DNNs) and Random Forest (RF) can be used to label network traffic characteristics for classification and regression use-cases.
2. Unsupervised Learning - can cluster unlabeled network traffic data into groups.
3. Reinforcement Learning - empowers IoT devices to select security protocols and important criteria when faced with a cyber-attack.
In [6], the proposed solution is the use of a supervised machine learning technique called SVM with its RBF (radial basis function) kernel. SVM is a classifier used here to separate malicious traffic from normal network traffic. A smart-home environment was used for the testing, as disruptions to smart-homes pose a huge threat: cyber-criminals could disable home alarm systems or make malicious use of appliances. SVMs with the RBF kernel are highly effective against edge IoT attacks, with a 91–99% success rate in various measurements. The authors of [2] show how Deep Learning can be used to explore IoT network traffic via ANN. Deep Learning is an advanced form of machine learning. Deep Learning techniques mimic the functioning of neurons in the brain to process large amounts of data for decision making. When ANNs were used at the network gateway level, they were able to predict when the temperature sensors were manipulated with invalid data. They are very effective in detecting anomalies and intrusions in IoT gateway networks. The authors of [7] also show that Deep Learning solutions - Convolutional Neural Networks (CNNs), LSTM (Long Short-Term Memory) and hybrid CNN + LSTM - can be used to detect DDoS attacks in centralized IoT networks.
Deep Learning methods were used against the CICIDS2017 dataset to detect DDoS attacks with 97.16% accuracy. The CNN + LSTM hybrid solution was most accurate. Deep Learning techniques have many advantages over traditional machine learning techniques, but they require even larger amounts of data and more time to be evaluated. Denial of Service attacks are primarily focused upon Access Control and Secure IoT Offloading processes. In [8], the authors described an approach for generating real-world cyber-attack traffic. Although many different types of IoT cybersecurity approaches have been investigated, few are able to mitigate Zero-Day attacks, which are new attacks that have no previous detection history. Within the cybersecurity community, honeypots have been used to lure attackers by exposing the vulnerabilities of networks and IT resources, with the intent of understanding their attack methods and vectors. When IoT honeypots are used to create machine learning datasets, the accuracy of these datasets increases and improves the chances for detecting and defending against Zero-Day attacks. This paper lists the recently developed IoT honeypots for DDoS (distributed denial of service) attacks; and in this paper, the “ThingPot” honeypot was used to understand IoT specific network behaviors and characteristics. ThingPot also possesses an IoT device simulator
for modeling IoT traffic, noting that ThingPot should be implemented at the network router level.
3 Methodology Used
ML algorithms require datasets that are large so that accurate training and classification can take place. For this research, the DDoS Evaluation Dataset (CICDDoS2019) [9] was used, together with the following Python 3.7.4 libraries: Pandas, Matplotlib, Scikit-Learn and Seaborn. The CICDDoS2019 dataset was loaded on a 64-bit Windows 10 laptop with 32 GB of RAM and a 2.8 GHz CPU. According to [9], the CICDDoS2019 dataset is made up of normal and malicious (DDoS) traffic, gathered from packet captures. The traffic flows contain the following features: time stamp, source and destination IPs, source and destination ports, protocols, etc. In this dataset there are DDoS attacks such as PortMap, NetBIOS, LDAP, MSSQL, UDP, UDP-Lag, SYN, NTP, DNS and SNMP, given at different time intervals. The “LDAP.csv” dataset, last modified on 2019-12-03, was used for this project. The LR algorithm from Python’s Scikit-Learn ML library was used for this evaluation because it can classify multiple class labels, although traditionally it has been used for binary classification. The class labels in this dataset are Benign, NetBIOS and LDAP. The CICDDoS2019 LDAP DDoS dataset contains 80 features [9], but not all 80 features were used for this evaluation. According to [9], the following features should be used for making predictions: Max Packet Length, Fwd Packet Length Max, Fwd Packet Length Min, Average Packet Size and Min Packet Length. The labels that were used are 0 for “Benign traffic”, 1 for “NetBIOS attacks” and 2 for “LDAP attacks”. To display the results of the evaluation, the Scikit-Learn Classification Report and Confusion Matrix were used. Lastly, the Seaborn visualization library’s Heatmap was used to plot the confusion matrix results more clearly.
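The workflow described above can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the feature matrix is synthetic (generated with Scikit-Learn's `make_classification`) because the “LDAP.csv” file is not reproduced here, and the split ratio, random seeds and the `max_iter` value are assumptions chosen only so that the toy example runs cleanly.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# The five packet-length features named in the text; the values below are
# synthetic stand-ins, not rows from the CICDDoS2019 "LDAP.csv" file.
FEATURES = [
    "Max Packet Length",
    "Fwd Packet Length Max",
    "Fwd Packet Length Min",
    "Average Packet Size",
    "Min Packet Length",
]

# Three classes, encoded as in the text: 0 = Benign, 1 = NetBIOS, 2 = LDAP.
X, y = make_classification(
    n_samples=3000,
    n_features=len(FEATURES),
    n_informative=4,
    n_redundant=0,
    n_classes=3,
    n_clusters_per_class=1,
    class_sep=2.0,
    random_state=42,
)

# Assumed 70/30 split; the paper does not state the ratio that was used.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# max_iter is raised above Scikit-Learn's default of 100 (an assumption made
# for this sketch, not a setting reported in the paper).
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

report = classification_report(
    y_test, y_pred, target_names=["Benign", "NetBIOS", "LDAP"]
)
cm = confusion_matrix(y_test, y_pred, labels=[0, 1, 2])

# Accuracy is the sum of the main diagonal divided by all predictions.
accuracy = np.trace(cm) / cm.sum()

print(report)
print(cm)
print(f"accuracy = {accuracy:.3f}")
```

With real data, `X` and `y` would instead be built from the selected columns and the encoded label column of the CSV file, for example via Pandas.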
4 Results
The results of this evaluation can be seen in Figs. 1, 2 and 3. Figure 1 shows an accuracy score of 0.997, which is calculated as the number of correct predictions divided by the total number of predictions. The precision (P) scores (the fraction of positive predictions that are correct) for this evaluation are 0.99, 0.99 and 1.0 for classes 0, 1 and 2, respectively. The recall scores (the ratio of true positives to the sum of true positives and false negatives) are 0.58, 1.00 and 1.00, and the F1-scores (the harmonic mean of precision and recall) are 0.73, 0.99 and 1.00. In Fig. 2, the Confusion Matrix for this evaluation shows that there were actually 425 instances of Class 0 which were predicted correctly; however, there are 5 instances which were misclassified as Class 1 and 11 instances that were misclassified as Class 2. Similarly, there are 401 instances that were predicted as Class 1, but are actually Class 0, and there are 40,562 Class 1 instances that were predicted correctly, with 98 instances predicted as Class 2, which were actually Class 1. Finally, for Class 2, there were 168,210 instances that were predicted correctly as Class 2, with 0 misclassification errors for Class 1 and 3 misclassification errors for Class 0. The main diagonal gives the correct
Fig. 1. Classification report
425        5       11
401    40562       98
  3        0   168210
Fig. 2. Confusion matrix
Fig. 3. Heatmap of the confusion matrix
predictions as 425, 40,562 and 168,210, which sum to 209,197. The total number of attempted predictions was 209,715, confirming the accuracy (209,197/209,715) as 99.75%. In Fig. 3, we see the heatmap of the confusion matrix results, which shows the “diagonal” of correctly predicted values. Finally, the following warning was encountered when the code for this evaluation was executed: “ConvergenceWarning: lbfgs failed to converge (status = 1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.” In future evaluations, experiments will be done to increase the number of iterations of the model via the “max_iter” parameter, and data normalization will be applied through the sklearn.preprocessing.StandardScaler class.
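The two remedies mentioned above (a higher `max_iter` and feature standardization) could be combined along the following lines. This is a sketch under stated assumptions rather than the planned experiment: the data is synthetic, one column is deliberately rescaled to imitate unnormalized packet-length features, and the value `max_iter=1000` is an arbitrary illustrative choice.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic three-class data standing in for the selected traffic features.
X, y = make_classification(
    n_samples=2000,
    n_features=5,
    n_informative=4,
    n_redundant=0,
    n_classes=3,
    n_clusters_per_class=1,
    random_state=0,
)

# Blow up the scale of one column to mimic features measured in very
# different units, which tends to slow down the lbfgs solver.
X[:, 0] *= 1000.0

# StandardScaler rescales every feature to zero mean and unit variance
# before the logistic regression sees it; max_iter lifts the iteration cap.
scaled_lr = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_lr.fit(X, y)

# n_iter_ reports how many solver iterations were actually needed.
n_iterations = scaled_lr.named_steps["logisticregression"].n_iter_
train_accuracy = scaled_lr.score(X, y)

print("lbfgs iterations used:", n_iterations)
print(f"training accuracy: {train_accuracy:.3f}")
```

Wrapping both steps in a pipeline keeps the scaler's statistics tied to the training data, so the same transformation is reused at prediction time.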
5 Conclusion
As observed, machine learning solutions are making invaluable contributions to the protection of IoT devices. The use of edge and gateway computing resources to augment the limited resources of IoT devices will enable large amounts of network data to be classified in near-real-time, such that malicious traffic can be quickly identified. The detection of Zero-Day attacks will also be enhanced. Several DDoS datasets for traditional network traffic have been made available for public use by governments, academia and private organizations; however, future research and dataset development should include IoT-specific network datasets for benign and malicious traffic. Datasets like the Bot-IoT dataset [10] should be considered.
Acknowledgments. This research is based upon the work supported by Cisco Systems, Inc.
References
1. Lu, Y., Xu, L.D.: Internet of Things (IoT) cybersecurity research: a review of current research topics. IEEE Internet Things J. 6(2), 2103–2115 (2019). https://doi.org/10.1109/JIOT.2018.2869847
2. Cañedo, J., Skjellum, A.: Using machine learning to secure IoT systems. In: 2016 14th Annual Conference on Privacy, Security and Trust (PST), pp. 219–222, Auckland (2016)
3. Meneghello, F., Calore, M., Zucchetto, D., Polese, M., Zanella, A.: IoT: Internet of threats? A survey of practical security vulnerabilities in real IoT devices. IEEE Internet Things J. 6(5), 8182–8201 (2019). https://doi.org/10.1109/JIOT.2019.2935189
4. Mamdouh, M., Elrukhsi, M.A.I., Khattab, A.: Securing the Internet of Things and wireless sensor networks via machine learning: a survey. In: 2018 International Conference on Computer and Applications (ICCA), pp. 215–218, Beirut (2018)
5. Xiao, L., Wan, X., Lu, X., Zhang, Y., Wu, D.: IoT security techniques based on machine learning: how do IoT devices use AI to enhance security? IEEE Signal Process. Mag. 35(5), 41–49 (2018). https://doi.org/10.1109/MSP.2018.2825478
6. Hou, S., Huang, X.: Use of machine learning in detecting network security of edge computing system. In: 2019 IEEE 4th International Conference on Big Data Analytics (ICBDA), pp. 252–256, Suzhou (2019). https://doi.org/10.1109/icbda.2019.8713237
7. Roopak, M., Yun Tian, G., Chambers, J.: Deep learning models for cyber security in IoT networks. In: 2019 IEEE 9th Annual Computing and Communication Workshop and Conference (CCWC), pp. 0452–0457, Las Vegas (2019). https://doi.org/10.1109/CCWC.2019.8666588
8. Vishwakarma, R., Jain, A.K.: A Honeypot with machine learning based detection framework for defending IoT based botnet DDoS attacks. In: 2019 3rd International Conference on Trends in Electronics and Informatics (ICOEI), pp. 1019–1024, Tirunelveli (2019)
9. Sharafaldin, I., Lashkari, A., Hakak, S., Ghorbani, A.: Developing realistic distributed denial of Service (DDoS) attack dataset and taxonomy. New Brunswick University, Canadian Institute of Cybersecurity (CIC). https://ieeexplore.ieee.org/abstract/document/8888419
10. Moustafa, N.: The Bot-IoT dataset. IEEE Dataport (2019). http://dx.doi.org/10.21227/r7v2x988. Accessed 15 Jan 2020
Development of Web-Based Management System and Dataset for Radiology-Common Data Model (R-CDM) and Its Clinical Application in Liver Cirrhosis
SeungJin Kim1(B), Chang-Won Jeong1(B), Tae-Hoon Kim1, ChungSub Lee1, Si-Hyeong Noh1, Ji Eon Kim1, and Kwon-Ha Yoon2
1 Medical Convergence Research Center, Wonkwang University, Iksan, Republic of Korea {koch369369,mediblue}@wku.ac.kr
2 Department of Radiology, Wonkwang University School of Medicine and Wonkwang University Hospital, Iksan, Republic of Korea
Abstract. The Observational Medical Outcomes Partnership-Common Data Model (OMOP-CDM) used in distributed research networks has low coverage of clinical data and does not reflect the latest trends of precision medicine. Radiology data have great merit for visualizing and identifying the lesions in specific diseases. However, radiology data should be shared to obtain the sufficient scale and diversity required to provide strong evidence for improving patient care. Our study aimed to develop a web-based management system for radiology-CDM (R-CDM), as an extension of the OMOP-CDM, and to assess the feasibility of an R-CDM dataset for application of radiological image data in AI learning. This study standardized a cirrhosis of liver (LC) R-CDM dataset consisting of CT data (LC 40,575 images vs. non-LC 33,565 images). Using a modified AI learning algorithm, the diagnostic accuracy was 0.99292 (error rate = 0.00708), with a sensitivity of 0.99469 for LC and a specificity of 0.99115 for non-LC. We developed a web-based management system for searching and downloading standardized R-CDM datasets and constructed a liver cirrhosis R-CDM dataset for clinical practice. Our management system and LC dataset would be helpful for multicenter studies and AI learning research.
Keywords: Radiology-Common data model (R-CDM) · R-CDM dataset · Web-based management system · Artificial intelligence (AI) · Liver cirrhosis
1 Purpose
This study aimed to develop a web-based management system for radiology-CDM (the R-CDM proposed by OHDSI) [1, 2], as an extension of the OMOP-CDM, and to evaluate the feasibility of an R-CDM dataset for application of radiological image data in clinical practice.
© Springer Nature Switzerland AG 2021 K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 687–695, 2021. https://doi.org/10.1007/978-3-030-55190-2_54
2 Background/Significance
2.1 Background
To date, the distributed research network has been adopted by global research collaboration groups, including the Observational Health Data Sciences and Informatics (OHDSI) consortium. The Observational Medical Outcomes Partnership-Common Data Model (OMOP-CDM) was developed by the OHDSI consortium and includes clinical data from electronic health records (EHR) from over 20 countries, with information on 1.5 billion patients transformed to date. However, the OMOP-CDM used in distributed research networks has low coverage of clinical data and does not reflect the latest trends of precision medicine [3, 4]. Recently, a research group belonging to OHDSI developed the genomic CDM (G-CDM) as an extension of the OMOP-CDM to improve clinical data coverage. The G-CDM provides effective integration of genomic data with standardized clinical data, allowing for data sharing across institutes. Compared to EHR and genomic data, radiological image data have great merit for visualizing and identifying the lesions in specific diseases. However, radiology data should be shared in order to achieve the sufficient scale and diversity required to provide strong evidence for improving patients’ diagnosis and care. Although radiology data are internationally standardized in the DICOM (Digital Imaging and Communications in Medicine) format (as header and image information), the detailed information within DICOM tags varies across institutes [5]. Moreover, a distributed research network for radiology data allows researchers to share this evidence rather than the patient-level data across centers, thereby avoiding privacy issues.
2.2 Related Research
The OHDSI consortium provides useful tools for visualization and statistical analysis of CDM data.
The Achilles [6], the main tool, visualizes CDM in a table format and Atlas [7] can easily handle statistical analysis such as building cohort, customizing propensity variables, survival analysis, and calculating relative risk on a web-based basis. Recently, a Korean research group developed genomic CDM (G-CDM) [8], as an extension of the OMOP-CDM to include human genome data, allowing for standardized genomic data and data sharing across institutes.
3 Method
3.1 Radiology-CDM
The data structure of the R-CDM is based on the OMOP-CDM structure. To link to the clinical data in the OMOP-CDM (‘Condition_Occurrence,’ in the left-side dashed-line box), the following information on each patient with radiological image data is stored in separate corresponding tables (in the right-side long dashed-line box): ‘Radiology_Occurrence’ and ‘Radiology_Image’ (Fig. 1). The R-CDM tables are directly linked to the OMOP-CDM tables.
Fig. 1. The schematic diagram of radiology-common data model (R-CDM) data structure.
3.2 Standardization of Terminology and Imaging Information for R-CDM
The OMOP-CDM uses “SNOMED” and “SNOMED Clinical Terms® (SNOMED CT®)” to standardize terminology. SNOMED and SNOMED CT® were originally created by the College of American Pathologists. “SNOMED”, “SNOMED CT” and “SNOMED Clinical Terms” are registered trademarks of SNOMED International (www.snomed.org). A web service of standardized vocabularies (called ‘Athena’) is also available at http://athena.ohdsi.org/search-terms/terms (Fig. 2). In order to standardize the R-CDM vocabulary, R-CDM data use not only “SNOMED”, as in the OMOP-CDM, but also the “RadLex radiology lexicon” produced by the Radiological Society of North America (RSNA), available at https://www.rsna.org/en/practice-tools/data-tools-and-standards/radlex-radiology-lexicon.
To standardize clinical information for the R-CDM dataset, the important imaging conditions are recorded in the ‘Radiology_Occurrence’ table, including ‘Condition occurrence,’ ‘Device_concept_id,’ ‘Radiology_modality,’ ‘Person_orientation,’ and so on. To standardize image parameters, medical imaging information from the DICOM tags is stored in the ‘Radiology_Image’ table.
Fig. 2. Searching standardized terms in Athena web-site: cirrhosis of liver
3.3 Configuration of R-CDM Management System
The R-CDM management system was developed as a web-based client-server architecture using the Python Django Rest Framework and the JavaScript-based React library, as shown in Fig. 3. The API servers are built on the Python Django Rest Framework and serve a web client built with the React UI library. An asynchronous distributed upload method was introduced through an Nginx web server, a message queue and task workers to collect medical data from each institution.
Fig. 3. The diagram of web-based management system configuration for R-CDM
The dataset standardization procedure was as follows: selection of the clinical condition, uploading of the radiological image dataset, extraction of metadata, and building of the standard R-CDM dataset. The system provides searching and downloading functions, an Occurrence List Viewer and an Image Viewer.
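The asynchronous upload path and the four-step standardization procedure can be pictured with a small in-process analogue. The sketch below is only illustrative: it uses Python's standard `queue` and `threading` modules as stand-ins for the message queue and task worker named above, and the payload fields and the `extract_metadata` helper are hypothetical, not part of the described system.

```python
import queue
import threading

# In-process stand-in for the message queue that sits behind the web server.
upload_queue = queue.Queue()

# Records produced by the worker; stands in for the standardized dataset.
standardized = []


def extract_metadata(upload):
    """Hypothetical metadata extraction: keep the selected clinical
    condition and count the uploaded image files."""
    return {
        "condition": upload["condition"],
        "n_images": len(upload["images"]),
    }


def task_worker():
    """Consume uploads one at a time until a None sentinel arrives."""
    while True:
        upload = upload_queue.get()
        if upload is None:
            upload_queue.task_done()
            break
        standardized.append(extract_metadata(upload))
        upload_queue.task_done()


worker = threading.Thread(target=task_worker, daemon=True)
worker.start()

# Two institutions enqueue their uploads without waiting for processing.
upload_queue.put({"condition": "cirrhosis of liver", "images": ["a.dcm", "b.dcm"]})
upload_queue.put({"condition": "cirrhosis of liver", "images": ["c.dcm"]})
upload_queue.put(None)  # sentinel: no more uploads

upload_queue.join()  # block until every queued item has been processed
worker.join()

print(standardized)
```

The point of the queue is that the uploading side returns immediately while metadata extraction happens in the background; a production deployment would replace these stand-ins with the actual queue and worker infrastructure.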
4 Results
4.1 Standardization of Medical Imaging Information
The ‘Radiology_Occurrence’ and ‘Radiology_Image’ tables for R-CDM data are shown in Tables 1 and 2, respectively. The Radiology_Occurrence table consists of information for distinguishing datasets, such as patient information, organs, protocols, and so on. The Radiology_Image table consists of imaging parameters from the DICOM tags.
4.2 Visualization of Standardized R-CDM Data Using Management System
Figure 4 shows the web-based R-CDM system and its dashboard for a stored sample dataset. The R-CDM tool lets researchers browse and search the data under any query conditions they require (condition, device, modality, etc.) and download the resulting dataset.
Table 1. Radiology_Occurrence table for standardizing R-CDM data

DICOM tag number    DICOM tag name
(0010, 0010)        Patient name
(0010, 0020)        Patient ID
(0008, 0030)        Study time
(0008, 0020)        Study date
(0010, 1010)        Patient age
(0010, 0040)        Patient sex
(0008, 0033)        Content time
(0018, 5101)        View position
(0018, 0087)        Magnetic field strength
(0008, 1010)        Station name
(0008, 1030)        Protocol name
(0018, 0060)        KVP
(0008, 0060)        Modality
(0018, 1150)        Exposure time
(0010, 4000)        Patient comments
(0020, 000D)        Study instance UID
Table 2. Radiology_Image table for standardizing R-CDM data

DICOM tag number    DICOM tag name
(0028, 0010)        Rows
(0028, 0011)        Columns
(0008, 0008)        Image type
(0028, 1050)        Window center
(0018, 0050)        Slice thickness
(0008, 0031)        Series time
(0020, 0011)        Series number
(0008, 0032)        Acquisition time
(0020, 0012)        Acquisition number
(0008, 103E)        Series description
(0020, 0037)        Image orientation (Patient)
(0020, 0013)        Instance number
(0008, 0018)        SOP instance UID
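The split of DICOM header values between the two tables can be illustrated with a minimal sketch. It assumes the header has already been read into a plain mapping from (group, element) tags to values; the function name `split_header`, the dictionary layout, and the example values are hypothetical and are not taken from the authors' system. Only a subset of the tags in Tables 1 and 2 is listed.

```python
# Hypothetical routing of DICOM header values to the two R-CDM tables.
# Tag numbers and names follow Tables 1 and 2; only a subset is listed.
OCCURRENCE_TAGS = {
    (0x0010, 0x0020): "Patient ID",
    (0x0008, 0x0020): "Study date",
    (0x0008, 0x0060): "Modality",
    (0x0020, 0x000D): "Study instance UID",
}
IMAGE_TAGS = {
    (0x0028, 0x0010): "Rows",
    (0x0028, 0x0011): "Columns",
    (0x0018, 0x0050): "Slice thickness",
    (0x0008, 0x0018): "SOP instance UID",
}


def split_header(header):
    """Split a {tag: value} mapping into occurrence-level and image-level rows."""
    occurrence = {name: header[tag] for tag, name in OCCURRENCE_TAGS.items() if tag in header}
    image = {name: header[tag] for tag, name in IMAGE_TAGS.items() if tag in header}
    return occurrence, image


# Made-up example header; tags outside both tables are simply ignored.
example_header = {
    (0x0010, 0x0020): "P0001",
    (0x0008, 0x0060): "CT",
    (0x0028, 0x0010): 512,
    (0x0028, 0x0011): 512,
    (0x7FE0, 0x0010): b"",  # pixel data, not part of either table
}
occurrence_row, image_row = split_header(example_header)
```

In a real pipeline the mapping would come from a DICOM reader rather than a hand-written dictionary; the point of the sketch is only the tag-to-table routing.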
Fig. 4. Dashboard window of a sample data set using R-CDM management system
4.3 Data Set Description of Liver Cirrhosis for Clinical Application
To construct an R-CDM dataset, we designed a retrospective study; the study protocol was approved by the institutional review board (IRB) of our University Hospital. A total of 565 patients with liver cirrhosis (LC) and 565 non-LC subjects were recruited from January 2015 to December 2018. This study standardized an LC R-CDM dataset consisting of CT data (LC 40,575 images vs. non-LC 33,565 images). The disease code for LC was taken from the SNOMED concept code 19943007 (Fig. 5). In addition, private information such as Patient Name (DICOM header tag (0010, 0010)), Patient ID (0010, 0020), Patient Sex (0010, 0040), and Patient Age (0010, 1010) was deleted for data anonymization to prevent identification of the patients.
Fig. 5. SNOMED concept code for ‘cirrhosis of liver’
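The anonymization step of Sect. 4.3 can be sketched as a simple filter over header tags. The four tags are the ones named in the text; the function name `anonymize` and the dictionary representation of a header are assumptions made for illustration only.

```python
# Tags deleted for anonymization, as listed in Sect. 4.3.
PRIVATE_TAGS = {
    (0x0010, 0x0010),  # Patient name
    (0x0010, 0x0020),  # Patient ID
    (0x0010, 0x0040),  # Patient sex
    (0x0010, 0x1010),  # Patient age
}


def anonymize(header):
    """Return a copy of a {tag: value} header without the private tags."""
    return {tag: value for tag, value in header.items() if tag not in PRIVATE_TAGS}


# Made-up example values.
raw = {
    (0x0010, 0x0010): "DOE^JANE",
    (0x0010, 0x1010): "061Y",
    (0x0008, 0x0060): "CT",
}
clean = anonymize(raw)
```

Returning a copy instead of deleting in place keeps the original header available, which is convenient when the same data feed several processing steps.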
4.4 AI Learning Application Using Liver Cirrhosis Dataset
Figure 6 shows the diagnostic accuracy obtained using the LC R-CDM dataset and a modified GoogleNet-V3 algorithm. The diagnostic accuracy was 0.99292 (error rate = 0.00708), with a sensitivity of 0.99469 for LC and a specificity of 0.99115 for non-LC.
Fig. 6. Diagnostic accuracy using artificial intelligence (AI) learning and LC R-CDM data set.
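For readers who want to relate the four reported figures to one another, the sketch below computes them from a confusion matrix. The counts are hypothetical: they were chosen here only because, with 565 cases per class, they reproduce the reported values after rounding. The source does not state the actual confusion matrix or whether evaluation was per patient or per image.

```python
def metrics(tp, fn, tn, fp):
    """Accuracy, error rate, sensitivity and specificity from confusion counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Assumed counts (not from the source): 565 LC cases and 565 non-LC cases.
result = {name: round(value, 5)
          for name, value in metrics(tp=562, fn=3, tn=560, fp=5).items()}
```

Note that accuracy and error rate always sum to one, and that with balanced classes the accuracy is the mean of sensitivity and specificity.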
5 Conclusion
In this study, we established an R-CDM in combination with OMOP-CDM for a distributed research network. We developed a web-based management system for searching and downloading standardized R-CDM datasets and constructed a liver cirrhosis R-CDM dataset for clinical practice. Our management system and LC dataset should be helpful for multicenter studies and AI learning research.
Acknowledgments. This study was supported by grants of the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare (HI18C1216), the National Research Foundation of Korea (NRF) (2016M3A9A7918501), and the Technology Innovation Program (20001234) funded by the Ministry of Trade, Industry & Energy (MOTIE, Korea).
References
1. OHDSI Forum. https://forums.ohdsi.org/t/oncology-radiology-imaging-integration-into-cdm/2018/7
2. OHDSI/Radiology-CDM. https://github.com/OHDSI/Radiology-CDM
3. Hripcsak, G., Duke, J.D., Shah, N.H., Reich, C.G., Huser, V., Schuemie, M.J., Suchard, M.A., Park, R.W., Wong, I.C.K., Rijnbeek, P.R., Lei, J., Pratt, N., Norén, G.N., Li, Y.-C., Stang, P.E., Madigan, D., Ryan, P.B.: Observational Health Data Sciences and Informatics (OHDSI): opportunities for observational researchers. Stud. Health Technol. Inform. 216, 574–578 (2015)
4. FitzHenry, F., Resnic, F.S., Robbins, S.L., Denton, J., Nookala, L., Meeker, D., Ohno-Machado, L., Matheny, M.E.: Creating a common data model for comparative effectiveness with the observational medical outcomes partnership. Appl. Clin. Inform. 6(3), 536–547 (2015)
5. Bidgood Jr., W.D., Horii, S.C., Prior, F.W., Van Syckle, D.E.: Understanding and using DICOM, the data interchange standard for biomedical imaging. J. Am. Med. Inform. Assoc. 4(3), 199–212 (1997)
6. OHDSI/Achilles. https://github.com/OHDSI/Achilles
7. OHDSI/Atlas. https://github.com/OHDSI/Atlas
8. Shin, S.J., You, S.C., Park, Y.R., Roh, J., Kim, J.H., Haam, S., Reich, C.G., Blacketer, C., Son, D.-S., Oh, S., Park, R.W.: Genomic common data model for seamless interoperation of biomedical data in clinical practice: retrospective study. J. Med. Internet Res. 21(3), e13249 (2019)
Shared Autonomy in Web-Based Human Robot Interaction
Yug Ajmera(B) and Arshad Javed
Mechanical Department, Birla Institute of Technology and Science, Pilani, Hyderabad, India
{f20170644,arshad}@hyderabad.bits-pilani.ac.in
Abstract. In this paper, we aim to achieve a human-robot work balance by implementing shared autonomy through a web interface. Shared autonomy integrates user input with the autonomous capabilities of the robot and therefore increases the overall performance of the robot. Presenting only the relevant information to the user on the web page lowers the cognitive load of the operator. Through our web interface, we provide a mechanism for the operator to directly interact using the displayed information by applying a point-and-click paradigm. Further, we present our idea to employ a human-robot mutual adaptation in a shared autonomy setting through our web interface for effective team collaboration.
Keywords: Human-robot interaction · Shared autonomy · Telerobotics

1 Introduction
There has been an increase in the number of applications for robot teleoperation, including military [1], industrial [2], surveillance [3], telepresence [4] and remote experimentation [5]. Improving operator efficiency and ensuring the safe navigation of robots is of utmost importance. Studies show that human-robot joint problem-solving results in safe and effective task execution [6]. Humans are better at reasoning and creativity, whereas robots are better at carrying out a particular task precisely and repeatedly. Therefore, combining robot capabilities with human skills results in an enhanced human-robot interaction.
Having remote access to robots through user-friendly interfaces is very important for effective human-robot interaction. Different methods for the control of mobile robots are being developed and tested. Gomez has developed a GUI for teleoperation of robots for teaching purposes [7]. Creating the GUI using Qt Creator reduces the usability of the system in comparison to a web-based interface. Lankenau’s telepresence system called Virtour provides remote access to wheeled robots through a website [8]. The tour leader controls the robot, whereas the guest robots follow it. This system provides relatively limited autonomy since the control lies solely in the hands of an operator. Our web interface is unique in that it allows users point-and-click navigation to arbitrary locations in unknown environments. The interface described in [9] implements shared autonomy but is limited to a tablet computer. We have tried to overcome typical shortcomings such as poor accessibility and usability through our web-based interface. Such an interface would enable a more straightforward control of the robot for novices and experts alike. The web interface allows users to control robots within their home or workplace through any web-enabled device. The web clients are created using modern web standards; hence users do not need to download any extra software in order to use them. Furthermore, we have tried to display all the visualization data on the web page. This makes it convenient for users to perform the entire navigation process using only the web interface.
The objective of this research is to develop an intuitive web interface for robot control using shared autonomy. The interface is initially built and tested on the three-wheeled telepresence robot of our lab. It is an open-ended design and can be extended to various use cases. Further, we describe our idea to integrate a bounded-memory adaptation model (BAM) of the human teammate into a partially observable stochastic process to enable a robot to adapt to a human. Studies show that this retains a high level of the user’s trust in robots and significantly improves human-robot team performance [10].
© Springer Nature Switzerland AG 2021. K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 696–702, 2021. https://doi.org/10.1007/978-3-030-55190-2_55
2 Web-Interface
2.1 Software Infrastructure
ROS (Robot Operating System) is used as a back-end system for most robots nowadays. It is an open-source middleware platform that provides several useful libraries and tools to develop robot applications. The telepresence robot of our lab is based on ROS. We have implemented the ROS Navigation stack to perform autonomous navigation.
Fig. 1. System flow of the web interface
The graphical user interface is created using HTML, CSS, and JavaScript. In addition, Robot Web Tools [11] is used to enable these web applications to interface with the ROS system and to provide visualization libraries. For interaction between ROS and JavaScript requests from the web page, we use rosbridge [12]. The user’s activities on the web page are encoded as JavaScript Object Notation (JSON) commands, which are then converted to ROS commands. Roslibjs [13] is a standard ROS JavaScript library that connects rosbridge and the web application. It enables the interface to publish and subscribe to ROS topics. Through the use of WebSockets, rosbridge and roslibjs can readily be used with modern web browsers without any installation. This feature makes it an ideal platform for us. Furthermore, rosbridge provides data logging; analyzing the logged data and correcting errors can improve the efficiency of the system. A detailed system flow of the web interface is shown in Fig. 1. For hosting our web page, we have employed the roswww package, which provides an HTTP web server at a specified port. Hence the user can access the web page as long as they are connected to the Wi-Fi network shared by the robot. Finally, the web_video_server package is used to display the live video feed from the robot’s camera on the web page. It streams the images of a ROS image topic via HTTP.
2.2 User Interaction
The web interface (shown in Fig. 2) is divided into two parts that work simultaneously: manual teleoperation and autonomous navigation. In manual control, the user is provided with an on-screen, touch-capable joystick and the live video feed from the robot’s camera. The extent of the pull determines the fraction of the maximum velocity, and the orientation of the pull determines the corresponding linear and angular velocities. These are then published on the velocity topic as a geometry_msgs/Twist ROS message. The current maximum linear and angular speeds of the robot are displayed. Two buttons each are provided for increasing or decreasing these velocities by ten percent of the current value. By default, the camera topic is set to /camera/rgb/image_raw. The user can change it by typing a different topic name in the space provided. Clicking the “Load Video” button loads the video feed.
Fig. 2. The graphical user web interface
A joystick is used instead of buttons because it employs a game-based strategy for teleoperating robots that is intuitive even for non-expert users. For better teleoperation, we need to enhance the information provided to the user. Hence, we have provided real-time video data on the web page. This gives the user a robot-centered perspective and replicates the environment that the user would perceive in place of the robot.
The autonomous navigation section allows users point-and-click navigation to arbitrary locations. The current position of the robot is displayed as a yellow pulsating arrow on the map. The user can give a goal position and orientation by clicking on an arbitrary point on the map. The goal is marked by a red arrow. It is sent to the move_base node, which plans a safe path to the goal. The robot then starts moving autonomously towards the goal position.
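The manual-control path described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the cosine/sine split of the pull direction, the topic name `/cmd_vel`, and both function names are hypothetical. The JSON frame uses the `publish` operation of the rosbridge protocol, and the message fields follow the geometry_msgs/Twist layout (two 3-vectors, `linear` and `angular`).

```python
import json
import math


def joystick_to_twist(pull, angle, max_linear, max_angular):
    """Map a joystick deflection to Twist fields.

    pull  -- deflection as a fraction of the maximum, clamped to [0, 1]
    angle -- pull direction in radians, 0 = straight ahead (assumed convention)
    """
    pull = max(0.0, min(1.0, pull))
    return {
        "linear": {"x": max_linear * pull * math.cos(angle), "y": 0.0, "z": 0.0},
        "angular": {"x": 0.0, "y": 0.0, "z": max_angular * pull * math.sin(angle)},
    }


def rosbridge_publish(topic, msg):
    """Serialize a rosbridge 'publish' operation as a JSON string."""
    return json.dumps({"op": "publish", "topic": topic, "msg": msg})


# Half deflection straight ahead; "/cmd_vel" is an assumed topic name.
twist = joystick_to_twist(pull=0.5, angle=0.0, max_linear=0.4, max_angular=1.0)
frame = rosbridge_publish("/cmd_vel", twist)
```

In the browser, roslibjs builds and sends such frames over a WebSocket on the application's behalf; the Python version only makes the wire format visible.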
3 Shared Autonomy
In the case of full control, or teleoperation, the user has to manually guide the robot along the desired path using the joystick. This method is cumbersome because the user has to control the robot continually and remain aware of its surroundings until the task is completed. In the case of full autonomy, on the other hand, the user is not involved once the goal position is marked. The robot autonomously navigates around the obstacles and reaches the goal position via the shortest route. If the user-intended path is not the shortest, then the robot fails to meet the expectations of the user. This problem may lead to disuse of the system. Therefore, shared autonomy is necessary for efficient human-robot interaction. The web application presented in this paper gives the user control over the robot while simultaneously using the existing autonomous navigation and obstacle avoidance capabilities to ensure safety and correct operation. It enables varied autonomy of the robot during the execution of tasks, which changes the level of user involvement in carrying out the tasks. When the user marks the goal position on the map, the move_base ROS node calculates the shortest path using Dijkstra’s algorithm, and the robot starts following the path. At any point, if the user feels that the robot is malfunctioning, or if the user wants the robot to follow a different path to reach the goal, the user can override the control using the joystick and guide the robot to that path. A ROS service then republishes the original goal to the move_base ROS node, and the robot re-plans its trajectory.
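The override-and-replan behavior can be summarized as a small state machine. The sketch below is a hypothetical reconstruction: the class name, the mode labels, and the choice to republish the goal when the joystick is released are assumptions, since the text does not say what triggers the republishing service. The list `sent_goals` merely stands in for goals sent to move_base.

```python
class SharedAutonomyArbiter:
    """Toy model of switching between autonomous and manual control."""

    def __init__(self):
        self.goal = None
        self.mode = "idle"
        self.sent_goals = []  # stand-in for goals sent to move_base

    def set_goal(self, goal):
        """User clicks a goal on the map; the planner takes over."""
        self.goal = goal
        self.sent_goals.append(goal)
        self.mode = "autonomous"

    def joystick_input(self):
        """Any joystick deflection overrides autonomous navigation."""
        self.mode = "manual"

    def joystick_released(self):
        """Assumed trigger: republish the original goal so the robot re-plans."""
        if self.goal is None:
            self.mode = "idle"
            return
        self.sent_goals.append(self.goal)
        self.mode = "autonomous"


arbiter = SharedAutonomyArbiter()
arbiter.set_goal((2.0, 1.0, 0.0))  # x, y, heading: made-up values
arbiter.joystick_input()
mode_during_override = arbiter.mode
arbiter.joystick_released()
```

After the release, the planner receives the same goal again and computes a new path from wherever the user has steered the robot.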
3.1 Mutual Adaptation
In this section, we propose an additional method to employ shared autonomy that maintains the user’s trust in the system. In order to implement mutual adaptation, the robot should not only suggest efficient strategies that may be unknown to the user, but it should also comply with the human’s decision in order to gain their trust. We assume that the robot knows the intention of the task: to reach the goal position. Consider a situation where the robot has two choices to avoid an obstacle: it can take a right or a left path. Taking the right path is the better choice, for instance, because the left path is too long, or because the robot has less uncertainty about the right part of the map. Intuitively, if the human insists on the left path, the robot should comply; failing to do so can have a negative impact on the user’s trust in the robot, which may lead to disuse of the system.
We formulate the problem with world state $x_{world} \in X_{world}$, robot action $a_r \in A_r$ and human action $a_h \in A_h$. The goal is assumed to be among a discrete set of goals $g \in G$. The transition function is defined as $T : X_{world} \times A_r \times A_h \to \Pi(X_{world})$. In a shared autonomy setting, the world state consists only of robot states, $x_r \equiv x_{world}$. Further, the human actions $a_h \in A_h$ are read through the inputs of the web interface and hence do not affect the world state. The transition function becomes deterministic and reduces to $T : X_r \times A_r \to X_r$.
The Bounded-memory Adaptation Model (BAM) simplifies the problem by limiting the history length to $k$ steps. Based on a history of $k$ steps, we compute the modal policy, or mode, of the human $m_h$ and the modal policy of the robot $m_r$ towards the goal using the feature selection method described in [14]. $\alpha$ denotes the probability that the user will comply with the robot’s decision. As the adaptability is not known beforehand, it is initially assumed that the human is adaptable ($\alpha = 1$).
Based on BAM, the probability with which the user will switch to a new mode $m_h'$ is given by:
$$P(m_h' \mid \alpha, m_h, m_r) = \begin{cases} \alpha, & m_h' \equiv m_r \\ 1 - \alpha, & m_h' \equiv m_h \\ 0, & \text{otherwise} \end{cases}$$
For mutual adaptation, the robot has to estimate two variables: the human adaptability ($\alpha$) and the human mode ($m_h$). Since neither is directly observable, we use a mixed-observability Markov decision process (MOMDP) [15]. A reward function $R(t)$ is assigned in each step that depends on the robot action $a_r$, the human action $a_h$ and the human mode $m_h$. The robot then maximises the function $\sum_{t=0}^{\infty} \gamma^t R(t)$, where $\gamma$ denotes a discount factor that gives higher values to immediate rewards.
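The two formulas of this section translate directly into code. The sketch below is illustrative only: the function names and the string labels for the modes are hypothetical, and the infinite discounted sum is truncated to a finite list of rewards.

```python
def bam_switch_probability(next_mode, human_mode, robot_mode, alpha):
    """Probability that the human switches to next_mode under BAM."""
    if next_mode == robot_mode:
        return alpha        # human adapts to the robot's mode
    if next_mode == human_mode:
        return 1.0 - alpha  # human keeps the current mode
    return 0.0              # no other mode is reachable


def discounted_return(rewards, gamma):
    """Finite-horizon version of the sum of gamma**t * R(t) over t."""
    return sum((gamma ** t) * reward for t, reward in enumerate(rewards))


# Human currently prefers the left path, robot the right one (made-up labels).
p_adapt = bam_switch_probability("right", "left", "right", alpha=0.75)
p_stay = bam_switch_probability("left", "left", "right", alpha=0.75)
value = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
```

With $\alpha = 0.75$ the human is expected to follow the robot three times out of four; with $\gamma = 0.5$ three unit rewards are worth $1 + 0.5 + 0.25 = 1.75$.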
4 Conclusion and Future Work
In this paper, we have developed a web interface for enhanced human-robot interaction. We have achieved joint problem-solving by implementing shared autonomy through the use of our web application. Further, we have described a mutual adaptation model in a shared autonomy setting to enable a robot to adapt to a human. Developing such predictive models for determining the robot’s decision is an exciting area of future work. A follow-up work would be to formulate more such models that retain a higher level of human trust in robots and to determine their usability. User studies will be carried out to compare and contrast such methods in terms of the overall performance of the system. We anticipate our web application to be an open-ended design that can be extended and built upon by other developers for use in various research and industrial applications.
References
1. Kot, T., Novák, P.: Application of virtual reality in teleoperation of the military mobile robotic system taros. Int. J. Adv. Robot. Syst. 15(1), 1729881417751545 (2018)
2. Korpela, C., Chaney, K., Brahmbhatt, P.: Applied robotics for installation and base operations for industrial hygiene. In: 2015 IEEE International Conference on Technologies for Practical Robot Applications (TePRA), pp. 1–6. IEEE (2015)
3. Lopez, J., Perez, D., Paz, E., Santana, A.: Watchbot: a building maintenance and surveillance system based on autonomous robots. Robot. Auton. Syst. 61(12), 1559–1571 (2013)
4. Mishra, R., Ajmera, Y., Mishra, N., Javed, A.: Ego-centric framework for a three-wheel omni-drive telepresence robot. In: 2019 IEEE International Conference on Advanced Robotics and its Social Impacts (ARSO), pp. 281–286, October 2019
5. Pitzer, B., Osentoski, S., Jay, G., Crick, C., Jenkins, O.C.: Pr2 remote lab: an environment for remote development and experimentation. In: 2012 IEEE International Conference on Robotics and Automation, pp. 3200–3205. IEEE (2012)
6. Musić, S., Hirche, S.: Control sharing in human-robot team interaction. Ann. Rev. Control 44, 342–354 (2017)
7. Gómez, C., Hernández, A.C., Crespo, J., Barber, R.: Learning robotics through a friendly graphical user interface. In: ICERI 2016 Proceedings, pp. 483–492 (2016)
8. Lankenau, P.: Virtour: Telepresence system for remotely operated building tours (2016)
9. Birkenkampf, P., Leidner, D., Borst, C.: A knowledge-driven shared autonomy human-robot interface for tablet computers. In: 2014 IEEE-RAS International Conference on Humanoid Robots, pp. 152–159. IEEE (2014)
10. Nikolaidis, S., Zhu, Y.X., Hsu, D., Srinivasa, S.: Human-robot mutual adaptation in shared autonomy. In: 2017 12th ACM/IEEE International Conference on Human-Robot Interaction (HRI), pp. 294–302. IEEE (2017)
11. Toris, R., Kammerl, J., Lu, D.V., Lee, J., Jenkins, O.C., Osentoski, S., Wills, M., Chernova, S.: Robot web tools: efficient messaging for cloud robotics. In: 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 4530–4537. IEEE (2015)
12. Crick, C., Jay, G., Osentoski, S., Pitzer, B., Jenkins, O.C.: Rosbridge: ROS for non-ROS users. In: Robotics Research, pp. 493–504. Springer, Heidelberg (2017)
13. Osentoski, S., Jay, G., Crick, C., Pitzer, B., DuHadway, C., Jenkins, O.C.: Robots as web services: reproducible experimentation and application development using rosjs (2011)
14. Nikolaidis, S., Kuznetsov, A., Hsu, D., Srinivasa, S.: Formalizing human-robot mutual adaptation: a bounded memory model. In: The Eleventh ACM/IEEE International Conference on Human Robot Interaction, pp. 75–82. IEEE Press (2016)
15. Ong, S.C.W., Png, S.W., Hsu, D., Lee, W.S.: Planning under uncertainty for robotic tasks with mixed observability. Int. J. Robot. Res. 29(8), 1053–1068 (2010)
tanh Neurons Are Bayesian Decision Makers
Christian Bauckhage, Rafet Sifa(B), and Dirk Hecker
Fraunhofer IAIS, Sankt Augustin, Germany
{christian.bauckhage,rafet.sifa,dirk.hecker}@iais.fraunhofer.de
Abstract. The hyperbolic tangent (tanh) is a traditional choice for the activation function of the neurons of an artificial neural network. Here, we go through a simple calculation that shows that this modeling choice is linked to Bayesian decision theory. Our brief, tutorial-like discussion is intended as a reference to an observation rarely mentioned in standard textbooks.
Keywords: Hyperbolic tangent · Decision boundaries · Neural network

1 Introduction to the Hyperbolic Tangent
The hyperbolic tangent parametrized by $\beta > 0$ maps any $x \in \mathbb{R}$ to the open interval $(-1, +1)$ and is given by
$$\tanh(\beta x) = \frac{\sinh(\beta x)}{\cosh(\beta x)} = \frac{e^{\beta x} - e^{-\beta x}}{e^{\beta x} + e^{-\beta x}}. \tag{1}$$
Its graphs are of sigmoidal shape (see Fig. 1) and, in the limit $\beta \to \infty$, approach that of the signum function $\operatorname{sign}(x)$. Since the hyperbolic tangent thus provides a continuously differentiable surrogate for a threshold operation, it and its variations (such as the sigmoidal function) have historically been a popular choice for the activation functions in multilayer perceptrons and recurrent neural networks [1–7].
2 Bayesian Decision Theory
In order to see how the right hand side of (1) relates to Bayesian decision making, we next consider a simple binary classification problem. That is, we consider the problem of having to assign incoming data $x \in \mathbb{R}$ to either one of two classes $\Omega_1$ or $\Omega_2$ and note that it can be solved using a Bayesian classifier $\Omega : \mathbb{R} \to \{\Omega_1, \Omega_2\}$ where
$$\Omega(x) = \begin{cases} \Omega_1 & \text{if } p(\Omega_1 \mid x) \ge p(\Omega_2 \mid x) \\ \Omega_2 & \text{otherwise.} \end{cases} \tag{2}$$
© Springer Nature Switzerland AG 2021. K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 703–707, 2021. https://doi.org/10.1007/978-3-030-55190-2_56
Letting $i \in \{1, 2\}$, the decision rule of this classifier crucially depends on the two posteriors
$$p(\Omega_i \mid x) = \frac{p(x, \Omega_i)}{p(x)} = \frac{p(x \mid \Omega_i) \cdot p(\Omega_i)}{p(x, \Omega_1) + p(x, \Omega_2)} \tag{3}$$
which in turn depend on the likelihoods $p(x \mid \Omega_i)$ and priors $p(\Omega_i)$. In practice, these ingredients of the classifier are either estimated from training data or follow from informed modeling assumptions made by human experts. For the point we want to make in this note, we adhere to the latter strategy and introduce a couple of simplifying modeling assumptions. First of all, we will assume that the two class-specific likelihoods $p(x \mid \Omega_i)$ in the middle of (3) both follow a normal distribution
$$p(x \mid \Omega_i) = \mathcal{N}\left(x \mid \mu_i, \sigma_i^2\right) = \frac{1}{\sqrt{2\pi\sigma_i^2}}\, e^{-\frac{(x - \mu_i)^2}{2\sigma_i^2}} \tag{4}$$
whose mean and variance parameters are $\mu_i$ and $\sigma_i^2$, respectively.
Fig. 1. Graphs of tanh(βx) for different choices of β (β = 1/2, β = 1, β = 2).
Second of all, we will assume that these two normal densities are mirror symmetric about $x = 0$. In other words, we assume that their means are given by
$$\mu_1 = +\mu \tag{5}$$
$$\mu_2 = -\mu \tag{6}$$
and that their variances are identical, namely
$$\sigma_1^2 = \sigma_2^2 = \sigma^2. \tag{7}$$
Third and last of all, we will assume equal priors for both classes. That is, we simply let
$$p(\Omega_1) = p(\Omega_2) = \tfrac{1}{2}. \tag{8}$$
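Given these three assumptions, the class-specific densities in the numerator on the right hand side of (3) can now be written as
$$p(x, \Omega_1) = \frac{1}{2}\, \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{x^2}{2\sigma^2}}\, e^{-\frac{\mu^2}{2\sigma^2}}\, e^{+\frac{x\mu}{\sigma^2}} \tag{9}$$
and
$$p(x, \Omega_2) = \frac{1}{2}\, \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{x^2}{2\sigma^2}}\, e^{-\frac{\mu^2}{2\sigma^2}}\, e^{-\frac{x\mu}{\sigma^2}} \tag{10}$$
respectively (see Fig. 2). Adding both expressions, we obtain the evidence in the denominator on the right hand side of (3) and find that it amounts to
$$p(x) = \frac{1}{2}\, \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{\mu^2}{2\sigma^2}}\, e^{-\frac{x^2}{2\sigma^2}} \left( e^{+\frac{x\mu}{\sigma^2}} + e^{-\frac{x\mu}{\sigma^2}} \right). \tag{11}$$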
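All in all, our modeling assumptions therefore lead to the following posteriors
$$p(\Omega_1 \mid x) = \frac{e^{+\frac{x\mu}{\sigma^2}}}{e^{+\frac{x\mu}{\sigma^2}} + e^{-\frac{x\mu}{\sigma^2}}} \tag{12}$$
and
$$p(\Omega_2 \mid x) = \frac{e^{-\frac{x\mu}{\sigma^2}}}{e^{+\frac{x\mu}{\sigma^2}} + e^{-\frac{x\mu}{\sigma^2}}} \tag{13}$$
for class $\Omega_1$ and $\Omega_2$, respectively (see Fig. 3).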
Fig. 2. Example of normally distributed class densities p(x, Ω1) and p(x, Ω2), centered at +μ and −μ, which are mirror symmetric about the axis x = 0.
At this point, it becomes evident where our discussion is headed. Noting that
$$p(\Omega_1 \mid x) \ge p(\Omega_2 \mid x) \tag{14}$$
$$\Leftrightarrow \quad p(\Omega_1 \mid x) - p(\Omega_2 \mid x) \ge 0, \tag{15}$$
Fig. 3. Posterior distributions p(Ω1 | x) = p(x, Ω1)/p(x) and p(Ω2 | x) = p(x, Ω2)/p(x) of the classes in Fig. 2.
we realize that the Bayesian classifier in (2) implicitly evaluates the difference of two posteriors. Using our specific results in (12) and (13), this difference simply amounts to
$$p(\Omega_1 \mid x) - p(\Omega_2 \mid x) = \frac{e^{+\frac{x\mu}{\sigma^2}} - e^{-\frac{x\mu}{\sigma^2}}}{e^{+\frac{x\mu}{\sigma^2}} + e^{-\frac{x\mu}{\sigma^2}}} \tag{16}$$
(see Fig. 4) which we recognize as the hyperbolic tangent with scale parameter
$$\beta = \frac{\mu}{\sigma^2}. \tag{17}$$
Fig. 4. Subtracting the second from the first posterior in Fig. 3 yields a hyperbolic tangent.
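The identity in (16) and (17) is easy to check numerically. The following sketch evaluates the posterior difference directly from the Gaussian joints in (9) and (10) and compares it with $\tanh(\mu x / \sigma^2)$; the parameter values and test points are arbitrary choices made for illustration.

```python
import math


def gaussian(x, mean, var):
    """Normal density N(x | mean, var) as in Eq. (4)."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def posterior_difference(x, mu, var):
    """p(Omega_1 | x) - p(Omega_2 | x) under assumptions (5)-(8)."""
    joint1 = 0.5 * gaussian(x, +mu, var)  # p(x, Omega_1)
    joint2 = 0.5 * gaussian(x, -mu, var)  # p(x, Omega_2)
    evidence = joint1 + joint2            # p(x)
    return (joint1 - joint2) / evidence


mu, var = 1.5, 2.0  # arbitrary example parameters
max_gap = max(
    abs(posterior_difference(x, mu, var) - math.tanh(mu * x / var))
    for x in [-3.0, -1.0, 0.0, 0.5, 2.0]
)
```

Up to floating-point rounding, `max_gap` vanishes, which is exactly the statement that the posterior difference is a hyperbolic tangent with $\beta = \mu / \sigma^2$.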
3 Conclusion
To conclude this short note, we recall the following model for the behavior of a neuron in an artificial neural network: upon input $x \in \mathbb{R}^m$, a simple tanh neuron with synaptic weights $w \in \mathbb{R}^m$ and bias $b \in \mathbb{R}$ computes
$$y(x) = \tanh\left(w^\top x - b\right). \tag{18}$$
Given our discussion, this computation may be interpreted in terms of probabilistic decision making: in any case, the neuron projects its input $x$ onto a one-dimensional subspace spanned by $w$. With respect to this subspace, the neuron “assumes” that any projected value $w^\top x$ comes from either one of two normally distributed classes $\Omega_1$ or $\Omega_2$. It further “assumes” that the respective class means are $b + \mu$ and $b - \mu$ and that both variances are $\sigma^2 = \mu$. To indicate that the joint probability $p(x, \Omega_1)$ exceeds the joint probability $p(x, \Omega_2)$, the neuron produces a positive output; otherwise, its response is negative. The absolute value of the neuron’s response always has a lower bound of 0 and an upper bound of 1 and can be understood as a degree of belief in its decision.
Acknowledgments. In parts, the authors of this work were supported by the Fraunhofer Research Center for Machine Learning (RCML) within the Fraunhofer Cluster of Excellence Cognitive Internet Technologies (CCIT) and by the Competence Center for Machine Learning Rhine Ruhr (ML2R) which is funded by the Federal Ministry of Education and Research of Germany (grant no. 01—S18038A). We gratefully acknowledge this support.
References
1. Bishop, C.M.: Neural Networks for Pattern Recognition. Oxford University Press, Oxford (1996)
2. Haykin, S.: Neural Networks and Learning Machines: A Comprehensive Foundation. Prentice Hall, New York (2008)
3. MacKay, D.J.C.: Information Theory, Inference, and Learning Algorithms. Cambridge University Press, Cambridge (2003)
4. Ramamurthy, R., Stenzel, R., Sifa, R., Ladi, A., Bauckhage, C.: Echo state networks for named entity recognition. In: International Conference on Artificial Neural Networks. Springer (2019)
5. Rojas, R.: Neural Networks: A Systematic Introduction. Springer, Heidelberg (1996)
6. Russell, S.J., Norvig, P.: Artificial Intelligence: A Modern Approach. Prentice Hall, Upper Saddle River (1995)
7. Sifa, R., Bauckhage, C.: Archetypical motion: supervised game behavior learning with archetypal analysis. In: Proceedings of the IEEE CIG (2013)
Solving Jigsaw Puzzles Using Variational Autoencoders
Mostafa Korashy1(B), Islam A. T. F. Taj-Eddin1, Mahmoud Elsaadany2, and Shoukry I. Shams3
1 Faculty of Computers and Information, Assiut University, Assiut, Egypt [email protected]
2 Department of Electrical Engineering, École de technologie supérieure (ÉTS), Montréal, Canada
3 Department of Electrical and Computer Engineering, Concordia University, Montréal, Canada
Abstract. Machine learning has recently gained a prominent position owing to its applicability to a wide range of systems and applications. As a result, significant effort has been directed at enhancing existing techniques and at critically assessing their use in different applications. However, there is still considerable room for improvement, especially in solving common problems using simpler architectures such as variational autoencoders. The simplicity of variational autoencoders makes them suitable for a wide range of applications and systems. Among these applications is solving the Jigsaw problem, which consists of reconstructing images from shuffled image tiles. Many articles have addressed the Jigsaw problem and proposed different solutions; however, the presented techniques suffer from high complexity and long training times. In this work, we explore the use of variational autoencoders to learn high-level features for reconstructing images from shuffled image tiles. We also explore the use of the learnt features in transfer learning to adapt the trained model to other tasks such as classification or detection. To the best of the authors’ knowledge, this type of problem has not been addressed using variational autoencoders before. We obtained results on par with those of more complex CNN-based models.
Keywords: Autoencoders · Deep learning · Jigsaw puzzles · Transfer learning

1 Introduction
Transfer learning refers to the process of improving the generalization of a model trained in one setting (i.e., distribution P1) to another setting (i.e., distribution P2) [1]. Transfer learning has several potential advantages, such as a higher start, a higher slope, or a higher asymptote. A higher start means that the initial skill of the model, before any further training, is better than it otherwise would be; a higher slope means that the rate at which skill improves during training is steeper; and a higher asymptote means that the converged skill of the trained model is better than it otherwise would be [2].
The idea behind transfer learning is to first train an architecture to solve a simple task. After that, the model can be trained on more sophisticated tasks. Typically, the simple task is an unsupervised task that does not require dataset labeling, since dataset annotation is a time-consuming and complex task in its own right. Reconstructing images from their shuffled tiles (solving a Jigsaw puzzle) is one such unsupervised task that can be used to pre-train the model before training it on the required task. A variety of solutions have been proposed to solve the Jigsaw problem using complex CNN-based models [3,4]. However, such models suffer from high complexity. In this context, we propose a simple yet efficient solution to the problem using variational autoencoders (VAEs).
Autoencoders are gaining a lot of attention in the field of deep learning. The authors in [1] classify autoencoders into two categories: undercomplete and overcomplete autoencoders. Undercomplete autoencoders restrict the dimension of the latent representation to be smaller than the input dimension. This restriction ensures that the encoder learns the most salient features that help the decoder to reconstruct the original input. In the regularized overcomplete category, instead of restricting the dimension of the latent representation, the latent code is allowed to have the same dimension as the input or a larger one. However, a restriction is added that prevents the autoencoder from learning the identity function (copying the input to the output). This restriction can be sparsity in the latent code (sparse autoencoders), reconstructing clean data from noisy input (denoising autoencoders), or any other restriction. Our model lies in the regularized overcomplete category, and the restriction, in this case, is to reconstruct clean data from shuffled tiles of the input.
© Springer Nature Switzerland AG 2021. K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 708–712, 2021. https://doi.org/10.1007/978-3-030-55190-2_57
The contributions of this work are as follows: 1) introducing the first work to solve Jigsaw puzzles using variational autoencoders, which are simpler to use and have fewer parameters than traditional CNN-based architectures; 2) exploring the importance of the learnt features in supervised tasks (i.e., classification). We show that using a VAE to solve Jigsaw puzzles can lead to faster convergence and higher accuracy than training from random weights.
2 Proposed Model
We propose using a variational autoencoder [5] to reconstruct images from shuffled image tiles. A VAE is an autoencoder architecture that learns a distribution, rather than a single scalar value, for each dimension of the latent space. Learning a distribution for each latent dimension lets us sample a value for each latent dimension from its corresponding distribution and reconstruct the input by passing these values to the decoder. In this way, the decoder part of the autoencoder can be used as a generative model. To be able to reconstruct the original data, the VAE model has to learn high-level features from the input data rather than its specific details. In the proposed model, we maximize the variational lower bound for an observation x, which is given by:
Fig. 1. Learned 2D manifold of MNIST (a) and fashion-MNIST (b) datasets.
$$\mathcal{L}(x) = \mathbb{E}_{z \sim q(z|x)} \log P_{\text{model}}(z, x) + H(q(z|x)) \tag{1}$$
where the first term is the joint log-likelihood of the observations and the associated latent code under the posterior approximation q(z|x), and H(q(z|x)) is the entropy of the posterior approximation. This equation can be rewritten as:

$$\mathcal{L}(x) = \mathbb{E}_{z \sim q(z|x)} \log P_{\text{model}}(x|z) - D_{\mathrm{KL}}\big(q(z|x) \,\|\, P_{\text{model}}(z)\big) \tag{2}$$
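The step from Eq. 1 to Eq. 2 is not spelled out in the text. A short derivation, using only the factorization $P_{\text{model}}(z, x) = P_{\text{model}}(x|z)\,P_{\text{model}}(z)$ and the definition $H(q(z|x)) = -\mathbb{E}_{z \sim q(z|x)} \log q(z|x)$, would be:

$$
\begin{aligned}
\mathcal{L}(x) &= \mathbb{E}_{z \sim q(z|x)}\big[\log P_{\text{model}}(x|z) + \log P_{\text{model}}(z)\big] - \mathbb{E}_{z \sim q(z|x)}\big[\log q(z|x)\big] \\
&= \mathbb{E}_{z \sim q(z|x)} \log P_{\text{model}}(x|z) - \mathbb{E}_{z \sim q(z|x)}\big[\log q(z|x) - \log P_{\text{model}}(z)\big] \\
&= \mathbb{E}_{z \sim q(z|x)} \log P_{\text{model}}(x|z) - D_{\mathrm{KL}}\big(q(z|x) \,\|\, P_{\text{model}}(z)\big)
\end{aligned}
$$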
In Eq. 2, the reconstruction loss is represented by the first term, while the KL divergence term between the approximate posterior q(z|x) and the model prior distribution Pmodel(z) ensures that the two distributions stay close to each other. Note that the value of Eq. 2 is less than or equal to the log-likelihood of the data, log Pmodel(x). For this reason, the value of Eq. 2 is termed the Evidence Lower BOund (ELBO). To backpropagate the error terms through stochastic nodes, we employed the reparametrization trick, or perturbation analysis [5]. In this trick, instead of sampling directly from the distribution parameterized by the mean μ and standard deviation σ output by the encoder Pφ(z|x), we sample a value from a zero-mean, unit-variance Gaussian distribution. The sampled value is then multiplied by the standard deviation σ and added to the mean μ. That is, we define y ∼ N(0, 1), and the latent vector is then z = μ + σy. In this case, we can easily backpropagate the error terms through the graph nodes.
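The reparametrization step and the KL term of Eq. 2 can be illustrated with a minimal sketch. It assumes a diagonal Gaussian posterior and a standard normal prior N(0, 1), for which the KL divergence has a closed form; the text does not state the prior explicitly, and the function names are hypothetical.

```python
import math
import random

def reparameterize(mu, sigma, rng):
    """Draw z = mu + sigma * y with y ~ N(0, 1), elementwise."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]

def kl_to_standard_normal(mu, sigma):
    """Closed-form D_KL(N(mu, sigma^2) || N(0, 1)), summed over dimensions.

    Assumes a standard normal prior (an assumption of this sketch).
    """
    return sum(0.5 * (s * s + m * m - 1.0) - math.log(s)
               for m, s in zip(mu, sigma))

rng = random.Random(0)
mu, sigma = [0.5, -1.0], [1.0, 0.3]   # hypothetical encoder outputs
z = reparameterize(mu, sigma, rng)     # latent sample passed to the decoder
kl = kl_to_standard_normal(mu, sigma)  # second term of Eq. 2
```

Because the randomness enters only through y, the sample z is a deterministic function of μ and σ, which is what allows gradients to flow back to the encoder.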
3 Results and Discussion
In this section, we present the results of solving Jigsaw puzzles using a VAE and of using the learnt features in transfer learning.
Fig. 2. Reconstructing images of the MNIST (a) and Fashion-MNIST (b) datasets from shuffled image tiles. The first row of each subfigure shows the original images. The second row shows the randomly shuffled image tiles that are used as input to the model, while the bottom row shows the reconstruction result.
Results of Solving Jigsaw Puzzle Using VAE: We used the VAE architecture to learn high-level features of the data, including the spatial information of the image tiles. We trained the model on two datasets (MNIST and Fashion-MNIST). Figure 2 shows the reconstruction results for each dataset. We randomly shuffled the image tiles before passing them to the encoder so that the encoder does not learn a fixed mapping between the blocks. The results show that the VAE model is able to reconstruct the images from the shuffled tiles. Figure 1 shows that the model is capable of learning a good manifold for each dataset.

Results of Transferred Learning: We showed that the VAE model is capable of learning high-level features and spatial information in order to solve Jigsaw puzzles. Here, we show the results of transferring these learnt features to a different supervised task (classification in this case). In this experiment, we compared an MLP classification architecture initialized with random weights against another model (with the same architecture and capacity) initialized with weights trained using our proposed unsupervised representation learning. Figure 3 (a) shows the results for a classification model trained on the EMNIST Digits dataset [6]. The model with transferred features achieved a higher start in accuracy. It also achieved higher accuracy (in the same number of epochs) than the model with random weights. Another result that emphasizes the effectiveness of the proposed technique is shown in Fig. 3 (b). In this case, we built a classification model for the MNIST dataset with and without the learned features. As before, the model with the learned features achieved a higher start than the model with random weights.
Fig. 3. The accuracy of a classification model for EMNIST (Digits) dataset (a) and MNIST dataset (b) using transferred versus random weights.
4 Conclusion
In this work, we presented a novel solution for learning high-level representations by solving Jigsaw puzzles using variational autoencoders. The proposed technique has been validated on different datasets. The obtained results show that variational autoencoders are able to solve the Jigsaw problem with near state-of-the-art accuracy. The learnt features are shown to be useful in adapting the model to other supervised tasks such as classification.
References
1. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016)
2. Olivas, E.S., Guerrero, J.D.M., Sober, M.M., Benedito, J.R.M., Lopez, A.J.S.: Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods and Techniques. IGI Publishing (2009)
3. Kim, D., et al.: Learning image representations by completing damaged jigsaw puzzles. In: IEEE Winter Conference on Applications of Computer Vision (WACV) (2018)
4. Gallagher, A.C.: Jigsaw puzzles with pieces of unknown orientation. In: Computer Vision and Pattern Recognition (CVPR) (2012)
5. Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114 (2013)
6. Cohen, G., Afshar, S., Tapson, J., van Schaik, A.: EMNIST: an extension of MNIST to handwritten letters. arXiv preprint arXiv:1702.05373 (2017)
Followers of School Shooting Online Communities in Russia: Age, Gender, Anonymity and Regulations

Anastasia Peshkovskaya1,2(B), Yuliya Mundrievskaya1, Galina Serbina1, Valeria Matsuta3, Vyacheslav Goiko1, and Artem Feshchenko1

1 Tomsk State University, Lenin Avenue, 36, Tomsk, Russian Federation
[email protected]
2 Mental Health Research Institute, Tomsk National Research Medical Center of the Russian Academy of Sciences, Tomsk, Russia
3 Department of Psychology, Tomsk State University, Tomsk, Russian Federation
Abstract. Today, about five online communities promoting aggressive ideology are registered daily in the largest Russian social network, Vkontakte. These youth communities advocate aggression and suicide. Community administrators are gradually forming a large group of young people for whom aggression is normal. Although there are many studies on the school shooting phenomenon, most of them focus on media issues, the psychology of shooters, and legal characteristics. The current study aimed to identify the real audience (followers) of school shooting online communities in the social network and their gender, age and social characteristics. Over the course of three months, we collected and analyzed data from 9 online communities. We found that school shooting community followers were mostly males aged 15 to 22. We suggest that this age and gender group of young people is the most vulnerable to destructive informational influence in social media. We also noted that the policy of banning school shooting communities for security reasons is effective as a barrier to spreading dangerous ideas among youth. Nevertheless, rules and regulations that include banning also create specific barriers for researchers who investigate community content and the behavior and characteristics of followers.

Keywords: School shooting · Social network · Online social network · Aggression · Suicide · Behavior
1 Introduction
The Internet is a powerful tool for influencing certain behavior [1, 2]. Today, it is increasingly responsible for deviant behavior, aggression and suicide [3, 4]. Importantly, the Internet has played a special role in school shootings since the Columbine attack [5]. Since then, the Internet and social networks have been used to attract followers and to promote destructive ideology to the public through uploaded videos and texts [6].

© Springer Nature Switzerland AG 2021 K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 713–716, 2021. https://doi.org/10.1007/978-3-030-55190-2_58
Today, about five online communities promoting aggressive ideology are registered daily in the largest Russian social network, Vkontakte. These online communities advocate aggression and suicide. Community administrators are gradually forming a large group of young people for whom aggression is normal. Although there are many studies on the school shooting phenomenon, most of them focus on media issues, the psychology of shooters, and legal characteristics [7–10]. The current study aimed to identify the real audience (followers) of school shooting online communities in the social network, and their gender, age and social characteristics.
2 Method
A list of linguistic markers was developed to identify school shooting online communities. The vocabulary included words, word combinations, abbreviations, and numeric symbols related to school shootings and mass executions, as well as those reflecting current and urgent events. On the basis of this vocabulary, we performed the primary monitoring of online communities. We used social network analysis to identify a community. Then, a manual search among the community's subscribers was performed to identify active users and related communities. Within 2 months of observation, we identified further school shooting communities by means of the 'snowball effect'. For research purposes, the methodology of social network analysis was used. We used the 'friendly links' of group members as data to study the structure of the school shooting communities.
3 Results
Over the course of three months, we collected and analyzed activity data from nine online school shooting communities. In this article, we replaced community names with numbers. A description of the communities is presented in Table 1. The smallest community had 29 followers. The largest one had 958 followers. The total number of identified school shooting online community members was 1,561. As expected, men were in the majority. Importantly, during the 2 months of analysis, 2 out of 9 communities were blocked according to the social network policy. Next, we analyzed the age of the school shooting community followers (Table 2). Here, we registered a certain tendency towards anonymity: most of the followers hid their age. Nevertheless, we found that young people aged 19 to 22 and 15 to 18 made up the main share of the community followers. We also found that in most cases community followers tried to maintain their anonymity. They hid their city and country of residence. In numerous cases, followers identified themselves with the Columbine shooters and indicated Columbine as their hometown. In addition, a number of followers indicated the city of Blagoveshchensk, Russia as their city of residence. One of the widely known school shooting attacks took place there in 2019.
Table 1. Description of the online school shooting community followers

| Social network community | Members, total | Male, % | Female, % | Status |
|---|---|---|---|---|
| #1 | 958 | 48 | 52 | Banned |
| #2 | 140 | 61 | 39 | Active |
| #3 | 138 | 27 | 63 | Banned |
| #4 | 122 | 66 | 34 | Active |
| #5 | 60 | 50 | 50 | Active |
| #6 | 59 | 58 | 42 | Active |
| #7 | 54 | 74 | 26 | Active |
| #8 | 29 | 60 | 40 | Active |
| #9 | 1 | No data | No data | Active |
| Total | 1561 | | | |
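Table 2. Distribution of community followers by age (%)

| Community | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 |
|---|---|---|---|---|---|---|---|---|---|
| No data | 52 | 74 | 61 | 66 | 75 | 64 | 65 | 63 | 100 |
| 15–18 | 8 | 9 | 79 | 11 | 3 | 12 | 19 | 6 | – |
| 19–22 | 12 | 11 | 12 | 17 | 15 | 17 | 17 | 6 | – |
| 23–26 | 4 | 1 | 2 | 1 | 3 | 2 | 0 | 6 | – |
| 27–30 | 8 | 1 | 0 | 2 | 2 | 2 | 0 | 7 | – |
| 30 and older | 16 | 4 | 5 | 3 | 2 | 3 | 0 | 12 | – |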
4 Discussion
The Internet plays a very special role in school shootings. School shooters are inspired by previous attacks, as well as by shooters and their representation in the media, and by their personal blogs, videos and texts publicly accessible online. School shooting online communities use a wide range of tactics to survive and attract a larger audience [11]. We believe it is necessary to outline the network characteristics of such communities and to describe their typical target audience. This study aimed to identify the real followers of school shooting online communities in the largest Russian social network. Despite the strong trend towards anonymity, we found that school shooting community followers were mostly males aged 15 to 22 from Russia. We suggest that this age and gender group of young people is the most vulnerable to destructive informational influence in social media. We should also note that the policy of banning school shooting communities for security reasons is effective as a barrier to spreading dangerous ideas among youth.
Nevertheless, rules and regulations that include banning also create specific barriers for researchers who investigate community content and the behavior and characteristics of followers.

Acknowledgment. The study was funded by the Russian Science Foundation, Project 19-7810122 “Development of an algorithm for identifying risk factors for the safety of social networks users based on an analysis of the content and psychological characteristics of its consumers”.
References
1. Peshkovskaya, A., Babkina, T., Myagkov, M.: Social context reveals gender differences in cooperative behavior. J. Bioecon. 20(2), 213–225 (2018). https://doi.org/10.1007/s10818-0189271-5
2. Peshkovskaya, A., Babkina, T., Myagkov, M.: Gender effects and cooperation in collective action: a laboratory experiment. Ration. Soc. 31(3), 337–353 (2019). https://doi.org/10.1177/1043463119858788
3. Bondü, R., Scheithauer, H.: Leaking and death-threats by students: a study in German schools. Sch. Psychol. Int. 35(6), 592–608 (2014). https://doi.org/10.1177/0143034314552346
4. Lee, K.J.: Social media’s influence on frequency of incidents. Elon J. Undergr. Res. Commun. 9(2), 28–35 (2018)
5. Wright, B.L.: Don’t fear the nobodies: a critical youth study of the Columbiner Instagram Community (2019)
6. Holt, T.J., Freilich, J.D., Chermak, S.M.: Internet-based radicalization as enculturation to violent deviant subcultures. Deviant Behav. 38, 855–869 (2016). https://doi.org/10.1080/01639625.2016.1197704
7. Gerard, F., Whitfield, K., Porter, L., Browne, K.: Offender and offence characteristics of school shooting incidents. J. Invest. Psychol. Offender Profiling 13, 22–38 (2015). https://doi.org/10.1002/jip.1439
8. Lankford, A., Adkins, K.G., Madfis, E.: Are the deadliest mass shootings preventable? An assessment of leakage, information reported to law enforcement, and firearms acquisition prior to attacks in the United States. J. Contemp. Crim. Justice 35(3), 315–341 (2019). https://doi.org/10.1177/1043986219840231
9. Bondü, R., Scheithauer, H.: Media consumption in German school shooters. In: Muschert, G.W., Sumiala, J. (eds.) School Shootings, Mediatized Violence in a Global Age, pp. 69–89. Emerald Group, Bingley (2012)
10. Kiilakoski, T., Oksanen, A.: Soundtrack of the school shootings, cultural script, music and male rage. Young 19, 247–269 (2011)
11. Raitanen, J., Oksanen, A.: Global online subculture surrounding school shootings. Am. Behav. Sci. 62(2), 195–209 (2018). https://doi.org/10.1177/0002764218755835
Discrimination of Chronic Liver Disease in Non-contrast CT Images using CNN-Deep Learning

Tae-Hoon Kim1,2(B), Si-Hyeong Noh1, Chang-Won Jeong1,2(B), ChungSub Lee1, Ji Eon Kim1, SeungJin Kim1, and Kwon-Ha Yoon2,3

1 Medical Convergence Research Center, Wonkwang University, Iksan, Republic of Korea
{tae_hoonkim,mediblue}@wku.ac.kr
2 Wonkwang University Hospital, Smart Health IT Center, Iksan, Republic of Korea
3 Department of Radiology, Wonkwang University School of Medicine, Wonkwang University Hospital, Iksan, Republic of Korea
Abstract. Recently, there have been considerable efforts to develop non-invasive image-based methods for the diagnosis, staging, and monitoring of chronic liver disease (CLD). This study developed a deep learning (DL) algorithm for discriminating CLD using non-contrast abdominal CT images. The study enrolled 499 patients with CLD and 122 healthy controls. The main structure of the DL algorithm was GoogLeNet-V3, with several modules fine-tuned for this study. In the test data sets, the DL algorithm had a discrimination accuracy of 99.4% (loss about 0.7%) and an AUROC of 0.998 for distinguishing normal controls from CLD patients. In the validation test, we achieved good results: a specificity (for healthy controls) of 0.98292 (error rate: 0.01708) and a sensitivity (for CLD) of 0.99469 (error rate: 0.00531). Our deep learning algorithm would be useful for accurate discrimination of CLD from abdominal CT images without the use of a contrast agent. Further study is needed to grade disease severity within the CLD patient group for clinical application.

Keywords: Chronic liver disease (CLD) · Deep Learning (DL) · Diagnostic accuracy
1 Introduction
Chronic liver disease (CLD) is clinically a ‘silent liver disease’, and most patients with CLD are asymptomatic until the development of cirrhosis and hepatic decompensation. CLD is defined as ongoing inflammation in the liver for at least 6 months. Chronic inflammation leads to potentially reversible liver fibrosis and ends in irreversible cirrhosis [1]. CLD has many different causes, mainly including viral infections, nonalcoholic fatty liver disease, alcohol abuse, primary sclerosing cholangitis, primary hemochromatosis, and autoimmune disease. A recent study reported that CLD can lead to hepatic fibrosis, cirrhosis, end-stage liver disease, portal hypertension, and hepatocellular carcinoma (HCC), and constitutes an important cause of morbidity, mortality, and health care costs in the US [2]. Viral hepatitis C infection accounts for approximately 40% of all CLD, results in an estimated 8,000–10,000 deaths annually, and is the most frequent indication for liver transplantation [3]. Thus, the precise detection of CLD is critical in clinical management because liver transplantation constitutes the only curative therapy for decompensated liver cirrhosis.

© Springer Nature Switzerland AG 2021 K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 717–722, 2021. https://doi.org/10.1007/978-3-030-55190-2_59
2 Background/Significance
Liver biopsy is the reference standard for diagnosing and staging fibrosis and cirrhosis as hallmarks of CLD. However, the method has some limitations, including sampling errors, low patient acceptance, and complications such as pain, bleeding, infection and, rarely, death [4]. Hence, there have been considerable efforts to develop noninvasive imaging techniques and quantification programs for the diagnosis, staging, and monitoring of CLD. Recently, several radiological imaging methods [4–6], such as computed tomography (CT), contrast-enhanced CT, magnetic resonance imaging (MRI), diffusion-weighted MR imaging (DWI) and MR elastography (MRE), have been introduced for assessing clinical indications in CLD. Among them, CT is a frequently used diagnostic tool in modern medicine. Clinically occult liver fibrosis (the progressive stage of CLD) is underdiagnosed as an auxiliary finding in routine abdominal CT scans. Even liver cirrhosis (the end stage) is detected on CT with only moderate sensitivity (77.1%–84.3%) and specificity (52.9%–67.6%) [7]. It is still difficult to clearly differentiate imaging patterns in CT images according to disease progression or disease severity in CLD. Therefore, a more accurate method of detecting CLD in CT scans could advance precision diagnosis and anticipate disease development before its clinical appearance.

Deep learning (DL) has attracted great interest in medical image analysis tasks such as classification, detection, segmentation and diagnosis. Various types of deep learning algorithms are in use in research, such as convolutional neural networks (CNN), deep neural networks (DNN), and deep belief networks (DBN). The CNN model is attracting a lot of interest in digital image processing and vision. There are different CNN architectures, such as AlexNet, LeNet, Faster R-CNN, GoogLeNet, ResNet, VGGNet, and ZFNet.

Recent deep learning algorithms have frequently been applied to various unsolved clinical problems, and here we used a CNN architecture for image-based discrimination of CLD. The aim of this study was therefore to develop a DL algorithm for discriminating CLD using non-contrast abdominal CT images.
3 Method
3.1 Study Population
The study protocol was approved as a retrospective study by the institutional review board (IRB) of Wonkwang University Hospital. The CT image data set was obtained from January 2015 to December 2018 in the format of the Observational Health Data Sciences and Informatics (OHDSI) common data model (CDM; ver. 5.3). A total of 621 subjects, consisting of 499 CLD patients and 122 healthy controls, were enrolled in this study.
The inclusion criteria for CLD patients were as follows: i) elevated liver function enzymes for more than 6 months; ii) patients who had liver cirrhosis; iii) patients who were scheduled for a liver biopsy due to a focal liver lesion; iv) patients who were diagnosed with HCC according to the American Association for the Study of Liver Diseases (AASLD) and who underwent surgery; and v) living donors for liver transplantation. Table 1 lists the detailed etiology of the included study subjects.

Table 1. Etiology of chronic liver disease defined from serologic test data

| Cause | Chronic liver disease (No. of pts) | Healthy control |
|---|---|---|
| Normal | - | 122 |
| Hepatitis B | 168 | - |
| Hepatitis C | 25 | - |
| Coinfection | 106 | - |
| Alcohol | 75 | - |
| NAFLD | 60 | - |
| Autoimmune | 65 | - |

NAFLD: non-alcoholic fatty liver disease
3.2 Data Set and Image Diagnosis for Chronic Liver Disease
The CLD data sets were created from abdominal CT images together with the CDM electronic health record (EHR). The eligible non-contrast CT image data sets comprised 33,650 images (n = 499) for CLD and 33,565 images (n = 122) for healthy controls. The final data sets for both CLD and healthy controls were blindly evaluated by two expert radiologists (each with more than 10 years of experience). They had no knowledge of the clinical outcome and no access to each other's readings. Diagnostic accuracy in each group was expressed as a percentage: (number of diagnosed patients)/(total number of patients) × 100.

3.3 Deep Learning Algorithm for Discriminating CLD
The main deep learning architecture was the original GoogLeNet-V3 (Fig. 1A), and an advanced DL algorithm was fine-tuned for this study (Fig. 1B). The covariates (age group, gender, index year) and procedures were combined with the latent feature vector extracted from pre-contrast abdominal CT. The diagnostic performance of the DL algorithm was evaluated in a separate test data set of 6,665 images (randomly selected, 10% of the total 67,215 images). The influence of patient characteristics and CT techniques on the diagnostic accuracy of the DL algorithm was evaluated. In a validation subset of 6,665 images (10% of the total), the diagnostic performance of the DL algorithm was compared with that of the test data assessment.
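The random hold-out described above can be sketched as follows. The image counts are taken from the text; splitting by image index (rather than by patient) and the helper name `hold_out` are assumptions of this sketch, since the text does not say how the random selection was done.

```python
import random

def hold_out(n_items, n_test, seed):
    """Randomly pick n_test indices for testing; the rest are kept for training."""
    rng = random.Random(seed)
    test_idx = set(rng.sample(range(n_items), n_test))
    train_idx = [i for i in range(n_items) if i not in test_idx]
    return train_idx, sorted(test_idx)

# 67,215 images in total, 6,665 of them held out (numbers from the text).
train_idx, test_idx = hold_out(n_items=67215, n_test=6665, seed=0)
```

If several slices come from the same patient, a split by patient identifier instead of by image would keep one patient's images from appearing in both sets; whether that was done here is not stated.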
Fig. 1. Architecture of the original GoogLeNet Inception-V3 model (A) and the structure of the modified deep learning algorithm (B) used to differentiate CLD patients from healthy controls in clinical practice.
4 Results
The diagnostic accuracy (%) of the two radiologists was 99.50 ± 0.14 for CLD and 99.59 ± 0.58 for healthy controls. The diagnostic accuracy of the original GoogLeNet-V3 was 53.0% (loss about 47.0%). Thus, the original DL model required modification of several modules before it could be applied clinically to discriminate CLD. Figure 2 shows the feature maps of the deep learning model, and Fig. 3 shows the discrimination accuracy (%) and loss curves. In the test data sets, the DL algorithm had a diagnostic accuracy of 99.4% (loss about 0.7%) and an AUROC of 0.998 for distinguishing normal controls from CLD patients. The DL algorithm therefore showed a good discrimination rate (>99%), equivalent to that of the radiologists. In the validation test, we achieved good results: a specificity (for healthy controls) of 0.98292 (error rate: 0.01708) and a sensitivity (for CLD) of 0.99469 (error rate: 0.00531).
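For reference, sensitivity, specificity and accuracy can be computed from confusion-matrix counts as sketched below. The counts used here are hypothetical and chosen only for illustration; the paper reports the resulting rates, not the underlying counts.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # CLD images correctly labelled as CLD
    specificity = tn / (tn + fp)   # control images correctly labelled as control
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Hypothetical counts, not taken from the study.
sens, spec, acc = diagnostic_metrics(tp=95, fn=5, tn=90, fp=10)
```

The error rates quoted in the text are the complements of these rates, for example 1 − 0.98292 = 0.01708 for specificity.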
Fig. 2. Representative CT images (A: healthy control; B: chronic liver disease), pooling image and feature maps with Inception 3a.
Fig. 3. Discrimination accuracy and loss curve using modified DL algorithm.
5 Conclusion
Our deep learning system allows accurate discrimination of CLD using routine abdominal CT images without the use of a contrast agent. Our datasets and deep learning algorithm can be applied in follow-up studies on the management of CLD.

6 Future Work
Future large-population, multicenter studies are needed to establish generalizability. Furthermore, work will be performed to differentiate disease severity or disease progression within the CLD patient group.

Acknowledgments. This study was supported by grants from the National Research Foundation of Korea (NRF) (2016M3A9A7918501) and the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare (HI18C1216). We appreciate the clinical support of the Smart Health IT Center at Wonkwang University Hospital.
References
1. Ebel, N.H., Horslen, S.P.: Complications and management of chronic liver disease (ch. 21). In: Kelly, D.A. (ed.) Diseases of the Liver and Biliary System in Children, 4th edn. (2017)
2. Afdhal, N.H., Nunes, D.: Evaluation of liver fibrosis: a concise review. Am. J. Gastroenterol. 99, 1160–1174 (2004)
3. Lauer, G.M., Walker, B.D.: Hepatitis C virus infection. N. Engl. J. Med. 345(1), 41–52 (2001)
4. Taouli, B., Ehman, R.L., Reeder, S.B.: Advanced MRI methods for assessment of chronic liver disease. AJR Am. J. Roentgenol. 193(1), 14–27 (2009)
5. Yeom, S.K., Lee, C.H., Cha, S.H., Park, C.M.: Prediction of liver cirrhosis, using diagnostic imaging tools. World J. Hepatol. 7(17), 2069–2079 (2015)
6. Babu, A.S., Wells, M.L., Teytelboym, O.M., Mackey, J.E., Miller, F.H., Yeh, B.M., Ehman, R.L., Venkatesh, S.K.: Elastography in chronic liver disease: modalities, techniques, limitations, and future directions. Radiographics 36(7), 1987–2006 (2016)
7. Kudo, M., Zheng, R.Q., Kim, S.R., Okabe, Y., Osaki, Y., Iijima, H., Itani, T., Kasugai, H., Kanematsu, M., Ito, K., Usuki, N., Shimamatsu, K., Kage, M., Kojiro, M.: Diagnostic accuracy of imaging for liver cirrhosis compared to histologically proven liver cirrhosis. Intervirology 51(1), 17–26 (2008)
Analysis and Classification of Urinary Stones Using Deep Learning Algorithm: A Clinical Application of Radiology-Common Data Model (R-CDM) Data Set

Si-Hyeong Noh1, SeungJin Kim1, Ji Eon Kim1, Chung-Sub Lee1, Seng Chan You5, Tae-Hoon Kim1,7, Yun Oh Lee4,7, Ilseok Chae4,7, Rae Woong Park5, Sung Bin Park6, Kwon-Ha Yoon2,3,7, and Chang-Won Jeong1,7(B)

1 Medical Convergence Research Center, Wonkwang University, Iksan, Republic of Korea
{nosij123,mediblue}@wku.ac.kr
2 Department of Radiology, Wonkwang University School of Medicine, Iksan, Republic of Korea
3 Department of Radiology, ChungAng University Hospital, Iksan, Republic of Korea
4 Computing and Information Team, Wonkwang University Hospital, Iksan, Republic of Korea
5 Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, Republic of Korea
6 Department of Radiology, ChungAng University Hospital, Seoul, Republic of Korea
7 Smart Health IT Center, Wonkwang University Hospital, Iksan, Republic of Korea
Abstract. Recent advances in artificial intelligence have been applied to a variety of clinical interests, such as lesion detection, segmentation, classification, and differentiation in medical images, and a great deal of research is underway. Meaningful artificial intelligence research results require not only a good algorithm but also a correctly created data set. To provide such data sets, we proposed the radiology-common data model (R-CDM) as an expansion of OMOP-CDM. Using R-CDM, this study created a data set for distinguishing urinary diseases in 873 patients and demonstrated its significance using an artificial intelligence algorithm based on a modified GoogLeNet, a CNN-based deep learning algorithm. In this experiment, an accuracy of 99% and an AUROC of 0.9 were obtained. External validation was performed using urinary stone data sets from 200 patients at other hospitals to increase confidence in the deep learning results. In external validation, a sensitivity of 90%, a specificity of 97%, and an AUROC of 0.984 confirmed that the data sets and algorithms are highly effective. The study will help researchers who apply artificial intelligence to medical imaging.

Keywords: Deep learning · Machine learning · Medical image · Common data model
1 Background
Recent advances in machine learning, in particular deep learning, help identify, classify and quantify patterns in medical images. At the heart of this development is the ability to utilize hierarchical feature representations learned only from data, instead of those designed manually based on domain-specific knowledge. Deep learning has quickly become state-of-the-art technology, improving performance in a variety of medical applications [1]. Medical images are a very important part of a patient’s electronic health record and are widely used for disease diagnosis. Therefore, the development of medical image analysis software based on artificial intelligence technology is increasing [2]. In recent clinical studies, research applying machine learning techniques to medical images has been actively conducted. Although medical images are stored in compliance with the DICOM international standard, different conventions are used at each institution, similar to the clinical data addressed by the CDM standard. The selection of the optimized clinical protocol for each disease and the medical information stored in key medical images should, therefore, be standardized and stored accordingly [3, 4]. For this reason, we proposed a common data model for the radiology field. To create the new R-CDM, we designed and implemented it based on the OMOP CDM of OHDSI (Observational Health Data Sciences and Informatics). In this paper, we introduce R-CDM, a new medical image standard for machine learning. Following the FAIR guiding principles, this standardized medical image information is differentiated from the medical image information stored inside PACS. We used a training data set of NECT (non-enhanced CT) images from 873 patients and a validation data set from 200 patients. The purpose of this study was to develop and validate a deep learning algorithm for the detection of urinary stones using CT images based on R-CDM.

© Springer Nature Switzerland AG 2021 K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 723–729, 2021. https://doi.org/10.1007/978-3-030-55190-2_60
2 Web Environment for Radiology CDM

2.1 R_CDM Structure

Figure 1 is a conceptual diagram of the OMOP_CDM extended with the R_CDM, which is connected via the primary id of the existing OMOP_CDM tables. The OMOP Common Data Model allows for the systematic analysis of disparate observational databases. The concept behind this approach is to transform data contained within those databases into a common format (data model) as well as a common representation (terminologies, vocabularies, coding schemes), and then perform systematic analyses using a library of standard analytic routines that have been written based on the common format. The R_CDM is associated with OMOP_CDM tables such as Device_Exposure, Condition_Occurrence and Note, and it can connect to other tables via the Person table. These connections allow a data set for deep learning to be created by specifying criteria such as clinical conditions or equipment conditions. Data sets generated in this way are expected to help improve the accuracy of deep learning because they meet the exact criteria required.
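The linkage through the Person table can be sketched with a small in-memory SQLite database. This is only an illustration under stated assumptions: the `radiology_image` table and its columns are hypothetical stand-ins for the R_CDM tables (the paper does not list them), the `person` and `condition_occurrence` tables are reduced to a few columns, and the concept id 999 is a placeholder rather than a real vocabulary code.

```python
import sqlite3

def build_demo_db():
    """Create a tiny in-memory database with two OMOP-style tables and one hypothetical image table."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE person (person_id INTEGER PRIMARY KEY);
        CREATE TABLE condition_occurrence (
            condition_occurrence_id INTEGER PRIMARY KEY,
            person_id INTEGER REFERENCES person(person_id),
            condition_concept_id INTEGER);
        CREATE TABLE radiology_image (   -- hypothetical table and columns
            image_id INTEGER PRIMARY KEY,
            person_id INTEGER REFERENCES person(person_id),
            modality TEXT,
            image_plane TEXT,
            file_path TEXT);
    """)
    con.executemany("INSERT INTO person VALUES (?)", [(1,), (2,), (3,)])
    con.executemany("INSERT INTO condition_occurrence VALUES (?, ?, ?)",
                    [(10, 1, 999), (11, 2, 123), (12, 3, 999)])
    con.executemany("INSERT INTO radiology_image VALUES (?, ?, ?, ?, ?)",
                    [(100, 1, "CT", "axial", "a.dcm"),
                     (101, 1, "CT", "coronal", "b.dcm"),
                     (102, 2, "CT", "axial", "c.dcm"),
                     (103, 3, "MR", "axial", "d.dcm")])
    return con

def select_images(con, concept_id, modality):
    """Return image paths for persons with the given condition, restricted to one modality."""
    rows = con.execute("""
        SELECT DISTINCT r.file_path
        FROM radiology_image AS r
        JOIN person AS p ON p.person_id = r.person_id
        JOIN condition_occurrence AS c ON c.person_id = p.person_id
        WHERE c.condition_concept_id = ? AND r.modality = ?
        ORDER BY r.file_path
    """, (concept_id, modality)).fetchall()
    return [path for (path,) in rows]

con = build_demo_db()
stone_ct = select_images(con, concept_id=999, modality="CT")
print(stone_ct)
```

Filtering on both a clinical condition and an equipment attribute in one query mirrors the kind of criteria-based data set creation described above; a real deployment would query the institution's own CDM schema instead.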
Fig. 1. Conceptual diagram of the OMOP_CDM with the R_CDM
2.2 Web Environment R_CDM

As shown in Fig. 2, the web environment provides custom datasets based on the user's needs. The download type can be set to basic or plane phase mode according to the user-specified configuration. In addition, the data can be downloaded in DICOM, PNG, EXCEL, and NIFTI formats, depending on the machine learning data type and the purpose of the researcher. By enabling the software to create custom datasets, the cost of classifying data and preparing datasets for machine learning can be greatly reduced, and the integrity of those datasets can be enhanced.
Fig. 2. Web environment R_CDM (Up), Data download page (Down)
3 Image Data Set and Applied Machine Learning

3.1 Deep Learning for Classification of Urinary Stone

For deep learning, a pre-trained GoogLeNet CNN was used for training and testing, running on Caffe in NVIDIA DIGITS. Figure 3 shows the data set used for deep learning, which was built from anonymized, downloaded DICOM files converted to PNG. The data set comprised 6100 urinary stone images and 6123 normal images. The data ratio for training, testing and validation was set at 9:1:1, and the ratio of axial to coronal images was around 9:1. We trained the deep learning algorithm on this dataset generated by R_CDM and obtained an accuracy of about 99%. Figure 4 shows the accuracy and loss obtained during deep learning and the resulting ROC curve. Accuracy was about 99%, and the loss was highly variable but ultimately fell to about 0.1%. The ROC curve was calculated after excluding results with an accuracy of 75% or less, and the resulting AUROC was 0.9.
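A 9:1:1 split of the reported image counts can be sketched as follows. The per-class (stratified) split, the fixed random seed and the file names are assumptions made for illustration; the paper does not describe how images were assigned to each subset.

```python
import random

def split_9_1_1(items, seed=0):
    """Shuffle items and split them into train/test/validation at roughly 9:1:1."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_holdout = len(items) // 11          # one part out of eleven
    n_train = len(items) - 2 * n_holdout  # nine parts plus any remainder
    train = items[:n_train]
    test = items[n_train:n_train + n_holdout]
    val = items[n_train + n_holdout:]
    return train, test, val

# Hypothetical file names; only the counts (6100 and 6123) come from the text.
stone = [f"stone_{i:04d}.png" for i in range(6100)]
normal = [f"normal_{i:04d}.png" for i in range(6123)]

splits = {"train": [], "test": [], "val": []}
for label, images in (("stone", stone), ("normal", normal)):
    train, test, val = split_9_1_1(images)
    splits["train"] += [(path, label) for path in train]
    splits["test"] += [(path, label) for path in test]
    splits["val"] += [(path, label) for path in val]

print({name: len(rows) for name, rows in splits.items()})
```

Splitting each class separately keeps the stone-to-normal ratio nearly identical across the three subsets, which is one common way to avoid a skewed test set.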
Fig. 3. (a) Normal data set image, (b) urinary stone data set image, (c) urinary stone data set count, (d) normal image data set count.
Fig. 4. ROC curve on the results of deep learning performance
3.2 External Validation

Figure 5 shows the results of external validation performed with a data set of 200 patients from other hospitals. Validation was carried out with 818 stone images and 801 normal images.
Fig. 5. External validation result
The ratio of axial to coronal images was again 9:1. The test treated stone images as positive and normal images as negative. As a result, 742 of the 818 stone images were correctly recognized as positive and 779 of the 801 normal images were correctly recognized as negative, giving a high sensitivity of 90.70% and a specificity of 97.25%. The accuracy achieved by our deep learning algorithm was 0.93947, according to the AUROC values in Fig. 6.
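The reported figures can be checked directly from the counts given above. The sketch below uses only those four counts; the function name is an illustrative choice. Computed this way, the overall accuracy (correct predictions over all 1619 images) comes out at about 0.9395, which agrees with the reported 0.93947, and sensitivity and specificity agree with the reported values to within rounding.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # share of stone images recognized as stone
    specificity = tn / (tn + fp)            # share of normal images recognized as normal
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Counts from the external validation: 742 of 818 stone, 779 of 801 normal.
tp, fn = 742, 818 - 742
tn, fp = 779, 801 - 779

sens, spec, acc = binary_metrics(tp, fn, tn, fp)
print(f"sensitivity={sens:.4f} specificity={spec:.4f} accuracy={acc:.5f}")
```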
Fig. 6. ROC curve on the results of deep learning performance
4 Conclusion

We proposed R_CDM, an expandable CDM based on OMOP_CDM, for efficient use in deep learning. In this paper, we showed its usefulness through urinary stone deep learning research that uses a data set generated by R_CDM. R_CDM provides standardized data sets for clinical trials and artificial intelligence research, which will help medical imaging researchers produce more meaningful research results.
Acknowledgments. This study was supported by the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare (HI18C1216), the Technology Innovation Program (or Industrial Strategic Technology Development Program) (20001234), and the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2018R1D1A1B07048833).
References

1. Ker, J., Wang, L., Rao, J., Lim, T.: Deep learning applications in medical image analysis. IEEE Access 6, 9375–9389 (2017)
2. Shen, D., Wu, G., Suk, H.-I.: Deep learning in medical image analysis. Ann. Rev. Biomed. Eng. 19, 221–248 (2017)
3. Hripcsak, G., Duke, J.D., Shah, N.H., Reich, C.G., Huser, V., Schuemie, M.J., Suchard, M.A., Park, R.W., Wong, I.C.K., Rijnbeek, P.R., Lei, J., Pratt, N., Norén, G.N., Li, Y.-C., Stang, P.E., Madigan, D., Ryan, P.B.: Observational Health Data Sciences and Informatics (OHDSI): opportunities for observational researchers. Stud. Health Technol. Inform. 216, 574–578 (2015)
4. You, S.C., Lee, S., Cho, S.Y., Park, H., Jung, S., Cho, J., Yoon, D., Park, R.W.: Conversion of National Health Insurance Service-National Sample Cohort (NHIS-NSC) Database into Observational Medical Outcomes Partnership-Common Data Model (OMOP-CDM). Stud. Health Technol. Inform. 245, 467–470 (2017)
Intelligent Monitoring Using Hazard Identification Technique and Multi-sensor Data Fusion for Crude Distillation Column

Peter Omoarebun1, David Sanders1, Favour Ikwan1, Mohamed Hassan2, Malik Haddad1(B), Mohamad Thabet3, Jake Piner4, and Amjad Shah3

1 Mechanical and Design Engineering, Portsmouth University, Portsmouth, UK
[email protected], [email protected]
2 School of Chemical Engineering, Southampton University, Southampton, UK
3 Energy and Electronic Engineering, Portsmouth University, Portsmouth, UK
4 InTandem Systems Ltd., Watton Lane, Southampton, UK
Abstract. Hazard assessment techniques and multi-sensor fusion are used for intelligent systematic monitoring. Firstly, a hazard identification technique is considered using failure mode and effect analysis, and the advantages of using a combined hazard technique are discussed. Data sources are identified considering component failures and some sensors associated with potential failure. Possible consequences in a hazardous situation are identified using failure mode and effect analysis to choose suitable safety measures. Failure mode and effect analysis systematically considers how sequences of events can lead to accidents by looking at components, faults recorded by sensors and anomalies. Data were presented based on their threat levels using a traffic light color code system. Refineries use sensors to observe the process of crude refining, and the monitoring system uses real-time data to access information provided by sensors. Understanding hazard assessments, multi-sensor fusion and sensor pattern recognition in a distillation column could help to identify trends, flag major regions of growing malfunction, model the risk threat of a crude distillation column and help to make decisions systematically. The decisions could improve design regulations, eliminate anomalies, improve monitoring and reduce threat levels.

Keywords: Hazard · Sensors · Crude · Distillation · Sensor · Intelligent · Monitoring
1 Introduction

In gas and oil industries, sensors are used to constantly monitor situations to reduce the likelihood of disasters. A refinery operates in a closed uniform environment that contains a mixture of varying pressure and temperature conditions. This could result in catastrophic events if not monitored prudently. Monitoring the numerous sensors in a refinery can be time-consuming and labour intensive, and even with periodic maintenance, faults might not be detected [1–3].

© Springer Nature Switzerland AG 2021 K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 730–741, 2021. https://doi.org/10.1007/978-3-030-55190-2_61
Assessing the safety of a complex refinery involves the handling of historic data, fuzzy risk parameters and incomplete models of risk. A quantitative or qualitative approach to safety could be used, but to achieve that assessment it may be possible to categorize risk elements that might cause events leading to system failure [4]. Analysing risk involves logical reasoning about cause and effect [5]. Most models for analysing risk use quantitative approaches, that is, mathematical approaches that quantify risk in terms of consequences and their likelihood of occurring [6]. Risk evaluation is important in studying the risk of hazards. Durrant-Whyte [7] emphasized a need for machine learning and data mining techniques to extract knowledge/information from gas and oil industry data. Data mining applications have been described in [8–11]. Detecting faults within refinery sensors is important because:
• It helps with making decisions about whether production processes should be halted before situations become critical, even if significant information may be missing.
• It can help operators to decide on maintenance needs.
• Detecting repeated similar faults could provide information about sensor quality and could lead to improved design.
2 Hazard Identification Techniques

Some commonly used procedures for identifying hazards, and for the evaluation and analysis of risk, are described here.

2.1 Failure Mode Effect and Analysis

Failure Mode Effect and Analysis (FMEA) is an effective systematic technique for assessing and organizing possible failures in a design or process by itemizing a process and then assigning a risk priority to each item (Table 1). FMEA has been used for reliability and safety assessment of system components [12]. It prioritizes failure against severity, detectability and occurrence:
• Severity – How serious the consequence of failure is.
• Detectability – How difficult it is to detect failure.
• Occurrence – How often a failure occurs.
In the work described in this paper, FMEA was used with other safety techniques.

2.2 Fault Tree Analysis

A Fault Tree is a diagram of the interrelationships between causes or modes of failure that could precede undesired events. Fault Tree Analysis (FTA) identifies risk and can determine the probability or level of risk that an undesirable event might occur. That analysis involves sequencing failure interrelationships [13]. FTA was applied to a section of a crude distillation column; Fig. 1 shows the basic events identified and the causes that led to the top event.
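How a fault tree turns basic-event probabilities into a top-event probability can be sketched with two gate functions. This is a minimal illustration, assuming independent basic events; the tree structure, the event names and every probability below are hypothetical and are not taken from Fig. 1.

```python
from math import prod

def or_gate(probabilities):
    """Probability that at least one independent input event occurs."""
    return 1.0 - prod(1.0 - p for p in probabilities)

def and_gate(probabilities):
    """Probability that all independent input events occur together."""
    return prod(probabilities)

# Hypothetical basic-event probabilities for a pipeline leak (illustrative only).
p_corrosion = 0.02
p_crack = 0.01
p_overpressure = 0.05
p_relief_valve_fails = 0.10

# Intermediate event: overpressure only causes a burst if the relief valve also fails.
p_burst = and_gate([p_overpressure, p_relief_valve_fails])

# Top event: a leak occurs if any one of the three contributing causes occurs.
p_leak = or_gate([p_corrosion, p_crack, p_burst])
print(f"P(burst) = {p_burst:.4f}, P(leak) = {p_leak:.6f}")
```

The OR gate dominates here: the top-event probability is driven mostly by the most likely single cause, which is why fault trees are often used to decide where a preventive barrier gives the largest reduction.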
Table 1. Failure Mode Effect and Analysis (FMEA)

| Function | Failure Mode(s) | Effects of failure | Severity (S) | Causes of failure | Occurrence (O) | Design Control | Detectability (D) | RPN | Action |
|---|---|---|---|---|---|---|---|---|---|
| Leakage | Breakage, cracks, corrosion | Leak | 6 | Overpressure | 7 | Examine burst, validation of the pressure cycle | 2 | 84 | Test included in the prototype and production validation |

Column notes:
Failure Mode(s) – Input each failure mode and potential consequence(s).
Severity – Severity rate of each failure using a scale of 1–10 (10 = most severe).
Occurrence – Input potential cause(s) and likelihood of each failure using a scale of 1–10 (10 = most likely).
Detectability – Examine the current design, then rate the detectability using a scale of 1–10 for each failure mode (10 = least detectable).
Risk Priority Number – Combination of weighted Severity, Occurrence and Detectability. RPN = S × O × D
Action – Action taken/recommendations.
Fig. 1. Fault Tree Analysis of leak in a pipeline.
2.3 Event Tree Analysis

Event Tree Analysis (ETA) is a technique for assessing safety, especially after an abnormal function or accident. ETA is an inductive reasoning process. The process begins with an initiating event and produces an illustration used to analyze the effect of a set of undesirable events [14]. ETA establishes the frequency or probability of an accident. Suitable safeguarding measures can then be put in place to prevent, or at least mitigate, escalation following an undesired event [14]. An initiating event is defined depending on some initiating conditions (no/yes, false/true, failure/success) and then the consequences of the event are traced along the branches of the event tree [15].
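The branch-by-branch tracing described above can be sketched by enumerating the success/failure state of each safeguard. The initiating frequency, the barrier names and their success probabilities are hypothetical, and the sketch assumes every barrier is questioned on every branch (real event trees often prune branches where a later barrier is irrelevant).

```python
from itertools import product

def event_tree(initiating_frequency, barriers):
    """Return {barrier states: outcome frequency} for every success/failure combination.

    barriers is a list of (name, probability_of_success) pairs, in the order
    in which they are challenged after the initiating event.
    """
    outcomes = {}
    for states in product((True, False), repeat=len(barriers)):
        frequency = initiating_frequency
        for (_, p_success), worked in zip(barriers, states):
            frequency *= p_success if worked else 1.0 - p_success
        outcomes[states] = frequency
    return outcomes

# Hypothetical numbers: 0.1 leaks per year, two safeguards.
barriers = [("leak detected", 0.90), ("section isolated", 0.95)]
outcomes = event_tree(0.1, barriers)

for states, frequency in outcomes.items():
    label = ", ".join(f"{name}: {'yes' if ok else 'no'}"
                      for (name, _), ok in zip(barriers, states))
    print(f"{label} -> {frequency:.4f} per year")
```

Because each branch multiplies the initiating frequency by the branch probabilities, the outcome frequencies always sum back to the initiating frequency, which is a useful sanity check on any event tree.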
3 Using Combined Hazard Techniques

Bow-tie analysis can be helpful for combining hazard analysis techniques and identifying the risk found by both techniques.

3.1 Bowtie Analysis

Bowtie analysis is a probabilistic approach based on the consequences and causes of undesirable events [15]. It is a logical method that establishes relationships between cause and effect that can be used to mitigate, control and prevent accidents [16]. Bowtie analysis has been used to assess the risk, safety and reliability of complicated systems. It is a risk approach that graphically displays the relationships between a hazardous event, its causes and consequences, and the risk control barriers in place to stop the accident sequence (Fig. 2).
Fig. 2. Bow-tie analysis explaining the relationship between barriers, threat and consequences
3.2 Fault Tree and Event Tree Analysis

Investigating failure modes using event tree analysis (ETA) and fault tree analysis (FTA) has been an effective way of identifying failures. These might include human error or concurrent component failures. The combination allows for an analysis of cause and effect that can lead to technical breakdown or accident. The use of ETA and FTA in the new work described in this paper provided a thorough analysis of potential hazards and their potential causes, which helped improve overall performance. FTA and ETA are complementary (and they have often been used together), but they focus on opposite sides of an undesired event. Bow-tie analysis can link both methods together, and the new combination of FTA and ETA can help to determine a single undesired event that could lead to catastrophe. In reality, though, multiple causes can initially lead to different events, with each of them expanding into multiple consequences. FTA focuses on preventive measures and ETA on mitigation measures [17], as shown in Fig. 3.
Fig. 3. Combined hazard techniques using FTA and ETA.
Bow-tie analysis involved combining deductive and inductive techniques based on FTA and ETA. Table 2 shows a matrix of 50 years of crude distillation tower failures, with failure modes, effects and possible sensor/fault detection listed. A hazard technique using FMEA was applied to the data: occurrence, detectability and severity were each rated on a scale of 1–10 and used to calculate a risk priority number, and the results were presented using a colour coding system. Red represented a more catastrophic state, amber represented a critical operational margin and green represented an optimal operational condition. A Risk Priority Number (RPN) formula was used:

RPN = O × D × S (1)
The higher the RPN, the higher the risk to the system (components); the lower the RPN, the lower the likelihood of risk. The data in Table 2 show that abnormal operation incidents (start-up, shutdown, commissioning) represented the highest-ranked failure mode in a crude distillation column, while heat integration issues ranked lowest among the modes of failure for the system.
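Equation (1) and the traffic light coding can be sketched as follows. The O, D and S ratings for the four failure modes are taken from Table 2, but the colour thresholds (amber from 100, red from 200) are hypothetical: the paper does not state where the boundaries between green, amber and red lie.

```python
from dataclasses import dataclass

@dataclass
class FailureMode:
    name: str
    occurrence: int      # O, rated 1-10
    detectability: int   # D, rated 1-10
    severity: int        # S, rated 1-10

    @property
    def rpn(self) -> int:
        """Risk Priority Number, RPN = O x D x S."""
        return self.occurrence * self.detectability * self.severity

def traffic_light(rpn, amber_from=100, red_from=200):
    """Map an RPN to a colour; both thresholds are illustrative assumptions."""
    if rpn >= red_from:
        return "red"
    if rpn >= amber_from:
        return "amber"
    return "green"

modes = [
    FailureMode("Tower base", 8, 4, 6),
    FailureMode("Chemical explosions", 4, 9, 8),
    FailureMode("Water induced surge", 6, 3, 8),
    FailureMode("Heat integration issues", 1, 3, 6),
]

# Rank failure modes from highest to lowest risk priority.
ranked = sorted(modes, key=lambda mode: mode.rpn, reverse=True)
for mode in ranked:
    print(f"{mode.name}: RPN={mode.rpn} ({traffic_light(mode.rpn)})")
```

Keeping the thresholds as parameters makes it easy to tune the colour bands to a plant's own risk tolerance without touching the RPN calculation itself.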
4 Sensors and Sensor Fusion

Sensors are devices designed to receive a stimulus and respond with an electrical signal, which can be modified and amplified by electronic devices. Sensors are energy converters, and quantities that can be sensed include motion, displacement, force, strain, pressure, flow, sound, moisture, light intensity, radiation, temperature, chemical presence, velocity and acceleration. Different sensors can each provide information about a component reading and can also interact with other sensors so that, together, they provide a different data reading from that of an individual sensor [18]. White [19] defined multi-sensor fusion as a multi-level, multifaceted process handling the automatic detection, association, correlation, estimation, and combination of data and information from several sources. Klein [20] generalized the definition, stating that data could be provided either by a single source or by multiple sources. Both definitions were general and applied in different fields, including remote sensing. Sensor fusion data are sometimes difficult to pre-process, which requires careful treatment of data fusion algorithms in order to avoid counter-intuitive information [21]. Another problem that could arise in data fusion was calibration error, which could prevent data fusion from being used successfully [10].

4.1 Sensor Pattern Recognition

Research described in this paper investigated using Artificial Intelligence (AI) techniques for sensor fusion, especially pattern recognition. Pattern recognition depended mainly on trend data and data acquired from sensor information. Combinations of data from multiple sensors were considered. Pattern recognition and AI techniques can convert large amounts of real-time data into smaller pieces of information. That can allow complete system analysis and clustering methods to be used. The new proposed strategy is shown in Fig. 4. It aims to combine the potential failure modes of a system with various sensor faults, in order to reduce the likelihood of disaster [22], together with a scoring technique to analyze and identify systemic risk. The goal was to improve accuracy and reliability.
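One simple way to combine readings from several sensors into a single pattern score is sketched below. This is not the method used in the paper, which is not specified in detail; it substitutes a basic z-score fusion, and the sensor names, baseline values, new readings and alarm threshold are all hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def zscores(baseline, readings):
    """Express readings as deviations from the baseline, in baseline standard deviations."""
    mu, sigma = mean(baseline), stdev(baseline)
    return [(value - mu) / sigma for value in readings]

def fused_score(z_by_sensor):
    """Combine per-sensor z-scores at each time step into one Euclidean score."""
    return [sqrt(sum(z * z for z in step)) for step in zip(*z_by_sensor)]

# Hypothetical normal-operation history for two sensors on the same column section.
baseline_temperature = [350, 351, 349, 350, 352, 348]    # degrees C
baseline_pressure = [2.0, 2.1, 1.9, 2.0, 2.05, 1.95]     # bar

# Hypothetical new readings, one pair per time step.
new_temperature = [350, 351, 358]
new_pressure = [2.0, 2.05, 2.4]

scores = fused_score([
    zscores(baseline_temperature, new_temperature),
    zscores(baseline_pressure, new_pressure),
])

THRESHOLD = 3.0  # illustrative alarm level, not taken from the paper
flags = [score > THRESHOLD for score in scores]
print(list(zip([round(score, 2) for score in scores], flags)))
```

Normalizing each sensor against its own baseline before combining them lets readings in different units contribute to one score, so a joint drift in temperature and pressure stands out more clearly than either would alone.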
5 Machine Learning using Data Fusion

Sensor fusion was used within machine learning to build a prototype intelligent monitoring system. Combining multiple sensors could establish patterns in existing data from which a machine learning model could be trained. The resulting model was then used to recognize patterns in new data (Fig. 5).
Table 2. Crude distillation tower risk matrix.

Function (all rows): Tower column malfunction

| Failure Mode | Effect/Causes | Possible Control/Fault Detection Sensors | O | D | S | RPN | Action |
|---|---|---|---|---|---|---|---|
| Tower base | High and low base level, vapour misdistribution. | Through changes in temperature and pressure. | 8 | 4 | 6 | 192 | Prevent excessive tower base level, good sump design. |
| Re-boiler return | Faulty level measurement or control, excess re-boiler pressure drop. | Through change in level sensing and change in pressure. | 5 | 3 | 6 | 90 | Have frequent assessment checks, have a level sensor in place. |
| Damage to internal tower | Insufficient mechanical resistance, water induced pressure. | Abnormal operation, change in temperature and pressure readings. | 8 | 2 | 7 | 112 | The key prevention is to keep the water out. |
| Water induced surge | Undrained stripping steam lines, water in feed/slop. | Change in temperature and pressure. | 6 | 3 | 8 | 144 | Monitor water inlet, avoid abnormal operation. |
| Abnormal operation incidents (start-up, shutdown, commissioning) | Backflow, water removal from refinery fractionators, leads to plugging/coking and internal damage. | Flow transmitter and level sensing. | 8 | 4 | 8 | 364 | Implement hazard techniques (HAZOP) and conduct frequent safety audits. |
| Assembly mishaps | Incorrect assembly or error in packing. | X-ray, acoustic sensors. | 7 | 5 | 7 | 196 | Introduce/initiate systematic and thorough tower process inspection programs. |
| Packing distributors | Plugging and liquid overflow, poor irrigation quality, feed entry problems. | Level sensor, flow transmitter. | 6 | 4 | 7 | 168 | |
| Intermediate draws (includes chimney trays) | Leakage at the draw, restriction or vapour choke of draw line, plugging/coking, level measurement, vapour impingement. | Level sensor, leak detector. | 5 | 3 | 8 | 120 | Review the design of the intermediate draws, reduce pushing of towers to maximum capacities. |
| Re-boilers | Excess change in pressure in kettle circuit, forced circulation of re-boilers. | Pressure sensor. | 5 | 4 | 7 | 140 | Reduction and evaluation of troublesome re-boiler types. |
| Chemical explosions | Decompositions of compound, chemical release, violent reactions. | Pressure and temperature sensor. | 4 | 9 | 8 | 288 | Reduce excessive temperature and catalysis of metal or catalyst by air leaks. |
| Leaks | Heat exchangers (re-boiler tube, preheater, pump around exchanger condenser), column chemicals to atmosphere. | Temperature sensor. | 3 | 7 | 3 | 63 | |
| Simulations | Poor VLE predictions. | Pressure sensor. | 3 | 6 | 5 | 90 | Understanding and having a good VLE prediction. |
| Condensers issues | Inadequate venting, inadequate condensate removal, flooding/entrainment in partial condenser. | Pressure sensor. | 3 | 2 | 6 | 36 | Ensure the vent is clean, well vented and draining. |
| Heat integration issues | High degree heat integration practiced (multifeed arrangements), preheaters, inter-reboilers and recycle loops. | Temperature sensor. | 1 | 3 | 6 | 18 | Bypass tower or preheater. |
Fig. 4. Sensor pattern recognition and failure mode identified.
Fig. 5. Machine-learning Patterns through sensor fusion.
6 Discussion and Conclusion

A methodology for intelligent monitoring using multi-sensor fusion for a crude distillation column was presented. The method explained the technique using simple input and output information and observing information. The idea was to provide a control solution using a binary matrix to predict trends in order to create an accurate monitoring system. Focusing on risk assessment methods applied to a distillation column helped with the identification of trends and of some regions where the likelihood of malfunctions may be growing. The history of malfunctions was studied, and future work may simulate column malfunctions to assist operators and engineers. Using combined hazard-modelling techniques to model the risk threat to a crude oil distillation column during operational disturbances could help to make suitable decisions systematically. That could help to improve design regulations and identify potential hazards, eventually leading to safe operation.

Table 1 identified the failure mode and effect analysis technique used. It was integrated into a practical example using 50 years of historical data (Table 2). FTA was introduced. Bow-tie analysis (Fig. 2) demonstrated how two hazard analysis techniques could be combined to create a more robust risk identification technique to improve fault detection and data reliability, and to eliminate any previous improper datasets.

Multi-sensor fusion improved observability by broadening the baseline of physical observables, and this could result in significant improvements. The use of a single sensor might not be effective in creating an intelligent monitoring system. Given the complexity of combining sensors to create a learning algorithm, it is easy for a data set to be mismatched. Multiple sensor fusion could help to establish a pattern from independent sensors and so form a systemic way of thinking. This method helped to improve pattern recognition, which could be implemented in machine learning. That could be used to detect or predict anomalies, predict faults and possibly predict catastrophes. A limitation was a concern with the accuracy and reliability improvements. Future work will investigate the effect of phase changes and material balance on sensor fusion performance, and the data will be fed into some intelligent control systems that have recently been used to control powered wheelchairs [23–27]. It will also be used with decision-making systems [28, 29] and decision-making algorithms that are currently being investigated [30–32].
References

1. Bartok, J., Habala, O., Bednar, P., Gazak, M., Hluchý, L.: Data mining and integration for predicting significant meteorological phenomena. Procedia Comput. Sci. 1, 37–46 (2010)
2. Chang, C.D., Wang, C.C., Jiang, B.C.: Using data mining techniques for multi-diseases prediction modeling of hypertension and hyperlipidemia by common risk factors. Expert Syst. Appl. 38(5), 5507–5513 (2011)
3. Mabrouki, C., Bentaleb, F., Mousrij, A.: A decision support methodology for risk management within a port terminal. Saf. Sci. 63, 124–132 (2014)
4. Slovic, P., Finucane, M.L., Peters, E., MacGregor, D.G.: Risk as analysis and risk as feelings: some thoughts about affect, reason, risk, and rationality. Risk Anal. 24(2), 311–322 (2004)
5. Deng, Y., Sadiq, R., Jiang, W., Tesfamariam, S.: Risk analysis in a linguistic environment: a fuzzy evidential reasoning-based approach. Expert Syst. Appl. 38(12), 15438–15446 (2011)
6. Mokhtari, K., Ren, J., Roberts, C., Wang, J.: Decision support framework for risk management on sea ports and terminals using fuzzy set theory and evidential reasoning approach. Expert Syst. Appl. 39(5), 5087–5103 (2012)
7. Durrant-Whyte, H.: Sensor models and multisensor integration. Int. J. Robot. Res. 7(6), 97–113 (1988). https://doi.org/10.1177/027836498800700608
8. Mohaghegh, S.D.: A new methodology for the identification of best practices in the oil and gas industry, using intelligent systems. J. Petrol. Sci. Eng. 49, 239–260 (2005)
9. Hajizadeh, E., Davari, A.H., Shahrabi, J.: Application of data mining techniques in stock markets: a survey. J. Econ. Int. Finance 2(7), 109–118 (2010)
10. Khaleghi, B., Khamis, A., Karray, F., Razavi, S.: Multisensor data fusion: a review of the state-of-the-art. Inf. Fusion 14(1), 28–44 (2013)
11. Ikwan, F.: Reducing energy losses and alleviating risk in petroleum engineering using decision making and alarm systems. J. Comput. Syst. Eng. 422–429 (2018). ISSN 1472-9083
12. Mandal, S., Maiti, J.: Risk analysis using FMEA: fuzzy similarity value and possibility theory based approach. Expert Syst. Appl. 41(7), 3527–3537 (2014)
13. Luo, R., Kay, M.: Multisensor integration and fusion in intelligent systems. IEEE Trans. Syst. Man Cybern. 19(5), 901–931 (1989). https://doi.org/10.1109/21.44007
14. Riahi, R., Robertson, I., Bonsall, S., Jenkinson, I., Wang, J.: A proposed methodology for assessing the reduction of a seafarer's performance with insufficient recuperative rest. J. Mar. Eng. Technol. 12(2), 11–28 (2013)
15. Ferdous, R., Khan, F., Sadiq, R., Amyotte, P., Veitch, B.: Handling data uncertainties in event tree analysis. Process Saf. Environ. Prot. 87(5), 283–292 (2009)
16. Lavasani, S.M.M.: Advanced quantitative risk assessment of offshore gas pipeline systems. Ph.D. thesis, School of Engineering and Maritime Operations, Liverpool John Moores University, UK (2010)
17. Selim, G., Seker, S.: Signal based approach for data mining in fault detection of induction motor. Sci. Res. Essays 6(22), 4720–4731 (2011)
18. Saybani, M.R., Wal, T.Y.: Applied data mining approach in ubiquitous world of air transportation. In: 4th International Conference on Computer Science Convergence Information Technology, ICCIT 2009, pp. 1218–1222 (2009)
19. White, F.E.: Data Fusion Lexicon, Joint Directors of Laboratories, Technical Panel for C3, Data Fusion Sub-Panel. Naval Ocean Systems Center, San Diego (1991)
20. Klein, L.A.: Sensor and Data Fusion Concepts and Applications, 2nd edn. Society of Photo-Optical Instrumentation Engineers (SPIE), Bellingham (1999)
21. Smets, P.: Analyzing the combination of conflicting belief functions. Inf. Fusion 8(4), 387–412 (2007)
22. Omoarebun, P.O.: Disaster risk reduction in petroleum engineering. J. Comput. Syst. Eng. (2018). ISSN 1472-9083
23. Sanders, D., Gegov, A., Ndzi, D.: Knowledge-based expert system using a set of rules to assist a tele-operated mobile robot. In: Bi, Y., Kapoor, S., Bhatia, R. (eds.) Studies in Computational Intelligence, vol. 751, pp. 371–392. Springer (2018)
24. Sanders, D., Okonor, O.M., Langner, M., Hassan Sayed, M., Khaustov, S.A., Omoarebun, P.: Using a simple expert system to assist a powered wheelchair user. In: Bi, Y., Bhatia, R., Kapoor, S. (eds.) Advances in Intelligent Systems and Computing, vol. 1037, pp. 662–379. Springer
25. Gegov, A., Gobalakrishnan, N., Sanders, D.A.: Rule base compression in fuzzy systems by filtration of non-monotonic rules. J. Intell. Fuzzy Syst. 27(4), 2029–2043 (2014)
26. Sanders, D., Gegov, A., Haddad, M., Ikwan, F., Wiltshire, D., Tan, Y.C.: A rule-based expert system to decide on direction and speed of a powered wheelchair. In: IEEE Proceedings of the SAI Conference on Intelligent Systems, London, U.K., pp. 426–433 (2018)
27. Sanders, D., Gegov, A., Haddad, M., Ikwan, F., Wiltshire, D., Tan, Y.C.: A rule-based expert system to decide on direction and speed of a powered wheelchair. In: Arai, K., Kapoor, S., Bhatia, R. (eds.) Advances in Intelligent Systems and Computing, vol. 868, pp. 822–838. Springer
28. Sanders, D.: Using self-reliance factors to decide how to share control between human powered wheelchair drivers and ultrasonic sensors. IEEE Trans. Neural Syst. Rehabil. Eng. 25(8), 1221–1229 (2017)
29. Haddad, M., Sanders, D., Bausch, N., Tewkesbury, G., Gegov, A., Hassan Sayed, M.: Learning to make intelligent decisions using an Expert System for the intelligent selection of either PROMETHEE II or the Analytical Hierarchy Process. In: Arai, K., Kapoor, S., Bhatia, R. (eds.) Advances in Intelligent Systems and Computing, vol. 868, pp. 1303–1316. Springer
30. Haddad, M.J.M., Sanders, D., Tewkesbury, G., Gegov, A., Hassan Sayed, M., Ikwan, F.: Initial results from using Preference Ranking Organization METHods for Enrichment of Evaluations to help steer a powered wheelchair. In: Bi, Y., Bhatia, R., Kapoor, S. (eds.) Advances in Intelligent Systems and Computing, vol. 1037, pp. 648–661. Springer
31. Sanders, D., Robinson, D.C., Hassan Sayed, M., Haddad, M.J.M., Gegov, A., Ahmed, N.: Making decisions about saving energy in compressed air systems using ambient intelligence and artificial intelligence. In: Arai, K., Kapoor, S., Bhatia, R. (eds.) Advances in Intelligent Systems and Computing, vol. 869, pp. 1229–1236. Springer
32. Sanders, D., Robinson, D.C., Hassan Sayed, M., Haddad, M.J.M., Gegov, A., Ahmed, N.: Making decisions about saving energy in compressed air systems using ambient intelligence and artificial intelligence. In: Arai, K., Kapoor, S., Bhatia, R. (eds.) Advances in Intelligent Systems and Computing, vol. 869, pp. 1229–1236. Springer
Factors Affecting the Organizational Readiness to Design Autonomous Machine Systems: Towards an Evaluation Framework

Valtteri Vuorimaa1, Eetu Heikkilä1(B), Hannu Karvonen1, Kari Koskinen2, and Jouko Laitinen2

1 VTT Technical Research Centre of Finland Ltd., P.O. Box 1300, 33101 Tampere, Finland
[email protected]
2 Tampere University, P.O. Box 589, 33014 Tampere, Finland
Abstract. Increasing autonomy of machine systems is a major trend in industry. It is foreseen to improve efficiency and safety of operations, but also to introduce new kinds of socio-technical challenges and risks. To address these challenges, development organizations need a set of capabilities, processes and tools to achieve the needed readiness for designing autonomous machine systems (AMS). In this study, the key challenges, design approaches and proposed solutions were studied by performing a series of semi-structured interviews with companies working in the domain of mobile machines that are in different phases of development and deployment of AMS. Based on literature and interview findings, a set of factors affecting the organizational readiness to design AMS were identified. These factors can be used within organizations to perform assessments of their capability in design-related aspects of AMS, and further to develop an evaluation framework or tool for more comprehensive organizational assessments.

Keywords: Autonomous systems · Engineering and design · Readiness
1 Introduction

The shift towards increasingly autonomous machine systems (AMS) is a global trend in several industries. The applications range from autonomous road vehicles and drones to various specialized machine systems used, for example, in the construction industry and in logistics operations. Increased autonomy, when implemented in a well-designed manner, is expected to improve both the safety and the efficiency of operations. On the other hand, the new intelligent technologies and their interactions with humans and the surrounding infrastructure introduce new risks and challenges. Because AMS are complex socio-technical systems, their development requires a combination of various capabilities, processes and tools from the organizations involved. In previous research, however, these readiness factors have not been studied in detail. Thus, companies may find it challenging to form an understanding of their organizational readiness and to identify capability gaps regarding the development of AMS. In this paper, we take the first steps towards structuring the organizational readiness factors that are necessary to develop AMS.

© Springer Nature Switzerland AG 2021 K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 742–747, 2021. https://doi.org/10.1007/978-3-030-55190-2_62
1.1 Autonomous Machine Systems Development

There is no single agreed definition for the concept of autonomy. In most cases, however, autonomy is defined as the combination of achieving goals and operating independently [1]. Therefore, an autonomous system should have the ability to perceive its environment, reason, and make decisions based on its knowledge [2]. In AMS, these abilities are implemented with sophisticated use of software and hardware. Perception, planning and control are core functions of an AMS [3, 4]. Software and algorithm development aims to enhance these functions to the extent made possible by advances in computing power and sensing technologies. In industry, the terms autonomy and automation are commonly used interchangeably. The level of autonomy describes where an AMS lies on the range between being completely dependent on human intervention and not dependent on it at all. Thus, various categorizations are used to define the level of autonomy or automation, depending on the application area and related terminology. As an example, the categorization adapted from SAE-J3016 [5] describes six levels of driving automation (Table 1). Similar levels are expected also in industrial machinery.

Table 1. Categorization for levels of automation in road vehicles (adapted from [5]).

Level | Description
Level 0 | No automation. Driver is solely responsible for the driving task
Level 1 | Driver assistance. System assists the driver with driving modes, leaving the driver responsible for steering and acceleration
Level 2 | Partial automation. System is capable of steering and acceleration by using environmental information, but driver is responsible for monitoring the environment
Level 3 | Conditional automation. System executes the driving task, but driver is responsible for intervening appropriately when required
Level 4 | High automation. System is capable of executing the driving task without requiring human intervention
Level 5 | Full automation. System has full autonomy and is capable of executing the same driving tasks as a human driver in all conditions
Organizations that aim to develop AMS need a combination of capabilities to be able to construct systems at the desired levels of autonomy. In the literature, several views on organizational readiness have recently been studied, especially from the point of view of adopting digital technologies and digital transformation in general (e.g. [6, 7]), as well as in relation to specific technologies such as artificial intelligence [8]. Regarding development process practices for autonomous systems, however, virtually no information is available. Thus, there is a need to structure the aspects that are important in AMS development to support companies in assessing and improving their readiness for implementing AMS. The scope of this paper is set so that we focus specifically on the technical development of AMS, and exclude aspects related to social acceptance or ethics of AMS. As a result, we propose a set of organizational factors that are necessary for an organization to possess sufficient readiness to design AMS.
2 Methods

The key challenges and factors affecting the readiness to design AMS were studied through a combination of a literature study and interviews with companies that are in different phases of development and deployment of AMS. Specifically, a series of semi-structured interviews was conducted with eight original equipment manufacturing (OEM) and system integrator companies that are mostly active within the field of mobile heavy machinery. The interviewees were company representatives with a good general view of their company’s direction towards increased autonomy, as well as of the overall developments in industry (i.e. R&D managers and corresponding roles). Prior to the interviews, the relevant research problems and main phenomena were identified based on a literature study to build a thematic interview framework [9]. The purpose of the interview framework was to structure the interview topics, and to help the interviewer explore in more depth any interesting subject raised by the interviewee. The theme sections only provided a structure for the interview, while giving the interviewees freedom to present their personal opinions on the given subjects. The findings of this qualitative research were structured following the process of thematic analysis [10].
3 Results and Discussion

3.1 Design Challenges in Autonomous Machine Systems Development

In the literature and interview studies, a number of design challenges were identified. In traditional human-machine collaboration, the human is typically responsible for sensing and thinking, whereas the machine performs the actions. As the human takes less part in the collaboration, the responsibility for sensing and thinking is delegated to the machine. The autonomous properties of a machine increase its complexity, thus creating new requirements for the design. The design challenges of an AMS are diverse: from a technology point of view, the challenges range from the component level to the system level and extend into various dimensions, such as reliability, safety and security [9]. When designed and implemented ideally, AMS are likely to be safer than human-operated machines. However, the uncertainty of possible hazards may cause the system to operate in an undesired way, leading to unsafe operation. For example, cyber security issues, such as spoofing (incorrect input signals), can trick a controller into operating incorrectly [11]. Requirements management faces the challenge of transforming the high-level requirements to the system and component levels, while also preparing for unforeseen situations. Describing the multi-technological system and its subsystems, and integrating existing technologies based on what is required from the AMS, is a complex task. To develop such a complex system for field operation, extensive testing, verification and validation are required. For example, testing a decision-making system in various situations is difficult, since an unforeseen situation may remain outside the scope of the test scenarios.
Still, the AMS should operate safely even when facing an uncertain situation. The more complex the AMS operation and environmental conditions are, and the more independent the system is of human intervention, the more difficult the design is.

Table 2. Organizational readiness elements and factors affecting the readiness to develop AMS (adapted from [9]).

Readiness element: organizational factors needed to develop AMS

Operational principle of the AMS:
– Ability to conceptualize new operations at different levels of autonomy and to evaluate the benefits of autonomy
– Understanding of the technological requirements and maturity of the available technologies for implementing autonomous functions
– Ability to identify new business models enabled by autonomy, including aspects like maintenance, data management and cyber security related services

Design practices and competences:
– Active cooperation of different engineering disciplines within the organization
– Well-guided and clearly documented design process management to synchronize design tasks
– Clearly defined interfaces between engineering systems
– Cross-skilled team with the capability to design relevant autonomous functions
– Agile and modular software design approaches
– Model-based design approach to manage complex design tasks
– Defined systematic approach for requirements modeling and management
– Defined practices and competence for data acquisition, management and analysis
– Co-operation with end users to identify valuable data
– Competence for human-machine interaction (HMI) in design
– Design capability of the infrastructure that supports AMS operation

Digital design tools and practices:
– Availability of digital V&V practices for simulating operational scenarios
– Ability to utilize data from relevant sources, such as field tests and previous installations
– Capability for simulating operational scenarios that are not feasible for field testing
– Utilization of simulations of varying types as relevant for the application (ranging from simple to detailed simulations and digital twin simulations)
– Knowledge of and access to the necessary software tools

Physical testing practices:
– Facilities for physical testing of operating scenarios that are not feasible for digital testing, such as various environmental conditions
– Ability to define relevant test scenarios at component and system levels
– Data collection and processing capability from field tests
– Procedures for data validation between physical tests and simulations

Partnerships and ecosystems:
– Competences that are not present within the organization are available through ecosystems or partnerships
– Development ecosystems share common design goals and defined methods for co-operation and communications
– Longevity of ecosystems is ensured to create impact
3.2 Organizational Readiness for Designing AMS

The vast number of design challenges may hinder the industry’s ability to harness the benefits of new autonomous technologies. In many cases, it may be especially challenging to assess the overall ability to design a particular autonomous system or some specific part of such a system. This may lead to difficulties in decision-making regarding the choice of technologies and the level of autonomy to be pursued. In some industries, various readiness evaluation indices have been proposed for assessing the ability to introduce new practices and technologies. For instance, the Singapore Economic Development Board has launched an evaluation framework for digital developments in the manufacturing industry, describing a combination of factors affecting an organization’s readiness for Industry 4.0 [7]. Based on the challenges and solution proposals discovered both in the literature and in the interviews, key factors affecting the readiness to design AMS were identified. A listing of the identified factors is presented in Table 2. The listing is structured into five high-level organizational readiness elements that were found necessary for AMS development. More detailed organizational factors were also identified within each of these elements. Each factor is an organizational characteristic that increases the organization’s ability to develop AMS, and whose absence hinders development.
4 Conclusions

As such, the identified readiness factors presented in Table 2 provide a general overview of the organizational aspects that are needed to develop AMS, and reflect the complexity of the design tasks related to AMS development. It should be noted that the factors identified in this study are mostly from the perspective of mobile industrial machinery. However, many of the aspects are applicable to other types of industries as well. In the future, the scope can be expanded by performing specific studies on other domains, such as autonomous passenger transport systems or drones. As future work, the identified readiness factors can be developed further into an evaluation framework. The framework can include ratings for each factor, e.g. as a simple yes/no questionnaire or on a numeric scale. The framework can also be implemented as a tool that organizations can use for self-assessment of their operations. In such a tool, a graphical representation can be used to visualize the strengths and weaknesses of the organization to support decision-making and improvement actions.
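As a rough illustration of the yes/no questionnaire idea, the following minimal sketch aggregates answers per readiness element and prints a simple text bar for each. The function names, the choice of the share of "yes" answers as the rating, and all answers are assumptions made only for this illustration; they are not part of the study.

```python
from typing import Dict, List

# Hypothetical yes/no answers, one per factor of a readiness element in Table 2.
# The answers are made-up illustration data, not results from the study.
answers: Dict[str, List[bool]] = {
    "Operational principle of the AMS": [True, True, False],
    "Physical testing practices": [True, False, False, False],
    "Partnerships and ecosystems": [True, True, True],
}


def element_scores(answers: Dict[str, List[bool]]) -> Dict[str, float]:
    """Share of factors answered 'yes' within each readiness element."""
    return {name: sum(flags) / len(flags) for name, flags in answers.items()}


def text_bars(scores: Dict[str, float], width: int = 20) -> List[str]:
    """Render each element score as a simple text bar (assumed visualization)."""
    lines = []
    for name, score in scores.items():
        bar = "#" * round(score * width)
        lines.append(f"{name:<34} {bar:<{width}} {score:.0%}")
    return lines


scores = element_scores(answers)
for line in text_bars(scores):
    print(line)
```

A numeric scale could be handled the same way by replacing the boolean answers with ratings and averaging them per element.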
References

1. Fong, T.: Autonomous Systems NASA Capability Overview (2018). https://www.nasa.gov/sites/default/files/atoms/files/nac_tie_aug2018_tfong_tagged.pdf
2. Baudin, É., Blanquart, J., Guiochet, J., Powell, D.: Independent safety systems for autonomy: state of the art and future directions. Technical Report 07710, LAAS-CNRS (2007)
3. Amer, N., Zamzuri, H., Hudha, K., Kadir, Z.: Modelling and control strategies in path tracking control for autonomous ground vehicles: a review of state of the art and challenges. J. Intell. Robot. Syst. 86, 225–254 (2017)
4. Pendleton, S., Du, X., Shen, X., Andersen, H., Meghjani, M., Eng, Y., Rus, D., Ang, M.: Perception, planning, control, and coordination for autonomous vehicles. Machines 5(1), 6 (2017)
5. SAE-J3016: Taxonomy and Definitions for Terms Related to Driving Automation Systems for On-Road Motor Vehicles. SAE International (2018)
6. Lokuge, S., Sedera, D., Grover, V., Dongming, X.: Organizational readiness for digital innovation: development and empirical calibration of a construct. Inf. Manag. 56(3), 445–461 (2019)
7. Singapore Economic Development Board: The Singapore Smart Industry Readiness Index: Catalysing the transformation of manufacturing. EDB Singapore White Paper (2017)
8. Pumplun, L., Tauchert, C., Heidt, M.: A new organizational chassis for artificial intelligence exploring organizational readiness factors. In: Proceedings of the 27th European Conference on Information Systems (ECIS), Stockholm & Uppsala, Sweden, 8–14 June 2019, pp. 1–15 (2019)
9. Vuorimaa, V.: Readiness assessment of engineering practices for designing autonomous industrial mobile machinery. M.Sc. (Tech.) thesis. Tampere University (2019)
10. Thematic Analysis: University of Jyväskylä (2010). https://koppa.jyu.fi/avoimet/hum/menetelmapolkuja/en/methodmap/data-analysis/thematic-analysis
11. Yağdereli, E., Gemci, C., Aktaş, A.: A study on cyber-security of autonomous and unmanned vehicles. J. Defense Model. Simul.: Appl. Methodol. Technol. 12(4), 369–381 (2015)
RADAR: Fast Approximate Reverse Rank Queries

Sourav Dutta(B)

Huawei Research Centre, Dublin, Ireland
[email protected]
Abstract. Reverse k-rank queries, for a given query item, extract the top-k users whose ranking of the item (based on the individual user’s preferences) is best among all the users. Such queries have recently received significant interest in diverse applications like market analysis, product placement, sales and e-commerce. Current approaches employ efficient high-dimensional data indexing techniques to prune input data points for improving query run-time. However, they fail to provide practical run-time characteristics for online and streaming scenarios, typical in such applications. This paper proposes the RADAR algorithm to efficiently compute, in real-time, approximate reverse k-rank queries. RADAR sorts the input data on each dimension (i.e., item aspects), and utilizes the ranking of a query in each of the dimensions to approximate the final ranking of the query by users based on their preferences. Empirical evaluations on real datasets demonstrate up to 50× run-time improvements over existing approaches, with a high accuracy of around 90%. Further, experiments on synthetic datasets showcase the scalability and efficacy of our algorithm for large-scale and high-dimensional datasets.

Keywords: Reverse k-rank query · Approximate ranking · Linear combination · High dimensionality · Real-time computation

1 Introduction
Ranking and rank-aware queries (e.g., k-NN [11], reverse k-NN [12], top-k [15]) have been studied as fundamental operators in database management systems and query processing. Such queries extract the top-k records depicting high/low ranks (or scores) based on user-defined scoring functions [15,18]. The top-k and reverse k-rank queries broadly correspond to two different kinds of view-models. The top-k query (under a linear model consideration) provides a “user-view” to retrieve the top-k products/items best matching the user preferences. On the other hand, reverse-k queries generate a “manufacturer-view” for identifying prospective customers by considering user preferences that enable a product to match the query condition. The reverse top-k and reverse k-rank queries have recently been proposed for applications like marketing, product advertising, and recommendations [14,21].

© Springer Nature Switzerland AG 2021 K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 748–757, 2021. https://doi.org/10.1007/978-3-030-55190-2_63
Specifically, the reverse top-k query returns the users for whom the given product is ranked within their top-k list [15]. However, it fails to capture relevant users for less popular, niche, or long-tailed products (that might be absent from every top-k list). Hence, to ensure full coverage, which the reverse top-k query does not guarantee, the reverse k-rank query was proposed [21] to extract the top-k users with the best rankings (or scores) of the item, based on their preferences.

Motivation. Existing algorithms [5,21] for reverse k-rank queries employ high-dimensional indexing techniques such as R-trees or Grids to extract the exact k users with the best scores (for a query item) by efficiently pruning candidates from the answer set, thereby reducing run-time computations. However, such approaches fail to cater to the real-time necessities of modern streaming and online applications (e.g., location-based advertising) on high-dimensional data. Further, such applications operate under a trade-off between real-time response (a few msec) and a tolerable error rate. To this end, we propose the RADAR algorithm for efficient real-time retrieval of approximate k users (with high accuracy) for reverse k-rank queries.

Problem Statement. Consider $P = \{p_1, p_2, \ldots, p_n\}$ to be a collection of $n$ data points (e.g., products) with each point $p_i = \{p_i^1, p_i^2, \ldots, p_i^d\}$ represented as a $d$-dimensional vector, where $p_i^j$ captures the value of the $j$-th dimension (e.g., attribute or aspect) of item $p_i$. Let $W = \{w_1, w_2, \ldots, w_m\}$ be the set of $m$ users with each user $w_i = \{w_i^1, w_i^2, \ldots, w_i^d\}$ also represented as a $d$-dimensional vector, where $w_i^j$ defines user $w_i$’s preference (or weight) for the $j$-th dimension. Individual user weights are assumed to be normalized, i.e., $\forall i$, $\sum_{j=1}^{d} w_i^j = 1$. The score assigned by user $w_a$ to data point $p_b$ is defined by $Sc(w_a, p_b) = \sum_{j=1}^{d} p_b^j \cdot w_a^j$.

For each user, the data points are considered to be ranked in decreasing order of their scores $Sc(\cdot)$, i.e., a higher score depicts a closer match with the user preferences and is hence placed higher in the ranked list (e.g., the item with the highest score is placed at the top with rank 1). Let $rank(w_a, p_b)$ denote the number of products (points) that user $w_a$ ranks ahead of $p_b$, i.e., that have a higher score than $p_b$ [21]. Thus, the reverse k-rank query extracts the top-k users in $W$ with the smallest $rank(\cdot, q)$ value in $P$ for a $d$-dimensional query item $q$.

Contributions. This paper proposes RADAR, the first algorithm (to the best of our knowledge) to extract approximate top-k users for reverse k-rank queries. RADAR utilizes a linear combination of the ranks of data points in each dimension and the user preference weights to approximate the final ranking of a product by a user. This enables our approach to exhibit high run-time efficiency by reducing the number of computations (compared to state-of-the-art approaches). Experimentally we showcase, on real and synthetic data, that RADAR is extremely efficient for real-time needs on large datasets and is highly accurate even for high-dimensional data.
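The score, rank, and exact reverse k-rank definitions in the problem statement can be sketched in a few lines. This is a minimal brute-force illustration rather than any of the cited algorithms; the function names, the tie-breaking by user index, and the toy data are assumptions made for this sketch.

```python
import heapq
from typing import List, Sequence


def score(w: Sequence[float], p: Sequence[float]) -> float:
    """Sc(w, p): weighted sum of the point's attribute values."""
    return sum(wj * pj for wj, pj in zip(w, p))


def rank(w: Sequence[float], q: Sequence[float],
         points: Sequence[Sequence[float]]) -> int:
    """Number of points that user w scores strictly higher than q."""
    sq = score(w, q)
    return sum(1 for p in points if score(w, p) > sq)


def reverse_k_rank(users, points, q, k: int) -> List[int]:
    """Indices of the k users with the smallest rank(w, q).

    Ties are broken by user index (an assumption of this sketch).
    """
    ranked = [(rank(w, q, points), i) for i, w in enumerate(users)]
    return [i for _, i in heapq.nsmallest(k, ranked)]


# Toy data, chosen only for illustration.
points = [(1.0, 6.0), (5.0, 1.0), (3.0, 3.0)]
users = [(0.5, 0.5), (0.75, 0.25), (0.25, 0.75)]
q = (4.0, 3.5)

print([rank(w, q, points) for w in users])
print(reverse_k_rank(users, points, q, k=1))
```

Each query costs one score computation per user-point pair in this brute-force form, which is the cost that index-based and approximate methods try to avoid.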
2 The RADAR Algorithm

This section presents the working of the proposed RAnking on Dimensions for Approximate Reverse k-Rank (RADAR) algorithm for efficient high-dimensional approximate reverse k-rank queries. From the above discussion, recall that $P = \{p_1, p_2, \ldots, p_n\}$ is the input set of $n$ $d$-dimensional data points, and $W = \{w_1, w_2, \ldots, w_m\}$ is the set of $m$ users where each user is represented as a $d$-dimensional preference vector. A score $Sc(\cdot, \cdot)$ is assigned by each user to the individual data points (based on user preferences), and $rank(w_a, p_b)$ denotes the number of points that user $w_a$ ranks ahead of $p_b$. In this setting, we can observe the following:

Observation 1. Consider two $d$-dimensional data points, $p_i = \{p_i^1, \ldots, p_i^d\}$ and $p_j = \{p_j^1, \ldots, p_j^d\}$. If $p_i^l > p_j^l$ for all dimensions ($l \in [1, d]$), $p_i$ will be positioned higher in the ranked list than $p_j$ by all users, as $Sc(w, p_i) = \sum_{l=1}^{d} p_i^l w^l > Sc(w, p_j) = \sum_{l=1}^{d} p_j^l w^l$ for any user weight vector $w$. Hence, the value of $rank(\cdot, p_i)$ will be lower than $rank(\cdot, p_j)$. Observe that the reverse also holds, i.e., for $p_i^l < p_j^l$ ($\forall l \in [1, d]$), $rank(\cdot, p_i) > rank(\cdot, p_j)$.

Considering a point $p_i$, we now define $\sigma[p_i^l]$ to be the rank() of point $p_i$, with respect to the input set $P$, based on the values of the $l$-th attribute (dimension) of the points. Hence, we can observe:

Observation 2. In Observation 1, for points $p_i$ and $p_j$, if $p_i^l > p_j^l$ ($\forall l \in [1, d]$), we have $rank(\cdot, p_i) < rank(\cdot, p_j)$. Thus, for $p_i^l > p_j^l$, the rank() of $p_i$ based on the $l$-th dimension, $\sigma[p_i^l] = rank(\cdot, p_i^l)$, is lower than $\sigma[p_j^l] = rank(\cdot, p_j^l)$, since $Sc(\cdot, p_i^l) > Sc(\cdot, p_j^l)$. Hence, for $p_i^l > p_j^l$, we have $\sigma[p_i^l] < \sigma[p_j^l]$ ($\forall l \in [1, d]$). Thus, $\sum_{l=1}^{d} \sigma[p_i^l] w^l < \sum_{l=1}^{d} \sigma[p_j^l] w^l$ for any user preference vector $w$. Similarly, observe that the reverse condition also holds.
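Observations 1 and 2 can be checked numerically on a small example. In this minimal sketch, `sigma` is taken to be the 1-based position of a value in the descending order of its dimension, which is an assumption consistent with the definition above; the helper names and the toy data are illustrative only.

```python
from typing import List, Sequence


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """True if p is strictly larger than q in every dimension."""
    return all(a > b for a, b in zip(p, q))


def score(w: Sequence[float], p: Sequence[float]) -> float:
    return sum(wl * pl for wl, pl in zip(w, p))


def sigma(points: Sequence[Sequence[float]], p: Sequence[float]) -> List[int]:
    """Per-dimension rank of p: 1 + number of points with a larger value."""
    return [1 + sum(1 for o in points if o[l] > p[l]) for l in range(len(p))]


def combined(w: Sequence[float], ranks: Sequence[int]) -> float:
    """Linear combination of per-dimension ranks and user weights."""
    return sum(wl * rl for wl, rl in zip(w, ranks))


# Toy data: points[0] dominates points[1] but not points[2].
points = [(9, 8, 7), (5, 6, 4), (6, 1, 9)]
w = (0.2, 0.3, 0.5)

p_i, p_j = points[0], points[1]
print(dominates(p_i, p_j))                      # dominance precondition
print(score(w, p_i) > score(w, p_j))            # Observation 1
print(combined(w, sigma(points, p_i))
      < combined(w, sigma(points, p_j)))        # Observation 2
```

For pairs without dominance, such as `points[0]` and `points[2]`, neither observation guarantees an ordering, which is why the rank combination is only an approximation in general.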
Observations 1 and 2 provide the major intuitions and working principles for RADAR: the linear combination of user preference weights and the ranking of data points along the dimensions exhibits similar behaviour (in certain cases) to the score-based ranking, and can thus be used as a proxy for the rank() of a data item. This enables RADAR to efficiently answer approximate reverse k-rank queries, using the following operational stages.

(1) Pre-Processing. As a pre-processing step, RADAR initially sorts (in decreasing order) the input data points in $P$ based on their values along each of the $d$ dimensions, to generate $d$ rank lists $R = \{r_1, r_2, \ldots, r_d\}$. Specifically, rank list $r_i$ contains the sorted ordering of the values in the $i$-th dimension of the data points in $P$.

(2) Query Processing. Given a $d$-dimensional query point $q = \{q^1, q^2, \ldots, q^d\}$, RADAR computes the ranking of $q$ in each of the $d$ dimensions using the rank lists $R$, i.e., it computes the values $\sigma[q^i]$, $\forall i \in [1, d]$. To this end, $\sigma[q^i]$ is obtained via a binary search for $q^i$ in the pre-computed rank list $r_i$. Finally, the approximate rank value $\widehat{rank}(w, q)$ of $q$ for each user $w$ is computed using the linear combination $\widehat{rank}(w, q) = \sum_{i=1}^{d} \sigma[q^i] w^i$. A min-heap is then utilized to extract the top-k users with the lowest $\widehat{rank}(\cdot, q)$ values, as the output approximate reverse k-rank for $q$.

Fig. 1. Operational computation framework of the RADAR algorithm.

Example: Figure 1 presents the computational stages of our proposed algorithm along with a running example. Consider the set of input data points $P = \{A, B, C\}$ to consist of $n = 3$ data points, where each point is 3-dimensional and is defined as $A = \{7, 40, 11\}$, $B = \{5, 3, 63\}$ and $C = \{12, 5, 34\}$. Also, consider $W = \{X, Y, Z\}$ ($m = 3$) as the input set of three users with their preference (weight) vectors $X = \{0.7, 0.2, 0.1\}$, $Y = \{0.5, 0.3, 0.2\}$ and $Z = \{0.3, 0.5, 0.2\}$, respectively, for the $d$ dimensions. RADAR initially pre-computes the 3 rank lists $R = \{r_1, r_2, r_3\}$ based on the sorted values (in descending order) of the data points in $P$ along each of the 3 dimensions. Hence, $r_1 = \{12, 7, 5\}$, $r_2 = \{40, 5, 3\}$ and $r_3 = \{63, 34, 11\}$. Given a query $q = \{23, 1, 31\}$, we next obtain the ranking of the query point along each dimension ($\sigma[q^i]$) using binary search on the rank lists in $R$. Thus we have $\sigma[q^1 = 23] = 1$, $\sigma[q^2 = 1] = 4$ and $\sigma[q^3 = 31] = 3$. RADAR then computes the approximate ranking of the query by each of the users based on the combination of user preferences and the query. Thus, for user $X$, we have $\widehat{rank}(X, q) = \sum_{j=1}^{3} \sigma[q^j] X^j = 1 \cdot 0.7 + 4 \cdot 0.2 + 3 \cdot 0.1 = 1.8$. Similarly, we compute $\widehat{rank}(Y, q) = 1 \cdot 0.5 + 4 \cdot 0.3 + 3 \cdot 0.2 = 2.3$ and $\widehat{rank}(Z, q) = 2.9$. Finally, user $X$, having the minimum $\widehat{rank}(\cdot)$ value, is returned as the reverse 1-rank for $q$.

Observe that naïve computation of the reverse 1-rank for $q$ entails computation of the score $Sc(\cdot, \cdot)$ for each point to obtain $rank(\cdot, q)$ for each user. Thus, in our example, for user $X$ and the points in $P$, we have $Sc(X, A) = 0.7 \cdot 7 + 0.2 \cdot 40 + 0.1 \cdot 11 = 14$, $Sc(X, B) = 10.4$, $Sc(X, C) = 12.8$ and $Sc(X, q) = 19.4$, with $rank(X, q) = 0$ (i.e., no point with a higher score). Similarly, we obtain $rank(Y, q) = 0$ (as $Sc(Y, q) = 18.0$ exceeds the highest point score, $Sc(Y, A) = 17.7$) and $rank(Z, q) = 2$. The smallest $rank(\cdot, q)$ is thus attained for users $X$ and $Y$, so the user $X$ reported by RADAR is a correct reverse 1-rank result (in this case), and we later present experimental results showcasing the high accuracy achieved.

Performance: RADAR is extremely efficient and can cater to the real-time necessities of modern applications. The pre-processing phase constructs $d$ sorted rank lists of the $n$ points (in $P$) along each of the $d$ dimensions, taking $O(dn \log n)$ time. During run-time, on arrival of the query, $d$ binary searches are performed within the pre-computed rank lists to obtain the rankings of the query along each of the dimensions, incurring $O(d \log n)$. Computation of the approximate user ranks $\widehat{rank}()$ takes $O(m)$ time, considering unit time for the linear combination operation. Finally, the min-heap procedure to obtain the final result takes $O(k \log m)$. Hence, the total run-time complexity of RADAR is $O(d \log n + k \log m + m) \approx O(m)$, i.e., linear in the number of users, providing significant improvements over state-of-the-art algorithms. The space complexity is $O(nd)$ for storing the $d$ rank lists over the data points $P$.
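The two stages described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the rank lists are stored in ascending order so that Python's `bisect` module can be used (the position in the descending list is derived from it), `heapq.nsmallest` stands in for the min-heap step, and the function names are hypothetical.

```python
import bisect
import heapq
from typing import List, Sequence, Tuple


def preprocess(points: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pre-processing: one sorted value list per dimension, O(d n log n)."""
    d = len(points[0])
    return [sorted(p[l] for p in points) for l in range(d)]


def sigma(rank_lists: List[List[float]], q: Sequence[float]) -> List[int]:
    """Per-dimension rank of q via binary search, O(d log n).

    The rank is the 1-based position of q^l in the descending order of
    dimension l, i.e. 1 + number of stored values strictly greater than q^l.
    """
    return [1 + len(r) - bisect.bisect_right(r, ql)
            for r, ql in zip(rank_lists, q)]


def radar(users: Sequence[Sequence[float]], rank_lists: List[List[float]],
          q: Sequence[float], k: int) -> List[Tuple[float, int]]:
    """Approximate reverse k-rank: (approximate rank, user index) pairs."""
    s = sigma(rank_lists, q)
    approx = [(sum(si * wi for si, wi in zip(s, w)), i)
              for i, w in enumerate(users)]
    return heapq.nsmallest(k, approx)


# Running example from the text: points A, B, C and users X, Y, Z.
points = [(7, 40, 11), (5, 3, 63), (12, 5, 34)]
users = [(0.7, 0.2, 0.1), (0.5, 0.3, 0.2), (0.3, 0.5, 0.2)]
q = (23, 1, 31)

rank_lists = preprocess(points)
print(sigma(rank_lists, q))
print(radar(users, rank_lists, q, k=1))
```

On the running example, `sigma` yields the per-dimension ranks 1, 4 and 3, and the single returned user is X (index 0) with an approximate rank of about 1.8, up to floating-point rounding.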
3 Experimental Results

In this section, we experimentally compare the proposed RADAR algorithm against competing state-of-the-art approaches, using real-life as well as synthetic datasets. Specifically, we benchmark the run-time efficiency, accuracy and robustness of RADAR against the following algorithms: (1) Marked Pruning (MPA) [21] using R-tree based pruning. (2) Grid-Index Reverse (GIR) algorithm [5] based on grid search. Our empirical setup captures the performance of the algorithms with variations in the following parameters: (i) no. of user vectors (W), (ii) no. of data points (P), (iii) no. of dimensions (d), and (iv) no. of reverse k-rank results (k).
All experiments were conducted on a 3.2 GHz Intel processor with 8 GB RAM running Ubuntu 16.04 LTS.

Real Datasets. We perform experiments on two real datasets: (1) the Bank dataset (from archive.ics.uci.edu/ml/datasets/Bank+Marketing) containing 45,211 records of bank marketing campaigns pertaining to customer age, balance, and duration (i.e., 3 dimensions); and (2) the Color dataset (from kdd.ics.uci.edu/databases/CorelFeatures/CorelFeatures.html) consisting of 68,040 tuples of 9-dimensional image features. For the Bank and Color datasets, 10K 3-dimensional and 100K 9-dimensional user preference weight vectors (normalized to sum to 1) were generated uniformly at random, respectively, as in the setup of [5,21]. Figures 2(a) and (b) compare the run-time of RADAR against the competing approaches, while Table 1 reports the pre-processing time and accuracy of our proposed method with varying values of k for reverse k-rank queries. For RADAR, we observe around 50× run-time improvements over MPA with around 90% accuracy on the Bank dataset, and nearly 6× run-time gains over GIR with a full 100% accuracy on the Color dataset.
Fig. 2. Run-time with respect to (a) MPA on Bank and (b) GIR on Color datasets.

Table 1. Pre-processing time and accuracy of RADAR on real datasets.

Dataset | Top-K (k) | Pre-process time (msec) | Acc. (%)
Bank | 10 | 12.35 | 93.07
Bank | 20 | 12.28 | 90.55
Bank | 40 | 12.86 | 89.88
Color | 100 | 69.54 | 100
Color | 300 | 69.31 | 100
Color | 500 | 70.07 | 100
Synthetic Datasets. To further assess the performance of RADAR against existing approaches on large synthetic datasets, we randomly and uniformly generated W, the set of (normalized) user weight vectors. Further, the data point set P was also generated uniformly at random, with the value of each dimension randomly chosen in [0, 10K], as in [5,21]. Table 2 and Fig. 3 present the performance comparison between RADAR and GIR. We observe that RADAR outperforms GIR in run-time (with nearly 10× improvements) with high accuracy of results for different values of dimensions and top-k. The performance gap further increases as the number of users (W) or data points (P) is increased, thereby showing that RADAR can efficiently cater to real-time approximate reverse k-rank queries on real data. Similarly, Table 3 tabulates the comparison results between RADAR and MPA on synthetic data. RADAR is observed to perform significantly better in terms of run-time (along with high accuracy) across varying parameter values, demonstrating up to 11× improvements.

Table 2. Comparison with GIR on synthetic data.

Parameter | Value | RADAR pre-process time (msec) | RADAR run-time (msec) | RADAR acc. (%) | GIR run-time (msec)
Dimen. (d) [Parameter setting: |W| = |P| = 100K, k = 100] | 10 | 80.15 | 69.63 | 75.67 | 704
Dimen. (d) | 20 | 121.58 | 73.97 | 83.85 | 753
Dimen. (d) | 30 | 203.71 | 77.75 | 86.15 | 814
Dimen. (d) | 50 | 407.70 | 93.01 | 89.40 | 919
Top-K (k) [Parameter setting: |W| = |P| = 100K, d = 6] | 100 | 48.71 | 51.37 | 62.63 | 504
Top-K (k) | 200 | 49.87 | 53.82 | 67.85 | 512
Top-K (k) | 300 | 47.08 | 56.40 | 71.67 | 534
Top-K (k) | 500 | 48.13 | 62.82 | 73.30 | 588

Fig. 3. Run-time comparison with GIR on synthetic data.

Scalability. Finally, we explore the scalability in the performance of our proposed RADAR algorithm. Figure 4 showcases the run-time performance of RADAR on synthetic data under the different parameter variation scenarios. We observe that the query run-time increases linearly with the number of users (|W| = m) while nearly no influence is observed for the other parameters, as described in the performance analysis in Sect. 2. It is important to note that RADAR takes only around 8 s to compute the reverse 200-rank query results for 5 million users with 200K data points with a dimensionality of 30, thus providing computational efficiency for large high-dimensional data. The pre-processing time is also seen to increase linearly with the number of data points (|P| = n) and the number of dimensions (d). Interestingly, the value of k is not prominent in the run-time analysis of RADAR and is thus seen to have limited or no influence on both the run-time and the pre-processing time of RADAR. On the other hand, Fig. 5 demonstrates the effect of parameter variation on the accuracy of RADAR for synthetic data. We observe that the accuracy initially increases and then stabilizes at a high value (around 82–90%) with variations in P, d, and k. This initial increase can be attributed to the fact that a larger number of points or dimensions provides more differentiating data values or attributes for the computation of the approximate query rank based on Observation 2.
Fig. 4. Run-time scalability of RADAR with parameter variation.

Table 3. Performance comparison of RADAR with MPA on synthetic dataset.
Run-time (msec) [Default parameter setting: |W| = 400K, |P| = 20K, d = 3, k = 10]

Parameter                   Value    RADAR     MPA
No. of users (W)            200K     126.17    2751
                            300K     210.57    2923
                            500K     459.92    3814
No. of points (P)           10K      122.43    923
                            30K      308.48    3618
                            50K      478.08    5422
No. of dimensions (d)       3        252.48    3175
                            5        335.91    11342
                            7        404.58    28561
No. of top-K (k)            10       252.48    3175
                            30       292.59    3204
                            50       330.28    3192
However, with an increase in the number of user weight vectors (W), we observe a slight decrease in the accuracy of RADAR (to around 76%). As the number of users grows, the number of similar user weight vectors increases (since the vectors are normalized), leading to similar (or identical) approximate rankings of the query by the algorithm. In such cases, a larger number of correct results might be removed from the min-heap (containing the reverse k-rank result) owing to the random tie-breaking criterion, leading to a decrease in accuracy.
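The effect of random tie-breaking can be illustrated with a small sketch. This is not RADAR itself: `k_best_with_random_ties` is a hypothetical helper that keeps the k users with the smallest approximate rank in a bounded heap (Python's `heapq` with negated keys), and `accuracy` assumes the measure is the fraction of exact result users that the approximate result recovers.

```python
import heapq
import random


def k_best_with_random_ties(approx_ranks, k, rng):
    """Return the ids of the k users with the smallest approximate rank.

    Ties in rank are broken by a random key, so equally ranked users
    compete for the remaining slots by chance.
    """
    heap = []  # holds at most k entries; heap[0] is the worst kept user
    for user_id, rank in enumerate(approx_ranks):
        key = (-rank, -rng.random(), user_id)
        if len(heap) < k:
            heapq.heappush(heap, key)
        elif key > heap[0]:
            # The new user beats the worst kept user: evict and replace.
            heapq.heapreplace(heap, key)
    return sorted(uid for _, _, uid in heap)


def accuracy(approx_ids, exact_ids):
    """Fraction of the exact result set found in the approximate result (assumed measure)."""
    exact = set(exact_ids)
    return len(exact.intersection(approx_ids)) / len(exact)
```

For instance, if the exact ranks of six users are `[3, 5, 4, 9, 8, 7]`, the exact reverse 2-rank result is users 0 and 2. If similar weight vectors collapse the approximate ranks to `[1, 1, 1, 2, 2, 2]`, users 0, 1, and 2 tie for two slots, and whether both correct users survive depends on the random keys.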
Fig. 5. Accuracy scalability of RADAR with parameter variation.
Similar results were observed for Zipfian-distributed data (omitted due to space constraints). Hence, under varying scenarios with real and synthetic datasets, RADAR is seen to effectively handle large high-dimensional data, enabling efficient reverse k-rank queries for modern real-time applications.
4 Related Work
Ranking queries, along with their variants, have been widely used in diverse applications in the fields of database management systems and query processing. In
the domain of e-commerce and sales, ranking provides an important property for evaluating the position of a product and for comparison based on user or market preferences. As such, the extensive research literature on this problem can be broadly classified as discussed next.
Top-k Query: It involves the extraction of the k products/items with the minimal ranking score based on a scoring function [11]. Fagin's Threshold Algorithm (TA) and No Random Access (NRA) [6] perform rank-combinations of sorted object lists based on attributes. The NRA algorithm was further improved with monotone aggregate functions in [13]. The onion technique [1] pre-computes the convex hull of data points in layers akin to an onion structure, with evaluation starting from the outermost layer. A materialized view of top-k result sets close to the scoring functions was used in the PREFER system [10]. Top-k results were also defined by the k-nearest-neighbour approach in spatial databases in [9]. The robust index method based on the minimum possible ranking of items was proposed in [18]. Such queries have been extended to distributed and streaming scenarios.
Reverse Top-k Query: The reverse top-k query returns the aggregate functions (user preferences) that rank an item among their top-k. In other words, it evaluates the impact of a product by finding the customers that treat the product as one of their top-k favourites [8,15]. The reverse top-k Threshold Algorithm (RTA) for monochromatic and bichromatic variants of top-k was presented in [15,16]. A more efficient tree-based branch-and-bound approach was proposed in [17]. Critical k-polygon based indexing was also explored [2] for 2-dimensional reverse top-k queries. Computing all top-k queries in batch using block indexed nested loops with a view-based algorithm was proposed in [8]. Reverse top-k queries for large graphs were studied in [20] using the random walk with restart distance. The related reverse k-nearest neighbour (RKNN) query was also studied [7,12,19].
Reverse k-rank Query: For niche or less popular products, the reverse top-k query tends to return an empty answer set. To alleviate this problem, reverse k-rank queries were recently proposed [21], along with a histogram and R-tree pruning based algorithm to identify prospective candidates. To tackle the curse of dimensionality, a grid-indexing based algorithm was shown to perform better for high-dimensional data [5]. Since the above approaches were designed to handle only one query product, efficient algorithms for aggregate reverse k-rank on multiple queries were explored in [3,4]. Reverse k-rank queries on large graphs were also recently studied in [14].
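The difference between the two query types can be seen in a brute-force sketch. This is an exact, quadratic-time baseline written only for illustration, not any of the indexed algorithms cited above. It assumes a linear scoring function where lower scores are preferred, defines the rank of q under w as the number of points scoring strictly lower than q, and breaks ties between users by index; the function names are hypothetical.

```python
def score(w, p):
    """Linear score of point p under weight vector w (lower is better)."""
    return sum(wi * pi for wi, pi in zip(w, p))


def rank(w, q, points):
    """Number of points that score strictly better than q under w."""
    sq = score(w, q)
    return sum(1 for p in points if score(w, p) < sq)


def reverse_top_k(q, weights, points, k):
    """Users for whom q is within the top-k; may be empty for niche products."""
    return [i for i, w in enumerate(weights) if rank(w, q, points) < k]


def reverse_k_rank(q, weights, points, k):
    """The k users who rank q best, regardless of how good that rank is."""
    order = sorted(range(len(weights)),
                   key=lambda i: (rank(weights[i], q, points), i))
    return order[:k]


if __name__ == "__main__":
    points = [[1, 9], [9, 1], [5, 5]]
    weights = [[0.5, 0.5], [0.9, 0.1], [0.1, 0.9]]
    q = [6, 6]  # a product that no user ranks highly
    print(reverse_top_k(q, weights, points, 2))
    print(reverse_k_rank(q, weights, points, 2))
```

In this toy instance every user places at least two products ahead of q, so the reverse top-2 query is empty, while the reverse 2-rank query still returns the two users who rank q comparatively best.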
5 Conclusion
We proposed RADAR, the first approach to efficiently compute approximate reverse k-rank queries, based on a linear combination of ranks and user preferences along dimensions. Extensive empirical evaluations on real and synthetic data have demonstrated up to 50× run-time gains with a high accuracy of 90% on real datasets. We showcased the scalability of our algorithm for large high-dimensional data compared to existing approaches. Future work involves deriving theoretical bounds for the accuracy of RADAR.
References
1. Chang, Y., Bergman, L.D., Castelli, V., Li, C., Lo, M., Smith, J.R.: The onion technique: indexing for linear optimization queries. In: SIGMOD, pp. 391–402 (2000)
2. Chester, S., Thomo, A., Venkatesh, S., Whitesides, S.: Indexing reverse top-k queries in two dimensions. In: DASFAA, pp. 201–208 (2013)
3. Dong, Y., Chen, H., Furuse, K., Kitagawa, H.: Aggregate reverse rank queries. In: DEXA, pp. 87–101 (2016)
4. Dong, Y., Chen, H., Furuse, K., Kitagawa, H.: Efficient processing of aggregate reverse rank queries. In: DEXA, pp. 159–166 (2017)
5. Dong, Y., Chen, H., Yu, J.X., Furuse, K., Kitagawa, H.: Grid-index algorithm for reverse rank queries. In: EDBT, pp. 306–317 (2017)
6. Fagin, R., Lotem, A., Naor, M.: Optimal aggregation algorithms for middleware. J. Comput. Syst. Sci. 66(4), 614–656 (2003)
7. Gao, Y., Miao, X., Chen, G., Zheng, B., Cai, D., Cui, H.: On efficiently finding reverse k-nearest neighbors over uncertain graphs. VLDB J. 26, 467–492 (2017)
8. Ge, S., U, L.H., Mamoulis, N., Cheung, D.: Efficient all top-k computation - a unified solution for all top-k, reverse top-k and top-m influential queries. TKDE 25(5), 1015–1027 (2012)
9. Hjaltason, G.R., Samet, H.: Distance browsing in spatial databases. TODS 24(2), 265–318 (1999)
10. Hristidis, V., Koudas, N., Papakonstantinou, Y.: PREFER: a system for the efficient execution of multi-parametric ranked queries. In: SIGMOD, pp. 259–270 (2001)
11. Ilyas, I.F., Beskales, G., Soliman, M.A.: A survey of top-k query processing techniques in relational database systems. ACM Comput. Surv. (CSUR) 40(4), 11:1–11:58 (2008)
12. Korn, F., Muthukrishnan, S.: Influence sets based on reverse nearest neighbor queries. In: SIGMOD, pp. 201–212 (2000)
13. Mamoulis, N., Yiu, M.L., Cheng, K.H., Cheung, D.W.: Efficient top-k aggregation of ranked inputs. TODS 32(3), 19 (2007)
14. Qian, Y., Li, H., Mamoulis, N., Liu, Y., Cheung, D.W.: Reverse k-ranks queries on large graphs. In: EDBT, pp. 37–48 (2017)
15. Vlachou, A., Doulkeridis, C., Kotidis, Y., Nørvåg, K.: Reverse top-k queries. In: ICDE, pp. 365–376 (2010)
16. Vlachou, A., Doulkeridis, C., Kotidis, Y., Nørvåg, K.: Monochromatic and bichromatic reverse top-k queries. TKDE 23(8), 1215–1229 (2011)
17. Vlachou, A., Doulkeridis, C., Nørvåg, K., Kotidis, Y.: Branch-and-bound algorithm for reverse top-k queries. In: SIGMOD, pp. 481–492 (2013)
18. Xin, D., Chen, C., Han, J.: Towards robust indexing for ranked queries. In: VLDB, pp. 235–246 (2006)
19. Yang, S., Cheema, M.A., Lin, X., Zhang, Y.: SLICE: reviving regions-based pruning for reverse-k nearest neighbors queries. In: ICDE, pp. 760–771 (2014)
20. Yu, A.W., Mamoulis, N., Su, H.: Reverse top-k search using random walk with restart. PVLDB 7(5), 401–412 (2014)
21. Zhang, Z., Jin, C., Kang, Q.: Reverse k-rank query. PVLDB 7(10), 785–796 (2014)
Author Index
A Abbas, Irfan, 128 Abbod, Maysam, 430 Abuhmed, Tamer, 482 Afza, Seemal, 29 Ahmad, Fahad, 331 Ahmad, Masood, 331 Ahmad, Munib, 128 Ajmera, Yug, 696 Ali, Gohar, 331 Ali, Sarwan, 400 Alsalemi, Abdullah, 188 Alvarez-Veintimilla, Marcelo, 386 Amanul, Muhamed, 55 Amira, Abbes, 188, 430 Amme, Wolfram, 638 Azhar, Samreen, 128 B Bai, Liu, 270 Bakhtadze, Natalya, 115 Bakutkin, Ilya V., 441 Bakutkin, Valery V., 441 Barker, Tom, 206 Bauckhage, Christian, 703 Bausch, Nils, 206, 584, 594 Becarra, Victor, 206 Becherer, Marius, 256 Belianin, Alexis, 241 Bensaali, Faycal, 188 Bhushan, Braj, 416 Boranbayev, Askar, 139 Boranbayev, Seilkhan, 139 Bourgeois-Gironde, Sacha, 241
Brown, W. Shannon, 657 Butt, Umair Muneer, 128 C Cha, Youngkyun, 631 Chae, Ilseok, 723 Cheng, Shao-chi, 218 Chernov, Igor, 662 Chesney, Steve, 679 Chiluisa-Velasco, Gabriela, 386 Chiverton, John, 559, 571 Coronato, Antonio, 452 Čukić, Milena, 493 D Dar, Zaheer, 128 de Izaguirre, Francisco, 353 Dimitrakopoulos, George, 188 Diveev, Askhat, 73 Dolezel, Petr, 229 Dutta, Sourav, 748 E Eaw, H. C., 168 Ebara, Jiro, 315 Elsaadany, Mahmoud, 708 El-Sappagh, Shaker, 482 Ertel, Wolfgang, 662 Escobar, Jesús Jaime Moreno, 527 F Faizullah, Safiullah, 400 Feshchenko, Artem, 713
© Springer Nature Switzerland AG 2021 K. Arai et al. (Eds.): IntelliSys 2020, AISC 1252, pp. 759–761, 2021. https://doi.org/10.1007/978-3-030-55190-2
G Gegov, Alexander, 584, 594, 617 Giasin, Khaled, 584 Gil, Maite, 353 Goiko, Vyacheslav, 713 Guo, Yang, 218 Gutkin, Boris, 241 H Haddad, Malik, 206, 559, 571, 584, 594, 604, 617, 730 Hamed, Radwa, 153 Hanafusa, Ryo, 315 Hassan, Mohamed, 559, 604, 617, 730 Hecker, Dirk, 703 Heiden, Bernhard, 647 Heikkilä, Eetu, 742 Himeur, Yassine, 188 Holik, Filip, 229 I Ikwan, Favour, 571, 594, 604, 617, 730 Ishii, Akira, 13 Islam, Mohammad Amanul, 364 J Javaid, Qaisar, 331 Javed, Arshad, 696 Jeong, Chang-Won, 687, 717, 723 Jerónimo, Tomás, 545 Jin, Ke, 1 JosephNg, P. S., 168 K Karduck, Achim, 96, 256 Karvonen, Hannu, 742 Khan, Imdadullah, 400 Khan, Muhammad Asad, 400 Kher, Shubhalaxmi, 181 Khorsandroo, Sajad, 679 Kim, Hyungheon, 631 Kim, Ji Eon, 687, 717, 723 Kim, SeungJin, 687, 717, 723 Kim, Tae-Hoon, 687, 717, 723 Kim, Taewoo, 631 Ko, In-Young, 96 Korashy, Mostafa, 708 Koskinen, Kari, 742 Kouznetsova, Valentina L., 511 Krishna Prasad, Pranav, 662 Kwak, Kyung Sup, 482
L Lagla-Quinaluisa, Johana, 386 Laitinen, Jouko, 742 Langner, Martin, 559, 584, 594 Lee, ChungSub, 687, 717 Lee, Chung-Sub, 723 Lee, Yun Oh, 723 Li, Jeremy, 511 López, Mario Mendieta, 527 Lopez, Viktoria, 493 Lüdtke, Andreas, 303 Lussange, Johann, 241 M Martínez, Jhonatan Castañón, 527 Masood, Fatima, 128 Matamoros, Oswaldo Morales, 527 Matsuta, Valeria, 713 Mecheter, Imene, 430 Merta, Jan, 229 Mokhtar, Bassem, 153 Monzón, Pablo, 353 Mundrievskaya, Yuliya, 713 N Niermann, Dario, 303 Noh, Si-Hyeong, 687, 717, 723 Nurbekov, Askar, 139 O Okadome, Takeshi, 315 Okano, Nozomi, 13 Omoarebun, Peter, 559, 571, 584, 604, 617, 730 P Padilla, Ricardo Tejeida, 527 Palaniswami, Marimuthu, 416 Paragliola, Giovanni, 452 Park, Rae Woong, 723 Park, Sung Bin, 723 Pastrana-Brincones, José Luis, 88 Payandeh, Shahram, 340 Pérez, Nicolás, 353 Peshkovskaya, Anastasia, 713 Piner, Jake, 206, 730 Pokrajac, Dragoljub, 493 Pombo, Nuno, 545 Ponis, Stavros T., 673 Q Qidwai, Uvais, 469 Qureshi, Adnan N., 29
R Radke, Richard J., 285 Ranjan, Rajesh, 416 Razzaq, Shahid, 128 Rivas-Lalaleo, David, 386 Rizk, Mohamed, 153 Rogers, Ian, 571 Rolón, Marco, 353 Romm, Eden, 511 Roy, Etee Kawna, 181 Roy, Kaushik, 657, 679 S Salam, Abdu, 331 Sanders, David, 206, 559, 571, 584, 594, 604, 617, 730 Sardianos, Christos, 188 Schäfer, André, 638 Serbina, Galina, 713 Shah, Amjad, 730 Shams, Shoukry I., 708 Sharif, Usman, 29 Sharma, Gyanendra, 285 Sheikh, Sarah, 469 Shibghatullah, A. S., 168 Sifa, Rafet, 703 Silva, Bruno, 545 Sofronova, Elena, 73 Staehle, Benjamin, 662 Stursa, Dominik, 229 Suleykin, Alexander, 115 T Taj-Eddin, Islam A. T. F., 708 Tesha, Gaudence Stanslaus, 55 Tewkesbury, Giles, 206, 594, 604, 617
Thabet, Mohamad, 206, 584, 594, 604, 617, 730 Tonino-Heiden, Bianca, 647 Tsigelny, Igor F., 511 U Ullah, Asad, 400 unnisa, Zaib, 128 V Varlamis, Iraklis, 188 Vatchova, Boriana, 559, 571 Verma, Alok, 416 Vivilyana, Viva, 168 Vuksanovic, Branislav, 604 Vuorimaa, Valtteri, 742 W Wahid, Ishtiaq, 331 Wang, Yiming, 1, 270 Wu, Chen, 270 Wu, Cheng, 1 Y Yoon, Kwon-Ha, 687, 717, 723 Yosifova, Veneta, 45 You, Seng Chan, 723 Yuan, Xiaohong, 657 Z Zaidi, Habib, 430 Zayko, Yuriy N., 441 Zelenov, Vladimir A., 441 Zhou, Shikun, 559, 571 Zipperle, Michael, 96