English Pages 724 [709] Year 2021
Advances in Intelligent Systems and Computing 1378
Tareq Ahram Redha Taiar Fabienne Groff Editors
Human Interaction, Emerging Technologies and Future Applications IV Proceedings of the 4th International Conference on Human Interaction and Emerging Technologies: Future Applications (IHIET – AI 2021), April 28–30, 2021, Strasbourg, France
Advances in Intelligent Systems and Computing Volume 1378
Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

Advisory Editors
Nikhil R. Pal, Indian Statistical Institute, Kolkata, India
Rafael Bello Perez, Faculty of Mathematics, Physics and Computing, Universidad Central de Las Villas, Santa Clara, Cuba
Emilio S. Corchado, University of Salamanca, Salamanca, Spain
Hani Hagras, School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK
László T. Kóczy, Department of Automation, Széchenyi István University, Gyor, Hungary
Vladik Kreinovich, Department of Computer Science, University of Texas at El Paso, El Paso, TX, USA
Chin-Teng Lin, Department of Electrical Engineering, National Chiao Tung University, Hsinchu, Taiwan
Jie Lu, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW, Australia
Patricia Melin, Graduate Program of Computer Science, Tijuana Institute of Technology, Tijuana, Mexico
Nadia Nedjah, Department of Electronics Engineering, University of Rio de Janeiro, Rio de Janeiro, Brazil
Ngoc Thanh Nguyen, Faculty of Computer Science and Management, Wrocław University of Technology, Wrocław, Poland
Jun Wang, Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong
The series “Advances in Intelligent Systems and Computing” contains publications on theory, applications, and design methods of Intelligent Systems and Intelligent Computing. Virtually all disciplines such as engineering, natural sciences, computer and information science, ICT, economics, business, e-commerce, environment, healthcare, life science are covered. The list of topics spans all the areas of modern intelligent systems and computing such as: computational intelligence, soft computing including neural networks, fuzzy systems, evolutionary computing and the fusion of these paradigms, social intelligence, ambient intelligence, computational neuroscience, artificial life, virtual worlds and society, cognitive science and systems, Perception and Vision, DNA and immune based systems, self-organizing and adaptive systems, e-Learning and teaching, human-centered and human-centric computing, recommender systems, intelligent control, robotics and mechatronics including human-machine teaming, knowledge-based paradigms, learning paradigms, machine ethics, intelligent data analysis, knowledge management, intelligent agents, intelligent decision making and support, intelligent network security, trust management, interactive entertainment, Web intelligence and multimedia. The publications within “Advances in Intelligent Systems and Computing” are primarily proceedings of important conferences, symposia and congresses. They cover significant recent developments in the field, both of a foundational and applicable character. An important characteristic feature of the series is the short publication time and world-wide distribution. This permits a rapid and broad dissemination of research results. Indexed by DBLP, EI Compendex, INSPEC, WTI Frankfurt eG, zbMATH, Japanese Science and Technology Agency (JST). All books published in the series are submitted for consideration in Web of Science.
More information about this series at http://www.springer.com/series/11156
Editors Tareq Ahram Institute for Advanced Systems Engineering University of Central Florida Orlando, FL, USA
Redha Taiar Campus du Moulin de la Housse Université de Reims Champagne-Ardenne GRESPI Reims Cedex 2, France
Fabienne Groff Directrice Hôpitaux Universitaires de Strasbourg IFMK Strasbourg, France
ISSN 2194-5357 ISSN 2194-5365 (electronic) Advances in Intelligent Systems and Computing ISBN 978-3-030-73270-7 ISBN 978-3-030-74009-2 (eBook) https://doi.org/10.1007/978-3-030-74009-2 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
This book, entitled Human Interaction and Emerging Technologies IV: Future Applications, aims to provide a global forum for presenting and discussing novel emerging technologies, human interaction, and engineering applications, methodologies and solutions for integrating people, concepts, trends, and applications in all areas of human interaction endeavor. Such applications include, but are not limited to, health care and medicine, transportation, business management and infrastructure development, manufacturing, social development, a new generation of service systems, as well as safety, risk assessment, and cybersecurity. Indeed, rapid progress in cognitive computing, modeling, and simulation, as well as smart sensor technology, will have a profound effect on the principles of human interaction and emerging technologies at both the individual and societal levels in the near future. This interdisciplinary book will also expand the boundaries of the current state of the art by investigating the pervasive complexity that underlies the most profound problems facing contemporary society. The emerging technologies included in this book cover a variety of fields such as educational technology, information technology, nanotechnology, biotechnology, cognitive science, robotics, and artificial intelligence.

The book, which gathers selected papers presented at the 4th International Conference on Human Interaction and Emerging Technologies: Future Applications (IHIET-AI 2021) held on April 28–30, 2021, in Strasbourg, France, focuses on advancing the theory and applications for human interaction requirements as part of an overall system development lifecycle, by adopting a human-centered design approach that utilizes and expands on the current knowledge of user-centered design and systems engineering supported by cognitive software and engineering, data analytics, simulation and modeling, and next-generation visualizations.
This book also presents many innovative studies with a particular emphasis on the development of technology throughout the lifecycle development process, including the consideration of user experience in the design of human interfaces for virtual, augmented, and mixed reality applications. Reflecting on the above-outlined perspective, the papers contained in this volume are organized into seven sections, including:

Section 1: Artificial Intelligence and Computing
Section 2: Augmented, Virtual and Mixed Reality Simulation
Section 3: Human–computer Interaction
Section 4: Human-centered Design
Section 5: Applications in Healthcare and Wearable Technologies
Section 6: Human Technologies and Future of Work
Section 7: Management, Training and Business Applications
We hope that this book, which presents the current state of the art in human interaction and emerging technologies, will be a valuable source of both theoretical and applied knowledge, encouraging innovative design and applications of a variety of products, services, and systems for their safe, effective, and pleasurable use by people around the world.

April 2021

Tareq Z. Ahram
Redha Taiar
Fabienne Groff
Contents

Artificial Intelligence and Computing

Supplementing Machine Learning with Knowledge Models Towards Semantic Explainable AI
Jennifer Sander and Achim Kuwertz

CAGEN - Context-Action Generation for Testing Self-learning Functions
Marco Stang, Maria Guinea Marquez, and Eric Sax

Improving the Retention and Progression of Learners Through Intelligent Systems for Diagnosing Metacognitive Competencies – A Case Study in UK Further Education
Tej Samani, Ana Isabel Canhoto, and Esin Yoruk

Introduction of an Algorithm Based on Convolutional Neural Networks for an Automated Online Correction of Braided Cardiovascular Implants
Benedikt Haas, Marco Stang, Valentin Khan-Blouki, and Eric Sax

Increasing the Understandability and Explainability of Machine Learning and Artificial Intelligence Solutions: A Design Thinking Approach
Arianit Kurti, Fisnik Dalipi, Mexhid Ferati, and Zenun Kastrati

Natural Language Understanding (NLU) on the Edge
Nicolas Crausaz, Jacky Casas, Karl Daher, Omar Abou Khaled, and Elena Mugellini

Exploiting Home Infrastructure Data for the Good: Emergency Detection by Reusing Existing Data Sources
Sebastian Wilhelm

Towards a General and Complete Social Assistive Robotic Architecture
Bilal Hoteit, Ahmad Faour, Ali Abdallah, Imad Alex Awada, Alexandru Sorici, and Adina Magda Florea

Intelligent Control of HVAC Systems in Electric Buses
Martin Sommer, Carolin Junk, Tobias Rösch, and Eric Sax

Optimal Prediction Using Artificial Intelligence Application
Marwan Abdul Hameed Ashour and Iman A. H. Al-Dahhan

An Exploration of One-Shot Learning Based on Cognitive Psychology
Dingzhou Fei

Integrity Mechanism of Artificial Intelligence for Person’s Auto-poiesis
Nicolay Vasilyev, Vladimir Gromyko, and Stanislav Anosov
Augmented, Virtual and Mixed Reality Simulation

A Technological Framework for Rapid Prototyping of X-reality Applications for Interactive 3D Spaces
Emmanouil Zidianakis, Antonis Chatziantoniou, Antonis Dimopoulos, George Galanakis, Andreas Michelakis, Vanesa Neroutsou, Stavroula Ntoa, Spiros Paparoulis, Margherita Antona, and Constantine Stephanidis

Design and Evaluation of an Augmented Reality Application for Landing Training
Harald Schaffernak, Birgit Moesl, Wolfgang Vorraber, Reinhard Braunstingl, Thomas Herrele, and Ioana Koglbauer

Virtual Collection as the Time-Shift Appreciation: The Experimental Practice-Led Research of Automated Marionette Hsiao Ho-Wen Project
Chih-Yung Chiu

Extended Reality in Business-to-Business Sales: An Exploration of Adoption Factors
Heiko Fischer, Sven Seidenstricker, and Jens Poeppelbuss

Towards Augmented Reality-Based Remote Family Visits in Nursing Homes
Eva Abels, Alexander Toet, Audrey van der Weerden, Bram Smeets, Tessa Klunder, and Hans Stokking

Virtual Reality Environment as a Developer of Working Competences
Jorge A. González-Mendívil, Miguel X. Rodríguez-Paz, and Israel Zamora-Hernández

Digital Poetry Circuit: Methodology for Cultural Applications with AR in Public Spaces
Vladimir Barros, Eduardo Oliveira, and Luiz F. de Araújo

Human–Computer Interaction

Co-creating Value with the Cooperative Turn: Exploring Human-Machinic Agencies Through a Collective Intelligence Design Canvas
Soenke Zehle, Revathi Kollegala, and David Crombie

Virtual Assistant: A Multi-paradigm Dialog Workflow System for Visitor Registration During a Pandemic Situation
Martin Forsberg Lie and Petter Kvalvik

Challenges of Human-Computer Interaction in Foreign Language Teaching: The Case of a Russian Technological University
Alexander Gerashchenko, Tatiana Shaposhnikova, Alena Egorova, and Dmitry Romanov

Influence of Gender, Age, and Frequency of Use on Users’ Attitudes on Gamified Online Learning
Adam Palmquist and Izabella Jedel

Personalizing Fuzzy Search Criteria for Improving User-Based Flexible Search
Mohammad Halim Deedar and Susana Muñoz-Hernández

Virtual Reality to Improve Human Computer Interaction for Art
Fayez Chris Lteif, Karl Daher, Leonardo Angelini, Elena Mugellini, Omar Abou Khaled, and Hayssam El Hajj

A Gaze-Supported Mouse Interaction Design Concept for State-of-the-Art Control Rooms
Nadine Flegel, Christian Pick, and Tilo Mentler

Video Conferencing in the Age of Covid-19: Engaging Online Interaction Using Facial Expression Recognition and Supplementary Haptic Cues
Ahmed Farooq, Zoran Radivojevic, Peter Mlakar, and Roope Raisamo

Real-Time Covid-19 Risk Level Indicator for the Library Users
Sadiq Arsalan and Khan Maaz Ahmed
Pilot’s Visual Eye-Track and Biological Signals: Can Computational Vision Toolbox Help to Predict the Human Behavior on a Flight Test Campaign?
Marcela Di Marzo, Jorge Bidinotto, and José Scarpari

Meeting the Growing Needs in Scientific and Technological Terms with China’s Terminology Management Agency – CNCTST
Jiali Du, Christina Alexandris, Yajun Pei, Yuming Lian, and Pingfang Yu

The Effect of Outdoor Monitor on People’s Attention
Chau Tran, Ahmad Bilal Aslam, Muhammad Waqas, and Islam Tariq

Acceptance and Practicality of Voice Assistance Systems in the Everyday Life of Seniors: A Study Design
Dietmar Jakob

Exploring the Effectiveness of Sandbox Game-Based Learning Environment for Game Design Course in Higher Education
Tengfei Xian

Testing a Trojan Horse: Fifth Step in the Experiment/Research with City Information Modelling (CIM) and the Design Ethics
Gonçalo Falcão and José Beirão

Human Computer Interaction Design of Online Shopping Platform for the Elderly Based on Flow Theory
Jin Zhou and Meiyu Zhou

User Experience Evaluation of a Smoking Cessation App in People Who Are Motivated to Quit
Ananga Thapaliya, K. C. Kusal, and Prabesh Nepal

Exploring Methods of Combining Eye Movement and Facial Expression for Object Selection
Qing Xue, Qing Ji, and Jia Hao

The Correlation of Biscuit Packaging Image Based on Visual-Taste Synesthesia Psychology
Jiamu Liu and Meiyu Zhou

Human Error Analysis, Management and Application for Airborne System of Civil Aircraft
Kun Han and Hongyu Zhu

Human-Centered Design

Interaction Design Patterns for International Data Space-Based Ecosystems
Torsten Werkmeister
Display of Range Changes in E-Trucks: An Empirical Investigation of Three Concept Variants
Pia S. C. Dautzenberg, Gudrun M. I. Voß, Philip Westerkamp, Sabine Bertleff, and Stefan Ladwig

Acceptance of Smart Automated Comfort Functionalities in Vehicles
Maria Guinea, Marco Stang, Irma Nitsche, and Eric Sax

User-Centered Optimization System at Workshop Level for More Energy-Efficient Machine Tool Operations
Thore Gericke, Alexander Mattes, Benjamin Overhoff, and Lisa Rost

Effect of Guiding Information from the Elbow to Foot Proprioception During Horizontal Perceptual Tasks in Individuals with Impaired Vision
Tadashi Uno, Tetsuya Kita, Ping Yeap Loh, and Satoshi Muraki

Green Densification Strategies in Inner City for Psycho-Physical-Social Wellbeing
Cristiana Cellucci and Michele Di Sivo

Built Environment Design and Human Interaction: A Demanded Arrangement for Humanized Cities
Cristina Caramelo Gomes

Fashion Technology – What Are the Limits of Emerging Technological Design Thinking?
Pertti Saariluoma, Hanna-Kaisa Alanen, and Rebekah Rousi

Smart Meeting Room Monitoring System with Sanitizing Alerts
Haroon Mohammad Sarwar, Muhammad Umar, and William Svea-Lochert

Relation Between Usability and Accessibility: A Case Study in Peruvian E-Commerce Websites
Freddy Paz and Freddy Asrael Paz

Towards Human-Centered Design of Workplace Health Stimulation Interventions: Investigation of Factors Contributing to Office Workers’ Exercise Behavior
Tianmei Zhang and Jaap Ham

Effectiveness of Video Modelling to Improve Playing Skills of Children with Autism Spectrum Disorders
Myriam Squillaci and Hélène Dubuis

Design of Children’s Creative Toys Based on Cultural Experience
Jinwu Xiang and Huajie Wang
Software Instruments for Management of the Design of Educational Video Games
Yavor Dankov and Boyan Bontchev

Designing Software Instruments for Analysis and Visualization of Data Relevant to Playing Educational Video Games
Yavor Dankov and Boyan Bontchev

Measuring Environmental Variables and Proposal of Determination of Thermal Comfort in Industrial Plant
Norma Pinto, Antonio Xavier, Guatacara Santos Jr, Joao Kovaleski, Silvia Gaia, and Kazuo Hatakeyama

Analysis of Color Control and Humanized Design in Ship Cabin Environment
Fuyong Liu, Yong Li, and Ruimin Hao

Applications in Healthcare and Wearable Technologies

A Smart Mirror to Encourage Independent Hand Washing for Children
Nils-Christian W. Rabben and Stine Aurora Mikkelsplass

Combining Weather and Pollution Indicators with Insurance Claims for Identifying and Predicting Asthma Prevalence and Hospitalizations
Divya Mehrish, J. Sairamesh, Laurent Hasson, and Monica Sharma

Spatial Summation of Electro-Tactile Displays at Subthreshold
Rahul Kumar Ray and M. Manivannan

Assisting People During COVID-19 with Data Visualization and Undemanding Design
Ida Christine Karlsen, Sundarrajan Gopalakrishnan, Syeda Miral Kazmi, and Godavari Sandula

Modeling CoVid-19 Diffusion with Intelligent Computational Techniques is not Working. What Are We Doing Wrong?
Marco Roccetti and Giovanni Delnevo

A Smart Chair Alert System for Social Distance Implementation for COVID-19
Waseem Hussain, Ahmed Iftikhar, and Simen Grøndalen

Predicting Psychological Pathologies from Electronic Medical Records
Chaimae Taoussi, Imad Hafidi, Abdelmoutalib Metrane, and Abdellatif Lasbahani
Emergency Information Transmission Based on BeiDou (BD) Short Message System
Liang Dong, Jingjing Mao, Pincan Su, Jinou Chen, Changqin Pu, and Zhengwu Pu

Ergonomics Assessment Based on IoT Wearable Device
Jorge Corichi-Herrejón, Andrea Santiago-Pineda, Begoña Montes-Gómez, Ana S. Juárez-Cruz, Gabriela G. Reyes-Zárate, Jesús Moreno-Moreno, and Daniel Pérez-Rojas

Understanding Health and Behavioral Trends of Successful Students Through Machine Learning Models
Abigale Kim, Fateme Nikseresht, Janine M. Dutcher, Michael Tumminia, Daniella Villalba, Sheldon Cohen, Kasey Creswel, David Creswell, Anind K. Dey, Jennifer Mankoff, and Afsaneh Doryab

Validation of the JiBuEn® System in Measuring Gait Parameters
Qin Gao, Zeping Lv, Xuefei Zhang, Yao Hou, Haibin Liu, Weishang Gao, Mengyue Chang, and Shuai Tao

Lumbo-Pelvic-Hip Angle Changes During Upright and Free Style Sitting in Office Workers with Lower Crossed Syndrome
Pailin Puagprakong, Poramet Earde, and Patcharee Kooncumchoo

Anthropometry of Thai Wheelchair Basketball Athletes
Aris Kanjanasilanont, Raul Calderon Jr., and Pailin Puagprakong

Analysis of Ergonomic Risks of Drilling Hammers Activity in the Granite Mining Through WinOWAS Software
Norma Pinto, Antonio Xavier, Thalmo Coelho, and Kazuo Hatakeyama

Human-Technologies and Future of Work

Human-Centered Test Setups for the Evaluation of Human-Technology Interaction in Cockpits of Highly-Automated Vehicles
Patrick Schnöll

Lost People: How National AI-Strategies Paying Attention to Users
Pertti Saariluoma and Henrikki Salo-Pöntinen

Blockchain 3.0: Internet of Value - Human Technology for the Realization of a Society Where the Existence of Exceptional Value is Allowed
Junichi Suzuki and Yasuhiro Kawahara

Identifying Positive Socioeconomic Factors of Worker Roles
Shivam Zaveri
Design of an Extended Technology Acceptance Model for Warehouse Logistics
Benedikt Maettig

Historical Factors in Transportation Styling
Yong Li, Fuyong Liu, and Ruimin Hao

Lessons Learned from Distance Collaboration in Live Culture
Sven Ubik, Jakub Halák, Martin Kolbe, Jiří Melnikov, and Marek Frič

Training Students for Work with Emerging Technologies in a Technology Park Environment
Tatiana Shaposhnikova, Alexander Gerashchenko, Alena Egorova, and Vyacheslav Minenko

The Sequence of Human’s Aesthetic Preference and Products’ Mechanical Property
Ruimin Hao, Yong Li, and Fuyong Liu

Use of Data Mining to Identify the Technological Resources that Contribute to School Performance in Large-Scale Evaluations of Brazilian High School
Ivonaldo Vicente da Silva and Márcia Terra da Silva

Management Aspects in the Higher Education Quality Assurance System
Yuliya Zhuravel, Nazariy Popadynets, Inna Irtyshcheva, Ihor Stetsiv, Iryna Stetsiv, Iryna Hryhoruk, Yevheniya Boiko, Iryna Kramarenko, Nataliya Hryshyna, and Antonina Trushlyakova

Influence of Participation and Tardiness to Synchronous Learning Sessions as a Motivation Factor for E-Learning
Ernesto Hernández, Zury Sócola, Lucia Pantoja, Angélica Atoche, and Walter Hernández

Management, Training and Business Applications

Factors Related to the Use and Perception of a Gamified Application for Employee Onboarding
Izabella Jedel and Adam Palmquist

Multi-objective Schedule Optimization for Ship Refit Projects: Toward Geospatial Constraints Management
Daniel Lafond, Dave Couture, Justin Delaney, Jessica Cahill, Colin Corbett, and Gaston Lamontagne

BIASMAP – Developing a Visual Typology and Interface to Explore and Understand Decision-Making Errors in Management
Martin Eppler and Christian Muntwiler
Gender Approach to Studying History of Social Work in Ukraine
Oksana Kravchenko, Olha Svyrydiuk, Iryna Karpych, and Olha Boiko

Health and Safety Management at Workplaces Through Interactive Tools
Sigrid Kontus and Karin Reinhold

Mechanisms to Manage the Regional Socio-Economic Development and Efficiency of the Decentralization Processes
Inna Irtyshcheva, Iryna Kramarenko, Taras Vasyltsiv, Yevheniya Boiko, Olena Panukhnyk, Nataliya Hryshyna, Oryslava Hrafska, Olena Ishchenko, Nataliya Tubaltseva, Ihor Sirenko, Nazariy Popadynets, and Iryna Hryhoruk

Author Index
Artificial Intelligence and Computing
Supplementing Machine Learning with Knowledge Models Towards Semantic Explainable AI

Jennifer Sander and Achim Kuwertz

Fraunhofer IOSB, Institute of Optronics, System Technologies and Image Exploitation, Karlsruhe, Germany
{Jennifer.Sander,Achim.Kuwertz}@iosb.fraunhofer.de
Abstract. Explainable Artificial Intelligence (XAI) aims at making the results of Artificial Intelligence (AI) applications more understandable. It may also help to understand the applications themselves and to gain insight into how results are obtained. Such capabilities are particularly required for Machine Learning approaches like Deep Learning, which today must generally be considered black boxes. In recent years, different XAI approaches have become available. However, many of them adopt a mainly technical perspective and do not sufficiently take into consideration that giving a well-comprehensible explanation means that the output has to be provided in a human-understandable form. By supplementing Machine Learning with semantic knowledge models, Semantic XAI can fill some of these gaps. In this publication, we raise awareness of its potential and, taking Deep Learning for object recognition as an example, we present initial research results on how to achieve explainability on a semantic level.

Keywords: Artificial Intelligence · Explainable Artificial Intelligence · Explainability · Machine Learning · Deep Learning · Knowledge models · Knowledge engineering · Semantics
1 Introduction The world becomes increasingly digitized and networked. The large amounts of data being available can often only be processed with machine support based on AI (Artificial Intelligence). In today’s AI applications, users and people affected by the results of an AI often do not know how and why certain results have been obtained. When an AI delivers an incorrect result, it is also often unclear how to avoid this in the future. This particularly (but not exclusively) applies to AI approaches from the field of Machine Learning. The typical goals in Machine Learning have so far been focused on training and optimization (in terms of accuracy and error frequency). However, the questions why, for instance, Deep Learning generates certain results and if they are meaningful, have hardly been researched for a long time. Deep Learning techniques are essentially based on the use of huge amounts of data; they derive their own (implicit) statistical models on-the-fly from the data presented to them. For well-performing Deep © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 T. Ahram et al. (Eds.): IHIET-AI 2021, AISC 1378, pp. 3–11, 2021. https://doi.org/10.1007/978-3-030-74009-2_1
4
J. Sander and A. Kuwertz
Learning, a high amount of qualitatively appropriate training data with ground truth annotations is required and complex models being hard to comprehend by a human are built by the Machine Learning algorithms. It is essential that the training data capture all relevant aspects – which is often difficult to ensure. As a consequence, current Deep Learning approaches give few guarantees on what they will learn. It cannot be guaranteed that they will learn (all) the desired rules and connections which are well known to humans. Also, more generally, using current Deep Learning approaches, it is often very difficult to understand what a system has learned from the given data and what not. Such complex Machine Learning approaches are therefore generally considered to be black boxes, today. This can be highly critical, for example with regard to end-user acceptance, with regard to the implementation of AI in assistance-based systems, and with regard to its use in security-critical applications. In addition, the compliance of an AI with basic ethical principles and, for instance, the General Data Protection Regulation might not be guaranteed. Explainable AI (XAI) aims at making results delivered by AI applications more understandable. It may also help to understand the AI applications themselves and to get an insight into how results are obtained. In the last years, XAI has turned into a hype and different XAI approaches became available. However, many of these approaches adopt a mainly technical perspective – in the sense that they do not sufficiently take into consideration that giving a well-comprehensible explanation to a human (a nontechnical expert) means that the output has to be provided in a human understandable form, i.e., conform with the way humans build their meaning of objects, situations, etc. 
It is pertinent to note that, in this regard, human factors also have to be taken into consideration and that no XAI solution can per se meet the requirements and expectations of all stakeholders [1, 2]. By supplementing Machine Learning approaches with semantic knowledge models, Semantic XAI can fill some of these gaps. In this publication, we raise awareness of its potential and, taking Deep Learning for object recognition as an example, we present initial research results on how the use of knowledge models in the area of Machine Learning can contribute to explainability on a semantic level and to an improvement of the validity of results.
The rest of this publication is organized as follows. In Sect. 2, we provide additional background and insight into selected aspects of XAI. Sect. 3 is devoted to our research approach to Semantic XAI. After an introductory overview (Sect. 3.1), the essentials of a closer conceptual examination of the approach are presented (Sect. 3.2), and an illustrative example of one specific possible application is given (Sect. 3.3). Finally, we conclude this publication in Sect. 4 with a short summary and a brief outlook on further research.
2 Closer Look on Aspects of Explainable Artificial Intelligence
As already mentioned in Sect. 1, XAI has gained enormous attention in recent years. It should be noted in this respect that, as pointed out for instance in [3], XAI is not a new research topic. However, the increasing use of AI applications from the field of Machine Learning – which in turn is due to the enormous growth in computing power, the availability
of large amounts of data and algorithmic improvements – made the central importance of XAI strikingly apparent. In addition, the increasing use of Machine Learning (also in new applications) created the need for new XAI approaches. There is no generally accepted formal definition of XAI, and central terms such as explanation, explainability, transparency, and trustworthiness (of AI) are used differently, at least in detail, in the technical literature. Harmonizing the terminology would in principle be expedient. However, such activities have to consider the diversity of XAI as a research field and, in particular, the fact that, as addressed for instance in [4], no XAI concept or technique will be suited to every situation.
The following fundamental distinction regarding the concrete objective that the application of XAI approaches may have is often helpful:
1. Enable the full traceability of the AI application itself.
2. Make sure that the main influential factors with regard to one or several concrete individual results of the AI can be identified.
In practice, objective 1 may often be very hard or even impossible to achieve. This is particularly the case for complex Machine Learning approaches. In addition, objective 1 may require the disclosure of key aspects of the respective AI application. In many cases, achieving objective 2 may be sufficient and/or actually desired. For instance, achieving objective 2 is sufficient to satisfy the requirements that the General Data Protection Regulation implies for AI applications [5]. To give another example, for a physician using an AI-based patient diagnostic system, detailed insight into the system-internal processes may not be helpful. However, it will be helpful for the physician to know which symptoms and/or laboratory data were relevant for the diagnosis delivered by the AI.
The detailed objectives and priorities that XAI should pursue depend on various aspects, among others on the concrete application area. For instance, DARPA’s XAI program (DARPA: Defense Advanced Research Projects Agency) specifically addresses the following two application areas of high operational relevance in the military domain [6]: data analytics, where analysts are looking for interesting items in large heterogeneous multimedia datasets, and autonomy, where operators need to direct autonomous systems to accomplish several missions. In particular, different people, or people in different roles, also have different needs and expectations with regard to the explainability of an AI application [4]. For instance, a physician using an AI-based patient diagnostic system requires different kinds of explanations than the patient or the designer of the system.
The current technical literature provides a large variety of XAI concepts and techniques; giving a more detailed overview is not within the scope of this publication. In this regard, we refer interested readers to the surveys given in [4, 7–9]. An adequate XAI system not only needs to be based on appropriate internal models. It is also essential that the information exchange between the XAI system and its users is realized via appropriate interfaces. In addition, the explanation has to be delivered in a suitable representation. For instance, such a representation may use reference data (e.g., reference images), visual representations (e.g., heatmaps, graphs), or textual elements (in natural language). It is also worth noting that multimodal and interactive XAI approaches in particular can offer significant benefits [2, 10].
3 An Approach to Semantic Artificial Intelligence
3.1 Overview
As already described in Sect. 1, Machine Learning approaches like Deep Learning learn statistical relationships which they derive from training data. The larger and more representative the training data is, the more likely it is that problem-relevant relationships (e.g., what is important for classifying an object in an image as a vehicle) will be learned. However, this is neither guaranteed nor directly verifiable in concrete tasks. This is what makes XAI so important for Machine Learning. Many available XAI approaches adopt a mainly technical perspective and, from a methodological point of view, are often limited to the levels of data and features. Sufficient expertise is required to interpret the findings and to perform the actions needed to apply them (e.g., parameter adjustments). Giving a comprehensive and well-comprehensible explanation requires XAI methods that also support (at least) interpretation and (possibly) further exploration on the information/knowledge level. The output has to be provided in a human-understandable form and with human factors taken into account. Furthermore, it has to be considered that explainability also depends on the knowledge and capabilities of the addressees of the XAI techniques and the XAI results.
While Machine Learning algorithms work in a purely data-driven way, humans also use knowledge about semantic relationships to generate findings, to check their plausibility, and to justify them. For example, the question of whether a certain (recognized) object is really a car would arise if this object does not appear to have wheels and/or appears to be moving on water. Such semantic relationships are often derived from general or background knowledge, and they are often related to a symbolic level (as opposed to a subsymbolic one), i.e., they can be formulated in natural language.
These relationships can be part-whole relationships for objects (e.g., a car generally has wheels, a windshield, etc.), relations to the objects' environment (e.g., cars drive on roads, not on water or in the air), proportions, etc. They can be formalized in semantic knowledge models by modeling corresponding classes and connecting them via relations. If, for instance, a Machine Learning algorithm for object recognition also delivers partial results such as recognized object parts, these partial results can be used, with the aid of semantic knowledge models, to check the plausibility of the (total) result of the object recognition and/or to explain it. We investigate the conceptual background of the described Semantic XAI approach in the next section.
3.2 Closer Conceptual Examination
In order to avoid erroneous conclusions from explanations given by Semantic XAI, the precise objectives and the necessary preconditions enabling the intended conclusions have to be worked out in a more differentiated way. To this end, we investigated more precisely what the term explainability may stand for here. In the field of Semantic XAI, semantic knowledge models can be used for different purposes. They may be used to enable the generation of semantic explanations to verify the plausibility of a proposed classification result; an example of such a semantic
explanation is “The class car is plausible because the respective object is located on a street and it possesses wheels, doors, and a windshield.” Semantic knowledge models may also be used to increase the traceability of the decision obtained by the Machine Learning. This may be possible on the basis of a model-based interactive counterfactual explanation dialogue; an example of a finding that may result from this is: “Overall, the object was recognized as a car because it possesses wheels, doors, and a windshield. Without the doors, the findings (obtained by the Machine Learning) would also be consistent with the object (class) buggy.” Finally, semantic knowledge models may be used to support predictability with regard to future results of the Machine Learning; an example of a finding that may result from this is: “Recognizing at least two wheels is (at least with a high probability) a sufficient condition for an object being recognized as (class) car.” Predictability concerns the generalization ability of the Machine Learning, decision-making regions, etc.
We make the basic assumption that, in addition to the actual object description (i.e., the proposed object classification result delivered by the Machine Learning), additional descriptions such as descriptions of object parts and/or scene parts can be obtained. With regard to the questions of which sources such additional descriptions originate from and whether dependencies exist between the decision processes, three cases have to be differentiated:
1. The additional descriptions are gained from the actual Machine Learning algorithm itself, with the additional descriptions being integral features of the actual decision process with regard to the actual object classification.
2. The additional descriptions are gained from the actual Machine Learning algorithm itself, but the proposed object classification has been derived independently of them.
3. The additional descriptions are gained from external sources, for example from other (independent) Machine Learning algorithms.
Verifying the plausibility of a proposed classification result is possible in all three cases, whereas increasing the traceability of the decision obtained by the Machine Learning is only possible in case 1 and, with some restrictions, in case 2. To support predictability with regard to future results of the Machine Learning, it has to be ensured that case 1 holds.
The additional descriptions have to be assigned to the concepts of the semantic knowledge model. A manual assignment may be required if their semantic meaning is not yet explicitly represented or if their mapping to the concepts of the semantic knowledge model is needed and cannot be achieved by automatic means. The semantic knowledge model should reflect the human conception of what an object (of a certain class) is. It should reflect which necessary characteristic parts the object has and/or in which context it usually occurs or is used. It is pertinent to note that an appropriate semantic knowledge model usually also depends on the application domain, the role of the user, etc. Such a model may be individually designed. As this process is cumbersome, the integration of already existing modelled knowledge from resources such as Wikidata and WordNet may be promising. Such external models are automatically generated or
human-curated. Particularly in the first case, aspects of data and model quality have to be considered seriously when using them. It is worth mentioning that their integration (e.g., the mapping of concepts) is not a trivial task. Generally, knowledge modeling is a complex topic; this concerns procedures, principles, and formalisms for representation [11]. As a consequence, with regard to Semantic XAI, a close collaboration between the knowledge engineering community and the Machine Learning community would be promising and scientifically expedient. As the state of the art in terms of formalisms for knowledge representation, the use of ontologies in particular (see e.g. [12] and references given therein) has established itself. It should be pointed out that the notion of ontologies is closely related to knowledge graphs, for which no unique definition exists [13, 14].
3.3 Example
The example presented in this section for further illustration addresses the plausibility of object recognition in images. It is a local approach, which means that it aims at explaining individual object recognition results. We apply it to already existing Deep Neural Networks working on image data, which are regarded as black boxes (thus, post hoc). It is based on the prerequisite that, in addition to the actual object proposals, parts of the objects and/or context regions can be recognized in the input image. Example: wheel 1 […], wheel 2 […], street […]. These additional descriptions do not necessarily have to be recognized by the Deep Neural Network that is to be explained. They may correspond to interpretable keypoints or to semantically segmented parts of the input image. For vehicles, for instance, VeRi-776 dataset [15] keypoints (headlights, mirrors, logo, etc.) may be used; as these are not yet semantically anchored, their manual assignment to concepts of the semantic knowledge model is required.
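As a minimal sketch of how such a semantic knowledge model and the manual assignment of keypoints to its concepts might be represented, the following Python fragment models classes with part-whole and context relations. All names (`Concept`, `KNOWLEDGE_MODEL`, `KEYPOINT_TO_CONCEPT`), the keypoint labels, and the listed parts are hypothetical illustrations; they are taken neither from the authors' system nor from the VeRi-776 annotation scheme.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Concept:
    """A class in a toy semantic knowledge model (illustrative only)."""
    name: str
    has_part: frozenset = field(default_factory=frozenset)   # part-whole relations
    occurs_on: frozenset = field(default_factory=frozenset)  # context relations


# Hypothetical model content; a real model would result from knowledge engineering.
KNOWLEDGE_MODEL = {
    "car": Concept("car",
                   frozenset({"wheel", "door", "windshield", "headlight"}),
                   frozenset({"street"})),
    "boat": Concept("boat", frozenset({"hull"}), frozenset({"water"})),
}

# Manual assignment of raw detector outputs (e.g., keypoint labels) to concepts.
KEYPOINT_TO_CONCEPT = {
    "left_front_wheel": "wheel",
    "right_front_wheel": "wheel",
    "left_headlight": "headlight",
}


def anchored_parts(keypoints):
    """Map raw keypoint labels to part concepts, skipping unmapped labels."""
    return {KEYPOINT_TO_CONCEPT[k] for k in keypoints if k in KEYPOINT_TO_CONCEPT}


def known_parts_of(class_name, keypoints):
    """Return the detected parts that the model links to the given class."""
    return anchored_parts(keypoints) & KNOWLEDGE_MODEL[class_name].has_part
```

In this sketch, the dictionary `KEYPOINT_TO_CONCEPT` plays the role of the manual assignment: only keypoints listed there become usable on the semantic level.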
With regard to a detected class, the semantic knowledge model has to define and link the parts that the instances of this class have. Object parts can be modelled as attributes of the concepts representing the corresponding object classes. Figure 1 schematically illustrates the integration of the semantic knowledge model into the XAI system and indicates the basic course of the data/information flows. The key aspect with regard to Semantic XAI is the DL-2-Concept mapping; it is the basis for the interactions between the Deep Learning system and the semantic knowledge model. The DL-2-Concept mapping connects semantic concepts (and their attributes) with the possible results of the Deep Learning algorithm(s). To verify the plausibility of a recognition result (i.e., of a proposed classification result), the semantic knowledge model is consulted to determine which additional descriptions may be recognized and which algorithms can be used for this purpose. These algorithms are then applied, and the explanation is formulated on the basis of both these additional results and the actual object recognition result. To this end, the connections represented in the semantic knowledge model are used. The explanation is formulated textually in natural language. Its generation may be based on (text) templates that are filled with the obtained results; alternatively, more sophisticated means from the field of Natural Language Generation (NLG) may
be used. The latter may be promising to enable personalization, i.e., to adjust the given explanation more specifically to the needs and expectations of the user.
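The plausibility check and the template-based formulation of the explanation described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the content of `MODEL`, the threshold `min_parts`, and the wording of the template are all hypothetical.

```python
# Toy knowledge model: expected parts and contexts per class (hypothetical content).
MODEL = {
    "car": {"parts": {"wheel", "door", "windshield"}, "contexts": {"street"}},
}


def check_plausibility(proposed_class, detected_parts, detected_context, min_parts=2):
    """Compare the additional descriptions with what the model expects for the class.

    min_parts is an assumed threshold for how many expected parts must be found.
    """
    entry = MODEL[proposed_class]
    supporting = sorted(set(detected_parts) & entry["parts"])
    context_ok = detected_context in entry["contexts"]
    plausible = context_ok and len(supporting) >= min_parts
    return plausible, supporting, context_ok


def explain(proposed_class, detected_parts, detected_context):
    """Fill a simple text template with the results of the plausibility check."""
    plausible, supporting, context_ok = check_plausibility(
        proposed_class, detected_parts, detected_context)
    parts_text = ", ".join(supporting) if supporting else "no expected parts"
    location = "is" if context_ok else "is not"
    verdict = "plausible" if plausible else "questionable"
    return (f"The class {proposed_class} is {verdict}: the object {location} located "
            f"in an expected context ({detected_context}) and shows {parts_text}.")


print(explain("car", ["wheel", "wheel", "door"], "street"))
```

A template of this kind is easy to audit but rigid; replacing `explain` with an NLG component would leave `check_plausibility` unchanged.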
Fig. 1. Schematic illustration of the integration of the semantic knowledge model into the XAI system to enable the verification of the plausibility of object recognitions which are delivered by the Deep Neural Network.
The DL-2-Concept mapping ensures that the object recognition result is semantically meaningful. On this basis, the user could also ask for further information about the recognized object and/or explore the semantic knowledge model. To enable this, corresponding interaction opportunities between the user and the semantic knowledge model have to be provided for at system design (these are not contained in Fig. 1). In addition, the semantic knowledge model has to be engineered with this additional intended use in mind (e.g., lexical explanations or reference images could be included additionally). Needless to say, realizing an interactive explanation dialogue may be especially promising for best meeting user needs.
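One conceivable building block of such an interactive dialogue is a counterfactual query of the kind quoted in Sect. 3.2 (“Without the doors, the findings would also be consistent with the class buggy”). The sketch below is only an illustration: the class definitions in `CLASSES` and the consistency rule (all required parts detected, no detected part outside the allowed parts) are assumptions made for this example.

```python
# Hypothetical class definitions; not taken from a real knowledge model.
CLASSES = {
    "car": {"required": {"wheel", "windshield"},
            "allowed": {"wheel", "windshield", "door"}},
    "buggy": {"required": {"wheel", "windshield"},
              "allowed": {"wheel", "windshield"}},
}


def consistent_classes(parts):
    """Classes whose required parts are all detected and that allow every detected part."""
    parts = set(parts)
    return sorted(name for name, spec in CLASSES.items()
                  if spec["required"] <= parts <= spec["allowed"])


def counterfactual(parts, removed_part):
    """Report which classes become consistent once one detected part is left out."""
    before = consistent_classes(parts)
    after = consistent_classes(set(parts) - {removed_part})
    gained = [c for c in after if c not in before]
    return before, after, gained


before, after, gained = counterfactual({"wheel", "windshield", "door"}, "door")
print(f"Consistent classes: {before}; without the door also: {gained}")
```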
4 Conclusion
While Machine Learning algorithms work in a purely data-driven way, humans also use knowledge about semantic relationships to generate findings, to check their plausibility, and to justify them. By supplementing Machine Learning approaches with semantic knowledge models, Semantic XAI makes it possible to take this important aspect into account and thereby to fill some gaps of current XAI approaches. This publication raised awareness of its potential. Taking Deep Learning for object recognition as an example, we presented initial research results. In further research, we particularly plan to consider interactive approaches for Semantic XAI in greater depth. Specific findings from the field of Interactive Machine Learning (IML) [2, 16] will be helpful here.
References
1. Miller, T.: Explanation in artificial intelligence: Insights from the social sciences. Artif. Intell. 267, 1–38 (2019)
2. Sokol, K., Flach, P.: One explanation does not fit all. KI - Künstliche Intelligenz 34, 235–250 (2020)
3. Goebel, R., Chander, A., Holzinger, K., Lecue, F., Akata, Z., Stumpf, S., Kieseberg, P., Holzinger, A.: Explainable AI: the new 42? In: Holzinger, A., Kieseberg, P., Tjoa, A., Weippl, E. (eds.) Machine Learning and Knowledge Extraction. CD-MAKE 2018. LNCS, vol. 11015, pp. 295–303. Springer, Cham (2018)
4. Arya, V., Bellamy, R.K.E., Chen, P., Dhurandhar, A., Hind, M., Hoffman, S.C., Houde, S., Liao, Q.V., Luss, R., Mojsilovic, A., Mourad, S., Pedemonte, P., Raghavendra, R., Richards, J., Sattigeri, P., Shanmugam, K., Singh, M., Varshney, K.R., Wei, D., Zhang, Y.: One Explanation Does Not Fit All: A Toolkit and Taxonomy of AI Explainability Techniques. arXiv:1909.03012v2 [cs.AI] (2019)
5. Döbel, I., Leis, M., Molina Vogelsang, M., Welz, J., Neustroev, D., Petzka, H., Riemer, A., Püping, S., Voss, A., Wegele, M.: Maschinelles Lernen. Eine Analyse zu Kompetenzen, Forschung und Anwendung. Fraunhofer-Gesellschaft, München (2018)
6. Gunning, D., Aha, D.W.: DARPA’s explainable artificial intelligence (XAI) program. AI Mag. 40(2), 44–58 (2019)
7. Arrieta, A.B., Díaz-Rodríguez, N., Del Ser, J., Bennetot, A., Tabik, S., Barbado, A., Garcia, S., Gil-Lopez, S., Molina, D., Benjamins, R., Chatila, R., Herrera, F.: Explainable Artificial Intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI. Inf. Fusion 58, 82–115 (2020)
8. Du, M., Liu, N., Hu, X.: Techniques for interpretable machine learning. Commun. ACM 63(1), 68–77 (2019)
9. Samek, W., Müller, K.-R.: Towards explainable artificial intelligence. In: Samek, W., Montavon, G., Vedaldi, A., Hansen, L.K., Müller, K.-R. (eds.) Explainable AI: Interpreting, Explaining and Visualizing Deep Learning. LNCS, vol. 11700, pp. 5–22. Springer, Cham (2019)
10. Park, D.H., Hendricks, L.A., Akata, Z., Rohrbach, A., Schiele, B., Darrell, T., Rohrbach, M.: Multimodal explanations: justifying decisions and pointing to the evidence. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 185–191. IEEE (2018)
11. Kuwertz, A., Schneider, G.: Ontology-based meta model in object-oriented world modeling for interoperable information access. In: ICONS 2013, The Eighth International Conference on Systems. IARIA (2013)
12. Kuwertz, A.: On adaptive open-world modeling based on information fusion and inductive inference. In: Beyerer, J. (ed.) Proceedings of the 2010 Joint Workshop of Fraunhofer IOSB and Institute for Anthropomatics, Vision and Fusion Laboratory. Karlsruher Schriften zur Anthropomatik, vol. 7, pp. 227–242. KIT Scientific Publishing, Karlsruhe (2010)
13. Ehrlinger, L., Wöß, W.: Towards a definition of knowledge graphs. In: Martin, M., Cuquet, M., Folmer, E. (eds.) Joint Proceedings of the Posters and Demos Track of 12th International Conference on Semantic Systems (SEMANTiCS2016) and 1st International Workshop on Semantic Change & Evolving Semantics (SuCCESS16). CEUR Workshop Proceedings, vol. 1695, pp. 13–16 (2016)
14. Hogan, A., Blomqvist, E., Cochez, M., d’Amato, C., de Melo, G., Gutierrez, C., Labra Gayo, J.E., Kirrane, S., Neumaier, S., Polleres, A., Navigli, R., Ngonga Ngomo, A.-C., Rashid, S.M., Rula, A., Schmelzeisen, L., Sequeda, J., Staab, S., Zimmermann, A.: Knowledge Graphs. arXiv:2003.02320v3 [cs.AI] (2020)
15. Wang, Z., Tang, L., Liu, X., Yao, Z., Yi, S., Shao, J., Yan, J., Wang, S., Li, H., Wang, X.: Orientation invariant feature embedding and spatial temporal regularization for vehicle re-identification. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 379–387. IEEE (2017)
16. Holzinger, A., Plass, M., Kickmeier-Rust, M., Holzinger, K., Crisan, G.C., Pintea, C.-M., Palade, V.: Interactive machine learning: experimental evidence for the human in the algorithmic loop. Appl. Intell. 49, 2401–2414 (2019)
CAGEN - Context-Action Generation for Testing Self-learning Functions
Marco Stang¹ (✉), Maria Guinea Marquez², and Eric Sax¹
¹ Institute for Information Processing Technologies, Karlsruhe Institute of Technology, Engesserstraße 5, 76131 Karlsruhe, Germany
{marco.stang,eric.sax}@kit.edu
² Daimler AG, Sindelfingen, Germany
[email protected]
Abstract. This paper presents the general concept and the prototypical implementation for generating data for testing self-learning functions. The concept offers the possibility to create a data set consisting of several different data providers. For this purpose, a mediation pattern adapted to the data generation was developed. The concept was applied to test a self-learning comfort function by classifying the context data into three subgroups: real-, sensor-, and user data. This separation allows a more realistic simulation using the data to test a self-learning comfort function and detect possible malfunctions. The CAGEN concept was implemented as a prototype by simulating GPS and temperature data.
Keywords: Data generation · Software engineering · Testing
1 Introduction and Motivation
The topic of “artificial intelligence” (AI) is currently omnipresent and attracts the interest and attention of scientists, industry, and governments around the world. Once considered futuristic, this technology is already integrated into our everyday life and will impact many areas of our daily life in the foreseeable future. Alongside electrification, mobility as a service (MaaS), and the connection of vehicles [1] or infrastructure, highly automated driving is one of the megatrends in the automotive industry [2]. For this application, onboard computers with artificial intelligence combine sensor data from cameras, radar, LIDAR, and other sensors to fully automate the driving process. In addition to enabling partially and highly automated vehicles, artificial intelligence enhances driving comfort for both the driver and passengers. For example, instead of a conventional rain sensor, Tesla Inc. utilizes an artificial neural network to control the windshield wiper and detect snow [3]. As more and more self-learning systems enter safety-critical areas, building trust in self-learning systems and their decisions is necessary. Thus, how can we persuade individuals to have confidence in self-learning algorithms? Intensive testing can increase confidence in a self-learning system. When testing a self-learning system, the particular requirement is, on the one hand, to find the relevant test cases among the enormous number of possible test cases and, on the other hand, to generate new, previously unknown test cases that may occur during the use of the system.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 T. Ahram et al. (Eds.): IHIET-AI 2021, AISC 1378, pp. 12–19, 2021. https://doi.org/10.1007/978-3-030-74009-2_2
2 Theoretical Background
Recorded real data are rarely available in sufficient quantity and quality. Problems concerning quantity can occur at the beginning of system development. At this time, prototypes to be used for data recording or structures for data recording, such as automated pipelines, do not yet exist or are still in the planning stage. Since it is practically infeasible to wait for an adequate data basis during development, data generation methods are common practice for accelerating the development of self-learning systems [4]. The challenge of data quantity occurs predominantly with real recorded data and does not exist for simulated data, as these can theoretically be generated without limits.
The second point of view concerns the quality of the data. The quality of real recorded data can be influenced only with a high degree of effort, if at all. Quality in this context means a high variance of the data. Thus, a large number of different conditions, also called scenarios, have to be converted from sensors to data. For example, it is not sufficient to train a self-learning system for autonomous driving exclusively with images of a freeway trip (straight route). Pictures of left or right turns must be present in the training data set as well in order to enable the AI to learn a steering behavior. Therefore, the reliability of the learned functionality of a self-learning algorithm depends mainly on the quality and quantity of the training data used [5]. For these reasons, there are two interacting methods for generating data sets in artificial intelligence: simulation, to obtain a sufficiently high quantity of data, and augmentation, to increase the variance and the quality of the data. In augmentation, the quality of the data is increased by adjusting real data. In principle, augmentation of simulated data is also possible but is not widely practiced in the literature [6].
In the field of visual image recognition, typical augmentation adjustments are rotation, over- or underexposure of images, and the addition of image sensor noise. Augmentation has its origin in the modification of images but can also be used to modify sensor data (physical measurement values in time series format). In contrast to augmentation, simulation generates purely synthetic data, without real data, in order to increase the quantity. For this purpose, software simulation environments are created that simulate the environment in which the system to be developed will operate. Such an environment is also called context [7] and differs in its abstraction of reality. The advantage of using a simulation environment is the generation of nearly unlimited data from different contexts with flexible modification of the variance [8]. In addition, this type of data generation is less cost-intensive than the real recording of data by physical systems. There is also no risk of damage to a system, for example, during test drives. A disadvantage of this type of data generation is the induction of a data bias by the simulation. Such a bias can occur if the simulated data does not describe the reality in which the system is running with sufficient accuracy [9].
The development of a self-learning system can be divided into two parts: the training of the self-learning system and the testing of the resulting functionality. For both areas, relevant data are necessary, but they differ in their intended purpose. In development, data is used to adjust parameters, i.e., to train the self-learning function. In testing, data is used to detect a possible malfunction of the self-learning function. According to [10], the vast input space is the main challenge for testing self-learning functions. Therefore, the generation of test data that correctly represents the input space is an essential but challenging task.
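To make the augmentation of time-series sensor data more concrete, the following minimal sketch perturbs a short series of temperature readings with a constant random offset and per-sample Gaussian noise. The function name `augment_series`, its parameters, and the sample values are assumptions chosen for illustration; the paper does not prescribe a specific augmentation procedure.

```python
import random


def augment_series(values, offset_range=0.5, noise_std=0.1, seed=None):
    """Return a perturbed copy of a time series.

    One constant offset is drawn per series (e.g., a miscalibrated sensor),
    and independent Gaussian noise is added to every sample.
    """
    rng = random.Random(seed)
    offset = rng.uniform(-offset_range, offset_range)
    return [v + offset + rng.gauss(0.0, noise_std) for v in values]


recorded = [20.0, 20.5, 21.0, 21.5]  # made-up temperature readings in degrees Celsius
variants = [augment_series(recorded, seed=s) for s in range(3)]
for variant in variants:
    print([round(v, 2) for v in variant])
```

Using a seeded generator keeps the generated variants reproducible, which is convenient when a failing test case has to be replayed.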
For the generation of test inputs, input augmentation (31%) and search-based approaches (25%) are, according to [9], the most popular techniques. The concept described below focuses on the augmentation of data, but it is also possible to use search-based approaches with evolutionary algorithms [11] or reinforcement learning [12].
3 Provider Concept for the Data Generation of Self-learning Functions
To address the challenge of data generation, a concept called CAGEN is described below. CAGEN is an acronym that stands for Context Action GENeration. This work covers the generation of context-based data. The generation of actions, i.e., the creation of events in the system, will be described in a forthcoming paper.
3.1 Conceptual Requirements
The concept for generating data for testing self-learning functions is subject to the following requirements.
• [Req1] The concept is supposed to generate a data set from several different data providers, which is used for testing a self-learning function.
• [Req2] The concept is designed to enable additional interchangeable data sources (providers, services, recorded data).
• [Req3] The communication of the individual data providers must be reduced to one central node.
• [Req4] The concept should enable the flexible addition of data providers. A combination of real recorded data and simulated data shall be possible.
• [Req5] Both real and simulated data sources shall be augmented by adding realistic interferences. For this purpose, a random offset or intelligent methods can be applied.
3.2 CAGEN
The present concept is based on the mediator design pattern. The mediator pattern is a behavioral design pattern that reduces the number of unstructured relationships between objects. The pattern restricts direct communication between the objects and forces indirect communication via a so-called mediator. As a result, the individual objects depend on only one mediator class without being coupled to any additional object. Because the communication paths are reduced, individual objects can be replaced, extended, or reused. The established mediator design pattern has been tailored for the fusion of individual data providers and the subsequent generation of a customizable test data set. Figure 1 illustrates the adapted design pattern.
The individual key components are described in the following.
Fig. 1. Adapted mediator design pattern for the use of different data providers and generation of a customizable test data set (CAGEN-Concept)
Data Providers
In the present concept, the previously mentioned objects represent data providers, i.e., instances that provide data through external services. The data can be generated by services (APIs) or already exist in data files in the cloud. A provider can have a varying number of inputs and outputs, allowing several services to be used for data generation. In this paper, however, only one service is used as input. The input, in the form of data provided by the service, is transformed by the internal logic of the provider. The internal logic of a provider is task-specific and can range from trivial to advanced tasks. An example of such logic is the modification of the data by adding a random offset, but it can also be an intelligent procedure that augments the data with a Generative Adversarial Network (GAN) based on historical data. The recalculated data represents the output of one provider and is available to subsequent providers. A provider that is connected to a service is called a source provider. A provider that modifies or augments the data of a source provider is called a modification provider in the following. The source provider ensures the modularity of the providers as required in requirement 4. The modification provider fulfills the option to augment the data described in requirement 5. A demonstration showing the modification process in detail is described in Fig. 3.
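The distinction between a source provider and a modification provider can be sketched as follows. This is a hypothetical illustration, not the CAGEN implementation: the class names, the `compute` method, and the constant stand-in for an external temperature service are assumptions.

```python
import random


class SourceProvider:
    """Wraps a service (here: any zero-argument callable) and passes its data on."""

    def __init__(self, service):
        self.service = service

    def compute(self):
        return self.service()


class OffsetProvider:
    """Modification provider: adds a bounded random offset to an upstream value."""

    def __init__(self, max_offset, seed=None):
        self.max_offset = max_offset
        self.rng = random.Random(seed)

    def compute(self, upstream_value):
        return upstream_value + self.rng.uniform(-self.max_offset, self.max_offset)


# Stand-in for an external temperature service (hypothetical, constant value).
temperature_source = SourceProvider(lambda: 21.0)
noisy_temperature = OffsetProvider(max_offset=0.5, seed=42)

raw = temperature_source.compute()
augmented = noisy_temperature.compute(raw)
print(raw, round(augmented, 3))
```

Because the modification provider only sees a value and not the service behind it, the source can be swapped (for example, recorded data instead of an API) without touching the augmentation logic.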
M. Stang et al.
Base Controller The Base Controller controls the fundamental program operation and determines the sequence of calculation of the individual providers. It is also responsible for controlling the data generation and preventing undeclared data access or changes by unauthorized providers. The Base Controller is initialized at program start, the required providers are added, and the internal simulation of the data is started. The operation of the Base Controller can be divided into two phases, the preparation phase, and the evaluation phase.
Fig. 2. Preparation of the data flow by the Base Controller during the setup phase. The numbers in the right diagram indicate the execution sequence of the providers resulting from their dependencies
In the preparation phase, the required data tables are first initialized. These data tables are still without content, but their structure, such as sampling rate or column caption, is defined. Afterward, the providers are arranged in a logical calculation sequence according to their dependencies. It is guaranteed that the calculations of a preceding provider are completed and available before the next calculation is started. These dependencies create a tree-like structure, which corresponds to the internal flow of data through the program. Providers without dependencies are positioned at the beginning of the calculation sequence. In addition to the sequence, the memory space assigned to each provider is reserved. The provider data are stored in a variable number of tables whose structure can differ depending on the application. The data output of each provider is assigned to a table column. This enables a centralized data management and monitoring solution with reduced storage requirements. Figure 2 shows the logical data flow resulting from the exemplary provider network (left) as a tree structure. The described behavior of dependencies is shown by Provider C. According to its internal logic, Provider C can only execute its calculations after Providers A and B have provided their output. During the evaluation phase, the individual providers are calculated repeatedly for a predefined number of iterations. With each iteration, the output data of each provider changes depending on the implemented logic, resulting in successive data series. At the end of each iteration, the newly calculated data is appended to the corresponding data tables and finally stored in a data file (e.g., a CSV file). The advantage of the presented concept is the possibility to generate individualized data sets comparatively quickly.
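The two phases can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the Base Controller from the paper: providers are reduced to plain callables, the names `calculation_order`, `run`, and `to_csv` are hypothetical, and the ordering relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later). The provider network mirrors the example of Providers A, B, and C.

```python
import csv
import io
from graphlib import TopologicalSorter


def calculation_order(dependencies):
    """Preparation phase: order providers so each one follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


def run(providers, dependencies, iterations):
    """Evaluation phase: compute every provider once per iteration, one row each."""
    order = calculation_order(dependencies)
    rows = []
    for step in range(iterations):
        outputs = {}
        for name in order:
            inputs = {dep: outputs[dep] for dep in dependencies[name]}
            outputs[name] = providers[name](step, inputs)
        rows.append([outputs[name] for name in order])
    return order, rows


def to_csv(header, rows):
    """Serialize the collected table as CSV text (one column per provider)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Provider C depends on A and B, as in the exemplary provider network.
dependencies = {"A": [], "B": [], "C": ["A", "B"]}
providers = {
    "A": lambda step, inputs: step,
    "B": lambda step, inputs: 10 * step,
    "C": lambda step, inputs: inputs["A"] + inputs["B"],
}

order, rows = run(providers, dependencies, iterations=3)
csv_text = to_csv(order, rows)
```

A topological sort also raises an error when the dependencies contain a cycle, which would be one way to reject an inconsistent provider network before any data is generated.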
4 Prototypical Implementation

The proposed CAGEN concept is used in this paper to generate the most realistic sensor data possible for testing a self-learning comfort function. Synthetically generated data avoids the problems of learning from too little data, from data with too little variance, and from a lack of data in the desired context. For a first evaluation and verification of the proposed concept, this paper deals with the generation of position- and temperature-based data. For the data generation, three different contexts are calculated per simulation step; they are briefly introduced below.
Fig. 3. Visualization of the context using GPS coordinates longitude (1), latitude (2), and temperature (3). For (1) and (2), the sensor context is similar to the real context, which corresponds to a small inaccuracy of the GPS sensor. For (3), a temperature sensor with a higher noise is simulated, and the user context is also displayed.
Real Context In the first step of the data generation simulation, the real context is calculated. This context represents the original but simulated context of the vehicle. In this work, the real context is composed of the GPS-longitude and -latitude and the prevailing temperature at that time and position. This data is artificially generated and is used to reproduce realistic conditions. The simulated location is available from a data provider for the location. This
data provider uses the API of the OpenRouteService1 and extracts GPS coordinates (longitude and latitude) (Fig. 3, (1), (2)). Further information such as estimated travel time or traffic volume can also be retrieved via this API. Similarly, the temperature provider gets historical weather data from the OpenWeatherMap API2 to create different weather conditions for the real context. For this purpose, the temperature provider takes the GPS coordinates of the location provider, forwards those coordinates to the OpenWeatherMap API, and returns a temperature value for the respective position. This is an example of the chaining of providers described in the concept and of possible dependencies between providers (Fig. 2). Sensor Context In the second phase, the sensor context is calculated based on the data of the real context. Because sensor signals can be inaccurate, noisy, shifted, or temporarily unavailable, the sensor context differs from the real context. Only with error-free sensors would both contexts be equal. Since this is not possible in practice, measurement inaccuracies of the sensors were simulated by the RandomNoiseProvider. Its internal logic adds a Gaussian-distributed error term with a predefined scatter around its mean value to each given input. The RandomNoiseProvider is used to calculate the simulated vehicle position, which would otherwise be obtained from a GPS sensor. Figure 3 shows the generated data for the real context and the sensor context and illustrates the described measurement inaccuracy. User Context For the third and final phase, the user context is derived by combining real context and sensor context. The fundamental principle behind this context is the different and person-dependent perception of environmental influences, some of which differ from reality. Unforeseeable factors further enhance this.
For example, a person’s perception of warmth depends on the clothing they wear and on the air’s moisture content. Using the same approach as before, the RandomNoiseProvider is first selected to calculate both location and temperature data for the user context. Compared to the sensor context, larger deviations from the real data are assumed. Figure 3 shows the two data providers and their respective contexts. The data of the location provider are shown only for the real and sensor contexts because a human (user) can estimate their approximate position in terms of street names or locations but cannot express it in GPS coordinates (longitude or latitude). For the temperature provider’s data, a simulation of all three possible contexts is reasonable because a sensor can exhibit noise and malfunctions, and people can also perceive the temperature differently due to external influences.
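The relation between the three contexts can be sketched as follows. The sketch is illustrative only: the temperature series, the standard deviations 0.3 and 1.5, and the function names are made-up assumptions, and the user context is derived here by adding noise directly to the real context, which is a simplification of the combination of real and sensor context described above.

```python
import random


def add_gaussian_noise(values, sigma, rng):
    """Add a zero-mean Gaussian error term with standard deviation sigma to each value."""
    return [value + rng.gauss(0.0, sigma) for value in values]


def rms_error(reference, observed):
    """Root-mean-square deviation of an observed series from the reference series."""
    squared = [(o - r) ** 2 for r, o in zip(reference, observed)]
    return (sum(squared) / len(squared)) ** 0.5


rng = random.Random(42)  # fixed seed so the generated data set is reproducible

# Real context: a made-up, slowly rising temperature series in degrees Celsius.
real_temperature = [20.0 + 0.1 * step for step in range(200)]

# Sensor context: small measurement noise (assumed sigma).
sensor_temperature = add_gaussian_noise(real_temperature, 0.3, rng)

# User context: larger, person-dependent deviation (assumed sigma).
user_temperature = add_gaussian_noise(real_temperature, 1.5, rng)

sensor_error = rms_error(real_temperature, sensor_temperature)
user_error = rms_error(real_temperature, user_temperature)
```

With these assumed parameters the user context deviates more strongly from the real context than the sensor context does, which is the qualitative behavior shown for the temperature data in Fig. 3.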
5 Summary and Outlook

The CAGEN concept for data generation for testing self-learning functions is based on the mediator design pattern. The pattern was adapted for the modular integration of several different data sources (APIs, clouds, and data files). By concentrating the logic in one mediator, here called the Base Controller, the comprehensibility of the system’s logic is increased, and standardization of the generated data sets is achieved. For data generation as close to reality as possible, three contexts (real, sensor, and user) were introduced. For these contexts, GPS coordinates and temperature data were generated in a prototypical implementation to verify a self-learning function. As an outlook, we intend to increase the number of data providers, to integrate real recorded data, and to use the resulting hybrid data generation for testing different self-learning functions in a realistic manner.

Acknowledgement. I want to thank the student assistants Vinzenz Rau, Matthias Zipp and Felix Schorle for their work and commitment to this project.

1 https://openrouteservice.org/.
2 https://openweathermap.org/.
References
1. Kalmbach, R., et al.: Automotive Landscape 2025 [electronic Resource]: Opportunities and Challenges Ahead. Roland Berger Strategy Consultants (2011)
2. Guissouma, H., Klare, H., Sax, E., Burger, E.: An empirical study on the current and future challenges of automotive software release and configuration management. In: Euromicro Conference on Software Engineering and Advanced Applications (SEAA), pp. 298–305. IEEE (2018)
3. Teslamag.de. https://teslamag.de/news/neuronales-netz-soll-in-teslas-regen-und-schnee-erkennen-25496
4. Edvardsson, J.: A survey on automatic test data generation. In: Proceedings of the 2nd Conference on Computer Science and Engineering (1999)
5. Obermeyer, Z., Emanuel, E.: Predicting the future—big data, machine learning, and clinical medicine. New Engl. J. Med. 375(13), 1216–1219 (2016)
6. Shorten, C., Khoshgoftaar, T.: A survey on image data augmentation for deep learning. J. Big Data 6(1), 1–48 (2019)
7. Matalonga, S., Rodrigues, F., Travassos, G.: Characterizing testing methods for context-aware software systems: results from a quasi-systematic literature review. J. Syst. Softw. 131, 1–21 (2017)
8. Zhang, J.M., Harman, M., Ma, L., Liu, Y.: Machine learning testing: survey, landscapes and horizons. IEEE Trans. Softw. Eng. (2020)
9. Dietterich, T.G., Kong, E.B.: Machine learning bias, statistical bias, and statistical variance of decision tree algorithms. Technical report, Department of Computer Science, Oregon State University (1995)
10. Riccio, V., Jahangirova, G., Stocco, A., et al.: Testing machine learning based systems: a systematic mapping. Empirical Softw. Eng. 25(6), 5193–5254 (2020)
11. Lauber, A., Sommer, M., Fuchs, M., Sax, E.: Evolutionary algorithms to generate test cases for safety and IT-security in automotive systems. In: Syscon 2020–14th Annual IEEE International Systems Conference (2020)
12. Baumann, D., Pfeffer, R., Sax, E.: Automatic generation of critical test cases for the development of highly automated driving functions. In: 93rd IEEE Vehicular Technology Conference – Spring (2021)
Improving the Retention and Progression of Learners Through Intelligent Systems for Diagnosing Metacognitive Competencies – A Case Study in UK Further Education
Tej Samani1(B), Ana Isabel Canhoto2, and Esin Yoruk3
1 Performance Learning and CBiS - Centre for Business in Society, Coventry University, Coventry, UK
2 Brunel University London, London, UK
3 School of Strategy and Leadership and CBiS - Centre for Business in Society, Coventry University, Coventry, UK
Abstract. Metacognitive competencies related to cognitive tasks have been shown to predict learning outcomes. Less, however, is known about how metacognitive competencies can enhance the retention and progression of learners in Further Education. This study provides evidence from Performance Learning (PL) and its intelligent system PLEX, PL’s proprietary technology, to show how learners’ self-reports on meta-cognitive dimensions can be used as predictors of learner retention and progression within the learner’s course/s. The results confirm the predictive potential of PLEX technology in the early identification of metacognitive competencies in learning, and show that it helps learners develop effective remedies to enhance their retention and progression levels.
Keywords: Artificial intelligence · Meta-cognition · Machine learning · Mental health · Human interaction · Emerging technologies · Teaching · Learning
1 Introduction

It is estimated that artificial intelligence (AI) has the potential to contribute US$13 trillion to the global economy by 2030, amounting to 1.2% additional GDP growth per year [1]. One field in which AI opens up new application domains is education, by radically increasing learner performance and productivity [2]. Current applications of AI in the education sector aim mostly at areas such as teacher effectiveness and student engagement via tutoring systems [3, 4]. However, the use of AI to identify learners at risk of underperforming, or failing to progress, in their studies is a neglected area of investigation, particularly as far as disadvantaged groups are concerned. Yet, the ability to identify such learners, and intervene in ways that support their academic performance, is of critical importance for the learners themselves [5] and for society [6, 7], as well as for the educational institutions where those learners are enrolled [8].

This paper addresses this gap. We analyze the use of AI technology to identify learners at risk of underperforming academically, in order to enable early intervention. Specifically, we consider the PLEX tool, which assesses learners along 27 psycho-behavioral dimensions that impact on learners’ academic performance, and classifies them into one of five levels of academic performance risk. Then, using a case study methodology, we report on the use of this technology at a further education (FE)1 college in the north of England. Based on data collected from 563 learners at the start and the middle of the academic year, the educational institution developed interventions to support student outcomes and retention, which were cost-effective and addressed the needs of specific learners. This resulted in the educational institution identifying £60,000 in cost savings within two academic terms. Furthermore, the institution decided to embed PLEX in the induction process and extend it to all learners, to support their wellbeing while achieving additional cost savings. We discuss how other institutions might benefit from the use of AI technology to identify learners at risk of underperforming.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 T. Ahram et al. (Eds.): IHIET-AI 2021, AISC 1378, pp. 20–27, 2021. https://doi.org/10.1007/978-3-030-74009-2_3
2 PLEX: An Intelligent Tool to Assess Cognitive and Behavioral Risk

PLEX is a multiple-choice assessment tool developed by a UK-based venture, Performance Learning Education. It is based on coaching principles and has been applied, refined and validated with more than 60,000 learners. The tool assesses learners along 27 psycho-behavioral dimensions which impact on learners’ academic performance, by affecting: (i) the process for learning information, (ii) the perception of the institution where they are enrolled, (iii) the perception of their own academic performance and their own metacognitive abilities, and (iv) their desired school and professional outcomes. PLEX is distinct from other learning assessment tools in that it considers multiple factors (Table 1), and it also detects the reasons for underperformance.

Table 1. Dimensions that contribute to the overall PLEX result

Metacognitive domains                   Perceptions
1. Sleep & Tiredness                    1. Towards Effort & Determination to Work
2. Retaining & Recalling Information    2. Towards Learning
3. Test Anxiety                         3. Readiness to Learn
4. Focus & Motivation                   4. Towards Subject Demands
5. Organization & Time Management       5. Towards the Institution
6. Stress & Strain                      6. Towards Self
7. Confidence                           7. Towards Showing Up to Learn
                                        8. Towards Tutors/Lecturers

The “metacognitive domains” items identify the presence of factors that hinder the learners’ ability to perform academically. While the absence of these factors does not ensure better class engagement, and does not guarantee academic success, their presence hinders learning [9]. These items are hygiene factors which can prevent or enable academic success. In order to scaffold the learners’ academic success, these barriers must be identified and removed. However, removing these barriers will not be enough to guarantee academic success. To that end, it is also necessary to improve learners’ perceptions [10]. The “perceptions” items assess the learners’ attitudes towards various dimensions of learning. While the absence of these items does not impact on learners’ ability to learn, it will impact on their drive to learn. The presence of positive perceptions towards specific learning dimensions is critical for academic success. Therefore, the “perceptions” are motivating factors, which drive academic success. Once the scaffolding has been established by removing the metacognition barriers, the negative perceptions must be identified and addressed, in order to build the walls of the learners’ academic success.

PLEX’s questions are designed to elicit self-assessment from the learners, which has advantages over assessment by a third party (e.g., teachers) [9]. The answers are used to calculate the overall risk level for each learner, with learners being classified into one of five alert levels, from “no alert” to “severe alert”. The higher the alert level, the higher the risk that the learners’ emotions and behaviour will impact their classroom engagement and performance, as well as their academic success.

1 Further education colleges are educational institutions whose courses focus on job-specific skills, and are often designed in collaboration with local employers. Some courses are designed as pathways into university degrees. Further education degrees tend to be more affordable than university degrees, with smaller class sizes.
3 The Case Study

During the academic year 2019–20, the PLEX assessment tool was used to calculate the ability and readiness to learn of pupils at a medium-sized FE college located in the North of England, henceforth referred to as The College. The College serves geographical areas which rank among the top 1% in England for multiple deprivation. 563 learners undertook the PLEX assessment at the start and middle of the academic year. The second assessment took place before the UK government announced lockdown measures to contain the Covid-19 pandemic2. All learners in the tested pool had some cause for concern in terms of risk factors related to academic performance and retention (Table 2). Some learners (10.7% in Assessment 1 and 18.3% in Assessment 2) displayed behaviors and attitudes that did not, at the moment, prevent them from achieving academic success (e.g., completing their degree). However, they faced barriers that prevented them from achieving a higher grade range. More worryingly, the majority of the learners were at a moderate or above risk of failing to succeed academically. Between the two assessment points, there was a decrease in the number of learners in the moderate alert level, and an increase in the low alert category. This signals that there was an improvement in the underlying behaviors and attitudes that support academic success for these learners. However, as the academic year progressed, there was also an increase in the number of learners in the high and severe alert levels. The College was facing a heightened risk that their learners would under-perform in academic assessment. Moreover, there was also an increased risk that learners would fail to progress in their studies, or even complete their degrees [11].

Table 2. Overall Alert levels

                                        No Alert   Low Alert   Moderate Alert   High Alert   Severe Alert   Total
Assessment 1   Number of students       0          60          423              79           1              563
               Proportion of students   0.0%       10.7%       75.1%            14.0%        0.2%           100%
Assessment 2   Number of students       0          103         331              123          6              563
               Proportion of students   0.0%       18.3%       58.8%            21.8%        1.1%           100%
Change         Number of students       0          43          -92              44           5
               Proportion of students   0.0%       71.7%       -21.7%           55.7%        500.0%

More than three quarters of the learners displayed problems in terms of the foundational behaviors and abilities that scaffold academic success. Table 3 shows that learners were at a significant disadvantage at the start of the academic year, as they lacked basic techniques for academic success.

Table 3. Alert levels for Metacognition Domain – Assessment 1

Domain                              Number of students   Proportion of students
Sleep & Tiredness                   528                  94%
Retaining & Recalling Information   436                  77%
Test Anxiety                        501                  89%
Focus & Motivation                  432                  77%
Organisation & Time Management      507                  90%
Stress & Strain                     545                  97%
Confidence                          494                  88%

2 See https://www.legislation.gov.uk/uksi/2020/350/contents/made/data.htm.
In assessment 2, there had been a slight improvement in alert levels for the metacognition domain. As per Table 4, around half of the learners were now reporting high levels of test anxiety and problems with organization and time management skills. While half of the cohort was still struggling on these dimensions, there was a significant improvement for “test anxiety”, and for “organization and time management skills”, respectively. Unfortunately, progress on other foundational skills such as “Sleep & tiredness”, “Stress & strain”, and “Confidence” was negligible. Moreover, the scores for “Retaining & recalling information” and “Focus & motivation” show a degradation of the situation in terms of these essential skills for academic success.
Table 4. Alert levels for Metacognition Domain – Assessment 2

                                    Assessment 2                                  Change
Domain                              Number of students   Proportion of students   Number of students   Proportion of students
Sleep & Tiredness                   506                  90%                      -22                  -4%
Retaining & Recalling Information   476                  85%                      40                   7%
Test Anxiety                        274                  49%                      -227                 -40%
Focus & Motivation                  462                  82%                      30                   5%
Organisation & Time Management      302                  54%                      -205                 -36%
Stress & Strain                     522                  93%                      -23                  -4%
Confidence                          426                  76%                      -68                  -12%
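The two tables appear to use different conventions for the “Change” in proportion, and a short calculation makes this explicit. The sketch below is a reader's check, not part of the study: the function names are hypothetical, and the interpretation (relative change in Table 2, change as a share of the 563-learner cohort in Table 4) is inferred from the published figures rather than stated by the authors.

```python
TOTAL_LEARNERS = 563  # cohort size reported for both assessments


def relative_change(before, after):
    """Percentage change relative to the Assessment 1 count (matches Table 2)."""
    return 100.0 * (after - before) / before


def share_change(before, after, total=TOTAL_LEARNERS):
    """Change in learner count as a percentage of the whole cohort (matches Table 4)."""
    return 100.0 * (after - before) / total


# Table 2, overall alert levels: (Assessment 1, Assessment 2) counts.
overall = {"Low": (60, 103), "Moderate": (423, 331), "High": (79, 123), "Severe": (1, 6)}
table2_change = {level: f"{relative_change(a, b):.1f}%" for level, (a, b) in overall.items()}

# Table 3 and Table 4, metacognition domains: (Assessment 1, Assessment 2) counts.
domains = {"Sleep & Tiredness": (528, 506), "Test Anxiety": (501, 274), "Confidence": (494, 426)}
table4_change = {name: f"{share_change(a, b):.0f}%" for name, (a, b) in domains.items()}
```

Under the relative convention, the rise from 1 to 6 learners in the severe alert level reads as 500%, although it concerns only five additional learners out of 563, so the two kinds of percentages should not be compared directly.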
In summary, in the period under analysis there was a slight improvement in the overall score for meta-cognitive skills of learners at The College (Fig. 1).

Fig. 1. Change in alert levels for Meta-Cognitive domains (bar chart titled “Change in Learner Numbers From Assessment 1 to Assessment 2”; y-axis: difference in number of learners; x-axis: meta-cognitive domains)
The PLEX results for learners’ perceptions show a mixed picture in terms of the motivational factors that drive learning and academic success. In the assessment conducted at the start of the academic year (Table 5), a significant number of learners were in the high-risk category. In particular, the negative attitudes “Towards Subject Demands” and “Towards the Institution”, required urgent and moderate attention, respectively. Attitudes for “Readiness to Learn” and “Towards Tutors/Teachers/Lecturers” were polarized, with 21% of learners and 25% of learners, respectively, requiring urgent attention because they were in the high alert levels. However, 72% and 57% would demand only low attention for these two dimensions, as they were deemed to be on the low alert levels. Thus, different groups in The College would require very different interventions. If The College were to adopt a blanket intervention, it would fail to meet the needs of a significant proportion of learners.
Table 5. Alert levels for Perceptions – Assessment 1

Dimension                                No Attention   Low Attention   Moderate Attention   Urgent Attention   Extremely Urgent Attention   Weighted risk level (1-5)
Towards Effort & Determination To Work   11%            75%             7%                   7%                 0%                           2
Towards Learning                         9%             77%             7%                   7%                 0%                           2
Readiness To Learn                       0%             72%             7%                   21%                0%                           2
Towards Subject Demands                  0%             9%              9%                   82%                0%                           4
Towards The Institution                  8%             34%             10%                  47%                0%                           3
Towards Self                             4%             75%             16%                  5%                 0%                           2
Towards Showing Up To Learn              11%            75%             8%                   6%                 0%                           2
Towards Tutors/Teachers/Lecturers        11%            57%             7%                   25%                0%                           2
Overall perceptions                      7%             59%             9%                   25%                0%                           3
Learners’ self-perceptions deteriorated during the academic year (Table 6). In assessment 2, there was an increase in the proportion of learners requiring urgent interventions across all but two perceptual dimensions. Moreover, 1% of learners now required extremely urgent intervention across all perceptual dimensions. As the academic year progressed, many learners developed negative perceptions “Towards Self”, “Towards Subject Demands”, “Towards Learning” and “Towards Effort & Determination to Work”. When learners develop such negative perceptions, they are less likely to challenge themselves, or to try new approaches to problem solving. They may also be less resilient in the face of structural or temporal challenges, such as economic deprivation or a change in employment status. As a result, these learners’ academic performance may suffer, and some may even consider abandoning their studies.

Table 6. Alert levels for Perceptions – Assessment 2
Assessment 2
Dimension                                No Attention   Low Attention   Moderate Attention   Urgent Attention   Extremely Urgent Attention   Weighted risk level (1-5)
Towards Effort & Determination To Work   18%            59%             4%                   19%                1%                           2
Towards Learning                         14%            63%             4%                   19%                1%                           2
Readiness To Learn                       0%             62%             4%                   34%                1%                           3
Towards Subject Demands                  1%             9%              6%                   83%                1%                           4
Towards The Institution                  15%            38%             11%                  36%                1%                           3
Towards Self                             6%             59%             18%                  16%                1%                           2
Towards Showing Up To Learn              18%            59%             9%                   14%                1%                           2
Towards Tutors/Teachers/Lecturers        18%            58%             4%                   20%                1%                           2
Overall perceptions                      11%            51%             7%                   30%                1%                           3

Change
Dimension                                No Attention   Low Attention   Moderate Attention   Urgent Attention   Extremely Urgent Attention   Weighted risk level (1-5)
Towards Effort & Determination To Work   8%             -16%            -4%                  12%                1%                           7%
Towards Learning                         6%             -14%            -4%                  12%                1%                           8%
Readiness To Learn                       0%             -10%            -4%                  13%                1%                           10%
Towards Subject Demands                  1%             -1%             -2%                  1%                 1%                           0%
Towards The Institution                  6%             3%              1%                   -11%               1%                           -9%
Towards Self                             2%             -16%            2%                   11%                1%                           11%
Towards Showing Up To Learn              8%             -16%            0%                   8%                 1%                           5%
Towards Tutors/Teachers/Lecturers        8%             0%              -4%                  -5%                1%                           -8%
Overall perceptions                      5%             -9%             -2%                  5%                 1%                           2%
The results for assessment 2 also show an improvement in the “Towards the Institution” and “Towards Tutors/Teachers/Lecturers”. These indicate an increased appreciation of the institution and of teaching staff. However, overall, there are still significant numbers of learners in the high alert levels – 48% for the “Towards the Institution” category, and 25% for the “Towards Tutors/Teachers/Lecturers” one. Hence, between one quarter and one half of the learners still require attention in these two categories. The proportion of learners requiring moderate or above levels of attention is even higher for the other perceptions’ categories.
In summary, as the academic year progressed, overall learners’ self-perceptions deteriorated, indicating a decrease in the drive to learn and, hence, academic success. Moreover, the results showed increased polarization in learners’ perceptions. To address this polarization, The College developed and applied targeted measures, reflecting the needs of different groups of learners.
4 Discussion and Concluding Remarks

The analysis of the metacognition and perception profiles of learners at The College described in the previous sections provides insight into the causes behind the observed increase in the number of learners in the high and severe alert levels. The PLEX technology shows that this result occurred because a) while there was an improvement in the barriers to learning, the overall level is still very high, meaning that many learners faced metacognition challenges that prevented them from learning; and b) there was a deterioration in six drivers for academic success, meaning that many learners found it difficult to recover from setbacks, and lacked motivation. The PLEX technology helped The College develop a range of interventions addressing the contextual and personal obstacles faced by the learners. Through the identification of learners in different risk levels, and the nuanced understanding of the causes of underperformance, The College could prioritize their interventions, and target their support, working with learners to overcome the obstacles that they faced. The initiatives developed, based on the PLEX data, resulted in lowered program withdrawals, improved learner outcomes, and increased overall satisfaction. Moreover, The College reported savings of up to £60,000. The savings were calculated based on the assumed savings associated with the retention of learners deemed at risk, who under normal circumstances have higher instances of early withdrawal or non-completion of Study Programmes. The resulting withdrawals would not only be significantly detrimental to the individual learners and their prosperity, but would consequently decrease the institution’s retention factor and its learner volumes in the lagged funding model. In addition, assumed savings were factored in for the costs associated with delays in identifying and triaging learners’ wider pastoral support needs.
The role of enabling and motivational factors in supporting learners’ academic performance is well understood in the literature. Therefore, many educational institutions adopt some form of program to support their learners’ development, over and above the teaching of subject topics and the preparation for exams. However, many educators lack the ability to diagnose the presence of such factors, given the lack of readily available, standardized definitions of metacognition and perceptions, and the interdependency between these dimensions [12]. Moreover, in the absence of historical data connecting metacognition and perceptions on the one hand, and learners’ performance on the other, it is not possible for educators to predict the impact of the former on the latter. Consequently, despite the recognized individual, social and organizational costs of academic underperformance, many educating institutions are unable to offer targeted, cost-effective support for their learners, which improves academic outcomes and student retention. Using AI technology, PLEX offers a standardized tool of diagnosing learners’ risk levels, based on their performance across 7 metacognition domains and 8 perceptions,
as outlined in this report. This tool, which has been extensively tested and refined over the past decade, enables educational institutions to diagnose problem areas, and to prioritize their interventions (e.g., depending on the type of perception requiring the most urgent intervention). Moreover, the insight obtained via the PLEX assessment supports the development of customized interventions that address the specific needs of different learners (e.g., focused on metacognition domains for some learners, but focused on perceptions for others). The type of proactive, targeted intervention enabled by PLEX’s data delivers significant financial benefits for educational institutions, while supporting learners’ academic success and contributing to the local community.
References
1. Bughin, J., Seong, J., Manyika, J., Chui, M., Joshi, R.: Notes from the AI frontier: modeling the global economic impact of AI. McKinsey Global Institute Report (2018). https://www.mckinsey.com/featured-insights/artificial-intelligence/notes-from-the-ai-frontier-modeling-the-impact-of-ai-on-the-world-economy
2. Timms, M.J.: Letting artificial intelligence in education out of the box: educational cobots and smart classrooms. Int. J. Artif. Intell. Educ. 26, 701–712 (2016)
3. Chaudhri, V.K., Lane, H.C., Gunning, D., Roschelle, J.: Applications of artificial intelligence to contemporary and emerging educational challenges. Artif. Intell. Mag. Intell. Learn. Technol.: Part 2, 34(4), 10–12 (2013)
4. McArthur, D., Lewis, M., Bishary, M.: The roles of artificial intelligence in education: current progress and future prospects. J. Educ. Technol. 1(4), 42–80 (2005)
5. Kulhánová, I., Hoffmann, R., Judge, K., et al.: Assessing the potential impact of increased participation in higher education on mortality: evidence from 21 European populations. Soc. Sci. Med. 117, 142–149 (2014)
6. UNESCO. Global citizenship education: preparing learners for the challenges of the 21st century. UNESCO (2014). ISBN 978-92-3-100019-5, 978-89-7094-803-4 (kor). https://unesdoc.unesco.org/ark:/48223/pf0000227729. Accessed 16 Oct 2020
7. Parrett, S.: Recognising the importance of FE in the HE sector. London South East Colleges (2019). https://www.lsec.ac.uk/news/2239-recognising-fe-in-he. Accessed 16 Oct 2020
8. Beer, C., Lawson, C.: The problem of student attrition in higher education: an alternative perspective. J. Furth. High. Educ. 41(6), 773–784 (2017)
9. Porayska-Pomsta, K.K., Mavrikis, M., Cukurova, M., Margeti, M., Samani, T.: Leveraging non-cognitive student self-reports to predict learning outcomes. In: Penstein Rosé, C., Martínez-Maldonado, R., Hoppe, H.U., Luckin, R., Mavrikis, M., Porayska-Pomsta, K., McLaren, B., Du Boulay, B. (eds.) Artificial Intelligence in Education: 19th International Conference, AIED 2018, London, UK, 27–30 June 2018, Proceedings, Part II, pp. 458–462. Springer, Cham (2018)
10. Bouchey, H.A., Harter, S.: Reflected appraisals, academic self-perceptions, and math/science performance during early adolescence. J. Educ. Psychol. 97(4), 673–686 (2005)
11. Fetler, M.: School dropout rates, academic performance, size, and poverty: correlates of educational reform. Educ. Eval. Policy Anal. 11(2), 109–116 (1989)
12. Samani, T., Porayska-Pomsta, K., Luckin, R.: Bridging the gap between high and low performing pupils through and curricula. In: Andre, E., Baker, R., Hu, X., Rodrigo, M.M.T., du Boulay, B. (eds.) AIED 2017. LNCS (LNAI), vol. 10331, pp. 650–655. Springer, Cham (2017)
Introduction of an Algorithm Based on Convolutional Neural Networks for an Automated Online Correction of Braided Cardiovascular Implants
Benedikt Haas, Marco Stang, Valentin Khan-Blouki, and Eric Sax
Institute for Information Processing Technologies, Karlsruhe Institute of Technology, Engesserstraße 5, 76131 Karlsruhe, Germany
{Benedikt.Haas,Marco.Stang,Eric.Sax}@kit.edu, [email protected]
Abstract. The expense of treating cardiovascular diseases is significant. For example, sediments on the coronary arteries’ inner walls are among the most common risks of a heart attack. One possible treatment includes cardiovascular implants or stents. Stents are manufactured by a braiding process and afterward inspected for defects by human visual inspection. To reduce production costs, an automated inspection system is the subject of this work. First, we propose a formalized problem description for camera-based automated visual inspection. Next, a machine-learning-based divide-and-conquer algorithm is presented. The CNN-based algorithm can be used both to supervise the braiding process and to correct braiding errors.
Keywords: Cardiovascular implants · Stents · Braiding · Pitch length measurement · Pitch length correction · CNN
1 Introduction
In 2015, 13.7% of Germany’s medical expenses were caused by cardiovascular diseases [1]. A typical treatment includes cardiovascular implants or stents. Stents are used to support blood vessels’ structure or to open up closed or blocked ones [2]. Therefore, they have to meet high structural requirements so as not to cause any damage to the blood vessels. Specifically, they have to be flexible enough to be implanted in blood vessels, which can be curved, yet rigid enough to support the vessel’s structure and prevent further collapses or closings. In general, stents can be laser cut [3] or braided [4]. A disadvantage of laser cutting a stent is the large amount of offcut, i.e., waste. Braiding stents can therefore be economically advantageous because no such offcut arises. A disadvantage of the
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 T. Ahram et al. (Eds.): IHIET-AI 2021, AISC 1378, pp. 28–36, 2021. https://doi.org/10.1007/978-3-030-74009-2_4
braiding process is the absence of practical solutions to monitor it. Specifically, the braided stent is manually inspected after the braiding process [5]. If an error is detected, the braided stent is rejected, and a new one is produced. The process of a machine using images to perform a visual inspection is called machine vision [6]. In the field of machine vision, so-called convolutional neural networks (CNNs) are a suitable solution. CNNs are used in various tasks, like monitoring the state of a car’s driver based on facial information [7] or autonomous driving using camera images [8]. Their architecture, which uses convolutions to extract features from the image in combination with fully connected layers of neurons, is designed to analyze images [9]. This paper makes two main contributions to the online inspection of a stent’s pitch using camera images. Firstly, a formal definition of the problem is given. This includes the pitch length and a matrix describing the actual stent as well as the target geometry. Due to the camera’s limited field of view, the matrix and the problem description are adapted to the imperfect measurement system. Secondly, a system description for the mentioned task is proposed using the given definitions. The system can be divided into three main steps. In the first step, the pitch length is extracted. This step includes four different kinds of CNNs. Dividing the task instead of using one very deep CNN has multiple advantages, such as interim results and less training time. The second step contains the decision whether the braiding process has to be adapted due to occurring errors. This step is described using rules in order to be highly transparent. The last step targets an online correction of the stent, namely the specification of parameters like the pitch difference(s) or the location of the error(s).
In summary, a system is proposed, which can detect and correct errors in the braiding process of stents.
2 Measurement System and Problem-Formalization
This section covers the design of the measurement system and a formalization of the problem at hand.
2.1 Measurement System
The measurement system, shown in Fig. 1, contains a braiding machine, a control unit, a computer, and a camera. The braiding machine braids the stent. This process is captured by the camera. The camera is connected to a computer, which performs all the necessary computing. In order to perform an online correction, the computer is connected to a control unit. This control unit controls the braiding machine and, therefore, the braiding process.
Fig. 1. Concept of the used measurement system
2.2 Problem-Formalization
The given task can be divided into three parts: (1) measuring the pitch lengths, (2) deciding whether a correction is necessary, and (3) specifying the correction. Part (1) is a mapping from a grayscale image $I$ with height $x$ and width $y$ to a matrix $L$ containing all pitch lengths. We assume that the stent contains $c$ columns and $d$ rows, as shown in Fig. 2. Following these definitions, (1) can be written as:
\[ I \in \mathbb{N}_0^{(x \times y)} \rightarrow L \in \mathbb{R}_{>0}^{(c \times d)}. \]
The pitch length will be abbreviated using the letter $l$. Formally, the pitch length is $l = \lVert p_{top} - p_{bottom} \rVert \in \mathbb{R}_{>0}$, with $p_{top}$ and $p_{bottom}$ being the interlacing points of the stent. Additionally, as shown in Fig. 2, the index will be used to describe a specific pitch length $l_{i,j}$ in the $i$th column at the $j$th row of the stent. Since only one camera is used, we cannot capture the whole stent in one image. Therefore, (1) has to be adapted. One possibility is to restrict $L$ according to the camera’s field of view, resulting in $L_{camera}$. In Fig. 3 an example of this case is given (“visible area”). In this example, the set of measured pitch lengths $K_{measured}$ is
\[ K_{measured} := \{ l_{i,1}, l_{i,2}, l_{i,3} \mid i \bmod 2 = 1,\ i \in \{1, \dots, 7\} \} \cup \{ l_{j,1}, l_{j,2} \mid j \bmod 2 = 0,\ j \in \{1, \dots, 7\} \}, \]
which results in ill-defined matrix dimensions. Therefore, we propose an alternative. At first, we define
\[ Q := \{ j \mid l_{i,j} \in K_{measured} \}, \quad a := \min(Q|_{i=p}) - \min(Q) \quad \text{and} \quad b := \max(Q) - \max(Q|_{i=p}) \]
with $p \in \mathbb{R}_{>0}$.
Fig. 2. Schematic display of the connection of the stent, the mesh structure and the pitch lengths
Then $L_{camera}$ will be:
\[ L_{camera} := \begin{pmatrix} l_e \\ l_{e+1} \\ \vdots \\ l_f \end{pmatrix}, \qquad l_p := \begin{cases} case_1 & \text{if } \max(Q) = \max(Q|_{i=p}),\ \min(Q) = \min(Q|_{i=p}) \\ case_2 & \text{if } \max(Q) \neq \max(Q|_{i=p}),\ \min(Q) = \min(Q|_{i=p}) \\ case_3 & \text{if } \max(Q) = \max(Q|_{i=p}),\ \min(Q) \neq \min(Q|_{i=p}) \\ case_4 & \text{if } \max(Q) \neq \max(Q|_{i=p}),\ \min(Q) \neq \min(Q|_{i=p}) \end{cases} \]
and
\[ \begin{aligned} case_1:&\quad l_p = \operatorname{pad}\big( (l_{p,g}\ 0\ l_{p,g+1}\ 0\ \dots\ l_{p,h})_{2(\max(Q) - \min(Q)) + 1} \big) \\ case_2:&\quad l_p = \operatorname{pad}\big( (l_{p,g}\ 0\ l_{p,g+1}\ 0\ \dots\ l_{p,h})_{2(\max(Q|_{i=p}) - \min(Q)) + 1} \cup (0 \dots 0)_{2b} \big) \\ case_3:&\quad l_p = \operatorname{pad}\big( (0 \dots 0)_{2a} \cup (l_{p,g}\ 0\ l_{p,g+1}\ 0\ \dots\ l_{p,h})_{2(\max(Q) - \min(Q|_{i=p})) + 1} \big) \\ case_4:&\quad l_p = \operatorname{pad}\big( (0 \dots 0)_{2a} \cup (l_{p,g}\ 0\ l_{p,g+1}\ 0\ \dots\ l_{p,h})_{2(\max(Q|_{i=p}) - \min(Q|_{i=p})) + 1} \cup (0 \dots 0)_{2b} \big) \end{aligned} \]
with the pad function adding one zero at the beginning of $l_p$ if $p$ is odd or at the end if $p$ is even, and $1 \le e \le f \le c$, $1 \le g \le h \le d$, $p \in \{e, \dots, f\}$. The subscripts denote the lengths of the respective sub-vectors, so every row has the same total length. Therefore, $L_{camera}$ has the dimension
\[ (f - e + 1) \times 2(h - g + 1) \iff (f - e + 1) \times 2(\max(Q) - \min(Q) + 1), \]
which describes the camera’s field of view. Apart from solving the dimensionality problem, this increases the readability of $L_{camera}$, because the matrix imitates the stent’s mesh structure. Revisiting the braiding process, one could notice that the stent’s size increases during the production process. This would lead to an increased size of $L_{camera}$. To be precise:
Fig. 3. Example for a mesh being braided, the camera’s field of view (“visible area”) and the measured rows d.
$f - e$ would not be constant over time (it will grow). This could lead to problems in the definition of the system used to measure the pitch lengths. CNNs, for example, have a fixed output dimension. In the case of a different output dimension, the CNN has to be retrained [9]. This would lead to either a non-online system or the need for multiple CNNs, which would be costly. Additionally, it would first have to be determined which of the networks to use. To avoid this problem, only a part of the taken image, namely the number of rows $d$, will be analysed, resulting in $L_{camera,d}$. A visual example is provided in Fig. 3, which will result in:
\[ L_{camera,d=3} = \begin{pmatrix} 0 & l_{5,1} & 0 & l_{5,2} & 0 & l_{5,3} \\ 0 & 0 & l_{6,2} & 0 & l_{6,3} & 0 \\ 0 & l_{7,1} & 0 & l_{7,2} & 0 & l_{7,3} \end{pmatrix}^{(7-5+1) \times 2(3-1+1) = 3 \times 6}. \]
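The row construction with interleaved zeros and the pad function can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `build_l_camera`, the dictionary input and the stand-in values `10*i + j` are assumptions made for illustration, and each visible column is assumed to have measurements for a contiguous range of rows.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def build_l_camera(measured: Dict[Tuple[int, int], float]) -> Matrix:
    """Arrange measured pitch lengths l[(i, j)] into the zero-interleaved matrix.

    Hypothetical helper; assumes contiguous row indices within each column.
    """
    columns = sorted({i for i, _ in measured})          # e ... f
    q_all = {j for _, j in measured}                    # the set Q
    q_min, q_max = min(q_all), max(q_all)               # g and h
    matrix: Matrix = []
    for p in columns:
        q_p = sorted(j for i, j in measured if i == p)  # Q restricted to i = p
        a = q_p[0] - q_min                              # rows missing at the start
        b = q_max - q_p[-1]                             # rows missing at the end
        core: List[float] = []
        for j in q_p:
            core += [measured[(p, j)], 0.0]             # interleave with zeros
        core.pop()                                      # drop the trailing zero
        row = [0.0] * (2 * a) + core + [0.0] * (2 * b)
        # pad: one zero in front for odd p, one zero at the end for even p
        row = [0.0] + row if p % 2 == 1 else row + [0.0]
        matrix.append(row)
    return matrix


# Visible area of the Fig. 3 example; 10*i + j stands in for l_{i,j} (made-up values).
visible = [(5, 1), (5, 2), (5, 3), (6, 2), (6, 3), (7, 1), (7, 2), (7, 3)]
example = build_l_camera({(i, j): float(10 * i + j) for i, j in visible})
```

With these stand-in values, `example` has three rows of six entries each, and the zero pattern matches the matrix shown for d = 3.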
Following this, the conditions of the previous definition of $l_p$ can be further differentiated into $g < h$ and $f - e + 1 = d$. This is based on the assumption that the system is measuring, which is equivalent to $d > 0$ and $h \ge g + 1$. In the second step (2), the decision whether a correction is necessary has to be taken. Let $L_{model}$ be the target model, i.e., a matrix of target pitch lengths. Additionally, let $\varepsilon$ be the correction threshold. Thus, a correction is necessary if
\[ \exists i, j, k, m \in \mathbb{N}: |l_{i,j} - l_{k,m}| > \varepsilon \quad \text{with } l_{i,j} \in L_{model},\ l_{k,m} \in L_{camera,d}, \]
which is equivalent to
\[ \exists l_{i,j} \in L_{diff} := \{ L_{camera,d} - L_{model}|_{L_{camera,d}} \}: |l_{i,j}| > \varepsilon. \]
Finally (3), the correction has to be specified. $L_{diff}$ contains the deviations of the pitch lengths; therefore, a pitch-length-based correction is proposed. The correction can be done according to $L_{diff}$, especially if $\lVert \cdot \rVert$ is Euclidean. If a correction of the pitch length is not directly possible, the take-up speed of the mandrel could be altered. Since the relationship between the pitch length and the mandrel’s take-up speed depends on the braiding machine used, this will not be inspected further.
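The threshold decision can be sketched in a few lines of Python. The function name `decide_correction` and the argument `l_model_view` (the target model already restricted to the camera's field of view and laid out with the same zero pattern) are assumptions for illustration, not part of the original work.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def decide_correction(
    l_camera_d: Matrix, l_model_view: Matrix, eps: float
) -> Tuple[bool, Matrix]:
    """Return (correction needed?, L_diff) for two matrices of identical shape."""
    # Element-wise deviation between measured and target pitch lengths
    l_diff = [
        [measured - target for measured, target in zip(row_m, row_t)]
        for row_m, row_t in zip(l_camera_d, l_model_view)
    ]
    # A single deviation above the threshold is enough to trigger a correction
    needed = any(abs(value) > eps for row in l_diff for value in row)
    return needed, l_diff


measured = [[0.0, 2.5], [2.0, 0.0]]   # made-up values
target = [[0.0, 2.0], [2.0, 0.0]]     # made-up values
needed, l_diff = decide_correction(measured, target, eps=0.25)
```

Returning the difference matrix together with the decision keeps the interim result available for step (3), where the deviations are used to specify the correction.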
3 Concept
Based on the previous analysis and definitions, an algorithm is proposed. The algorithm is visualized in Fig. 4 using a Nassi–Shneiderman diagram [10]. As described, $L_{model}$ and $\varepsilon$ are needed initially. Then, as soon as an image from the camera arrives, the following steps will be performed: (1) measure the pitch lengths (Sect. 3.1), (2) decide whether a correction is necessary (Sect. 3.2) and (3) specify the correction (Sect. 3.3). If no correction is necessary, the system will wait for a dedicated time $t$. The parameter $t$ depends on multiple variables and determines how many frames per second (fps) can be analyzed. It depends, for example, on the rest of the pipeline: if the computation time is high, $t$ would be chosen as zero or nearly zero so as not to miss any pitch lengths. Another example is the braiding speed of the machine in use: to not miss any pitch lengths, $t$ has to be reduced if the braiding speed increases. These steps will be repeated until every pitch length is measured and no correction has to be performed. If a correction is necessary, the system will notify the control unit. Then, the control unit will apply the communicated parameter in order to perform the correction.
3.1 Measurement of the Pitch Lengths
In order to compute $L_{camera}$ using an image (as shown in Fig. 4: Measure pitch lengths), four different kinds of CNNs are used. (1) The first kind is a binary classifier. It decides whether the image contains a stent. If the image contains a stent, (2) the second kind of CNN computes $f - e + 1$. (3) The third kind extracts how many rows of pitch lengths are visible, i.e., $h - g + 1$. (4) The last kind of CNN measures the pitch lengths and computes $L_{camera,d}$, which has the dimension $(f - e + 1) \times 2(h - g + 1)$. Additionally, for an online correction, it is desirable to measure the pitch lengths as soon as they are braided. Therefore, $c - f$ should be chosen as small as possible.
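The four-stage cascade of Sect. 3.1 can be sketched as follows. The trained CNNs are replaced by plain callables, and the class name `PitchPipeline` and its field names are hypothetical; only the order of the stages and the expected output dimension are taken from the text.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Image = List[List[int]]
Matrix = List[List[float]]


@dataclass
class PitchPipeline:
    """Divide-and-conquer cascade; each field stands in for one kind of CNN."""
    has_stent: Callable[[Image], bool]            # kind 1: binary classifier
    count_columns: Callable[[Image], int]         # kind 2: f - e + 1
    count_rows: Callable[[Image], int]            # kind 3: h - g + 1
    measure: Callable[[Image, int, int], Matrix]  # kind 4: pitch length matrix

    def __call__(self, image: Image) -> Optional[Matrix]:
        if not self.has_stent(image):
            return None                           # nothing to measure
        n_cols = self.count_columns(image)
        n_rows = self.count_rows(image)
        l_camera = self.measure(image, n_cols, n_rows)
        # Interim results allow a consistency check on the output dimension
        shape = (len(l_camera), len(l_camera[0]) if l_camera else 0)
        if shape != (n_cols, 2 * n_rows):
            raise ValueError(f"unexpected matrix shape {shape}")
        return l_camera


# Stub stages with made-up behaviour, for illustration only
pipeline = PitchPipeline(
    has_stent=lambda img: len(img) > 0,
    count_columns=lambda img: 3,
    count_rows=lambda img: 3,
    measure=lambda img, c, r: [[0.0] * (2 * r) for _ in range(c)],
)
result = pipeline([[0, 1], [1, 0]])
```

Because each stage is a separate callable, a single stage can be exchanged or retrained without touching the others, which mirrors the interchangeability argument made for the divided design.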
Fig. 4. Nassi-Shneiderman diagram of the proposed algorithm
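The control flow of the diagram can be approximated by the following Python sketch. The function name `inspection_loop` and all callables passed to it are placeholders; the waiting time `wait_s` corresponds to the parameter t, and the interaction with the control unit is reduced to a callback.

```python
import time
from typing import Callable, Iterable, List, Optional

Matrix = List[List[float]]


def inspection_loop(
    images: Iterable[object],
    measure: Callable[[object], Optional[Matrix]],
    needs_correction: Callable[[Matrix], bool],
    notify_control_unit: Callable[[Matrix], None],
    wait_s: float = 0.0,
) -> int:
    """Process camera images one by one; return the number of notifications."""
    notified = 0
    for image in images:
        l_camera = measure(image)          # step 1: measure pitch lengths
        if l_camera is None:               # no stent visible in this image
            continue
        if needs_correction(l_camera):     # step 2: decide
            notify_control_unit(l_camera)  # step 3: hand over to the control unit
            notified += 1
        else:
            time.sleep(wait_s)             # wait for the dedicated time t
    return notified


# Made-up stand-ins for camera frames and sub-systems
log: List[Matrix] = []
count = inspection_loop(
    images=[1, 2, 3],
    measure=lambda img: None if img == 2 else [[float(img)]],
    needs_correction=lambda m: m[0][0] > 2.0,
    notify_control_unit=log.append,
    wait_s=0.0,
)
```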
3.2 Decide Whether a Correction Has to Be Performed
Using $L_{camera,d}$, $L_{diff}$ will be computed and compared to $\varepsilon$, as presented in Sect. 2.2 and shown in Fig. 4 (Decide if a correction is necessary). In practice, due to the stent’s stabilizing effects, especially near the braiding point, the pitch lengths in $L_{camera,d}$ can change over time. Therefore, a time- and stabilizing-sensitive approach is desirable. One possible solution is to track the error of a specific pitch length over time. Formally, if
\[ l^{t}_{i,j} - \varepsilon > l^{t+1}_{i,j} - \varepsilon, \quad l^{t}_{i,j} > \varepsilon \quad \text{and} \quad l^{t+1}_{i,j} > \varepsilon, \]
then a stabilizing effect is probably happening, with $l^{t}_{i,j}$ being the pitch length $l_{i,j}$ at time $t$. In this case, no correction should be applied. Due to the restriction to the area $d$, we have to define an exception to this rule: if, under the mentioned condition, $l^{t+1}_{i,j}$ will no longer be visible at time $t + 1$, a correction must be applied.
3.3 Specification of the Correction
If, according to Sect. 3.2, a correction needs to be applied, $L^{t}_{diff}$, which contains the pitch length deviations at time $t$, can be used. As mentioned in Sect. 2.2, this specification is highly dependent on the braiding machine and will not be discussed further. Nevertheless, it is a crucial part of the correction process (as shown in Fig. 4: Specify correction) and therefore has to be specified in more detail for the practical use case.
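The stabilizing-sensitive rule of Sect. 3.2 can be written as a small predicate. This is one possible reading of the rule, not a definitive implementation: the inputs are assumed to be absolute deviations of one pitch length at times t and t + 1, and the flag `leaving_view` is a hypothetical signal that the pitch length is about to leave the analysed area d.

```python
def should_correct(err_t: float, err_t1: float, eps: float, leaving_view: bool) -> bool:
    """Decide on a correction for one pitch length tracked over two time steps.

    err_t, err_t1: absolute deviations at times t and t + 1 (assumed inputs).
    """
    if err_t1 <= eps:
        return False                      # within tolerance, nothing to do
    # Error still above the threshold but shrinking: probably stabilizing
    stabilizing = err_t > eps and (err_t - eps) > (err_t1 - eps)
    if stabilizing and not leaving_view:
        return False                      # give the stent time to settle
    return True                           # growing error or last chance to correct


# Made-up deviations with eps = 1.0
shrinking = should_correct(3.0, 2.0, 1.0, leaving_view=False)
last_chance = should_correct(3.0, 2.0, 1.0, leaving_view=True)
growing = should_correct(2.0, 3.0, 1.0, leaving_view=False)
```

Tracking only two consecutive time steps keeps the rule transparent; a longer history could be used in the same way if single measurements turn out to be noisy.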
4 Discussion
A possible flaw is the definition of $L^{t}_{camera,d}$ as a sparsely populated matrix. This could be a problem if further mathematical operations are applied. Because this was not
intended, this representation suffices in the considered scenario and has improved readability. Additionally, a CNN that uses some kind of end-to-end learning to specify a correction, instead of the proposed algorithm, would probably be easier to design but has other drawbacks. First, the CNN would be deeper, resulting in increased resource and time requirements. Secondly, there would not be any interim results that can be used to understand the decision and to find decision faults. Furthermore, the components would not be interchangeable. In the proposed algorithm, if the braiding angle should be used instead of the pitch length, only the last kind of CNN has to be retrained, which is much faster. Finally, a system that does not use the image of a complete stent could be used. In this case, an additional system to extract the relevant sub-image has to be designed. Because, depending on the braiding parameters, the braiding point is not always precisely at the same spot, some system to detect it and its pose is necessary. Therefore, this approach would probably result in a similar number of sub-systems.
5 Summary and Outlook
In this work, a formalization and an algorithm to perform an online correction of a stent’s braiding process are given. The presented formalization enables a mathematical description of the given problem, as well as the desired solution and measured product. Additionally, the formalization is adapted to the restrictions of a one-camera system. The described algorithm is based on the divide-and-conquer principle and therefore produces interim results. These interim results can be analysed further in order to find errors or optimize the system. In the future, the given matrix description could be optimized such that the number of rows containing only zeros is reduced. Additionally, if the assumption holds that the y-axis of the camera’s field of view is parallel to the mandrel, then the matrix definition can be further simplified. In the practical use case, the image I will contain not only the stent and the mandrel but the background as well. A possible extension to the algorithm could therefore be a subsystem that segments the stent and mandrel from the background. This would lead to a reduced image size. A reduced image size will result in smaller layers of the used CNNs and therefore decrease training time and resource allocation. An alternative approach could be a window-based system. Instead of using the fourth kind of CNN on the whole image, a kind of sliding window will cut it into sub-images. Then, the pitch lengths in the sub-images will be measured. Finally, the matrix can be reconstructed knowing the sub-images’ locations in the original image. This could lead to (1) a speedup due to a possible parallelization of the pitch measurements and (2) reduced costs, because the smaller image size can lead to a smaller CNN and therefore less training time and resource allocation.
Acknowledgments. Parts of this work have been developed in the project Stents4Tomorrow. Stents4Tomorrow (reference number: 02P18C022) is partly funded by the German Federal Ministry of Education and Research (BMBF) within the research program ProMed. Additionally, the authors want to thank Dr. Marc Braeuner (Admedes GmbH) and Kevin Lehmann (Admedes GmbH) for their continuous support and advice.
References
1. Statistisches Bundesamt (Destatis). https://www.destatis.de/DE/Themen/Gesellschaft-Umwelt/Gesundheit/Krankheitskosten/_inhalt.html#sprg234880
2. Toor, S., Sazonov, I., Luckraz, H., Nithiarasu, P.: Aortic aneurysms: OSR, EVAR, stentgrafts, migration and endoleak—current state of the art and analysis. In: Franz, T. (ed.) Cardiovascular and Cardiac Therapeutic Devices, pp. 63–92. Springer, Heidelberg (2013)
3. Bermudez, C., Laguarta, F., Cadevall, C., Matilla, A., Ibañez, S., Artigas, R.: Automated stent defect detection and classification with a high numerical aperture optical system. In: Automated Visual Inspection and Machine Vision II, vol. 10334, p. 103340C. International Society for Optics and Photonics (2017)
4. Kim, J.H., Kang, T.J., Yu, W.R.: Mechanical modeling of self-expandable stent fabricated using braiding technology. J. Biomech. 41(15), 3202–3212 (2008)
5. Bermudez, C., Laguarta, F., Cadevall, C., Matilla, A., Ibanez, S., Artigas, R.: Automated stent defect detection and classification with a high numerical aperture optical system. In: León, F., Beyerer, J. (eds.) Automated Visual Inspection and Machine Vision II, p. 103340C (2017)
6. Branscomb, D., Beale, D.: Fault detection in braiding utilizing low-cost USB machine vision. J. Textile Inst. 102(7), 568–581 (2011)
7. Stang, M., Sommer, M., Baumann, D., Zijia, Y., Sax, E.: Adaptive customized forward collision warning system through driver monitoring. In: Proceedings of the Future Technologies Conference, pp. 757–772. Springer, Cham (2020)
8. Li, P., Chen, X., Shen, S.: Stereo R-CNN based 3D object detection for autonomous driving. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7644–7652 (2019)
9. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning, vol. 1, no. 2. MIT Press, Cambridge (2016)
10. Nassi, I., Shneiderman, B.: Flowchart techniques for structured programming. In: ACM SIGPLAN Notices, no. 8, pp. 12–26 (1973)
Increasing the Understandability and Explainability of Machine Learning and Artificial Intelligence Solutions: A Design Thinking Approach
Arianit Kurti, Fisnik Dalipi, Mexhid Ferati, and Zenun Kastrati
Department of Informatics, Linnaeus University, Växjö, Sweden
{arianit.kurti,fisnik.dalipi,mexhid.ferati,zenun.kastrati}@lnu.se
Abstract. Nowadays, Artificial Intelligence (AI) is proving to be successful in solving complex problems in various application domains. However, despite the numerous success stories of AI systems, one challenge that characterizes these systems is that they often lack transparency in terms of understandability and explainability. In this study, we propose to address this challenge through the design thinking lens as a way to amplify human understanding of ML (Machine Learning) and AI algorithms. We exemplify our proposed approach by depicting a case based on a conventional ML algorithm applied to sentiment analysis of students’ feedback. This paper aims to contribute to the overall discourse on the need for innovation when it comes to the understandability and explainability of ML and AI solutions, especially since innovation is an inherent feature of design thinking.
Keywords: Explainable Artificial Intelligence (XAI) · Explainable machine learning · Design thinking · Understandability
1 Introduction and Background
Machine learning (ML) and artificial intelligence (AI) have shown great potential in supporting and automating processes that were previously not very accessible. These trends brought new possibilities and redefined approaches for information processing, decision making, automation and system engineering. Despite these advances, the behavior of ML and AI solutions usually cannot be fully specified, especially since they are stochastic in nature rather than producing binary decisions. As a result, ML and AI approaches are often referred to as “black box” solutions that lack understandability and explainability for humans. Researchers consider this lack of understandability and explainability a crucial issue that affects the practical deployment of ML and AI solutions [1]. The need for understandability and explainability is receiving increased attention from the research community. Some of these efforts take an algorithmic [2] or gaming perspective [3], but recently, design approaches also seem to be gaining momentum. The need
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 T. Ahram et al. (Eds.): IHIET-AI 2021, AISC 1378, pp. 37–42, 2021. https://doi.org/10.1007/978-3-030-74009-2_5
and motivation to understand and explain AI primarily stem from four main reasons: explain to justify, explain to control, explain to improve, and explain to discover [4]. A recent work [5] emphasizes how design practices can support the explainability of AI solutions. The use of scenario-based design as an approach toward increased explainability of AI solutions has also been advocated by Wolf [6], who suggests using explainability scenarios to define the requirements and possible use of AI systems as early as possible. Additionally, visualization approaches can increase the accessibility of explainable AI interfaces [7]. Still, there is scarce knowledge in the literature on how to communicate explanations to users such that they enhance the user’s trust and lead to successful AI-user collaboration. Inspired by these trends, in this paper we argue for further alignment of the design thinking approach with ML and AI development efforts as a way forward toward user-centered explanations and increased practical deployment of ML and AI solutions. We consider that teaming up humans and machines is a way forward regarding the explainability of ML and AI solutions. Therefore, in this design journey, we aim to blend the interaction design perspective with the ML/AI perspective in order to explain AI models and have users understand them, as a main path toward wider practical impact of intelligent solutions. After these approaches have been described and explained, the associated challenges will be discussed, and some recommendations will be provided at the end.
2 Our Approach
According to a review conducted by [8], there are two categories of generating explainable AI (XAI): (a) black box (opaque) explanation problem and (b) transparent box design problem. The first category considers a reverse engineering approach to increase explainability to an already implemented AI, whereas the second category considers designing from inception an interpretable model along with its explanations. Our approach helps in increasing AI explainability in both categories. Inspired by the Design Thinking mindset [9], our proposed approach consists of five phases: Explore, Define, Select, Tune, and Evaluate.
Similar to the first phase in design thinking, where we empathize with our target users by engaging with them and understanding their features, needs, and abilities, in the first phase of our approach, Explore phase, the user explores different dataset entities to identify explainable features. It is essentially a process to get familiar with the dataset, understand its characteristics, and get an understanding of features that could contribute to explaining the ML and AI outcome.
The second design thinking phase is characterized by defining the problem we should be solving based on understanding the user needs. Similarly, in the Define phase of our approach, the user should be able to identify and define input and output units of the ML. The outcome of this phase is a list of input units that affect output units.
In the third phase of the design thinking approach, we consider several alternative ideas that could be a possible solution to the problem being investigated. At this stage, sketching and visualization appear to be fundamental. Similarly, in the Select phase
within our approach, the user is exploring which ML algorithm, among several, is best suited for the given dataset and desired outcome. Typically, the user will select here from conventional ML algorithms, such as, Decision Tree, SVM (Support Vector Machines), Naive Bayes, etc. Once the user has defined input/output units and the appropriate algorithm, in the Tune phase, similar to the prototype phase in design thinking, where the user explores various prototypes to materialize ideas, in our approach, we generate an interface that enables the user to tune the algorithm by tweaking one input unit and gauge effects of it on one or more output units. And similar to design thinking, where the outcome is to find the most promising prototype solution that could be tested, in our approach, the user aims to find the best algorithm which should be tested for explainability. Finally, within design thinking, we evaluate one or more prototypes in order to validate whether initial user needs are met and whether there has been any progress in solving the problem. Similarly, in the Evaluate phase of our approach, the user, being a domain expert, will test and evaluate the clarity of the explanation and with that, rate the transparency of the ML model. Similar to the design thinking approach, working iteratively through the phases of the approach is crucial and this is also depicted in Fig. 1, where using solid arrows we indicate the first iteration that requires sequential progress through phases. Using dashed arrows, however, we indicate the possibility to move iteratively and non-sequentially to any phase needed. This emphasizes the flexibility of moving from one phase to another and even conducting several iterations within one phase before moving to the next one.
Fig. 1. Design Thinking based approach to increase ML and AI explainability and understanding.
3 The Case
To make the proposed approach more intuitive for the reader, we present a use case example. The case is about sentiment analysis of students’ feedback using machine learning [10]. Students’ feedback is an effective tool that provides
valuable insights about teaching effectiveness and helps to improve and refine course content, design, and teaching method to enhance student learning. Handling students’ feedback manually is a tedious and time-consuming task; therefore, automatic methods and techniques are required to avoid human intervention [11]. Hence, in our example, we used several different ML techniques to capture and analyze students’ opinions and experiences about MOOC courses expressed in textual reviews. The first phase - Explore - involves collecting students’ feedback. There are different resources to collect the opinions of students, but in our case study, data were gathered from a publicly available education platform. In total, we collected 20k+ reviews that were labelled manually by human annotators. In the second phase - Define - of the proposed approach, we focused on defining the input features to be fed to classification algorithms, as students’ reviews can be of different forms, such as plain text or numerical ratings. The latter form is structured and easy to handle; thus, in this example we focus on textual reviews. We also have to define whether we want to identify the sentiment of the overall feedback or delve deeper by identifying the main aspects concerning students and then analyzing their attitude toward these aspects. In our case, we chose both options: the first classified the sentiment as either positive, negative or neutral, whereas the second produced a set of aspect categories and their corresponding sentiment polarity. There are dozens of algorithms that can be used to perform sentiment orientation classification, including conventional machine learning algorithms and deep neural networks. The third phase of our approach - Select - focuses on selecting the classification algorithm/s based on the available dataset and resources. In our example, the dataset consisted of a considerable amount of feedback, so we made use of conventional ML.
In particular, we chose SVM, Decision Tree and Naive Bayes, as they make different assumptions about learning functions and have proven to perform well in various application domains. Measuring the accuracy of the trained model is the next phase - Tune. There are various parameters that affect the performance of the classification model, including hyper-parameters of the classifier, the splitting ratio of the dataset for training and testing, etc. A trial-and-error approach is used to adjust or fine-tune these parameters in order to create an accurate and robust model. Finally, in the Evaluate phase, domain experts, in our case teachers and instructional designers of courses, were asked to validate the system, as they were the ones who would use it for handling students’ feedback to improve the teaching and learning process. The evaluation focused on checking whether the system meets the required criteria or whether anything else needs to be modified in order to better identify and address students’ concerns.
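To make the Select and Tune phases more concrete, the following minimal Python sketch trains a multinomial Naive Bayes classifier with Laplace smoothing on a handful of invented reviews. It is an illustration under stated assumptions, not the system used in the case study: the four training sentences, the two-class setting and all function names are made up, and the helper `word_evidence` is a hypothetical way to expose which words push the decision toward one label.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Model = Tuple[Counter, Dict[str, Counter], Set[str]]


def train_nb(samples: List[Tuple[str, str]]) -> Model:
    """Count class frequencies and per-class word frequencies."""
    class_counts: Counter = Counter()
    word_counts: Dict[str, Counter] = defaultdict(Counter)
    vocab: Set[str] = set()
    for text, label in samples:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def word_log_prob(model: Model, word: str, label: str) -> float:
    """Laplace-smoothed log-likelihood of a word given a label."""
    _, word_counts, vocab = model
    total = sum(word_counts[label].values()) + len(vocab)
    return math.log((word_counts[label][word] + 1) / total)


def predict_nb(model: Model, text: str) -> str:
    """Return the label with the highest posterior log-score."""
    class_counts, _, vocab = model
    n_samples = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = math.log(count / n_samples)          # class prior
        for word in text.lower().split():
            if word in vocab:                        # ignore unseen words
                score += word_log_prob(model, word, label)
        scores[label] = score
    return max(scores, key=scores.get)


def word_evidence(model: Model, word: str, label_a: str, label_b: str) -> float:
    """Log-ratio of word likelihoods; values above 0 favour label_a."""
    return word_log_prob(model, word, label_a) - word_log_prob(model, word, label_b)


# Invented toy reviews, for illustration only
reviews = [
    ("great course clear lectures", "positive"),
    ("excellent instructor great examples", "positive"),
    ("boring lectures poor audio", "negative"),
    ("poor structure confusing assignments", "negative"),
]
model = train_nb(reviews)
```

Calling `predict_nb(model, "great examples")` classifies a new review, while `word_evidence(model, "great", "positive", "negative")` shows how strongly a single word supports one label. Changing one input word and observing the effect on the score is a small-scale version of the tuning interface described in the Tune phase.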
4 Discussion and Conclusion
An interesting discourse in which understanding is described as a design problem is presented in a recent two-part paper [12, 13]. Moreover, the author of [14] suggests that “our consumption and production decisions, whether active or passive, influence the algorithms that have become the touchstone of modern life”. A common denominator of these works is the emphasis on the importance of design and design
thinking. Design thinking enables us to identify alternative possibilities that might not be instantly apparent with our initial level of understanding. With this in mind, by applying a design thinking lens to the understandability and explainability of ML and AI solutions, we bring more insight into the process itself. The example depicted in the previous section shows that our design-thinking-inspired approach puts human thinking in focus, thus potentially increasing the understandability and explainability of the decisions and choices made by AI and ML algorithms in the process. However, it should be noted that our proposed approach is limited to conventional (shallow) ML algorithms; it is difficult to apply it to deep neural networks due to their far more complex nature. Neural networks consist of complex architectures with several different layers, which makes it hard to identify the features that have a direct effect on the output, as required in phase two of the proposed approach. Understandability and explainability in the context of neural networks are also challenging given that humans are unable to simultaneously process more than 5–7 unrelated variables [15]. Despite this, we consider that the approach advocated in this paper is novel in applying design thinking to widen our understanding of the problem space of AI and ML solutions. This contributes to the discourse on the need for innovation when it comes to the understandability and explainability of ML and AI solutions, especially since innovation is an inherent feature of design thinking.
References
1. Arrieta, A.B., Díaz-Rodríguez, N., Del Ser, J., Bennetot, A., Tabik, S., Barbado, A., García, S., Gil-López, S., Molina, D., Benjamins, R., Chatila, R.: Explainable artificial intelligence (XAI): concepts, taxonomies, opportunities and challenges towards responsible AI. Inf. Fusion 58, 82–115 (2020)
2. Hendricks, L.A., Akata, Z., Rohrbach, M., Donahue, J., Schiele, B., Darrell, T.: Generating visual explanations. In: European Conference on Computer Vision, pp. 3–19. Springer, Cham (2016)
3. Fulton, L.B., Lee, J.Y., Wang, Q., Yuan, Z., Hammer, J., Perer, A.: Getting playful with explainable AI: games with a purpose to improve human understanding of AI. In: Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, pp. 1–8. Association for Computing Machinery (ACM), New York (2020)
4. Adadi, A., Berrada, M.: Peeking inside the black-box: a survey on explainable artificial intelligence (XAI). IEEE Access 6, 52138–52160 (2018)
5. Liao, Q.V., Gruen, D., Miller, S.: Questioning the AI: informing design practices for explainable AI user experiences. In: Proceedings of the CHI Conference on Human Factors in Computing Systems, p. 15. Association for Computing Machinery (ACM), New York (2020)
6. Wolf, C.T.: Explainability scenarios: towards scenario-based XAI design. In: Proceedings of the 24th International Conference on Intelligent User Interfaces, pp. 252–257. Association for Computing Machinery (ACM), New York (2019)
7. Wolf, C.T., Ringland, K.E.: Designing accessible, explainable AI (XAI) experiences. ACM SIGACCESS Accessibility Comput. (125) (2020)
8. Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., Giannotti, F., Pedreschi, D.: A survey of methods for explaining black box models. ACM Comput. Surv. (CSUR) 51(5), 1–42 (2018)
9. Brown, T.: Design thinking. Harv. Bus. Rev. 86(6), 84 (2008)
10. Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F., Nishliu, E.: Aspect-based opinion mining of students’ reviews on online courses. In: Proceedings of the 6th International Conference on Computing and Artificial Intelligence, pp. 510–514. Association for Computing Machinery, New York (2020)
11. Kastrati, Z., Imran, A.S., Kurti, A.: Weakly supervised framework for aspect-based sentiment analysis on students’ reviews of MOOCs. IEEE Access 8, 106799–106810 (2020)
12. Lissack, M.: Understanding is a design problem: cognizing from a designerly thinking perspective. Part 1. She Ji: J. Des. Econ. Innov. 5(3), 231–246 (2019)
13. Lissack, M.: Understanding is a design problem: cognizing from a designerly thinking perspective. Part 2. She Ji: J. Des. Econ. Innov. 5(4), 327–342 (2019)
14. Weller, A.J.: Design thinking for a user-centered approach to artificial intelligence. She Ji: J. Des. Econ. Innov. 5(4), 394–396 (2019)
15. Miller, G.A.: The magical number seven, plus or minus two: some limits on our capacity for processing information. Psychol. Rev. 63(2), 81 (1956)
Natural Language Understanding (NLU) on the Edge
Nicolas Crausaz, Jacky Casas, Karl Daher, Omar Abou Khaled, and Elena Mugellini
HES-SO University of Applied Sciences and Arts Western Switzerland, Bd de Pérolles 80, 1700 Fribourg, Switzerland
{nicolas.crausaz,jacky.casas,karl.daher,omar.aboukhaled,elena.mugellini}@hes-so.ch
Abstract. Today, chatbots have evolved to include artificial intelligence and machine learning techniques such as Natural Language Understanding (NLU). NLU models are trained and run on remote servers because their resource requirements are large and must be scalable. However, people are increasingly concerned about protecting their data. To be effective, current NLU models rely on the latest techniques, which are increasingly large and resource-intensive, and they therefore need powerful servers to run. The solution would therefore be to perform the inference part of the NLU model directly on the edge, in the client’s browser. We used a pre-trained TensorFlow.js model, which allows us to embed the model in the client’s browser and run the NLU there. The model achieved an accuracy of more than 80%. The first results of NLU on the edge show a viable foundation for further development.
Keywords: Machine learning · Artificial intelligence · Natural Language Understanding · Chatbot · Edge computing · Data privacy · Conversational agent
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. T. Ahram et al. (Eds.): IHIET-AI 2021, AISC 1378, pp. 43–50, 2021. https://doi.org/10.1007/978-3-030-74009-2_6
1 Introduction
Natural Language Understanding (NLU) [1] is a technique used in the creation of conversational agents [2]. Its goal is to extract the intent of a message from the input sentence. Chatbots started out scripted and then evolved to include artificial intelligence and machine learning [3]. Nowadays, chatbots can perform as well as humans in specific domains [4]. Today, NLU models are trained and run on remote servers because their resource requirements are large and must be scalable. However, people are increasingly concerned about protecting their data and are not always willing to send it to large multinational companies that use it for their business. This is particularly understandable in the health and financial fields, where users wish to keep their data private [5]. The solution would therefore be to perform the inference part of the NLU model directly on the edge, in the client’s browser, so that no data leaves the computer when chatting with a chatbot. In addition to strengthening confidentiality, moving the model
inference should lead to better performance in terms of latency and scalability. In the future, customised and personalised models could also be implemented. In this work, we reviewed the state of the art of existing NLU models and considered porting some of them to the client’s browser. To do so, we had to convert the models into a format that JavaScript can load so that they could run in the browser. The initial training of the model is still done on a server, but inference is then done on the client side once the client has downloaded the model.
2 Related Work
The success of deep learning in recent years is mainly due to the ever-increasing power of computers [6]; even smartphones keep gaining computing power. The second driver of this growth is the amount of data that keeps accumulating, since deep learning improves with the amount of data available [7]. Today, NLU models are trained and run on remote servers because their resource requirements are large and must be scalable. Moving the model inference directly “on the edge” addresses several drawbacks of inference performed on an external server [8, 9]. The first advantage is confidentiality. By moving the model inference to the browser, we avoid sending data across the network. This protects user privacy, since the information sent to the model never leaves the browser. This advantage is very important in fields such as the banking and medical sectors. The second advantage is latency. This is essential in areas such as video and audio, where data must be processed in real time and as quickly as possible. When a large amount of data flows to the server, there is a risk of creating a queue, which can cause delays because the network cannot satisfy all requests. If the model inference is done in the browser, such delays can be minimised. The third advantage is scalability. Sending data across the network can limit flexibility. The number of connected devices is constantly increasing [10] and the network is becoming progressively more crowded. Again, in a field like video, a resource such as bandwidth would come under far too much demand if everything had to be sent over the network. The fourth advantage is the ability to develop a personal model for each user. Having the model on the client’s web page allows it to be fully customised. The last advantage is simply the price.
Having the model hosted in the browser eliminates the cost of an external server on which the NLU solution would run. Nevertheless, edge computing has its limits in terms of both resources and performance.
2.1 Machine Learning Frameworks
There are several web frameworks (TensorFlow.js, Machine learning tools, Keras.js, ml5.js, nlp.js, ONNX.js, ConvNet.js, WebDNN and Brain.js) that allow machine learning models to run in the browser. Among them is TensorFlow.js [11], an open-source library for running machine learning models entirely in the browser. The library is part of the TensorFlow [12] ecosystem. This library is of particular interest to
us because we can import trained models into the browser. However, not every type of model can be imported. TensorFlow.js allows us to import two Python model formats: Keras models and TensorFlow GraphDef models. The framework also provides pre-trained models that are ready to use.
2.2 NLP Model
TensorFlow.js provides two models for text processing: Universal Sentence Encoder (USE) [13] and Text Toxicity [14]. The Universal Sentence Encoder is the model of interest in this work. It encodes text as an embedding, and the encoded text can then be used as input for natural language processing tasks. The USE lite model [15], a lighter version of the original [16] that is often used when computing resources are limited, has been converted into the TensorFlow.js GraphModel format [17] so that it can be used from the TensorFlow.js library. Unlike word embedding, which encodes a single word, this model is optimised to encode sentences, expressions and small paragraphs. It therefore allows us to vectorise sentences. To compare the similarity between two sentences, i.e. two vectors, we can use the cosine similarity formula. The Universal Sentence Encoder is not the only model that we analysed. ULMFiT [18] from fast.ai is also a model for NLP tasks, but it has no ready-made solution for deployment in the browser. Fast.ai does not plan to develop a TensorFlow Hub module that would allow easy use of the model from TensorFlow.js, but they support the idea if someone wants to start such a project. Hugging Face DistilBERT [19] is a distilled version of BERT. It is therefore smaller, faster, cheaper and lighter. It is interesting to analyse the possibilities offered by this model since it obtains very good results. DistilBERT offers several models compatible with TensorFlow that can easily be exported to the SavedModel format once trained. The SavedModel format can be converted by the TensorFlow.js converter to a format that can be loaded directly into TensorFlow.js for inference.
2.3 Synthesis and Discussion
The analysis of the related work revealed a huge potential for running the model in the client’s browser. Moving the inference of the model directly “on the edge” allows for better confidentiality and faster inference results. It avoids overloading the network. Once the page is loaded, the model’s inference continues to work even if the network is no longer available. There is also tremendous potential for model customisation and reduction of deployment costs. Based on this analysis, the prototype NLU solution uses the Universal Sentence Encoder model from TensorFlow.js. ULMFiT and DistilBERT did not produce a working prototype. The problems are largely due to operators used by these models that are not currently supported by JavaScript frameworks.
3 Methods
The prototype we developed for this research is a chatbot that runs an NLU model on the edge, directly in the user’s web browser. The chatbot is implemented in
Vue.js. A dataset of 28 different intents and 3286 training sentences is used to train the model and serves as the basis for our study. Each of the 3286 records contains a sentence in English and the corresponding intent. The intents range from “/bot.hobbies” to “/event”. The machine learning model therefore has to solve a multi-class classification problem with these 28 classes. The chatbot uses the NLU solution to determine the user’s intent so that it can continue the conversation. The concept of the prototype is to encode the 3286 training sentences with the USE model. We then obtain 3286 vectors with 512 dimensions. When the user sends a message to the bot, we encode the sentence and compare it with the other vectors. The goal is to find out which vector is most similar to the user’s sentence and thereby determine its intent. The cosine similarity [20] formula is used to determine the similarity between the vectors. The formula used to calculate the similarity between two n-dimensional vectors is given in Eq. 1. The cosine similarity formula returns a matrix. We calculate the average of the elements in this matrix; the closer the value is to 1, the greater the similarity between the two sentences.

$$\text{similarity} = \cos(\theta) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} \tag{1}$$
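As an illustration of Eq. 1, the following minimal Python sketch computes the cosine similarity of two vectors. It is not the prototype’s code (which runs in JavaScript); the function name and the toy 3-dimensional vectors are assumptions made for the example, and each sentence embedding is treated as a flat vector, so the result is a single number rather than the matrix mentioned above.

```python
import math


def cosine_similarity(a, b):
    """Return cos(theta) between two equal-length vectors, as in Eq. 1."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors stand in for the 512-dimensional USE embeddings.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```

Vectors pointing in the same direction score close to 1 regardless of their length, while orthogonal vectors score 0, which is why the measure suits comparing sentence embeddings.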
3.1 Encoding
To avoid overloading the processor of the client’s computer, and also to speed up the process, we do not encode the training sentences on every request. Instead, we encode these 3286 sentences in advance, once only. A script was written to encode the sentences and save the tensors in a JSON file, as seen in Fig. 1. Loading all 3286 tensors in the browser would be too resource-intensive. Instead, a mean tensor is computed for each intent of the text classification problem. The first step is to group all tensors that have the same intent. We then use the torch.stack method from PyTorch [21] to stack all tensors of the same dimension into a single tensor of shape (n, 512), with n being the number of tensors for this category. To calculate the mean tensor, we use the torch.mean method, which takes our (n, 512) tensor as input. We compute the mean along dimension 0 to obtain a tensor of dimension (1, 512) as a result. By performing this work for each intent, we end up with 28 tensors, each representing one intent. The tensors and their intents are saved in a JSON file so that we can reuse them in the browser.
3.2 Inference on the Edge
The first task “on the edge” is to load our 28 tensors from the JSON file. When the user sends a message to the bot, the user’s sentence is encoded using the USE model. We then compute the similarity only between the user’s tensor (the previously encoded sentence) and our 28 tensors. Figure 2 shows this process. The similarity score is then retrieved for
Fig. 1. Encoding
each intent. To normalise the results so that they sum to 1, we implemented a softmax function [22] in JavaScript. We obtain a similarity percentage for each intent, and the chatbot can use these results to continue the discussion with the user.
Fig. 2. Inference on the browser
To improve the performance of the first prediction, which can take some time, it is advisable to “warm up” the model by passing an input tensor of the same shape before using the actual data [23].
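The encoding and inference steps of Sects. 3.1 and 3.2 can be sketched end to end as follows. This is a simplified illustration in plain Python, not the prototype’s code: the source uses torch.stack and torch.mean offline and TensorFlow.js in the browser, whereas here plain lists play both roles. All function names are hypothetical, and the toy 2-dimensional vectors stand in for 512-dimensional USE embeddings.

```python
import math
from collections import defaultdict


def mean_vector(vectors):
    """Element-wise mean of equal-length vectors (the role of stack + mean over dim 0)."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def intent_centroids(examples):
    """Group (intent, embedding) pairs by intent and average each group."""
    grouped = defaultdict(list)
    for intent, embedding in examples:
        grouped[intent].append(embedding)
    return {intent: mean_vector(vectors) for intent, vectors in grouped.items()}


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors (Eq. 1)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores):
    """Numerically stable softmax; the outputs are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(query, centroids):
    """Return a normalised score per intent for one encoded user sentence."""
    intents = list(centroids)
    similarities = [cosine_similarity(query, centroids[i]) for i in intents]
    return dict(zip(intents, softmax(similarities)))


# Offline step: toy embeddings (assumed values) for two of the intents.
training = [
    ("/bot.hobbies", [1.0, 0.0]),
    ("/bot.hobbies", [0.8, 0.2]),
    ("/event", [0.0, 1.0]),
    ("/event", [0.2, 0.8]),
]
centroids = intent_centroids(training)

# Online step: score one already-encoded user sentence against the centroids.
scores = classify([1.0, 0.1], centroids)
best_intent = max(scores, key=scores.get)
```

Because cosine values lie in [-1, 1], a plain softmax over them tends to produce a fairly flat distribution; whether the prototype rescales the similarities before the softmax is not stated in the text.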
4 Results
Currently, the prototype is compatible with most browsers (Chrome, Firefox, Safari, Edge) except Internet Explorer, which does not support the use of the TensorFlow.js library, including the Universal Sentence Encoder model.
4.1 Model Evaluation
To measure the performance of the machine learning model on the dataset, the data is split into 90% for training and 10% for validation. The metrics we used are precision, recall and F-score. We also used the classification report and confusion matrix from Sklearn [24] during the testing phase. The version of the Universal Sentence Encoder we used is the basic model called “Lite”. The results obtained with the model are shown in Table 1.

Table 1. Model performance
Precision   Recall   F1-score
0.817       0.796    0.794
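For readers unfamiliar with these metrics, the sketch below shows one way to compute support-weighted precision, recall and F1 for a multi-class problem in plain Python. It is an illustration only: the paper used Sklearn’s reports, the averaging behind Table 1 is not stated, and the function name and toy labels are assumptions.

```python
from collections import Counter


def weighted_prf(y_true, y_pred):
    """Support-weighted average of per-class precision, recall and F1."""
    support = Counter(y_true)
    total = len(y_true)
    precision = recall = f1 = 0.0
    for label in sorted(support):
        true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        p_label = true_pos / predicted if predicted else 0.0
        r_label = true_pos / support[label]
        denom = p_label + r_label
        f_label = 2 * p_label * r_label / denom if denom else 0.0
        weight = support[label] / total
        precision += weight * p_label
        recall += weight * r_label
        f1 += weight * f_label
    return precision, recall, f1


# Toy labels (assumed): three sentences of intent "a", one of intent "b".
p, r, f = weighted_prf(["a", "a", "a", "b"], ["a", "a", "b", "b"])
```

Note that an F1-score averaged over classes can be lower than both the averaged precision and the averaged recall, as in Table 1 (0.794 versus 0.817 and 0.796), because it is not the harmonic mean of the two averages.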
Other NLP models have also been evaluated on the same dataset. For comparison, DistilBERT [19] obtains a precision of 0.925, DIET [25] from Rasa a precision of 0.897, and ULMFiT [18] from fast.ai a precision of 0.906 (weighted average). The developed prototype uses the USE model, which obtained the lowest performance, but the figures are nevertheless acceptable for such a task.
4.2 Inference Time and Weight on the Edge
Several tests were performed to measure the inference time, using sentences of different lengths. Over a total of 20 tests, the average inference time was 3.24 s. The inference time varies and can sometimes even exceed 6 s. Note also that when the model is queried with the same sentence, the inference time decreases with each repetition. Concerning the data loaded in the browser, two elements are necessary for machine learning on the edge: the JSON file that contains the 28 tensors, which is 533 KB and therefore quite reasonable, and the Universal Sentence Encoder model. The size of the model is 247 KB and the size of the vocabulary is 281 KB. With the weights of the model, we reach a total of 29.3 MB for the use of the model with the tensors. Table 2 shows the weights of the different models trained on the same data. The USE model is much smaller than the other models.

Table 2. Weight of the models
Model        Weight
USE          29.3 MB
ULMFiT       78.5 MB
DistilBERT   758.6 MB
5 Discussion and Conclusion
Concerning the prototype, using the NLU on the edge solution for the chatbot is somewhat tricky. As demonstrated in Sect. 4.2, the inference time varies, but it is longer than for inference on an external server. In a chat interaction, the user needs a quick answer; otherwise the exchange does not feel like a live chat and the user quickly loses interest. The performance of the model could be improved. By analysing the tensors in detail, we could find a better way to compute the average tensors. We could also find a way to reduce the inference time, but this requires further analysis. The purpose of this paper is to show that we are at the dawn of a new era, in which even the most complex models can be made lighter and run in a browser. JavaScript frameworks for machine learning are booming, and there are even frameworks that allow deep learning on smartphones. Moreover, the advantages of running a model “on the edge” are very real. The preliminary results show the potential of on-the-edge technology. Deeper research and further development are needed to improve the quality of the results.
6 Future Work
It would be very interesting to get another working NLP model that can solve the text classification problem in the browser. This would make it possible to compare different models directly. We have already looked at other models and other ways to import models into the browser. The most common problem is that some operators used by a model are simply not supported by JavaScript frameworks. Of course, it is only a matter of time before these problems are resolved.
References
1. Jung, S.: Semantic vector learning for natural language understanding. Comput. Speech Lang. 56, 130–145 (2019)
2. Ramesh, K., Ravishankaran, S., Joshi, A., Chandrasekaran, K.: A survey of design techniques for conversational agents. In: Information, Communication and Computing Technology, pp. 336–350. Springer Singapore (2017)
3. Rahman, A.M., Mamun, A.A., Islam, A.: Programming challenges of chatbot: current and future prospective. In: 2017 IEEE Region 10 Humanitarian Technology Conference (R10-HTC), pp. 75–78 (2017)
4. Adamopoulou, E., Moussiades, L.: An overview of chatbot technology. In: Artificial Intelligence Applications and Innovations, pp. 373–383. Springer International Publishing (2020)
5. Spring, T., Casas, J., Daher, K., Mugellini, E., Abou Khaled, O.: Empathic response generation in chatbots. In: Proceedings of the 4th Swiss Text Analytics Conference (SwissText 2019), CEUR-WS, Winterthur (2019)
6. Baji, T.: Evolution of the GPU device widely used in AI and massive parallel processing. In: 2018 IEEE 2nd Electron Devices Technology and Manufacturing Conference (EDTM), pp. 7–9 (2018)
7. Halevy, A., Norvig, P., Pereira, F.: The unreasonable effectiveness of data. IEEE Intell. Syst. 24, 8–12 (2009)
8. Ma, Y., Xiang, D., Zheng, S., Tian, D., Liu, X.: Moving deep learning into web browser: how far can we go? In: The World Wide Web Conference, pp. 1234–1244. Association for Computing Machinery, New York (2019)
9. Chen, J., Ran, X.: Deep learning with edge computing: a review. Proc. IEEE 107, 1655–1674 (2019)
10. Internet of Things - active connections worldwide 2015–2025. https://www.statista.com/statistics/1101442/iot-number-of-connected-devices-worldwide/
11. Smilkov, D., Thorat, N., Assogba, Y., Yuan, A., Kreeger, N., Yu, P., Zhang, K., Cai, S., Nielsen, E., Soergel, D., Bileschi, S., Terry, M., Nicholson, C., Gupta, S.N., Sirajuddin, S., Sculley, D., Monga, R., Corrado, G., Viégas, F.B., Wattenberg, M.: TensorFlow.js: Machine Learning for the Web and Beyond (2019)
12. Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mane, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viegas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., Zheng, X.: TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems (2016)
13. Universal Sentence Encoder lite converted for Tensorflow.js. https://github.com/tensorflow/tfjs-models/tree/master/universal-sentence-encoder
14. Toxicity Classifier. https://github.com/tensorflow/tfjs-models/tree/master/toxicity
15. Universal Sentence Encoder lite. https://tfhub.dev/google/universal-sentence-encoder-lite/2
16. Cer, D., Yang, Y., Kong, S.-Y., Hua, N., Limtiaco, N., St. John, R., Constant, N., Guajardo-Cespedes, M., Yuan, S., Tar, C., Sung, Y.-H., Strope, B., Kurzweil, R.: Universal Sentence Encoder (2018)
17. Load Graph Model. https://js.tensorflow.org/api/1.0.0/#loadGraphModel
18. Howard, J., Ruder, S.: Universal Language Model Fine-tuning for Text Classification (2018)
19. Sanh, V., Debut, L., Chaumond, J., Wolf, T.: DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter (2019)
20. Wikipedia contributors: Cosine similarity. Wikipedia, The Free Encyclopedia (2020)
21. Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., Chintala, S.: PyTorch: an imperative style, high-performance deep learning library. In: Wallach, H., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E., Garnett, R. (eds.) Advances in Neural Information Processing Systems, pp. 8026–8037. Curran Associates, Inc. (2019)
22. Wikipedia contributors: Softmax function. Wikipedia, The Free Encyclopedia (2020)
23. Tensorflow for JS, Platform and environment. https://www.tensorflow.org/js/guide/platform_environment
24. Sklearn metrics. https://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics
25. Bunk, T., Varshneya, D., Vlasov, V., Nichol, A.: DIET: Lightweight Language Understanding for Dialogue Systems (2020)
Exploiting Home Infrastructure Data for the Good: Emergency Detection by Reusing Existing Data Sources
Sebastian Wilhelm
Deggendorf Institute of Technology, Technology Campus Grafenau, Hauptstrasse 3, Grafenau, Germany
[email protected]
Abstract. Monitoring people within their residence can enable elderly people to live a self-determined life in their own home environment for a longer period of time. To this end, activity profiles of the residents are commonly created using various sensors in the house. Deviations from the typical activity profile may indicate an emergency situation. An alternative approach for monitoring people within their residence, which we investigate in our research, is to reuse existing data sources instead of installing additional sensors. In private households there are already numerous data sources available, such as smart meters, weather stations, routers or voice assistants. Intelligent algorithms can be used to evaluate this data and draw conclusions about personal activities. This, in turn, allows the creation of activity profiles of the residents without using external sensor technology. This work outlines the research gap in reusing existing data sources for Human Activity Recognition (HAR) and emergency detection, which we intend to fill with our further work.
Keywords: Human Activity Recognition (HAR) · Human Presence Detection (HPD) · Ambient Assisted Living (AAL) · Activity monitoring · Presence detection · Ambient sensor · Emergency detection
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. T. Ahram et al. (Eds.): IHIET-AI 2021, AISC 1378, pp. 51–58, 2021. https://doi.org/10.1007/978-3-030-74009-2_7
1 Introduction
Human Activity Recognition (HAR) is used for numerous applications such as controlling smart homes, detecting burglars, monitoring health or detecting emergency situations within the residence [1, 7, 22]. Many of these applications are referred to as Ambient-Assisted Living (AAL) systems, whose objective is to improve people’s quality of life [1]. For elderly people in particular, these systems can help them live in their own homes for as long as possible, e.g., by using mobile emergency response systems, fall detection systems or video surveillance. Other AAL systems provide support with daily activities, based on activity monitoring [24]. Most of the currently available AAL systems use sensor data to monitor the activities of the residents. In practice, motion detectors, fall detectors, fall mats or window/door contact sensors are used for such applications [24, 26, 30]. However, it is usually expensive and time-consuming to fully equip existing residences with suitable environmental
sensors. Another disadvantage of most existing systems is that the components cannot be integrated completely inconspicuously into the apartment, which often gives the residents the feeling of being monitored [12]. An alternative approach for monitoring residents inside their home environment, which we investigate in our research, is to reuse existing data sources instead of installing additional sensors. In private households there are already numerous data sources available, such as smart meters, digital water meters, weather stations, routers, mobile phones or voice assistants. Intelligent algorithms can be used to evaluate this data and draw conclusions about personal activities. This, in turn, allows the creation of comprehensive activity profiles of the residents. Deviations from the typical behavior profile of a person can indicate an emergency situation (see Sect. 3). The main contribution of this paper is to outline the current state of research and commercially available systems for emergency detection, focusing on ambient sensors. In particular, the current approaches for HAR are considered and common methods for detecting potential emergency situations using behavior data are presented. The remainder of the paper is structured as follows: Sect. 2 provides a general overview of the AAL domain and introduces several use cases for HAR in the residence. Section 3 then outlines common methods for identifying potential emergency situations on the basis of HAR data. The current state of the art in HAR and commercially available systems for emergency detection are presented in Sect. 4. Finally, in Sect. 5 we outline the potential of reusing home infrastructure data for HAR before concluding the paper in Sect. 6, which also highlights the research gap we intend to fill with our further work.
2 Ambient-Assisted Living
According to DIN SPEC 91280 [1], AAL is defined as ‘concepts, products and services that combine and improve new technologies and the social environment with the aim of enhancing the quality of life for people at all stages of life’. Furthermore, an AAL system is defined as a ‘reactive, networked technical system that interacts with the environment with the aim of increasing the quality of life’ [1]. Most AAL systems focus on people in need of assistance, often senior citizens, with the aim of enabling them to live a self-determined life at home into old age [4]. In the literature, there are four main fields of action in which AAL systems can be usefully applied:
• Health: In the health sector, AAL systems can be used either as an acute warning system to identify emergency situations or as a monitoring tool for long-term observation of the state of health. In contrast to common emergency call systems, where the resident has to actively request help, AAL systems are supposed to recognize automatically when a potential emergency situation exists. For this purpose, typical activity profiles of the residents are first created using measured values from various sensors in the house (see Sect. 4). This results in a formal, person- and environment-specific model of normal behavior. The deviation of a person’s behavior from his or
her typical normal behavior on a specific day is then examined. If the deviation exceeds a certain threshold, a potential emergency situation is detected (see Sect. 3) [1, 15, 20, 24]. Alternatively, AAL systems can monitor the state of health for use in a telemedical environment. For example, it is possible to monitor medication intake or continuously analyze vital data. Numerous investigations also deal with the screening of dementia diseases [1, 5, 10, 24].
• Social interaction: AAL systems can be used to prevent social isolation. For example, when a changed emotional state of the resident is inferred, the resident can be attended to via an audio-visual or auditory connection to service providers or support persons [1, 5, 10, 24].
• Household, Supplies and Comfort: AAL systems can be used to initiate various services within the household. These include, for example, cleaning services or weather-indexed services [1, 10, 24].
• Security within the home environment: AAL systems can be used for surveillance inside the living environment. The possibilities range from burglary protection to the detection of fires, smoke or leaks [1, 10, 24].
3 Emergency Detection Using Behavioral Data
Numerous methods have already been investigated for identifying potential emergency situations on the basis of HAR data of the residents; the most common are briefly presented in this section. The approaches are generally based on the fact that human daily routines follow a regular rhythm of about 24 h, called the circadian rhythm. Apart from the sleep-wake cycle, body temperature, weight, muscle power and blood pressure also follow this rhythm [16]. The circadian rhythm is also observed in a person’s daily activities (sleeping, waking up, eating, leisure, etc.) [27].
• Inactivity Pattern: Floeck and Litz [13, 14] have developed a method for detecting potential emergency situations within the residence based on inactivity patterns. The duration of inactivity $d_{\mathrm{inact}}$ is measured. Inactivity ends with an input signal that indicates any human activity. If the duration of inactivity $d_{\mathrm{inact}}$ rises above a certain threshold value $\delta$, a potential emergency situation is indicated.
• Histogram Based Approach: Virone et al. [27, 28] have introduced a system that monitors people within the rooms of their own residence and thus detects deviations in the daily routine. For this purpose, residences are equipped with motion detectors, so that it can be determined at any time and for any room whether a person is present. They examine time slots of one hour each. Based on this data, the mean value (mean) and the standard deviation (std) of the presence of a person in a specific room are calculated for each time slot. Over a longer period of time, an approximation of the typical habits of the resident is obtained. If the current occupancy rate deviates from the statistical occupancy profile for a specific room by more than a certain threshold value $\delta$, an alert is triggered.
S. Wilhelm
• Probabilistic Model of Behavior (PMB): Cardinaux et al. [6] and Barger et al. [3] developed models that detect deviations from a resident’s normal behavior using a PMB. They define $\vec{x}$ as an activity vector that represents a specific user behavior. A set of behaviors is assumed to be modeled as a mixture of $N_g$ Gaussian distributions (a Gaussian Mixture Model). Cardinaux et al. denote the model of normal behavior as λ. The probability that a newly observed behavior $\vec{x}$ represents normal behavior is defined as $P(\vec{x} \mid \lambda)$. An activity $\vec{x}$ is assumed to represent normal user behavior if $P(\vec{x} \mid \lambda)$ does not fall below a specified threshold value δ. Cardinaux et al. [6] determine the threshold value δ using the log-likelihoods resulting from a training set.
• Circadian Rhythm Score (CRS): With the CRS, Elbert et al. [11] present a method that combines the approaches of Virone et al. [27, 28] and Cardinaux et al. [6]/Barger et al. [3] and allows deviations in daily behavior to be observed over a longer period of time. The primary intention is to support people with dementia by (i) identifying deterioration in their health status and (ii) ensuring the efficiency of any reminder services already in use. The core element of the CRS is the PMB. It enables the calculation of a value (score) that describes the probability that a given activity corresponds to normal user behavior. The average of several of these probabilities for a similar activity (e.g., meal preparation) is then calculated. This average value forms the final score for this specific activity. Since some activities need to be prioritized, the activity scores are then weighted. The final CRS sub-score is calculated by combining all weighted activity scores. This score can be calculated for fixed time periods (e.g., 24 h), which allows a long-term trend to be determined. However, this system is not suitable for detecting acute emergencies. An evaluation of the approach is pending.
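The threshold checks described in the list above can be sketched in a few lines of Python. This is a minimal illustration only, not the implementation from [13, 14], [27, 28] or [11]: the function names, the use of standard deviations as the unit for δ in the histogram check, the `min_std` floor, and the weighted average used to combine activity scores are all assumptions made for the example.

```python
from statistics import mean, pstdev


def inactivity_alert(last_activity_ts, now_ts, delta_seconds):
    """Inactivity pattern: alert when the inactivity duration exceeds delta."""
    d_inact = now_ts - last_activity_ts
    return d_inact > delta_seconds


def build_profile(history):
    """Histogram approach: per-slot (mean, std) of room occupancy.

    history is a list of days; each day holds 24 hourly occupancy rates
    in [0, 1] for one room.
    """
    profile = []
    for slot in range(24):
        values = [day[slot] for day in history]
        profile.append((mean(values), pstdev(values)))
    return profile


def occupancy_alert(profile, slot, current_rate, delta=2.0, min_std=0.05):
    """Alert when the current rate deviates from the slot mean by more than
    delta standard deviations (assumed unit; min_std avoids a zero-width band)."""
    slot_mean, slot_std = profile[slot]
    return abs(current_rate - slot_mean) > delta * max(slot_std, min_std)


def crs_score(activity_probs, weights):
    """CRS-like score: average the PMB probabilities per activity, then
    combine the per-activity scores as a weighted average (assumed rule)."""
    weighted_sum = 0.0
    weight_total = 0.0
    for activity, probs in activity_probs.items():
        weighted_sum += weights[activity] * mean(probs)
        weight_total += weights[activity]
    return weighted_sum / weight_total
```

In such a sketch, the inactivity check reacts to acute situations, while the occupancy profile and the score only become meaningful after enough days of history have been collected.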
4 State of the Art
Our further work aims to develop an AAL system that detects potential emergencies in the residences of elderly people by reusing data from the home infrastructure for HAR. To highlight the research gap in this field, Sect. 4.1 outlines how HAR is commonly implemented today, and Sect. 4.2 presents commercially available systems for ambient emergency detection.
4.1 Common Systems for Human Activity Recognition
Currently, HAR is usually based on various sensor data, and a distinction can be made between ambient and vital sensor technologies. Ambient sensors, such as motion sensors, cameras or pressure mats, are integrated into the (home) environment. Vital sensors, such as smart watches or sensors integrated into clothing, are worn directly on the body of the person to be monitored [24, 26, 30]. Several survey papers already investigate the existing sensor technologies for HAR; the works of Rashidi and Mihailidis [24], Uddin et al. [26] and Wilhelm [30] are worth mentioning.
Exploiting Home Infrastructure Data for the Good
It can be observed that the reuse of already existing data sources is rarely considered. Only the reuse of smartphone [7, 18, 21, 25], Wireless Local Area Network (WiFi) [17, 23, 29, 33], smart meter (power) [8, 9, 32] or home weather station [31] data for HAR has been investigated in the literature so far. However, none of these studies focuses on identifying emergencies on the basis of the disaggregated data.
4.2 Commercial AAL Systems for Emergency Detection
We conducted an online survey of commercial AAL systems available in the German-speaking countries that are able to detect emergency situations in the household without the use of vital sensors and without the resident having to actively call for help (e.g., via an emergency button). In May 2020 we identified the following systems:
• BeHome
• casenio
• Copilot
• easierLife Home
• NoFal™
• Gigaset smart care
• Grannyguard
• PAUL
• ProLiving® SmartHome Plus+
• SensFloor® Care
• Tunstall Lifeline Smart Hub
• Vitracom® Safe@home
• Walabot Home
All these systems use external sensors, mostly motion sensors (e.g., Passive Infrared Motion Sensor (PIR)). Seamless, retrofittable integration into the existing building installation is not possible with any of these systems. In general, it can be concluded that the commercially available systems fall into two categories: (i) systems for (acute) fall detection and (ii) systems that are integrated into smart home environments. Fall detection systems usually use high-precision sensor technology to determine whether a person has been lying on the floor for a long time. In contrast, the systems integrated into smart home environments use algorithms that detect unusually long periods of inactivity and interpret them as an emergency situation.
5 Reusing Home Infrastructure Data
Wilhelm [30] surveyed which existing data sources from the residential infrastructure are accessible and potentially suitable for HAR. To this end, the author systematically identified available data sources by examining the popular open source home automation software systems HomeAssistant¹ and OpenHAB², and evaluated them individually based on the literature. Wilhelm identified over 45 potentially suitable data sources that can exist in the home infrastructure and could be used for HAR. However, studies confirming that a source can actually be used for HAR exist for only a few of them. So far, the literature has confirmed suitability for HAR only for WiFi [17, 23, 29, 33], smart meters (power) [8, 9, 32] and home weather stations [31].
6 Conclusion and Further Work
This paper outlines both the current state of research and available market solutions in the field of intelligent, ambient AAL technologies for identifying potential emergency situations within the residence.
¹ https://www.home-assistant.io
² https://www.openhab.org
It is shown that HAR currently relies on proprietary sensor technology, and that only the reuse of WiFi [17, 23, 29, 33], smart meter (power) [8, 9] or home weather station [31] data to detect activities of persons within the home environment has been investigated [30]. Commercially available solutions for emergency detection that do not use special fall sensors are based almost exclusively on data collected by motion sensors. Reusing further existing data sources from the home infrastructure (e.g., water consumption data, smart devices) is not yet considered by commercial systems.
In general, when reusing data from the home infrastructure, this data must first be disaggregated in order to draw conclusions about user activity. This can be done using intelligent algorithms (e.g., Machine Learning [2], Deep Learning [19]). However, a certain recognition error (false positives/false negatives) is to be expected. Consequently, the error probability of the disaggregation must be considered in the behavioral model. Current methods for identifying potential emergency situations using behavioral data cannot process probability-weighted activity data. In addition, the individual data sources are usually considered independently of each other, although the use of multi-component ambient sensor technologies would increase the quality of the systems [5, 26].
The aim of our further work is to fill this research gap by reusing existing data sources from the home infrastructure to identify unusually long periods of inactivity that could indicate a potential emergency situation. To this end, we intend to disaggregate data from commercially available smart meters for power and water consumption. Furthermore, a generic model for generating individual activity profiles in private households, which can process probability-weighted activity data, will be developed and evaluated in a practical environment.
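One conceivable way to take the error probability of disaggregation into account is to attach a probability to every recognized activity event and to ask how likely it is that no real activity happened within a time window. The following is only an illustrative sketch and not the generic model announced above: the independence of events, the function names and the `min_confidence` parameter are assumptions.

```python
import math


def prob_no_activity(events, window_start, window_end):
    """Probability that none of the events in the window was real activity.

    events is an iterable of (timestamp, p), where p is the estimated
    probability that the disaggregated event reflects real user activity.
    Events are treated as independent (an assumption of this sketch).
    """
    log_prob = 0.0
    for timestamp, p in events:
        if window_start <= timestamp <= window_end:
            if p >= 1.0:
                return 0.0  # a certain activity rules out inactivity
            log_prob += math.log1p(-p)  # log(1 - p), summed for stability
    return math.exp(log_prob)


def weighted_inactivity_alert(events, now, delta, min_confidence=0.9):
    """Alert when the last delta seconds were inactive with high confidence."""
    return prob_no_activity(events, now - delta, now) >= min_confidence
```

With such a formulation, a single low-probability detection (for instance a likely false positive from a smart meter) no longer resets the inactivity timer on its own.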
Acknowledgments. This work was funded by the Bavarian State Ministry of Family Affairs, Labor and Social Affairs.
References
1. DIN SPEC 91280:2012-09, technikunterstütztes leben (AAL) - Klassifikation von Dienstleistungen für technikunterstütztes leben im bereich der Wohnung und des direkten Wohnumfelds. https://doi.org/10.31030/1909231
2. Alpaydin, E.: Introduction to Machine Learning. MIT Press (2020)
3. Barger, T.S., Brown, D.E., Alwan, M.: Health-status monitoring through analysis of behavioral patterns. IEEE Trans. Syst. Man Cybern. - Part A: Syst. Hum. 35(1), 22–27 (2005). https://doi.org/10.1109/tsmca.2004.838474
4. Braun, A., Kirchbuchner, F., Wichert, R.: Ambient assisted living. In: eHealth in Deutschland, pp. 203–222. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-49504-9_10
5. Calvaresi, D., Cesarini, D., Sernani, P., Marinoni, M., Dragoni, A.F., Sturm, A.: Exploring the ambient assisted living domain: a systematic review. J. Ambient Intell. Hum. Comput. 8(2), 239–257 (2016). https://doi.org/10.1007/s12652-016-0374-3
6. Cardinaux, F., Brownsell, S., Hawley, M., Bradley, D.: Modelling of behavioural patterns for abnormality detection in the context of lifestyle reassurance. In: Lecture Notes in Computer Science, pp. 243–251. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-85920-8_30
7. Chen, L., Hoey, J., Nugent, C.D., Cook, D.J., Yu, Z.: Sensor-based activity recognition. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 42(6), 790–808 (2012). https://doi.org/10.1109/tsmcc.2012.2198883
8. Clement, J., Ploennigs, J., Kabitzsch, K.: Smart meter: detect and individualize ADLs. In: Ambient Assisted Living, pp. 107–122. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-27491-6_8
9. Clement, J., Ploennigs, J., Kabitzsch, K.: Detecting activities of daily living with smart meters. In: Ambient Assisted Living, pp. 143–160. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37988-8_10
10. Dohr, A., Modre-Opsrian, R., Drobics, M., Hayn, D., Schreier, G.: The internet of things for ambient assisted living. In: 2010 Seventh International Conference on Information Technology: New Generations. IEEE (2010). https://doi.org/10.1109/itng.2010.104
11. Elbert, D., Storf, H., Eisenbarth, M., Ünalan, Ö., Schmitt, M.: An approach for detecting deviations in daily routine for long-term behavior analysis. In: Proceedings of the 5th International ICST Conference on Pervasive Computing Technologies for Healthcare. IEEE (2011). https://doi.org/10.4108/icst.pervasivehealth.2011.246089
12. Eldib, M., Deboeverie, F., Philips, W., Aghajan, H.: Behavior analysis for elderly care using a network of low-resolution visual sensors. J. Electron. Imaging 25(4), 041003 (2016). https://doi.org/10.1117/1.jei.25.4.041003
13. Floeck, M., Litz, L.: Activity- and inactivity-based approaches to analyze an assisted living environment. In: 2008 Second International Conference on Emerging Security Information, Systems and Technologies. IEEE (2008). https://doi.org/10.1109/securware.2008.22
14. Floeck, M., Litz, L.: Inactivity patterns and alarm generation in senior citizens’ houses. In: 2009 European Control Conference (ECC). IEEE (2009). https://doi.org/10.23919/ecc.2009.7074979
15. Fouquet, Y., Franco, C., Demongeot, J., Villemazet, C., Vuillerme, N.: Telemonitoring of the elderly at home: real-time pervasive follow-up of daily routine, automatic detection of outliers and drifts. Smart Home Syst. 121–138 (2010). https://doi.org/10.5772/8414
16. Germain, A., Kupfer, D.J.: Circadian rhythm disturbances in depression. Hum. Psychopharmacol.: Clin. Exp. 23(7), 571–585 (2008). https://doi.org/10.1002/hup.964
17. Gu, Y., Ren, F., Li, J.: PAWS: passive human activity recognition based on WiFi ambient signals. IEEE Internet of Things J. 3(5), 796–805 (2016). https://doi.org/10.1109/jiot.2015.2511805
18. Hassan, M.M., Uddin, M.Z., Mohamed, A., Almogren, A.: A robust human activity recognition system using smartphone sensors and deep learning. Future Gener. Comput. Syst. 81, 307–313 (2018). https://doi.org/10.1016/j.future.2017.11.029
19. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015). https://doi.org/10.1038/nature14539
20. Munstermann, M.: Technischunterstützte Pflege von morgen. Springer Fachmedien Wiesbaden (2015). https://doi.org/10.1007/978-3-658-09797-4
21. Parra, L., Sendra, S., Jiménez, J.M., Lloret, J.: Multimedia sensors embedded in smartphones for ambient assisted living and e-health. Multimed. Tools Appl. 75(21), 13271–13297 (2015). https://doi.org/10.1007/s11042-015-2745-8
22. Perkowitz, M., Philipose, M., Fishkin, K., Patterson, D.J.: Mining models of human activities from the web. In: Proceedings of the 13th Conference on World Wide Web - WWW 2004. ACM Press (2004). https://doi.org/10.1145/988672.988750
23. Pu, Q., Gupta, S., Gollakota, S., Patel, S.: Whole-home gesture recognition using wireless signals. In: Proceedings of the 19th Annual International Conference on Mobile Computing & Networking - MobiCom 2013. ACM Press (2013). https://doi.org/10.1145/2500423.2500436
24. Rashidi, P., Mihailidis, A.: A survey on ambient-assisted living tools for older adults. IEEE J. Biomed. Health Inform. 17(3), 579–590 (2013). https://doi.org/10.1109/jbhi.2012.2234129
25. Reyes-Ortiz, J.L., Oneto, L., Samà, A., Parra, X., Anguita, D.: Transition-aware human activity recognition using smartphones. Neurocomputing 171, 754–767 (2016). https://doi.org/10.1016/j.neucom.2015.07.085
26. Uddin, M., Khaksar, W., Torresen, J.: Ambient sensors for elderly care and independent living: a survey. Sensors 18(7), 2027 (2018). https://doi.org/10.3390/s18072027
27. Virone, G., Noury, N., Demongeot, J.: A system for automatic measurement of circadian activity deviations in telemedicine. IEEE Trans. Biomed. Eng. 49(12), 1463–1469 (2002). https://doi.org/10.1109/tbme.2002.805452
28. Virone, G., Alwan, M., Dalal, S., Kell, S.W., Turner, B., Stankovic, J.A., Felder, R.: Behavioral patterns of older adults in assisted living. IEEE Trans. Inf. Technol. Biomed. 12(3), 387–398 (2008). https://doi.org/10.1109/titb.2007.904157
29. Wang, W., Liu, A.X., Shahzad, M., Ling, K., Lu, S.: Understanding and modeling of WiFi signal based human activity recognition. In: Proceedings of the 21st Annual International Conference on Mobile Computing and Networking - MobiCom 2015. ACM Press (2015). https://doi.org/10.1145/2789168.2790093
30. Wilhelm, S.: Activity-monitoring in private households for emergency detection: a survey of common methods and existing disaggregable data sources (2021). https://doi.org/10.5220/0010180002630272
31. Wilhelm, S., Jakob, D., Ahrens, D.: Human presence detection by monitoring the indoor CO2 concentration. In: Proceedings of the Conference on Mensch Und Computer, MuC 2020, pp. 199–203. Association for Computing Machinery, New York (2020). https://doi.org/10.1145/3404983.3409991
32. Wilhelm, S., Jakob, D., Kasbauer, J., Ahrens, D.: GeLaP: German labeled dataset for power consumption (2021, to appear)
33. Xie, X., Xu, H., Yang, G., Mao, Z.H., Jia, W., Sun, M.: Reuse of WiFi information for indoor monitoring of the elderly. In: 2016 IEEE 17th International Conference on Information Reuse and Integration (IRI). IEEE (2016). https://doi.org/10.1109/iri.2016.41
Towards a General and Complete Social Assistive Robotic Architecture
Bilal Hoteit¹(✉), Ahmad Faour², Ali Abdallah², Imad Alex Awada¹, Alexandru Sorici¹, and Adina Magda Florea¹
¹ Politehnica University of Bucharest, Splaiul Independentei 313, 060042 Bucharest, Romania
² Lebanese University, 6573/14, Badaro, Museum, Beirut, Lebanon
Abstract. Social and service robots are designed to achieve complex tasks whose number is increasing daily. Cloud resources enhance service robots with strong computing capabilities and larger data storage. One of the major drawbacks of the cloud robotics model is network latency, which edge computing addresses. Considering the capabilities that a service robot must have to achieve somewhat human-like behavior, we propose a hybrid service-oriented fog robotic architecture. The proposed architecture follows the fog robotics approach to distribute the computations across the cloud, the edge, and the robot in a clear way. Robotic control systems vary significantly depending on the required tasks, environment, robot specifications, user and business needs, and the programming tools used. The paper introduces several concepts to facilitate the design and development of a robotic system based on the desired robot type and functions. Furthermore, automated planning, one of the most important factors for an intelligent and autonomous robot, has been evaluated in several scenarios. These cases address the computational and planning aspects on either the Edge or the Cloud server.
Keywords: Robotics · Software engineer · Fog architecture · Artificial intelligence
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. T. Ahram et al. (Eds.): IHIET-AI 2021, AISC 1378, pp. 59–67, 2021. https://doi.org/10.1007/978-3-030-74009-2_8
1 Introduction
Several architectures for social service robots have been proposed, and these structures differ according to the user’s needs, the environment, or the required tasks. Humanoids have been used as service robots and have been very successful in organized environments. Currently, robots can perform highly specialized tasks, but their operation is restricted either to a narrow range of environments and objects or to specific tasks. These robots significantly lack resources such as computing and storage, which prevents them from performing many diverse and complex tasks. The term “network robots” has been adopted by many researchers, as complex tasks can be accomplished more easily using cooperative robots [1]. Moreover, a single robot can use the network to perform part of its computational processes on a remote server. With the advancement of network technology, robotic devices, and general devices such as CPUs, a robot can accomplish complex tasks, make more accurate decisions, and consume less energy. However, the network robot is restricted to the local network’s resources. To remove these limitations, researchers have proposed cloud technologies to support robotic systems with flexible resources [2]. Cloud technology removes the burden of information and computation restrictions from a robot. The cloud-robot model, however, suffers from many issues, especially in catering to time-sensitive robot applications. Therefore, network latency, computation location and workload must be taken into account during the design of any robotic system. Edge computing [3] has been proposed to enhance the cloud computing model by moving computing and storage resources close to the robot. Fog Robotics [4] extends Edge computing and is expected to become widespread in the next few years.
Traditionally, three different planning paradigms have been used to control the robot: a case-based approach, a cognitive architecture, or an AI symbolic approach. The latter is the most important since it maintains a goal-directed behavior. The Robot Operating System (ROS) [5] has become a standard framework for designing robotic control systems. The ROSPlan framework [6] has been proposed as a modular framework that adopts the standard PDDL language [7] and ROS. Since the proposed architecture is designed based on ROS, it uses ROSPlan as its task planning and execution framework to increase the autonomy of the system. We implement the planning approach as a distributed phase to support the robot with human-like capabilities.
This paper proposes a generic, modular, and flexible robotic architecture that follows the fog approach. The Hybrid Services Robotic Architecture (HSRA) consists of several components distributed across the network. The paper introduces several concepts to highlight the stage of robotic system decomposition.
2 Related Work
The centralized cloud infrastructure provides a set of shared computing and storage resources that allow robots to offload compute-intensive tasks for remote execution, resulting in “remote brain” robots. Several approaches to cloud robotics have been proposed, focusing either on big data resources or on the computational power of the cloud. Bekris et al. [8] proposed an architecture that effectively plans the motion of the robot’s manipulator by splitting the planning task between the robot and the cloud. Riazuelo et al. [9] proposed a system that generates a semantic map and combines VSLAM, an object recognition system and RoboEarth. Liu et al. [10] proposed a cloud-enabled robotic system that executes processes on either the cloud or the robot, where the decision depends on the availability of communication with the cloud (round-trip delay time (RTT)). A cloud-enabled network robotics system consisting of a group of robots, many devices and a cloud infrastructure has been proposed to carry out search and rescue missions in disaster management [11]. A Pepper-type social service robot simulates human motion by recording video and streaming it to a cloud that performs the heavy computation efficiently [12]. In contrast, other proposals relied only on a local server. Recently, a Fog Robotic system relying on an edge and a cloud server was introduced to facilitate recognition and tracking tasks [13].
Alongside the software architecture, planning is one of the most important factors for the success of any robotic system. A mobile intelligent robot has relied on the Soar cognitive architecture, which integrates planning and reasoning in order to obtain some cognitive abilities [14]. The Pepper robot was designed with several capabilities, such as interacting with humans and navigating an indoor environment, and it is controlled through a pre-assembled plan consisting of several behaviours [15]. Alternatively, AI symbolic planning can be used, specifically the ROSPlan framework, which has been integrated with AUVs in [16]. That system used POPF [17] as an AI planner to generate a temporal plan at runtime. The ROSPlan framework has been extended to support contingent [18] and probabilistic planning [19].
3 Methodology
This paper introduces a hybrid, multi-layered robotic engineering design in which several cognitive and service robotic applications are distributed according to the fog paradigm. The software architecture is presented to describe not only the functions assigned to robotic applications and how the system components collaborate, but also how to distribute the robotic components and secure intensive resources for all applications, so that the robot is supported in achieving complex cognitive tasks. HSRA consists of several components; some are implemented as black boxes, while others can be ROS robotic applications. These components communicate with each other either in a simple way (invoking a specific API to send or receive information) or in a complex way (many components working together to achieve a task). HSRA combines a layered architecture with a behaviour-based architecture. It is layered in that it implements the sense, plan, and act phases iteratively, as shown in Fig. 1. At the same time, several components provide specific skills and adopt the behavior-based model from a functional perspective.
Fig. 1. Three layered ROS-enabled robotic system.
HSRA consists of a cloud platform and a local platform. The local platform is a single robot that uses the local network to communicate with a remote local server (edge design) and can be supported by a smart environment. A robot can connect to the cloud directly or indirectly (using the local server as a small cloud or access point). HSRA is designed to use both the cloud side and the local side simultaneously. The robot communicates with the local server over an ad hoc wireless network and, in direct mode, with the cloud over a wireless network using an Internet connection. In either case, due to the limitations of a single robot, resource-intensive applications are executed remotely, as shown in Fig. 2. The system can operate with the local server alone, but performs better when the cloud is available. Extensive computation is performed on the local or cloud infrastructure, and the resulting data is sent back to the robot. The proposed architecture distributes resource-intensive workloads across the robot, the local server, and the cloud server.
Fig. 2. High-level architecture overview
We opt for static task allocation, in which components are distributed between cloud and edge at design time. This decision takes into account not only the characteristics of the proposed architecture, but also the issues raised by dynamic task assignment. Dynamic task assignment can degrade robotic systems because of the diversity of these systems and their differences from mobile applications [11]. Moreover, finding an appropriate balance of performance and cost is challenging in the robotic context [13].
3.1 System Architecture
Generally, network availability affects system performance, and robot mobility increases communication problems due to crashes or unexpected obstacles in the environment. HSRA addresses such problems as follows.
Cloud infrastructure complements local infrastructure. In general, the motion planning task requires complex computation. Therefore, the grasping application is implemented on the cloud infrastructure, because it requires a huge amount of data but is not time-sensitive, while the navigation application is implemented on the local infrastructure because it is classified as time-sensitive. Moreover, the grasping task itself can be split across both servers: object detection can be performed on the cloud and the grasping plan generated on the local server.
A local infrastructure can serve as a cloud backup during a bad internet connection. For example, the ROSPlan framework creates a plan for only one task on the local server, while it can create a multitasking plan on the cloud server. The robots can rely on data collected in a first or separate step, such as an environment map. A local copy of the object models is required to perform processing or detection tasks on the local server. The map and other information can be downloaded from the cloud to local storage on startup and upon request. Moreover, depending on the robotic applications, the local server can also query specific information from the cloud. The information is updated serially while the internet connection is good, and on demand. These steps help the robot to achieve its tasks during an internet disconnection.
While the internet connection is lost, our system performs all tasks on the local server. During a stable internet connection, on the other hand, our system executes resource-intensive tasks that are not time-sensitive on the cloud to increase the performance of specific robotic applications. Figure 3 illustrates the HSRA.
Fig. 3. The Proposed Fog Robotic System (HSRA).
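The allocation rule described in Sect. 3.1 can be summarized in a small sketch. It is only an illustration of the stated policy (time-sensitive applications on the edge, other resource-intensive applications on the cloud, fallback to the edge when the internet connection is lost); the names `Site`, `App` and `choose_site`, and the rule that non-intensive applications stay on the robot, are assumptions and not part of HSRA as published.

```python
from dataclasses import dataclass
from enum import Enum


class Site(Enum):
    ROBOT = "robot"
    EDGE = "edge"    # local server
    CLOUD = "cloud"


@dataclass(frozen=True)
class App:
    name: str
    time_sensitive: bool
    resource_intensive: bool


def choose_site(app, cloud_available):
    """Return where an application runs under the sketched policy."""
    if not app.resource_intensive:
        return Site.ROBOT  # assumed: light workloads stay on the robot
    if app.time_sensitive or not cloud_available:
        return Site.EDGE   # low latency, or cloud backup role
    return Site.CLOUD      # data-heavy work without time constraints


navigation = App("navigation", time_sensitive=True, resource_intensive=True)
grasping = App("grasping", time_sensitive=False, resource_intensive=True)
```

Because the time-sensitivity and resource flags are fixed per application at design time, the allocation stays static; only the availability of the cloud changes the outcome at runtime.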
3.2 System Components
For brevity, we summarize only two dedicated robotic components here. More details on several robotic applications can be found in [16], which includes many extended robotic applications.
Task Component. This component is responsible for scheduling, managing, and executing tasks. It accepts multiple tasks from the HRI component, from internal processes, or from external applications, and saves them in a dedicated task queue on the local server. It is also responsible for synchronizing the task queue with its cloud version and for invoking a specific web service to request a global plan from the cloud version of ROSPlan. Through the local ROSPlan, it passes on either a whole task to be accomplished or an already generated plan to be executed.
ROSPlan Framework. This component is a symbolic AI planner implemented on the local server to generate and execute a local plan for only one task from the task queue. A ROSPlan clone on the cloud creates a global plan for many tasks simultaneously. The cloned version is an extended version that can propose an executable plan but is not responsible for the execution phase. Since reasoning and planning for multiple tasks simultaneously are sampling-based and computationally intensive, they are performed on the cloud. This mechanism leads the robot to plan and execute tasks in a human-like manner.
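The interplay between the task queue, the local planner and the cloud planner can be sketched as follows. This is a simplified illustration under stated assumptions: the class and method names are hypothetical, the two planners are plain callables standing in for the local and cloud ROSPlan instances, and no actual ROSPlan or ROS interface is used.

```python
from collections import deque


class TaskComponent:
    """Queues tasks and picks a planner depending on cloud availability."""

    def __init__(self, local_planner, cloud_planner):
        self.queue = deque()
        self.local_planner = local_planner  # plans exactly one task
        self.cloud_planner = cloud_planner  # plans several tasks at once

    def submit(self, task):
        """Accept a task from HRI, internal processes or external apps."""
        self.queue.append(task)

    def next_plan(self, cloud_available):
        """Return the next plan to execute, or None if the queue is empty."""
        if not self.queue:
            return None
        if cloud_available and len(self.queue) > 1:
            # Ask the cloud for one global plan covering all queued tasks.
            tasks = list(self.queue)
            self.queue.clear()
            return self.cloud_planner(tasks)
        # Otherwise plan a single task locally, in queue order.
        return self.local_planner(self.queue.popleft())
```

In both cases the returned plan would be executed locally, which matches the description that the cloud clone proposes plans but does not execute them.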
3.3 Communication and Data Transmission
ROS supports system distribution and provides communication channels through either message mode or service mode. Each system component is distributed by implementing multiple ROS nodes that represent core functions and communicate with each other to achieve a task. Nodes belonging to different components are connected through the ROS network. When multiple nodes run simultaneously, their point-to-point connections form a graph. A further mechanism links the robot with two ROS Master devices simultaneously: the “multimaster_fkie” package is used to manage and control multiple ROS networks. We propose a preliminary architecture that transfers data indirectly between the robot and the cloud system, with the edge server as mediator, and at the same time transfers data directly between the local server and the cloud server, by implementing innovative web interfaces or APIs designed for synchronous and asynchronous data transfer. For instance, a cloud web service and a local web service can be called simultaneously for communication and data exchange. The former receives messages from the edge service and publishes them on its cloud ROS system. The latter subscribes to the edge ROS ecosystem and relays messages to the cloud service. The two web services communicate using a standard message exchange protocol such as SOAP, and both are designed and implemented to handle asynchronous operations, mirroring the flow of data within the ROS framework.
3.4 The Planning Approach
Deliberation and planning are done at two levels. Relying on the KB and the domain model, the upper layer implements domain-independent planning and deliberates over the available expected actions. The system components can also implement domain-dependent planning. Leaving aside the ROSPlan implementation as a whole, we focus on the distribution phase of the planning approach. We present a use case scenario in which a robot is requested to accomplish two tasks that are in the task queue. The first mission (M1) is to visit all the landmarks and greet the patients, while the second mission (M2) is to detect a specific patient. The system creates a plan for each problem and carries out the plans one by one. Conversely, the task component (cloud clone) implements a clustering algorithm that attempts to merge two or more tasks. Figure 4 (a) shows a portion of the domain model for a social service robot that can be used in healthcare, while (b) and (c) represent the problems for the first and second task, respectively. Figure 4 (d) represents a portion of the problem file for both M1 and M2 together.
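To make the idea of a combined problem for M1 and M2 more tangible, the following sketch merges two problem descriptions into one PDDL problem string by taking the union of their objects, initial facts and goals. It does not reproduce the clustering algorithm of the task component or the content of Fig. 4; the predicates, object names, domain name and the `merge_problems` helper are hypothetical.

```python
def _unique(items):
    """Remove duplicates while keeping the first-seen order."""
    return list(dict.fromkeys(items))


def merge_problems(name, domain, problems):
    """Build one PDDL problem whose goal is the conjunction of all goals."""
    objects = _unique(o for p in problems for o in p["objects"])
    init = _unique(f for p in problems for f in p["init"])
    goals = _unique(g for p in problems for g in p["goal"])
    return (
        f"(define (problem {name})\n"
        f"  (:domain {domain})\n"
        f"  (:objects {' '.join(objects)})\n"
        f"  (:init {' '.join(init)})\n"
        f"  (:goal (and {' '.join(goals)})))"
    )


# Hypothetical fragments for the two missions of the use case.
m1 = {
    "objects": ["robot1 - robot", "wp1 - waypoint", "wp2 - waypoint"],
    "init": ["(at robot1 wp1)"],
    "goal": ["(visited wp1)", "(visited wp2)"],
}
m2 = {
    "objects": ["robot1 - robot", "wp1 - waypoint", "patient1 - person"],
    "init": ["(at robot1 wp1)"],
    "goal": ["(found patient1)"],
}
merged = merge_problems("m1_m2", "healthcare", [m1, m2])
```

A planner given the merged problem can interleave the actions of both missions, for example looking for the patient while visiting the landmarks, instead of executing two separate plans one after the other.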
4 Guidelines
The smart environment can remove some limitations from the modeling stage and facilitate robotic system functions such as tracking. The necessity of standardization has to be taken into consideration while designing robotic systems. ROS creates an abstraction layer and can act as an arbiter.
Web services can complement middleware and accelerate development by providing a more flexible infrastructure. The fog robotics approach benefits from a mixture of structural levels that cooperate with each other to tackle almost all robotic problems. For many robotic applications, robot developers must balance real-time requirements, processing performance, and resource requirements. The new 5G network could be a potential solution for controlling delay: it efficiently reduces latency and allows a large amount of data to be transferred to the cloud. Mapping tasks and several grasping techniques can be performed simultaneously by relying on a 3D occupancy map. Due to the limitations of GPS systems, relying on cloud-based visual SLAM processing, such as the FastSLAM algorithm, which supports building a large map, is a suitable solution. To navigate in a completely unknown environment (map-less approach), a sampling-based method must be considered.
Fig. 4. (a) Fragment of the domain model. (b) Fragment of the problem for M1. (c) Fragment of the problem for M2. (d) Fragment of the problem for both missions together.
5 Conclusion and Future Work
This paper provided a draft version of the proposed architecture (HSRA). Edge computing supports HSRA in addressing network latency. We assume that the fog approach will push the robot to a higher, more aware, and smarter level. HSRA is a concrete robotic architecture supported by two remote servers (local and cloud). The servers support the robotic system with intensive resources to deal with the computational loads of the system. Given the complexity of the proposed architecture, we focused on the higher
structural level of HSRA and introduced some components from the perspective of a social assistive robot. This paper examines the intensive resources required to give a robot human-like capabilities. Although the architecture relies on the ROSPlan framework, the planning and deliberation processes are executed in a novel way in which the process is split appropriately. This paper attempts to highlight the principles that affect the performance of a robotic system depending on the software, hardware and technology used. In the future, we will focus on extending the capabilities and functions of the system components and further investigate the integration and communication stage between the robot and the remote servers.
References
1. Akin, H.L., Birk, A., Bonarini, A., Kraetzschmar, G.: Two hot issues in cooperative robotics: network robot systems, and formal models and methods for cooperation. EURON Special Interest Group on Cooperative Robotics (2008)
2. Hu, G., Tay, W.P., Wen, Y.: Cloud robotics: architecture, challenges and applications. IEEE Netw. 26, 21–28 (2012)
3. Shi, W., Cao, J., Zhang, Q., Li, Y.: Edge computing: vision and challenges. IEEE Internet Things J. 3, 637–646 (2016)
4. Tanwani, A.K., Mor, N., Kubiatowicz, J., Gonzalez, J.E.: A fog robotics approach to deep robot learning: application to object recognition and grasp planning in surface decluttering. In: International Conference on Robotics and Automation (ICRA), pp. 4559–4566. IEEE (2019)
5. Cousins, S.: Exponential growth of ROS [ROS topics]. IEEE Robot. Autom. Mag. 1, 19–20 (2011)
6. Cashmore, M., Fox, M., Long, D., Magazzeni, D.: ROSPlan: planning in the robot operating system. In: International Conference on Automated Planning and Scheduling, pp. 333–341. Association for the Advancement of Artificial Intelligence (2015)
7. Fox, M., Long, D.: PDDL2.1: an extension to PDDL for expressing temporal planning domains. J. Artif. Intell. Res. 20, 61–124 (2003)
8. Bekris, K., Shome, R., Krontiris, A., Dobson, A.: Cloud automation: precomputing roadmaps for flexible manipulation. IEEE Robot. Autom. Mag. 22, 41–50 (2015)
9. Riazuelo, L., Tenorth, M., Di Marco, D., Salas, M.: RoboEarth semantic mapping: a cloud enabled knowledge-based approach. IEEE Trans. Autom. Sci. Eng. 22, 432–443 (2015)
10. Liu, B., Chen, Y., Blasch, E., Pham, K.: A holistic cloud-enabled robotics system for real-time video tracking application. In: Future Information Technology, pp. 455–468. Springer (2014)
11. Wan, J., Tang, S., Yan, H., Li, D.: Cloud robotics: current status and open issues. IEEE Access 4, 2797–2807 (2016)
12. Tian, N., Kuo, B., Ren, X., Yu, M.: A cloud-based robust semaphore mirroring system for social robots. In: IEEE 14th International Conference on Automation Science and Engineering (CASE), pp. 1351–1358. IEEE (2018)
13. Tian, N., Tanwani, A.K., Chen, J., Ma, M.: A fog robotic system for dynamic visual servoing. In: International Conference on Robotics and Automation (ICRA), pp. 1982–1988. IEEE (2019)
14. Laird, J.E.: The Soar Cognitive Architecture. MIT Press (2012)
15. Gavril, A.F., Ghita, A.S., Sorici, A., Florea, A.M.: Towards a modular framework for human-robot interaction and collaboration. In: 22nd International Conference on Control Systems and Computer Science (CSCS), pp. 667–674. IEEE (2019)
16. Cashmore, M., Fox, M., Long, D., Magazzeni, D.: Opportunistic planning for increased plan utility. ICAPS (2019)
17. Coles, A.J., Coles, A.I., Fox, M., Long, D.: Forward-chaining partial-order planning (2010)
18. Sanelli, V., Cashmore, M., Magazzeni, D., Iocchi, L.: Short-term human-robot interaction through conditional planning and execution. AAAI Press (2017)
19. Canal, G., Cashmore, M., Krivić, S., Alenyà, G., Magazzeni, D., Torras, C.: Probabilistic planning for robotics with ROSPlan. In: Annual Conference Towards Autonomous Robotic Systems, pp. 236–250. Springer (2019)
Intelligent Control of HVAC Systems in Electric Buses
Martin Sommer1(B), Carolin Junk1, Tobias Rösch2, and Eric Sax1
1 Institute for Information Processing Technologies, Engesserstraße 5, Karlsruhe, Germany, {ma.sommer,carolin.junk,eric.sax}@kit.edu
2 EvoBus GmbH, Hanns-Martin-Schleyer-Str. 21-57, Mannheim, Germany, [email protected]
Abstract. Battery electric buses (BEB) will increasingly replace buses with internal combustion engines in the fleets of transport companies. However, their limited range prevents the use of BEB on all bus routes. Auxiliary consumers strongly affect the range, and the heating, ventilation and air conditioning (HVAC) system plays a major role among them. The high energy consumption of the HVAC system can possibly be reduced with intelligent control methods, since their conventional counterparts guarantee compliance with specifications but do not consider energy consumption. Thus, an energy-saving control is desired that minimizes energy consumption while still complying with the given specifications. To meet these requirements, the following controllers were implemented: (1) model predictive control (MPC) and (2) reinforcement learning (RL) based control. This paper describes the implementation and application of both controllers on a Simulink model of a modern heat pump HVAC system and compares the results with PID control.
Keywords: HVAC · Model predictive control · Reinforcement learning
1 Introduction
Battery electric buses (BEB) have advantages over conventional diesel buses in terms of greenhouse gas emissions, air quality, efficiency and noise reduction [1]. However, in terms of range, diesel buses still perform better. The reasons for the lower range of BEB are the lower energy density of batteries compared to liquid fuels and the fact that auxiliary consumers are supplied from the traction battery. The heating, ventilation and air conditioning (HVAC) system in particular may consume up to as much energy as the traction motor of a BEB, which considerably reduces the range. Heating is especially energy-consuming since, unlike in diesel buses, no waste heat from a combustion engine can be used for heating [2, 3]. Conventional control enables compliance with set points such as temperature or fresh air supply, but no further objectives, such as the minimization of energy consumption, are implemented [2]. This paper deals with more intelligent control strategies to optimize the energy efficiency of HVAC systems in BEB.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 T. Ahram et al. (Eds.): IHIET-AI 2021, AISC 1378, pp. 68–75, 2021. https://doi.org/10.1007/978-3-030-74009-2_9
2 Background
2.1 Model Predictive Control
Model predictive control is a model-based online control method that repeatedly solves an optimization problem to determine optimal control actions in real time. The controller uses predictions of plant outputs to set up the optimization problem. In order to predict the dynamic behavior of the plant, the controller requires an internal model, which represents an approximation of the plant. If the internal model (prediction model) is set up before system operation and is therefore not changed during run time, the MPC is called non-adaptive; this is the case considered in this work [4].
2.2 Reinforcement Learning
Reinforcement learning (RL) aims to learn optimal control by itself. The learning entity is called the agent, which interacts with the environment by taking actions in discrete time steps [5]. At each time step, the agent selects an action from a predefined action set. This action is subsequently performed in the environment. Taking the action leads to a transition of states in the environment, which is observed by the agent (states are assumed to be fully observable here), and the agent receives a reward from the environment. The reward is calculated by a reward function that determines the quality of the action based on state observations and a previously defined objective [6].
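The interaction between agent and environment described above can be sketched as a simple loop. The toy environment, the action set and the hand-written policy below are assumptions made purely for illustration: no learning takes place, and the policy merely stands in for whatever a trained agent would choose.

```python
class ToyCabinEnv:
    """Hypothetical 1-D environment: the state is a cabin temperature in deg C."""

    def __init__(self, start_temp=18.0, set_point=24.0):
        self.temp = start_temp
        self.set_point = set_point

    def step(self, action):
        """Apply a heating/cooling action and return (next_state, reward)."""
        self.temp += action
        reward = -abs(self.temp - self.set_point)  # closer to the set point is better
        return self.temp, reward


ACTIONS = (-1.0, 0.0, 1.0)  # predefined action set (assumed)


def placeholder_policy(state, set_point=24.0):
    """Stand-in for a learned policy: pick the action that moves toward the set point."""
    return min(ACTIONS, key=lambda a: abs(state + a - set_point))


env = ToyCabinEnv()
state = env.temp
total_reward = 0.0
for _ in range(10):                     # discrete time steps
    action = placeholder_policy(state)  # the agent selects an action
    state, reward = env.step(action)    # state transition and reward from the environment
    total_reward += reward
```

In an actual RL setup the policy would be updated from the observed rewards instead of being fixed in advance.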
3 State of the Art
Increasing demand for energy efficiency has led to higher complexity in HVAC systems for BEB. More and more hardware components have been added to the system. Nevertheless, classical control approaches such as PID are still widespread for these components in order to control temperature, humidity and ventilation. Parameters for the PID controllers can be calculated using common tuning methods and are usually calibrated during test drives with each individual vehicle type. Some control loops are split into cascaded PID control loops in order to achieve a faster response and better performance [7]. In an HVAC system, various components have to be controlled, such as fans, valves, pumps, and the compressor. With conventional control, this leads to several dependent or independent control loops with no overall optimization strategy. However, research is already further ahead, and there are interesting approaches to consider. Many publications have replaced the described control approaches with more sophisticated, intelligent techniques. Most of these publications deal with HVAC systems for residential buildings; some apply the techniques in the automotive sector, but only a few in (electric) vehicles and almost none in the special case of battery electric city buses.
4 Related Work
Eckstein et al. implemented MPC for the HVAC system of electric vehicles with the aim of reducing energy consumption [8]. Their proposed prediction model consists of a linearized model of the passenger compartment [8]. To achieve accurate predictions of a nonlinear plant, the internal model should be nonlinear as well and hence should not be approximated by linearization. However, creating a mathematical model of a nonlinear plant is complex, which is why neural networks have become a convenient method to approximate nonlinear systems [9]. Li et al. [9] introduced neural network model predictive control (NNMPC) for controlling a central air conditioning system with a single input and a single output. The prediction model was built with a back propagation neural network and predicts the room temperature. The control action is the valve opening; the controlled variable is the room temperature. NNMPC can be applied to multiple types of control problems. Kittisupakorn et al. implemented NNMPC for the control of a steel pickling process [10]. A multi-layer feedforward network is used to approximate a chemical process with nonlinear behavior, where the controlled physical variables are three acids and the control actions are inlet flows [10]. RL has been implemented and tested for HVAC systems of commercial buildings [11]. The objective of the RL control is to keep indoor temperature and humidity in comfortable ranges while minimizing the energy consumption. The system is controlled via four continuous control actions. Brusey et al. implemented and tested RL in simulation for the climate control of vehicle cabins [12]. The environment for the RL controller consisted of a single-zone car cabin model.
5 Concept and Realization
The NNMPC and the RL based control were implemented for an HVAC system model (plant model) that was developed using Matlab/Simulink. This model is described briefly at the beginning of this chapter to give an overview of the components that are included and the operations that are processed in the plant model. Section 5.2 then defines the restrictions and the cost/reward function for the NNMPC and the RL based controller. Both control implementations are explained briefly at the end of this chapter.
5.1 Description of the HVAC System Model
The HVAC system model is composed of two components: the rooftop unit and a passenger compartment model. The rooftop unit consists of a CO2 heat pump that is used for heating and air conditioning. In the passenger compartment model, the exchange of heat flows is modeled. The heat flows considered are door openings, the environment, solar radiation, inner mass, passengers and heat flows from the HVAC system. Furthermore, the volume flow of fresh air introduced by the HVAC system is modeled.
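How individual heat flows can enter a compartment model may be illustrated with a single-zone energy balance, in which the net heat flow changes the cabin temperature through a lumped thermal capacity. This is only a sketch under strong assumptions: the lumped-capacity form, the explicit Euler step and all numerical values are hypothetical and are not taken from the Simulink model used in this work.

```python
def cabin_temperature_step(temp_c, heat_flows_w, heat_capacity_j_per_k, dt_s):
    """Advance the cabin temperature by one explicit Euler step.

    temp_c: current cabin temperature in deg C
    heat_flows_w: mapping of heat-flow name -> heat flow into the cabin in W
    heat_capacity_j_per_k: lumped thermal capacity of the cabin in J/K (assumed)
    dt_s: step size in s
    """
    net_heat_flow_w = sum(heat_flows_w.values())
    return temp_c + net_heat_flow_w * dt_s / heat_capacity_j_per_k


# Purely illustrative numbers (negative values are losses).
flows = {
    "door_openings": -500.0,
    "environment": -1500.0,
    "solar_radiation": 300.0,
    "inner_mass": -100.0,
    "passengers": 800.0,
    "hvac": 2000.0,
}
next_temp = cabin_temperature_step(
    20.0, flows, heat_capacity_j_per_k=100_000.0, dt_s=10.0
)
```

With these made-up values the net heat flow is 1000 W, so the cabin warms by 0.1 K within the 10 s step.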
5.2 Cost/Reward Function and Restrictions
Restrictions were defined for the value ranges of the control actions. The control actions are the setting of the recirculation flap and the rotational speeds of the compressor, the gas cooler fan and the evaporator fan. The value ranges are:
1) 0 ≤ recirculationFlap ≤ 1
2) 1000 ≤ nComp ≤ 1500
3) 1000 ≤ nGCFan ≤ 1500
4) 1000 ≤ nEvapFan ≤ 1700
respectively, where values of the recirculation flap denote 0 for fresh air supply, 1 for recirculation air and values in between for mixed air supply. Optimal control action sequences are calculated by solving an optimization problem. The problem consists of a cost function $J$ that is to be minimized and that contains the deviations of states from their set points. The aim of the control is to perform actions so that the plant reaches the desired states, subject to the previously defined constraints. In addition, large changes of control actions should be avoided and are therefore penalized. The cost function follows the formula presented in [13], slightly adapted to the present case. In the following, $\hat{y}$ denotes a vector containing predictions of states. The vector $y_{ref}$ contains the set points of the states. Since the cost function is set up for a specific future time span, the prediction horizon $N_p$ must be determined. It specifies the time period for which future system behavior is considered. $\Gamma_y$ and $\Gamma_u$ denote weighting matrices for rating deviations of plant outputs and control action changes, respectively. In time step $k$, the cost function $J$ is calculated as follows:

$$J(k) = \sum_{j=1}^{N_p} \Big[ \big(\hat{y}(k+j) - y_{ref}(k+j)\big)^T \, \Gamma_y \, \big(\hat{y}(k+j) - y_{ref}(k+j)\big) + \big(u(k+j) - u(k)\big)^T \, \Gamma_u \, \big(u(k+j) - u(k)\big) \Big] \quad (1)$$
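A minimal sketch of how a cost of the form of Eq. (1) could be evaluated is given below. It assumes that predictions, set points and control actions are available as plain lists for the steps k+1 to k+N_p; the function names and the names w_y and w_u for the two weighting matrices are chosen here for illustration only, and the numbers in the example are arbitrary.

```python
def quad_form(vec, weight):
    """Return vec^T * weight * vec for a list vector and a nested-list matrix."""
    n = len(vec)
    return sum(vec[r] * weight[r][c] * vec[c] for r in range(n) for c in range(n))


def mpc_cost(y_pred, y_ref, u_seq, u_now, w_y, w_u):
    """Cost over the prediction horizon N_p = len(y_pred).

    y_pred[j-1], y_ref[j-1] and u_seq[j-1] hold the values for step k+j,
    j = 1..N_p; u_now is the current control action u(k).
    """
    cost = 0.0
    for y_hat, y_r, u in zip(y_pred, y_ref, u_seq):
        dy = [a - b for a, b in zip(y_hat, y_r)]  # deviation from the set point
        du = [a - b for a, b in zip(u, u_now)]    # change of the control action
        cost += quad_form(dy, w_y) + quad_form(du, w_u)
    return cost


# Arbitrary example: one state (cabin temperature), one action, N_p = 2.
cost = mpc_cost(
    y_pred=[[22.0], [23.0]],
    y_ref=[[24.0], [24.0]],
    u_seq=[[0.5], [1.0]],
    u_now=[0.0],
    w_y=[[1.0]],
    w_u=[[2.0]],
)
reward = -cost  # sign flip when the same expression is used as an RL reward
```

The ratio between the two weighting matrices determines how strongly smooth actuator behavior is traded against fast set-point tracking.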
For $j = 0$, measurements of the states are available; therefore, $\hat{y}(k)$ contains the direct measurements instead of predictions. Although the objectives of the RL and NNMPC controllers are the same, RL based control aims to maximize the received rewards, whereas MPC aims to minimize the cost function. Therefore, the minimization problem of the NNMPC is converted into a maximization problem for RL by negating each term of $J(k)$ and maximizing the result.
5.3 MPC Algorithm
The procedure of the MPC algorithm is shown schematically in the flowchart in Fig. 1. Initially, the prediction horizon $N_p$ and the control horizon $N_c$ must be determined. For the first function call, a counter variable i is set to one and the initial action vectors for the next $N_c$ time steps are calculated, so that the flowchart is processed for the first time. Afterwards, the algorithm of the flowchart is performed in every sampled time step. In summary, a calculation of optimal control sequences is performed
every $N_c$ time steps, for which a cost function based on predictions is set up. Subsequently, an optimization algorithm calculates a sequence of $N_p$ optimal control actions that minimizes the cost function. During the following $N_c$ time steps, the already calculated control action vector for the respective time step is passed to the plant and the counter i is updated for every control action vector. After $N_c$ control action vectors have been passed to the plant, a new calculation starts.
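The procedure can be sketched as a receding-horizon loop. The example below is a toy under stated assumptions, not the NNMPC of this work: the prediction model is a one-line placeholder instead of a neural network, the plant is identical to the model, the optimizer is an exhaustive search over a small discrete action set (which also enforces the value range of the action), and the counter starts at zero rather than one.

```python
from itertools import product

ACTIONS = (-1.0, 0.0, 1.0)  # assumed discrete control actions
N_P = 3                     # prediction horizon
N_C = 2                     # control horizon (actions applied before re-planning)
SET_POINT = 24.0


def predict(temp, action):
    """Toy prediction model (stands in for the neural network): T(k+1) = T(k) + u(k)."""
    return temp + action


def plan(temp):
    """Return the N_P-step action sequence with the lowest predicted cost."""
    best_seq, best_cost = None, float("inf")
    for seq in product(ACTIONS, repeat=N_P):
        t, cost = temp, 0.0
        for u in seq:
            t = predict(t, u)
            cost += (t - SET_POINT) ** 2
        if cost < best_cost:
            best_seq, best_cost = seq, cost
    return best_seq


def run(temp, steps):
    """Apply planned actions to the plant, re-planning every N_C time steps."""
    history = [temp]
    sequence, i = plan(temp), 0            # initial plan, counter i
    for _ in range(steps):
        if i == N_C:                       # N_C actions were passed to the plant
            sequence, i = plan(temp), 0    # start a new calculation
        temp = predict(temp, sequence[i])  # here the plant equals the model
        i += 1
        history.append(temp)
    return history


trajectory = run(20.0, steps=8)
```

Only the first N_C actions of each optimized sequence are ever applied; the remaining ones are discarded when the next calculation starts from the newly measured state.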
Fig. 1. Flowchart of the NNMPC algorithm
5.4 Reinforcement Learning Environment
A predefined Matlab agent, the Deep Deterministic Policy Gradient (DDPG) agent, was used as the RL algorithm [14]. The plant model is represented by the HVAC system model described in Sect. 5.1. To enable the interaction of agent and plant model, the structure of the environment was built according to Fig. 2. The environment has to supply immediate rewards to the agent, which is why a function for determining rewards is necessary. Rewards are calculated from the observations and the previous actions, which are provided by means of a unit delay block. A learning episode termination block called 'Check if done' specifies additional stopping criteria for which a learning episode is terminated ahead of the planned episode length. Stopping criteria can be fulfilled either if states are close to their set points or if states miss their set points by a wide margin. The stopping criterion used here terminates a learning episode if states are very far from their set points.
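The two environment functions described above, the reward calculation and the early-termination check, might look as follows in a strongly simplified form. The function names, the quadratic reward shape, the weighting factor and the deviation threshold are assumptions for illustration; they do not reproduce the Simulink blocks of Fig. 2.

```python
def reward(observations, set_points, action, previous_action, change_weight=0.1):
    """Negative cost: penalize set-point deviations and large action changes."""
    deviation = sum((o - s) ** 2 for o, s in zip(observations, set_points))
    change = sum((a - p) ** 2 for a, p in zip(action, previous_action))
    return -(deviation + change_weight * change)


def is_done(observations, set_points, max_deviation):
    """Terminate the episode early if any state is too far from its set point."""
    return any(abs(o - s) > max_deviation for o, s in zip(observations, set_points))


# Arbitrary example values; previous_action plays the role of the unit delay output.
r = reward(
    observations=[22.0],
    set_points=[24.0],
    action=[1.0],
    previous_action=[0.0],
    change_weight=0.5,
)
keep_going = not is_done([22.0], [24.0], max_deviation=10.0)
stop_early = is_done([40.0], [24.0], max_deviation=10.0)
```

Terminating hopeless episodes early avoids spending training time on states from which the set points can hardly be reached.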
Fig. 2. Structure of the RL environment
6 Evaluation
The NNMPC and RL based controllers were tested against classical PID control under the same environmental conditions. All three controllers successfully regulated the states to their set points and generally showed good results in terms of compliance with the VDV specifications. This can be seen in Fig. 3 for the NNMPC. Note that the lower and upper curves shown in Fig. 3 represent the eco curve described by the VDV [3]. In heating mode, the upper curve may be exceeded, whereas in cooling mode the temperature may fall below the lower curve. The NNMPC and the RL based control used different strategies to reach the set point of 24 °C. The control actions differed especially at high ambient temperatures: the NNMPC maximizes the rotational speed of the compressor, while the RL based control minimizes all rotational speeds. In terms of energy consumption, the RL based control operated more efficiently than the NNMPC. After two hours of simulation (as shown in Fig. 3), the energy consumption of the NNMPC was 10.77 kWh, which is more than that of the RL based control at 9.52 kWh but still less than that of the PID control at 11.3 kWh. To further compare the energy consumption and the comfort level of NNMPC, RL and PID control, more simulations were performed. The evaluation parameter %TCabin describes the percentage of samples that lie in the desired value range determined by the VDV [3]. The controllers were simulated for one hour at constant ambient temperatures of −10 °C, −5 °C, 0 °C and 15 °C, respectively. The results are displayed in Table 1. All controllers handle fresh air supply for the cabin very well, although colder ambient temperatures seem to make this problem harder to solve. The RL controller is the most energy-saving controller under almost all ambient temperature conditions. It is also worth noting that the PID controller consumes less energy than the MPC controller in most conditions. This was not the case in the scenario displayed in Fig. 3, in which passengers entered and left the bus and more extreme ambient temperatures were applied.
Fig. 3. NNMPC: Evaluation of the cabin temperature against specifications of the Association of German Transport Companies under different conditions with a set point of 24 °C
Table 1. Evaluation of NNMPC, RL and PID control in different ambient temperature scenarios (MPC ≙ NNMPC)

| Ambient temperature | %TCabin MPC (%) | %TCabin RL (%) | %TCabin PID (%) | Energy MPC (kWh) | Energy RL (kWh) | Energy PID (kWh) |
|---|---|---|---|---|---|---|
| 15 °C | 100 | 100 | 100 | 0.907 | 1.06 | 0.86 |
| 0 °C | 100 | 100 | 100 | 4.04 | 3.85 | 4.85 |
| −5 °C | 98.50 | 96.8 | 98.39 | 5.12 | 4.28 | 4.89 |
| −10 °C | 94.2 | 92.00 | 93.94 | 4.61 | 4.00 | 4.42 |
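The energy figures of Table 1 can be compared directly, for example by expressing each controller's consumption relative to PID. The short sketch below uses only the values from the table; the helper name saving_vs_pid is chosen here for illustration.

```python
# Electrical energy consumption in kWh from Table 1 (one-hour simulations),
# keyed by ambient temperature in deg C.
energy_kwh = {
    15: {"MPC": 0.907, "RL": 1.06, "PID": 0.86},
    0: {"MPC": 4.04, "RL": 3.85, "PID": 4.85},
    -5: {"MPC": 5.12, "RL": 4.28, "PID": 4.89},
    -10: {"MPC": 4.61, "RL": 4.00, "PID": 4.42},
}


def saving_vs_pid(ambient_c, controller):
    """Relative energy saving in percent compared with PID (negative = more energy)."""
    row = energy_kwh[ambient_c]
    return 100.0 * (row["PID"] - row[controller]) / row["PID"]


# Controller with the lowest consumption for each ambient temperature.
most_efficient = {t: min(row, key=row.get) for t, row in energy_kwh.items()}
```

At 0 °C, for instance, the RL controller uses about 20.6% less energy than PID, whereas at 15 °C it uses more energy than both other controllers.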
7 Outlook
The described control algorithms show great potential for energy savings in BEB. Nevertheless, further improvements are possible. Currently, the set point temperature of the passenger compartment is fixed at 24 °C. However, it is worth trying to calculate the set point temperature dynamically by considering the ambient temperature (outdoor temperature-controlled regulation) and the VDV specifications. To improve the accuracy of the RL based control, the number of training episodes could be increased and the design of the networks could be adjusted. For both controllers, the elimination of disturbances, which in the present case are door openings, would improve control. For future controller approaches, the time until arrival at the next bus stop could be calculated. Knowledge about door openings and passengers entering or exiting the bus makes it possible to precondition the passenger compartment of the BEB. Cloud-based vehicle functions [15], where entire vehicle applications or parts of them run temporarily or permanently in the cloud, make these
calculations possible. Various third-party databases, such as weather forecasts and street maps with bus stops, could be used in the cloud to predict the future behavior of the HVAC system and act with foresight. The use case of a city bus is particularly interesting here, as several buses serve the same route and HVAC related data can be shared between the buses via a cloud.
References
1. Karle, A.: Elektromobilität (2020)
2. Jefferies, D., Ly, T., Kunith, A., Göhlich, D.: Energiebedarf verschiedener Klimatisierungssysteme für Elektro-Linienbusse. DKV-Tagung Dresden, Ger. (2015)
3. VDV: Life-Cycle-Cost-optimierte Klimatisierung von Linienbussen - Teilklimatisierung Fahrgastraum - Vollklimatisierung Fahrerarbeitsplatz. Köln (2009)
4. Findeisen, R., Allgöwer, F.: An Introduction to Nonlinear Model Predictive Control (2002)
5. Arulkumaran, K., Deisenroth, M.P., Brundage, M., Bharath, A.A.: A brief survey of deep reinforcement learning. arXiv Preprint (2017)
6. Dong, H., Ding, Z., Zhang, S.: Deep Reinforcement Learning: Fundamentals, Research and Applications. Springer (2020)
7. Homod, R., Sahari, K., Mohamed, H., Nagi, F.: Hybrid PID-cascade control for HVAC system. Int. J. Syst. Control 1, 170–175 (2010)
8. Eckstein, J., Lüke, C., Brunstein, F., Friedel, P., Köhler, U., Trächtler, A.: A novel approach using model predictive control to enhance the range of electric vehicles. In: 3rd International Conference on System-Integrated Intelligence: New Challenges for Product and Production Engineering, pp. 177–184 (2016)
9. Li, S., Ren, S., Wang, X.: HVAC room temperature prediction control based on neural network model. In: Fifth International Conference on Measuring Technology and Mechatronics Automation, pp. 606–609 (2013)
10. Kittisupakorn, P., Thitiyasook, P., Hussain, M., Daosud, W.: Neural network based model predictive control for a steel pickling process. J. Process Control 19, 579–590 (2009)
11. Raman, N.S., Devraj, A.M., Barooah, P., Meyn, S.P.: Reinforcement learning for control of building HVAC systems. In: 2020 American Control Conference, pp. 2326–2332 (2020)
12. Brusey, J., Hintea, D., Gaura, E., Beloe, N.: Reinforcement learning-based thermal comfort control for vehicle cabins. Mechatronics 50, 413–421 (2018)
13. Schäfer, J., Çinar, A.: Multivariable MPC performance assessment, monitoring and diagnosis. IFAC Proc. 35, 429–434 (2002). https://doi.org/10.3182/20020721-6-ES-1901.00640
14. Mathworks: Deep deterministic policy gradient agents. https://de.mathworks.com/help/reinforcement-learning/ug/ddpg-agents.html. Accessed 18 Nov 2020
15. Milani, F., Beidl, C.: Cloud-based vehicle functions: motivation, use-cases and classification. In: 2018 IEEE Vehicular Networking Conference (VNC), pp. 1–4 (2018)
Optimal Prediction Using Artificial Intelligence Application
Marwan Abdul Hameed Ashour1 and Iman A. H. Al-Dahhan2(B)
1 Administration and Economics College, University of Baghdad, Baghdad, Iraq
2 Continuing Education Center, University of Baghdad, Baghdad, Iraq
Abstract. Artificial neural networks (ANNs) are flexible computing frameworks and universal approximators that can be applied to a wide range of time series forecasting problems with a high degree of accuracy. However, despite all the advantages cited for artificial neural networks, their performance for some real time series is not satisfactory. Improving forecasting accuracy, especially for time series, is an important yet often difficult task facing forecasters. The purpose of this paper is to use artificial neural networks and a traditional method (the ARIMA model) to forecast time series and to identify the best method for prediction. The research sample comprised data on China's crude oil production series for the period 1980–2015. The results showed that artificial neural networks are the best method for predicting the time series: the error improved by a large margin of 99.5%.
Keywords: Artificial neural networks · ARIMA model · Time series · Optimal forecasting · Minimize error
1 Introduction
Despite the tremendous development of electronic computers and their technologies, traditional programming has not been able to solve some difficult problems, prompting researchers to search for smart methods that are characterized by self-learning and adaptation to any model, whether linear or non-linear. Therefore, research has taken a new path in the applications of artificial intelligence and the building of knowledge base systems. Interest in the use of artificial intelligence applications in economic fields is increasing greatly, for the purpose of obtaining more accurate and reasonable results. Artificial neural networks are steadily gaining importance in predicting time series, which is of great economic importance for developing appropriate plans. The research sample included the time series of Chinese oil production, in order to forecast for the next most important economy in the world. However, it remained below the ambition in certain areas. In the early seventies, research tended towards investing in the mechanisms of natural systems in living organisms, including humans, such as adaptation, deduction, learning, perception, and random convergence in the genetic system. Hence, the idea of artificial neural networks arose: to create a mathematical system that can adapt and modify itself by learning, in order to find functions that link inputs and outputs or to reach a specific decision based on thousands of possibilities and relationships, as well as to deal with missing information. The idea of artificial neural networks is the process of simulating data to reach a model for this data for the purpose of analysis, classification, prediction, or any other processing, without the need for a proposed model for this data. Therefore, artificial neural networks have won the interest of many researchers and scientists, as they offer great flexibility compared with the mathematical methods used in the learning process to model data, store information and transmit it to an artificial neural network. Several papers have dealt with the use of artificial neural networks as a method that competes with the traditional methods of time series [1–11]; what distinguishes this paper is the use of neural networks to predict the coming period, to ensure the quality of economic plans that will be prepared using artificial intelligence applications. The paper is structured as follows: introduction, methodology (the purpose of the paper, the literature review), the theoretical aspect, the practical aspect, results and conclusions.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 T. Ahram et al. (Eds.): IHIET-AI 2021, AISC 1378, pp. 76–83, 2021. https://doi.org/10.1007/978-3-030-74009-2_10
2 Methodology
2.1 ARIMA Model
A time series is a sequence of values observed sequentially over time. The sequence may be denoted $Z_1, Z_2, \ldots, Z_t$, where t refers to the time period and Z refers to the value. A time series model explains a variable with respect to its own history and a random disturbance term. In the past four decades, time series models have been commonly used for tourism demand forecasting, with the autoregressive integrated moving average (ARIMA) models proposed by Box and Jenkins being dominant. The Box-Jenkins approach, the most common ARMA-model based forecasting method, involves the following steps: identification, estimation, and diagnostic checking [2, 3, 5, 12, 13].
2.2 Artificial Neural Networks
Artificial neural networks are so named because they are networks of interconnected units inspired by the study of biological nervous systems. They are abbreviated ANN, the acronym of the English term "Artificial Neural Networks", and are one of the applications of artificial intelligence (AI) [8, 14]. Also known simply as neural networks, an ANN is a data-processing system based on simple mathematical models that has specific performance characteristics in a manner that
simulates a biological "nervous system" neural network and is considered a non-linear model. In general, any neural network arranged in layers of artificial cells can be described as follows: an input layer, an output layer, and a hidden layer between the input and output layers. Every cell in one of these layers is connected to all the neurons in the layer that follows it and all the neurons in the layer that precedes it. Artificial neural networks are used in many fields, including forecasting, engineering, medicine, science and optimization research, statistical models, risk management and finance, among others. The architecture of an artificial neural network is the way its neurons are linked to each other to form a network [3, 4, 11].
2.3 Back Propagation
The back propagation (BP) algorithm is one of the most popular NN algorithms. Its basic steps are to calculate the output layer error in order to update the hidden-output layer weights, and then to calculate the hidden layer error in order to update the input-hidden layer weights. The network output is then computed with the new weights, and the process of measuring the error and updating the weights is repeated until the neural network error is sufficiently reduced. The objective of training the neural network with back propagation of errors is to obtain optimum weights that give the least error between the neural network output and the model data; these weights are then used to compute predictions for new data on which the neural network was not trained.
The algorithm can be broken down into three steps [4, 9, 15]:
• Forward stage
• Back pass stage
• Update of the network weights

Time Series and Artificial Neural Networks

In processing and analyzing time series, the autoregressive behaviour captured by ARIMA models has been the basis of researchers' hypotheses: the input of the artificial neural network is decided by lagging the time series by one or more periods. In the ARIMA model, the value of a variable is assumed to be a linear combination of past values and past errors. In general, a nonseasonal time series can be modelled as a function of its past values and errors, which can be expressed in the following form [4, 5, 8, 10, 11, 13, 14]:

$$ z_{t+1} = f(z_1, z_2, \ldots, z_t) \qquad (1) $$
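To make Eq. (1) concrete, the sketch below turns a series into (input, target) pairs by lagging it. The helper name `make_lagged_pairs` and the `lags` parameter are hypothetical and are introduced only for illustration; with `lags=1` each input is the single previous value.

```python
def make_lagged_pairs(series, lags=1):
    """Build supervised (input, target) pairs from a time series.

    Each input holds the `lags` values that precede the target value.
    """
    inputs, targets = [], []
    for t in range(lags, len(series)):
        inputs.append(list(series[t - lags:t]))  # the lagged window
        targets.append(series[t])                # the value to be predicted
    return inputs, targets
```

A longer window (for example `lags=2`) gives the network more of the history in Eq. (1) at the cost of fewer training pairs.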
Optimal Prediction Using Artificial Intelligence Application
2.4 Optimal Accuracy

The outcomes of the adopted approaches are compared, and the optimal method is identified, using the following statistical measures [6, 7, 12, 13, 16].

Mean squared error (MSE)

$$ \mathrm{MSE} = \frac{1}{n}\sum_{t=1}^{n} f_t^{2} \qquad (2) $$

Mean absolute percentage error (MAPE)

$$ \mathrm{MAPE} = \frac{1}{n}\sum_{t=1}^{n} \left| \frac{f_t}{z_t} \right| \qquad (3) $$

where $f_t = z_t - \tilde{z}_t$ is the error, $z_t$ is the actual value, $\tilde{z}_t$ is the forecast value, and $n$ is the sample size.

2.5 Implementation

In this section, the accuracy of the prediction methods is measured on the time series of China's crude oil production. The data come from the US Energy Information Administration and show the development of China's crude oil production over the period 1980–2015 (Fig. 1).
Fig. 1. China crude oil production series; Source United States Energy Information Administration
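The accuracy measures of Sect. 2.4 are straightforward to compute. The sketch below follows the standard definitions of MSE and MAPE; the function names are illustrative, and MAPE is returned as a fraction (multiply by 100 for a percentage).

```python
def mse(actual, forecast):
    """Mean squared error: the average of the squared errors f_t = z_t - forecast_t."""
    errors = [z - zf for z, zf in zip(actual, forecast)]
    return sum(e * e for e in errors) / len(errors)


def mape(actual, forecast):
    """Mean absolute percentage error as a fraction; actual values must be non-zero."""
    ratios = [abs((z - zf) / z) for z, zf in zip(actual, forecast)]
    return sum(ratios) / len(ratios)
```

Because MAPE divides by the actual value, it is scale-free and can be compared across series, whereas MSE is expressed in squared units of the data.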
3 Results

A modern prediction method, represented by the ANN, and a conventional method, represented by the ARIMA model, were applied to the Chinese crude oil series. The findings were as follows:
3.1 Traditional Method

The best model for this series is ARIMA (1, 1, 0) (Fig. 2).

Table 1. Results of ARIMA model

Fit statistic     Mean
MSE               41369.52603
RMSE              203.395
MAPE              3.603
MaxAPE            50.03
MAE               96.385
MaxAE             1057.642
Normalized BIC    10.829
Fig. 2. Curve fitting of ARIMA model.
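For readers who want to see what an ARIMA (1, 1, 0) model does, the sketch below fits an AR(1) model to the first differences of a series by ordinary least squares and produces recursive forecasts. This is an assumption-laden illustration: the paper does not state which software or estimator was used, and the function names `fit_arima_110` and `forecast_arima_110` are hypothetical.

```python
def fit_arima_110(series):
    """Fit d_t = c + phi * d_(t-1) + e_t to the first differences d_t by least squares.

    A conditional least-squares sketch, not necessarily the estimator used in the paper.
    """
    diffs = [b - a for a, b in zip(series, series[1:])]  # first differences (d = 1)
    x, y = diffs[:-1], diffs[1:]                          # regress d_t on d_(t-1)
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    phi = sxy / sxx
    c = mean_y - phi * mean_x
    return c, phi


def forecast_arima_110(series, c, phi, steps):
    """Forecast recursively: z_(t+1) = z_t + c + phi * (z_t - z_(t-1))."""
    history = list(series)
    forecasts = []
    for _ in range(steps):
        nxt = history[-1] + c + phi * (history[-1] - history[-2])
        history.append(nxt)
        forecasts.append(nxt)
    return forecasts
```

The single differencing step removes a trend in the level of the series, and the autoregressive coefficient `phi` then describes how strongly one year's change carries over into the next.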
Examination of the model shows that it is the best-fitting ARIMA specification, although its error remains considerable.

3.2 Modern Method

The designed ANN consists of three layers (input, hidden, output) connected by weights, and the MSE serves as the criterion for the trade-off between candidate networks.
The network input is the series lagged by one period ("Lag1"), so $Z_{t-1}$ is the network input and $Z_t$ is the output. The results of BP are shown in Table 2 (Fig. 3).

Table 2. Results of ANN

MSE     3.98
MAPE    0.019
Fig. 3. Curve fitting of ANN.
4 Decision and Conclusion

The error results show that artificial neural networks predict better than the ARIMA model. The error was minimized, and the error results improved by 99.5%. The results indicate that artificial neural networks are the better way to predict the Chinese crude oil production series, for the purpose of setting appropriate policies for the most important world economies. Table 3 shows the forecast results for the next ten years. The time series also showed growth in Chinese crude oil production. We therefore recommend the use of artificial intelligence applications, represented by artificial neural networks, as an alternative method for time series forecasting.
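The size of the reported improvement can be checked against Tables 1 and 2. The short calculation below computes the relative reduction in error for both measures. It assumes that the two MAPE values are reported on the same scale, which the paper does not state explicitly, and the function name is illustrative.

```python
def relative_improvement(baseline_error, new_error):
    """Percentage reduction of the error relative to the baseline."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Values taken from Table 1 (ARIMA) and Table 2 (ANN).
mape_gain = relative_improvement(3.603, 0.019)
mse_gain = relative_improvement(41369.52603, 3.98)

print(round(mape_gain, 2), round(mse_gain, 2))
```

Under that assumption the MAPE reduction comes out at roughly 99.5%, which matches the figure quoted in the text, while the MSE reduction is larger still.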
Table 3. Results of forecasting

Period    Forecast
2016      4273.556
2017      4269.427
2018      4265.315
2019      4261.218
2020      4257.137
2021      4253.071
2022      4249.021
2023      4244.987
2024      4240.968
2025      4236.964
References

1. Abbas, R.A., Jamal, A., Ashour, M.A.H., Fong, S.L.: Curve fitting predication with artificial neural networks: a comparative analysis. Period. Eng. Nat. Sci. 8(1), 125–132 (2020)
2. Tang, Z., De Almeida, C., Fishwick, P.A.: Time series forecasting using neural networks vs. Box-Jenkins methodology. Simulation 57(5), 303–310 (1991)
3. Khashei, M., Bijari, M.: An artificial neural network (p, d, q) model for timeseries forecasting. Expert Syst. Appl. 37(1), 479–489 (2010)
4. Chen, Y., Yang, B., Dong, J., Abraham, A.: Time-series forecasting using flexible neural tree model. Inf. Sci. (Ny) 174(3–4), 219–235 (2005)
5. Oancea, B., Ciucu, Ş.C.: Time series forecasting using neural networks. arXiv Preprint arXiv:1401.1333 (2014)
6. Helmi, R.A.A., Jamal, A., Ashour, M.A.H.: Identifying high impact disaster triggers on computer networks in Asia pacific region. Int. J. Eng. Technol. 7(4.11), 95–98 (2018)
7. Nguyen, T., Nguyen, T., Nguyen, B.M., Nguyen, G.: Efficient time-series forecasting using neural network and opposition-based coral reefs optimization. Int. J. Comput. Intell. Syst. 12(2), 1144–1161 (2019)
8. Voyant, C., et al.: Machine learning methods for solar radiation forecasting: a review. Renew. Energy 105, 569–582 (2017)
9. Abdul, M., Ashour, H., Al-Dahhan, I.A.H., Hassan, A.K.: Forecasting by Using the Optimal Time Series Method, vol. 2. Springer (2020)
10. Ashour, M.A.H., Abbas, R.A.: Improving time series' forecast errors by using recurrent neural networks. In: Proceedings of the 2018 7th International Conference on Software and Computer Applications, pp. 229–232 (2018)
11. Ashour, M.A.H., Jamal, A., Helmi, R.A.A.: Effectiveness of artificial neural networks in solving financial time series. Int. J. Eng. Technol. 7(4.11), 99–105 (2018)
12. Ashour, M.A.H., Al-Dahhan, I.A.H.: Turkish lira Exchange rate forecasting using time series models (2020)
13. Paterson, S.J.C.: A Comparison Between 8 Common Cost Forecasting Methods (2018)
14. Mathioulakis, E., Panaras, G., Belessiotis, V.: Artificial neural networks for the performance prediction of heat pump hot water heaters. Int. J. Sustain. Energy 37(2), 173–192 (2018)
15. Rzempoluck, E.J.: Neural Network Data Analysis Using Simulnet™. Springer (2012)
16. Ashour, M.A.H., Al-Dahhan, I.A.H., Al-Qabily, S.M.A.: Solving game theory problems using linear programming and genetic algorithms. In: International Conference on Human Interaction and Emerging Technologies, pp. 247–252 (2019)
An Exploration of One-Shot Learning Based on Cognitive Psychology

Dingzhou Fei

Department of Psychology, Wuhan University, Wuhan 430072, China
Abstract. Gestalt cognition challenges the interpretability of deep neural networks and is still underestimated. One-shot or transfer learning may be used to investigate the Gestalt. This paper gathers the few studies on one-shot learning in which the Gestalt is partly represented as a "concept activation vector (CAV)". It argues that the CAV method does not capture the classification uncertainty of Gestalt cognition, and thus shows how far transfer learning still is from the depths reached by human vision.

Keywords: Interpretability · Deep neural networks · Gestalt · Concept activation vector (CAV) · One-shot learning · Transfer learning
1 What is the "Gestalt"?

"Gestalt" is a German word meaning "form" or "whole". As a perceptual phenomenon, it was proposed together with Gestalt psychology [1], which was born in Germany in 1912. Gestalt psychology emphasized the wholeness of experience and behavior, holding that the whole is greater than the sum of its parts, that consciousness is not equal to a collection of sensory elements, and that behavior is not equal to a chain of reflex arcs. Figure 1 displays a set of common properties of Gestalt phenomena. This idea not only runs through all the studies of the Gestalt school but has also had a great influence on later cognitive psychology. For instance, humans have a whole-object preference: children take a word to refer to a whole object rather than to a part of it. Cognitive psychology can also be used to address the "black box problem" in deep neural networks, which is becoming more important as neural networks are applied to practical problems. In DeepMind's paper "Cognitive Psychology for Deep Neural Networks: A Shape Bias Case Study" [2], the shape preference, by which children infer the meaning of a word from the shape of an object rather than from its color or texture, was verified for the Inception model. The whole-object preference is driven by the Gestalt of the human mind and deserves further discussion.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 T. Ahram et al. (Eds.): IHIET-AI 2021, AISC 1378, pp. 84–88, 2021. https://doi.org/10.1007/978-3-030-74009-2_11
Fig. 1. Gestalt properties: closure, proximity, continuation, similarity, symmetry, and figure and ground. Source: Wikipedia.
2 Why Are Gestalt Phenomena Important to Computer Vision?

Cognitive psychology can inspire deep learning research on the interpretability of models. First of all, it is useful to clarify the meanings of two terms: interpretability and explainability. Interpretability means that a model is understandable, whereas explainability means that it can be explained. For example, if you do a chemical experiment and have a chemical equation for reference, you can understand the experiment and know what kind of output the current input will produce; this is interpretability. Explainability means that you also know the reaction process in the middle and can express it in words. An explainable model must be understandable, but an understandable model is not necessarily explainable [3]. This duality can also be seen as the difference between interpretable models and model interpretability techniques. There are three kinds of systems: (1) in an opaque system, the mapping from input to output is invisible to the user; (2) in an interpretable system, a user can mathematically analyze the mapping; (3) in an understandable system, the model outputs symbols or rules, and this specific output helps to understand the rationale behind the mapping.

Gestalt cognition is a challenge for computer vision for two reasons. First, it requires data labels to be ambiguous, that is, the data have no single label, which contradicts the long-held assumption of supervised learning that each label is definitive and certain, as demonstrated by Fig. 2. This ambiguity has profound implications for the study of interpretability. One of the key difficulties is that most ML models operate on low-level features, such as pixel values, which do not correspond to the high-level concepts that humans easily understand. Therefore, if an ML model is to be interpretable, the overall features of the image must be represented at a high level. Another difficulty, ignored by researchers so far, is that even though a deep neural network can express high-level concepts, it is harder for it to express the cognitive mechanism of Gestalt, because Gestalt involves the coexistence of multiple concepts. Second, it demands that the algorithms behind deep learning models be interpretable or transparent to users. However, these challenges have not been fully addressed by the research community, and they do not belong to the multi-label classification problems either, because of the ambiguity of the labels.

Fig. 2. The labels of this picture: two quarreling people or a table? The label ambiguity complicates interpretability in computer vision.
3 How Are Gestalt Phenomena Represented in Deep Neural Networks?

What are proper representations of Gestalt cognition in deep neural networks? Because computer vision researchers have paid little attention to Gestalt cognition, its representation is still less studied than that of other vision phenomena. This paper gives some candidates for the representation of the Gestalt.

(1) CNN-based representation: A Gestalt image can be treated as a set of semantic segmentations with sub-labels. However, to produce unambiguous classification results, a CNN must be given samples with a definite class, so a CNN cannot produce representations for understanding Gestalt images, even when ground truths for the images are available [4].

(2) Concept activation vector (CAV)-based representation: This has been a promising one-shot learning approach, but its results are rather limited. Kim et al. propose a method for explaining deep models called "testing with concept activation vectors (TCAV)" [5]. Unlike saliency methods, which explain a single example, TCAV tries to explain a concept and find the corresponding visual pattern. The overall idea of TCAV is that, given a concept (such as a zebra), the user supplies a set of pictures that include zebras and a set of random pictures without zebras. To explain a deep model, a linear classifier is trained to separate the activations of layer L of the model for the concept pictures (positive) from those for the random pictures (negative); the vector orthogonal to the decision boundary is the CAV.

(3) Since the selection of random pictures may affect stability, the TCAV paper suggests running multiple tests and then applying a t-test to obtain a robust CAV (Fig. 3).
Fig. 3. (a) A user-defined sample set and random counterexamples. (b) Labeled test samples. (c) A trained classifier, where m is the output of the middle layer. (d) The process of obtaining the CAV, the vector orthogonal to the decision boundary between the two sample sets. (e) The directional derivative is used to test the sensitivity of the model to the concept, which essentially checks whether the model's prediction changes along the direction of the CAV; that is, the user provides a custom concept, and TCAV measures the influence of this concept on the prediction. Source: [5]
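The CAV procedure summarized in Fig. 3 can be sketched on toy data. The code below is a simplified illustration, not the implementation of [5]: activations are plain Python lists, the linear classifier is a small logistic regression trained by gradient descent, and the gradients of the class logit with respect to the layer activations (`logit_grads`) are assumed to be supplied by the model under study. All function names are hypothetical, and the repeated runs with a t-test are omitted.

```python
import math


def train_linear_classifier(pos, neg, lr=0.1, epochs=500):
    """Logistic regression separating concept activations (pos) from random ones (neg)."""
    dim = len(pos[0])
    w, b = [0.0] * dim, 0.0
    data = [(a, 1.0) for a in pos] + [(a, 0.0) for a in neg]
    for _ in range(epochs):
        for a, label in data:
            z = sum(wi * ai for wi, ai in zip(w, a)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - label                      # gradient of the log-loss w.r.t. z
            w = [wi - lr * g * ai for wi, ai in zip(w, a)]
            b -= lr * g
    return w, b


def concept_activation_vector(pos, neg):
    """The CAV is the unit normal of the decision boundary, pointing toward the concept."""
    w, _ = train_linear_classifier(pos, neg)
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [wi / norm for wi in w]


def tcav_score(cav, logit_grads):
    """Fraction of inputs whose class logit increases along the CAV direction."""
    positive = sum(
        1 for g in logit_grads if sum(gi * ci for gi, ci in zip(g, cav)) > 0
    )
    return positive / len(logit_grads)
```

A score near 1 means that moving the activations toward the concept raises the class logit for almost every input, while a score near 0.5 suggests that the concept has no consistent influence. A single CAV of this kind yields one concept direction, which is the limitation the paper raises for ambiguous Gestalt images.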
This network does not appear able to yield two labels for a single Gestalt image or picture from two sets of positive and negative samples.
4 Discussion

Although the convolutional neural network (CNN) model has had great success in vision, even surpassing human capacity in cognition, Gestalt vision remains a major obstacle, particularly for interpretability in deep neural networks. At present, deep neural networks obtain higher discrimination at the cost of the interpretability of their black-box representations. This paper tries to show that it is hard for transfer learning (CAV) to inherit interpretability from the Gestalt perception of images.
5 Conclusion

Current interpretability research focuses on the visualization of models or the transparency of model mechanisms. From a cognitive psychological point of view, the Gestalt phenomenon means that these visualizations are inaccurate and restrictive. Cognitive psychology and deep neural network research should exchange ideas for a better understanding of human and computer vision.

Acknowledgments. This work is supported by the project "Yuan Tong Artificial Intelligence" of Wuhan University (No. 2020AI002).
References

1. Woodward, W.: Gestalt psychology. In: Kaldis, B. (ed.) Encyclopedia of Philosophy and the Social Sciences, vol. 7, pp. 383–387. SAGE Publications, Thousand Oaks (2013)
2. Ritter, S., et al.: Cognitive psychology for deep neural networks: a shape bias case study. In: ICML (2018)
3. Arrieta, A.B., et al.: Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI (2020). https://www.sciencedirect.com/science/article/pii/S1566253519308103
4. Amanatiadis, A., Kaburlasos, V., Kosmatopoulos, E.: Understanding deep convolutional networks through Gestalt theory (2018). https://arxiv.org/abs/1810.08697
5. Kim, B., et al.: Interpretability beyond feature attribution: quantitative testing with concept activation vectors (TCAV). In: ICML (2018)
Integrity Mechanism of Artificial Intelligence for Person's Auto-poiesis

Nicolay Vasilyev1, Vladimir Gromyko2, and Stanislav Anosov1,3

1 Fundamental Sciences, Bauman Moscow State Technical University, 2-d Bauman str., 5, b. 1, 105005 Moscow, Russia
2 Computational Mathematics and Cybernetics, Lomonosov Moscow State University, Leninskye Gory, 1-52, 119991 Moscow, Russia
3 Public Company Vozrozhdenie Bank, Luchnikov per., 7/4, b. 1, 101000 Moscow, Russia
Abstract. The coming system-informational culture supersedes the industrial era and its ergonomics, which cannot keep pace with the growing sophistication of professional requirements. Inter-disciplinary activity in computer networks relies on the universality of education needed for understanding meanings. A subject's thinking-consciousness exists under the strain of discoordination with the up-to-date rational noosphere, not having been trained in trans-fundamental speculation. The proto-phenomenon of man is endowed with spontaneous integrity and viability, allowing adaptation to the needed transcendental self-developing processes. Fundamental evolvement of mentality can be achieved in the anthropogenic info-sphere of the cypher galaxy only in partnership with deep-learned artificial intelligence (DL IA). To accomplish rational auto-poiesis, the image of the integrity mechanism is to be applied by DL IA to unfold the subjective objectization of ergo-mind up to the functional systems of ego-mind. A cogno-ontological knowledge base and the system axiomatic method are to be used. The technology displays the innate universal mathematical hardware for ergo-mind speculation.

Keywords: Thinking-consciousness integrity · Deep-learned artificial intelligence · System axiomatic method · Language of categories · Cogno-ontological knowledge base · Universal tutoring · Auto-poiesis
1 Introduction

The intrusion of informatics into our life has created problems of man's consistency during inter-disciplinary activity. It requires maintaining trans-disciplinary existence in the system-informational culture (SIC). Transcendental apperception and speculation are based on the integrity of thinking-consciousness (mentality). This requires understanding knowledge and its personal objectization, resulting in the subject's rational auto-poiesis [1]. The educational problem has reached post-neo-classical acuteness: the transcendental development of SIC subjects must be ensured [2]. Comprehension of the phylogenetic structure of knowledge is needed. Only universal tutoring by deep-learned artificial intelligence (DL IA) can provide it. Only with its help will the modern cognitive crisis be overcome, by means of constitutive adaptive molding

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 T. Ahram et al. (Eds.): IHIET-AI 2021, AISC 1378, pp. 89–95, 2021. https://doi.org/10.1007/978-3-030-74009-2_12
of thinking-apperception while studying the professional synthetic educational space in the process of inter-disciplinary activity [2]. First of all, the needed rational auto-poiesis relies on obvious natural-scientific knowledge (KNS) and super natural mathematical knowledge (SKNS). The humanities (KH) have formed the intentional man, whose consciousness for perception allows scientific vision and even prevision of discoveries. They present knowledge in encyclopedic form, which is insufficient for SIC, where knowledge must be understood. The theories formed by KNS have given rise to the big data networks now stored in Internet files. Thanks to SKNS, gender achievements are to be implanted in everybody's semantic consciousness, giving the theories adequate interpretations that reflect human mentality. Applying the axiomatic method (AM) to human presentations allowed formalized meanings to be discovered. The phenomenology of AM is the result of the mind's "midwifery". This functional system speculates on anthropogenic nature on the constitutional base of intuition-discursion. The latter maintains human mentality. The spontaneous integrity mechanism of mind is to be used to evolve one's rational auto-poiesis, which is natural for SIC. Gender self-modification will occur under compulsory universal training and the attendant semantic glottogenesis. Linguistic tools are to unite real and ideal presentations in order to understand SIC meanings. This evolvement of mind cannot be accomplished without adaptive neuro-partnership with, and universal tutoring (TU) by, DL IA [3]. Only a trained mind is prepared to work with theories and the properties of systems, since the underlying rational meanings are expressed in the form of mathematical abstractions. The interface between natural intelligence (IN) and DL IA is to be built in the mathesis universalis of the language of categories (LC), on which TU relies. Thinking-consciousness is advanced by semantic communication.
So, rational modification of mind happens in the process of bio – social – cultural – spiritual evolution supported by thinking – consciousness integrity [8–10]. Knowledge wholesome structure is to be understood to achieve personal objectization [2–7, 9].
2 Second Consciousness Auto-building and Man’s Integrity Neurophysiology is base of high levels of nervous system activity, see Fig. 1:
Fig. 1. The highest level of nervous system.
Ergo-mind (eidos [9]) consists of networks structures of idea-images. They are prelinguistic state of thought – the reason of intuition. Born in profound part of ergo,
meanings primaries are implanted in life by linguistic tools of human ego-soul. It is due to the man’s integrity uniting thinking - consciousness processes and maintaining symbiosis of real and ideal presentations, see Fig. 1. On the basis rational auto-poiesis ([1]) can be guaranteed by noosphere of artificial intelligence [3–5], see Fig. 2:
Fig. 2. Hermeneutic circle of noosphere: K S - scientific knowledge language.
Arising uncertainty of knowledge description and lack of understanding are partially compensated with intuitive forms of thinking [10]. To overcome the difficulties the help of KNS theories and AM application are to be used. On the highest level of knowledge modelling by means of system axiomatic method (AMS ) and description in LC underlying meanings are clearly displayed in the form of universal mathematical properties of the whole systems. The universalities understanding and descriptive tools (strict and constructive) strengthen ego-soul and modify ergo-mind with the help of subject’s inclusion in hermeneutic circle of SIC to grow meanings in it, see Fig. 2. Understanding is based on SIC spirit presented in needed system form. Comprehension process is to be assisted by DL IA leaning on cogno-ontological knowledge base (CogOnt) usage reflecting human integrity [2–4]. So, cognogenesis achieved philogenetic obviousness in natural sciences and is to be used for gender man-made rational evolution. Glottogenesis always supported knowledge generalization to resolve arising scientific problems and achieve corresponding unity of theories. Super natural mathematical form SKNS is the tool to explain and squeeze semantically human presentations resulting in knowledge self-obviousness, see Fig. 3. So, in the process of self-development subject leans on KNS obviousness to transit from it to SKNS self-obviousness attendant understanding. Meanings inheritance evolve glottogenesis and person’s transcendation as consciousness folding [7]. It worth mentioning that presentation on intuition-image corresponds to usage of “mentalis” – proto-language of thought [10, 11]. Its “existence” was corroborated by neuroscience [12, 13]. Identified universalities image can be expressed in functional language LC . It answers to their ontological co-image formation [14]. The latter admits speculation on structure and significance of theories. Semantic means of
Fig. 3. Ergo-ego-cognition processes: SsKN – super sensual knowledge.
LC allow comparing systems by means of properties and not their construction with the help of elements [13]. AMS application helps to study primaries of thinking by means of concise mathematical tools, see Fig. 1, 2 and 3. Meta mathematical investigations and AMS explain knowledge origins and structure [15–17]. Leaning on without premises principle AM allows discovering theories potential [9]. So as intuitive idea-images of ergo-mind are transformed in definite proto-image (or knowledge co-image) described in LC , ergo-ego mind becomes able to support semantic communication in SIC ergonomics [5, 6].
3 Axiomatic Method as Speculation on Knowledge Integrity

The modern axiomatic method (AMM) studies algebraic systems using the elemental base of universal constructions. The language of categories describes universal properties of different systems in order to compare them at the level of AMS. Meta-mathematical means develop and support the rational constituent of human integrity. To assist rational auto-poiesis, DL IA can apply an adaptive personal form of CogOnt [4, 7]. The partnership is necessary to keep the balance between a person's real and ideal presentations and to transform them gradually into one another, see Figs. 1 and 2. CogOnt is to be filled with concepts in all the variety of their interconnections. For instance, algebra enables the mind to operate on reality. The grandeur of number lies in its being the tool for coordinatizing space. It is the starting point for successive generalizations and the attendant mathematical glottogenesis. A wholesome view of number allows establishing correspondence among different proto-images, see Figs. 1, 2 and 3. The idea-image of number was initially displayed in geometry. At first, space-time was regarded as a part of physics used in measuring practice (Newton). Afterwards, the primary properties of geometry were studied by AM, and geometry became a part of mathematics (Hilbert) [15, 17]. The system of geometrical axioms helped to single out its proto-image with the meaning of a numerical field. The constructive possibilities of geometrical tools [14, 15] are now replaced by more general ideal algebraic transformations, complemented by the abstract data types introduced in informatics [16]. Coordinatization of space by number is insufficient. The modern AM describes geometries with the help of their symmetry groups. AMS promoted the description of points and lines by the symmetries themselves [17]. Thus, the emerging numeric fields correspond to the relations among points and lines in geometry and to the properties of the constructive tools applied.
Example 1. Inclusion as a universal generative property of complex systems. Geometrical constructions in affine geometry made it necessary to extend the field of rational numbers $\mathbb{Q}$. If a coordinate $\alpha$ of a given initial figure satisfies $\alpha \notin \mathbb{Q}$, then the corresponding extension is $\mathbb{Q}(\alpha)$. The axioms of congruent segments and angles of Euclid's geometry define the field $\mathbb{Q}\bigl(\sqrt{\alpha^{2} + \beta^{2}}\bigr)$. In Pythagoras' geometry, the axioms of compasses and a ruler allow another extended field, $\mathbb{Q}(\sqrt{\alpha})$. This practice engendered the synthetic view of the theory of fields and of the methods of building them. All finite-dimensional extensions $k \subset L$ of a field $k$ are algebraic. Let $\mathrm{Aut}_k(L)$ be the group of automorphisms of the field $L$ over the field $k$: $\varphi \in \mathrm{Aut}_k(L)$, $\varphi : L \to L$, $\varphi|_k = 1$. Galois theory describes an anti-isomorphism of the structures [18, 19]:

$$ P_k \equiv \{L, \subset\} \leftrightarrow \mathrm{Aut}_k \equiv \{\mathrm{Aut}_k(L), \subset\}. \qquad (1) $$

To extend a field $L$, the universal concept of a free object $P_L(x) \in \mathrm{Ring}$ is to be used. It is the ring of polynomials over the field $L$. Any finite-dimensional extension $L \subset L(\alpha)$ can be obtained with the help of a root $\alpha$ of an irreducible polynomial $m(x) \in P_L(x)$ over the field $L$. Precisely, the field $L(\alpha)$ is the factor-object $L(\alpha) \cong P_L(x)/(m(x))$, where $(m(x))$ is a maximal ideal in the ring $P_L(x)$. This is a universal construction [13] that can be applied to the factorization of other rings. For instance, the finite fields $G(p) \cong \mathbb{Z}/(p)$ of characteristic $p$ are built by this method. Here, the integer $p = 2, 3, 5, \ldots$ is prime. Their algebraic extensions are $G(p^{n}) \cong P_G(x)/(m(x))$, $n = 1, 2, \ldots$ ($\deg m = n$). Anti-isomorphism (1) exists due to the functor $\mathrm{Field} \to \mathrm{Ring}$ that sends $L \mapsto L(x)$ and $\varphi \mapsto \psi$, where $\varphi : L \to L'$ and $\psi : L(x) \to L'(x)$. Moreover, each normal subgroup $\mathrm{Aut}_k(\tilde{L}) \subset \mathrm{Aut}_k(L)$ answers to a normal extension $\tilde{L}$ of the field $L$. A normal extension $\tilde{L} = L(\alpha_1, \ldots, \alpha_s)$ is one that includes all roots $\alpha_i$, $m(\alpha_i) = 0$, $i = 1, 2, \ldots, s$, of an irreducible polynomial $m(x)$. So, algebraic extensions of fields arise when solving geometrical construction tasks, which are equivalent to solving the corresponding algebraic equations in an extended field. Their insolvability displays the insufficiency of the geometrical tools. Thus, in Pythagoras' geometry the trisection of an arbitrary angle cannot be carried out. The equation answering to the problem, $4x^{3} - 3x - \alpha = 0$, $\alpha = \cos 3\varphi$, requires the extension of the rational field $\mathbb{Q}\bigl(\alpha, \sqrt[3]{-(1 - \alpha^{2})}\bigr)$. It can be solved with the help of a hyperbola with eccentricity $e = 2$ as an additional tool. There also exists the algebraic closure $\hat{k}$ of any field $k$, presented by the universal property of the co-limit construction [13, 18, 19].

Any algebraic equation has roots in the field of all algebraic numbers $\hat{\mathbb{Q}}$. As a new Euclid, fields replace all thinkable geometrical tools.
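The trisection claim can be illustrated with one concrete angle. The derivation below is a standard textbook argument added for illustration and is not taken from the paper; it works out the case $3\varphi = 60^\circ$, so that $\alpha = \cos 60^\circ = 1/2$ and $x = \cos 20^\circ$.

```latex
\begin{align*}
\cos 3\varphi &= 4\cos^{3}\varphi - 3\cos\varphi
  && \text{(triple-angle identity)} \\
4x^{3} - 3x - \tfrac{1}{2} &= 0
  && \bigl(x = \cos 20^\circ,\ \alpha = \tfrac{1}{2}\bigr) \\
8x^{3} - 6x - 1 &= 0
  && \text{(multiplying by 2)}
\end{align*}
% Rational root test: the only candidates are \pm 1, \pm 1/2, \pm 1/4, \pm 1/8,
% and none of them satisfies 8x^3 - 6x - 1 = 0, so the cubic is irreducible over Q.
\[
  [\mathbb{Q}(\cos 20^\circ) : \mathbb{Q}] = 3 \neq 2^{m}
\]
```

Ruler-and-compass constructions only produce numbers lying in towers of quadratic extensions, whose degrees over $\mathbb{Q}$ are powers of 2. A degree-3 extension cannot sit inside such a tower, which is why an additional tool such as the hyperbola mentioned above is needed.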
Besides, the field $\bar{k}$, the topological closure of $k$, can be constructed. The Archimedean ordered complete field of real numbers $\mathbb{R} = \bar{\mathbb{Q}}$ is categorical [14, 16]. It is a continuous one. Its minimal algebraic extensions are also categorical. They are the field of complex numbers $\mathbb{C} = \mathbb{R}(i) \cong \mathbb{R}(x)/(x^{2} + 1)$ and the field of rational functions $\mathbb{R}(x)$ over $\mathbb{R}$, correspondingly. The latter is a field of fractions similar to the field $\mathbb{Q}$. The system of fractions $k = \{r/s\}$, $s \neq 0$, is also a universal method of constructing a field starting from a commutative integral domain $A$, $A \subset k$. In its turn, $k$ can be extended algebraically to obtain $k(x)$. The fields $k_T(x, y)$ and $k(x, y, z)$ of algebraic curves $T$ and of surfaces can be considered as further generalizations. The field $k_T$ consists of rational functions $r : T \to k$ of two variables. If $T$ is a curve of the second order, then there is a rationalizing isomorphism $k_T(x, y) \cong k(t)$ [19].
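The quotient construction $G(p^{n}) \cong P_G(x)/(m(x))$ from Example 1 can be made concrete with a few lines of code. The sketch below multiplies two elements of such a field, representing polynomials as coefficient lists with the lowest degree first. The function name `poly_mulmod` and the representation are assumptions made for illustration, and the modulus $m(x)$ is assumed to be monic and irreducible over $G(p)$.

```python
def poly_mulmod(a, b, m, p):
    """Multiply a and b in GF(p)[x] / (m(x)).

    a, b, m are non-empty coefficient lists, lowest degree first; m must be monic.
    The result has exactly deg(m) coefficients.
    """
    n = len(m) - 1                       # degree of the modulus
    prod = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):           # schoolbook polynomial multiplication mod p
        for j, bj in enumerate(b):
            prod[i + j] = (prod[i + j] + ai * bj) % p
    for k in range(len(prod) - 1, n - 1, -1):
        coef = prod[k]                   # eliminate the x^k term using m(x)
        if coef:
            for j in range(n + 1):
                prod[k - n + j] = (prod[k - n + j] - coef * m[j]) % p
    result = prod[:n]
    return result + [0] * (n - len(result))
```

With $p = 2$ and $m(x) = x^{2} + x + 1$ this gives the four-element field, in which $x \cdot x = x + 1$. With $p = 3$ and $m(x) = x^{2} + 1$ the element $x$ plays the role of the imaginary unit, mirroring the construction of the complex numbers as a quotient by $(x^{2} + 1)$.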
4 Conclusions

A rational evolutional approach to thinking-consciousness auto-poiesis is contributed, expanding the humanitarian one. AMS allows discovering universal proto-images of knowledge to assist a person in achieving subjective objectization in the rational electronic galaxy. Strengthening of the ergo-ego mind is necessary for inter-disciplinary activity in SIC. The proto-phenomenon of man is adapted to transcendental self-development. DL IA is to assist a subject in self-building his semantic consciousness by means of universal tutoring. Humanization of life in SIC, attendant on one's natural development, is achieved with the help of DL IA supporting AMS application and LC glottogenesis. The technology of life-long partnership with DL IA, based on human mentality and relying on the semantic integrity of CogOnt, is to be applied.
References

1. Maturana, H., Varela, F.: The Tree of Knowledge: The Biological Roots of Human Understanding. Progress-Tradition, Moscow (2001)
2. Vasilyev, N.S., Gromyko, V.I., Anosov, S.S.: On inverse problem of artificial intelligence in system-informational culture. In: Human Systems Engineering and Design. Advances in Intelligent Systems and Computing, vol. 876, pp. 627–633 (2019)
3. Gromyko, V.I., Kazaryan, V.P., Vasilyev, N.S., Simakin, A.G., Anosov, S.S.: Artificial intelligence as tutoring partner for human intellect. In: Advances in Artificial Systems for Medicine and Education. Advances in Intelligent Systems and Computing, vol. 658, pp. 238–247 (2018)
4. Vasilyev, N.S., Gromyko, V.I., Anosov, S.S.: Emerging technology of man's life-long partnership with artificial intelligence. In: Proceedings of Future Technologies Conference (FTC). Advances in Intelligent Systems and Computing, vol. 1069, pp. 230–238 (2019)
5. Vasilyev, N.S., Gromyko, V.I., Anosov, S.S.: Deep-learned artificial intelligence and system-informational culture ergonomics. In: Proceedings of AHFE Conference. Advances in Intelligent Systems and Computing, vol. 965, pp. 142–153 (2020)
6. Vasilyev, N.S., Gromyko, V.I., Anosov, S.S.: Deep-learned artificial intelligence for semantic communication and data co-processing. In: Proceedings of Future of Communication Conference. Advances in Intelligent Systems and Computing, vol. 1130, p. 2, pp. 916–926 (2020)
7. Vasilyev, N.S., Gromyko, V.I., Anosov, S.S.: Artificial intelligence as answer to cognitive revolution challenges. In: Proceedings of IHIET Conference. Advances in Intelligent Systems and Computing, vol. 1152, pp. 161–167 (2020)
8. Popper, K.: Objective knowledge. Evolutional approach. URSS, Moscow (2002)
9. Husserl, A.: From idea to pure phenomenology and phenomenological philosophy. General introduction in pure phenomenology, Book 1, Acad. Project, Moscow (2009)
10. Dennett, D.: An introduction to intuition pumps and other thinking tools, OOO "Ast", Moscow (2019)
11. Fodor, J.: The Mind Doesn't Work that Way: The Scope and Limits of Computational Psychology. MIT Press, Cambridge (2000)
12. Gromov, M.: Circle of mysteries: universe, mathematics, thinking. Moscow Center for Continuous Mathematical Education, Moscow (2017)
13. McLane, S.: Categories for working mathematician. Phys. Math. Edn., Moscow (2004)
14. Hilbert, D.: Grounds of geometry. Tech.-Teor. Lit., Moscow-Leningrad (1948)
15. Euclid: Elements. GosTechIzd, Moscow-Leningrad (1949–1951)
16. Engeler, E.: Metamathematik der Elementarmathematik. Springer, New York (1983)
17. Bachman, F.: Geometry Construction on the Base of Symmetry Notion. Nauka, Moscow (1969)
18. Skorniakov, L.A.: Elements of General Algebra. Nauka, Moscow (1983)
19. Shafarevich, I.R.: Main Notions of Algebra. Regular and Chaos Dynamics, Izhevsk (2001)
Augmented, Virtual and Mixed Reality Simulation
A Technological Framework for Rapid Prototyping of X-reality Applications for Interactive 3D Spaces
Emmanouil Zidianakis1(B), Antonis Chatziantoniou1, Antonis Dimopoulos1, George Galanakis1, Andreas Michelakis1, Vanesa Neroutsou1, Stavroula Ntoa1, Spiros Paparoulis1, Margherita Antona1, and Constantine Stephanidis1,2
1 Institute of Computer Science, Foundation for Research and Technology - Hellas, Heraklion, Greece
{zidian,hatjiant,dimopoulos,ggalan,michelakis,vaner,stant,spirosp,antona,cs}@ics.forth.gr
2 University of Crete, Heraklion, Greece
Abstract. Room-sized immersive environments and interactive 3D spaces can support powerful visualizations and provide remarkable X-reality experiences to users. However, designing and developing applications for such spaces, in which user interaction takes place not only through gestures but also through body movements, is a demanding task. At the same time, contemporary software development methods and human-centered design mandate short iteration cycles and incremental development of prototypes. In this context, traditional design and software prototyping methods can no longer cope with the challenges imposed by such environments. In this paper, we introduce an integrated technological framework for rapid prototyping of X-reality applications for interactive 3D spaces, featuring real-time person and object tracking, touch input support and spatial sound output. The framework comprises the interactive 3D space and an API for developers. Keywords: Mixed reality · Extended reality · Large display interfaces · Multi-display environments · Interactive 3D spaces
1 Introduction
Building applications for immersive environments with wall-sized, high-resolution displays can be both a daunting and a rewarding task. Immersive virtual reality environments have been around for a long time, supporting a variety of modalities (motion tracking, touch, voice). Providing users with multiple ways of interaction inside such spaces can raise immersion to levels that traditional applications could never reach. However, little effort has been reported in the literature towards making this task simple and straightforward. At the same time, large-scale wall projections have also been presented as a means to increase productivity and immersion in many cases, although they have the intrinsic limitation of utilizing only two dimensions of our three-dimensional physical world.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 T. Ahram et al. (Eds.): IHIET-AI 2021, AISC 1378, pp. 99–106, 2021. https://doi.org/10.1007/978-3-030-74009-2_13
Motivated by the compelling nature of designing for immersive multimodal environments, this paper presents a technological framework that aims to help designers and developers quickly prototype, or even create full-blown, immersive multimodal applications without having to implement every specific component of their end product. The proposed framework features a large 3D space consisting of three touch-enabled walls and supports people tracking and body-based interactions.
Fig. 1: Architectural design of the interactive 3D space (left)/Actual setup (right)
2 Related Work
X-reality or extended reality (XR) is a term that has been refined over the last two decades [14, 19] and has come to include the broad spectrum of technologies that aim at creating immersive experiences [11]. Those technologies include Augmented Reality (AR), Augmented Virtuality (AV), Virtual Reality (VR) and Mixed Reality (MR). Since their introduction, XR technologies have been used in a variety of application areas, maintaining specific constants/aspects [1, 2, 6] such as tracking/registration, virtual environment modelling, etc. The details of those aspects define the immersive experience and set the boundary between reality and virtuality. Several interactive systems have been created with the aim of combining XR technologies with physical spaces and infrastructure to provide users with an immersive experience. Those systems are referred to as immersive environments and are characterized by Large High-Resolution Displays (LHRD) that surround users, captivating their visual field and thus creating the feeling of immersion. The first immersive environment system, CAVE, was introduced in the early 1990s [7, 8]. Since then, a significant number of immersive environments have been developed, such as X-rooms [13], CAVE2 [12], Reality Deck [18], CEL [10], and Dataspace [3]. Immersive environments are constructed as cuboid or spherical spaces, usually with an approximate diameter of seven meters, combining multiple projectors or screens to offer high-resolution video output, ranging from several tens of megapixels [12] to even a gigapixel [18]. Surround sound is usually offered via speaker arrays to contribute to the feeling of immersion. Graphics rendering in early implementations was done in stereoscopic 3D [7, 13], while recent approaches use only 2D rendering [10, 18] or a hybrid rendering of 2D or 3D [3, 12]. The standard interaction modality is the 3D wand that is tracked using computer vision techniques and allows users to navigate
and interact with the rendered 3D world [7, 10, 12], although recent implementations have introduced touch, voice and the ability to connect external input devices, e.g. VR joysticks [3, 10]. User motion tracking is also employed, either with markers or the OptiTrack technology [7, 12, 18], or markerless with camera sensors such as the Vive or Kinect [3, 10]. Clusters of high-end computing units are used to render interactive graphics at such high resolutions. Since developing an application that uses a computer cluster to render graphics on multiple LHRDs is not an easy task, software frameworks have been developed and are commercially available. According to a recent taxonomy [5], these frameworks can be organized into the following categories:
a) Transparent frameworks, which act as middleware, intercepting the rendering process at specific stages and streaming results to specialized renderers across the cluster-based display. Drawbacks of these frameworks include lack of optimization and the possibility of breaking certain expected application behavior. The most popular framework of this category is SAGE2 [15], which is currently used by CAVE2 implementations [12];
b) Distributed Scene Graph frameworks, such as OpenSG [20], which use scene graphs to abstract the management of 3D models and scenes. Performance is significantly increased for applications, since each computing node holds a copy of the scene graph list and only receives changes to that list;
c) Interactive Application frameworks, supporting the development of applications that make use of novel interaction techniques and modalities. They provide an integrated environment for input management and configuration, and the ability to create custom event handlers for data collection from various input devices. An indicative example is CaveLib [4], where developers must use a dedicated API to create an application that runs as a multithreaded application on the cluster;
d) Scalable Rendering frameworks, which give developers precise control over the parallel rendering algorithms and the distributed hardware resources rather than abstracting them. Frameworks of this type are specifically designed for high-performance visualization applications.
3 Interactive 3D Space
3.1 Architectural Design and Construction
The architectural design process focused on creating a fairly large space that accommodates the following needs: a) robustness, to support a wide variety of use cases, such as applications addressing children where high endurance and fault tolerance are required, b) low cost, by building with materials that can be manipulated and replaced quite easily, c) portability, meaning that it can be easily re-assembled in any other setting provided the appropriate space. The main body of the construction consists of three panels made of plywood, 5 × 3 m each, arranged in a 3-sided box as depicted in Fig. 1. The walls are supported by a metal frame screwed to both the ceiling and the floor, making the whole structure extremely robust.
3.2 Hardware Setup
Interactive Projectors: A total of six EPSON EB-696Ui projectors are used for the whole setup, hanging from the ceiling at 2.80 m from the floor. Each pair of projectors
spans the total length of each wall and the majority of its height (~1.60 m) at a resolution of 3840 × 1200, creating an ultra-wide (11520 × 1200 px) display for fully immersive visual output. Each projector is equipped with a touch sensor unit that projects a thin infrared layer on top of each panel, enabling touch interaction on the walls. Moreover, stereo speakers are integrated in each projector to allow audio playback.
Computing Units: A set of two computing units acts as the processing core of the interactive 3D space. The first unit, intended to serve very demanding 3D applications, consists of three computers, each equipped with an Intel Core i7 processor, 16 GB of RAM and an NVIDIA GTX 1080 Ti GPU. Each computer is connected to a pair of projectors. The second unit is a computer equipped with an Intel Core i7 processor, 16 GB of RAM and an NVIDIA NVS 810 GPU capable of driving 8 WUXGA video outputs. Every projector is connected to this computer too. Scenarios that do not demand a high-end GPU are served through this computing unit, and all the background services of the framework also run here.
RGB-D Camera Network: The setup uses four Kinect v1 RGB-D sensors that cover a wide area, allowing the detection of up to five people. The motion sensors are connected to two computers (two per computer). The Kinect v1 sensors were chosen (instead of Kinect v2) for a number of reasons: (i) Kinect v1 sensors offer a lower resolution and, in turn, more limited detection capabilities, (ii) connecting two of those sensors to a single PC makes it possible to capture almost half of the required space, and (iii) Kinect v1 data can be processed using only the CPU; therefore, a setup consisting of only four sensors and two low-cost computers is adequate for serving even the most demanding applications.
Network Infrastructure: A 24-port unmanaged network switch is installed and configured to create a local area network where each device (projectors, sensors, and computers) is identified by its own assigned network address.
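The display figures quoted in the hardware setup can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the pixel-density estimate assumes that each projector pair covers exactly the 5 m wall length and the approximate 1.60 m projected height stated above.

```python
# Figures taken from the hardware description; helper names are illustrative.
WALLS = 3
WALL_WIDTH_M = 5.0               # length of one plywood panel
PROJECTED_HEIGHT_M = 1.60        # approximate projected height per wall
PAIR_RESOLUTION = (3840, 1200)   # pixels driven by one projector pair (one wall)


def combined_resolution(walls=WALLS, pair=PAIR_RESOLUTION):
    """Resolution of the ultra-wide display formed by all walls side by side."""
    return walls * pair[0], pair[1]


def pixel_density(pair=PAIR_RESOLUTION, width_m=WALL_WIDTH_M,
                  height_m=PROJECTED_HEIGHT_M):
    """Approximate pixels per metre (horizontal, vertical) on one wall."""
    return pair[0] / width_m, pair[1] / height_m


total = combined_resolution()
density = pixel_density()
print("combined resolution:", total)
print("approx. px per metre (h, v):", density)
```

Under these assumptions the three walls add up to the 11520 × 1200 px quoted above, and the horizontal and vertical densities come out close to each other, i.e. the projected pixels are roughly square.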
4 Technological Framework
The technological framework for rapid prototyping of X-reality applications for interactive 3D spaces consists of two parts: the physical construction, including the supplied hardware (projectors, motion sensors, computers, network devices), and the software framework that supports building applications upon it (Fig. 2). The latter is provided as a Unity3D Package, exposing an API that the applications deployed on the computing units can use to communicate with the desired hardware components. The main components of the proposed framework are:
• Touch Interaction. Instead of just being passive displays, the walls of the physical construction can accept touch input. Every projector hanging above each panel has a touch sensor unit positioned below it (Fig. 1). However, getting touch information from each panel separately is not helpful on its own. The application framework package handles touch using the LASIMUP approach [9], whereby a dedicated service listens for touch input and transforms it into the corresponding native touch events, essentially making the array of walls act as one enormous touch screen.
Fig. 2: High level architecture of the proposed technological framework
• Spatial Sound. To enhance the auditory experience of applications, the software package provides an easy way to output sound with respect to positional context. Utilizing the ACOUSMA platform [16], all available room audio output devices can be controlled over the network to provide spatial auditory feedback. The six projectors' stereo speakers are used to direct sound at any position of the three-dimensional X-reality space. Furthermore, artifacts or devices with audio capabilities can dynamically connect to ACOUSMA and be used as a sound source within the room. This extensibility allows developers to provide diverse auditory experiences that can precisely simulate any virtual, augmented or mixed reality acoustic environment.
• People Tracking. OpenPTrack [17], an open source, scalable, multi-camera solution for person tracking, was employed for the needs of our setup. OpenPTrack's detection node processes the input from one RGB-D camera and suggests person detections. The detections from the four installed RGB-D cameras are fused into a single tracking node, which associates the detections with unique tracks. The tracking node is also responsible for publishing tracking information as JSON messages. The calculated performance of 30 FPS for real-time multi-person tracking (up to five people) is more than appropriate for ordinary use cases, i.e. people walking. However, a known issue of long-range depth sensors is that depth accuracy degrades near their range limit. Depth error eventually affects tracking, e.g. the tracker may assign two tracks to a single person. This issue can be alleviated by more careful calibration and/or tuning of the detection and tracking parameters. Another issue is that proper placement of the sensors, especially their tilt, is very important and affects detection. An explanation is that the classifier (human vs. not human) is not trained to detect persons imaged in top-view setups. Placing more sensors in order to better capture the space would be an option; however, such a setup is costlier and, more importantly, vulnerable to sensor interference due to multiple overlapping IR emitters, which would further degrade depth quality.
• Synchronization Layer. In order to achieve maximum performance, XR applications should be able to run simultaneously on the available computers, each of which drives its projectors with the corresponding view/part of the virtual space. The framework provides a synchronization layer that abstracts this task. The virtual scene is automatically divided into three separate scenes, and an appropriate service synchronizes the frames that need to be displayed on each of the three walls. In short, each wall displays one third of the application view, while a synchronization mechanism makes sure that the final frames appear as one continuous image. The synchronization layer is also responsible for synchronizing the scene's objects across all running instances. Developers are responsible for providing this information to the layer, and the synchronization mechanism then makes sure the objects are smoothly synchronized on all displays.
• Unity 3D Package. In order to allow developers to focus on the actual development of their applications rather than on the configuration and setup of the aforementioned technological components, we have implemented a Unity Package that simplifies the process. This Unity Package is responsible for handling the communication between applications and the framework's libraries. Developers have to register the corresponding listeners for each of the libraries they want to integrate into their applications in order to enable their functionality.
• Support Tools. Various tools are provided to developers: the Projectors' remote control service, which exposes an API enabling the control of any projector in the room; Click & Go, which is responsible for keeping the employed services and applications up and running, ensuring that critical processes do not become unavailable; the Deployer, which distributes an application to the separate computers and applies the settings required by the Synchronization Layer; and the People Simulator, which emulates the People Tracking service, thus supporting the remote prototyping of applications (developers can test their code while not being in the room).
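The Touch Interaction and Synchronization Layer components described above both rest on a mapping between one wall and the combined three-wall view. The following is a minimal sketch of that mapping under stated assumptions: it is not the LASIMUP or framework code, the names `Touch`, `to_global`, `to_wall` and `wall_viewport` are hypothetical, and the walls are assumed to be ordered left, centre, right with 3840 × 1200 px each.

```python
from dataclasses import dataclass

# Per-wall resolution and wall count from the hardware description.
WALL_WIDTH_PX = 3840
WALL_HEIGHT_PX = 1200
WALL_COUNT = 3


@dataclass(frozen=True)
class Touch:
    wall: int  # assumed ordering: 0 = left, 1 = centre, 2 = right
    x: int     # pixel column on that wall
    y: int     # pixel row on that wall


def to_global(touch):
    """Map a wall-local touch to coordinates on the combined ultra-wide display."""
    if not 0 <= touch.wall < WALL_COUNT:
        raise ValueError("unknown wall index")
    if not (0 <= touch.x < WALL_WIDTH_PX and 0 <= touch.y < WALL_HEIGHT_PX):
        raise ValueError("touch outside the wall surface")
    return touch.wall * WALL_WIDTH_PX + touch.x, touch.y


def to_wall(global_x, y):
    """Inverse mapping: find the wall and local column for a global column."""
    wall, x = divmod(global_x, WALL_WIDTH_PX)
    return Touch(wall, x, y)


def wall_viewport(wall):
    """Normalised horizontal slice (x_min, x_max) of the full view shown on a wall."""
    return wall / WALL_COUNT, (wall + 1) / WALL_COUNT
```

For example, a touch at column 100 on the centre wall corresponds to global column 3940, and the centre wall renders the middle third of the application view. A real deployment would additionally need per-projector calibration, which this sketch leaves out.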
5 Use Cases
Currently, the framework is equipped with three demonstrators that exhibit how it can be used. The demonstrators were selected to be purposefully demanding in terms of the hardware resources required, and pertain to the visualization of our solar system, an underwater exploration simulation, and a game that simulates the restoration of a lake.
Fig. 3: Planetarium (left)/Reef (right)
In particular, Planetarium (Fig. 3 left) is an application that represents our solar system. The sun is displayed on the central wall, while the planets orbit around it and can move from one wall to the other according to their relative position on the orbiting trajectory, thus giving users the feeling of being inside the solar system, in between the planets. The application supports touch-based interaction. Reef (Fig. 3 right) presents an underwater environment with fish, corals and underwater plants on a reef, supporting touch-based and body-based interactions that make the environment responsive to the user's location. For example, if a user moves closer to a wall, the fish that are schooling nearby may swim away or come closer depending on their curiosity attribute. By touching the wall, users can throw food to the fish, which will rush to it if it is within their proximity range.
Finally, Stymfalia (Fig. 1 right) is a virtual environment simulating a lake, in which the user can direct an amphibian reed cutter machine, illustrated at the bottom of the display, via augmented controls (gear, steering wheel), so as to clear the lake of large areas of reed beds. The demonstrator aims to exhibit an additional interaction modality, that of augmented controls, as well as the framework's synchronization layer.
6 Performance Evaluation
To assess the rendering limits of the hardware, we created a scene in Unity at Ultra Quality settings, i.e. full resolution textures (2048 × 2048 px), 2x multisampling, real-time global illumination, and hard and soft shadows with a shadow distance of 150 and four shadow cascades. We tested the rendering performance for various triangle counts. Rendering 11.4 million triangles in this scene ran smoothly at a stable 60 frames per second. An increase to 174 million triangles dropped the frame rate to 26 FPS, which is an impressive result considering a workload more than 15 times higher. Rendering 234 to 393 million triangles resulted in around 15 frames per second. These numbers show that the hardware supporting the framework is capable of extremely high performance, allowing for the development of highly immersive applications. Thus, developers are able to use their imagination freely in order to achieve high-quality results without having to spend time on performance optimizations.
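The relation between the reported triangle counts and frame rates can be made explicit with a short calculation. The sketch below uses only the two fully specified measurements from the text (11.4 million triangles at 60 FPS and 174 million at 26 FPS); the helper names are illustrative, and triangles per second is used here as a rough throughput proxy rather than a metric reported by the authors.

```python
# (triangle count, frames per second) pairs quoted in the evaluation.
baseline = (11.4e6, 60.0)
heavy = (174e6, 26.0)


def frame_time_ms(fps):
    """Time budget consumed by one frame, in milliseconds."""
    return 1000.0 / fps


def triangles_per_second(triangles, fps):
    """Rough throughput proxy: triangles rendered per second."""
    return triangles * fps


workload_ratio = heavy[0] / baseline[0]
throughput_ratio = triangles_per_second(*heavy) / triangles_per_second(*baseline)

print(f"workload ratio:   {workload_ratio:.2f}x")
print(f"frame time:       {frame_time_ms(baseline[1]):.1f} ms -> {frame_time_ms(heavy[1]):.1f} ms")
print(f"throughput ratio: {throughput_ratio:.2f}x")
```

The workload ratio comes out at roughly 15.3, consistent with the "more than 15 times" statement, while the frame time only grows from about 16.7 ms to about 38.5 ms.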
7 Conclusion and Future Work
This work presented a technological framework for rapid prototyping of X-reality applications for interactive 3D spaces, featuring real-time person and object tracking, touch input support and spatial sound output. A software framework was developed in the form of a Unity3D software package, allowing developers to create innovative applications without much effort thanks to meaningful abstractions that provide communication channels with the equipment. The proposed framework has already been used for the development of numerous use cases and has been evaluated with regard to its performance, exhibiting remarkable results. Future research should be devoted to the development of additional interaction modalities, such as voice recognition, which would make the system aware of and responsive to surrounding sounds. Next steps will also focus on providing developers with support for more development platforms. Advancing the tracking subsystem to support pose and gesture recognition will enhance the level of interaction by using body posture information. Last, experimentation with modern sensors and the newer version of OpenPTrack with skeleton tracking capabilities is in our future plans.
Acknowledgments. This work has been supported by the FORTH-ICS internal RTD Programme 'Ambient Intelligence and Smart Environments'. The authors would like to thank Konstantinos Tsirakos for his contribution to the development of the Projectors' control service.
References
1. Bekele, M.K., et al.: A survey of augmented, virtual, and mixed reality for cultural heritage. J. Comput. Cultural Heritage (JOCCH) 11(2), 1–36 (2018)
2. Billinghurst, M., Clark, A., Lee, G.: A survey of augmented reality. Found. Trends® Hum.-Comput. Interaction 8(2–3), 73–272 (2015). https://doi.org/10.1561/1100000049
3. Cavallo, M., Dholakia, M., Havlena, M., Ocheltree, K., Podlaseck, M.: Dataspace: a reconfigurable hybrid reality environment for collaborative information analysis. In: 2019 IEEE Conference on Virtual Reality and 3D User Interfaces (VR), pp. 145–153. IEEE (2019)
4. CAVELib. https://www.mechdyne.com/software.aspx?name=CAVELib. Accessed 19 June 2020
5. Chung, H., Andrews, C., North, C.: A survey of software frameworks for cluster-based large high-resolution displays. IEEE Trans. Vis. Comput. Graph. 20(8), 1158–1177 (2013)
6. Costanza, E., Kunz, A., Fjeld, M.: Mixed reality: a survey. In: Human Machine Interaction, pp. 47–68. Springer, Heidelberg (2009)
7. Cruz-Neira, C., Leigh, J., et al.: Scientists in wonderland: a report on visualization applications in the CAVE virtual reality environment. In: Proceedings of 1993 IEEE Research Properties in Virtual Reality Symposium, pp. 59–66. IEEE (1993)
8. Cruz-Neira, C., Sandin, D.J., DeFanti, T.A.: Surround-screen projection-based virtual reality: the design and implementation of the CAVE. In: Proceedings of the 20th Annual Conference on Computer Graphics and Interactive Techniques, pp. 135–142 (1993)
9. Dimopoulos, A., Zidianakis, E., Stephanidis, C.: LASIMUP: large scale multi-touch support integration across multiple projections on arbitrary surfaces. In: Proceedings of the 11th PErvasive Technologies Related to Assistive Environments Conference, pp. 62–66 (2018)
10. Farrell, R.G., Lenchner, J., et al.: Symbiotic cognitive computing. AI Mag. 37(3), 81–93 (2016)
11. Fast-Berglund, Å., Gong, L., Li, D.: Testing and validating Extended Reality (xR) technologies in manufacturing. Procedia Manuf. 25, 31–38 (2018)
12. Febretti, A., Nishimoto, A., et al.: CAVE2: a hybrid reality environment for immersive simulation and information analysis. In: The Engineering Reality of Virtual Reality 2013, vol. 8649, p. 864903. International Society for Optics and Photonics (2013)
13. Isakovic, K., Dudziak, T., Köchy, K.: X-rooms. In: Proceedings of the Seventh International Conference on 3D Web Technology, pp. 173–177 (2002)
14. Mann, S., Furness, T., Yuan, Y., Iorio, J., Wang, Z.: All reality: virtual, augmented, mixed (x), mediated (x, y), and multimediated reality. arXiv preprint arXiv:1804.08386 (2018)
15. Marrinan, T., Aurisano, J., et al.: SAGE2: a new approach for data intensive collaboration using Scalable Resolution Shared Displays. In: 10th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing, pp. 177–186. IEEE (2014)
16. Michelakis, A.: ACOUSMA: An intelligent mechanism towards providing personalized auditory feedback in ambient intelligence environments (Master's thesis) (2020). https://elocus.lib.uoc.gr/dlib/c/1/f/metadata-dlib-1593085755-225190-31302.tkl
17. Munaro, M., Basso, F., Menegatti, E.: OpenPTrack: Open source multi-camera calibration and people tracking for RGB-D camera networks. Robot. Autonomous Syst. 75, 525–538 (2016)
18. Papadopoulos, C., Petkov, K., Kaufman, A.E., Mueller, K.: The Reality Deck–an immersive gigapixel display. IEEE Comput. Graph. Appl. 35(1), 33–45 (2014)
19. Paradiso, J.A., Landay, J.A.: Guest editors' introduction: Cross-reality environments. IEEE Pervasive Comput. 8(3), 14–15 (2009)
20. Reiners, D., Voß, G., Behr, J.: OpenSG: basic concepts. In: 1. OpenSG Symposium OpenSG 2002 (2002)
Design and Evaluation of an Augmented Reality Application for Landing Training
Harald Schaffernak1(B), Birgit Moesl1, Wolfgang Vorraber1, Reinhard Braunstingl2, Thomas Herrele3, and Ioana Koglbauer2
1 Institute of Engineering and Business Informatics, Graz University of Technology, Kopernikusgasse 24/III, 8010 Graz, Austria
{Harald.Schaffernak,Birgit.Moesl,Wolfgang.Vorraber}@tugraz.at
2 Institute of Mechanics, Graz University of Technology, Kopernikusgasse 24/IV, 8010 Graz, Austria
{R.Braunstingl,Koglbauer}@tugraz.at
3 Aviation Academy Austria GmbH, Ludwig-Boltzmann-Straße 4, 7100 Neusiedl am See, Austria
[email protected]
Abstract. While the use of Augmented Reality (AR) for pilot support in military and commercial aviation is already established, the easier access to high-tech AR devices offers new possibilities to explore the usefulness of these technologies in General Aviation. Research shows that landing a light aircraft is one of the most difficult parts of the ab initio flight training. This is also an area considered to benefit from AR. For this exploratory study, an AR application was developed to support the landing approach on a flight simulator with a Flight Path Vector (FPV), altitude, air speed and a feedback-tool. Training effects are descriptively analyzed by comparing learners’ self-evaluation and landing performance. This study provides first insights into the design, implementation, and evaluation of an AR application to support ab initio flight training. In addition, empirical data of the usefulness and limitations of the tested AR application in a flight simulator setting are discussed. Keywords: Augmented reality · Human diversity · Aviation · Flight training · Flight simulation · Landing · Flight path vector
1 Introduction
AR is considered a potential enabler for improving the teaching of the theoretical and practical contents of the flight course [1]. One of the most difficult tasks during ab initio pilot training is landing training [2]. Early research in the area of aviation [3] examined the use of virtual cues for training the landing procedure based on a computer-generated visual display and identified promising potential for this form of support. Although already used in military aircraft and in commercial air transport, head-up displays (HUDs) and augmented flight path vectors (FPV) are not used in General Aviation, mainly due to the associated costs and size [4, 5].
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 T. Ahram et al. (Eds.): IHIET-AI 2021, AISC 1378, pp. 107–114, 2021. https://doi.org/10.1007/978-3-030-74009-2_14
However, the current availability of high-tech
commercial off-the-shelf (COTS) AR devices makes it possible to investigate this technology for General Aviation and to reveal promising new use cases. The first research in the area of AR head-mounted displays was conducted in the 1960s by Sutherland [6] and then substantially shaped by the works of Mann [7, 8]. As described by Lee [9], Caudell coined the term AR while working at Boeing [10], in the area of aviation in particular, where a head-mounted AR system was developed to guide workers in manufacturing processes. In the following years, research and the aviation industry experimented in a wide variety of application areas using various forms of AR display devices [1, 11].
In the design of AR aviation use cases, special attention needs to be paid to gender aspects to ensure equal opportunities. In 2016 only 12.4% of 128,501 U.S. student pilots and 5.4% of 112,056 commercial pilots were female [12]. Another study by Mitchell et al. [13] also points out that female pilots are significantly underrepresented and that gender issues in aviation are evident. This gender inequality leads to potential issues that impede the future development of the aviation industry [13]. When designing applications, use cases or technology in general, certain users, and therefore certain user characteristics, are assumed in the engineering process. If gender aspects are not considered in this process, this can lead to gender bias and exclude certain sections of the population [14].
Furthermore, to make learning applications more interesting for students in general, including gaming aspects in the learning experience can have a positive impact. Joiner et al. [15] showed that game-based learning can support women and men equally. Schaffernak et al. [16] analyzed, in a study with 60 pilots and flight instructors, which gaming concepts are interesting and engaging for both women and men.
As a result, “receiving feedback for correct actions”, “achieve a target to finish task”, and “receiving points if you successfully finish a task” were best rated [16]. The aim of this study is the design, implementation, and initial evaluation of an AR application used in connection with a light aircraft simulator for training the final approach of a power-off landing maneuver.
2 Methods
2.1 Method of AR Design
The overall objective in landing training is a stabilized approach and a landing at the touchdown zone. For the student pilot it is hard to judge the visual cues in the beginning. The FPV should support the pilot during the approach and landing phase by indicating the path/direction the aircraft will follow if the current attitude is maintained, thus helping to accelerate the learning curve, build confidence and increase safety. Based on this, three requirements were defined: display of speed and altitude; display of the FPV; use of a COTS device for implementation.
2.2 Method of Evaluation
The effect of the AR application is assessed in a pre-test, training and post-test design. The AR group uses the AR application in combination with the generic light aircraft
simulator for training. The control group uses only the generic light aircraft simulator for training. The pre- and post-tests are performed without AR. All participants signed an informed consent form. The participants in this pre-study are 6 students aged between 20 and 25 years with no flight experience. The candidates were briefed on the process, the upcoming tasks, the flight simulator, and the flight controls and functions needed for the experiments (controls, flaps, Primary Flight Display (PFD)). During a first flight the participants were familiarized with the flight simulator and its behavior. Between the assessment phases, quantitative and qualitative data are collected in anonymized form. After each flight session, both subjective (e.g. self-evaluation) and objective performance data of the students are recorded. The AR group is additionally surveyed about the AR application and their experience with it (comfort, trust, usability and interaction with the application). Each landing approach is evaluated quantitatively by measuring the accumulated deviation from the required airspeed and the projected touchdown point. Thus, a stabilized approach with constant airspeed aiming at the touchdown point scores highest. During training, after each landing the participants are asked qualitative performance questions; the AR group is asked directly in the AR environment, prompting the learner to self-evaluate. The learner receives immediate feedback from the AR feedback-tool on whether the self-evaluation was correct and how many points he or she scored for the landing approach.
2.3 Scenario Description
For the experiments a power-off "spot landing" was chosen as the basic scenario. Six versions with different initial altitudes and lateral offsets (e.g., right, left, or center) were created for the different phases: pre-test, post-test and training.
Thus, the student pilot starts in a pre-defined vertical and horizontal position, has to intercept the final approach and fly a precise and stabilized approach maintaining the speed within ±5 knots only using elevator, ailerons and flaps for control (see Fig. 1).
Fig. 1. Schematic illustration of spot landing (left) and FPV (right)
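The ±5 knot speed criterion of the scenario can be illustrated with a short check. This is only a sketch: the function name, the target speed of 70 knots and the sample values are assumptions made for illustration and are not taken from the study.

```python
def within_speed_band(airspeeds_kt, target_kt, tolerance_kt=5.0):
    """Return the fraction of airspeed samples within target +/- tolerance."""
    if not airspeeds_kt:
        raise ValueError("no airspeed samples")
    inside = sum(1 for v in airspeeds_kt if abs(v - target_kt) <= tolerance_kt)
    return inside / len(airspeeds_kt)


# Hypothetical airspeed samples (knots) recorded along one approach.
samples = [68.0, 71.5, 74.0, 76.5, 70.0]
fraction = within_speed_band(samples, target_kt=70.0)
print(f"{fraction:.0%} of the samples are within the speed band")
```

A fraction below 1.0 indicates that the approach left the speed band at least once.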
2.4 HoloLens and Flight Simulator Features

Microsoft HoloLens (1st gen) is a six Degrees of Freedom (DoF) head-mounted stereoscopic AR device, which was released in March 2016. The six DoF allow tracking of the user’s head rotation (yaw, pitch and roll) as well as three-dimensional position tracking in the physical room. On its 2.3-megapixel widescreen it can render virtual objects in
H. Schaffernak et al.
the user’s Field of View (FoV) in a range of 30° × 17.5°. The device weighs 579 g and offers wireless connectivity using IEEE 802.11ac Wi-Fi and Bluetooth 4.1 Low Energy [17, 18]. The light aircraft simulator has a genuine side-by-side cockpit with a three-screen Electronic Flight Instrument System (EFIS) (two PFDs and one Multifunction Display (MFD)) and a wide-angle visual screen providing 190° lateral coverage. Although the simulator avionics provide a flight director and a three-axis autopilot, no automation whatsoever is used, and the final approach is flown manually. The simulator and the HoloLens communicate with each other wirelessly using network protocols, to offer the participants a seamless experience in the ab initio flight training.
3 Proof of Concept

The Proof of Concept (PoC) includes a HoloLens application, the light aircraft simulator, and a server application which manages the communication between the two systems [19]. Navigation and orientation data for the FPV are received from the generic light aircraft simulator via the server application. The FPV and the feedback-tool are visualized by the HoloLens. Terrain visualization, flight simulation and calculation of the projected touchdown deviation are done by the light aircraft simulator.

3.1 FPV and Ideal Approach

To display the virtual FPV fixed to the front of the cockpit, the simulator coordinate space and its aircraft kinematic data must be mapped to the HoloLens coordinate space, which in turn needs to be mapped to the real world (the cockpit). The calibration between the real world and the HoloLens coordinate space is done using the Vuforia Augmented Reality SDK¹ and a marker which is placed temporarily in front of the cockpit. For the FPV movements, the absolute velocity of the aircraft from the light aircraft simulator is used and transformed into the coordinate space of the HoloLens system. The light aircraft simulator calculates a quality score for the approach based on the deviation of airspeed and projected touchdown point. Deviations are accumulated on the final approach between 1000 ft and 200 ft above ground. Above 1000 ft the participants are given some time to stabilize the approach, and below 200 ft they would soon enter the flare, which changes airspeed and projected touchdown point significantly. The landing itself is performed by the participants, but its quality does not count in any way, since it requires a different skill set which is not evaluated in this experiment. Additionally, the light aircraft simulator monitors the lateral deviation from the runway centerline as well as the primary flight parameters attitude and airspeed. If these data indicate an unsafe approach condition, the simulation is stopped and the reason is sent to the HoloLens application.
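The geometry behind an FPV derived from the aircraft velocity can be sketched as follows. This is a simplified illustration, not the transformation used in the PoC: the north/east/up axis convention, the function names and the wings-level, small-angle treatment of the offset are all assumptions, and the calibration into the HoloLens coordinate space is left out.

```python
import math


def flight_path_angles(v_north, v_east, v_up):
    """Return (track_deg, flight_path_angle_deg) of a velocity vector in m/s."""
    ground_speed = math.hypot(v_north, v_east)
    track_deg = math.degrees(math.atan2(v_east, v_north)) % 360.0
    gamma_deg = math.degrees(math.atan2(v_up, ground_speed))
    return track_deg, gamma_deg


def fpv_offset(track_deg, gamma_deg, heading_deg, pitch_deg):
    """Angular offset of the FPV from the aircraft nose (wings level assumed).

    Positive lateral means right of the nose, positive vertical means above it.
    """
    lateral = (track_deg - heading_deg + 180.0) % 360.0 - 180.0
    vertical = gamma_deg - pitch_deg
    return lateral, vertical


# Hypothetical sample: 30 m/s ground speed on a 3 degree descent, tracking north,
# while the nose points 5 degrees right of track and 2 degrees above the horizon.
track, gamma = flight_path_angles(30.0, 0.0, -30.0 * math.tan(math.radians(3.0)))
print(fpv_offset(track, gamma, heading_deg=5.0, pitch_deg=2.0))
```

In this example the symbol would be drawn about 5° left of and 5° below the nose reference, which is where the aircraft is actually heading.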
1 https://www.ptc.com/de/technologies/augmented-reality.
3.2 Application Flow

After a set-up phase in which the experiment instructor calibrates the device, the application is ready for use by the participant. The instructor starts the simulation on the light aircraft simulator, which sends the kinematic data and metadata to the HoloLens application. With this information the HoloLens application is able to display a HUD showing the FPV, airspeed, altitude, and simulation time (see Fig. 2). While the participant flies the landing approach, the deviation from the ideal data (airspeed and projected touchdown point) is observed. The simulation is stopped if the landing approach is executed correctly or if the deviation from the normal flight parameters exceeds a defined threshold. To complete the training session the participant must fill out an AR self-evaluation feedback form (see Fig. 3). Lastly, the HoloLens application shows the points achieved for this session, which are calculated from the deviation from the ideal data and the responses in the feedback form.
Fig. 2. AR HUD showing FPV, airspeed, altitude, and simulation time, as seen by the participant.
Fig. 3. Completed self-evaluation feedback form, as seen by the participant.
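The accumulation of deviations between 1000 ft and 200 ft above ground can be sketched as below. The source gives neither the weighting nor the exact formula, so the weights, the sample layout and the function name are hypothetical; only the altitude window and the 0 to 300 score range (see Fig. 5) come from the text.

```python
def approach_score(samples, target_speed_kt, max_score=300.0,
                   speed_weight=1.0, touchdown_weight=0.1,
                   upper_ft=1000.0, lower_ft=200.0):
    """Score an approach from (altitude_agl_ft, airspeed_kt, touchdown_error_m) samples.

    Deviations are accumulated only inside the altitude window; the weights
    are illustrative assumptions, not values from the study.
    """
    penalty = 0.0
    counted = 0
    for altitude_ft, airspeed_kt, touchdown_error_m in samples:
        if lower_ft <= altitude_ft <= upper_ft:
            penalty += speed_weight * abs(airspeed_kt - target_speed_kt)
            penalty += touchdown_weight * abs(touchdown_error_m)
            counted += 1
    if counted == 0:
        return None  # no samples inside the scoring window
    return max(0.0, max_score - penalty)


# Hypothetical samples; the first and last lie outside the scoring window.
samples = [(1200.0, 80.0, 50.0), (900.0, 72.0, 20.0),
           (500.0, 69.0, -10.0), (150.0, 60.0, 0.0)]
print(approach_score(samples, target_speed_kt=70.0))
```

Because the penalty is summed per sample, a real implementation would need a fixed sampling rate or a normalization by the number of samples to keep scores comparable between approaches.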
3.3 Usage in Certified Environment

For the use of this AR application in a certified flight training simulator it is important that the certification basis is not affected by the AR FPV. A primary negative influence would exist if the behavior of simulation elements (flight, engine dynamics or system behavior) were changed directly. Since data are exchanged in one direction only, this is not the case for the PoC. A secondary influence could occur if the data requirements of the AR application changed the performance of the main real-time simulation processes. This is prevented by performing the calculations on a computer system that is independent of the flight simulation.
4 Results

Results for learners’ self-evaluation are presented descriptively in Fig. 4. The AR group trained with AR, but performed the pre-test and post-test in the simulator without AR. As Fig. 4 shows, two participants from the AR group improved their self-evaluation of the landing after AR training, showing higher values in post-test as compared to pre-test. One participant from the control group that trained without AR also improved their self-evaluation in post-test, as compared to pre-test. In each group there was one participant showing lower self-evaluation scores after training as compared to pre-test. A similar pattern can be described for the average final approach performance scores calculated from kinematic data. As illustrated in Fig. 5, two participants from the AR group improved their landing performance after AR training, showing higher performance scores in post-test as compared to pre-test. The participants from the control group that trained without AR also improved their landing performance in post-test, as compared to pre-test. However, one participant in the control group showed very good performance from the beginning. One learner from the AR group reported discomfort and calibration issues that made the training difficult, and this learner actually showed a performance decrement in post-test as compared to pre-test. The learners from the AR group reported that the holographic speed cues and FPV were the most useful AR cues. In Table 1 additional AR assessments of the learners from the AR group are presented.

Table 1. Learners’ assessment of their interaction with AR features ranging from 1 (very poor) to 5 (very good). Learners: N = 3.

Criterion | Learner 1 | Learner 2 | Learner 3
Comfort | 5 | 1 | 3
Trust in using the AR application for practical flight training | 3 | 3 | 4
Gesture interaction | 5 | 4 | 3
Speech interaction | Not used | Not used | Not used
Feedback-Tool | 5 | 4 | 4
Holograms | 4 | 3 | 4
Sight and projection field | 4 | 4 | 5
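For readers who want a compact view of Table 1, the ratings can be summarized per criterion. The snippet below only restates the table values (the "Trust" label is shortened, and the unused speech interaction row is omitted); with N = 3 the means are descriptive and carry no statistical weight.

```python
from statistics import mean

# Ratings copied from Table 1 (Learner 1, Learner 2, Learner 3).
ratings = {
    "Comfort": [5, 1, 3],
    "Trust": [3, 3, 4],
    "Gesture interaction": [5, 4, 3],
    "Feedback-Tool": [5, 4, 4],
    "Holograms": [4, 3, 4],
    "Sight and projection field": [4, 4, 5],
}

for criterion, values in ratings.items():
    print(f"{criterion}: mean {mean(values):.2f}, "
          f"range {min(values)}-{max(values)}")
```

The wide range for comfort (1 to 5) stands out against the narrower ranges of the other criteria.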
5 Discussion

This pre-study shows early results for ab initio flight training using the self-implemented PoC application. The results give a first indication of the potential of AR to improve landing training for ab initio student pilots. In this small sample, the effect of AR on landing approach performance varies among individuals and seems to depend on the learner’s interaction with the AR device. Some learners experience discomfort, calibration issues and limitations of the field of view caused by the AR device, and
Fig. 4. Learners’ self-evaluation of the flight path in pre-test and post-test. Scores range from poor (0) to very good (12).
Fig. 5. Objective landing performance scores calculated from the kinematic data. Scores range from poor (0) to very good (300).
these factors may affect their performance. The calibration issues are especially problematic because they are hard for the experiment instructor to recognize. Furthermore, a certain percentage of the population cannot benefit, or can benefit only partly, from the stereoscopic depth of the AR device, due to a disorder of their binocular vision [20]. Nevertheless, human depth perception also draws on other cues such as object size and movement, which is why these people can still become pilots. To determine statistical effects of the AR application, an experiment with 30 female and 30 male students, equally assigned to a control group and an AR group, will be performed. The comparison between the AR and control groups is expected to show whether AR training is significantly better than conventional training in improving landing performance.

In practice, flight training organizations must design their training syllabus in a way that prevents negative training. Negative training occurs when exercises are taught and consolidated incorrectly. This could be the case with the AR FPV if the student pilot uses the presented help as the primary reference and can no longer master the task without it. It is important to make sure that the vector serves as an aid for recognizing the visual impression. In advanced exercises the AR help could be faded out for increasingly long intervals to encourage the student to use the visual impression as a reference.
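The planned assignment of 30 female and 30 male students in equal parts to the AR and control groups amounts to a stratified random assignment. The sketch below shows one way to do this; the participant identifiers, the fixed seed and the function name are assumptions, and the source does not describe the actual assignment procedure.

```python
import random


def assign_groups(participants, seed=0):
    """Assign (participant_id, gender) pairs to 'AR' or 'control'.

    Each gender is shuffled separately and split in half, so both groups
    receive the same number of participants of each gender.
    """
    rng = random.Random(seed)  # fixed seed keeps the assignment reproducible
    by_gender = {}
    for pid, gender in participants:
        by_gender.setdefault(gender, []).append(pid)

    assignment = {}
    for ids in by_gender.values():
        rng.shuffle(ids)
        half = len(ids) // 2
        for pid in ids[:half]:
            assignment[pid] = "AR"
        for pid in ids[half:]:
            assignment[pid] = "control"
    return assignment


# Hypothetical identifiers for 30 female and 30 male participants.
participants = ([(f"F{i:02d}", "female") for i in range(30)]
                + [(f"M{i:02d}", "male") for i in range(30)])
groups = assign_groups(participants)
print(sum(1 for g in groups.values() if g == "AR"), "participants in the AR group")
```

Stratifying by gender keeps the group comparison from being confounded with gender, which matters if gender effects are to be analyzed separately.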
Acknowledgments. This research was funded by the Austrian Federal Ministry for Climate Action, Environment, Energy, Mobility, Innovation and Technology, and the Austrian Research Promotion Agency, FEMtech Program "Talent", grant number 866702.
References

1. Brown, L.: The next generation classroom: transforming aviation training with augmented reality. In: Proceedings of the National Training Aircraft Symposium (NTAS), Embry-Riddle Aeronautical University, Daytona Beach, FL, USA, 14–16 August 2017
2. Moesl, B., Schaffernak, H., Vorraber, W., Braunstingl, R., Herrele, T., Koglbauer, I.: A Research Agenda for Implementing Augmented Reality in Ab Initio Pilot Training (In review)
3. Lintern, G., Roscoe, S.N.: Transfer of landing skill after training with supplementary visual cues. In: Visual Simulation and Image Realism I, vol. 162, pp. 83–87. International Society for Optics and Photonics (1978)
4. Newman, R., Greeley, K., Schwartz, R., Ellis, D.: A Head-up display for general aviation. SAE Trans. 109, 197–206 (2000)
5. Newman, R.L.: Head-up Displays: Designing the Way Ahead. Routledge (2017)
6. Sutherland, I.: A head-mounted three dimensional display. In: Proceedings of the December 9–11, 1968, Fall Joint Computer Conference, Part I, pp. 757–764. Association for Computing Machinery, San Francisco (1968)
7. Mann, S.: Wearable computing: a first step toward personal imaging. Computer 30, 25–32 (1997)
8. Mann, S.: Wearable computing. In: Soegaard, M., Dam, R.F. (eds.) The Encyclopedia of Human-Computer Interaction, 2nd Ed., The Interaction Design Foundation, Denmark (2013)
9. Lee, K.: Augmented reality in education and training. TechTrends 56(2), 13–21 (2012)
10. Caudell, T., Mizell, D.: Augmented reality: an application of heads-up display technology to manual manufacturing processes. Proceedings of the Twenty-Fifth Hawaii International Conference on System Sciences 2, 659–669 (1992)
11. Delgado, F.J., Abernathy, M.F., White, J., Lowrey, W.H.: Real-time 3D flight guidance with terrain for the X-38. In: Enhanced and Synthetic Vision 1999, vol. 3691, pp. 149–156 (1999)
12. Halleran, M.S.: Gender balance in aviation. Coll. Aviat. Rev. Int. 37(1) (2019)
13. Mitchell, J., Kristovics, A., Vermeulen, L., Wilson, J., Martinussen, M.: How pink is the sky? A cross-national study of the gendered occupation of pilot. Employ. Relat. Rec. 5, 43–60 (2005)
14. Koglbauer, I.V.: Forschungsmethoden in der Verbindung Gender und Technik: Research Methods Linking Gender and Technology. Psychol. Österreich 37, 354–359 (2017)
15. Joiner, R., Iacovides, J., Owen, M., Gavin, C., Clibbery, S., Darling, J., Drew, B.: Digital Games, Gender and Learning in Engineering: Do Females Benefit as Much as Males? J. Sci. Educ. Technol. 20, 178–185 (2011)
16. Schaffernak, H., Moesl, B., Vorraber, W., Koglbauer, I.V.: Potential augmented reality application areas for pilot education: an exploratory study. Educ. Sci. 10, 86 (2020)
17. HoloLens (1st gen) hardware. https://docs.microsoft.com/en-us/hololens/hololens1-hardware
18. On the road for VR: Microsoft HoloLens at Build 2015. https://doc-ok.org/?p=1223
19. Frohmann, P.: Utilization of augmented reality in aviation - assisting small aircraft pilots in approaching and landing. MA thesis. Austria: Graz University of Technology (2019)
20. Coutant, B.E., Westheimer, G.: Population distribution of stereoscopic ability. Ophthalmic Physiol. Opt. 13, 3–7 (1993)
Virtual Collection as the Time-Shift Appreciation: The Experimental Practice-Led Research of Automated Marionette Hsiao Ho-Wen Project

Chih-Yung Chiu
Arts Center, National Tsing Hua University, Hsinchu, Taiwan
Abstract. Today’s digital technology makes performing arts something more than their earlier incarnations that could only evoke ephemeral feelings of presence. Virtual archive has empowered spectators to determine their preferred duration, perspective, and scene of the work they’re viewing. Virtual technology has also gradually transformed the admiration of works that had been performed into participatory somatic experiences. Immersing themselves in a world detached from realities, spectators comprehend theater pieces not so much by conscious perception as through a world constructed from image-actor, stage installation, and immersive technology. This practice-based research takes as its example Automated Marionette Project: Hsiao Ho-Wen, which is dedicated to applying contemporary digital technology to the methodological construction and R&D of a non-material archiving technique for performers’ body movement. Breaking away from the previous practice of single-perspective recording, this project provided a sweeping panorama of the performer’s whole body, so as to give the spectators a 3D stereo view of the performer’s body movement.

Keywords: Virtual collection · Time-shift appreciation · Automated Marionette · Hsiao Ho-Wen Project · Digital double
1 Introduction

Since the 1980s, performing arts has extensively incorporated media and evolved a sui generis genre known diversely as multimedia performance, cross-media performance, cyborg theater, digital performance, virtual theater, new media drama, and so forth. This nascent, changing field still lacks a proper scheme of taxonomy as a research tool and method to disentangle its own portmanteau contents. The somatic stimulation, liberated participatory space, and script flip-flop characteristic of this media technology not only offer spectators unique experiences, but also alter their interactive relations to theater pieces. In the digital performance piece Hsiao Ho-Wen Project, the creative team treated the puppet as “a figure/thing to be looked at,” shaping this “techno-body” as an object,
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 T. Ahram et al. (Eds.): IHIET-AI 2021, AISC 1378, pp. 115–122, 2021. https://doi.org/10.1007/978-3-030-74009-2_15
thereby presenting it as the human body which people desperately yearn for. By virtue of the team’s deliberate arrangement, the human performer (i.e. Hsiao Ho-Wen) became the “soul” of the puppet (i.e. Anne Huang) whether at the moment when the former acts as the latter’s lighting engineer or when they dance together. On top of that, the computer program written by the human programmer became the puppet’s second soul, so that its body was objectified simultaneously by the subject and others. Such a subject-object relation is no longer tantamount to the traditional counterpoint relation between subjects and others, and the originally separated body and soul blended seamlessly with each other in this performance. To put it another way, each advance in technological applications or tools implies a deeper and broader understanding of “who I am.” In this sense, the puppet Anne Huang, having a texture of “plastic quality” and “oriental” facial features, seemed to purposefully raise a thought-provoking issue in Hsiao Ho-Wen, that is, who I am/we are in the world constructed by technologies. Furthermore, today’s digital technology makes performing arts something more than their earlier incarnations that could only evoke ephemeral feelings of presence. Virtual archive has empowered spectators to determine their preferred duration, perspective, and scene of the work they’re viewing. Virtual technology has also gradually transformed the admiration of works that had been performed into participatory somatic experiences. Immersing themselves in a world detached from realities, spectators comprehend theater pieces not so much by conscious perception as through a world constructed from image-actor, stage installation, and immersive technology. Spectators employ body-mind fusion again to set out on a fantastic journey across the virtual universe, indulging themselves in the world of image-body interaction.
This practice-based research takes as its example Automated Marionette Project: Hsiao Ho-Wen, which is dedicated to applying contemporary digital technology to the methodological construction and R&D of a non-material archiving technique for performers’ body movement. Breaking away from the previous practice of single-perspective recording, this project provided a sweeping panorama of the performer’s whole body, so as to give the spectators a 3D stereo view of the performer’s body movement.
2 The Impact of Digital Art on Curatorial Practice

The art world has long been accustomed to “objects.” While museums and galleries are treated as spaces for exhibiting and preserving such static artworks, digital art has not only assumed significance in contemporary art practice, but also kept challenging traditional art in terms of the curatorial model concerning display and documentation as well as the approaches to collection and preservation. Digital art re-orientates itself from object to process, and takes on diverse art forms that are time-based, dynamic, interactive, collaborative, and customized. Digital art refuses to be concretized, and further defies the conventional rules of artistic objects. As a process-oriented, participatory art form, digital art exerts tremendous impact on curators, artists, viewers and artistic institutions. It prompts curators to work with artists in developing and presenting artworks. Running counter to conventional museums as temples of rare artistic gems, digital art has redefined the roles of curators and artists as a new model of collaboration and exhibition that engages the viewers/public in creative practice.
In addition to the changes in museum hardware, a curatorial viewpoint that criticizes conventional museums has arisen spontaneously. The exhibition model and strategy derived from traditional spaces are not suitable for digital art. Most works of digital art are inherently performance- and situation-based. They form a network connected with their external environments. However, the curatorial practice of using computers and screens to construct an independent exhibition space for new media art often provokes criticism due to the complexity of the technique and equipment it entails. The main flaw of digital art is that it cannot be experienced together with works made in other media in a complete context, which leaves it marginalized in the history of museum exhibition. On top of that, viewers have to spend far more time admiring individual, independent digital art installations than they spend visiting ordinary museums¹. According to Christiane Paul, former curator of video and media art at the Whitney Museum of American Art, the challenges posed by digital art primarily involve its “immateriality.” That is, new media art is mainly based on software, systems and computer networks. Numerous conceptual, philosophical and practical issues have arisen from the attempts to display, archive and preserve such a process-oriented art form in institutions. She believes that immateriality consists of “material links,” and artistic production oscillates between the openness of systems and the constraints of industrial and systemic guiding principles. It not only exerts direct influence on the creation, exhibition and reception of art, but also alters the role of each participant. The interactive participation not only allows people to personally view and partake in artworks, but also changes the basic rule of “do not touch” in museums.
Interactive participation thus serves as the key to the transformation of digital art into an “open system.” The opening process rests upon the amount of time the viewers/participants invest and the professional knowledge they need for participation. Digital art also laid a practical foundation for cooperative exchange, through which artists play a role tantamount to that of curators. This practical foundation has profound significance for curatorial processes². Moreover, digital art has prompted people to ruminate on the conventional definitions of space and structure. We are striving to create virtual spaces and transform information structures into physical spaces. In digital art exhibitions, the connections between the virtual and physical spaces eventually determine the aesthetics of the exhibits. Curators and artists shall decide the connections in a collective manner. Digital art installations help art establish a visible presence in physical spaces. Sometimes the installation of equipment must satisfy specific criteria such as height, width and lighting. Variability and modularity are characteristic of media. Also, variability enables a visual work to be shifted among different ways of presentation. Finally, the physical environment can be redefined according to the needs of an artwork, thereby demonstrating the important connections between the physical and virtual spaces³.
1 [1], p. 57. 2 [2]. 3 [1], p. 56.
3 New Vision Li-Yuan: Automated Marionette Project – Hsiao Ho-Wen Project

“If we compare a dancer’s body to a random access memory (RAM) device, the dancer will become a puppet when someone writes a dancing program in his/her body. Now there is another puppet with similar memorizing ability to the dancer’s. We may wonder whether the dancer in the real world and the digital puppet capable of interacting and memorizing will cultivate a digitalized relationship of genetic evolution.”
– The creative statement of the Hsiao Ho-Wen Project⁴

We’ve seen the untapped potential for our technologies and inventions in the present era, by virtue of which we can conduct avant-garde experiments, thereby ensuring the most radical aesthetic presentation of our creations. A vital issue has underlain the aesthetic practice of digital performance arts to date, that is, the dramatic tension between the live ontology of performing arts and the simulacra-oriented nature of mediatized, non-live and virtual technologies. “Experiment” is one of the characteristics of this tension that grows around performing arts. The experimental procedure disassembles the constitutive elements from an enduring, stable whole on the one hand, and outlines a vision of a new totality on the other. Such radical experiments yield results of generalization and synthetization. Another characteristic is “digitalization,” an unstoppable momentum revealed in every detail of digital performance art and substantiated in the fact that whatever can be digitalized has been digitalized. Choreographed in three different versions, New Vision Liyuan – Automated Marionette Project: Hsiao Ho-Wen (hereafter referred to as the Hsiao Ho-Wen Project) treats the automated marionette as its proposition, integrates the live musical accompaniment with the heavy control table above the stage, and features the couple dance by Hsiao Ho-Wen and the android named Anne Huang.
It serves several purposes, including new media performance and interactive installation. Apart from inventing an automated marionette to develop the interactive control technology necessary for digital performing arts, the Hsiao Ho-Wen Project lays greater stress on the investigation into the life experiences mirrored in the body as a vehicle of memories. “Anne Huang” thus becomes a concept, a creative idea of a time-travelling RAM. The dance programs accumulating in her body gradually evolve, repeat, circulate, and metamorphose, ending up as numerous free bytes that are then replicated to a digital puppet, a virtual RAM, which raises the question of whether memories are absolutely indispensable for the existence of life. The Hsiao Ho-Wen Project runs for 50 min. Titled after the dancer Hsiao Ho-Wen and treating traditional Taiwanese marionette as the point of departure, this performance accentuates role-playing in three dimensions: replication, control, and simulation. The script structure can be divided into four acts, viz., “Then, I Become a Human Being,” “Vector and Anti-vector,” “The Temperature of Soul,” and “Am I Hsiao Ho-Wen, or the Puppet Is?” Equipped with 21 control points and driven by 29 motors, the puppet can make various body movements such as squatting, rotating, and moving horizontally in the illusory world of bytes interlaced by memories and history. As an automated marionette, “Anne Huang” bears more than a passing resemblance to a time-travelling byte that transcends all sorts of confines. Her/Its memories turn into countless pixels that illuminate high walls and eaves. The formless, flowing sleeves and the fine yet unruly hair appear in the peripheral vision of memories. Which one is the other’s prototype? The dance theater tailor-made for the dancer and constructed with literary imagery is intended to weave a weird, disorderly relationship of manipulation between body and soul, which is interlaced by the past and the present as well as by the real and the virtual⁵.

Beginning with traditional Taiwanese opera and utilizing digital technology, the Hsiao Ho-Wen Project managed to identify the point at which it conjoins with contemporary sentiments. The digitally manipulated marionette interacts with the human dancer in an appealing, graceful, lithe and gentle manner, hence a peculiar sense of harmony and aesthetics between them. At the same time, they bring a touch of conflict and tug-of-war. It is noteworthy that the movements and dance of the marionette are designed specifically for its interaction with the human dancer. Its body movements are detached from the inherent plot of an opera and turned into an independent creative vocabulary integrated with machines, images, audiovisual media, plastic art, and human performance.

4 See the creative statement of the “Hsiao Ho-Wen Project,” originally titled the “Anne Huang Project.”
4 Virtual Collection and the Experimental Practice-Led Research of Automated Marionette Hsiao Ho-Wen Project

Hsiao Ho-Wen Project decomposed experiential phenomena into symbolic bits, making each part of the performance rely heavily on this new form of digital representation. Hsiao Ho-Wen Project not only altered the standard way of presentation on stage and redefined the accepted role of puppeteer, but also de-centered human actors who used to be the protagonists in regular theaters, thereby embodying the radical aesthetics of contemporary digital performance arts and creating a technological spectacle of new-fashioned interdisciplinary creation. After the sudden death of Hsiao Ho-Wen, the artist Huang Wen-Hao decided to create a new piece of art, a digital double of the dancer, to commemorate Hsiao Ho-Wen. This research briefly demonstrates the steps of the virtualization of Hsiao Ho-Wen Project.

4.1 Body Scanning of Dancer and Marionette and Face Modeling

The steps include body scanning, modeling, material adjustment (including surface simulation of special materials such as eyeballs, hair, clothing, etc.) (Fig. 1), rigging of the body and face, light source and reflection calculation, object and 3D environment simulation, hair and muscle dynamic simulation, etc.
5 The author is grateful to Huang Wen-Hao from the ET@T for providing related textual and image material.
Fig. 1. The concept and details of body scanning and face modeling.
4.2 Capturing Motions and Movements

After the body modeling of the two dancers is completed, the next step is to use “motion capture” technology, which allows the models to perform actions by measuring, tracking, and recording the motion trajectories of “digital objects” in three-dimensional space. An optical system works by tracking position markers or 3D features, and then combines the collected data into approximate movements of the actors. Active systems use luminous or flashing markers, while passive systems use non-luminous objects, such as white balls or painted dots. Motion capture equipment has become lighter and more convenient; for example, mobile phone apps working from 2D footage can now generate dancing 3D models without other hardware, studios or sophisticated sensor devices (Fig. 2).
Fig. 2. The process of capturing motions and movements
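One elementary step in turning tracked marker positions into body movement is computing a joint angle from three markers. The sketch below is a generic illustration of that step, not the pipeline used in the project; the marker names and coordinates are hypothetical.

```python
import math


def joint_angle_deg(a, b, c):
    """Angle in degrees at marker b formed by markers a-b-c, each an (x, y, z) tuple."""
    ba = [a[i] - b[i] for i in range(3)]
    bc = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(ba, bc))
    norm = math.sqrt(sum(x * x for x in ba)) * math.sqrt(sum(x * x for x in bc))
    if norm == 0.0:
        raise ValueError("markers a and c must differ from marker b")
    # Clamp to [-1, 1] to guard against rounding errors before acos.
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))


# Hypothetical marker positions (metres) for shoulder, elbow and wrist.
shoulder, elbow, wrist = (0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
print(joint_angle_deg(shoulder, elbow, wrist))
```

Applied frame by frame, such angles give a compact description of a movement that can be transferred to a rigged model.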
4.3 Scene Construction

The five steps of building a model are: image information capture, single scene construction, scene linking, repairing, and mapping. The stage scene of the Automated Marionette VR project is a black stage without scenery or props, complemented by the lighting changes and projection animations required by the plot (Fig. 3).
Fig. 3. Virtual scene construction
4.4 Sound and Image Production for Virtual Reality

After the above materials are prepared, the follow-up procedures include “Transition,” “Stabilization,” “Pacing,” “Rotations,” and “3D Audio/VR Audio” (Fig. 4).
Fig. 4. Still image from test video
5 Conclusions

As a practice-based research project, Hsiao Ho-Wen Project demonstrated multi-perspective horizons in virtual archiving. Firstly, unlike traditional moving images, Hsiao Ho-Wen Project not only gave prominence to imagery elements, but also opened up new possibilities for multiple perspectives and plural images with the assistance of digital technology. Besides, dissimilar to the conventional function of cinematographic machines, this experimental work allowed the spectators to discuss its charm from multiple perspectives and indulge themselves in the immersive sound field. Finally, distinct from the common agenda shared by works of video art, this project granted the spectators preference-based options. It is clear that digital technology has directly impacted the analysis of theater and performance, particularly in terms of presence, documentation, and spectatorship. Digital technologies such as high-resolution imaging, motion capture, and data analysis have not only refreshed the spectatorship and scholarly interpretation of contemporary media and performance, but also influenced art collection methods as well as the plans and roles of art museums. In addition, Hsiao Ho-Wen Project enabled the spectators to admire its “liveness” virtually in different space-time, and invited them to consciously get involved in this
“event” with their bodies and senses. There was no such thing as a “perfect” angle for admiring this work when the spectators entered the realm of this machine-image performance and roamed the venue. To put it another way, the spectators experienced this work in a reality constructed in an abstract and symbolic fashion. They gave feedback, making movement and space meaningful components of their experiences. The synaesthetic experience thus found expression in the abovementioned process. Somatic reactions dominated semantic interpretations in this immersive performance that involved all human senses. Employing the strategy of virtual immersive spectatorship, Hsiao Ho-Wen Project allowed the spectators to associate their memories of viewing with their present admiration, hence the continuation of the somatic-semantic relation, a distinguishing attribute of virtual archiving. To resolve the internal contradiction between the synaesthetic immersion and the heaviness of technological installation that haunts general VR works, this project invited the spectators to take a seat and admire the remade version of Hsiao Ho-Wen Project. This project highlighted the characteristic of database-based performance that transforms “performance” into “archive,” so that people no longer miss any piece they want to see. According to Tara McPherson’s talk “Post-archive: Scholarship in the Digital Era,” we are in the midst of the “post-archive moment,” in which an archive metamorphoses from a collection of objects into a database of them. Hsiao Ho-Wen Project was exactly situated in the dialectical relation in which databases dominate, overwhelm, and replace the text of liveness. This viewpoint coincided with Lev Manovich’s notion of database: a database is “a cultural form of its own.” As a cultural form of the contemporary digital generation, Hsiao Ho-Wen Project represented the world as an inventory and refused to order the items in it.
This project simply demonstrated an aggregate of materials, and left the rest to the spectators’ participatory autonomy. In this project, technological media produced the illusion of presence (liveness). It didn’t really present the body on-site, but vividly showed the human body, objects, and scenes in a way as if they were “present” at that very moment.
Extended Reality in Business-to-Business Sales: An Exploration of Adoption Factors
Heiko Fischer1(B), Sven Seidenstricker1, and Jens Poeppelbuss2
1 Baden-Wuerttemberg Cooperative State University Mosbach Campus Bad Mergentheim, Schloss 2, 97980 Bad Mergentheim, Germany
{heiko.fischer,sven.seidenstricker}@mosbach.dhbw.de
2 Ruhr-Universität Bochum, Universitätsstraße 150, 44801 Bochum, Germany
[email protected]
Abstract. Extended reality (XR) has the potential to change well-established practices in business-to-business (B2B) sales. We investigate in this paper under which circumstances XR can be used in B2B sales and which factors drive or hamper the adoption. For this purpose, we conduct a qualitative survey and present insights into application scenarios of XR. The results illustrate the influences of customer interest, salesperson characteristics, and organizational facilitators on the adoption and use of XR in B2B sales. Keywords: Extended reality · Augmented reality · Virtual reality · B2B · Sales · Technology adoption
1 Introduction A top priority for sales directors in business-to-business (B2B) markets is to maximize revenue and the effectiveness of the sales function. The megatrend of digitalization offers various possibilities to change the way business is done and, therefore, it is seen as an important factor influencing competitive advantage [1]. Digitalization describes the adoption of various technologies in a broader context and leads to a connection between the physical and the digital world [2]. Thus, digital technologies have the potential to disrupt well-established sales practices [3]. One technology that helps to bridge the gap between the physical world and the increasingly important digital world is extended reality (XR) [4]. However, despite the various possibilities for improving sales processes, the use of this technology is still far from becoming mainstream [5]. XR is regarded as relatively new, and companies are still evaluating implementations to unfold its potential and to create added value [6]. Against this backdrop, we investigate in this paper how XR can be successfully adopted in B2B sales and which circumstances have to be considered. To do so, we first give an overview of XR (Sect. 2) and the study design (Sect. 3). Afterwards, we present our findings on the factors that appear to influence the application characteristics and adoption of XR. In particular, we investigate the customers’ attitude towards the use of XR in sales, we show how the characteristics of the salespeople change, which organizational aspects need to be considered when implementing XR, and which application scenarios of XR we were able to identify (Sect. 4). The paper ends with a conclusion where we briefly discuss the results (Sect. 5).
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 T. Ahram et al. (Eds.): IHIET-AI 2021, AISC 1378, pp. 123–130, 2021. https://doi.org/10.1007/978-3-030-74009-2_16
2 Extended Reality and Sales While reality is understood as the actual world that can be experienced with all senses, virtuality consists of imaginary objects that mostly follow the rules of reality [7]. Two technologies that complement or replace reality with virtual elements are augmented reality (AR) and virtual reality (VR). AR adds digital objects to the real environment. For instance, they can be placed between real physical objects and allow a modification of the real environment [8]. VR provides users with a fully virtual environment. The virtual environment is developed in real time with the help of computers and allows the user an immersive experience. To cover both phenomena, the term XR is used, which is said to cover the spectrum from a completely real world to a completely virtual world [9], including both VR and AR [10]. By now, XR has been adopted by the industry for a variety of application areas [8, 9, 11]. For instance, XR can enable new business models, provide digital assistance, or change existing business processes [12]. Moreover, XR can also support B2B sales because it may extend the selling environment and lead to a better shopping experience [13]. XR applications allow for individualizing the customer approach in all steps of the sales cycle, offering new experiences to the customer and considering her/his needs purposefully [14]. Accordingly, XR yields potential benefits for the business [15] since it enhances the transfer of content and the activation of communicative participants [16].
3 Study Design The present study is based on a qualitative survey. We developed a semi-structured interview guideline. Table 1 gives an overview of the participants of the survey. Except for company 4, one expert was interviewed per company. The interviewees can be seen as experts, since they are responsible for XR applications in their respective companies. Accordingly, the companies have already introduced at least one XR application and have already utilized it in a B2B context. The companies are headquartered in Germany but are represented globally through subsidiaries. They take the provider perspective in B2B transactions, i.e., they utilize XR technologies in their sales activities with the intention of selling their products and services to their business customers. We gathered the interview data in the second half of the year 2020. All interviews were audio-recorded and transcribed. We analyzed the transcripts of the interviews following Mayring’s and Kuckartz’s guidelines [17, 18]. We started with the coding of the material after we had transcribed two interviews. First, we read them to get an impression of the content, focusing on adoption factors and circumstances that needed to be considered for the use of XR in B2B sales. We noted similarities between both interviews and coded the first two transcripts with these categories. Where we identified an overlap between categories during codification, we combined them or adjusted the definitions to have a clearer distinction. After we had analyzed the fifth interview, we reviewed
the formulated categories. We realized that there were categories with shared characteristics. Accordingly, we investigated these characteristics and assigned them to main categories. Then, the remaining transcripts followed. This procedure led us to the four main categories: (1) customer interest, (2) salesperson characteristics, (3) organizational facilitators, and (4) XR application characteristics.

Table 1. Characteristics of the participants

ID | Industry | Employees (circa) | Expert’s Domain
1 | Supplier for components and systems | 1,400 | Marketing
2 | Plant construction and engineering | 1,000 | Service engineering
3 | Supplier for components and systems | 1,900 | Marketing
4 | Plant construction and engineering | 2,300 | (1) Project management (2) Product management
5 | Supplier for c-parts | 1,700 | Sales
6 | Services for automotive after market | 800 | IT, R&D
7 | Plant construction and engineering | 1,000 | Business development
8 | Supplier for components and systems | 200 | Product management
9 | Supplier for components and systems | 14,700 | Marketing
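As a small illustration of how the sample in Table 1 can be summarized, the following sketch transcribes the table into a Python structure and derives a few descriptive figures. The names `PARTICIPANTS` and `summarize` are hypothetical and introduced only for this example; the values are taken from Table 1, with the two domains listed for company 4 treated as two experts, as stated in the study design.

```python
from collections import Counter
from statistics import median

# Rows transcribed from Table 1:
# (company ID, industry, approximate employees, one entry per interviewed expert)
PARTICIPANTS = [
    (1, "Supplier for components and systems", 1400, ["Marketing"]),
    (2, "Plant construction and engineering", 1000, ["Service engineering"]),
    (3, "Supplier for components and systems", 1900, ["Marketing"]),
    (4, "Plant construction and engineering", 2300,
     ["Project management", "Product management"]),
    (5, "Supplier for c-parts", 1700, ["Sales"]),
    (6, "Services for automotive after market", 800, ["IT, R&D"]),
    (7, "Plant construction and engineering", 1000, ["Business development"]),
    (8, "Supplier for components and systems", 200, ["Product management"]),
    (9, "Supplier for components and systems", 14700, ["Marketing"]),
]


def summarize(rows):
    """Return simple descriptive statistics for the interview sample."""
    return {
        "companies": len(rows),
        "experts": sum(len(domains) for _, _, _, domains in rows),
        "median_employees": median(employees for _, _, employees, _ in rows),
        "by_industry": Counter(industry for _, industry, _, _ in rows),
    }


if __name__ == "__main__":
    for key, value in summarize(PARTICIPANTS).items():
        print(key, value)
```

A summary of this kind makes the sample composition explicit, for example that the expert count exceeds the company count because company 4 contributed two interviewees.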
4 Findings 4.1 Customer Interest The interviewees mentioned that customers are positively impressed by XR applications. Since the diffusion of XR technology in the market is still low, the technology still tends to be seen as new and innovative. Hence, customers show a high interest in XR applications in a B2B context and are very curious. That is why the experts that we interviewed assess XR as an important magnet for participants at trade fairs and as a trigger for follow-up conversations. Some customers also show skepticism and hesitate to use the application at first sight. However, as soon as they personally experience the application, they tend to show a positive attitude towards XR. Although the experts perceive customers to be highly interested in XR, they consider it an important task to transfer these positive reactions to a further exchange between salespeople and customers. Customers are not
just interested in experiencing XR from a technology perspective but rather want to see the practical contribution of XR for their business. They want to know how XR can also make their processes more efficient and are interested in tangible, practical applications.

4.2 Salesperson Characteristics The interviewees indicated that XR primarily changes the interaction with the customer. While an offer used to be presented in reality, it is now experienced via a new technology. This shift away from reality leads to changes in well-established routines. The customer receives and experiences information more quickly than before. Accordingly, XR can provide valuable insights to the customer but also requires additional skills from the salespeople. They need to be receptive to new technologies and show more flexibility, since XR offers additional experiences to the customer. On the one hand, the salespeople need to know how to effectively use the technology and assess its potential. On the other hand, the salespeople need to be able to communicate this potential to the customer. That is why the salespeople need a good comprehension of the new medium. XR does not change the entire sales process but transfers much of the information that was formerly presented conventionally to the digital world. This shift needs to be managed by the salespeople to offer the customer a valuable XR experience.

4.3 Organizational Facilitators Besides considering the roles of customers and salespeople, we investigated how the adoption of XR technology can be supported in the B2B sales organization. From the responses of our interviewees, we identified the following success criteria:

Success Measurement. The experts described a lack of success metrics, meaning a lack of established indicators that measure the success of XR applications. Consequently, estimating the effort-benefit ratio was considered very difficult. This lack can lead to problems when trying to justify investments in XR applications. However, the experts suggested that XR has a positive influence on sales if the applications deliver additional benefit.

XR Acceptance and Diffusion. When introducing XR in sales, the company must consider the acceptance of the users (salespeople and customers). The experts could not draw a clear picture of whether the user’s age or personal mindset mainly influences the user’s acceptance. However, companies need to consider these circumstances, which can differ from user to user and across industries.

Technical Restrictions. Several technical restrictions can be identified that hinder companies in using XR in an appropriate way. They can be divided into three categories. The first restriction addresses the hardware, in particular the computing power of data glasses, which leads to usage restrictions. This also concerns the mobility of the applications, since some use cases still need an extra computing device to process the virtual environment or require installed sensors to deliver the experience. The second restriction pertains to software issues. Our experts mentioned that they perceived AR glasses in particular to be still in the prototype phase, containing many bugs. Additionally,
the need to download applications and firewall restrictions can hinder customers’ use of XR applications. The third category concerns the internet connection, since remote applications in particular need a continuous internet connection to transmit large volumes of data.

User Familiarity with XR. This factor describes how the user’s personal familiarity influences the use and usefulness of the XR application. The experts mentioned that first-time users can suffer from perceptual disorder, vertigo, or nausea. These phenomena usually disappear after a while and concern VR applications in particular. Moreover, wearers of “normal” glasses can have problems with the focus of the picture presented through the data glasses. Additionally, training is required to move safely in the XR world and have a fully positive experience.

Cross-Platform Application and Compatibility. To fully benefit from XR, it is necessary to design an application that is compatible with more than just one device (e.g., desktop application, browser application, glasses application). This ensures that XR can be used in different cases and at different places. This also leads to the need for standardization of interfaces and application designs.

Develop and Communicate Added Value. The experts emphasized that added value is crucial for successful XR usage. Customers and salespeople need an additional value for their business processes. The XR application needs to solve an existing problem in a better way than existing technologies. Companies must not just transfer the same value via another technology. Moreover, concrete application scenarios need to be shown to point out the added value for customers and sales.

Competences and Resources. The experts stated that designing an XR application can result in huge time and energy expenses. Especially concerning AR applications, experts are rare and the technology is considered to still be in the prototype phase. Whether because of missing skills, the huge effort required to design the application, or the relatively small number of XR applications to be designed, companies usually decide to outsource the development of XR applications.

Data Security. Surprisingly, we did not identify issues concerning data security. Most identified applications are locally contained on a specific device and are not designed for remote data transfer. If data is transferred, then the data is either not sensitive (voice, images) or the same data had already been transferred before an XR application was used. However, experts mentioned that one needs to take care which information is processed in which step of the sales cycle.
4.4 XR Application Characteristics Considering the interests of the customers, the changing role of salespeople and organizational challenges, we were able to identify various application characteristics. We summarize these into the following four main categories, which cover the entire sales process from attracting attention to after sales.
Offer and Enterprise Presentation. These applications are used to present new products to the customer by visualizing them. However, more than just products are displayed; some companies also use the technology to create an immersive experience to present the company as a whole. In general, these applications aim to give the customers a detailed impression of the supplier and its offers.

Project Documentation. The benefit of these applications is that they document the project progress. In particular, they are preferred by companies that construct large and complex machines. In this case, the application serves as a basis for technical discussions and as a reference point for the project status. The application documents the project from the first concept to full implementation.

Training. This use case offers the possibility to train customers in using the product before or after the installation. The supplier can give the customer detailed instructions and thus supports the learning process. However, the application is not limited to training customers in using the product. It also makes a valuable contribution to educating the sales department and other departments of the provider company.

Installation and Maintenance Support. This is a typical application offered in after sales. The supplier gives the customer detailed instructions on how to install or fix the machine. Moreover, the application supports the service team in solving problems on the customer’s machine.
5 Conclusions and Future Research With this study, we provide valuable insights into applying XR in B2B sales. It uncovers a range of use cases and what needs to be considered when developing and introducing them into the market. Generally, higher efficiency along the sales process can be achieved through XR. The experts saw XR as an important trigger to collaborate better with their customers and to actively involve them in the sales process. They also mentioned improvements in knowledge transfer, since the application allows visualizing complex matters. A broader group of people can be involved with XR, and solutions can be identified and assessed more simply. This can lead to temporary competitive advantages for first and early movers, since the diffusion of this technology in the industry is still low. Moreover, the experts indicated that XR can positively influence the image and thus create trustworthiness. The potential benefits of XR can unfold in almost all steps of the sales process. Our study shows that XR is likely to gain a growing role in the sales process across an extended range of application scenarios. The experts predicted that the technology will either replace familiar sales material such as flyers or catalogues or enhance them with digital objects. In particular, attracting customers at trade fairs and services as well as training, installation, and maintenance are seen to benefit from XR applications. However, the results do not show that XR will change the entire sales process or replace salespeople, but rather that it has the potential to increase the effectiveness and efficiency in sales.
Our findings show that the adoption of XR depends on three groups: customers, salespeople, and the overall organization. To successfully adopt XR in B2B sales, companies must consider certain influencing factors. From the interview data, we were not able to rank the relative importance of each factor. However, we saw that the challenges depend on the type of XR application and how it is used to interact with the customer. To successfully design and apply an XR application, the characteristics of both sides – customer and sales – need to be considered. What the experts made especially clear is that the application needs to have a recognizable benefit for both customers and sales. Otherwise, it may attract customers, e.g., at trade fairs, but will not unfold its full potential along the sales process. Salespeople also need to be sensitized to using the technology. Despite the valuable insights, there are certain limitations that need to be considered when interpreting the results. These limitations can provide a starting point for future research. Our findings are based on a qualitative study. We interviewed ten experts from nine companies that present their offers worldwide and apply XR not just in Germany but in an international context. To gain more meaningful results, future research should interview more companies and more than just one expert from each company. Aside from that, the interviewed companies stem mostly from the manufacturing sector, which restricts the transfer of results to other industries. Moreover, the study only probes companies that develop and offer XR applications from the seller’s perspective. To extend the meaningfulness of the findings, customers should also be interviewed about their attitude and expectations towards XR. This way, both viewpoints could be combined to obtain a comprehensive view on applying XR in B2B sales. Another question that should be answered by future research is how the success of XR can be measured. The experts highlighted the importance of measuring the contribution of XR, especially to justify respective technology investments. However, none of the surveyed companies had any kind of success measurement system. The perceived positive contribution of XR to B2B sales was still largely based on subjective evaluations by the experts.
References
1. Rodríguez, R., Svensson, G., Mehl, E.J.: Digitalization process of complex B2B sales processes – enablers and obstacles. Technol. Soc. 62, 101324 (2020)
2. Legner, C., Eymann, T., Hess, T., Matt, C., Böhmann, T., Drews, P., Mädche, A., Urbach, N., Ahlemann, F.: Digitalization: opportunity and challenge for the business and information systems engineering community. Bus. Inf. Syst. Eng. 59, 301–308 (2017)
3. Singh, J., Flaherty, K., Sohi, R.S., Deeter-Schmelz, D., Habel, J., Le Meunier-FitzHugh, K., Malshe, A., Mullins, R., Onyemah, V.: Sales profession and professionals in the age of digitization and artificial intelligence technologies: concepts, priorities, and questions. J. Pers. Selling Sales Manag. 39, 2–22 (2019)
4. Masood, T., Egger, J.: Augmented reality in support of Industry 4.0 – implementation challenges and success factors. Robot. Comput.-Integr. Manuf. 58, 181–195 (2019)
5. Steffen, J.H., Gaskin, J.E., Meservy, T.O., Jenkins, J.L., Wolman, I.: Framework of affordances for virtual reality and augmented reality. J. Manag. Inf. Syst. 36, 683–729 (2019)
6. de Regt, A., Barnes, S.J., Plangger, K.: The virtual reality value chain. Bus. Horiz. 63, 737–748 (2020)
7. Farshid, M., Paschen, J., Eriksson, T., Kietzmann, J.: Go boldly!: explore augmented reality (AR), virtual reality (VR), and mixed reality (MR) for business. Bus. Horiz. 61, 657–663 (2018)
8. Gallardo, C., Rodriguez, S.P., Chango, I.E., Quevedo, W.X., Santana, J., Acosta, A.G., Tapia, J.C., Andaluz, V.H.: Augmented reality as a new marketing strategy. In: de Paolis, L.T., Bourdot, P. (eds.) Augmented Reality, Virtual Reality, and Computer Graphics. 5th International Conference, AVR 2018, Otranto, Italy, 24–27 June 2018, Proceedings. LNCS Sublibrary. SL 6, Image Processing, Computer Vision, Pattern Recognition, and Graphics, vol. 10851, pp. 351–362. Springer, Cham (2018)
9. Hariharan, A., Pfaff, N., Manz, F., Raab, F., Felic, A., Kozsir, T.: Enhancing product configuration and sales processes with extended reality. In: Jung, T., tom Dieck, M.C., Rauschnabel, P.A. (eds.) Augmented Reality and Virtual Reality. Changing Realities in a Dynamic World, pp. 37–50. Springer, Cham (2020)
10. Chuah, S.H.-W.: Why and Who Will Adopt Extended Reality Technology? Literature Review, Synthesis, and Future Research Agenda (working paper). SSRN Electronic Journal (2018)
11. Berg, L.P., Vance, J.M.: Industry use of virtual reality in product design and manufacturing: a survey. Virtual Reality 21, 1–7 (2017)
12. Hagl, R., Duane, A.: Exploring how augmented reality and virtual reality technologies impact business model innovation in technology companies in Germany. In: Jung, T., tom Dieck, M.C., Rauschnabel, P.A. (eds.) Augmented Reality and Virtual Reality. Changing Realities in a Dynamic World, pp. 75–84. Springer, Cham (2020)
13. Bonetti, F., Warnaby, G., Quinn, L.: Augmented reality and virtual reality in physical and online retailing: a review, synthesis and research agenda. In: Jung, T., tom Dieck, M.C. (eds.) Augmented Reality and Virtual Reality. Empowering Human, Place and Business. Progress in IS, pp. 119–132. Springer, Cham (2018)
14. Gieselmann, C., Gremmer, E.: Wie digitale Innovationen den stationären Kaufprozess revolutionieren - Mögliche Antworten auf den Online-Trend. In: Keuper, F., Schomann, M., Sikora, L.I. (eds.) Homo Connectus. Einblicke in die Post-Solo-Ära des Kunden, pp. 431–452. Springer Gabler, Wiesbaden (2018)
15. Tredinnick, L.: Virtual realities in the business world. Bus. Inf. Rev. 35, 39–42 (2018)
16. Mahony, S.O’.: A proposed model for the approach to augmented reality deployment in marketing communications. Procedia – Soc. Behav. Sci. 175, 227–235 (2015)
17. Mayring, P.: Qualitative Inhaltsanalyse. Grundlagen und Techniken, 12th edn. Beltz Verlag, Weinheim, Basel (2015)
18. Kuckartz, U.: Qualitative Inhaltsanalyse. Methoden, Praxis, Computerunterstützung, 4th edn. Grundlagentexte Methoden. Beltz Juventa, Weinheim, Basel (2018)
Towards Augmented Reality-Based Remote Family Visits in Nursing Homes
Eva Abels1,2, Alexander Toet1(B), Audrey van der Weerden3, Bram Smeets3, Tessa Klunder4, and Hans Stokking4
1 TNO Soesterberg, Kampweg 55, 3769 DE Soesterberg, The Netherlands
{eva.abels,lex.toet}@tno.nl
2 University of Amsterdam, Science Park 904, 1098 XH Amsterdam, The Netherlands
3 MeanderGroep Zuid-Limburg, Minckelersstraat 2, 6372 PP Landgraaf, The Netherlands
{audreyvanderweerden,bramsmeets}@mgzl.nl
4 TNO New Babylon, Anna van Buerenplein 1, 2595 DA Den Haag, The Netherlands
{tessa.klunder,hans.stokking}@tno.nl
Abstract. Family visiting restrictions in nursing homes due to COVID-19-related measures have a major impact on elderly and their families. As an alternative communication means, TNO is developing an augmented reality (AR)-based solution to realize high-quality virtual social contact. To investigate its suitability for remote family visits in nursing homes, the AR-based solution will be compared to regular video calling in a user study involving elderly and their family members. Based on focus groups with elderly, family and caretakers, user experience (UX) indicators have been established to evaluate these virtual family visits, of which social presence was the most prominent. Remote family visits via AR-based and regular video calling are expected to result in different UX. It is hypothesized that participants will report the highest levels of social presence in the AR condition. If AR-based video calling is indeed preferred, TNO will continue and upscale the development of this technology. Keywords: Augmented reality · Social XR · User experience · Social presence · Spatial presence · Elderly
1 Introduction Due to the recent COVID-19 pandemic, stringent safety measures have been enforced in nursing homes and these visitation restrictions have a major impact on the elderly residents and their families [1]. Virtual visits via XR (extended reality, which includes virtual, mixed and augmented reality (VR, MR and AR, respectively)) communication systems offer a highly suitable solution by enabling users who are physically situated in different locations to interact with digital representations of each other in virtual environments (VEs) [2]. As an alternative communication means during and beyond the COVID-19 pandemic, TNO is developing a social XR application based on AR to realize high-quality virtual contact between the elderly and their family members. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 T. Ahram et al. (Eds.): IHIET-AI 2021, AISC 1378, pp. 131–137, 2021. https://doi.org/10.1007/978-3-030-74009-2_17
To successfully design a social XR application, it is essential to gain insight into its user experience (UX) [3]. For VEs, UX research has been primarily focused on the technology’s capability to promote a sense of (spatial) presence (“being there” [4, 5]) and social presence (“being together and having an intellectual/affective connection” [4, 6, 7]). These subjective experiences are a critical aspect of virtual interactions [7] and measures of presence have been used as quality parameters for (comparisons between) different communication systems [8]. However, to be able to assess the UX of TNO’s newly developed AR application, two issues need to be resolved: (1) while the concepts of spatial and social presence have been investigated extensively in VR environments [7], less is known in the context of AR [9] and it is unclear to what extent the findings of (social) interaction studies in VR translate to AR; and (2) currently, no UX methodology exists to evaluate social interactions in AR [2]. With respect to the first issue, some technical differences between these XR alternatives must be considered. VR environments typically replace the real world with a virtual one through immersion of the user’s senses [2]. However, AR environments are unique in the sense that real and virtual elements co-exist by supplementing the real world with virtual objects, which the user can interact with in real-time [2, 9]. Consequently, this leads to different interpretations of spatial presence. In VR environments, spatial presence refers to the user’s perception of physically being in the VE [4, 7], whereas in AR, the virtual objects become part of the user’s environment such that it seems as if they really belong to the physical world [10]. As this research involves an AR solution, the appropriate definition of spatial presence for AR environments is maintained. 
As for the assessment of the quality of social AR experiences, methods developed in VR research can be adopted as a starting point and may be adapted to make them suitable for applications in AR. Moreover, as TNO’s AR solution will be applied in nursing homes, the involved user groups consist of the elderly, their families and caretakers, requiring evaluation methodology tailored to these specific users. Therefore, focus group sessions were conducted with these user groups, with content partially based on VR research. These sessions resulted in the formulation of three UX indicators to evaluate social AR experiences in this study: spatial presence, social presence and enjoyment. In this research project, TNO’s AR-based video calling solution will be compared to regular video calling, to investigate its suitability for remote family visits in nursing homes. This project is part of a collaboration between TNO and MeanderGroep, a healthcare provider in Zuid-Limburg, the Netherlands.
2 Method 2.1 Participants and Procedure This study will include 16 pairs of participants, each pair consisting of one elderly resident and one family member. The evaluation of the technology is structured in a within-subjects design, such that each pair will undergo the two visiting conditions (i.e., AR-based and regular video calling) in counterbalanced order within a single testing day. During the virtual visits, which will last about 20 min each, the resident will be seated in
Towards Augmented Reality-Based Remote Family Visits
a room within their familiar living environment (a MeanderGroep nursing home) and the family member will be located in another room at the same location. After each virtual visit, the experiences of the elderly resident and the family member will be evaluated. 2.2 Technical Setup The general architecture of the AR system is shown in Fig. 1. The resident sits in front of the AR-tablet (iPad Pro) that is positioned vertically on a stand, in between the resident and an empty chair. The elderly resident is filmed using a Logitech webcam that is attached on top of the iPad Pro. When looking at the screen, the resident sees the family member projected onto the chair, so that the family member appears to be physically present and sitting opposite the resident (Fig. 2). Two computers are used to connect the technology on both sides via the internet, to deliver the video frames from the elderly to the family member as well as the audio signals in both directions. At the family side, the family member sits in front of a 43-inch TV screen on which the resident is displayed. To record the family member, a three-dimensional (3D) RGB-D camera (Azure Kinect) is placed in front of the TV, so that the family member can be recorded and then virtually displayed in the resident’s environment as a photorealistic live capture, using highly detailed point clouds. A separate local area network (LAN) is used for video and audio data transmission, ensuring a high-functioning connection. Depth information is required to create a 3D point cloud from a 2D image. The color space (how colors are stored) and bit-depth (how much information is stored per pixel) greatly influence the quality of the depth image.
To project the depth information on a video frame, the Kinect depth capture of the family member is processed in several steps, including a foreground-background removal, mapping and stitching of color and depth, conversion into HSV (hue, saturation, value; an alternative to the RGB color model), and erosion and dilation to remove noise and smooth edges. A Web Real-Time Communication (WebRTC) module transmits the frames to the elderly’s side where they are received by the WebRTC reception module. Then, the frames are rendered in AR on the iPad, such that the 3D depth content of the family member is presented at the side of the elderly.
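The erosion and dilation step can be illustrated with a minimal sketch. It assumes a binary foreground mask and a 3x3 structuring element; the actual kernel size, library and implementation used in the system are not stated here, and the function names are hypothetical.

```python
from typing import List

Mask = List[List[int]]  # 1 = foreground (person), 0 = background


def _morph(mask: Mask, require_all: bool) -> Mask:
    """Apply a 3x3 erosion (require_all=True) or dilation (require_all=False)."""
    height, width = len(mask), len(mask[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # 3x3 neighbourhood; pixels outside the image count as background.
            window = [
                mask[ny][nx] if 0 <= ny < height and 0 <= nx < width else 0
                for ny in (y - 1, y, y + 1)
                for nx in (x - 1, x, x + 1)
            ]
            out[y][x] = int(all(window)) if require_all else int(any(window))
    return out


def erode(mask: Mask) -> Mask:
    """Keep a pixel only if its whole 3x3 neighbourhood is foreground."""
    return _morph(mask, require_all=True)


def dilate(mask: Mask) -> Mask:
    """Set a pixel if any pixel in its 3x3 neighbourhood is foreground."""
    return _morph(mask, require_all=False)


def open_mask(mask: Mask) -> Mask:
    """Erosion followed by dilation: removes specks smaller than the kernel."""
    return dilate(erode(mask))
```

Applying erosion before dilation (a morphological opening) removes isolated noise pixels while restoring the original extent of larger regions; swapping the order would instead fill small holes.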
Fig. 1. Overview of the video communication system with AR between the elderly (Person A) and the visiting family member (Person B). The elderly views the family member projected onto a chair in AR on an iPad Pro, while the visitor views the elderly on a 43-inch TV screen.
In the regular video calling condition, a live video connection will be established using Microsoft Teams, an online video conferencing software program. The same setup will be used as during AR video calling: the elderly sits in front of the iPad where the
E. Abels et al.
family member is now presented in 2D and is filmed by the iPad front camera. The family member views the elderly on the TV and is filmed by the Kinect camera. In both the AR and regular video calling conditions, the same screens will be used in the same orientations, and the same Jabra 750 speakerphones will be used to provide comparable audio quality and echo cancellation. In the regular video calling condition, the self-view will be covered, since the AR solution also lacks a self-view. These steps are taken to ensure that the reported evaluation pertains only to the experience and not to technical differences between the two viewing modes.
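The counterbalanced order of the two conditions described in Sect. 2.1 could be assigned along the following lines. This is only a sketch: the function name, the seed, and the random allocation of pairs to orders are assumptions, not the procedure actually used in the study.

```python
import random
from typing import Dict, Tuple

CONDITIONS = ("AR-based video calling", "regular video calling")


def counterbalanced_orders(n_pairs: int, seed: int = 0) -> Dict[int, Tuple[str, str]]:
    """Give half of the pairs AR first and the other half regular calling first."""
    if n_pairs % 2:
        raise ValueError("an even number of pairs is needed for full counterbalancing")
    half = n_pairs // 2
    orders = [CONDITIONS] * half + [CONDITIONS[::-1]] * half
    # Shuffle with a fixed seed so that the allocation is reproducible.
    random.Random(seed).shuffle(orders)
    return {pair_id: order for pair_id, order in enumerate(orders, start=1)}
```

With 16 pairs, eight would start with the AR condition and eight with regular video calling, so that order effects are balanced across the sample.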
Fig. 2. Example of AR-based video calling: the elderly (left) sees the family member (right) projected onto the chair when looking through the AR-enabled screen of the iPad Pro.
2.3 Evaluation Methodology The evaluation with the elderly residents will consist of observations and a short semi-structured interview. A MeanderGroep caretaker will observe (changes in) the elderly’s behavior based on an adapted version of the Music in Dementia Assessment Scales (MiDAS; which assess levels of interest, response, initiation, involvement and enjoyment [11]) before, during and after the virtual visits. The interview will take place directly after each visit, in which the resident is asked about his/her experiences based on the UX indicators of spatial presence, social presence and enjoyment. The family members will fill in a short questionnaire on the three UX indicators and a Dutch adaptation of the Networked Minds questionnaire (NMQ) [6]. This questionnaire was selected because it examines self-perception as well as perception of the other with regard to (social) presence and psycho-behavioral interaction. 2.4 Statistical Analysis Since the elderly’s experiences are prioritized, the primary outcome measures are the MiDAS observations and UX indicator scores. The family member’s experiences, examined by the UX indicators and NMQ, will serve as secondary measures.
A significance level of 0.05 will be used for all hypothesis tests. Data analyses will be performed using the statistical program SPSS. A repeated-measures analysis of variance (ANOVA) will be used due to the within-subjects design of this study.
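With only two conditions, the F statistic of a repeated-measures ANOVA equals the square of the paired t statistic, with (1, n − 1) degrees of freedom. The sketch below illustrates this relationship; the scores in the usage note are hypothetical, and the analysis itself is planned in SPSS, not with this code.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t(cond_a: Sequence[float], cond_b: Sequence[float]) -> Tuple[float, int]:
    """Return the paired t statistic and its degrees of freedom."""
    if len(cond_a) != len(cond_b) or len(cond_a) < 2:
        raise ValueError("need two equally long samples with at least two scores")
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    # Mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


def rm_anova_f_two_levels(cond_a: Sequence[float], cond_b: Sequence[float]) -> Tuple[float, int, int]:
    """F statistic of a two-level within-subjects factor, computed as t squared."""
    t, df = paired_t(cond_a, cond_b)
    return t * t, 1, df
```

For example, hypothetical enjoyment scores of `[4, 5, 3, 5]` (AR) and `[3, 3, 3, 4]` (regular) from four residents give differences of 1, 2, 0 and 1; the resulting F would then be compared against the critical value at the 0.05 level.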
3 Expected Results The elderly subjects are likely to experience differences in UX between family visits via AR-based and regular video calling. It is hypothesized that AR-based video calling will be evaluated more positively than regular video calling in terms of spatial presence, social presence and enjoyment (i.e., the three UX indicators). As for the observations, higher levels of interest, response, initiation, involvement and enjoyment are expected during AR-based than during regular video calling. The experiences of the family members are hypothesized to be similar between conditions, as the conditions involve an identical setup. Therefore, no differences are expected in UX indicator scores and on the self-perception items of the NMQ. However, the items on the perception of the other might differ, such that higher levels of (social) presence and psycho-behavioral interaction are reported in the AR condition than in the regular condition, as the mode of communication between conditions did differ on the side of the other (that is, the elderly).
4 Discussion In this study, an AR-based video calling solution will be compared to regular video calling, as an alternative means of communication intended to afford life-like remote family visits in nursing homes during and beyond the COVID-19 pandemic. The experiences of the elderly residents and the family members will be assessed based on UX indicators of spatial presence, social presence and enjoyment, which were derived from focus group sessions. The hypothesis that the elderly will evaluate AR-based video calling more positively than regular video calling is in accordance with earlier research where AR-based and 2D display-based systems were compared in a within-subjects design [12]. The AR-based condition received higher ratings in terms of spatial and social presence as compared to the 2D system. However, this experiment was conducted with different user groups with a task-based setup and involved AR-based communication for both conversation partners. The main advantage of this study is that the characteristics of the elderly population have been taken into account while developing the application and the evaluation methodology. This leads to a design aimed at providing a comfortable, positive and pleasant experience where social interaction and connectedness are promoted, which is in conformity with design principles for social VEs [2]. Additionally, this study could contribute to the development of a UX evaluation method for social interactions in AR, which can be used to assess and compare the successfulness of such social communication technologies. The limitations of the AR system include screen size and bandwidth. Due to the iPad’s screen size, the family member’s facial expressions are relatively small. This
could have been improved by zooming in, but was not tested because of time and budget constraints. Also, the WebRTC bandwidth is limited and could be improved, yet by considering expert opinions on video quality this was still deemed sufficient. Furthermore, the generalizability of the findings to the general population is limited, as the emphasis is on the experiences of the elderly residents. Lastly, this study only involves a one-time experience of the technology and the outcomes might not necessarily correspond to long(er)-term effects, which should be explored by future studies. The elderly’s reported differences in UX between family visits via AR-based and regular video calling will provide insight into the differential experience of these communication methods and into the particular aspects of AR that are crucial to designate it as the favorable system. If AR-based video calling is evaluated more positively, TNO will continue developing this technology. If the elderly report no differences in UX between family visits via AR-based and regular video calling, or if AR-based video calling is rated more negatively, then TNO will discontinue the development of the AR-based solution for nursing homes. Alternatively, this study could reveal that certain aspects of the technology need to be improved for the product to become satisfactory. Acknowledgments. This project was partially funded by the ERP ‘Social eXtended Reality’.
References 1. Noone, C., McSharry, J., Smalle, M., Burns, A., Dwan, K., Devane, D., Morrissey, E.C.: Video calls for reducing social isolation and loneliness in older people: a rapid review. Cochrane Database Syst. Rev. 5 (2020) 2. Lee, L.N., Kim, M.J., Hwang, W.J.: Potential of augmented reality and virtual reality technologies to promote wellbeing in older adults. Appl. Sci. 9, 3356 (2019) 3. Nedopil, C., Schauber, C., Glende, S.: Guideline the Art and Joy of User Integration in AAL Projects (2013) 4. Biocca, F., Harms, C., Gregg, J.: The networked minds measure of social presence: pilot test of the factor structure and concurrent validity. In: 4th Annual International Workshop on Presence, pp. 1–9 (2001) 5. Slater, M., Wilbur, S: A framework for immersive virtual environments (five): speculations on the role of presence in virtual environments. Presence: Teleoper. Virtual Environ. 6, 603–616 (1997) 6. Harms, C., Biocca, F.: Internal consistency and reliability of the networked minds measure of social presence. In: Seventh Annual International Workshop: Presence 2004, pp. 246–251 (2004) 7. Oh, C.S., Bailenson, J.N., Welch, G.F.: A systematic review of social presence: definition, antecedents, and implications. Front. Robot. AI 5, 1–35 8. Grassini, S., Laumann, K.: Questionnaire measures and physiological correlates of presence: a systematic review. Front. Psychol. 11, 1–21 (2020) 9. Miller, M.R., Jun, H., Herrera, F., et al.: Social interaction in augmented reality. PLoS ONE 14, 1–26 (2019) 10. Smink, A.R., van Reijmersdal, E.A., van Noort, G., Neijens, P.C.: Shopping in augmented reality: the effects of spatial presence, personalization and intrusiveness on app and brand responses. J. Bus. Res. 118, 474–485 (2020)
11. McDermott, O., Orrell, M., Ridder, H.M.: The development of music in dementia assessment scales (MiDAS). Nord. J. Music Ther. 24(3), 232–251 (2015) 12. Kim, J.I., Ha, T., Woo, W., Shi, C.-K.: Enhancing social presence in augmented reality-based telecommunication system. In: International Conference on Virtual, Augmented and Mixed Reality, pp. 359–367. Springer, Heidelberg (2013)
Virtual Reality Environment as a Developer of Working Competences Jorge A. González-Mendívil(B) , Miguel X. Rodríguez-Paz, and Israel Zamora-Hernández Tecnológico de Monterrey, EICl, Vía Atlixcayoyl 5718, 72453 Puebla, Mexico [email protected]
Abstract. Having the ability to do something efficiently and successfully is essential to any worker in any discipline, but who is to say whether a worker has developed a competence: only the instructor, the worker, or both? Take a driving school as an example: who is to say that a student is competent to drive? If the student is going to drive, we had better be sure that not only the instructor but also the student feels competent enough to do so; otherwise, an accident could happen. In this paper, we present the self-analysis of a group of 30 students from three different disciplines (IT, Art Design, and Industrial Engineering) on the development of the collaborative multidisciplinary work, problem analysis, and problem-solving competences. Along with this, we present a comparison with the professors’ evaluation of those same competences. Both gave substantial evidence that high development levels were achieved. Keywords: Virtual reality · Competence development · Educational innovation
1 Introduction Software engineers excel at “how” to do something because, during their academic years, they are taught how to develop a solution for a problem that needs to be solved using software engineering. If better software for inventory management needs to be programmed, they have all the software engineering competences needed to make this possible; they make things happen in terms of information technologies. One competence that software engineers have to develop is collaborative multidisciplinary work. This is because, as good as they are at “how” to make something possible, they fall somewhat short on “what” needs to be done. Most of the time, the problems that need to be solved are in Business Management, Strategic Planning, Operations Research, etc., and because they are not taught in those areas, they will need to be able to understand the most important concepts, methodologies and mathematical models of those specialties to provide better and more robust solutions that adapt to real situations [1]. Another important educational issue to be addressed is the time given to develop a task. In many educational systems, students are given a semester to develop what some call a “Final Project”, and most students tend to think that they will have four to six months of work to finish every project that they will work on in the future, which is not what happens in real life. It would be important to put them to the test of developing a solution to a problem in much less time, perhaps a week.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 T. Ahram et al. (Eds.): IHIET-AI 2021, AISC 1378, pp. 138–145, 2021. https://doi.org/10.1007/978-3-030-74009-2_18
2 Competences Development Assuring and measuring how well a competence is developed through academic activities is a great challenge in today’s educational work. It is important that the person who is developing a competence acknowledges its development [2]; this is as important as its evaluation from the perspective of a teacher. Multidisciplinarity is vital to developing a problem-solving competence because, most of the time, our specific discipline is not enough to give a complete vision of a problem and its possible solution. A single student cannot master all the disciplines that a problematic situation demands, but he can develop the competence of recognizing the disciplines that are necessary for the solution of a problem and the responsibility of each discipline in that solution. In this manner, students are prepared to interact with other disciplines and develop competences that allow them to communicate and work properly toward common objectives [3]. The recognition of competences makes the distribution of tasks and the structured division of work more efficient, because every student’s ability and knowledge is used for the intended purpose and all of them interact in order to accomplish the main goal, which is solving a problem. Finally, a student must be able to interact with other disciplines because otherwise it would not be a real collaborative effort. This ability to communicate with disciplines that are not part of his curriculum prepares students for working in the real world, where organizations have many disciplines that must interact to fulfill a common goal: the strategic objectives of organizations, which will always seek to be more productive and, in the long run, more competitive.
The measurement of competence, as difficult as it may be, can be accomplished in many ways. What we need to understand is that a competence is a human capacity, in the form of different knowledge, skills, thoughts, character, and values, that in an integral way and through interactions accomplishes common solutions. A project, defined systemically as a set of tasks that interact with the purpose of solving a real problematic situation, can be used to measure competences because, depending on the situation under study, collaborative multidisciplinary work can not only solve a problem but also foster the development of such skills.
3 Virtual Reality as a Competence Developer In the recent past, virtual environments (ambiences) have been used for operator training [4]. All these efforts have had to consider and comply with the three illusions that a user of such an environment needs to believe for the environment to work properly.
The first of them is the place illusion: this illusion refers to the user’s feeling of being somewhere else. This virtual place can be a real or a fantasy place. In both cases, if the user feels transported to this other location, the illusion of place has been successful. The second is the illusion of plausibility: this illusion refers to how much the user believes that the events that happen on the virtual stage are really happening. This illusion can be broken if those events do not seem realistic or natural. It is possible to improve the illusion of plausibility by designing interactive behaviors in the scene [5]. The third and last of these illusions is the illusion of embodiment: this illusion refers to the user’s feeling of being in a virtual body. This is usually achieved by properly placing an avatar under the camera and showing it when the user turns his head down; for example, with an Oculus Rift the user can see his hands [6]. In contrast, mobile VR HMDs do not offer this possibility and, therefore, allow a lower level of embodiment illusion.
4 Methodology A group of students was presented with the challenge of developing a 3D virtual interactive environment where the user would learn a “simple” task. The task had two requisites. The first was to include the three movements defined by ergonomics theory, which are present in most of the tasks performed in any production process: General, Controlled, and Tool usage. The second was to accomplish the three illusions that any virtual environment should have: Place, Plausibility, and Embodiment. The students selected the changing of a flat tire as the task, and they then had one week to develop the environment. Each discipline in the group had one main responsibility: Industrial Engineers were responsible for the ergonomic design of the environment; Art Designers were responsible for designing the elements of the environment, the tools needed, and the place and plausibility illusions; finally, the IT Engineers were responsible for the embodiment illusion, as the environment had to be interactive. Once the students had developed the environment, they had to self-analyze their own competence development from the perspective of the 3D interactive virtual environment they had designed. They were asked to answer a questionnaire about how well they recognized having developed each competence, selecting a level of development for each. The level of development was also evaluated qualitatively, when the students presented their solution, by a group of teachers from the University who coordinated the activity. To evaluate the competences, the students were asked to rate the level of development that they recognized having achieved during the design of the 3D virtual environment. There were four questions that they were asked to answer: 1. What level of development of the collaborative work competence do you think you have achieved as a result of your participation in this challenge? 2. What level of development of the problem-solving competence do you consider to have achieved as a result of your participation in this challenge?
3. How much did this challenge help you improve your ability to improve processes? 4. How much did this challenge help you improve your ability to analyze processes? There were four evaluation levels for each competence: Basic, Medium, High, and Null. The basic level means that the student recognizes having a primary level: he has the elements of the competence but still has a long way to go before recognizing that he has fully developed it, and he feels a high degree of insecurity in both knowledge and skills. According to González-Mendívil et al. [8]: “The average level means that the student recognizes that he is in good progress in the development of the competition but that he is missing as much as what he has developed to complete the mastery of it. Insecurity in knowledge and skills on the part of students exist but to a much lesser degree than at the basic level.” “The high-level means that the student recognizes that the competition is fully developed and that he feels confident in demonstrating his knowledge and skills when putting this competence into practice for the benefit of a problematic situation [7]. Finally, the null level indicates that the students do not recognize having developed anything with respect to the competition, their insecurity is total and basically it is expressed that no knowledge and/or skill with respect to it was achieved.”
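The four-level scale lends itself to a simple tally of the answers to each question. The sketch below is only an illustration of how the share of respondents per level, and the combined share from medium to high, could be computed; the function names and the example answers are hypothetical and are not the study's data.

```python
from collections import Counter
from typing import Dict, Sequence

LEVELS = ("Null", "Basic", "Medium", "High")


def level_shares(responses: Sequence[str]) -> Dict[str, float]:
    """Return the percentage of responses at each evaluation level."""
    if not responses:
        raise ValueError("no responses to tally")
    counts = Counter(responses)
    unknown = set(counts) - set(LEVELS)
    if unknown:
        raise ValueError(f"unknown levels: {sorted(unknown)}")
    total = len(responses)
    return {level: 100.0 * counts[level] / total for level in LEVELS}


def medium_to_high_share(responses: Sequence[str]) -> float:
    """Percentage of responses that report a medium or high level."""
    shares = level_shares(responses)
    return shares["Medium"] + shares["High"]
```

For instance, ten hypothetical answers with six "High", three "Medium" and one "Basic" would give a medium-to-high share of 90%.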
5 Results Once the students and instructors had evaluated the competences to be developed in the challenge, we obtained the following results, which show the opinion of both parties. 1. What level of development of the collaborative work competence do you think you have achieved as a result of your participation in this challenge?
Fig. 1. Analysis of collaborative work competence.
As Fig. 1 shows, most of the students and instructors recognize a high development of the collaborative work competence. This would mean that both parties recognize that the competence is fully developed and that they feel confident to demonstrate their knowledge and skills. Adding to this analysis, it is worth noting that more than 80% of students and instructors evaluated the competence development at medium to high levels. 2. What level of development of the problem-solving competence do you consider to have achieved as a result of your participation in this challenge? (Fig. 2).
Fig. 2. Analysis of the problem solution competence.
As before, a large number of students and instructors declare that the competence is fully developed, because they recognize a high level of development of the problem-solving competence. 3. How much did this challenge help you improve your ability to improve processes? (Fig. 3). We can observe that more than 90% of the students and instructors declare that their development level of the competence is high; we can also see that some 40% of them say that their development level is high.
Fig. 3. Analysis of the process improvement competence.
4. How much did this challenge help you improve your ability to analyze processes? (Fig. 4).
Fig. 4. Analysis of the process analysis competence.
In this case, more than 90% of students and instructors recognize a medium to high level of development of the process analysis competence. 5. What level of contribution did the multidisciplinary work have in achieving the result of this project? (Fig. 5). Finally, we can see enough evidence to say that multidisciplinary work made a medium to high contribution to achieving the objective of the project, with a 93% evaluation.
Fig. 5. Analysis of the contribution of the multidisciplinary work.
In general, the results indicate that both instructors and students acknowledge a high level of development of all competences. This proves that challenges and projects are an excellent way to help not only students but also instructors and people in general measure and acknowledge the development of competences and its level.
6 Conclusions The results mainly establish a high development of the competences evaluated by instructors and students. This proves that a competence must be evaluated not only by instructors but also by those who develop the competence; perhaps not in terms of a grade, but their self-evaluation can be considered a complement to the instructor’s evaluation. Another aspect identified is the recognition by both parties that the multidisciplinary approach was a key factor in achieving the project goal. Finally, recognizing that you have a competence is as important as the evaluation of an expert: it is not enough that other people say that you have a certain level of development; it is necessary that you acknowledge such development. Acknowledgments. Authors would like to acknowledge the financial support of Writing Lab, TecLabs, Tecnologico de Monterrey, Mexico, in the production of this work.
References 1. Thomas, B., Maria, T., Thomas, R., Erwin, G., Jörg, S.: Requierements for designing a cyberphysical system for competence development. In: 8th Conference on Learning Factories 2018 – Advanced Engineering Education & Training for Manufacturing Innovation (2018) 2. Karvounidis, T., Chimos, K., Bersimis, S., Douligeris, C.: Evaluating web 2.0 technologies. J. Comput. Assist. Learn. 30(6), 577–596 (2014). https://doi.org/10.1111/jcal.12069
3. Manca, S., Ranieri, M.: Facebook and the others. Potentials and obstacles of Social Media for teaching in higher education. Comput. Educ. 95, 216–230 (2016). https://doi.org/10.1016/j.compedu.2016.01.012. https://www.sciencedirect.com/science/article/pii/S0360131516300185 4. Manca, S., Ranieri, M.: “Yes for sharing, no for teaching!”: social media in academic practices. Internet High. Educ. 29, 63–74 (2016). https://doi.org/10.1016/j.iheduc.2015.12.004. https://www.sciencedirect.com/science/article/pii/S1096751615300105 5. Moghavvemi, S., Paramanathan, T., Rahin, N.M., Sharabati, M.: Student’s perceptions towards using e-learning via Facebook. Behav. Inf. Technol. 36(10), 1081–1100 (2017). https://doi.org/10.1080/0144929X.2017.1347201 6. Rodriguez-Paz, M., Gonzalez-Mendivil, J., Zarate-Garcia, J., Zamora Hernandez, I., Nolazco-Flores, J.: A hybrid flipped-learning model and a new learning-space to improve the performance of students in Structural Mechanics courses. In: IEEE Global Engineering Education Conference, EDUCON, vol. 2020-April, Article number 9125385, pp. 698–703, April 2020. 11th IEEE Global Engineering Education Conference, EDUCON 2020, Porto, Portugal, 27 April 2020–30 April 2020; Category number CFP20EDU-ART; Code 161475 (2020) 7. González-Mendívil, J., Rodriguez-Paz, M., Reyes-Zarate, G.: Virtual reality as a factor to improve productivity in learning processes. In: Advances in Intelligent Systems and Computing, AISC 2020, vol. 1217, pp. 762–768. AHFE Virtual Conference on Usability and User Experience, the Virtual Conference on Human Factors and Assistive Technology, the Virtual Conference on Human Factors and Wearable Technologies, and the Virtual Conference on Virtual Environments and Game Design, 2020; San Diego; United States; 16–20 July 2020; Code 241849 (2020) 8.
González-Mendívil, J.A., Rodríguez-Paz, M.X., Caballero-Montes, E., Garay-Rondero, C.L., Zamora-Hernández, I.: Measuring the developing of competences with collaborative interdisciplinary work. In: 2019 IEEE Global Engineering Education Conference (EDUCON), Dubai, United Arab Emirates, pp. 419–423 (2019). https://doi.org/10.1109/EDUCON.2019.8725163
Digital Poetry Circuit: Methodology for Cultural Applications with AR in Public Spaces Vladimir Barros(B) , Eduardo Oliveira, and Luiz F. de Araújo CESAR School, Av. Cais do Apolo 177, Recife Antigo, Recife, Pernambuco, Brazil {vbs,ejgo,lfaa}@cesar.org.br
Abstract. Culture brings knowledge, leisure, and pleasure to society through various forms of manifestation. Both poetry and visual arts enchant us with their form and with the recovery of our traditions. In this sense, new technologies have great potential to bring us closer to culture, facilitating access and reach so that we can rescue authors, artists, and our cultural memory. This article seeks to present a methodology to connect culture, technology, and design through an artifact that interacts with public environments and their monuments. UX, UCD, and Augmented Reality processes were used, in addition to studies on urban spaces, to investigate the relationship of users with the public environment with which they identify, with the aim of providing an improvement in the cultural experience of the city of Recife. Keywords: User-Centered Design · Augmented Reality · Culture · Public spaces
1 Introduction Each year the digital landscape changes according to current technology. Ogusko [1] explains that cycles change over time: computers in the 1980s, the internet in the 1990s, and smartphones in the 2000s. He believes that the next step is the increasingly prominent use of Immersive Realities (IR). This technology allows the world to be experienced in new ways, transforming the way in which interaction takes place in the most diverse areas such as health, education, real estate, business, entertainment, and culture. Technology grows every year and presents several innovations introduced in daily life, changing the way of working, communication, leisure, and the way we think. According to Tori and Hounsell [2], these innovations are developed to make the routine easier or more productive, including Augmented Reality (AR), which changes the characteristic of how these relationships are established, interacting with the real environment. However, its use in other segments such as culture and tourism has been the subject of studies, with the great challenge of inserting this technology in a way that the user assimilates naturally and consumes spontaneously. As this technology becomes an extension of the user through smartphones or tablets, other activities could be incorporated into objects that would otherwise go unnoticed in the daily life of the city. Good opportunities for these applications are the statues and monuments existing in Bairro do Recife, historically rich in the most diverse fields of art and architecture. An example of this is the Circuit of Poetry in Recife, a series of 16 life-size sculptures of great artists of Pernambuco literature and music spread throughout the neighborhood (Fig. 1).
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 T. Ahram et al. (Eds.): IHIET-AI 2021, AISC 1378, pp. 146–151, 2021. https://doi.org/10.1007/978-3-030-74009-2_19
Fig. 1. Statue of Ascenso Ferreira in Circuit of Poetry in Recife
2 UCD
To understand the tool more completely, User-Centered Design (UCD) was adopted as a design process that focuses on the public and analyzes solutions for their needs. Research, testing, and interviews make it possible to understand users and lead to a final product that satisfies them. With UCD, systems can be created that adapt to the way users act in the world, which makes products more accurate in execution. Identifying the profile and behavior of the target public, combined with usability tests, led to a positive result. According to Lowdermilk [3], usability and UCD change the way people interact with an application, creating a rich experience grounded in solid design principles. This transforms the digital age and deeply affects economic and social structures.
3 The Digital Circuit of Poetry
This research is qualitative: users' behavior was analyzed when they were presented with an immersive AR experience at the monuments of Recife. An artifact called the Digital Poetry Circuit (Fig. 2) was created which, when applied to the statue of Ascenso Ferreira, presents information about his life and work to the user. The research shows how AR can make public spaces that have little or no foot traffic more attractive. Using design methodologies, solutions were analyzed, developed, implemented, and observed in order to reach results.
V. Barros et al.
Fig. 2. App Digital Poetry Circuit
4 Methodology
Following the model of Dresch, Pacheco, and Júnior [4], the technological study was developed in the following steps:
1) Identification of the problem: The use of immersion technologies in public spaces was observed in order to raise hypotheses that would guide the construction of the research. The data was structured to plan data collection, exploring the urban environments where the monuments are located. Relevant questions were raised to guide possible results, converging with similar studies that offered an opportunity for accurate analysis.
2) Targeted readings: Articles and materials related to the themes raised by the problem were studied: UCD, AR, and public spaces.
3) Identification and proposition of artifacts and configuration of problem classes: We sought to observe how AR would improve interaction between people and public space, and how the development of a device would transform a rarely visited place. The use of this artifact would facilitate the collection of information to envision alternatives for solving the problem.
4) Artifact design: After analyzing similar research in the areas involved and completing the targeted study, a brainstorming session gave shape to the solution judged most suitable for the problem. In this phase, ideation weighed heavily on development, focusing on making the solution feasible and scalable.
5) Artifact development: The application creation processes were carried out: 3D, UX, and statue mapping. The prototype was built in stages and tested in cycles until it ran efficiently.
6) Evaluation of the artifact: The artifact was evaluated by experts for its effectiveness and for the user experience in a real environment. This phase was crucial for making the adaptations needed to produce an accurate final version for the tests.
7) Semi-structured interviews (1st part): The control group was defined based on a survey by Deloitte [5].
8) Semi-structured interviews (2nd part) and application of the tests, divided into four sub-stages:
- In the first sub-stage, the second part of the semi-structured interview was applied with Technical Questionnaire for Research 1, which sought to understand objectively how users relate to the statues, analyzing their habits in appreciating the works without technological interference.
- In the second sub-stage, the AR artifact was presented to the user. In it, the statue of Ascenso Ferreira recites his own poetry. For the test, an infographic provided step-by-step instructions for using the application.
- After the material was tested, the third sub-stage applied Technical Questionnaire for Research 2, following the desirability study by Rohrer [6]. A short user experience questionnaire evaluated the use of AR, observing through UCD how this three-dimensional information presented itself.
- In the last sub-stage, Technical Questionnaire for Research 3 was completed, observing the levels of interactivity, usability, and desirability reported by the user. Objective questions assessed the use of this type of technology linked to culture.
This method (Fig. 3) was designed to observe the data obtained after using AR to solve the problem proposed by this research.
9) Explanation of learning: With the data in hand, the answers were classified and evaluated for future improvements.
10) Conclusions: The results were formalized into a final proposition, generating possible solutions and possibilities for future applications.
11) Generalization for a class of problems: Proposals for implementation in similar processes were presented.
12) Communication of results: What was learned in the process, and how it will unfold in the future, was reported.
Fig. 3. Research method
This research, proposed in 2019, underwent adaptations in some of its stages due to the COVID-19 pandemic. The creation of the application and the interview and testing process had to change in order to meet the social distancing standards suggested by the World Health Organization (WHO). The studies and methods had been conceived for a real setting, applied in public environments. It was necessary to find an alternative that gave users, in their homes, an experience as close as possible to the initial project. The Design Science Research (DSR) process [4] was adjusted accordingly, from planning to the explanation of the final solution.
5 Analysis
The results were positive, with strong acceptance among the group that consumes technology and is open to new experiences. This already supports cultural proposals that use technology in public environments. The answers to the structural questionnaire indicated the users' cultural position:
– They interact little with the city's monuments;
– They sometimes find it difficult to view information on monuments;
– They see partial educational gains in the information provided;
– They are, in part, dissatisfied with the amount of information available;
– They use the cell phone as a tool to obtain information more easily;
– They would take a greater interest if the works were more interactive.
Technical questionnaire 1, on immersion in AR, showed the following points:
– Users do not easily find innovative applications related to the works;
– When monuments have a differentiated presentation, most users take a direct interest;
– Users would visit an exhibition again.
Technical questionnaire 2 produced good technical input for the research: the control group rated AR positively on all characteristics and, according to the table by Tullis and Albert [7], the sample presented 91% confidence, with all participants completing the activity defined in the test. The group considered AR fluid during the test, easy to use, efficient for its purpose, clear in its visual presentation, interactive, and highly innovative. Technical questionnaire 3 closed with a direct assessment from the control group, which evaluated the application in the public space as:
– Interactive;
– Interesting for the appreciation of the statue;
– A facilitator of immersion;
– Desirable in relation to the space and the work;
– Worth recommending to other people and a feasible investment for personal use.
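The text does not state the sample size or how the 91% figure was read from the table in [7]. As a hedged illustration only, the sketch below shows one calculation that is commonly recommended in usability practice for small-sample task completion data: an adjusted Wald interval for a completion rate. The function name and the example of 10 participants are assumptions made for illustration and are not values from the study.

```python
import math


def adjusted_wald_interval(successes, trials, z=1.96):
    """Adjusted Wald interval for a binomial proportion.

    Adds z^2/2 pseudo-successes and z^2 pseudo-trials before applying the
    usual Wald formula, which behaves better for small samples and for
    rates at or near 100%. z=1.96 corresponds to a 95% confidence level.
    """
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    n_adj = trials + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    margin = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    # Clip to [0, 1] because a proportion cannot leave that range.
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)


# Hypothetical example: every one of 10 participants completes the task.
low, high = adjusted_wald_interval(successes=10, trials=10)
print(f"completion rate interval: {low:.2f} to {high:.2f}")
```

Even when every participant completes the task, the lower bound stays well below 100% for a small group, which is why completion results are usually reported together with the sample size.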
6 Final Considerations
Throughout the study, AR proved to be a positive and realistic option for occupying public space. Transforming these places in the city into immersive technological experiences would give the environment new meaning and open a new horizon for culture.
This process can be extended to other existing works, not only in Bairro do Recife but throughout the capital, which has many other sculptures. The users studied show a strong desire to consume culture in the State, but the way it is presented prevents deeper immersion in the theme. The application prompted users to ask questions about the honorees of the Poetry Circuit. This alone makes the immersive solution an alternative for generating knowledge and connecting information that often goes unnoticed. Users recommended adding some features to make the tool more interactive:
– A menu for accessing an audiobook that introduces other poems;
– Life stories told by the statue itself;
– A question-and-answer quiz about the statue's honoree.
These suggestions reflect real user interaction and show how users feel the need for stimuli to consume cultural products, which are almost never presented through technology. The fact that this artifact can be scaled deserves to be highlighted, as Recife has dozens of works outside the Poetry Circuit (Circuito da Poesia) that suffer from the same neglect in the public environment. Digital and interactive transposition of history makes the space more frequented and ready to receive local visitors and tourists. This rebirth renews the meaning of history, not least because a people finds the essence of its identity in its culture. The future objective is to make the Digital Poetry Circuit part of local digital tourism, bringing desirability and didactics to Pernambuco culture in order to educate the population about their ancestors. What belongs to the people will then become public: the knowledge of traditions, and where it all started.
References
1. Ogusko, T.: Precisamos falar sobre Tecnologias Imersivas: VR, AR, MR e XR. https://medium.com/hist%C3%B3rias-weme/precisamos-falar-sobre-tecnologias-imersivas-vr-ar-mr-exr-6c7e8077267b/. Accessed 17 Sept 2020
2. Tori, R., Hounsell, M.S.: Introdução a Realidade Virtual e Aumentada. Editora SBC, Porto Alegre (2018)
3. Lowdermilk, T.: Design Centrado no Usuário. Novatec Editora, São Paulo (2013)
4. Dresch, A., Pacheco, D., Antunes Júnior, J.A.V.: Design Science Research: método de pesquisa para avanço da ciência e tecnologia. 1st edn. Bookman, Porto Alegre (2015)
5. Deloitte Global: A mobilidade no dia a dia do brasileiro. https://www2.deloitte.com/br/pt/pages/technology-media-and-telecommunications/articles/mobile-survey.html. Accessed 22 July 2020
6. Rohrer, C.P.: Desirability studies: Measuring aesthetic response to visual designs. https://www.xdstrategy.com/desirability-studies/. Accessed 15 July 2020
7. Tullis, T., Albert, B.: Measuring the User Experience: Collecting, Analyzing, and Presenting Usability Metrics. Morgan Kaufmann Publisher, New York (2013)
Human–Computer Interaction
Co-creating Value with the Cooperative Turn: Exploring Human-Machinic Agencies Through a Collective Intelligence Design Canvas
Soenke Zehle¹(B), Revathi Kollegala², and David Crombie³
¹ xm:lab - Experimental Media Lab, Hochschule der Bildenden Künste Saar, Keplerstr. 3-5, 66117 Saarbrücken, Germany [email protected]
² K8 Institut für strategische Ästhetik, Keplerstr. 3-5, 66117 Saarbrücken, Germany [email protected]
³ Center for Research and Innovation, HKU, Utrecht, The Netherlands [email protected]
Abstract. This essay introduces a “collective intelligence design canvas” (v0.1) developed by members of anticipate, a non-disciplinary research network initiated in 2019 in response to shared interests in exploring the cooperative turn in comprehensions of agency, value, and intelligence. The network has focused its analysis on the vocabularies policy makers and researchers use to explore the transformation of agency in distributed intelligent systems. A structured overview of these vocabularies is included. To facilitate further engagement with the network’s collective intelligence design conversation, authors have synthesized these vocabularies into a design canvas highlighting the main conceptual layers of this process. Keywords: Anticipation · Collective intelligence design canvas · Cognitive assemblages · Human-machine agency · Intelligent distributed systems
1 Engaging with the Cooperative Turn
A far-reaching transformation is underway, with implications for a wide range of innovation-related concerns - from organizational development to the design of sociotechnological systems: the renaissance of cooperation. Bolstered by the demise of neoclassical economics and its narrow focus on rational economic agency, cooperation has re-entered the stage in at least three roles: as principle of institutional and market design, as model for the co-creation of value, and as affective dynamic at the core of emerging forms of collaborative work. Since the question of value cuts across all three, our analysis makes the rise of new forms of value the point of departure and reference for an analysis of the implications of what we explore as “the cooperative turn” for the design of collective intelligences for viable distributed systems. Our particular research interest lies in reframing a design conversation that seems to have already taken off in the wrong direction in human-centered “ethics-and-ai” proposals that struggle to take the implications of ethics-by-design approaches fully into account. At the same time, the growing interest in “sovereignty”-oriented systems design calls for a reengagement with the systems design approaches that are already part of the HCI archive since they offer us a way to contextualize the cooperative turn and the focus on collective agency from within the field of HCI. Given the state of play in HCI (gigantic public and private investments in a new generation of “intelligent” technologies, growing awareness of the disruptive potential of these same investments), we wish to enter into cross-sectoral design conversations that take seriously these disruptions - not only of markets for goods and services, but of our concepts of agency, intelligence, and value. We are in the midst of a turn toward designing “cooperation machines” that incorporate human cognition, emotion, and movement in ways that break radically with machine concepts from the industrial era [1]. To fully engage with “the cooperative turn”, not least to develop corresponding educational and research agendas, we must venture beyond the (conceptual) boundaries of “human-centered design” and take seriously the potential for collective human-machine intelligences. Focusing on the implications of “the cooperative turn”, this essay aims to contribute to such a design narrative.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 T. Ahram et al. (Eds.): IHIET-AI 2021, AISC 1378, pp. 155–161, 2021. https://doi.org/10.1007/978-3-030-74009-2_20
2 Cultures of Collective Agency
The designation of smart systems as “artificial” intelligence tends to preserve the concept of technological objects as distinct (i.e. separate from us) rather than distributed (i.e. involving us). A key emphasis is therefore placed on “collective” (rather than simply artificial) intelligences, i.e. the “cognitive assemblages” of distributed systems that bring forth new forms of co-agency whose societal implications we are just beginning to explore [2]. Since our goal is to explore collective intelligences as effects of distributed systems that change the ways in which human and machinic agencies relate to and affect each other, we focus on this interdependence. This approach to agency draws both on conceptual HCI research and applied methods of co-creation from an international network of living lab communities [3]. Next-generation mobile networks are widely expected to facilitate a “cooperative turn” in robotics as faster networks allow real-time interaction and involve robots in more complex relational settings [4]. We focus on collective intelligence to bring such “cooperative” technological innovation together with new organizational forms and technology-based facilitations of collective agency. Stressing the central role played by cooperation in a critical review of the ai mainstream, Weyl and Lanier invoke recent Taiwanese social movements and their translation into new governance models: “Taiwan’s citizens have built a culture of agency over their technologies through civic participation and collective organization, something we are starting to see emerge in Europe and the US through movements like data cooperatives” [5].
We approach these examples as part of a “cooperative turn”, the renaissance of collaborative and commons-oriented initiatives in agriculture, education, finance, health, technology, etc.¹ In all of these fields, efforts exist to build a “culture of agency” to bring human and machinic agency together in ways that amplify the collective agency of (and in) a new generation of socio-technological systems.
¹ Directories include https://digitalsocial.eu, https://www.ica.coop, https://ioo.coop/directory, https://ncui.coop; also see https://decodeproject.eu/ and https://scs.community/ as examples of “sovereign technology” projects.
3 Dynamics of Translation When the anticipate network was initiated, the researchers involved decided to not position the project in the context of a specific academic discipline or research field but maintain a “non-disciplinary” stance [6]. Such a stance is, in turn, necessarily dynamic (tracking transformations rather than observing final states) and collective (as there can be no single perspective from which the whole of a process or systems come into analytical view). The best analogy we found was that of play - a type of play where players change the rules of the game as they move through the process. Such an approach owes much to experiences with the cybernetic cultures of arts-and-technology research, but also a proximity to the performing arts where the staging of interactive multi-plot narratives is an established practice sustained by a rich methodological tradition. Given the shared interest in exploring distributed dynamic systems, it made perfect sense to adopt such complex choreographies to the work with conceptual constellations - and engage in such translation with a sensitivity to the limits of translation-as-commensuration borrowed from the humanities [7]. In sum - those involved felt that any engagement with the design of collective intelligence must find ways to explore these dynamics without allowing the research design to render key elements of such a design process invisible. This awareness of the limits of translation and the difficulties of mapping conceptual vocabularies onto each other have both accompanied the network’s activities and driven the effort to structure the process through a canvas. The following visual is an attempt to capture the wide range of vocabularies at play; the representation of terms in columns is not based on attributions of value (wrong/right, not useful/useful etc.) or on their status (state of the art/beyond the state of the art), nor do they mark the ends of a spectrum. 
As a map of the current conversation space co-created through the anticipate network exchange, it serves as a snapshot of network conversations and is the context out of which the canvas was developed (Fig. 1): While HCI concepts and methods are clearly identifiable in these vocabularies, we decided not to assign them into disciplinary subfields but leave them in the context of terms drawn from a wide variety of concepts. For the purpose of taking the collective intelligence design question further, the origin or main disciplinary location of these terms is less interesting to us than what happens to them in different dynamic constellations. What is more, because it has already evolved into a multi-disciplinary field, it becomes increasingly difficult to map a “state of the art” in HCI. However, since HCI revolves around questions of use, we see one of its tasks in keeping track of the transformation of use, and we in turn follow a variety of developments across this field related to the question of agency.
Fig. 1. Key terms in the anticipate network research conversation
4 Collective Intelligence Design Canvas
Design is a practice, and the development of a collective intelligence design canvas is driven by an interest in agency: how does our involvement in and with intelligent systems change the way we approach agency? How do we conceptually frame the ways in which individual and machinic agency affect and involve each other? Can we imagine a concept of collective agency that comprehends and mobilizes the complexity of the human-machine relation? And can we reframe the discussion of “artificial” as “collective” intelligences to better understand the operation of such distributed agencies in future human-machine assemblages? The canvas created to facilitate such a focus on agency prioritizes five analytical strands (Fig. 2). What follows are brief annotations for each of them.
Fig. 2. Layers of the collective intelligence design canvas
4.1 Object
The world is full of seemingly discrete objects (like the mobile phone). Yet as we expand our analysis of technical objects to include their processes of constitution, i.e. everything that makes them possible, the discrete object gives way to a web of overlapping processes that constitute the object as object: sourcing (like rare earths and other so-called “conflict” minerals), energy generation (Apple’s creation of an energy subsidiary to coordinate the powering of data centers sustaining its cloud-based ecosystem), corporate design frameworks (Google’s material design governing UX design practice across the platform economy), geopolitical conflicts over next-generation communication infrastructures (Huawei phones without access to the Android operating system) etc. For us, the questions of collective agency and intelligence already arise on the level of the (processual) object.
4.2 Agency
Attention to the complex interactions and interdependencies of a wide range of (processual) objects in distributed systems calls for corresponding explorations of agency. We do not follow a subjective concept of agency here, derived from (conscious) human agency to serve as reference for “strong ai” and other fantasies of machinic autonomy. Instead, we try to comprehend agency based on the effects it creates - whatever the source of such agency. In this view, pollution particles seeping into the groundwater of a semiconductor plant or an electronic waste site have agency, as they materially affect their ecosystemic contexts. So do the algorithms of automated decision systems already at work in banks, financial markets, insurance providers, or schools allocating a wide array of resources and thereby shaping economic and social relations.
4.3 Value
Around the anticipate network conversation about the role of values in the cooperative turn, fundamental critiques of the cultures of venture capital as “toxic” and a growing interest in “purpose”-driven economies mesh with the generalizations of peer-to-peer approaches to value creation, cooperative alternatives to existing organizational forms, and smaller socio-technological systems design experiments such as “data unions” exploring the pros (and cons) of data monetization [8, 9]. What cuts across all of these conversations are conflicts of commensuration - the tension between values that can be mapped onto each other and those that cannot.
4.4 Situation
The rise of prediction as a paradigm for analysis and governance is one of the key challenges across data-driven societies. Interest in prediction grew when it became apparent that predictive analytics could inform resource allocation and improve enterprise resource planning (predictive maintenance, predictive policing, predictive logistics). As such systems mature far beyond the power ascribed to them in an earlier generation of foresight research, their influence (as ubiquitous systems) expands to encompass the prediction of human behavior more generally. A focus on the futural is part of the shift toward data-driven assessments; through the anticipate network, we have been exploring a wide range of speculative idioms and frameworks.
4.5 Intelligence
For the philosopher Catherine Malabou, the plasticity of human intelligence has always been the decisive difference between humans and machines. Exploring the potential of plastic machinic intelligence, she contends that “The problem of intelligence can no longer be limited to psychology, biology, and cybernetics. It must become a central philosophical concept once again…. The challenge is to invent a community with machines together, even when we share nothing in common with them” [10]. As computing companies (from Apple to Tesla) bet on machine learning to amplify software capabilities, the modelizations of learning systems directly affect our “cooperation” with such systems. The symbiotic “cooperative interaction” envisioned by Licklider has arrived, but perhaps looks different than expected [11].
Now, we can train neural networks on our notebooks whose hardware is customized to accommodate specific approaches to machine learning: “This approach to integration into a single chip, maximum throughput, rapid access to memory, optimal computing performance based on the task, and adaptation to machine learning algorithms is the future” [12]. Maybe. Yet as the current trend toward hardware optimized for specific modelizations of such intelligence may exclude alternative modelizations and create a new type of conceptual “lock-in”, how we imagine machine intelligence matters for the systems we build [13].
5 Outlook: Collective Intelligences and Machine Democracy
Automated decision-making is already creating a wide range of material effects, and it is no surprise that such effects have inspired theories of an “algorithmic governmentality” treating data without attention to the complex worlds that give such data significance, thereby creating a model of politics in which these worlds play no role [14]. In our view, collective intelligence design aims for a “reworlding” of data and technical objects. Our proposal is to explore how the condition of distribution affects how we approach objects, comprehend emerging forms of human-machine agency, co-create value, integrate anticipation into what we do, and work toward an understanding of intelligence that breaks out of the mold of measuring machine intelligence against narrow comprehensions of consciousness. Iterating our toolkit, we will next engage with the MyData Design Toolkit as well as NESTA’s Collective Intelligence Playbook [15, 16]. As sketched above, we see the “cooperative turn” as a way to reaffirm the primacy of context whenever we deal with data and technology, with collective intelligence design as a structured way to engage in the conflicts that attend such recontextualization. These conflicts help us reimagine the current conjuncture from a variety of outsides to better speculate about the role of collective intelligence in co-creating alternative futures.
Acknowledgments. Visuals co-created by authors and Ned Rossiter.
References
1. Hansen, M.B.N.: Feed-Forward: On the Future of Twenty-First Century Media. University of Chicago Press, Chicago (2014)
2. Hayles, N.K.: Unthought: The Power of the Cognitive Nonconscious. University of Chicago Press, Chicago (2017)
3. European Network of Living Labs. https://enoll.org
4. Giordani, M., Polese, M., Mezzavilla, M., Rangan, S., Zorzi, M.: Toward 6G networks: use cases and technologies. IEEE Commun. Mag. 58(3) (2020). https://ieeexplore.ieee.org/document/9040264/
5. Weyl, G., Lanier, J.: AI is an ideology, not a technology. In: Wired (2020). https://www.wired.com/story/opinion-ai-is-an-ideology-not-a-technology. Accessed 15 Mar 2020
6. Anticipate - a collective intelligence research network. https://www.anticipate.network
7. Buden, B., Nowotny, S.: Übersetzung: Das Versprechen eines Begriffs. Turia + Kant, Wien (2009)
8. Ryland, N., Jaspers, L.: Starting a Revolution: Was wir von Unternehmerinnen über die Zukunft der Arbeitswelt lernen können. Econ, Berlin (2020)
9. Introduction to Data Unions. https://streamr.network/docs/data-unions/intro-to-data-unions
10. Malabou, C.: Morphing Intelligence: From IQ Measurement to Artificial Brains. Columbia University Press, New York (2019)
11. Waldrop, M. M.: The Dream Machine: J.C.R. Licklider and the Revolution That Made Computing Personal
12. Malik, O.: Steve Jobs’s last gambit: Apple’s M1 Chip (2020). https://om.co/2020/11/17/whym1-chip-by-apple-matters/
13. Hooker, S.: Hardware Lottery. In: Computer Science (2020). https://arxiv.org/abs/2009.06489
14. Rouvroy, A.: Algorithmic governmentality and the death of politics. Green Eur. J. (2020). https://www.greeneuropeanjournal.eu/algorithmic-governmentality-and-the-deathof-politics
15. MyData Design Canvas. https://mydata.org/2020/09/25/putting-mydata-principles-into-action-an-introduction-to-the-mydata-design-toolkit/
16. NESTA Collective Intelligence Design Playbook. https://www.nesta.org.uk/toolkit/collective-intelligence-design-playbook/
Virtual Assistant: A Multi-paradigm Dialog Workflow System for Visitor Registration During a Pandemic Situation
Martin Forsberg Lie(B) and Petter Kvalvik
Østfold University College, Halden, Norway [email protected], [email protected]
Abstract. This paper summarizes findings on the use of a conversational user interface to support contactless business visitor registration in a pandemic situation, where infection control requirements might reshape the visitor journey. We explore the services, experience models, and technology needed to prepare and support touchless services, and we prototype a relevant vertical of such a system. Our findings support the claim that conversational interfaces are perceived as safe with regard to infection control. However, the usability of such interfaces needs further refinement for holistic touchless operation, and the design of the dialogue flow requires close consideration.
Keywords: Conversational interfaces · Natural language processing · Ubiquitous computing
1 Introduction
With the widespread focus on infection control during the Covid-19 pandemic, some procedures still require visitors to physically enter information, sign consent forms, or touch common surfaces such as buttons and touchscreens. Reducing the points of contact where viruses can survive and spread should reduce the probability of virus exposure. These contact points should be identified, limited, and redesigned for touchless operation, supported by relevant technology and interfaces. Our review of related work in a multi-paradigm setting identifies challenges related to traditional metrical evaluation and proposes a thorough user-centric design. We suggest a metaphor for ubiquitous computing, the virtual visitor assistant (VVA), which uses an actor-based architectural pattern. A prototype is built, and we evaluate a relevant narrow vertical of such a digital ecosystem [1] with a group of business visitors, focusing on conversational interaction in a business visitor journey. The evaluation is based on a hypothesis on perceived safety.
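The paper does not describe the prototype's code. As a rough illustration of what an actor-based arrangement for a voice-driven registration step might look like, the following minimal sketch uses Python's asyncio. All names in it (`Message`, `Actor`, `VoiceDialogActor`, `RegistrationActor`, the message kinds) are hypothetical and chosen for this example; they are not taken from the VVA prototype.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Message:
    kind: str
    payload: dict = field(default_factory=dict)


class Actor:
    """An actor owns its state and reacts only to messages in its mailbox."""

    def __init__(self):
        self.mailbox = asyncio.Queue()

    async def send(self, message):
        await self.mailbox.put(message)

    async def run(self):
        # Handle one message at a time until a "stop" message arrives.
        while True:
            message = await self.mailbox.get()
            if message.kind == "stop":
                break
            await self.receive(message)

    async def receive(self, message):
        raise NotImplementedError


class RegistrationActor(Actor):
    """Hypothetical actor that keeps the list of registered visitors."""

    def __init__(self):
        super().__init__()
        self.visitors = []

    async def receive(self, message):
        if message.kind == "register":
            self.visitors.append(message.payload["name"])


class VoiceDialogActor(Actor):
    """Hypothetical actor that turns a recognized utterance into a request."""

    def __init__(self, registration):
        super().__init__()
        self.registration = registration

    async def receive(self, message):
        if message.kind == "utterance":
            name = message.payload["text"].strip()
            if name:
                await self.registration.send(Message("register", {"name": name}))


async def demo():
    registration = RegistrationActor()
    dialog = VoiceDialogActor(registration)
    registration_task = asyncio.create_task(registration.run())
    dialog_task = asyncio.create_task(dialog.run())

    # Stand-in for speech recognition output; no real recognizer is used here.
    await dialog.send(Message("utterance", {"text": " Ada Lovelace "}))
    await dialog.send(Message("stop"))
    await dialog_task  # the dialog actor has forwarded everything it received
    await registration.send(Message("stop"))
    await registration_task
    return registration.visitors


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because each actor communicates only through messages, another input modality (for example a gesture or smartphone actor) could send the same `register` message without changes to the registration actor, which is one reason such a pattern can suit a multimodal visitor journey.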
Virtual Assistant: A Multi-paradigm Dialog Workflow System
M. F. Lie and P. Kvalvik
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 T. Ahram et al. (Eds.): IHIET-AI 2021, AISC 1378, pp. 162–169, 2021. https://doi.org/10.1007/978-3-030-74009-2_21
2 Background
A physical business visit can be defined as a journey that has defined start and end points, with sequential tasks preparing and supporting the main event, which may be a meeting,
a briefing or a training event. The idea is to limit physical contact with both people and equipment when visiting and conducting meetings, while making sure that the visit yields the desired outcome and that the visitor experiences it as safe. Ferri et al. [1] provide the basic ideas for this design. We want the visitor to feel safe and comfortable throughout a visit, in line with the requirements of society during the pandemic. A visitor should have several choices when it comes to providing necessary information, obtaining desired information, contributing during meetings and being notified of various matters throughout a visit. Technologies used to support a contactless journey can consist of smart devices (e.g., smartphones, smart watches), smart environments (e.g., motion detection, cameras) and smart interaction (e.g., voice recognition, gestures) through ubiquitous computing. Within the scope of this study, we look more closely at the usability and perceived safety of voice recognition as described by Harrison et al. [2]. The use of technology should not be perceived as invasive, and we draw on Elouali et al. [3] when we look at how the visitor can be made aware of different matters during the visit.
Related Works. We approach the design of the VVA from the third paradigm as proposed by Harrison et al. [2], since we consider that the contactless interface between the system and a user (the business and the visitor) has a phenomenologically situated interaction metaphor, although parts of the tested prototype can be considered task-based and, as such, belong to the second paradigm.
We emphasize non-intrusiveness in the user space, as the interaction should be based on mutual trust between the user and the system. We might need to consider new ways of testing and evaluating such an interface, as compared to traditional methods. Ferri et al. [1] describe the advancements in user interaction systems over the last forty years towards an ecosystem in which many different devices and people interact in a multimodal and adaptable interaction paradigm. A participatory design approach is proposed in which more complex user features, such as social membership and culture, are included. To meet the universal accessibility challenge, several interaction models with different configuration parameters should be supported, so that the system is approachable for a diverse set of users. To meet the adaptive and personalized interaction challenge, a user-centric design phase is unavoidable. Elouali et al. [3] focus on technology addiction and the amount of time spent on smartphones. The multimodality features of smartphones may worsen the addiction, as the user becomes even more attached to the device. One example of this is social distraction as a result of using voice assistants, where the need for human-to-human interaction is replaced. The Time Well Spent (TWS) movement is mentioned, along with its design goal of re-aligning technology with humanity and protecting the user's time through mobile multimodality. The design process focuses on the cooperation between different interaction modalities, including input and output modalities, and emphasizes
holistic ergonomics. Holistic ergonomics includes considerations regarding user needs versus business needs, cost of choices, interruption and essentials. The question of how to evaluate conversational interactions arises, as the interaction paradigm in such systems is an attempt to replicate human interaction. There are no directly observable components like those of screen-based interfaces, such as buttons, text fields and window flows. Such artefacts can be used for interface efficiency and focus tests, but conversational interfaces may need different methods and metrics. Holmes et al. [4] conducted experiments to help establish whether conventional methods can be used to assess conversational interfaces. The usability tests were based on observational research, with a think-aloud testing protocol and questionnaires. The authors use established metrics for the evaluation of the interface but have also added a metric of their own. All metrics were treated statistically to understand their adequacy and to indicate a minimum number of study participants for statistically relevant usability testing. Based on the statistical variance of the metrics, the authors conclude that traditional methods do not evaluate all aspects of a conversational interface and that at least 26 subjects are necessary for usability testing in order to capture prominent effects, challenging existing research in the field. The related works emphasize the importance of involving users, more so than for traditional interface evaluation. User-centric design is even more important for success within the third paradigm, as the environment, with its several layers of competence, equipment and human abilities, is complex. Within the third paradigm, all types of devices may become interfaces, and how and in which environments the interaction is executed is not necessarily within the control of the designer.
We also learned that conversational interfaces need to be evaluated using modern methodology and techniques, but that the number of test subjects to recruit might triple compared to traditional interface testing.
3 Method
This study builds a prototype of a system, which is evaluated using usability methods proposed for conversational interfaces [4]. We design the VVA as a non-intrusive actor of communication by means of a conversational interface, operated as a chatbot-like voice-recognition interface (VOI) with a supportive display, to facilitate contactless registration of business visitors (Fig. 1). The question of how to evaluate conversational interactions arises, as the interaction paradigm in such systems is an attempt to replicate human interaction, either through conversational interfaces or ubiquitous computing systems. Inspired by Holmes et al. [4], we evaluate the interfaces using quantitative statistics based on measurable metrics and a questionnaire, as well as a qualitative interview of the study group.
Fig. 1. Left - experimental setup; Right - screen supporting the dialog flow
Prototype Technology. The prototype was implemented using Google Actions, a web-based development framework for constructing conversational interfaces that can be tested on a computer, on a phone or even on a Google Home device (a desktop smart loudspeaker). The Actions framework utilizes Natural Language Understanding (NLU) and lets us describe a flow of dialogue using a framework consisting of Prompts, Scenes, Intents and Types. The flow of dialogue consists of Prompts, which ask for information, and Intents, which catch user utterances in natural language. The Speech Synthesis Markup Language (SSML) has been used to mark up text for the generation of synthetic speech. Intent training is a vital component of the Actions framework and lets us support the variations of spoken language. Initially, a basic trained model is used for the prototype.
Hypothesis formulation. Our hypothesis must be seen in light of the pandemic and of how conversational interfaces can be utilized for safety:
$H_a$: Contactless visitor registration using a voice-user interface (VOI), implemented as a conversational interface, will lead to an increased sense of safety for the visitor in the Covid-19 era.
We break this down into two research questions:
RQ1 Will a virtual assistant enable simplified visitor registration in the Covid-19 era?
RQ2 Will a contactless visitor registration increase the sense of safety for the visitor?
The hypothesis is tested by observing and interviewing users of two separate solutions: an existing registration system based on physical registration inputs, serving as the control (modality A), and our prototype using voice-controlled visitor assistant technology (modality B). A questionnaire with both qualitative and quantitative items is used to identify important aspects of usability and user experience. The evaluation is performed in a controlled setting with clear guidance on how to navigate and overcome the limitations of the prototype. A comparative evaluation of the conversational interface and the traditional touch-based registration modality has been carried out. To minimize the learning effect between the modalities, the participant population is split into two sub-groups, which controls the order in which each participant is exposed to the modalities.
The first group starts with the physical registration interface, while the second group starts with the contactless interface, to counterbalance. Because prior knowledge of the registration may lead to a learning effect, which can influence the dependent variables, the knowledge level is also captured. Eight testers were recruited, corresponding to 16 modality tests in total (two per tester). This is lower than the 26 subjects recommended for conversational interface evaluation [4] and could be a threat to validity.
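The counterbalancing described above can be sketched in a few lines of Python. This is a minimal illustration only: the paper does not say how testers were allocated to the two sub-groups, so the alternating assignment, the function name `assign_orders` and the participant identifiers are all assumptions.

```python
from typing import Dict, List, Tuple

# Modality labels follow the paper: "A" = touch-based, "B" = voice-based.
ORDERS: Tuple[Tuple[str, str], ...] = (("A", "B"), ("B", "A"))


def assign_orders(participants: List[str]) -> Dict[str, Tuple[str, str]]:
    """Alternate the two presentation orders over the participant list.

    Alternation is an assumed allocation rule, not the authors' procedure.
    """
    return {p: ORDERS[i % 2] for i, p in enumerate(participants)}


if __name__ == "__main__":
    # Hypothetical participant identifiers, not taken from the study.
    plan = assign_orders([f"P{n}" for n in range(1, 9)])
    for participant, order in plan.items():
        print(participant, "->", " then ".join(order))
```

With an even number of testers, each order is used equally often, which is what lets order effects cancel out across the two sub-groups.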
The prototype has several choices and paths, but only one interaction path is implemented: from a welcome screen, through selection of meeting host and meeting and registration of the visitor's name, to printing of a badge. The testing is performed in the discovery phase of the design process. The testing facilities had proper infection control measures in place. After each modality test, the participants were presented with two claims linked to the hypothesis and research questions. Claim 1: The system is very safe to use. Claim 2: The system is easy to use. For each claim, the participants answered with a number from 1 to 5 on the Likert scale (1 - Strongly disagree to 5 - Strongly agree), and the results were recorded in a spreadsheet [5]. Time to completion (seconds), number of errors and number of session restarts were also recorded. After each session, the participants were interviewed in an open session with a guiding questionnaire.
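The measurements listed above map naturally onto one record per modality test. The sketch below shows one possible shape for such a record, with the Likert range enforced on creation. The class name `ModalityResult` and its field names are hypothetical and do not describe the layout of the authors' spreadsheet [5].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModalityResult:
    """One modality test for one participant (hypothetical layout)."""

    participant: str   # hypothetical identifier, e.g. "P1"
    modality: str      # "A" (touch-based) or "B" (voice-based)
    safety: int        # Claim 1, Likert 1..5
    ease_of_use: int   # Claim 2, Likert 1..5
    seconds: float     # time to completion
    errors: int = 0    # number of errors
    restarts: int = 0  # number of session restarts

    def __post_init__(self) -> None:
        if self.modality not in ("A", "B"):
            raise ValueError("modality must be 'A' or 'B'")
        for name in ("safety", "ease_of_use"):
            if not 1 <= getattr(self, name) <= 5:
                raise ValueError(f"{name} must be on the 1-5 Likert scale")
        if self.seconds < 0 or self.errors < 0 or self.restarts < 0:
            raise ValueError("seconds, errors and restarts must be non-negative")
```

Rejecting out-of-range answers at entry time keeps later descriptive and inferential statistics from silently absorbing typing mistakes.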
4 Results
We present the results from the design process with a focus on the findings from the evaluation phase. Participant 4 had to withdraw from the test due to symptoms of Covid-19. The other participants were healthy.
Informing. Human presence in the reception area is still preferred, especially in the Covid-19 situation. The use of voice recognition technology is acceptable, but environmental factors may reduce the usability. The idea of a virtual receptionist is still a bit too abstract, and low-tech methods of registration are still preferable for some users.
Designing. During the discussion about the sketches, we concluded that the dialogue with the receptionist must be effective for a voice-based solution to be preferred over a touch-based solution. The result was a menu system in which the visitor mainly chose menu options by saying the letter (e.g., A, B, C) that represented the option. An alternative, dialogue-only approach was also investigated but discarded.
Prototyping. The biggest challenge when setting up the prototype was to implement a dialogue flow that balanced the spoken presentation of the menu choices, efficient selection of menu items and collection of personal information from the visitor.
Evaluating. Descriptive Statistics: We visualize our data in a diverging stacked bar chart to get an impression of the participants' responses on the Likert scale (Fig. 2). There is a clear visual difference between the modality groups: whereas modality A scores high on usability, it scores low on safety. The opposite goes for modality B, which scores low on usability and high on safety.
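Returning to the letter-based menu described under Designing, the following sketch shows how a spoken menu prompt could be marked up in SSML and how a reply could be mapped back to an option. It is a plain-Python stand-in and does not use the Actions framework or its Intents and Types. The function names, the 300 ms pause and the sample options are assumptions made for illustration, and `speak` and `break` are the only SSML elements used.

```python
from typing import Dict, List, Optional
from xml.sax.saxutils import escape


def menu_ssml(question: str, options: Dict[str, str]) -> str:
    """Build an SSML prompt that reads each option after its letter."""
    parts: List[str] = [f"<speak>{escape(question)}"]
    for letter, label in options.items():
        # A short pause before each option keeps the choices audibly apart.
        parts.append(f'<break time="300ms"/>{escape(letter)}: {escape(label)}.')
    parts.append("</speak>")
    return "".join(parts)


def match_letter(utterance: str, options: Dict[str, str]) -> Optional[str]:
    """Return the chosen label, or None unless exactly one letter is heard.

    Option keys are assumed to be upper-case letters such as "A" or "B".
    """
    words = [w.strip(".,!?") for w in utterance.upper().split()]
    hits = [w for w in words if w in options]
    return options[hits[0]] if len(hits) == 1 else None
```

Returning `None` for zero or several matches lets the dialogue re-prompt instead of guessing. Note that a naive matcher like this one would treat the English article "a" as option A, which hints at why trained intents are used in practice.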
Fig. 2. Participants' evaluation results on the Likert scale
When we compare the time needed to complete each modality, we see that modality A has a mean of 75 s, whereas modality B takes more than twice as long, with a mean of 183 s (about 2.4 times the modality A mean), as shown in Fig. 3.
Fig. 3. Time for task completion per modality
There were zero errors and restarts in modality A, whereas modality B had 20 recognition errors and 6 session restarts, indicating a less mature recognition system.
Inferential Statistics. The significance of the results was tested ($H_a$: Modality A$_{\text{safety}}$ < Modality B$_{\text{safety}}$) with a two-sided t-test for related samples at the 0.05 significance level. The average safety score for modality A, 2.00 (1.25, 2.75), was statistically significantly lower than for modality B, 4.71 (4.26, 5.17), with $t(14) = 7.6$, $p < 0.05$. The inferential statistics support the alternative hypothesis.
Qualitative. The participants were quite pleased with the modality B prototype; however, we got quite strong feedback on the immature interface paradigm. It was not clear when the system was ready to accept participant speech, and the process was slow. All participants had used registration systems before, and some had used voice recognition in cars, with Apple Siri. The feedback is that voice works best as a command or task-based system, as opposed to a conversational system as implemented in the prototype. After the evaluation of the first three testers, it appeared that the voice recognition might work better for the male testers.
The use of speech recognition (modality B) is perceived as safer by all respondents in a pandemic situation, but touch-based registration (modality A) is experienced as easier to use and faster, and it leads to fewer errors. The use of speech recognition has great potential, and one of the respondents highlighted its use in the car before arriving on site, as pre-registration or visit preparation. One of the testers pointed out that speaking in a reception area can be uncomfortable when standing with several people, and that noisy environments could slow down the conversation further. It was pointed out that the microphone itself should be of higher quality. It was also requested that the visitor be given a cue (an earcon or similar) indicating when to start speaking. This raises the issue of conversation invocation and of when the VVA should cancel the conversation. Despite the usability issues, all participants were very positive towards implementing a Virtual Visitor Assistant based on voice technology. For infection control, the VVA is preferred and could even replace the touch-based method.
Hypothesis. The hypothesis is accepted at this stage due to the statistical significance of the safety score. This means that the perceived safety of using a conversational interface for business tasks in a pandemic situation is higher than for touch-based systems.
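For readers who want to see how a t statistic for related samples is formed, a minimal sketch follows. It uses only the Python standard library and computes the mean of the per-participant differences divided by its standard error, with n - 1 degrees of freedom for n pairs. The function name `paired_t` is an assumption, and the scores in the usage note are invented: the sketch does not attempt to reproduce the reported statistic or its degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t(first: Sequence[float], second: Sequence[float]) -> Tuple[float, int]:
    """Return (t statistic, degrees of freedom) for related samples.

    Differences are taken as second - first for each participant.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [b - a for a, b in zip(first, second)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    n = len(diffs)
    return mean(diffs) / (sd / sqrt(n)), n - 1


if __name__ == "__main__":
    # Invented safety scores for seven testers, NOT the study data; they are
    # chosen only so that the means equal the reported 2.00 and 4.71.
    modality_a = [2, 1, 3, 2, 2, 1, 3]
    modality_b = [5, 4, 5, 5, 4, 5, 5]
    t_value, dof = paired_t(modality_a, modality_b)
    print(f"t({dof}) = {t_value:.2f}")
```

For these invented scores the statistic comes out at approximately 9.5 with 6 degrees of freedom. The resulting value would then be compared against the critical value of the t distribution at the chosen significance level.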
5 Discussion
Will a Virtual Assistant Enable Simplified Visitor Registration in the Covid-19 Era? The usability of the prototype is unacceptable for production use, but many improvement ideas were collected during the testing sessions. The participants reported that unnecessary speech from the virtual assistant should be removed and that the remaining speech should be synchronized with text already visible on a companion display or poster. Further, the visitor should be able to choose the conversation language, and the system should support local abbreviations and the spelling of names and numbers. The spelling of names and phone numbers in mixed languages is difficult and should be refined. We also found that the flow of the dialogue could be improved and should be designed to allow for several business tasks in different contexts, such as access to safety procedures or consent forms. The quantitative results also show that completing tasks in the conversational interface takes more than twice as long as in the touch-based prototype. Both the qualitative narrative and the quantitative results indicate that using a conversational interface does not simplify the visitor registration process when the same interaction model is assumed.
Will a Contactless Visitor Registration Increase the Sense of Safety for the Visitor? Both the quantitative and qualitative results indicate quite strongly an increased sense of safety for the visitors in the Covid-19 era. The conversation was started manually by the testing facilitator; hence the issue of conversation invocation must be solved for a truly touchless experience. The participants agreed on the perceived infection safety of voice-controlled conversational interfaces.
Limitations to Our Research. Due to Covid-19, the planned environments for prototype testing were closed, and the testing had to be carried out outside of business premises. Because of this, it was not possible to reproduce the physical context of a visitor reception, and actual physical registration equipment at selected businesses was not available. Instead, a modality A prototype that resembles a physical interface was built on a laptop using Google Schemas. This interface followed the actual flow of information retrieval from the conversational interface. In total, 14 modality tests with seven testers were executed. This is lower than the 26 test subjects recommended for conversational interface evaluation by Holmes et al. [4] and could be a threat to validity. However, we assume that our results would be significant due to the directional nature of our hypothesis.
Further Work. Several observations emerged that should be considered in the next iteration of the design process, as described in the qualitative results section. Multiple frameworks for building voice-assisted interfaces should also be evaluated with regard to an optimal dialogue flow, including mechanisms for language, gender selection and tone of voice. To improve the evaluation phase, some of the elements from the questionnaire in the work of Holmes et al. [4] could provide additional information about the usability.
6 Conclusion
Hybrid interfaces, which provide the same services through different kinds of interaction paradigms, are highly attractive approaches for presenting business information, communicating with users and providing actionable content. Focusing on an actor-based implementation for the business logic and on conversational interfaces for the user interaction is a way to adapt traditional physical interfaces to touchless interfaces when the need arises, as in a pandemic scenario. Our findings support the claim that conversational interfaces are perceived as safe with respect to infection control. However, such interfaces need to be designed for fully touchless operation, and the design of the dialogue flow needs close consideration.
References
1. Ferri, F., et al.: The HMI digital ecosystem: challenges and possible solutions. In: Proceedings of the 10th International Conference on Management of Digital EcoSystems (2018)
2. Harrison, S., Tatar, D., Sengers, P.: The three paradigms of HCI. In: 7th ACM Conference on Designing Interactive Systems. New York (2007)
3. Elouali, N.: Time well spent with multimodal mobile interactions. J. Multimodal User Interfaces 13(4), 395–404 (2019)
4. Holmes, S., et al.: Usability testing of a healthcare chatbot: can we use conventional methods to assess conversational user interfaces? In: Proceedings of the 31st European Conference on Cognitive Ergonomics. New York (2019)
5. Lie, M.F., Kvalvik, P.: Virtual assistant. osf.io/tcs37. 14 Nov 2020
Challenges of Human-Computer Interaction in Foreign Language Teaching: The Case of a Russian Technological University
Alexander Gerashchenko, Tatiana Shaposhnikova, Alena Egorova, and Dmitry Romanov
Kuban State Technological University, Moskovskaya Street 2, 350072 Krasnodar, Russia
Abstract. This paper analyzes the challenges of human-computer interaction observed while teaching foreign languages at Kuban State Technological University (Krasnodar, Russia). The principal identified challenges can be divided into three groups: 1) the computer technology products legally available for conducting foreign language classes are often insufficient for various reasons, such as financial and bureaucratic restrictions; 2) language teachers may not be ready to use computer hardware and software properly, even if they want to and even have to use them for their classes; 3) students, especially those whose foreign language level is as low as their motivation to master the language, often use computers improperly, imitating language skills rather than developing them. Such challenges need to be properly dealt with so that the benefits of human-computer interaction can be maximized.
Keywords: Foreign language teaching · Human-computer interaction · University education
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 T. Ahram et al. (Eds.): IHIET-AI 2021, AISC 1378, pp. 170–176, 2021. https://doi.org/10.1007/978-3-030-74009-2_22
1 Introduction
Having become an integral part of everyday life, computers are widely used for teaching and learning purposes all over the world, Russia being no exception. Relevant emerging technologies – such as information and computer technology and/or information and communication technology (ICT) – need to be integrated into modern training practices. One of the spheres where these technologies are now actively applied, drawing the attention of many researchers, is language teaching. As Chong and Reinders state, “computer-assisted language learning (CALL) has grown exponentially as a field, with an increasing number of studies, mostly focusing on second/foreign language (S/FL)” [1]. Researchers tend to stress the benefits of this technology. For example, Ainoutdinova and Blagoveshchenskaya claim that the “CALL-based model” they designed for university students in Russia “enhances the quality of foreign language education and increases chances of achieving the desired level of language proficiency among the university students” [2]. It should be noted that in Russia a foreign language is one of
the mandatory subjects taught at educational institutions, including universities. While international students at Russian universities are required to study the country’s main language as a foreign one, native (and near-native) speakers of Russian normally have to study English (though the much less used opportunity to study other languages, particularly German or French, can also be provided). Analyzing various aspects of CALL, researchers have described numerous positive sides of applying computer-based technology, such as ICT, for the purposes of language teaching – particularly teaching English as a foreign language (EFL). Kamilah and Anugerahwati, for example, mention the fact that “ICT in ELT has been regarded such a powerful media that it offers opportunities in enhancing teaching and learning process” [3]. Ovilia and Asfina conclude that “ICT, when integrated into the EFL classroom, adds immense value to the quality of teaching, making it a holistic learning experience for the students”; “makes the learning more student-centered, visual, and time-saving”; “motivates the students to produce creative assignments” [4]. Azmi lists such benefits of using computers (namely ICT) in the EFL classroom as “boosting interaction and communication”; “enhancing multisensory delivery and authenticity”; “boosting students’ performance on written class assessment” [5]. Nevertheless, there are certain challenges to be faced while teaching and learning languages (including EFL) with the help of computers. Gilakjani et al. identify the following barriers to using computer technology in EFL instruction: “availability of hardware and software”; “lack of computer knowledge”; “lack of computer experience”; “inadequate computer technology support”; “time factor”; “teachers’ attitudes”; “lack of professional development in computer technology integration” [6].
Besides, as Azmi puts it, “using ICT without careful planning and well-defined objectives will more likely be a waste of time and effort” [5]. It should be noted that these opportunities and challenges result not so much from the language itself as from the human-computer interaction involved in teaching and learning languages. Owing to its numerous benefits and opportunities, the use of computers for teaching and learning foreign languages, for example at a Russian university, is now considered conventional and is generally promoted. This practice has become dramatically more widespread with the shift to distance learning caused by the COVID-19 pandemic. The diverse and often controversial experience obtained in these conditions has reasonably attracted attention as an object for analysis. The authors of this paper have analyzed the challenges of human-computer interaction observed in connection with teaching foreign languages at a Russian technological university, namely Kuban State Technological University (KubSTU) in the city of Krasnodar. The humans whose interaction with computers has been considered in this respect include the technological university teachers, their undergraduate and postgraduate students, as well as participants of training programmes provided by the Language Center and the Regional School Technology Park based at KubSTU. The research methodology involves observation and experimental testing (following sociological, competency and system approaches). As a result, certain generalizations concerning the barriers and challenges of human-computer interaction in the context of foreign language teaching have been made.
2 Key Challenges of Human-Computer Interaction in Foreign Language Teaching as Observed at a Russian University
The key challenges observed by the authors can be divided into three groups, depending on the kind of lack that leads to them: 1) the lack of available and/or properly functioning computer products; 2) the lack of teachers’ motivation and computer skills; 3) the lack of students’ motivation and computer skills. These groups of challenges are described in detail below.
2.1 Challenges Caused by the Lack of Available and/or Properly Functioning Computer Products
The computer products that are legally available and function properly for conducting foreign language classes at Russian universities are often insufficient. This lack of proper hardware and software may arise for various reasons. First of all, the state budget (the main source of funding for state educational institutions such as KubSTU) does not allocate enough money to purchase all the computer products that teachers and students would like to have in class (though the sums provided for this purpose may be large indeed). As a result, not all university classrooms are equipped with computers, and those that are may be inaccessible for foreign language classes (especially at a technological university, where priority belongs to various classes of technology). Even if a classroom is equipped with computers, their number may be insufficient for a particular group. For example, it is not uncommon to deliver an English class (or arrange an exam) for as many as 19 students (a group formed for foreign language classes at KubSTU should have no more than 20) in a room equipped with only 10 computers. In such a case, students have to share computers, doing the tasks (such as quizzes) either together or in turn. Even when accessible, hardware and software may be rather outdated or may not work properly.
For example, the university has some interactive whiteboards used only as large computer screens (not even as touch screens), because no licensed software has been provided to make the whiteboards function as intended. Due to the lack of money and, in certain cases, due to security matters, there is often no chance to get the desired computer products – at least, not without giving good reasons to be approved by the authorities (the former head of the KubSTU IT department, for example, personally said that foreign language teachers would not be able to prove that they needed Microsoft Office to be purchased and installed on their department’s computers, no matter how inconvenient the freeware to be used instead might be). The positive aspect of this situation is that if teachers have to prove that they need particular computer products, they have to develop a better understanding of computer usage in this connection, thus increasing their IT competence. The difficulties connected with obtaining computer products from educational institutions may cause mixed feelings: one can either dream of having them in class or, on the contrary, prefer to do without them instead of dealing with hardware and software purchase challenges. As a matter of fact, though, personal electronic devices (such as laptops or smartphones) with the necessary software are frequently used in language
classrooms, somewhat helping to overcome the lack of ICT (though possibly ignoring the university regulations banning the usage of mobile devices in class – and maybe even the state laws banning the usage of “pirate” software etc.). When Kuban State Technological University shifted to distance learning due to the COVID-19 pandemic, the university authorities prescribed conducting the classes via the KubSTU Moodle environment and the university “electronic department” system, with BigBlueButton used for videoconferencing. However, it took a considerable amount of time to make these two learning environments work properly, and there are still complaints connected with them. When the “officially authorized” systems failed to function as needed, foreign language teachers and their students started using other tools – such as WhatsApp, Zoom, and Discord – installed on their personal devices (not necessarily functioning properly) to continue their classes online. Nevertheless, despite numerous suggestions, these tools have been used only unofficially ever since, while “official” online classes – foreign language classes included – must be conducted only in the controllable university learning environment. So the challenges of human-computer interaction in this respect are largely connected with financial and legal issues: the computer products, even when physically available, can be used for foreign language classes at a Russian university only if the necessary permissions are obtained. It also happens that the authorized computer products have to be used regardless of how well they function.
2.2 Challenges Caused by the Lack of Teachers’ Motivation and Computer Skills
As it is the teacher who normally plays the leading role in conducting the classes, at least at the educational institutions typical of Russia, higher education being no real exception (the view of the teacher’s role as “helping the students to study” is well known, but commonly considered an “ideal situation”, hardly possible in real life), the introduction of computer technology into the teaching-learning process depends largely on the teacher. Therefore, if teachers – foreign language teachers in the case under analysis – have sufficient motivation to use computers in class, they will do their best to implement such usage. If not, they will avoid using computer technology under any possible pretext. Where this technology cannot be avoided (especially due to the distance learning mode during the COVID-19 pandemic), they tend to criticize it. Motivation can directly correlate with the level of skills development: a teacher who is skilled enough to use computer technology properly will probably be more motivated to use it. The common observation in this connection is that younger teachers are typically more comfortable with computers and more enthusiastic about them, while their senior counterparts may have problems using modern information and computer technology and tend to be more skeptical about its opportunities. In fact, present-day university students often (though surely not always) have better computer skills than their teachers do. During distance learning under COVID-19, students would often be quicker than their teachers to suggest alternative platforms for online classes when the “officially authorized” ones did not work. For some teachers this may lead to psychological discomfort, because they are expected (though, maybe, only by themselves) to be superior to their students, at least in class.
So if computer usage can show that “the teacher is less competent”, it should better be avoided, and it is even better
A. Gerashchenko et al.
to state that computer-based technology (CALL included) is “bad” in itself (so there is no “moral duty” for teachers to develop their skills in it). Yet, we should not ignore the fact that strong motivation for using computer technology can (at least, in theory) also be combined with a lack of motivation for conducting high-quality classes and a lack of skills for proper ICT and CALL usage. In that case, a teacher would have the students placed at the computers and make them, e.g., watch videos or do automatically checked exercises for the whole class, pretending to be “technically advanced”. Given the public approval of ICT and CALL, a teacher demonstrating the usage of such technology may even seem “better” than his/her counterpart ignoring it, though the outcomes of their teaching may be exactly the opposite. Such improper use of computer technology has a negative influence on the teaching and learning process and, if detected and criticized, it can have a negative impact on the technology’s reputation. The authors have hardly observed such extreme situations in their university environment, though they have sometimes heard of them as described by their colleagues and students. In fact, the most common thing the authors have experienced is senior colleagues asking their junior fellows or students to help them “with computers”. So the possible problems caused by the teachers’ lack of motivation and computer skills can be solved by developing such skills, which can be done by means of special courses (regularly conducted for the teachers of KubSTU, other universities being no exception) and, especially, relevant practice.
2.3 Challenges Caused by the Lack of Students’ Motivation and Computer Skills
Russian university students are generally considered to have sufficient motivation and skills for using computers, so various kinds of ICT are commonly used in class regardless of whether they are introduced by the teacher.
The students would use their mobile devices for various purposes: storing the digital (maybe illegally downloaded) versions of the textbooks to be used in class (instead of their paper versions available at university libraries), taking photos of the book pages or the teacher’s notes on the black- or whiteboard, recording the teacher’s speech, exchanging text messages (sometimes for the purposes of cheating) etc. It should be noted in particular that Russian university regulations (those of KubSTU included) typically ban the usage of mobile devices in class, making possible exceptions for the cases where the usage of these devices is necessary for the purposes of studying and approved by the teacher. This “exception” had to become the most common practice in distance education during the COVID-19 pandemic, and under these conditions the students’ (mis)use of computers has become much less controllable. The lack of students’ motivation that makes ICT and CALL usage a challenge at a Russian university is not so much a lack of motivation to use ICT itself as a lack of motivation to study a particular subject. Taking EFL teaching as the most common case of foreign language teaching, we need to stress the fact that English in Russia is not a language of common everyday use for most of the population. Therefore, students obliged to study the foreign language but not really planning to use it in real life would use modern technology to help them imitate the knowledge of English. An average student of a technological university is not nearly as motivated to study languages as, e.g., his/her counterpart studying at a linguistic
Challenges of Human-Computer Interaction in Foreign Language Teaching
university. Besides, the level of English (or other language but Russian) is usually not a criterion that is necessary for the Russian university applicants to be enrolled. So students with different levels of English are often present in the same EFL class at a technological university. Facing the necessity to do the tasks that are complicated (maybe too complicated) for them, the Russian students with poor English would, e.g., plagiarize texts from online sources rather than write them themselves or write the texts in Russian and then use machine translation to produce the English version (no matter how full of inaccuracies). Normally, plagiarism can be detected by searching a fragment from the suspicious text (it really is suspicious, when an essay in perfect English is submitted by a student unable to produce a correct English phrase in class) online, with the help of a search engine like Google. The possible markers of machine translation – and the students’ lack of skills in using it – are such “mysterious” things as pridoda (a misprint of Russian priroda – “nature”) or marriage of products (actually, damage of products – both marriage and damage in Russian can be called brak – homonymy the machine translator failed to detect). Students must also be careful when searching the answers online – the first answer in the list of search results is not necessarily the needed one. Teachers have to identify such cases and struggle with them (up to giving “fail” grades), and students need to be more careful relying on the existing ICT opportunities, developing their CALL skills.
3 Conclusion
To sum up, the principal identified challenges of human-computer interaction are of three main kinds. First, the products of computer technology legally available for conducting classes of foreign languages are often insufficient (e.g., hardware and software, even when accessible, may be rather outdated or not working properly) for various reasons, such as financial and bureaucratic restrictions. Second, language teachers, particularly those who graduated decades ago, may not be ready to use computer hardware and software (at least their recently developed kinds) properly, even if they want to and even have to use them for their classes. Third, the students, especially those whose foreign language level is as low as their motivation to master the language, are frequently noticed to be using computers improperly, imitating language skills rather than developing them (e.g., plagiarizing or using machine translation to present written tasks, such as essays, or getting the answers from fellow students via messengers, etc.). While the challenges of human-computer interaction in foreign language teaching and learning in Russia are largely similar to those present in other countries, there are some specific aspects of them that are connected with the local culture and language situation. Such challenges need to be recognized and properly dealt with in order to increase the quality of teaching and learning by means of proper integration of computer technology, thus increasing the benefits of human-computer interaction. This is especially important in a situation where computers are becoming an increasingly essential (and even vital) medium of interaction between humans themselves.
Acknowledgments. The research was carried out with the financial support of the Kuban Science Foundation in the framework of the scientific project № IFR 20.1/36.
References 1. Chong, S.W., Reinders, H.: Technology-mediated task-based language teaching: a qualitative research synthesis. Lang. Learn. Technol. 24(3), 70–86 (2020) 2. Ainoutdinova, I., Blagoveshchenskaya, A.: The impact of computer assisted language learning (CALL) on foreign language proficiency of university students in Russia. In: INTED2018 Proceedings: the 12th Annual International Technology, Education and Development Conference, pp. 6765–6775. IATED, Valencia (2018) 3. Kamilah, N., Anugerahwati, M.: Factors contributing to teachers’ resistence in integrating ICT in EFL classroom in senior high school. J. English Lang. Lit. Teach. 1(2), 133–149 (2016) 4. Ovilia, R., Asfina, R.: 21st century learning: Is ICT really integrated in EFL classrooms or merely segregated outside the classroom? EnJourMe 2(1), 1–17 (2017) 5. Azmi, N.: The benefits of using ICT in the EFL classroom: from perceived utility to potential challenges. J. Educ. Soc. Res. 7(1), 111–118 (2017) 6. Gilakjani, A.P., Sabouri, N.B., Zabihniaemran, A.: What are the barriers in the use of computer technology in EFL instruction? Rev. Eur. Stud. 7(11), 213–221 (2015)
Influence of Gender, Age, and Frequency of Use on Users’ Attitudes on Gamified Online Learning
Adam Palmquist1 and Izabella Jedel2
1 Applied IT, University of Gothenburg, Gothenburg, Sweden
2 Insert Coin, Gothenburg, Sweden
Abstract. Given the disruption of the global workforce’s desired skill categories, there is a need to facilitate individual lifelong learning. One suggested solution is gamification. However, gamification has been accused of a novelty effect on the user, meaning that the user’s engagement is likely to decrease significantly over time. The present study investigates adult learners’ attitudes towards a gamification implementation, considering factors such as gender, age, and visit frequency in the Learning management system (LMS). The main findings indicate that the perception of the game element Badges differs depending on age and that the user’s attitude toward gamification is positively affected by user’s frequency of use of the gamified LMS. Keywords: Adult learning · Age · Badges · Frequency of use · Game elements · Gamification · Gender · Learning management system · Lifelong learning · Online learning
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 T. Ahram et al. (Eds.): IHIET-AI 2021, AISC 1378, pp. 177–185, 2021. https://doi.org/10.1007/978-3-030-74009-2_23
1 Introduction and Background
Desired labor market skills are changing rapidly. Employees are likely to participate in lifelong learning to pursue various careers and to develop competencies necessary in their current workplace [1]. Simultaneously, the median age of the working population is increasing, which predicts later retirement and increases the demand for retraining [2]. One important solution to the issue is corporate training [3]. Another approach is adult education (AE) [4, 5]. AE has been suggested as a response to the increasing demand for new skills [6]. In AE, adults engage in systematic and sustained self-educating activities to gain new skills, attitudes, or values. Traditionally, AE assumes that adults want to learn and are able and willing to take responsibility for their learning. However, learning barriers have been identified as prevalent in AE, such as time, strains, finance, transportation, self-confidence, education significance, prospects after education, entrance requirements, and child care [7]. Online learning has been viewed as an AE facilitator that circumvents some of the identified barriers and allows an individualized approach to learning [4]. However,
online education struggles with high student dropout rates and low completion rates. This problem is persistent: it was identified in the past [8] and remains in modern times [9]. Gamification, the use of game elements in non-game contexts, has proven useful in fostering student retention and engagement in online learning [10, 11]. However, less is known about how users respond to individual game elements in their online learning environment. This study examines a gamification implementation in the Omnius LMS and how the students received it. Omnius is a Swedish LMS that launched in 2012. Omnius was developed with a participatory design approach involving its stakeholders (developers, teachers, admins, students) to ensure that the LMS addressed their requirements and to facilitate ease of use for the platform’s different end-users. Inspired by social media, Omnius utilizes different communication symbols in its interface to accommodate its student population’s low socioeconomic status (Fig. 1). Currently, the LMS serves large quantities of adult learners simultaneously. The platform provides approximately 300 upper secondary education online courses to 35 000 adult learners from Scandinavia each year. The gamification implementation concerned 19 separate 5-week online courses in History, Mathematics, and English, reaching 949 students. The game design ecology in the implementation consisted of two primary game elements: Level and Badges. These game elements were both connected to an overarching Experience points (XP) system. When students performed tasks, submitted assignments, or registered for an exam, they were awarded XP in the LMS, weighted according to the task effort. After accumulating a certain sum of XP, the students progressed in Level. Specific tasks, identified by the educational provider as vital for finishing the course, were added as Badges, granting the user a digital token and XP when completed.
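The XP, Level, and Badge mechanics described here can be sketched in a few lines of Python. This is only an illustration, not the Omnius implementation: the XP weights, the level threshold, and the badge bonus are hypothetical values, since the paper does not report them. The streak rule mirrors the paper's example of a Badge awarded for visiting the LMS seven days in a row.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

# Hypothetical values: the paper reports neither XP weights nor level thresholds.
XP_PER_TASK = {"perform_task": 10, "submit_assignment": 50, "register_exam": 30}
XP_PER_LEVEL = 100
BADGE_XP = 25
STREAK_DAYS = 7  # a streak Badge for seven consecutive days of visits


@dataclass
class Student:
    xp: int = 0
    badges: set = field(default_factory=set)
    visits: set = field(default_factory=set)

    @property
    def level(self) -> int:
        # Level 1 at 0 XP, one additional level per XP_PER_LEVEL points.
        return 1 + self.xp // XP_PER_LEVEL

    def complete(self, task: str) -> None:
        # XP is weighted by task effort via the lookup table.
        self.xp += XP_PER_TASK[task]

    def visit(self, day: date) -> None:
        self.visits.add(day)
        # The streak holds if this day and the six days before it were all visited.
        streak = all(day - timedelta(days=i) in self.visits for i in range(STREAK_DAYS))
        if streak and "Diligent" not in self.badges:
            self.badges.add("Diligent")  # digital token
            self.xp += BADGE_XP          # plus XP, as the text describes


student = Student()
for offset in range(7):
    student.visit(date(2021, 1, 1) + timedelta(days=offset))
student.complete("submit_assignment")
print(student.level, student.xp, sorted(student.badges))
```

Keeping Level as a derived property of XP, rather than a stored counter, means the two can never drift apart when new XP sources are added.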
For example, the Badge Diligent was received when the student had visited the LMS seven days in a row.
1.1 Gamified Online Learning: Previous Studies and the Current Study
A systematic literature review from 2019 of 61 articles on gamified online learning shows that studies frequently address motivation, performance, and engagement while neglecting students’ attitudes and perceptions [10]. Studies on gamified online learning are not coherent regarding design, implementation, or outcome, and the review concludes that gamification in online learning is an undeveloped field lacking rigorous empirical investigations [10]. Another systematic review from 2017, which evaluated gamification’s effect on student engagement in online programs across 15 studies, indicated that gamification had a positive effect [11]. Although most of the reviewed studies indicated that gamification increased engagement in online programs, the reviewers stressed a need to examine different game elements’ effects in relation to demographics to determine how to achieve sustained engagement. The systematic reviews presented two predicaments of gamified online learning: 1) The unit of analysis in gamified online learning varies; several studies refer to various objects and processes but overlook students’ attitudes and perceptions [10]. 2) Gamified online learning seems context- and design-dependent; defining best practice for gamified online learning requires investigations of how the gamified context, game elements, and user demographics correspond [11]. Addressing the concerns expressed in previous reviews, this study focuses on student attitude toward gamification and perceptions of the specific
Fig. 1. The LMS interface. Top left: no gamification implemented. Top right: gamification implemented (highlighted in red). Bottom right: gamification curtain opened. Bottom left: gamification user profile showing Badges.
game elements Level and Badges in online AE. Furthermore, the impact of three different factors that have previously been related to gamification perception is studied. Previous studies on how the perception of gamification in education depends on age are lacking. However, research on gamification in other contexts, i.e. marketing and health, has studied perception in relation to user age. In a survey study with 101 participants on gamification’s supposed usefulness in product advertising, the results show that gamification appeals more to individuals under the age of 40 [12]. Purchase intention was predicted by attitudes toward gamified products, which declined with seasoned consumers. A mixed-method study with 150 participants investigated game elements’ relation to users’ perceived motivation to exercise and concluded that age groups perceived game elements differently [13]. Game element preferences differed between young individuals, 18–29 years, and older generations. The young individuals preferred extrinsic motivational game elements, such as points and leaderboards, while older generations preferred intrinsic motivational game elements indicating progress or improvement [13]. The previous studies indicate an influence of age on attitudes toward and perceptions of gamification: younger participants seem more positive toward gamification. Gender also seems to affect attitudes toward gamification. A survey study of 195 participants inspecting perceived motivational benefits of using a gamified exercise app indicated gender differences regarding gamification [14]: 1) Women perceived social benefits as more significant than men did; 2) Women reported being more favorable to the gamified feedback; 3) Women valued the social aspects of gamification more than men [14]. A quantitative study of 235 undergraduates examined gender differences in the impression of playfulness in different game elements and showed differences in women’s and men’s perceptions of a playful gamification design [15].
The game element point was perceived as more
playful by men than women. In contrast, women perceived badges as more playful than men [15]. Thus, previous research concerning gender and gamification indicates that men and women perceive game elements and game design elements differently. Finally, in Information and Communication Technologies (ICT), the variable frequency of use (FU) has been suggested to affect user attitude and perceptions. A study of 714 participants, utilizing the Unified Theory of Acceptance and Use of Technology (UTAUT) framework [16], intended to determine if first-year university students’ self-perception of their ICT competence varied with their ICT frequency of use. The study concluded that frequent ICT use created familiarity, which had a favorable effect on user anxiety towards digital technology in the learning environment [17]. Another study using UTAUT, with 610 participants, tested users’ perceptions of mobile services depending on how long they had used them and showed that user familiarity with the device had an impact on the perception of performance expectancy (usefulness) and effort expectancy (ease of use) of the services [18]. Likewise, gamification may require time for acclimatization before it is perceived as useful, and frequent use may lower user anxiety and therefore improve attitude. To investigate the factors that can impact the perception of different game elements and attitude towards gamification, it is of interest to further investigate if age, gender, and the students’ FU in the LMS affect users’ attitudes toward gamification. From the previous research presented, the study aims to answer: Do age, gender, and students’ estimated visit frequency affect student attitude toward gamification and perception of the game elements Level and Badges in an online learning environment? Based on previous research, we hypothesize that younger users will have a positive attitude toward gamification.
Furthermore, we hypothesize that men will have a more positive perception of Level since it relates to similar competitive gamification elements and that women will have a more positive perception of Badges since it relates to similar social display mechanisms. Furthermore, with support from the previous research mentioned above, we hypothesize that higher frequency of use of gamification will lead to a more positive attitude toward gamification.
2 Method: Data Gathering and Analysis
To answer the research question, a survey was sent to students involved in a gamified LMS for AE. The following section presents the details of data collection and analysis. A voluntary user survey was prepared and distributed throughout the courses that had implemented the gamified LMS to investigate student attitude toward and perception of gamification. Due to ethical considerations, the students could turn off the gamification elements in the course. Of the students (n = 949) who had utilized the gamified interface, 321 responded to the survey. The survey was sent out to the students at the end of the courses. It included demographic questions, questions about how the LMS was perceived, how often they used the LMS, perception of Level and Badges, and attitude towards gamification. Of the students who answered the survey, 63.6% were women, 34.9% were men, and 1.6% defined themselves as other, non-binary, or did not want to disclose gender. The students had different geographical backgrounds, reporting 34 different native languages in the
survey. However, most of the students (n = 176) reported having Swedish as a native language. The students’ ages ranged from 17 to 54 years (M = 27.73, SD = 7.83). Attitude toward gamification was derived from the question: “How probable is it that you would prefer other courses similar to this course (i.e., including elements such as Progress, Feedback, and Badges)?” with a scale ranging from one to ten, where one symbolized not at all likely and ten symbolized very likely. The perception of Level was measured by the question “Level enhances usage of the LMS?” and the perception of Badges by the question “Badges enhance usage of the LMS?”. Both perception questions were measured on a Likert scale from one to five, where one symbolized strongly disagree and five represented strongly agree. To distinguish differences depending on age, age was grouped by quartiles, resulting in the age groups 1) 17–21, 2) 22–26, 3) 27–32, and 4) 33–54. Gender was coded as women or men, with the rest of the categories coded as missing values because they represented only 1.6% of the students answering the survey. FU was derived from the question “How often do you visit the LMS?” with the answers: 1) Several times a day, 2) Once a day, 3) Several times a week, 4) Once a week, 5) Several times a month, 6) Once a month, 7) Seldom, 8) Depends on my progress in the course, and 9) Don’t know. Due to the low number of students answering the last options, the options once a week, several times a month, once a month, and seldom were recoded into a single category, once a week or less. The remaining options, i.e., depends on my progress and don’t know, were recoded as missing values. To answer the research question and distinguish the relationship between age, gender, FU, attitude toward gamification, and perception of Level/Badges, several non-parametric tests were run.
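The recoding steps described above can be sketched as follows. The age bounds and the answer order are taken from the text; the function names, the numeric answer codes 1 to 9, and the string labels are hypothetical choices for illustration, not the authors' actual coding scheme.

```python
# Quartile-based age groups as reported in the text.
AGE_GROUPS = [(17, 21), (22, 26), (27, 32), (33, 54)]


def age_group(age):
    """Return the 1-based age group, or None if the age falls outside the sample range."""
    for index, (low, high) in enumerate(AGE_GROUPS, start=1):
        if low <= age <= high:
            return index
    return None


# Answer codes 1-9 follow the order of the survey options listed in the text.
FU_RECODE = {
    1: "several times a day",
    2: "once a day",
    3: "several times a week",
    4: "once a week or less",  # once a week
    5: "once a week or less",  # several times a month
    6: "once a week or less",  # once a month
    7: "once a week or less",  # seldom
    8: None,                   # depends on my progress -> missing
    9: None,                   # don't know -> missing
}


def recode_gender(answer):
    """Keep women and men; every other answer is treated as a missing value."""
    return answer if answer in ("woman", "man") else None


# Hypothetical respondents: (age, gender answer, FU answer code).
respondents = [(19, "woman", 1), (30, "man", 5), (41, "non-binary", 8)]
for age, gender, fu in respondents:
    print(age_group(age), recode_gender(gender), FU_RECODE[fu])
```

After recoding, FU has four categories and age has four groups, which is consistent with the three degrees of freedom reported for the Kruskal-Wallis tests.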
First, a Mann-Whitney test was run with gender as the independent variable, and with attitude toward gamification, the perception of Level/Badges as the dependent variable. Thereafter, two Kruskal-Wallis tests were run with the same dependent variables but with age and FU as independent variables. Non-parametric tests were chosen due to the variables being ordinal and not continuous.
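For readers unfamiliar with the two tests, the following is a minimal pure-Python sketch of the Mann-Whitney U statistic (with a tie-corrected normal approximation and the effect size r = |z| / sqrt(N)) and the tie-corrected Kruskal-Wallis H statistic. It is not the authors' analysis script, which is not described in the paper; the response data are invented, and p-values are left out because they would require a normal or chi-square distribution function.

```python
import math
from itertools import chain


def midranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def tie_term(values):
    """Sum of t^3 - t over groups of tied values (0 when there are no ties)."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return sum(t**3 - t for t in counts.values())


def mann_whitney(x, y):
    """Return (U for x, z from the tie-corrected normal approximation, r = |z|/sqrt(N))."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    variance = n1 * n2 / 12 * ((n + 1) - tie_term(pooled) / (n * (n - 1)))
    z = (u - n1 * n2 / 2) / math.sqrt(variance)
    return u, z, abs(z) / math.sqrt(n)


def kruskal_wallis(*groups):
    """Return the tie-corrected H statistic (compare with chi-square, df = k - 1)."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = midranks(pooled)
    total, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum**2 / len(group)
        start += len(group)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    return h / (1 - tie_term(pooled) / (n**3 - n))


# Invented 1-5 Likert responses for illustration only.
men = [3, 2, 4, 3, 3, 5]
women = [3, 4, 4, 2, 5, 3, 4]
print(mann_whitney(men, women))
print(kruskal_wallis([2, 3, 3], [3, 4, 2], [5, 4, 4], [3, 3, 2]))
```

A handy sanity check is that with exactly two groups, H equals z squared, so the two functions can validate each other.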
3 Result
The Mann-Whitney test showed that attitude toward gamification for men (Mdn = 8) did not differ significantly from that for women (Mdn = 9), U = 12812.5, z = 1.854, p = 0.064, r = 0.10. The same was true for perception of Badges for men (Mdn = 3) compared to women (Mdn = 3), U = 12201, z = 1.056, p = 0.291, r = 0.06, and for perception of Level for men (Mdn = 3) compared to women (Mdn = 3), U = 12608, z = 1.606, p = 0.108, r = 0.09. This indicates that there were no identifiable gender differences in attitude toward gamification or perception of Badges and Level. Age group (Fig. 2) did have a significant effect on perception of Badges, H(3) = 7.94, p = 0.047. Step-down follow-up analysis showed that age group 3, that is age 27–32, had a significantly higher perception of Badges than the other age groups, H(3) = 7.94, p = 0.047, but that there was no significant difference in perception among the rest of the age groups, p = 0.998. Age group did not have a significant effect on attitude toward gamification, H(3) = 4.73, p = 0.194, or perception of Level, H(3) = 7.21, p = 0.066. In FU (Fig. 2) there was a significant effect on attitude toward gamification, H(3)
= 14.58, p = 0.002, with step-down follow-up analysis showing that attitude toward gamification was significantly lower for the students visiting the LMS once a week or less compared to the students visiting more often. Among the students visiting more than once a week there was no significant difference in attitude toward gamification, p = 0.935. In FU there was no significant effect on either perception of Level, H(3) = 4.62, p = 0.201, or perception of Badges, H(3) = 1.920, p = 0.589 (Fig. 2).
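The effect sizes r reported for the Mann-Whitney tests are consistent with the common convention r = |z| / sqrt(N). The paper states neither the formula nor the per-test N, so both are assumptions in this sketch: N = 316 is estimated as the 321 respondents minus the roughly 1.6% whose gender was coded as missing.

```python
import math

# Assumption: 321 respondents minus about 1.6% (5 students) coded as missing on gender.
N_ASSUMED = 316

# Outcome: (z, r) as reported in the Result section.
REPORTED = {
    "attitude toward gamification": (1.854, 0.10),
    "perception of Badges": (1.056, 0.06),
    "perception of Level": (1.606, 0.09),
}


def effect_size_r(z, n):
    """Effect size for a rank-based test from its z statistic and total sample size."""
    return abs(z) / math.sqrt(n)


for outcome, (z, r_reported) in REPORTED.items():
    print(outcome, round(effect_size_r(z, N_ASSUMED), 2), r_reported)
```

Under this assumed N, the recomputed values round to the reported ones, all of which fall at or below the conventional threshold of 0.1 for a small effect.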
Fig. 2. Left: perception of Badge and Level from age group. Right: attitude toward gamification depending on FU
4 Discussion
The main results indicate that gender does not affect attitude and perception; that age affects how Badges are perceived; and that an FU of once a week or less is related to a more negative attitude toward gamification. In the following section, the results are discussed in more detail and put into the context of previous research, followed by directions for further research. Gender did not influence attitude toward gamification or perception of Level and Badges. Previous studies showed that the perception of game elements differs significantly between genders [14, 15]; however, this study did not show such effects, concluding that attitudes toward gamification and perceptions of Level and Badges were not affected by gender. One explanation for the non-significant results can be cultural and contextual. Since gender can be argued to be a socio-contextual construct [19], cohorts from different socio-economic backgrounds can perceive different game elements differently. To draw this conclusion, further research clarifying the socio-economic factors concerning the study is required. Another explanation can be the design: Badges did not trigger when students gave feedback to other students, and Level was not designed to foster competition. In previous studies, Points [15] and competitive aspects [14] have been more appreciated by men, whereas women have more appreciated Badges [15] and Feedback [14]. The gamification design might have to elicit these underlying mechanisms to produce gender differences in the perception of different game elements. To draw conclusions about gender perception of game elements, further research should include contextual and cultural considerations, including measures of the underlying assumptions behind the difference, i.e., that Badges are comprehended as feedback and that Level fosters competition.
Age did have a significant effect on the perception of Badges. Age group 3 had a more positive perception than the younger and older groups. Noteworthy aspects are: 1) The effect of age was not present for the perception of Level or attitude toward gamification. Comparing gamification in general and the game element Level with Badges, Badges might be more clearly related to visual feedback. Thus, a hypothesis is that age group 3 might be more driven by recognition and reward than the other age groups, which could explain the results. To draw this conclusion, more research is needed on the perception of different age groups in relation to different game elements. 2) Unlike in previous research [12, 14], the positive perception of Badges was not prevalent in the younger age groups. Previous studies indicate that these groups perceived the usefulness of gamification as higher. The present study did not show any difference between younger and older age groups. An explanation could be that the majority of the students were under 40 (90.7%), a group identified as finding gamification appealing [12]. FU had a significant effect on attitude toward gamification. The students who reported that they visited the LMS once a week or less expressed a more negative attitude toward gamification than the rest of the students. However, the perception of Level and Badges was not affected by FU. The study result is in line with the impression that positive attitudes toward gamification correlate with FU. The non-controlled study setting cannot account for causality; however, since the students had the option to opt out, it is unlikely that negative attitudes toward gamification led the students to spend less time in the LMS. The results indicate that gamification might need a start-up phase, and attitude toward gamification might improve once it is understood.
The result also implies that gamification responds to theoretical principles similar to those proposed in the UTAUT framework regarding how familiarity affects user adoption of a technology [16]. This is probably not exclusive to gamification implementations, but applies to new features in general that are added to an LMS. Limitations of the study include single-item, one-time measurements of perception. For a more reliable outcome that can be validated, the study could have included scales with more items to measure attitude toward gamification and Level/Badges perception. Another critique is that the scales of game element perception were five-point, whereas the gamification attitude scale was ten-point. Since the environment in which the survey was filled out was not controlled, other factors could also have influenced the participants’ answers. Finally, the study did not examine potential mediating variables between the independent and the dependent variables, which could have explained the results in more detail.
5 Conclusion
The main original findings, which further work should explore, are: 1) The relationship between age groups and the perception of different game elements; the age group 27–32 perceived Badges more favorably than other groups. 2) The frequency of interacting with a gamified system can influence attitudes toward gamification; less frequent use corresponds to a more negative attitude and vice versa. A future inquiry should correlate the perception of game elements with underlying psychological mechanisms, using scales describing individual motives and drives. Perception should also be measured with more than one item, giving a deeper understanding of the findings that this and previous work have shown regarding demographic variations and different game elements.
Acknowledgements. The study was partly supported by Vinnova (grant number 2018-02953).
References
1. World Economic Forum: Towards a Reskilling Revolution. World Economic Forum, Davos (2018)
2. Ritchie, H., Roser, M.: Age Structure (2019)
3. Manyika, J., Lund, S., Chui, M., Bughin, J., Woetzel, J., Batra, P., Ko, R., Sanghvi, S.: Jobs lost, jobs gained: workforce transitions in a time of automation (2017)
4. Bork, A.: Adult education, lifelong learning, and the future. Campus-Wide Inf. Syst. 18, 195–203 (2001). https://doi.org/10.1108/EUM0000000006266
5. Conceição, S.C.O.: Competing in the world’s global education and technology arenas. New Dir. Adult Contin. Educ. 2016, 53–61 (2016). https://doi.org/10.1002/ace.20176
6. Kim, J.Y. (ed.): World development report 2019: the changing nature of work. The World Bank Group (2019). https://doi.org/10.1596/978-1-4648-1328-3
7. Phipps, S.T.A., Prieto, L.C., Ndinguri, E.N.: Teaching an old dog new tricks: investigating how age, ability, and self efficacy influence intentions to learn and learning among participants in adult education. Acad. Educ. Leadersh. J. 17, 13 (2013)
8. Bartels, J., Willen, B.: Drop-out: Problems of comparing drop-out in different distance education systems (1985)
9. Pierrakeas, C., Koutsonikos, G., Lipitakis, A.D., Kotsiantis, S., Xenos, M., Gravvanis, G.A.: The variability of the reasons for student dropout in distance learning and the prediction of dropout-prone students. In: Intelligent Systems Reference Library, pp. 91–111. Springer (2020). https://doi.org/10.1007/978-3-030-13743-4_6
10. Antonaci, A., Klemke, R., Specht, M.: The effects of gamification in online learning environments: a systematic literature review. Informatics 6, 1–22 (2019). https://doi.org/10.3390/informatics6030032
11. Looyestyn, J., Kernot, J., Boshoff, K., Ryan, J., Edney, S., Maher, C.: Does gamification increase engagement with online programs? A systematic review. PLoS One 12 (2017). https://doi.org/10.1371/journal.pone.0173403
12. Bittner, J.V., Shipper, J.: Motivational effects and age differences of gamification in product advertising. J. Consum. Mark. 31, 391–400 (2014). https://doi.org/10.1108/JCM-04-2014-0945
13. Kappen, D.L., Mirza-Babaei, P., Nacke, L.E.: Gamification through the application of motivational affordances for physical activity technology (2017). https://doi.org/10.1145/3116595.3116604
14. Koivisto, J., Hamari, J.: Demographic differences in perceived benefits from gamification. Comput. Human Behav. 35, 179–188 (2014). https://doi.org/10.1016/j.chb.2014.03.007
15. Codish, D., Ravid, G.: Gender moderation in gamification: does one size fit all? In: Proceedings of the 50th Hawaii International Conference on System Sciences (2017). https://doi.org/10.24251/hicss.2017.244
16. Venkatesh, V., Morris, M.G., Davis, G.B., Davis, F.D.: User acceptance of information technology: toward a unified view. MIS Q. Manag. Inf. Syst. 27, 425–478 (2003). https://doi.org/10.2307/30036540
17. Verhoeven, J.C., Heerwegh, D., De Wit, K.: Information and communication technologies in the life of university freshmen: an analysis of change. Comput. Educ. 55, 53–66 (2010). https://doi.org/10.1016/j.compedu.2009.12.002
18. Koivumäki, T., Ristola, A., Kesti, M.: The perceptions towards mobile services: an empirical analysis of the role of use facilitators. Pers. Ubiquit. Comput. 12, 67–75 (2008). https://doi.org/10.1007/s00779-006-0128-x
19. Lorber, J., Farrell, S.A.: The Social Construction of Gender. Sage Newbury Park, Thousand Oaks (1991)
Personalizing Fuzzy Search Criteria for Improving User-Based Flexible Search
Mohammad Halim Deedar(B) and Susana Muñoz-Hernández
Escuela Técnica Superior de Ingenieros Informáticos, Universidad Politécnica de Madrid, Campus de Montegancedo s/n, 28660 Boadilla del Monte, Madrid, Spain [email protected], [email protected]
Abstract. This proposal provides a user-friendly way of personalizing fuzzy search criteria in an expressive searching platform. For example, if the system defines a fuzzy criterion “expensive” for searching expensive restaurants, any user can access that criterion and personalize it with the preferences and values that satisfy his/her needs. In this way, different users retrieve different results when querying with the same fuzzy search criterion. The system executes the personalized definition of a criterion whenever the logged-in user has previously personalized it. Moreover, our framework is user-friendly enough to perform expressive searches over modern and conventional database formats without knowing the low-level syntax of the framework’s criteria. We present the architecture of this framework, with its design and implementation details, and we clarify its use with a case study based on an experiment. We analyze the results of the experiment to show our system’s behavior and performance after incorporating the personalization of fuzzy search criteria.
Keywords: Expressive queries · Fuzzy criteria · Personalization · Data management · Searching platform
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 T. Ahram et al. (Eds.): IHIET-AI 2021, AISC 1378, pp. 186–199, 2021. https://doi.org/10.1007/978-3-030-74009-2_24
1 Introduction
When defining fuzzy concepts, the subjective character of fuzziness is an important issue: a fuzzy concept like “expensive” depends on the “wealth” of the person performing the query. For example, suppose we have a database of cars with a field that stores the cars’ prices, and there is a car with a price of 200,000 dollars. The price is a crisp value, and there is no problem in storing it in our database. It is not so easy, however, to store whether the car is “expensive”, “very expensive”, “not expensive at all” or any other fuzzy value, because that value might not hold for all the people retrieving it from the database. With a capital of 100,000 dollars, Jose might consider the car very expensive, while Sara, whose capital is 400,000 dollars, might consider it just expensive.
In our fuzzy query system (implemented using constraint logic programming), users can own a database by uploading it to the system and can define fuzzy concepts (predicates) for performing flexible searches. The database table and the fuzzy predicates defined by their owner can be accessed by any other user who enters the system, and those users can perform flexible queries based on the defined predicates. A problem arises when the subjective character of a fuzzy predicate defined by the database owner does not match the preferences of another user. For example, a user owns a restaurant database and defines a fuzzy predicate cheap for searching for restaurants with low-priced menus. The predicate owner (the user who defined the fuzzy concept) has defined cheap considering 25 dollars as the maximum price at which food is completely cheap. Any other user who accesses the database and performs flexible searches based on the cheap predicate retrieves the same results as the data owner, but for some users this definition might not be good enough; there might be users for whom 25 dollars is not cheap. Modifying the fuzzy predicate does not help, because if one user changes the value, the predicate changes for everyone, and the data owner, or perhaps most of the users, will not be happy with it. Therefore, a system that allows each user to personalize the fuzzy concept according to his/her preferences improves the accuracy of the answers, while the main fuzzy predicate defined by the database owner remains unaffected and is still used in the searches of users who do not provide a personalized definition of the search criterion.
Obtaining fuzzy answers by posing fuzzy queries over databases that contain non-fuzzy information has been studied in several works. Good reviews, although perhaps somewhat outdated, can be found in the works of Bosc and Pivert [1], Dubois and Prade [2], and Tahani [3], and in the Ph.D. thesis of Leonid Tineo [4].
Moreover, many proposals for performing fuzzy queries on relational databases can be found in [5–10]. Most of the work discussed in these papers focuses on improving the effectiveness of existing methods, on introducing new syntactic constructions in the query, or on adding the conversion between non-fuzzy and fuzzy values required to perform the query; for this they use a syntax somewhat similar to SQL, or in some cases an extension of the SQL language.
The integration of fuzzy logic and logic programming has resulted in the development of many fuzzy systems over Prolog. To represent fuzzy knowledge, we need to create a link between fuzzy and non-fuzzy concepts, and to achieve that we could use any of the existing frameworks. Apart from theoretical frameworks such as [11], we know about the Prolog-Elf system [12], the F-Prolog language [13], the FRIL Prolog system [14], FLOPER [15], the FuzzyDL reasoner [16], the Fuzzy Prolog system [17, 18], and RFuzzy [19]. All of them implement in some way the fuzzy set theory introduced by Lotfi Zadeh in 1965 [20], and they allow us to execute the connectors required to retrieve the non-fuzzy information stored in databases. In our approach we used RFuzzy with priorities [21, 22], which increases its capability to decide which result is preferred among those provided by different rules; the preferred result is not necessarily the one offered by the last rule, nor the one with the highest truth value.
In this paper, we aim to provide an approach that allows the users of flexible query systems to personalize fuzzy search criteria (predicates) owned by other users without affecting those owners’ preferences. Moreover, we provide user-friendly interfaces with which users who do not know the syntax and semantics of the framework can use our system and perform expressive queries. The framework takes care of all the mapping processes that link the crisp information stored in the database with the fuzzy concepts. We also describe the syntax and semantics involved in constructing our flexible queries and in personalizing a fuzzy criterion, together with details about the design and development of our system for processing flexible queries. As a proof of concept, we provide a case study in which we personalized a fuzzy criterion defined by the data owner. We executed a flexible (fuzzy) query over a database using both the personalized criterion and the default (general) criterion, and we compared the results obtained by the data owner and by the user who personalized the criterion in order to report on the efficiency and performance of our system.
This paper is structured as follows. Section 2 describes the background work on which our approach depends. Section 3 gives a comprehensive description of the implementation details of the system. Section 4 presents a case study of our system, with a working example and its experimental results. Finally, Sect. 5 discusses conclusions and future work.
2 Background
In this section, we describe the earlier work on which our framework is based: the RFuzzy library used in our framework and FleSe, which is the former version of our framework.
2.1 RFuzzy Library
RFuzzy [19] is a library implemented for Ciao Prolog [23] to increase the expressiveness of Prolog with the possibility of handling fuzzy information. RFuzzy was developed mainly to reduce the complex syntax of Fuzzy Prolog [11] by representing fuzzy truth values as real numbers. The multi-adjoint algebra presented in [7, 24–28] is used to give semantics to the RFuzzy library. This structure is used because it provides credibility for the rules that we give in our program.
2.2 FleSe Framework
FleSe [29] is the former version of our system, a handy framework for performing fuzzy and non-fuzzy queries over Prolog databases containing crisp information. It uses the RFuzzy package, a library developed for Ciao Prolog that brings fuzzy logic into a Prolog implementation. FleSe was later extended with several utility features, which make it an advanced framework with an intelligent search engine that allows users to define fuzzy search criteria and perform expressive, flexible queries over various databases. For more details about these new features, refer to [30–32].
2.3 Database Definition Syntax
To define a link between a fuzzy predicate and the database, we have to know what is stored inside each database field. The syntax responsible for mapping the contents of a database onto the concepts that we use in our searches is shown in Eq. 1. In this syntax, P is the name of the database table, A is its arity, N_i is the name assigned to a column (field) of the database table whose values are of type T_i (boolean type, categorical type, integer type, float type, string type), and i ∈ [1, A] identifies each field of the table. We give an example in Eq. 2, where we define an employee database with six columns (attributes).
define_database(P/A, [(N_i, T_i)])    (1)
define_database(employee/6, [(name, string_type), (age, integer_type), (years_of_experience, integer_type), (years_of_studying, integer_type), (major, enum_type), (distance_to_the_workcenter, integer_type)]).    (2)
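The constraints stated for Eq. 1 (one name and type pair per field, A pairs in total) can be sketched as a small check. This is an illustrative Python sketch, not RFuzzy code and not the framework's implementation: the function below only borrows the name define_database, the type names boolean_type and float_type are assumed by analogy with those in Eq. 2, and the uniqueness check on field names is an added assumption.

```python
# Type names: string_type, integer_type and enum_type appear in Eq. 2;
# boolean_type and float_type are assumed by analogy (hypothetical names).
ALLOWED_TYPES = {"boolean_type", "enum_type", "integer_type",
                 "float_type", "string_type"}


def define_database(table, arity, fields):
    """Validate a list of (name, type) pairs against the declared arity."""
    if len(fields) != arity:
        raise ValueError(
            f"{table}/{arity} declares {arity} fields, got {len(fields)}")
    names = [name for name, _ in fields]
    if len(set(names)) != len(names):  # assumption: field names are unique
        raise ValueError("field names must be unique")
    for name, type_name in fields:
        if type_name not in ALLOWED_TYPES:
            raise ValueError(f"unknown type {type_name!r} for field {name!r}")
    return {"table": table, "arity": arity, "fields": dict(fields)}


# The employee table of Eq. 2.
EMPLOYEE_FIELDS = [
    ("name", "string_type"),
    ("age", "integer_type"),
    ("years_of_experience", "integer_type"),
    ("years_of_studying", "integer_type"),
    ("major", "enum_type"),
    ("distance_to_the_workcenter", "integer_type"),
]

employee = define_database("employee", 6, EMPLOYEE_FIELDS)
print(employee["fields"]["major"])
```

Declaring the arity separately from the field list, as Eq. 1 does, makes a mismatch between the two detectable before any query is run.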
2.4 Fuzzy Search Criteria Syntax
The syntax presented in Eq. 3 defines fuzzifications: it computes the truth values of a fuzzy predicate from the non-fuzzy values stored in a column of a database. Here P is the name of the database table, N_i is the name assigned to a column (field) of that table, fPredName is the name of the fuzzy predicate, and [(valIn, valOut)] is a list of pairs of values that defines the fuzzification domain. The first element of each pair, valIn, is a value in the domain of the attribute (column N_i) of table P; it is a breakpoint of the function that represents the truth value of the fuzzy search criterion being defined, fPredName. The second element of the pair, valOut, is the truth value of fPredName for the corresponding value valIn. To clarify the syntax, we give an example in Eq. 4, in which we compute whether a restaurant is cheap based on its price average.
fPredName(P) :~ function(N_i(P), [(valIn, valOut)]).    (3)
cheap(restaurant) :~ function(price_average(restaurant), [(10, 1), (60, 0)]).    (4)
With this definition of the fuzzy criterion cheap, 10 is the highest price average at which a restaurant is completely cheap (truth value 1). As the price increases, the restaurant becomes more expensive and the truth value decreases, until at a price average of 60 the restaurant is not cheap at all (truth value 0). We provide a graphical representation of the cheap fuzzy search criterion in Fig. 1.
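How a list of breakpoints turns a crisp value into a truth value can be sketched as follows. This is a minimal Python illustration rather than RFuzzy itself; it assumes that the truth value is interpolated linearly between consecutive breakpoints and held constant outside the first and last ones, which matches the description of cheap above. The names fuzzify and CHEAP are hypothetical.

```python
from bisect import bisect_right


def fuzzify(breakpoints, value):
    """Truth value of `value` under a piecewise-linear fuzzification.

    `breakpoints` is a list of (valIn, valOut) pairs as in Eq. 3.
    Assumption: linear interpolation between breakpoints, constant outside.
    """
    points = sorted(breakpoints)
    xs = [x for x, _ in points]
    if value <= xs[0]:
        return points[0][1]
    if value >= xs[-1]:
        return points[-1][1]
    i = bisect_right(xs, value)  # first breakpoint strictly above value
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (value - x0) / (x1 - x0)


# Breakpoints of the cheap criterion in Eq. 4.
CHEAP = [(10, 1.0), (60, 0.0)]

for price in (5, 10, 35, 60, 80):
    print(price, round(fuzzify(CHEAP, price), 2))
```

Under these assumptions a price average of 35, halfway between the two breakpoints, receives truth value 0.5, while prices at or below 10 receive 1 and prices at or above 60 receive 0.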
Fig. 1. Graphical representation of cheap fuzzy search criterion.
To allow regular users to use our system and perform flexible queries, we provide a user-friendly interface for defining fuzzy criteria, so that they need not be concerned with the low-level syntactical structures of the criteria, which are not easy for regular users to understand. For more details about criteria definition, refer to [31].
2.5 Syntax for Personalization of Fuzzy Search Criteria
To personalize a fuzzy predicate, the tail syntax in Eq. 5 is added at the end of the rule that defines the fuzzy predicate. During searches, the personalized definition of a search criterion is used for the users who personalized that fuzzy concept; the general definition is used otherwise. In the syntax, UserName is the name of any user (a string). We provide an example in Eq. 6, which says that “Sara considers a car to be a good car according to the worst of its two values for being low mileage and being cheap; that is, a car must be both low mileage and cheap to be considered a good car”. So, if it is she who asks the system for a good car, she obtains results based on her personalized search criterion.
only for user 'UserName'    (5)
good_car(cars) :~ rule(min, (low_mileage(cars), cheap(cars))) only for user 'Sara'.    (6)
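The selection between a personalized and a general definition can be sketched as a lookup with a fallback. The Python below is only an illustration of that behavior, not the RFuzzy mechanism: the dictionary layout, the evaluate function and the general rule for good_car are hypothetical (the paper gives only Sara's rule), and the truth values of the example car are made up.

```python
GENERAL = None  # key for the definition that applies to every user

# Definitions indexed by (predicate, user). Each rule maps the truth values
# of the sub-criteria to the truth value of the predicate.
definitions = {
    # Hypothetical general rule, invented for this sketch.
    ("good_car", GENERAL): lambda tv: tv["cheap"],
    # Sara's personalized rule from Eq. 6: the worst of the two values.
    ("good_car", "Sara"): lambda tv: min(tv["low_mileage"], tv["cheap"]),
}


def evaluate(predicate, user, truth_values):
    """Use the user's personalized rule if present, else the general one."""
    rule = definitions.get((predicate, user),
                           definitions[(predicate, GENERAL)])
    return rule(truth_values)


# Illustrative truth values for one car.
car = {"low_mileage": 0.3, "cheap": 0.8}
print(evaluate("good_car", "Sara", car))  # personalized rule: min -> 0.3
print(evaluate("good_car", "Jose", car))  # falls back to general rule -> 0.8
```

Keeping the personalized rule under its own key leaves the general definition untouched, so users who have not personalized the criterion keep getting the owner's results.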
3 Implementation Details
Our framework has a web interface developed in Java, and it runs on a Tomcat server on a Linux machine. It is built on top of the search engine implemented with RFuzzy and allows regular users, without needing to learn the underlying syntax, to pose fuzzy queries for performing expressive, flexible searches over a database that does not contain fuzzy information. It accepts database files (tables) from users in any of the allowed formats (.sql, .json, .csv, .xlsx, .xls, .pl); the framework then converts the database tables into a Prolog syntax that the search engine understands.
For more details about the conversion process, refer to [32]. We previously provided user-friendly interfaces for defining fuzzy search criteria, and we now also provide one for the personalization of criteria. For more details about the framework, refer to [33], where we introduced UFleSe, an advanced framework for performing expressive flexible searches over databases. In the following section, we explain the interface and the steps for personalizing fuzzy search criteria, and how a personalized criterion is used for performing expressive searches over a database.
3.1 Syntax for Personalization of Fuzzy Search Criteria
We developed user-friendly interfaces for the personalization of fuzzy predicates. To personalize a fuzzy predicate, the first step is to select the database on which we want to perform flexible searches. As shown in Fig. 2, the combo box lists all the databases together with their owners. In our example, we select the “restaurant” database, which is owned by the user “adib.akbari75”.
Fig. 2. Selecting the database table.
Once the database is selected, another combo box appears, in which a particular table must be selected. In this case, there is only one table. We can then see, in Fig. 3, a combo box that contains the list of fuzzy predicates and crisp attributes owned by the database owner. From this list we select, for example, the fuzzy predicate cheap, which is defined by the database owner (adib.akbari75).
Fig. 3. Selecting the fuzzy predicate.
Once the predicate is selected, the interface for performing a flexible query (the search engine) appears, as shown in Fig. 4. It offers two buttons, Search and Personalize. If we select the Search button, we retrieve results based on the general definition of the cheap fuzzy predicate given by the database owner, which states that “the maximum average price at which the restaurant is considered to be cheap is 25 dollars”. Let us assume, however, that this value (25 dollars) does not satisfy our requirements. We therefore want to personalize this fuzzy predicate with different fuzzification values so that, whenever we later search for a “cheap restaurant” over the database owner’s data and predicates, we get results based on our personalized criterion, without affecting the general definition of the fuzzy search criterion cheap for the other users. To do this, we click the “Personalize” button shown in Fig. 4, and the interface for personalizing fuzzy predicates appears, as shown in Fig. 5. It displays the fuzzification domain of the fuzzy predicate cheap as defined by the data owner (adib.akbari75); as users, we can adjust the definition to our preferences merely by changing the breakpoint values of that domain. We have personalized the cheap predicate (as shown in Fig. 6) by giving “10 dollars as the maximum value of the price average at which the restaurant is completely cheap” and “50 dollars as the minimum value of the price average at which the restaurant is not considered cheap at all”.
Fig. 4. Cheap fuzzy predicate for performing flexible searches.
Fig. 5. Personalization interface.
4 Case Study
Although the FleSe tool can run complex queries composed of several search criteria, in this case study we focus on a simple query that uses a single fuzzy search criterion, so that the effect of personalization is easy to see. We analyzed the system’s behavior when a flexible query is executed with the general definition of a fuzzy search criterion and with its personalized definition, through the following experiment. A small database about restaurants, in JSON format, was developed for experimentation purposes only. After the database is uploaded and general definitions of the search criteria are provided, the system produces a configuration file (in Prolog format). An example of the configuration file, containing the information about restaurants, is shown in Fig. 11.
Flexible query: “I am looking for a cheap restaurant”. This query uses the values of the following attribute domain in the database:
Price average: this attribute represents the price average of a restaurant. The cheap fuzzy predicate has been defined as a fuzzification criterion over the price average attribute.
To perform this experiment, we considered two users: the data owner, who owns the database and the fuzzy predicate, and the user “halimdeedar”, who wants to personalize the fuzzy predicate owned by the data owner.
Fig. 6. Personalization of cheap fuzzy predicate.
Data owner: The data owner has uploaded the restaurant database and defined the cheap fuzzy criterion with the breakpoints [(25, 1), (70, 0)], meaning that restaurants with a price average of 25 dollars or less are considered cheap and those with a price average of 70 dollars or more are not cheap at all, as shown in Fig. 5. Any user who enters the system can access the restaurant database (if the data owner allows it) and can perform searches using the cheap fuzzy search criterion defined by the data owner. When a user searches for “cheap restaurants” without having personalized the definition of the fuzzy search criterion cheap, the general definition provided by the data owner is used. The result of this search, for the data owner or for any user who has not provided a personal definition of the cheap criterion, is shown in Fig. 7.
Halimdeedar: This user owns neither the database nor the cheap fuzzy predicate. He personalized the cheap fuzzy predicate with different values for the fuzzification domain, considering that “the maximum price average at which restaurants are completely cheap is 10 dollars (not 25 dollars), and the minimum price average at which restaurants are not cheap at all is 50 dollars (not 70 dollars)”. We present the result of this user’s query in Fig. 8.
Fig. 7. Data-owner result of the query: I am looking for a cheap restaurant.
We present the syntax behind the general cheap fuzzy predicate defined by the data owner in Eq. 7 and the syntax of its personalization by the user halimdeedar in Eq. 8. The breakpoints follow the values stated in the text above: 25 and 70 dollars for the data owner, 10 and 50 dollars for halimdeedar.
cheap(restaurant) :~ function(price_average(restaurant), [(25, 1), (70, 0)]).    (7)
cheap(restaurant) :~ function(price_average(restaurant), [(10, 1), (50, 0)]) only for user 'halimdeedar'.    (8)
Fig. 8. Halimdeedar’s results after personalization of the cheap fuzzy criterion: “I am looking for a cheap restaurant”.
Fig. 9 gives the graphical representation of the cheap fuzzy predicate: the general definition by the owner is drawn in red, and the personalized definition by the user halimdeedar is drawn in green.
Fig. 9. Graphical representation of the cheap fuzzy predicate.
The results obtained by the user halimdeedar differ from those obtained by the data owner. Even after personalization there is still a single cheap fuzzy predicate in the system, but it behaves differently for each logged-in user according to his/her personalized search criteria. Thus, whenever the logged-in user is halimdeedar, he retrieves this result set when searching for “cheap restaurants”. To better understand the results obtained by both users (the data owner and halimdeedar), we have visualized them in Fig. 10. The Tapasbar and Burger King restaurants, both with a price of 10 dollars, are cheap for both users, with truth value 1. The truth value of being cheap for the Kenzo restaurant, with a price of 50 dollars, is 0.44 for the data owner, so it is only somewhat cheap (not very cheap) for him, whereas for the user halimdeedar the system assigns it truth value 0, which means that it is not cheap at all according to his preferences.
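The truth values quoted for the Kenzo restaurant can be checked by hand. The derivation below is a sketch under two assumptions: the truth value falls linearly between the two breakpoints of each definition (the shape drawn in Fig. 9), and the breakpoints are those stated in the prose of this case study, namely 25 and 70 dollars for the data owner and 10 and 50 dollars for halimdeedar. The symbol μ for the truth value of cheap is introduced here for illustration only.

```latex
% Truth value of "cheap" for a price average of 50 dollars (Kenzo)
\[
\mu_{\text{owner}}(50) = \frac{70 - 50}{70 - 25} = \frac{20}{45} \approx 0.44,
\qquad
\mu_{\text{halimdeedar}}(50) = \frac{50 - 50}{50 - 10} = 0 .
\]
% Truth value for a price average of 10 dollars (Tapasbar, Burger King):
% 10 lies at or below the first breakpoint of both definitions
\[
\mu_{\text{owner}}(10) = 1 \quad (10 \le 25),
\qquad
\mu_{\text{halimdeedar}}(10) = 1 \quad (10 \le 10).
\]
```

Both computed values for Kenzo agree with the reported 0.44 and 0, and both users obtain truth value 1 for the two restaurants priced at 10 dollars.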
Fig. 10. Result analysis: I am looking for a cheap restaurant.
In this way, we were able to deal with the fuzzy predicates’ subjective characteristics and provide the facility for the users to personalize any fuzzy search criteria that exist in the system according to their preferences to get better and satisfying results for their flexible queries.
Fig. 11. Restaurant database, configuration file.
5 Conclusions
In this paper, we propose to enrich an approach for expressive querying of databases (made flexible through fuzzy search criteria) with the personalization of the search criteria. The main purpose is to provide a system with a user-friendly interface that allows regular users (without knowing the framework’s low-level syntax) to deal with the subjective character of fuzziness while performing expressive searches, and to retrieve adequate results that satisfy their requirements. We have described the background of our approach: the earlier work on which our proposed framework is based and the former version of our framework. We explained the low-level syntax and the improvements to it that the search engine exploits when performing flexible searches. We have provided a case study in which we executed a flexible query over a database with a personalized criterion and analyzed the behavior and performance of our system.
Our present work is oriented toward validating our framework with different users interested in searching over their own data: uploading their databases, adding fuzzy criteria and similarity definitions, personalizing them, using them for expressive searches in our framework, and finally giving their opinion about their satisfaction with it. Future steps in this research line will introduce a mechanism for clustering the system’s users according to their profiles and search records.
References
1. Bosc, P., Pivert, O.: SQLf: a relational database language for fuzzy querying. IEEE Trans. Fuzzy Syst. 3(1), 1–17 (1995). https://doi.org/10.1109/91.366566
2. Dubois, D., Prade, H.: Using fuzzy sets in flexible querying: why and how? In: Troels, A., Henning, C., Legind, L.H. (eds.) Flexible Query Answering Systems, pp. 45–60 (1997). https://dl.acm.org/citation.cfm
3. Tahani, V.: A conceptual framework for fuzzy query processing: a step toward very intelligent database systems. Inf. Process Manag. 13, 289–303 (1977)
4. Rodriguez, L.J.T.: A contribution to database flexible querying: fuzzy quantified queries evaluation. Ph.D. thesis (November 2005)
5. Prade, H., Testemale, C.: Generalizing database relational algebra for the treatment of incomplete/uncertain information and vague queries. Inf. Sci. 34, 113–143 (1984)
6. Umano, M., Hatono, I., Tamura, H.: Fuzzy database systems. In: Proceedings of the IEEE International Joint Conference on Fuzzy Systems, vol. 5, pp. 35–36 (1995)
7. Moreno, J.M., Aciego, M.O.: On first-order multiadjoint logic programming (2002)
8. Konstantinou, N., Spanos, M.C., Solidakis, E., Mitrou, N.: VisAVis: an approach to an intermediate layer between ontologies and relational database contents. In: Proceedings of the 2006 CAISE Third International Workshop on Web Information System Modeling (WISM) (2006)
9. Martínez-Cruz, C., Noguera, J.M., Vila, M.A.: Flexible queries on relational databases using fuzzy logic and ontologies. Inf. Sci. 366, 150–164 (2016)
10. Takahashi, Y.: A fuzzy query language for relational databases. IEEE Trans. Syst. Man. Cyb. 21, 1576–1579 (1991)
11. Vojtas, P.: Fuzzy logic programming. Fuzzy Set. Syst. 124(3), 361–370 (2001)
12. Ishizuka, M., Kanai, N.: Prolog-ELF incorporating fuzzy logic. In: Proceedings of the 9th International Joint Conference on Artificial Intelligence, pp. 701–703. Morgan Kaufmann Publishers Inc., San Francisco (1985)
13. Li, D., Liu, D.: A Fuzzy Prolog Database System. Wiley, New York (1990)
14. Baldwin, J.F., Martin, T.P., Pilsworth, B.W.: Fril - Fuzzy and Evidential Reasoning in Artificial Intelligence. Wiley, New York (1995)
15. Morcillo, P., Moreno, G.: Floper, a fuzzy logic programming environment for research. In: Gij (ed.) Proceedings of VIII Jornadas sobre Programacion y Lenguajes (PROLE 2008), vol. 10, pp. 259–263 (2008)
16. Bobillo, F., Straccia, U.: fuzzyDL: an expressive fuzzy description logic reasoner. In: International Conference on Fuzzy Systems (FUZZ08), pp. 923–930. IEEE Computer Society (2008)
17. Guadarrama, S., Muñoz, S., Vaucheret, C.: Fuzzy prolog: a new approach using soft constraints propagation. Fuzzy Sets Syst. 144(1), 127–150 (2004). https://doi.org/10.1016/j.fss.2003.10.017
18. Vaucheret, C., Guadarrama, S., Muñoz-Hernández, S.: Fuzzy prolog: a simple general implementation using CLP(R). In: Baaz, M., Voronkov, A. (eds.) (LPAR). Lecture Notes in Artificial Intelligence, vol. 2514, pp. 450–464. Springer (2002)
19. Muñoz Hernández, S., Pablos-Ceruelo, V., Strass, H.: RFuzzy: syntax, semantics and implementation details of a simple and expressive fuzzy tool over Prolog. Inf. Sci. 181(10), 1951–1970 (2011). https://doi.org/10.1016/j.ins.2010.07.033
20. Zadeh, L.A.: Fuzzy sets. Inf. and Control 8(3), 338–353 (1965)
21. Pablos-Ceruelo, V., Muñoz-Hernández, S.: Introducing priorities in RFuzzy: syntax and semantics. In: CMMSE 2011: Proceedings of the 11th International Conference on Mathematical Methods in Science and Engineering, vol. 3, Benidorm (Alicante), Spain, June 2011, pp. 918–929 (2011)
22. Pablos-Ceruelo, V., Muñoz Hernández, S.: Getting answers to fuzzy and flexible searches by easy modelling of real-world knowledge. In: Proceedings of 5th International Joint Conference on Computational Intelligence, pp. 265–275 (2013). https://doi.org/10.5220/0004555302650272
23. The CLIP lab: The Ciao Prolog Development System. https://www.clip.dia.fi.upm.es/Software/Ciao
24. Medina, J., Ojeda-Aciego, M., Vojtas, P.: A multi-adjoint approach to similarity-based unification. Electr. Notes Theor. Comput. Sci. 66, 70–85 (2002)
25. Medina, J., Ojeda-Aciego, M., Vojtas, P.: A completeness theorem for multi-adjoint logic programming. In: FUZZ-IEEE, pp. 1031–1034 (2001)
26. Medina, J., Ojeda-Aciego, M., Vojtas, P.: Multi-adjoint logic programming with continuous semantics. In: Proceedings of the 6th International Conference on Logic Programming and Nonmonotonic Reasoning, series LPNMR 2001, pp. 351–364. Springer, London (2001)
27. Medina, J., Ojeda-Aciego, M., Vojtas, P.: A procedural semantics for multi-adjoint logic programming. In: Proceedings of Progress in Artificial Intelligence, pp. 290–297 (2001)
28. Medina, J., Ojeda-Aciego, M., Vojtas, P.: Similarity-based unification: a multi-adjoint approach. Fuzzy Set. Syst. 146(1), 43–62 (2004)
29. Pablos-Ceruelo, V., Muñoz-Hernández, S.: FleSe: a tool for posing flexible and expressive (fuzzy) queries to a regular database. In: Proceedings of 11th International Conference on Distributed Computing and Artificial Intelligence, pp. 157–164 (2014)
30. Deedar, M.H., Muñoz-Hernández, S.: Allowing users to create similarity relations for their flexible searches over databases. In: Artificial Intelligence and Soft Computing, pp. 526–541. Springer, Cham (2019)
31. Deedar, M.H., Muñoz-Hernández, S.: User-friendly interface for introducing fuzzy criteria into expressive searches. In: Intelligent Systems and Applications, pp. 982–997. Springer, Cham (2020)
32. Deedar, M.H., Muñoz-Hernández, S.: Extending a flexible searching tool for multiple database formats. In: Emerging Trends in Electrical, Communications, and Information Technologies, pp. 25–35. Springer (2020)
33. Deedar, M.H., Muñoz-Hernández, S.: UFleSe: user-friendly parametric framework for expressive flexible searches. Can. J. Electr. Comput. Eng. 43(4), 235–250 (2020)
Virtual Reality to Improve Human Computer Interaction for Art Fayez Chris Lteif1,2(B) , Karl Daher1,2 , Leonardo Angelini1,2 , Elena Mugellini1,2 , Omar Abou Khaled1,2 , and Hayssam El Hajj1,2 1 Lebanese International University, Saloumi Street, Beirut, Lebanon
{karl.daher,leonardo.angelini,elena.mugellini, omar.aboukhaled}@hes-co.ch, [email protected] 2 HES-SO University of Applied Sciences and Arts Western Switzerland, Delémont, Switzerland
Abstract. This paper integrates technology, namely virtual reality, with the arts. An observational study was performed by replicating an actual museum by a Swiss-Lebanese artist and having users experience it in an immersive virtual environment. Data were then collected to monitor the user experience and its effects on the users' emotional and psychological state. This information is used to reflect on future developments and enhancements that could elevate the user experience and potentially extend it with more advanced technologies.
Keywords: Technology · Virtual Reality · Art · Immersive environment
1 Introduction

Art, "something that is created with imagination and skill and that is beautiful" according to the Merriam-Webster dictionary, has always referred to a classic practice that evolved with mankind. However, when technology emerged as an essential tool for humanity, art needed to embrace the change and become technology oriented. Virtual Reality (VR), another emerging technology, has had a great influence on several sectors. In tourism, Beck et al. describe its growth and impact in a comprehensive state-of-the-art study on VR systems and their potential capabilities [1]. Healthcare has also been a primary point of focus: Brenda Kay Wiederhold [2] investigates the notion of Virtual Medical Centers that improve human performance through careful physiological monitoring during VR therapy and training, so that the results eventually carry over to the real world. Virtual Reality has also been embraced in industrial fields such as autonomous vehicles [3], where it helps avoid the risks of physical set-ups in which people could be injured.

Virtual Reality (VR) is the use of computer technology to create a simulated environment [4]. It is mainly delivered through a platform consisting of a head-mounted display (HMD) and hand controllers connected to a computer. In use, the user is fully immersed in a new virtual world, which in this paper's context is an immersive museum.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 T. Ahram et al. (Eds.): IHIET-AI 2021, AISC 1378, pp. 200–207, 2021. https://doi.org/10.1007/978-3-030-74009-2_25
In this paper, human-to-machine interaction is examined in an observational study in the field of arts. A VR headset provided the user with an art gallery experience involving several interactions and decisions. It was displayed and tested at the exhibition of the Swiss-Lebanese artist Hafiz Bertschinger in Belfaux, Switzerland. The idea behind the experiment was to obtain a preliminary sense of how humans react to artistic virtual expositions and of the impact on their psychological state, such as emotions.

In Sect. 2, the state of the art is discussed regarding previous uses of virtual reality to enhance human-to-machine interaction and development, in addition to previous attempts to integrate art into VR. In Sect. 3, the project is discussed thoroughly and broken down into several subsections: its fundamental principles, the technologies and hardware used, the software implementation, the testing phase and finally the results. In Sect. 4, the results are discussed, and future work is outlined based on what the initial testing phase amounted to.
2 State of the Art

As Virtual Reality became popular in recent years, its use was no longer limited to entertainment but extended to enhancing the lives of individuals. A state-of-the-art study was conducted and divided into two essential fields: virtual reality to develop human computer interaction and virtual reality to promote art.

2.1 Virtual Reality to Develop the Human Computer Interaction

Telekinetic object movement, a concept previously demonstrated only in movies, was never considered a reality. With the help of Virtual Reality simulations, however, it became possible. This means integrating neuroscience with technology to enrich human life, by studying physiological signals to develop a better understanding of human conditions using neurotherapy feedback. Neurable [5], a software company founded in 2015, was able to facilitate the use of Virtual Reality to monitor the human brain. The technology is powered by Brain Computer Interface (BCI) research and signal processing techniques and aims to deliver useful BCI tools for enterprise and consumer applications. The software provides real-time insight into human brain signals and uses them in domains such as human insights and simulation training. The human insights category was used to optimize marketing and advertising by analyzing emotions rather than questionnaires. Simulation training measures stress in VR simulation training to improve outcomes and supports user-user, user-expert, and user-cohort analysis with objective biometrics.

Another use of Virtual Reality was in pain reduction. In research reported in IEEE Pulse, patients who were distracted by the immersiveness of VR showed a reduction in pain scores [6].
Some pain-directed VR products were already being deployed, such as SnowWorld, an immersive VR experience designed to reduce the pain of burn patients during their treatment. The technology was not limited to burn pain; it was also applied to chronic pain. A VR application called GLOW was developed, geared primarily toward alleviating chronic pain and promoting relaxation and mindfulness [6]. The application was accompanied by a heart
rate monitor that can help users control their respiration and lower their anxiety levels. This is achieved by placing the patient in a calm environment that takes away the pain to an extent. Additionally, virtual reality was implemented as an "Immersive mirror for pain control and treatment" [7], which uses virtual and augmented reality for the treatment of phantom limb pain and stroke rehabilitation.

Beyond pain treatment, VR has also been used for training young adults with intellectual disabilities [8]. A study adopted by the University of Fribourg simulated real-life scenarios, specifically a public transport bus, to help individuals train for different situations such as riding the correct bus, getting off the bus and interacting with strangers. Training in a confined and controlled virtual environment helps avoid the dangers of the real world.

2.2 Virtual Reality to Promote Art

In art, Virtual Reality found its way into many applications that motivated this project and helped steer its development. The VR Museum of Fine Art [9] is a VR application on Steam. The application offers an art gallery with an initial release of 15 high-fidelity sculptures and famous paintings, scanned and rendered in crisp detail. It allows free movement around the museum and the ability to get close to every sculpture. One disadvantage that should be addressed is the inability to properly interact with each sculpture, for example by rotating and touching it. Lifeliqe VR Museum [10] is a VR application for Vive. The app presents 1000+ interactive 3D models and 20 VR experiences to help students and teachers learn and understand complex science concepts and to make the learning process of STEM subjects more effective. The Lifeliqe museum was created as an extension of the Lifeliqe curriculum program.
Finally, Amaryllis VR [11] is a room-scale VR installation created by the Danish-Armenian artist Miriam Zakarian. It was created in 2016, and its first completed scene, named "Ocean", was exhibited to the public in August 2017. It is composed of several surreal worlds inspired by concepts related to what makes humans different from sophisticated machines: spirituality, mortality, the subconscious and complex emotions.

Art thus never truly stops developing: several projects have already attempted its integration with technology, each with its own methods relative to its objectives, be it education, games or even philosophical designs. In this project, we studied the emotional effect of digitalized art presented in virtual reality, and we compared the real-life exposition to the one presented in virtual reality.
3 Project Implementation and Testing Results

The concept of human-to-machine interaction reflects itself in the human psyche: how humans react to such behavior and how emotions develop towards such interaction. To study this, a virtual museum was created and presented during a real exposition. The virtual museum included artwork that was displayed in the real exposition. The project is a collaboration with the Swiss-Lebanese artist Hafiz Bertschinger, and the experiment took place during one of his exhibitions in 2019. The idea was to recreate the artwork displayed in the museum in the virtual environment and to analyze the interaction, experience and emotional state of various users.

The development of this project required several procedures on several levels, incorporating software and hardware components that had to function together as one unit. The main stages were the following:

1. High-quality 3D images were captured and converted into digital replicas of the statues, sculptures and paintings.
2. The replicas were imported into the Unity game engine on the PC platform. Unity allows the creation of the VR environment with all its assets and objects, in addition to all their properties.
3. The coding was done in the C# programming language.
4. After the complete application was created, it was tested on the VR headset, the HTC VIVE PRO, whose motion tracking technology, directional audio system and HD touch feedback offer an immersive experience with realistic movement in the virtual world (Figs. 1 and 2).
Fig. 1. Unity game engine interface
Fig. 2. Statue 3D digital replica
3.1 Hardware/Software

The desktop computer is used to load the software responsible for the museum application and to connect it through its peripherals to the VR headset. The computer requires high-end specifications, and the VR headset is of the latest generation. The components adopted are the following:

1) Intel Core i7-7700K central processing unit (CPU)
2) 32 GB of RAM (min. required 8 GB)
3) NVIDIA GTX 1080 Ti graphics processing unit (GPU) (min. required GTX 1060 or equivalent)
4) HTC VIVE PRO virtual reality headset

The HTC VIVE PRO virtual reality headset allows the user to engage in the virtual world using its head-mounted display, motion-detecting sensors, G-sensor, gyroscope and proximity sensors. Additional hardware required was a computer screen running at all times to show what the user is experiencing.

As for the software, several programs were needed to complete the project and to run it afterwards:

1) Windows 10 running on the computer.
2) Steam, which provides SteamVR to run the application on the VR headset.
3) Unity, which is used to create the application.
4) 3DF Zephyr to capture the 3D models.
5) Inkscape vector graphics editor.
6) GIMP raster graphics editor.
7) Autodesk Fusion 360 to help create 3D models manually.
3.2 Tests and Results

The final step after the initial implementation was to test the final product on actual subjects and record their experience. To fulfill the project objective, the product was displayed at a museum in Belfaux, Switzerland, in 2019, during the exposition of the artist Mr. Hafiz Bertschinger. Since the product reproduces the museum artifacts, the users first navigated the actual museum and then experienced the virtual environment. After experiencing the virtual world, they were given a questionnaire containing several observational questions oriented towards user experience. Twenty participants visited the virtual museum and took part in the study. Six of them filled in the questionnaire, which helped to further understand the participants' perception of the immersive experience. The main points covered in the observation targeted the users' first impressions, their thoughts on the environment and the improvements they saw fit to enhance the whole experience, all while recording their emotional responses. Some of the questions are listed below:

• "How realistic did the sculptures/paintings feel?"
• "Was it easy to walk around the museum and interact with the environment?"
• "Rate your virtual experience?"

Following the data collection, the results were noted and revealed a relatively successful first attempt at the project design and implementation. The results were distributed as shown in Fig. 3.
Fig. 3. Questionnaire statistical results
Further results were also extracted based on user preferences. Some users noted the need for better overall lighting and for using the device while standing rather than seated, and some reported difficulties navigating certain aspects. Aside from the museum review, the main goal was the observational study of the physical and emotional reactions. Most users reported emotions of joy and excitement upon engaging with the museum, while others more acquainted with technology navigated naturally and displayed interest and enthusiasm to follow up on later updates.
4 Future Work and Discussion

Art as known to humanity would be impossible without mankind's historical cache of technological development, which helped enable its various new forms. Virtual Reality has, in the modern day, proved itself to be a valuable asset in this respect. Little could be added on the importance of this emerging concept of virtual reality, but much could be said on what this technology has to offer in its future releases to advance the work of many artists and help them get the exposure needed to deliver their thoughts and concepts in a unique manner. What this project offered was a world with no limits for artists to become creative and interactive and to reflect their art onto people's emotions in real and virtual expositions. The future potential is immense. The study for this project will continue in order to improve the interaction methods and the overall immersive experience. Examples include the integration of other technologies, such as hologram assistants within the museum environment and voice command controls, and the possibility of improved emotional and psychological studies resulting from these experiences.
References

1. Beck, J., Rainoldi, M., Egger, R.: Virtual reality in tourism: a state-of-the-art. Tour. Rev. (2019)
2. Wiederhold, B.K.: The potential for virtual reality to improve health care. A White Paper (2006)
3. Nascimento, A.M., Queiroz, A.C.M., Vismari, L.F., Bailenson, J.N., Cugnasca, P.S., Junior, J.B.C.: The role of virtual reality in autonomous vehicles' safety. In: IEEE International Conference on Artificial Intelligence and Virtual Reality (AIVR) (2019)
4. Sherman, W.R., Craig, A.B.: Understanding Virtual Reality: Interface, Application, and Design. Morgan Kaufmann (2018)
5. Jantz, J.J., Molnár, A., Alcaide, R.: A brain-computer interface for extended reality interfaces. In: ACM SIGGRAPH 2017 VR Village (2017)
6. Mertz, L.: Virtual reality is taking the hurt out of pain. IEEE Pulse 10(3), 3–8 (2019). https://doi.org/10.1109/MPULS.2019.2911819. PMID: 31135343
7. Carrino, F., Abou Khaled, O., Mugellini, E.: IMPACT – immersive mirror for pain control and treatment. In: Augmented Reality, Virtual Reality, and Computer Graphics – 5th International Conference, AVR 2018, Otranto, Italy, 24–27 June 2018, Proceedings, Part II, vol. 10851, pp. 192–200 (2018). https://doi.org/10.1007/978-3-319-95282-6_14
8. Carrino, F., Cherix, R., Abou Khaled, O., Mugellini, E., Wunderle, D.: Bus simulator in virtual reality for young adults with intellectual disabilities. In: Proceedings of the 4th Gamification & Serious Games Symposium (GSGS 19) (2019)
9. Sinclair, F.: VR Museum of Fine Art, 20 August 2016 (2016). https://store.steampowered.com/app/515020/The_VR_Museum_of_Fine_Art/
10. Lifeliqe VR Museum. https://www.lifeliqe.com/products/vr-museum
11. Amaryllis VR (2017). https://www.amaryllisvr.com/
A Gaze-Supported Mouse Interaction Design Concept for State-of-the-Art Control Rooms Nadine Flegel1(B) , Christian Pick2 , and Tilo Mentler1 1 Trier University of Applied Sciences, Schneidershof, 54293 Trier, Germany
{N.Flegel,mentler}@hochschule-trier.de 2 University of Lübeck, Ratzeburger Allee 160, 23562 Lübeck, Germany
Abstract. Control rooms represent critical infrastructures important for the security and well-being of humans. While they have changed considerably with respect to information technologies within the last 30 years, user interfaces are, as our studies revealed, still characterized by windows, icons, menus, and pointers (WIMP). Further research on novel interaction design concepts for multi-monitor WIMP applications is necessary in addition to smart control room and pervasive computing environment approaches. In this study, we examined how control room operators can be supported in their daily work with multi-monitor applications. Based on human-centred design activities, the gaze-supported mouse interaction design concept "Look & Drop" was developed. It addresses the most common interaction problems, e.g. moving the mouse pointer to distant screens. Results of a laboratory study, expert reviews and a proof-of-concept installation at a previously unknown control room indicate that Look & Drop could support operators at state-of-the-art control room workstations.
Keywords: Control room · Mouse interaction · Gaze support · Look & Drop · Multi-monitor workstations
1 Introduction

Control rooms, meaning "location[s] designed for an entity to be in control of a process" [1], are important for modern-day societies and humans in various areas of life. Whether an ambulance is required as fast as possible, traffic needs to be managed or an uninterrupted supply of power, gas and water is taken for granted, they represent critical infrastructures, and their operators bear major responsibility.

Control rooms have changed considerably with respect to available information and communication technologies, human-machine task allocation and levels of automation in recent decades. However, user interface and interaction design are still characterized by applications following the windows, icons, menus, pointers (WIMP) paradigm, available on stationary devices with displays of various sizes.

Current research on control rooms with respect to human-computer interaction and computer-supported cooperative work is mainly focused on paradigm shifts from WIMP-oriented interaction design to post-WIMP smart control rooms and pervasive computing environments. However, insights from contextual inquiries in 3 control rooms and a survey among 113 domain experts, presented subsequently, indicate that further research on novel concepts for multi-monitor WIMP applications is necessary. Therefore, a gaze-supported mouse interaction design concept addressing the most common interaction problems is introduced and evaluated in the following sections.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 T. Ahram et al. (Eds.): IHIET-AI 2021, AISC 1378, pp. 208–216, 2021. https://doi.org/10.1007/978-3-030-74009-2_26
2 Analysis

In the following two sections, details of the human-centred design analysis activities applied in this study are described. They involved contextual inquiries and a survey.

2.1 Contextual Inquiries

Within the research project NetzDatenStrom [2], 3 control rooms were each visited by 2 HCI researchers for at least 2 shifts (6–10 h), including shift changeovers. According to best practices of Contextual Design, operators were interviewed during their shifts at their workstations, when appropriate, guided by a predefined set of questions. They were asked to assess the usability of their workstations with the aid of the ISO 9241-110-S questionnaire. In addition, handwritten notes on topics like workstation design, available resources, operators' responsibilities and alarm management were taken [2]. The on-site appointments revealed that control rooms show "major differences […] with respect to number and layout of workstations, human and technical resources as well as operators' responsibilities" [2]. There are also differences with respect to cooperation, e.g. a supervising operator located in the same or a different room, and privacy issues, e.g. operators sitting in the back being able to view the screens of others.

2.2 Online-Survey

An online questionnaire was developed and published via a LimeSurvey platform for 3 weeks. In summary, it should help to gain insights into the following questions:

1. How are state-of-the-art multi-display workplaces designed?
2. Which interaction-related challenges arise?
3. Are there domain- and application-independent challenges?

The questionnaire was divided into 2 parts. In the first part, both the professions of the participants and the design of their workstations were covered by 8 questions, including the number and arrangement of screens and input devices. If respondents indicated the existence of a public space (e.g. large wall-mounted displays), further questions were asked about its position and the interaction with it.
In the second part, the participants were asked about challenges they face in their daily work with multiple screens. These were grouped into the topics mouse interaction, switching between screens, information perception, application and window management and system configuration.
In order to recruit participants, organizations, professional associations and online forums devoted to working domains with multi-display working environments were contacted, e.g. stock exchange trading, network control stations and traffic control centres (bus and train services). Among others, the Professional Association of Control Centers, the Association of German Transport Companies, the Network Control Center of Deutsche Bahn and the Professional Association of German Radiologists published information about the survey and invited their members to participate. A total of 113 participants were included in the evaluation of the results. The classification into work domains was based on the procedure for qualitative content analysis [3]. Categories were established by the domains participating in the survey: Rescue and fire control centers (n = 69), traffic control centers (n = 22), radiology (n = 13) and "not assignable" (n = 9). Locating the current position of the mouse pointer after performing another task was named the most common challenge (see Table 1).

Table 1. Top 3 distribution of mouse interaction problems experienced by professionals working at multi-display workstations (N = 113)

| Description of interaction problem | % of respondents |
|---|---|
| Locating the current position of the mouse pointer after another activity (e.g. talking to someone) | 65% |
| Clicking elements in the border area between 2 screens | 32% |
| Covering long distances with the mouse | 31% |
Results confirmed findings of previous studies [2]. Mouse input is still the prevalent input type. 95% of the participants use it. However, touch input is used by 33%.
3 Prototyping

Given the fact that control rooms vary with respect to the number and layout of workstations, the solution to be found should be scalable and flexible. This means that it should not be limited to a specific multi-display setup. Based on the results described in Sect. 2, the mouse interaction design concept "Look & Drop" was developed. Its basic idea is illustrated in Fig. 1: When a mouse movement is detected, the focused monitor is determined, and it is checked whether the pointer is already located on it. If it is not, the pointer is moved to the focused monitor. For recognizing gaze directions, various hardware approaches (eye tracking cameras/glasses, Microsoft Kinect, and webcams) were systematically compared regarding the fulfilment of criteria like functional range, flexibility, and costs. Finally, off-the-shelf webcams were chosen.
Fig. 1. Basic idea of Look & Drop: (1.) When moving the mouse and focusing on another screen, (2.) the pointer is moved.
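The basic idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Monitor` class, the function `look_and_drop_target` and the two-screen layout are hypothetical, and the drop position is assumed here to be the center of the focused screen. Gaze estimation is left out; the index of the focused monitor is simply passed in.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Point = Tuple[int, int]


@dataclass(frozen=True)
class Monitor:
    """Screen rectangle in virtual-desktop pixel coordinates (hypothetical helper)."""
    x: int
    y: int
    width: int
    height: int

    def contains(self, point: Point) -> bool:
        px, py = point
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)

    def center(self) -> Point:
        return (self.x + self.width // 2, self.y + self.height // 2)


def look_and_drop_target(monitors: Sequence[Monitor],
                         focused_index: Optional[int],
                         pointer: Point) -> Optional[Point]:
    """Return where to drop the pointer, or None if it should stay put.

    Intended to be evaluated whenever a mouse movement is detected.
    """
    if focused_index is None:          # gaze could not be assigned to a screen
        return None
    focused = monitors[focused_index]
    if focused.contains(pointer):      # pointer is already on the focused screen
        return None
    return focused.center()            # assumed drop position: screen center


# Hypothetical layout: two 1920 x 1200 desk screens side by side.
monitors = [Monitor(0, 0, 1920, 1200), Monitor(1920, 0, 1920, 1200)]
jump = look_and_drop_target(monitors, focused_index=1, pointer=(100, 100))
stay = look_and_drop_target(monitors, focused_index=0, pointer=(100, 100))
```

In a running tool, a function like this could be called from the `on_move` callback of a `pynput.mouse.Listener`, and a returned position could be applied by assigning it to the `position` attribute of a `pynput.mouse.Controller`. How the prototype wires these parts together is not described in the text.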
To support mouse interaction in real time, it is necessary to continuously monitor the user's line of sight, the mouse movements and the pointer position. For the placement of the pointer after a screen change, the fixed-location strategy [4] is used, in which the pointer is always placed at the same position on the screen surface. As in the study [4], the center of the screen is chosen as the "fixed location", meaning that users must cross at most half the screen diagonal to reach elements.

Computer vision algorithms were used for biometric identification and to determine the gaze direction based on the head pose. Look & Drop was realized in Python, using the programming library OpenCV, with functions for image processing, especially face recognition. The libraries Dlib, scikit-learn, imutils and NumPy were used to implement the computer vision component. The library pynput was used for monitoring mouse interaction and for moving the pointer to any position on the display. Based on Euler angles, the system calculates the direction of view in real time to ensure smooth interaction sequences. Due to the variance in body height and seat height of users, a calibration procedure is necessary at the beginning of work and whenever the seat height is changed, so that the viewing direction can be calculated precisely.
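How a head-pose angle might be mapped to a focused screen after calibration can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the prototype: it considers only the yaw angle (one of the Euler angles), it assumes that calibration records a few yaw samples while the user looks at each screen, and the names `calibrate` and `focused_monitor` as well as the 20-degree tolerance are hypothetical. Estimating the yaw angle from a webcam frame is outside the scope of the sketch.

```python
from typing import Dict, List, Optional


def calibrate(samples: Dict[str, List[float]]) -> Dict[str, float]:
    """Average the yaw angles (degrees) recorded while looking at each screen."""
    return {name: sum(angles) / len(angles) for name, angles in samples.items()}


def focused_monitor(yaw_deg: float,
                    calibration: Dict[str, float],
                    max_deviation_deg: float = 20.0) -> Optional[str]:
    """Pick the screen whose calibrated yaw is closest to the current yaw.

    Returns None if even the closest screen is further away than the
    (hypothetical) tolerance, e.g. when the user looks away from all screens.
    """
    name, reference = min(calibration.items(),
                          key=lambda item: abs(item[1] - yaw_deg))
    if abs(reference - yaw_deg) > max_deviation_deg:
        return None
    return name


# Hypothetical calibration samples for three screens.
calibration = calibrate({
    "left": [-32.0, -28.0],
    "center": [1.0, -1.0],
    "right": [29.0, 31.0],
})
looking_right = focused_monitor(24.0, calibration)
looking_away = focused_monitor(-70.0, calibration)
```

Repeating `calibrate` after a change of seat height would update the reference angles, which corresponds to the recalibration requirement mentioned above. A setup with wall screens above the desk screens would additionally need the pitch angle.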
4 Evaluation

In the following sections, details of the evaluation applied in this study are described. They involved a laboratory study, expert reviews and a proof of concept.

4.1 Laboratory Study

The goal of the laboratory study was to answer the following research questions:

– RQ1: Does mouse interaction with Look & Drop reduce the time to cover distances between different monitors compared to mouse interaction without it?
– RQ2: Does mouse interaction with Look & Drop reduce the time to locate the mouse pointer compared to mouse interaction without it?
– RQ3: Is mouse interaction with Look & Drop assessed as beneficial compared to mouse interaction without it?
Participants were divided into groups A and B and had to complete 2 tasks, starting either without the prototype (group A) or with the prototype (group B). The first task (T1) served to clarify RQ1 by asking the participants to cover various distances between screens. The second task (T2) simulated a use case for locating the pointer and addressed RQ2. After completing each task with one form of interaction, participants were presented with a partial questionnaire (questions 1–8) of the PSSUQ to evaluate the system usefulness [5]; the other two sub-questionnaires (Information Quality; Interface Quality) are not applicable to the evaluation of mouse interaction.

The resolutions of the screens were scaled to ensure good legibility (desk screens - private space: 1920 × 1200 pixels at 100%; 2 wall screens - public spaces: 300% at 3840 × 2160 pixels). The sizes of the start and finish areas and of the confirm-button of a message were set to 360 × 120 pixels to ensure uniform conditions. The end of timing was recorded by a click event on the finish area or the confirm-button. The study was conducted with the standard mouse settings provided by Windows 10, including a medium pointer speed, improved acceleration, and display without magnification.

Only right-handed users were admitted to the study, with a total of 10 participants. One person was excluded from the evaluation of the results due to technical issues. The remaining 9 participants (5 male; 4 female) were 7 students and 2 research assistants with an average age of 25 years (Min = 21; Max = 32; SD = 3.4). Their experience of working with two or more monitors was diverse (less than 20 to more than 100 h).

Within the tasks (consisting of subtasks in a randomized order), participants were asked to follow different mouse paths (see Fig. 2) in order to gain insights into whether and, if so, under which conditions Look & Drop is supportive. Before each task, they completed a trial round for practice purposes only.

Paths were divided into 3 categories: screen changes within the private space (paths 1 to 7), screen changes within the public space (path 8) and changes between private and public space (paths 9 and 10). In contrast to the other paths, path 3 started and ended in the screen center, corresponding to the positions to which Look & Drop moves the pointer under the fixed-location strategy.
Fig. 2. Mouse pathways in the laboratory study (Task 1: all, Task 2: in orange)
In T1, to remove influences of the movement direction and to reduce the variance of measurements, participants had to complete all 10 paths in both directions in the form of 20 individual subtasks. T2 required participants to confirm a message after the pointer
was placed at an unknown position after a countdown had elapsed. Participants had to locate the pointer and move it to the confirm-button. Before the start, they were asked to look at the message, which was already displayed during the countdown, in order to avoid deviations in the viewing direction at the beginning of the subtasks and a resulting influence on the results. The positions of the pointer and the message were set analogously to the positions of the start and finish areas from T1 to ensure that the same paths were taken in both tasks. In T2, paths 3, 6, 7, 9, and 10 were used in the form of 10 subtasks.

All subtasks started automatically after the previous one. A countdown was displayed on all screens to signal the start. For T1, the reaction time was measured only from the moment the start area was left, in order to obtain more precise values. For T2, it was measured immediately after the countdown had ended. The start and finish areas of T1 and the button of T2 had uniform dimensions. Reaction times were recorded automatically.

First, the total of 540 measured reaction times were cleaned of extreme outliers by means of Z-transformation, so that the results would not be influenced by technical problems in mouse interaction with or without Look & Drop. As recommended by [6] for a sample with more than 100 values, Z > 3.3 was chosen as the condition for eliminating outliers. Applying the Z-transformation removed 2% of the acquired values, consisting of 7 outliers in the interaction with Look & Drop and 4 without support. The higher number of outliers in the interaction with Look & Drop than in conventional mouse interaction can be explained by the prototypical nature of the support system and a resulting higher error rate. A significance level of α = 0.05 was used for all statistical tests on the data of the study.
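The outlier criterion can be illustrated with a short Python sketch. It assumes that the Z-transformation uses the sample mean and sample standard deviation of whatever list of reaction times is passed in and that the criterion is applied to the absolute z-score; whether the study standardized all 540 values together or per condition is not stated. The function name and the data are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Sequence


def remove_extreme_outliers(times_ms: Sequence[float],
                            z_limit: float = 3.3) -> List[float]:
    """Drop values whose absolute z-score exceeds z_limit."""
    if len(times_ms) < 2:
        return list(times_ms)
    m = mean(times_ms)
    s = stdev(times_ms)              # sample standard deviation
    if s == 0:                       # all values identical: nothing to remove
        return list(times_ms)
    return [t for t in times_ms if abs((t - m) / s) <= z_limit]


# Hypothetical data: 29 plausible reaction times and one technical glitch.
times = [1400 + 10 * i for i in range(29)] + [9000]
cleaned = remove_extreme_outliers(times)
```

In this made-up sample, only the 9000 ms value lies more than 3.3 standard deviations from the mean, so it is the only value removed.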
For T2, it should be noted that the pure time to localize the pointer was not determined, but a mixture of the localization process and covering a distance. This task form was chosen because it corresponds to an everyday use case for pointer localization: After it has been localized, it is not necessarily located directly at the target element and must therefore still be moved to the corresponding position. However, the reaction times for pure localization in T2 could be approximately determined by using the same paths as in T1: For this purpose, the average response times of the respective paths in T1 were subtracted from the average response times of the individual paths in T2. In addition to the reaction times of T1 and T2, the estimated time for locating the pointer in T2 is therefore also reported and examined. To answer questions RQ1 and RQ2, the average total times for completing the tasks were calculated (see Fig. 3).
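The subtraction used to estimate the pure localization time can be sketched as follows. This is only an illustration under the assumption that reaction times are grouped by path number; the function name and all numbers are hypothetical.

```python
from statistics import mean


def estimate_localization(t1_by_path, t2_by_path):
    """Estimate localization time per path as mean(T2) - mean(T1).

    Only paths present in both tasks are considered, since the estimate
    relies on the same distance being covered in T1 and T2.
    """
    return {
        path: mean(t2_by_path[path]) - mean(t1_by_path[path])
        for path in t2_by_path
        if path in t1_by_path
    }


# Hypothetical reaction times in milliseconds for two of the shared paths.
t1 = {3: [1200, 1300], 6: [1500, 1700]}
t2 = {3: [1700, 1800], 6: [2100, 2300]}
localization = estimate_localization(t1, t2)
print(localization)
```

The estimate is approximate by construction: it treats the movement part of T2 as equal to the T1 time on the same path.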
Fig. 3. Reaction times (N = 9) (a) for task completion, (b) for different pathways
The normal distribution of the average reaction times per participant, interaction form, and task was confirmed by the Shapiro-Wilk test [7]. In addition, according to the paired t-test for dependent samples [8], the average reaction times of the two forms of interaction differ significantly for T1, T2, and the time for localization in T2. A more detailed analysis of single pathways revealed imprecisions of the current prototype when moving between neighboring screens, but broad support of mouse pointer localization in any direction. In summary: the longer the distance, the more useful Look & Drop is. In T1, participants were significantly faster on average with Look & Drop (M = 1393 ms, SD = 314 ms) than without it (M = 1720 ms, SD = 365 ms). To solve T2, participants needed significantly less time with Look & Drop (M = 1679 ms, SD = 251 ms) than without it (M = 3695 ms, SD = 835 ms). The average reaction time for completing the subtasks of T2 was thus reduced by more than half with Look & Drop. For the pure localization of the mouse pointer in T2, the difference is even more pronounced: the reaction time with Look & Drop (M = 489 ms, SD = 250 ms) was less than one third of that without it (M = 1687 ms, SD = 847 ms). The PSSUQ sub-questionnaire on system usefulness yielded a value of 1.8 for interaction with Look & Drop and a value of 2.7 without (lower PSSUQ values indicate a better rating). Participants therefore rated system usefulness better with Look & Drop support than without (RQ3).
4.2 Expert Review and Proof of Concept Two expert reviews with operators of a fire and rescue control room were arranged to gain feedback from persons familiar with both daily routine and extraordinary circumstances. At the first meeting, the head of the control center (8 years of professional experience) and a dispatcher (19 years of professional experience) examined the prototype at the control room laboratory. 
To create contextual reference, a fire and rescue control room simulation (LstSim) was provided. At the second meeting, the prototype was installed at a backup workstation at the actual control room. Hardware setup was unknown to all persons involved in the development. Necessary adjustments to the source code as well as installation and alignment of the webcams at the previously unknown setting took about half an hour in total. In addition to the experts involved in the first review, 2 dispatchers with 11 and 1.5 years of professional experience took part. The experts confirmed that they encounter difficulties in mouse interaction on a daily basis. Especially “in urgent emergency calls, locating the mouse pointer causes additional stress and occurs even when working with three monitors”. Furthermore, covering long distances with the mouse pointer is problematic and can lead to strain on the hand. The usual counterstrategy was described as “rowing”, whereby the mouse is repeatedly lifted and set down. To reduce the problem, modules that are usually needed one after the other during scheduling are placed on adjacent monitors. All participants would use a support system such as Look & Drop in daily work. The experts stated that the idea was “really good” and “when monitoring operations, such a system would make a lot of sense and make things easier”, as it would require frequent screen changes. The head of the control center and a dispatcher cited the advantages of Look & Drop as “stress reduction, work relief, the ability to facilitate smooth workflows and improved ergonomics”. While working with the system, dispatchers observed that it
offers “a particularly considerable time saving” when switching between the two outer screens. Interaction was described as “natural” by two dispatchers. Head movements towards the focused monitor were perceived as part of the participants’ usual movements. The possibility of moving windows when changing screens was praised as a “useful function”. Participants noted that precise and reliable functioning of the system under all circumstances would be crucial for its use. Different lighting conditions and headsets worn at work were mentioned as possible influencing factors. In addition, for the technology to be accepted by employees, it was considered important to provide them with clear and comprehensible proof that the video data from the webcams are neither recorded nor used for purposes other than calculating the direction of gaze.
5 Discussion While the results of the laboratory study and the expert reviews indicate that a gaze-supported mouse interaction design concept like Look & Drop could support operators in state-of-the-art control rooms, both the limitations of the prototype and the room for improvement have to be discussed. The short-term installation of Look & Drop in a previously unknown control room setting demonstrated the transferability of the solution. However, the current implementation still determines the viewing direction inaccurately in some cases, especially when changing between adjacent monitors. For the use of Look & Drop in a safety-critical context, high precision of the gaze-based interaction support under all circumstances is an indispensable requirement. A crucial step in the further development of Look & Drop lies in finding a solution for accurately calculating the direction of gaze independent of the sitting position and height of the user. The use of eye tracking across multiple screens might be one approach [9]. In addition, there have been advances in eye tracking via webcams that were not known at the time of the technical conception and realization of this study [10]. A hybrid solution combining different modalities would also be conceivable. Visual cues during and after movements should be integrated, as they might improve localization and tracing of the mouse pointer by the user. Another crucial point is the transformation of Look & Drop into a generic solution for arbitrary multi-screen workstations, which is configured by the users or, in the best case, automatically adapts to the screen layout. For example, the number, resolution, and arrangement of monitors could be read out via the operating system.
References
1. Hollnagel, E., Woods, D.D.: Joint Cognitive Systems: Foundations of Cognitive Systems Engineering. Taylor & Francis, Boca Raton (2005)
2. Mentler, T., Rasim, T., Müßiggang, M., Herczeg, M.: Ensuring usability of future smart energy control room systems. Energy Inf. 1(26), 167–182 (2018)
3. Mayring, P.: Qualitative Inhaltsanalyse. In: Mey, G., Mruck, K. (eds.) Handbuch Qualitative Forschung in der Psychologie. VS Verlag für Sozialwissenschaften (2010)
4. Benko, H., Feiner, S.: Multi-monitor mouse. In: CHI 2005 Extended Abstracts on Human Factors in Computing, pp. 1208–1211. ACM, New York (2005)
5. Lewis, J.R.: Psychometric evaluation of the post-study system usability questionnaire: the PSSUQ. In: Proceedings of the Human Factors and Ergonomics Society Annual Meeting, vol. 36, pp. 1259–1260. Sage Publications, Los Angeles (1992)
6. Fidell, L.S., Tabachnick, B.G.: Preparatory Data Analysis. Handbook of Psychology. Wiley, Hoboken (2003)
7. Shapiro, S., Wilk, M.: An analysis of variance test for normality (complete samples). Biometrika 52(3/4), 591–611 (1965)
8. Student: The probable error of a mean. Biometrika 6(1), 1–25 (1908)
9. Balthasar, S., Martin, M., van de Camp, F., Hild, J., Beyerer, J.: Combining low-cost eye trackers for dual monitor eye tracking. In: Kurosu, M. (ed.) Human-Computer Interaction. Interaction Platforms and Techniques. HCI 2016. LNCS, vol. 9732. Springer, Cham (2016)
10. Lame, A.: Gaze tracking (2019). https://github.com/antoinelame/
Video Conferencing in the Age of Covid-19: Engaging Online Interaction Using Facial Expression Recognition and Supplementary Haptic Cues
Ahmed Farooq1(B), Zoran Radivojevic2, Peter Mlakar1, and Roope Raisamo1
1 Tampere Unit of Computer Human Interaction (TAUCHI), Tampere University, Tampere, Finland [email protected]
2 Nokia Bell Labs, Cambridge, UK
Abstract. More than 50 years after their mass-market introduction, the core user interfaces of Video Conferencing (VC) systems have remained essentially unchanged. Relaying real-time audio and video over distance is inherently productive. However, it lacks the sense of in-person interaction. With the current global pandemic and additional privacy concerns over the extended use of video and audio conferencing systems, there is a need to redefine how VC systems function and what information they communicate. To resolve these issues, we propose a VC system that utilizes facial recognition to identify and catalog participants’ expressions and communicates their emotional states to other participants on the VC system using encoded haptic cues. In our testing, we found that the approach was able to provide summarized haptic feedback of facial expressions and reduce the time it took for participants to react to ongoing discussions without increasing mental or physical strain on the user.
Keywords: Human-computer interaction · Facial recognition · Haptics · Human-systems integration · Video Conferencing
1 Introduction Online video conferencing is a key business tool for remote interaction. Since 1968, when AT&T introduced the concept, various systems have gained traction as viable, secure, and efficient ways to communicate in real time over distance. Subsequently, as communication infrastructure has improved, current video conferencing systems (VCS) have become very efficient at relaying visual and auditory information in high definition. However, interpersonal communication goes beyond a two-dimensional video feed of a communication partner. Febrianita and Hardjati [1] argue that without sufficient nonverbal cues such as facial expressions and body language, the effectiveness of in-person virtual communication can be substantially reduced. Van den Bergh and colleagues [2] suggest that soft skills acquired for in-person interaction may not be completely carried over to virtual interaction, especially in complex or emotional situations.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 T. Ahram et al. (Eds.): IHIET-AI 2021, AISC 1378, pp. 217–223, 2021. https://doi.org/10.1007/978-3-030-74009-2_27
As participants of such systems reside in different countries with different languages and cultural traits, the efficiency of communication may be reduced further. With the current global pandemic and the extended use of video conferencing systems to replace in-person interaction, there is a need to improve the technology and how we interact with it, both to enhance user experience and to provide a more personalized exchange between individuals and groups. For that reason, we propose a low-cost video conferencing system, similar to the one proposed by Myers and Secco [3], that utilizes facial recognition to identify and catalog its participants’ expressions and communicates their emotional state using encoded haptic cues.
2 System Design The setup was developed by training a neural network to identify the facial expressions of video conferencing participants and by providing 3.5 s haptic feedback cues that communicate these expressions with respect to five common emotions (neutral, angry, happy, sad, and surprised). After testing different APIs and open-source libraries, we developed the application on TensorFlow 2 in Python using Dlib and OpenCV [4]. The system employed a web camera to first capture the live image of the participant. Using a Haar cascade face detector (OpenCV), we extracted 51 salient points of the sampled face from the input image (nose position, eye shape, eyebrow shape, mouth shape, etc.). These points were then normalized with the IMAGE_WIDTH and IMAGE_HEIGHT parameters, matched to the data set, and fed to the recognition model [5] (TensorFlow) within the neural network (Fig. 1).
Fig. 1. (left to right) Angry, Sad, Happy, Neutral and Excited emotions recorded by the system
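The normalization of the facial points by IMAGE_WIDTH and IMAGE_HEIGHT can be sketched as follows. This is a minimal illustration, assuming that the points are pixel coordinates and that normalization means dividing by the frame dimensions; the frame size, the function name, and the flattened output format are assumptions, not details given in the text.

```python
IMAGE_WIDTH = 640   # assumed frame width in pixels
IMAGE_HEIGHT = 480  # assumed frame height in pixels


def normalize_landmarks(points, width=IMAGE_WIDTH, height=IMAGE_HEIGHT):
    """Scale (x, y) pixel coordinates to [0, 1] and flatten them.

    The flattened list [x0, y0, x1, y1, ...] is one plausible input
    format for a recognition model; the actual format is not specified.
    """
    features = []
    for x, y in points:
        features.append(x / width)
        features.append(y / height)
    return features


# Three illustrative points: image center, top-left and bottom-right corner.
features = normalize_landmarks([(320, 240), (0, 0), (640, 480)])
print(features)
```

Dividing by the frame size makes the feature values independent of the camera resolution, so 51 points always yield 102 values in the same range.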
The generated output was a vector of 5 elements corresponding to a probability distribution over the 5 emotional responses being identified: neutral, angry, happy, sad, and surprised. These response vectors were accumulated over a period of time given by the Refresh_Time_Seconds parameter, which was set to 3.5 s for this study. An averaged response vector was computed, and the emotion with the highest average probability was selected as the emotional response to be transmitted through the haptic feedback wristband, where the emotional responses are stored as sound files. To create distinct yet recognizable haptic feedback, we developed custom vibrotactile signals for each emotional response. The goal was to create natural tactile signals that can easily be identified by users with limited training. Therefore, we modulated natural
auditory signals to simulate 3 core haptic primitives: human heartbeat, human scream, and a drum-bass combination. Each signal was divided into three segments; adjusting the tempo and pitch of the second and third segments helped characterize the entire signal as positive or negative feedback. Using the heartbeat as the base primitive, we created feedback for the “neutral” and “surprised” emotions by increasing/decreasing the rate of the heartbeat (Fig. 2a & e). Similarly, we utilized the drum-bass beat as the primitive to modulate the “sad” and “happy” emotions (Fig. 2c & d), and a modulated scream was used to represent the “anger” emotion (Fig. 2b).
Fig. 2. Modulated feedback signals (from top to bottom) Neutral (a), Anger (b), Sadness (c), Happiness (d), & Excitement (e).
We applied these signals to the participant through a wearable palm device. The device was developed using a Tectonic TEAX14C02-8 voice coil actuator attached to the inside of the palm with a Velcro strap, similar to Farooq et al. [6]. We piloted various adaptations of wristband and palm devices that could relay the custom-designed signals and concluded that the feedback parameters (duration, frequency, amplitude) were ideal for stimulating the inside of the palm. As all 5 signals were of a similar duration (3.5 s) and required similar amplitude, we used a standard class-D amplifier with a peak amplitude of 5.8 V.
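The selection of the emotion to be transmitted, as described in this section, can be sketched as follows: the probability vectors collected within one Refresh_Time_Seconds window are averaged, and the emotion with the highest average is chosen. The function name, the order of the emotions in the vector, and the sound-file naming are hypothetical; only the averaging and maximum selection follow the description.

```python
EMOTIONS = ["neutral", "angry", "happy", "sad", "surprised"]  # assumed vector order
REFRESH_TIME_SECONDS = 3.5  # accumulation window reported in the text


def dominant_emotion(window):
    """Average the probability vectors of one window and pick the maximum.

    `window` holds one 5-element probability vector per analyzed frame.
    Returns None when no frame was analyzed in the window.
    """
    if not window:
        return None
    totals = [0.0] * len(EMOTIONS)
    for probabilities in window:
        for i, p in enumerate(probabilities):
            totals[i] += p
    averages = [total / len(window) for total in totals]
    best = max(range(len(EMOTIONS)), key=lambda i: averages[i])
    return EMOTIONS[best]


# Three made-up frames collected within one window.
window = [
    [0.1, 0.1, 0.6, 0.1, 0.1],
    [0.2, 0.1, 0.5, 0.1, 0.1],
    [0.6, 0.1, 0.1, 0.1, 0.1],
]
emotion = dominant_emotion(window)
signal_file = f"{emotion}.wav"  # hypothetical file name for the stored signal
print(emotion, signal_file)
```

Averaging over the window smooths out single frames with a deviating classification, such as the third frame in the example.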
Fig. 3. Design and placement of the wearable palm device.
3 User Test We conducted a pair-wise study with 24 participants in which two participants who were unfamiliar with each other acted both as presenter and as listener (counterbalanced) in two different sound-isolated rooms. In presenter mode, the participants were asked to replicate 15 randomly generated facial expressions from the five selected emotions (5 × 3) in front of the VC system; these were recorded and played back to the listener. The listener was asked to identify each expression under three conditions: 1) by directly viewing the presenter’s recorded video, 2) by listening to the presenter’s recorded audio, and 3) by only sensing the system’s haptic feedback signals on their hand using the palm strap device. The duration of each feedback was 3.5 s, and the participants were instructed to reply as soon as the feedback ended. At the end, the listener rated the task load (NASA TLX) and accuracy. We also measured the time it took the listener to respond to each facial expression as well as their accuracy compared to the VC system. Visual feedback was provided using Skype on a Samsung B2440L 24-in. monitor (1080p) with the VC emotion recognition software running in the background. Audio feedback was also provided using Skype, with the listener’s monitor switched off and the participants wearing noise-canceling wired headsets. The presenter recorded the text message “This is the presenter mode for feedback X”, where “X” was the number of the feedback (1–15). The presenter was asked to convey their emotional state by varying their delivery of the text message, altering tone, enunciation, speed, and intensity. Haptic feedback was provided as the five custom-designed feedback signals shown in Fig. 2 via the custom palm device (Fig. 3). Once all the data for one participant had been collected, the presenter and listener switched roles.
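One way to generate the 15 expressions per presenter is sketched below. It assumes that “5 × 3” means each of the five emotions occurs exactly three times in a shuffled order; the text does not state how the random sequence was produced, and the function name and seed parameter are hypothetical.

```python
import random

EMOTIONS = ["neutral", "angry", "happy", "sad", "surprised"]
REPETITIONS = 3  # assumed: 5 emotions x 3 repetitions = 15 expressions


def make_trial_order(seed=None):
    """Return a shuffled list in which every emotion occurs REPETITIONS times."""
    rng = random.Random(seed)  # local generator keeps the order reproducible
    trials = EMOTIONS * REPETITIONS
    rng.shuffle(trials)
    return trials


trials = make_trial_order(seed=1)
print(trials)
```

Fixing the seed per presenter makes it possible to reproduce the exact sequence when the recordings are later played back to the listener.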
4 Results and Discussion Results of the NASA TLX questionnaire (Fig. 4) showed that identifying emotions with audio-only feedback was the most difficult task. Users rated the audio-only modality as more mentally and temporally challenging than the haptic-only and visual-only conditions. Results also showed that participants found the audio-only condition to require the most effort and to be more frustrating to manage than the visual-only and haptic-only conditions. The haptic-only condition was rated as similar in frustration and effort to the visual-only condition, but as more mentally and temporally challenging.
Fig. 4. Mental and physical demand measured for each condition using NASA TLX.
Looking at the response-time measurements (Fig. 5), we see that there were minor differences between the three modalities. However, there was a trend showing that the first task of each modality took longer to complete than the others. Audio only modality consistently remained the slowest across all tasks and conditions. This trend continues in recorded errors (Fig. 6) where we see that participants made more mistakes for audio only modalities compared to haptic and visual only conditions.
Fig. 5. Response time per task for each of the five emotions.
If we consider the number of errors (Fig. 6), we see that the visual-only task was the most accurate, followed by haptic-only, while audio-only was not only slower (Fig. 5) and more difficult to perform (Fig. 4), but also produced the most errors (per task and in total). Interestingly, for both the audio and haptic conditions, we observed more errors for the first task. Results continued to improve during the study for both conditions, which indicates that performance can improve with learning.
Fig. 6. Number of errors per task for each of the five emotions
5 Conclusion We proposed a novel method of communicating over a video conferencing system in which the presenter’s facial expressions were used to encode emotional information and relay it to the listeners using real-time vibrotactile feedback. A user study with 24 participants was conducted. Results showed that encoded haptic feedback helped participants to identify and respond to the presenter faster, with fewer errors and with no significant additional reported stress. Participants performed better at identifying facial expressions using the visual-only condition and the VC system with encoded haptic feedback than with the audio-only condition. Moreover, the listeners identified the encoded haptic feedback signals with a high degree of accuracy and rated the overall system positively, classifying the periodical non-visual haptic information as a novel and informative aspect of the system. However, participants’ performance in and rating of the audio-only condition were the lowest, illustrating that a VC system without video input can be more frustrating and can lead to misunderstanding and misrepresentation.
References
1. Febrianita, R., Hardjati, S.: The power of interpersonal communication skill in enhancing service provision. J. Soc. Sci. Res. 14, 3192–3199 (2019). https://doi.org/10.24297/jssr.v14i0.8150
2. Canto, S., Jauregi, K., Van den Bergh, H.: Integrating cross-cultural interaction through videocommunication and virtual worlds in foreign language teaching programs: is there an added value? ReCALL J. EUROCALL 25(1), 105–121 (2013). https://doi.org/10.1017/S0958344012000274
3. Myers, K., Secco, E.L.: A low-cost embedded computer vision system for the classification of recyclable objects. In: Soft Computing Research Society and Congress on Intelligent Systems (CIS), 05–06 September 2020, Virtual Format (2020)
4. Gupta, N., Sharma, P., Deep, V., Shukla, V.K.: Automated attendance system using OpenCV. In: Proceedings of 2020 8th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), Noida, India, pp. 1226–1230 (2020). https://doi.org/10.1109/ICRITO48877.2020.9197936
5. Reny, J.: A convolutional neural network (CNN) approach to detect face using tensorflow and Keras. Int. J. Emerg. Technol. Innov. Res. 16(5), 97–103 (2019). ISSN 2349-5162. SSRN: https://ssrn.com/abstract=3599641
6. Farooq, A., Evreinov, G., Raisamo, R.: Enhancing multimodal interaction for virtual reality using haptic mediation technology. In: Ahram, T. (ed.) Advances in Human Factors in Wearable Technologies and Game Design. AHFE 2019. Advances in Intelligent Systems and Computing, vol. 973. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-20476-1_38
Real-Time Covid-19 Risk Level Indicator for the Library Users
Sadiq Arsalan(B) and Khan Maaz Ahmed
Department of Information Technology, Østfold University College, Halden, Norway
{arsalan.a.sadiq,maaz.a.khan}@hiof.no
Abstract. Covid-19 has strongly affected the university library visit routine of students and faculty members. To investigate this further, university students were interviewed, and it was found that they were quite concerned about their safety and health when visiting the library during Covid-19. To address this, a smart interactive system was designed that helps students and faculty members avoid high-risk situations inside the library and manage their library visits optimally. The system has two main components, i.e., a public display and a mobile interface. Both the public display and the mobile interface were evaluated by university students in order to understand whether the system met the user needs and expectations expressed during the informing phase. The results and their implications for the design of similar systems are discussed.
Keywords: Covid-19 · Public display · Interaction design · QR code
1 Introduction In recent months, the whole world has been focused on epidemics due to the emergence of the coronavirus disease 2019 (Covid-19). It is a new pandemic that concerns every country and every person in the world. Covid-19 first appeared in China and then spread rapidly throughout the world, sending billions of people into lockdown [1]. The Covid-19 outbreak disrupted life around the globe and, as in any other sector, it has created significant challenges for the global higher education community. Numerous countries suspended face-to-face education and physical exams and put restrictions on immigration that affect international students [2]. Additionally, reopening educational institutes and allowing interaction between students and staff during Covid-19 poses a special challenge worldwide [3]. Governments around the globe have pursued the shared objective of decreasing the spread of Covid-19 by introducing measures that restrict social contact. Higher education institutes encourage their students and staff members to follow general infection control measures to keep them safe. Avoiding unnecessary social contact and keeping a safe distance are two of the main preventive strategies adopted by universities since the emergence of Covid-19 [4]. A student needs the right place to study and conduct research in order to be focused and productive, but due to Covid-19, libraries around the globe are confronting hard decisions about which services to offer, ranging from minimal restrictions to complete closure [5].
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 T. Ahram et al. (Eds.): IHIET-AI 2021, AISC 1378, pp. 224–231, 2021. https://doi.org/10.1007/978-3-030-74009-2_28
The main objective of this paper is to propose a smart library solution for the Covid-19 situation using a user-centered approach. We observed, and were informed, that students were very worried about their vulnerability when visiting the library and that they needed a solution with which they could check the total number of people inside the library and the current risk level before their visit. The responses were evaluated and led to the research question: “Are the students concerned about their library visits in Covid-19, would they like to use our proposed system, what information would they like to see, and does our proposed system have any positive effect on their routine of library visit in this pandemic situation?”. To answer this research question, a system was designed and developed based on user requirements, and an evaluation was performed. The results showed that the proposed solution has a very positive impact on the library usage routine of students during Covid-19.
2 Background Digital public displays are digital screens used to deliver relevant information to the public in a timely manner. They have a variety of useful applications across different industries. For instance, according to [6], digital signage has become an important channel of communication with consumers and has been prominently used in the retail sector as a marketing tool. As a digital technology, public displays can also be useful in the Covid-19 situation, for instance for disseminating information about the preventive measures one must take to counter the situation effectively. In exploring the applications of digital technology in the Covid-19 pandemic response, [7] states that the countries that have maintained low Covid-19 per capita mortality rates used digital technologies in their response strategies. Hence, digital public displays can be a useful addition to Covid-19 response strategies. According to [8], public displays are leaving labs and are being deployed in public places, and user experience, user acceptance, user performance, display effectiveness, privacy, and social impact are some of the factors that must be taken into consideration when designing effective public displays. [9] is of the view that the use of public displays is increasing in public places for supporting community and social activities. The author states that, while messages shown on public displays are assumed to be eye-catching and appealing to people, it is important to take the target audience into account in order to convey the messages effectively. [10] defines QR codes as two-dimensional (matrix) codes that can be used to represent data such as web addresses and map locations. They can be scanned quickly using mobile devices such as smartphones.
In digital displays, QR codes can be used to give the public a gateway to information that is not directly displayed on the screen. As discussed in the Introduction, to answer our research question we propose a display system based on the Ubiquitous Computing paradigm described in [11]. The proposed system consists of two parts. The first is a public display placed in front of the library so that students approaching the library can get an overview of the occupancy level, risk level, recommendations, etc. The second is a web interface that students can access remotely so that they can plan their visit based on the information available through the system. To realize an optimum system, we followed the classical user-centered design approach [12]. Students are the
primary user target group of the proposed system, and understanding their concerns, needs, and expectations for the system in this Covid-19 situation was the prime goal. To understand the domain, we reviewed many articles on public displays that convey information in the simplest possible way. User engagement and interactivity can be enhanced using multiple methods, as discussed in [13, 14]. In [15], the authors discussed the concept of “sense making theory” as a framework and argued that merely presenting data does not guarantee that a user can derive meaning from it, which is why proper strategies should be adopted to convey the information in a meaningful way that matches user expectations.
3 Informing Phase An analysis of the current Covid-19 situation showed the need for a smart library system that helps students and faculty members in these tough times. The solution must provide library users with important information about the library before their visit. This led us to investigate the initial research questions: “Are students concerned about their library visits during Covid-19, or is the Covid-19 situation affecting their library visits? If yes, what kind of information would they like to be presented that will help them stay safe and have a positive impact on their library visit routine?”. For this purpose, informing was done by sharing a consent form and then interviewing different students at the university. Each student was interviewed for about 15–20 min and asked different questions, such as “Do you feel the need for using this public display each time you visit the library?” and “Rate how informed and safe the use of this system made you feel before entering the library”. All participants said that, owing to their safety and health concerns when visiting the library, they would strongly prefer a smart library solution informing them about the total number of people, the number of seated people, and the current risk level in the library. Moreover, the participants stated that this solution would not only help ensure their safety but would also give them the confidence to resume their library activities as they were before the pandemic. After analyzing the results of the interviews and running some successful ideation sessions, we created different user scenarios and storyboards to get a complete idea of the proposed solution and to communicate the interaction and user experience in a human-centered way. The interview findings and the data collected from different library users led to the formulation of the following hypothesis.
We hypothesize that providing COVID related information to students and faculty members would make them feel informed and safe, which will reduce the influence of COVID in their library usage pattern.
4 Design Alternatives and Prototype Different design alternatives are presented based on the existing design guidelines in literature, which offered simplicity and clarity to attract the users towards the public display. Two major components i.e. web-based mobile application for remote library users and a public display for on-site users are designed separately. Different design alternatives were exchanged and discussed, and a final design is selected.
Real-Time Covid-19 Risk Level Indicator for the Library Users
For both the public display and the remote library users, a web-based prototype was developed using HTML, CSS, JavaScript, PHP, and a MySQL database [16]. Development was done in three phases: 1) developing the public display version for on-site users, 2) developing the responsive mobile version for remote library users, and 3) deploying to a live host so that remote users could access the prototype for evaluation. In line with the informing phase, the prototype presents different information about the library in the COVID-19 situation. Fig. 1 shows the on-site version of the prototype for the public display. The boxes in the top section display the current status of the library: the total number of people inside, the total seats available, and the number of seated people. The middle section displays the current risk level, together with recommendations and suggestions based on it, such as whether to wear a mask. The bottom section shows whether the library is currently open and provides QR codes for mobile access. To make the risk easier to perceive and understand, the background color of the screen also changes with the current risk level. The mobile version of the prototype for remote library users can be seen in Fig. 2.
Fig. 1. (left) public display version of prototype at low risk level (right) public display version of prototype at high risk level.
Fig. 2. Mobile version of the prototype for remote library users, shown from low risk level to high risk level.
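The display logic described in this section (status boxes, a risk level, a matching background color, and a mask recommendation) can be illustrated with a minimal sketch. The paper does not state how the risk level is derived, so the occupancy thresholds, the `capacity` field, the color names, and the mask rule below are all assumptions made for illustration; the prototype itself was built with PHP and MySQL, and Python is used here only for brevity.

```python
from dataclasses import dataclass

# Hypothetical thresholds on the occupancy ratio; the paper does not
# specify how the risk level is computed.
LOW_MAX = 0.40     # below this ratio the risk is treated as "low"
MEDIUM_MAX = 0.75  # below this ratio the risk is treated as "medium"

# Assumed color scheme for the screen background.
COLORS = {"low": "green", "medium": "orange", "high": "red"}


@dataclass
class LibraryStatus:
    people_inside: int
    seated: int
    total_seats: int
    capacity: int  # assumed maximum number of people allowed inside

    @property
    def seats_available(self) -> int:
        # Never report a negative number of free seats.
        return max(self.total_seats - self.seated, 0)


def risk_level(status: LibraryStatus) -> str:
    """Map the occupancy ratio to a coarse risk label."""
    if status.capacity <= 0:
        raise ValueError("capacity must be positive")
    ratio = status.people_inside / status.capacity
    if ratio < LOW_MAX:
        return "low"
    if ratio < MEDIUM_MAX:
        return "medium"
    return "high"


def display_state(status: LibraryStatus) -> dict:
    """Collect everything the display needs to render one screen."""
    level = risk_level(status)
    return {
        "people_inside": status.people_inside,
        "seated": status.seated,
        "seats_available": status.seats_available,
        "risk": level,
        "background": COLORS[level],
        # Assumed rule: recommend a mask whenever the risk is not low.
        "mask_recommended": level != "low",
    }


if __name__ == "__main__":
    status = LibraryStatus(people_inside=80, seated=50, total_seats=60, capacity=100)
    print(display_state(status))
```

Keeping the thresholds in named constants makes it easy to replace the assumed occupancy rule with whatever policy a library actually adopts.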
S. Arsalan and K. M. Ahmed
5 Evaluation
To test the hypothesis and other aspects such as the usability, learnability, and value of the proposed system, we carried out an evaluation in a controlled setting involving 20 students from the university. As reported in the literature [17–19], low-fidelity prototypes can be equally effective for eliciting user feedback and for usability testing; therefore, we used a low-fidelity method called Wizard of Oz for the prototype and evaluation phase to test our hypothesis. The evaluation was carried out with the needs of both user groups in mind, i.e. 1) potential library users at remote locations and 2) library users present on the university campus. Of the 20 participants, 11 evaluated the system remotely while 9 took part in an on-site evaluation in a controlled environment. For remote testing of the web version of the application, we shared a web URL with the students and demonstrated different risk levels to them, i.e. very low risk, medium risk, and very high risk. Participants were asked to use their mobile phones or laptops and check what information they could comprehend in time. After this, an online questionnaire [20] was shared with them, and they were requested to submit their answers anonymously. To protect the participants’ privacy, no personal information was requested and no login was required. For the on-site evaluation, we reserved a room at the university with an LED screen to mimic the public display. At the start of each session, the complete system was briefly explained to each participant so that they had a clear idea of the proposed solution. After that, the participants were asked to look at the screen from a considerable distance and comprehend the information for about 15–20 s. The Wizard of Oz approach was used for this evaluation. Several participants scanned the QR codes and received the information on their cell phones. Finally, a questionnaire [21] was shared with them, which they were requested to fill in later based on their assessments (Fig. 3).
Fig. 3. A participant taking part in the on-site evaluation and using his cell phone to scan the QR code.
5.1 Results
To test the validity of the hypothesis, a few research questions were formulated, and each was analyzed individually based on the participants’ responses in the evaluation phase. Each research question (RQ) and its analysis are detailed as follows.
RQ1. Will the students and faculty members be willing to use our system and get the information about the library in COVID-19? The questionnaire asked, “Do you feel the need for using this public display each time you visit the library?” All 20 participants answered this question with the option “Yes”. This response is strong evidence that there is a pressing need for such a system, especially in the Covid-19 situation, and that our system addresses it.
RQ2. Does our proposed system have any positive effect on the routine of library usage for students and faculty members, and will it make them feel informed and safe? To evaluate this research question, users were asked to rate our system on two questions: 1) “Rate how much informed and safe did the use of this system make you feel before entering the library” and 2) “How much do you think the current level of risk shown in the application influences your decision of entering the library?” For question (1), 18 out of 20 participants responded that they felt very informed and safe; for question (2), 19 participants responded that this system would have a very significant influence on their decision to visit the library.
RQ3. Is the information clearly conveyed and easy to comprehend? For this research question, a very specific question was asked: “Rate how easy was it for you to get the information from this public display prototype and view the number of people who were inside the library?” A large majority, 18 out of 20 participants, responded that the information was effectively and easily conveyed by the system.
RQ4. What kind of information do the students and faculty members prioritize the most, and what else would they like to know about the library in the COVID-19 situation? Responses to two questions in the questionnaire were analyzed for RQ4. 1) “How much do you think the current level of risk shown in the application influences your decision of entering the library?” Here, 19 out of 20 participants rated this information as very significant for making their decision. 2) An open-ended question: “Which functionality of the system proved to be most useful for you? Also Briefly state why?” The analysis showed that the risk level indicator and the number of people inside were the two functionalities that participants prioritized the most.
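The counts reported above can be turned into percentages with a few lines of arithmetic, which makes the later “100%” and “around 90%” statements easy to check. The counts are taken from the results in this section; the dictionary keys and the helper name `share` are illustrative labels only.

```python
# Number of participants (out of 20) who gave the favorable answer,
# as reported in Sect. 5.1. Keys are illustrative labels.
N_PARTICIPANTS = 20
favorable = {
    "RQ1: need for the public display": 20,
    "RQ2: felt very informed and safe": 18,
    "RQ2/RQ4: risk level influences decision": 19,
    "RQ3: information easy to comprehend": 18,
}


def share(count: int, total: int = N_PARTICIPANTS) -> float:
    """Return the percentage of participants, rounded to one decimal."""
    return round(100 * count / total, 1)


summary = {question: share(count) for question, count in favorable.items()}

if __name__ == "__main__":
    for question, pct in summary.items():
        print(f"{question}: {pct}%")
```

With only 20 participants, each response accounts for five percentage points, so these figures are best read as indicative rather than precise.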
6 Discussion and Future Directions
This study started with the identification of a problem: the challenges students face when visiting the library during the Covid-19 situation. The informing phase provided evidence that students were very concerned about the Covid-19 situation while visiting or planning to visit the library. Students’ responses indicated that knowing the occupancy level and risk level of the library would help them in this uncertain time (for details see the Informing section). This information led to the
design of our proposed system. Our prototype system shows relevant details about the library, i.e. the number of people inside, the number of seats available, the current risk level, and recommendations, both for users who are inside the university and for those who are at a remote location outside the university. Based on the results, 100% of the participants agreed that they found this system very useful for their library usage in the Covid-19 situation. A significant majority of around 90% of the participants rated the system highly in terms of comprehensibility and responded that the system would have a positive impact on their library visiting routine and would make them feel informed and safe during these uncertain times. These responses and the analysis support the hypothesis (for the detailed analysis see Sect. 5, Evaluation and Results). Throughout this research and design work, we faced certain challenges that led to some limitations. In particular, due to the Covid-19 situation and the associated strict rules, we were not able to test the system outside the library; therefore, the system was tested in a controlled setting, where we performed usability testing and collected responses from 20 participants. Because tangible interaction could spread the virus, we had to keep physical activity around the display to a minimum; we therefore opted for a design with as little interaction as possible and could not include various schemes to increase interactivity. This study has shown that providing students with information about the risk level and occupancy level of the library makes them feel safe and informed and helps them avoid possible risk situations. This study can be scaled up and applied in many diverse environments. We had also planned to place ambient displays inside the library to keep users informed about the risk levels and related precautionary measures, but due to time limitations, the use of ambient displays for high-risk situations is left for future work. The fact that tangible interaction could spread the virus also raises another important question: “How can we increase engagement, impact, and comprehensibility but with minimum crowd/physical activity around the system?” This can be addressed in future studies and evaluation phases with a high-fidelity prototype system.
7 Conclusion
In this paper, we propose a smart solution to help library users during Covid-19. The system targets two types of library users, i.e. those who are already inside the university and those who are at a remote location. For on-site users, a public display is designed to be placed outside the library and to provide information about the number of people in the library and the current risk level. Remote library users can access the same information on their mobile phones by visiting a web URL connected to the same database. The prototype was developed and tested for both on-site and remote users, and we found that 100% of the participants want to use the system for their library visits during Covid-19. We conclude that the proposed solution can help students and faculty members resume their library activities as before the Covid-19 situation.
Acknowledgments. We would like to convey our gratitude to Prof. Georgios Marentakis and Østfold University College, Norway, for guiding us from beginning to end and for funding this research project.
References
1. Sahu, P.: Closure of universities due to coronavirus disease 2019 (COVID-19): impact on education and mental health of students and academic staff. Cureus 12, e7541 (2020)
2. Crosier, A.M.-S.A.D.: How is Covid-19 affecting schools in Europe? (2020)
3. Wood, G.: There’s no simple way to reopen universities (2020)
4. Rose, S.: Medical student education in the time of COVID-19. JAMA 323, 2131–2132 (2020)
5. IFLA: COVID-19 and the Global Library Field (2020)
6. Burke, R.R.: Behavioral effects of digital signage. J. Advert. Res. 49, 180–185 (2009)
7. Whitelaw, S., Mamas, M.A., Topol, E., Van Spall, H.G.: Applications of digital technology in COVID-19 pandemic planning and response. Lancet Digit. Health (2020)
8. Alt, F., Schneegaß, S., Schmidt, A., Müller, J., Memarovic, N.: How to evaluate public displays. In: Proceedings of the 2012 International Symposium on Pervasive Displays, pp. 1–6 (2012)
9. Thapa, P.: A review on use of public displays in location-based service
10. Petrova, K., Romaniello, A., Medlin, B.D., Vannoy, S.A.: QR codes advantages and dangers. In: 13th International Joint Conference on e-Business and Telecommunications, pp. 112–116. SCITEPRESS–Science and Technology Publications, Lda (2016)
11. Storz, O., Friday, A., Davies, N., Finney, J., Sas, C., Sheridan, J.: Public ubiquitous computing systems: lessons from the e-campus display deployments. IEEE Perv. Comput. 5, 40–47 (2006)
12. Preece, J., Sharp, H., Rogers, Y.: Interaction Design: Beyond Human-Computer Interaction. 4 edn. (2015)
13. Kukka, H., Oja, H., Kostakos, V., Gonçalves, J., Ojala, T.: What Makes you Click. ACM Press
14. Brignull, H., Rogers, Y.: Enticing people to interact with large public displays in public spaces (2003)
15. Kim, S.: Investigating everyday information behavior of using ambient displays: a case of indoor air quality monitors. In: Proceedings of the 2018 Conference on Human Information Interaction & Retrieval, pp. 249–252 (2018)
16. Yi, X.Y.C.: Design and implementation of the website based on PHP & MYSQL (2010)
17. Dhillon, B., Banach, P., Kocielnik, R., Emparanza, J.P., Politis, I., Raczewska, A., Markopoulos, P.: Visual fidelity of video prototypes and user feedback: a case study. In: Proceedings of HCI 2011 The 25th BCS Conference on Human Computer Interaction, vol. 25, pp. 139–144 (2011)
18. Sauer, J., Sonderegger, A.: The influence of prototype fidelity and aesthetics of design in usability tests: effects on user behaviour, subjective evaluation and emotion. Appl. Ergon. 40, 670–677 (2009)
19. Dow, S., Lee, J., Oezbek, C., MacIntyre, B., Bolter, J.D., Gandy, M.: Wizard of Oz interfaces for mixed reality applications (2005)
20. https://nettskjema.no/a/172395
21. https://nettskjema.no/a/172592
Pilot’s Visual Eye-Track and Biological Signals: Can Computational Vision Toolbox Help to Predict the Human Behavior on a Flight Test Campaign?
Marcela Di Marzo1(B), Jorge Bidinotto1, and José Scarpari2
1 São Carlos School of Engineering (EESC), University of São Paulo (USP), Av. Trabalhador são-carlense, 400, São Carlos, SP, Brazil
[email protected], [email protected]
2 Instituto Tecnológico da Aeronáutica, São José dos Campos 12228-900, Brazil
Abstract. Humans’ biosignal responses are physiological behaviors whose patterns may help to predict reactions to risky situations. In the aeronautical field, the pilot’s reaction is decisive for overcoming a flight failure and for executing a maneuver well during a flight. Accordingly, an experiment was conducted using the AS-350 aircraft (Airbus Helicopters) during a Flight Test Campaign, in which pilots were exposed to unexpected engine failures and tested on their ability to make a safe landing under the conditions prescribed by the aircraft manufacturer. This work aimed to analyze the spatial distribution of the visual eye-track and its effects on pilot performance, describing the rate at which gaze locations across the cockpit recur versus the handling qualities of the aircraft, along with the physiological parameters of the pilots. Preliminary empirical results showed that the combination of three elements (runway, altitude, and engine speed) has a positive impact on the success of the task.
Keywords: Flight Test Campaign · Workload vs physiological response · Unexpected response · Human-computer Interaction
1 Introduction
Flight safety is a requirement in the aviation sector and is increasingly the target of studies, research, and development. Although performance, efficiency, and comfort weigh heavily in the development of new aeronautical projects, flight safety is one of the most important criteria and plays a crucial role in this field. Studies in the flight safety area can assist in automating pilot tasks, simplifying flight controls, and adopting new alarm systems, among others [1, 2]. In addition to automation and new technologies that may facilitate the execution of tasks by pilots, highly trained and skilled pilots, both physically and psychologically, contribute to flight safety, since they are better prepared to respond quickly and assertively to critical situations in flight.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 T. Ahram et al. (Eds.): IHIET-AI 2021, AISC 1378, pp. 232–238, 2021. https://doi.org/10.1007/978-3-030-74009-2_29
Pilot’s Visual Eye-Track and Biological Signals
Correlating these two factors that directly contribute to flight safety (technology and human behavior), it is of interest to understand the biological responses of pilots in real situations of danger, which can cause emotional and biological fluctuations leading to physiological and psychological changes [3, 4]. Such critical situations can also produce anxiety, which affects cognitive tasks by placing significant demands on cognitive resources. Nowadays, capturing biological signals has become easier thanks to reduced data storage costs and equipment miniaturization, which have opened up a new paradigm called the Internet of Medical Things (IoMT) environment [5, 6]. The Computer Vision area seeks to build artificial intelligence systems that obtain information from images or multidimensional data [7], for example by recognizing gaze position and facial expressions, or by recognizing a suspect target area in remote sensing images after a series of pre-processing steps. With this objective, this research works with eye- and face-tracking data provided by a Flight Test Campaign conducted by the Brazilian Flight Test Research Institute (IPEV), in which eleven pilots fully instrumented with physiological sensors were subjected to sudden and unexpected engine failures and their physiological reactions were collected.
2 Background Theory
For this work, a bibliographic survey was carried out on topics that make it possible to frame, collect, and analyze data that help in understanding the human factors associated with success in a flight. The four topics explored were: Human-computer Interaction, Human-Behavior in unexpected situations, Flight-test campaign, and Eye-tracking system.
2.1 Multimodal Human-Computer Interaction
Human-Computer Interaction (HCI) is defined as a single user facing a computer and interacting through explicit commands, and it draws on multidisciplinary fields such as computer graphics, human factors, cognitive psychology, and artificial intelligence [8, 9]. As technologies evolved, HCI became more interactive, allowing multiple users and unstructured input and output data; this is often called Multimodal Human-Computer Interaction (MMHCI). Both HCI and MMHCI require an understanding of at least three factors: the user (the human factor), the system (the computational technology), and the interaction between the user and the system [9]. The importance of MMHCI lies in the development of new technologies, concepts, and designs that improve usability through new ways of interaction. Usability can be understood as the intuitive use of the system and incorporates characteristics such as learnability, productivity, efficiency, effectiveness, retainability, and user satisfaction [10]. In addition, the human factor is one of the most important considerations when addressing MMHCI in aviation, where it is central to improving flight safety: it plays a crucial role, for instance, in the design of new aviation systems, cabins and cockpits, cockpit organization, crew interaction, automation, and others [11].
M. Di Marzo et al.
2.2 Human-Behavior in Unexpected Situations
In this work, “human behavior in unexpected situations” refers to human performance when executing a task under stressful conditions, and to how such conditions can create risk through psychological disturbances and affect tasks that demand a short reaction time from decision-makers [12]. One emotion intrinsically related to this high-pressure, stressful state is anxiety. It can cause changes in psychophysiological signals, leading to different levels of attention and psychomotor skill. Anxiety is defined by Allsop and Gray [13] “as a negative emotional and motivational state that can occur when a current goal is under threat, or physical harm is perceived to be imminent”. This negative state is frequently linked to various accidents in the aviation sector, resulting in a sequence of human errors [14]. Thus, given the potentially critical consequences of these emotional changes in the aviation sector, the physical and psychological changes that result from them must be thoroughly studied and understood, and pilots must be extremely well trained in order to improve the reliability of the procedures. Wiegmann and Shappell [14] provided safety practitioners with an overview of the prominent human error perspectives in aviation. Safety considerations and enhanced simulations of emergency reactions in aircraft are essential for successful intervention; some studies are presented in the literature [15–17]. Scarpari et al. [18] showed that training-based pilot formation, together with flight simulators that quantify physiological parameters, affects pilot performance, which improves over time with training.
2.3 Flight Test Campaign
An aircraft flight test campaign can be designed for aircraft certification or for pilot training. It is designed to evaluate aircraft characteristics such as operational performance, noise emission, maneuverability, and normal-mode system operation. In addition to normal mode, alternative campaigns can be designed to simulate failure scenarios and extreme conditions, which is done in both cases (aircraft certification and pilot training). Follador and Trabasso [19] investigated how to systematize the tradeoff between actions focused on performing flight test campaigns and the quality of the training, from the perspective of knowledge management and competence transfer. Scarpari et al. [18] discussed the gain of adopting quantitative measurements of pilots’ biosignals to assess their behavior during a simulated unexpected engine failure. In support of the use of mathematical models, Roscoe and Ellis [20] summarized, in a practical technical report, a subjective rating scale of spare capacity as a promising approach for assessing the workload of practicing test pilots.
Thus, with the development of technology, new equipment (fNIRS, eye-tracking devices, real-time data acquisition, among others) has joined the big data era and increased the possibilities for quantifying the quality of flight test campaigns. Artificial intelligence models improve experimental design and performance evaluation by interpreting unstructured data (such as audio, image, and video) to enhance pilots’ skills.
2.4 Computer Vision – Eye-Tracking
Computer Vision is the subfield of artificial intelligence that mimics human vision capabilities through mathematical reasoning. Popoola and Wang [21] presented an extended review of intelligent vision systems for observing the dynamics of moving targets and of human-computer interfaces, focused on abnormal human behavior detection. Among the human senses, vision plays an important role in interaction with the environment, the eye being responsible for up to 80% of the stimuli received and the mental interpretations generated [22]. Given this importance, it is natural that information can also be extracted through the eye for a better understanding of human psychology and perception. According to Majaranta and Bulling [23], “Eye-tracking refers to the process of tracking eye movements or the absolute point of gaze (POG) – referring to the point the user’s gaze is focused at in the visual scene”. Eye tracking and eye position detection can be crucial for developing more intelligent and intuitive tools and interfaces for users in human-computer interaction [24].
Analyses of eye movement, gaze position, pupil dilation, and blink count are examples of measures that can yield relevant information not only about the emotional state of the user but also about their spatial sensing and visual behavior. This can lead to insights about gaze interactions and to new interface designs better suited to each need of interactive technologies. In aviation, the gain from eye- and face-tracking technology goes well beyond improving MMHCI interfaces: it improves pilot training, cockpits, cabins, and, ultimately and most importantly, flight safety. Colvin et al. [25] showed that eye tracking is a promising technique for studying pilots’ monitoring performance, though a challenging one, since the personal experience of each subject must be considered when generalizing the results. Peysakhovich et al. [26] studied the feasibility of using neuroergonomics and eye-tracking systems to enhance aircraft cockpit design.
3 Preliminary Results
The motivation of this work was to analyze the performance of pilots in a Flight Test Campaign using a computational vision toolbox to help predict human behavior. A single-engine AS-350 helicopter from the Brazilian Air Force was used, and eleven elite military pilots fully equipped with special sensors were monitored. Each pilot performed between 12 and 27 complete autorotations, in different height and speed combinations, at least one with an unexpected engine failure. A sample of this experiment was taken, with the main objective of discussing the feasibility of adopting eye- and face-tracking analyses to develop a model that estimates the probability of successfully executing a given task (here, the reaction to the simulated engine failure) and makes explicit which factors determine its success. Moreover, the preliminary results open the possibility of revealing the pilots’ profiles and their chances of improving skills over time, according to their natural abilities. To illustrate, a single sample of one pilot’s response was taken from this flight campaign. Figure 1 shows the equipment placed on the pilot’s head to measure his/her physiological signals, together with the eye-tracking device. It also displays the head tracking detection adopted for further studies.
Fig. 1. Adopted equipment placed on the pilot’s head (physiological signals) and eye-tracking device.
Figure 2 displays the results of the sample considering only eye-tracking data, recorded using Tobii glasses. The elements that are observed, as well as how often they are accessed and revisited during the engine failure simulations, may show which ones are most essential for completing the task successfully. The three elements seen as most important in this test were runway, altitude, and engine speed. These findings can also inform more adequate cockpit designs.
Fig. 2. Eye-tracking sample results showing the three elements the pilot looked at most: runway, altitude, and engine speed.
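The frequency of access and recurrence of each observed element can be computed from a sequence of gaze samples once every sample has been assigned to an area of interest (AOI). The sketch below is a minimal illustration under that assumption; the paper does not describe its processing pipeline, so the function name `aoi_statistics`, the label names, and the short example sequence are hypothetical and do not represent measured data.

```python
from collections import Counter
from itertools import groupby


def aoi_statistics(gaze_labels):
    """Summarize how often each area of interest (AOI) is looked at.

    gaze_labels: sequence of AOI names, one per gaze sample, in time order.
    Returns a dict mapping each AOI to its sample count, its number of
    visits (runs of consecutive samples on the same AOI), and its share
    of all samples.
    """
    samples = Counter(gaze_labels)
    # groupby collapses consecutive identical labels into one run,
    # so each run counts as a single visit to that AOI.
    visits = Counter(label for label, _ in groupby(gaze_labels))
    total = len(gaze_labels)
    return {
        aoi: {
            "samples": samples[aoi],
            "visits": visits[aoi],
            "share": samples[aoi] / total,
        }
        for aoi in samples
    }


if __name__ == "__main__":
    # Hypothetical label sequence for illustration only.
    labels = [
        "runway", "runway", "altitude", "runway",
        "engine_speed", "engine_speed", "runway",
    ]
    for aoi, stats in aoi_statistics(labels).items():
        print(aoi, stats)
```

Separating sample counts from visits distinguishes an element that is stared at once for a long time from one that is checked repeatedly, which is the recurrence the text refers to.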
4 Discussion
This work presented elements that support the usefulness of predictive models aided by eye- and face-tracking in flight tests. This technology enables the development of new resources in the aviation area, such as cockpit configuration, pilot profile prediction, and prediction of the amount of training needed for each profile. Moreover, all the information obtained was extracted with data-driven models, that is, through artificial intelligence algorithms. Further work aims to implement techniques for extracting information from video, considering the entire data set of the experiment, which contains more than 100 GB of recordings. Parallel processing techniques, as well as automatic model selection (automated machine learning), may be adopted to make retrieving this information from the acquired data viable.
References
1. Kim, S., Choi, B., Cho, T., Lee, Y., Koo, H., Kim, D.: Wearable bio signal monitoring system applied to aviation safety. In: 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 2349–2352. IEEE, July 2017
2. Scarpari, J.R.S., Ribeiro, M.W., Deolindo, C.S., Aratanha, M.A.A., Andrade, D., Forster, C.H., Figueira, J.M.P., Lacerda, S.S., Machado, B.S., Amaro-Jr, E., Sato, J.R., Kozasa, E.H., Silva, R.G.A.: Elite helicopter pilots physiological assessment during landing maneuver in critical situation. Unpublished (in press)
3. Eysenck, M.W., Derakshan, N., Santos, R., Calvo, M.G.: Anxiety and cognitive performance: attentional control theory. Emotion 7(2), 336 (2007)
4. Cha, U.: Developing an embedded method to recognize human pilot intentions in an intelligent cockpit ads for the pilot decision support system. J. Ergon. Soc. Korea 17(3), 23–39 (1998)
5. Vishnu, S., Ramson, S.J., Jegan, R.: Internet of medical things (IoMT)-an overview. In: 2020 5th International Conference on Devices, Circuits and Systems (ICDCS), pp. 101–104. IEEE, March 2020
6. Qadri, Y.A., Nauman, A., Zikria, Y.B., Vasilakos, A.V., Kim, S.W.: The future of healthcare internet of things: a survey of emerging technologies. IEEE Commun. Surv. Tutor. 22(2), 1121–1167 (2020)
7. Fang, Z., Yao, G., Zhang, Y.: Target recognition of aircraft based on moment invariants and BP neural network. In: World Automation Congress 2012, pp. 1–5. IEEE, June 2021
8. Newell, A., Card, S.K.: The prospects for psychological science in human-computer interaction. Hum.-Comput. Interact. 1(3), 209–242 (1985)
9. Jaimes, A., Sebe, N.: Multimodal human–computer interaction: a survey. Comput. Vis. Image Underst. 108(1–2), 116–134 (2007)
10. Hartson, R., Pyla, P.S.: The UX Book: Process and Guidelines for Ensuring a Quality User Experience. Elsevier, Amsterdam (2012)
11. Salas, E., Maurino, D., Curtis, M.: Human factors in aviation: an overview. In: Human Factors in Aviation, pp. 3–19. Academic Press (2010)
12. Driskell, J.E., Salas, E. (eds.): Stress and Human Performance. Psychology Press (2013)
13. Allsop, J., Gray, R.: Flying under pressure: Effects of anxiety on attention and gaze behavior in aviation. J. Appl. Res. Mem. Cogn. 3(2), 63–71 (2014)
14. Wiegmann, D.A., Shappell, S.A.: Human error perspectives in aviation. Int. J. Aviat. Psychol. 11(4), 341–357 (2001)
15. Nagel, D.C.: Human error in aviation operations. In: Human Factors in Aviation, pp. 263–303. Academic Press (1988)
16. Sharma, S., Singh, H., Prakash, A.: Multi-agent modeling and simulation of human behavior in aircraft evacuations. IEEE Trans. Aerosp. Electron. Syst. 44(4), 1477–1488 (2008)
17. Miyoshi, T., Nakayasu, H., Ueno, Y., Patterson, P.: An emergency aircraft evacuation simulation considering passenger emotions. Comput. Ind. Eng. 62(3), 746–754 (2012)
18. Scarpari, J.R.S., Forster, C.H.Q., Andrade, D., Silva, R.G.A.: Autorotation: physiological measures of workload. In: 45th European Rotorcraft Forum, Warsaw, Poland, pp. 17–20 (2019)
19. Follador, R.D.C., Trabasso, L.G.: Knowledge management patterns model for a flight test environment. J. Aerosp. Technol. Manag. 8(3), 263–271 (2016)
20. Roscoe, A.H., Ellis, G.A.: A subjective rating scale for assessing pilot workload in flight: a decade of practical use (No. RAE-TR-90019). Royal Aerospace Establishment Farnborough, United Kingdom (1990)
21. Popoola, O.P., Wang, K.: Video-based abnormal human behavior recognition – a review. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 42(6), 865–878 (2012)
22. Haupt, C., Huber, A.B.: How axons see their way–axonal guidance in the visual system. Front. Biosci. 13, 3136–3149 (2008)
23. Majaranta, P., Bulling, A.: Eye tracking and eye-based human–computer interaction. In: Advances in Physiological Computing, pp. 39–65. Springer, London (2014)
24. Ji, Q., Wechsler, H., Duchowski, A., Flickner, M.: Special issue: eye detection and tracking. Comput. Vis. Image Underst. 98(1), 1–3 (2005)
25. Colvin, K.W., Dodhia, R.M., Belcher, S.A., Dismukes, R.K.: Scanning for visual traffic: an eye tracking study. In: Proceedings of the 12th International Symposium on Aviation Psychology, vol. 14 (2003)
26. Peysakhovich, V., Lefrançois, O., Dehais, F., Causse, M.: The neuroergonomics of aircraft cockpits: the four stages of eye-tracking integration to enhance flight safety. Safety 4(1), 8 (2018)
Meeting the Growing Needs in Scientific and Technological Terms with China’s Terminology Management Agency – CNCTST

Jiali Du¹, Christina Alexandris², Yajun Pei³, Yuming Lian⁴, and Pingfang Yu¹ (corresponding author)

¹ Lab of Language Engineering and Computing, Faculty of Chinese Language and Culture, Guangdong University of Foreign Studies, Guangzhou, China, {201310039,201310051}@oamail.gdufs.edu.cn
² National University of Athens, Athens, Greece
³ China National Committee for Terms in Sciences and Technologies, Beijing, China
⁴ Beijing International Institute of Urban Development, Beijing, China
Abstract. The China National Committee for Terms in Sciences and Technologies (CNCTST) is China’s top terminology management agency. It was established in 1985 to introduce new terminology, endorse changes to borrowed terminology, revise terminology standards, explain unfamiliar terminology, and adopt practices that make terminology more accurate. CNCTST works to standardize terminology practices across the country. Under its management, a considerable number of terminology databases have been constructed to serve academic research and practical translation. Furthermore, CNCTST officially launched the first phase of its DISE (data intelligent search engine) platform, TERMONLINE, on June 15, 2016, and launched the revised second phase on September 17, 2020. This article discusses the successive periods of CNCTST’s development and its management of Chinese terminology research and practice.

Keywords: Terminology · Database · CNCTST · TERMONLINE
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. T. Ahram et al. (Eds.): IHIET-AI 2021, AISC 1378, pp. 239–245, 2021. https://doi.org/10.1007/978-3-030-74009-2_30

1 Introduction

A characteristic, even unique, feature of terminology is that, in expressing scientific concepts, it raises issues that may seem contradictory at first sight. One of the typical challenges of terminology, concerning knowledge representation and knowledge management, is the accurate representation of conceived abstract entities (concepts) [1]. This challenge grows more complex as research and development (R&D) increases globally and the international community in Science and Technology needs appropriate monolingual and multilingual terms. Thus, more and more scholars are keen to express their views on terminological work from theoretical or practical perspectives. The discussion includes research on corpora or databases [2–5], semantic purposes [6, 7], multilingual translation [8–10], computational analysis [11–13], and cognitive processes [14], among others.

To handle the growing number of terms translated from abroad after China opened up in 1978, China established its top agency, the China National Committee for Terms in Sciences and Technologies (CNCTST), on April 25, 1985. CNCTST was established to manage the whole range of terminological work, including introducing new terminology, endorsing changes to borrowed terminology, revising terminology standards, explaining unfamiliar terminology, and adopting practices that make terminology more accurate. Furthermore, CNCTST provides comparative analyses of the terminology used in China Mainland, China Hong Kong, China Macao and China Taiwan, with the aim of narrowing the knowledge gap among them wherever possible.

This article presents and analyzes the historical background of CNCTST (https://www.cnctst.cn/) and its impact on terminology research, and introduces its terminology database, TERMONLINE (https://www.termonline.cn/), and its connection to terminology research in the international community.
2 The Establishment of CNCTST (1985–1991)

The State Council of China approved a plan to establish CNCTST in December 1978. To meet the rapidly growing demand for terminology borrowed from abroad, the National Science and Technology Commission and the Chinese Academy of Sciences jointly submitted the “Report on the Establishment of the National Natural Science Term Evaluation Committee” (the former name of CNCTST) to the State Council of China on October 6, 1978. This was shortly before the government announced the opening-up policy on December 12, 1978. The State Council approved the report two months later and asked the Chinese Academy of Sciences to take charge of the preparatory work.

CNCTST was formally established on April 25, 1985 in Beijing. The well-known nuclear physicist and academician Sanqiang Qian served as its first director. The State Council issued important instructions to CNCTST on August 12, 1987: “CNCTST was established with the approval of the State Council. It is the committee’s authority to review and publish the terms of various disciplines. Within this scope, the scientific terms are approved and established by the authorities and are used by scientific research units across the country, including teaching units, production units, business management units and publishing units. All scientific research units should comply with terminological directions issued by CNCTST”.

The first 1,949 standardized astronomical terms were promulgated on September 24, 1987. Sanqiang Qian held a press conference and announced “the State Council’s Reply on the Issue of Promulgation of Astronomical Terms”. These terms, which had been studied for nearly three years, were the first issued by CNCTST.

CNCTST held a meeting on September 22, 1988 to discuss a working plan for terminology exchanges. It was the first time CNCTST planned to extend terminological development from China mainland to China Hong Kong, China Macao, China Taiwan, and other Chinese-speaking regions. Furthermore, CNCTST paid attention to both basic and applied subject terms, and on January 26, 1989 discussed the publication of high-quality terminological works on various subjects in traditional Chinese characters.

In 1991, under the management of CNCTST, the Ministry of Machinery Industry began to construct a concept-oriented terminology database, the China Mechanical & Electrical Engineering Terminology Database (CMEETDB), with parallel translation among Chinese, English, Russian, Japanese, German and French. The database had 44,000 term records and 200,000 multilingual term entries. It was included in the catalogue of international terminology databases by INFOTERM in 1996.
3 The Growth of CNCTST (1992–2001)

The renowned Chinese chemist and academician Jiaxi Lu took over as the second director of CNCTST on September 23, 1992. He hosted a symposium in Beijing on May 13, 1994 to promote the exchange and unification of scientific and technological terms across the Taiwan Strait, which marked the official launch of work on cross-strait terminology comparison and unification.

CNCTST held a seminar on the comparison of cross-strait astronomy terms in Huangshan on June 23, 1996. This was the first scientific and technological terminology seminar jointly held by experts from both sides of the strait. China Taiwan also welcomed a CNCTST delegation to the cross-strait symposium held in Taipei on July 16, 1996 to discuss navigational scientific and technological terminology. In this way, terminology work narrowed the gap between research on the two sides of the strait.

CNCTST launched its academic journal, “Kējì shùyǔ yánjiū” (Chinese Science and Technology Terms Journal), on December 25, 1998. The journal is published in Chinese and is devoted to terminological research. The director of CNCTST, the chemist and academician Jiaxi Lu, served as its honorary editor-in-chief.

CNCTST became a joint member of the International Information Centre for Terminology (Infoterm: https://www.infoterm.info/). With the rapid development of science and technology, CNCTST strengthened international cooperation and signed a research cooperation agreement on July 20, 2002 with Infoterm, “which was founded in 1971 by contract with the United Nations Educational, Scientific and Cultural Organization (UNESCO), with the objective to support and co-ordinate international co-operation in the field of terminology”. According to the Infoterm website, “members of Infoterm are international, regional or national terminology institutions, organizations and networks, as well as specialized public or semi-public or other non-profit institutions engaged in terminological activities”.
4 The Maturity of CNCTST (2002–2014)

Nominated by the Ministry of Science and Technology and the Chinese Academy of Sciences, Academician Yongxiang Lu, President of the Chinese Academy of Sciences, took over as the third director of CNCTST in March 2002. On March 19, 2002, he emphasized the importance of releasing and trialling new scientific and technological terms and of constructing terminology databases, ushering in a new era in CNCTST management. Furthermore, cooperation with institutions and organizations in European countries deepened considerably when a CNCTST delegation was invited to visit European terminology institutions and organizations on October 21, 2002.

In 2002, under the management of CNCTST, the Data Bank of Cyclopedia Terms (DBCT), which included about 1 million terms, was completed by the Encyclopedia of China Publishing House after a decade of construction. DBCT provides terminology content from the Natural Sciences, Engineering Technology, Social Science, Culture and Education, Art, Religion and other disciplines. A special feature of DBCT is its coverage of terminology unique to China. DBCT was technically supported by the Multimedia Information Retrieval System (MIRS) of Peking University’s Founder Company, a cooperation that benefited both DBCT and MIRS.

The CNCTST website went online on March 30, 2004. It provides more than 200,000 standardized scientific and technical terms and serves as a modern platform for promoting and popularizing standardized scientific and technical terms worldwide.

The journal sponsored by CNCTST was renamed on February 25, 2007. Its Chinese name became Zhōngguó kējì shùyǔ and its English name “China Terminology”. Furthermore, the journal moved from quarterly to bimonthly publication to provide more opportunities for translators and terminologists.

The Chinese and Russian governments announced a cooperation project on terminological research on September 24, 2008. The Sino-Russian Prime Minister’s Regular Meeting Committee approved the project “Maintenance of terminology in Sino-Russian scientific and technological cooperation”, hosted jointly by CNCTST and the Russian Terminology Agency. The project made terminological cooperation between the two countries active and effective.

CNCTST signed an agreement with Baidu Baike, the world’s largest Chinese encyclopedia, on June 21, 2010 to jointly promote standardized scientific and technical terms. CNCTST provides more than 80,000 annotated terms to the encyclopedia, and the Baidu online platform makes them conveniently available to users worldwide.

Experts from CNCTST and China Taiwan discussed further terminological cooperation in Beijing on June 29, 2011. During the fifth round of talks, terminological experts from both sides agreed to compile a special terminology dictionary entitled “Chinese Dictionary of Science and Technology”. This plan made cross-strait terminological exchanges more practical and technical.

CNCTST announced the establishment of 90 subcommittees for about 48 disciplines on February 26, 2014. Furthermore, CNCTST reported considerable progress in supporting the standardization of scientific and technical terms in ethnic minority areas on April 30, 2014. It supports terminological research on Tibetan, Uyghur, Kazakh, Mongolian, Korean, and other dialects or languages by providing funding, terminological services, and practical help. CNCTST published the first minority-language terminological dictionary, the “Chinese-English-Uyghur Science and Technology Dictionary”, which was compiled in Uyghur according to the standardized terms issued by the agency.
CNCTST began providing convenient online query services on July 4, 2014. With the widespread use of Weibo and WeChat in China, it started to use these social media tools to offer term queries.
5 The Continuous Development of CNCTST (2015–2020)

After joint nomination by the Ministry of Science and Technology and the Chinese Academy of Sciences, Academician Chunli Bai, President of the Chinese Academy of Sciences, took over as the fourth director of CNCTST on March 20, 2015. The newly revised CNCTST website went online, giving Internet users free access to more than 300,000 standardized scientific and technical terms. It interconnects several basic databases for standardization and promotes the informatization and popularization of standardized scientific and technological terms.

The CNCTST sub-committee for Terms in Urban and Rural Planning was established in May 2015. Since the middle of the 20th century, research on urban issues has become a common focus of international political and scientific circles. Interdisciplinary theories and ideas concerning society, economy, politics, and the ecological environment have flooded into the field of Urban and Rural Planning. The birth of these emerging disciplines has promoted the continuous extension of the fields and categories of terminological research.

The CNCTST sub-committee for Terms in Nuclear Science was established in November 2015. The committee standardizes a large number of new terms, including terms from the fields of Uranium Geology, Uranium Mining and Metallurgy, Nuclear Energy and Power, Nuclear Materials, Isotope Separation, Nuclear Chemistry and Radiochemistry, Radiation Protection, Nuclear Chemical Engineering and Nuclear Physics, among others.

With the outbreak of COVID-19 in January 2020, CNCTST sought to provide terminological support for epidemic control. The sub-committee for Terms in National Public Health and Preventive Medicine was established on October 16, 2020 and is working on standardized terms for epidemic control.

The continuous development of CNCTST has gone hand in hand with the rapid development of network technology, which has helped it to build a terminology platform. In 2016, CNCTST constructed the TERMONLINE database (https://www.termonline.cn/index.htm). TERMONLINE is an official internet knowledge service platform that provides terminology retrieval, terminology management (error correction, collection, sharing), terminology extraction, annotation, and terminology proofreading services. Considered a Chinese terminology database of crucial importance, it includes more than 500,000 standard terms and covers more than 120 disciplines in the Basic Sciences, Engineering and Technology, Agricultural Science, Medicine, Human Sciences and Social Science, Military Science and other fields. The database was officially launched on June 15, 2016. After four years of construction, CNCTST launched the revised second phase of TERMONLINE on September 17, 2020. TERMONLINE has become more convenient and comprehensive, and it aims to become a multilingual platform. Under CNCTST’s management, which has proved successful, the first multilingual Shùdiǎn (Big Data Dictionary) was published worldwide on May 22, 2020. This dictionary provides parallel translations of 5692 big data terms from Chinese into Arabic, English, French, German, Italian, Japanese, Korean, Portuguese, Russian, and Spanish. Its publication starts a trend towards multicultural and multilingual practices in terminological research.
6 Conclusion

CNCTST is considered to be China’s top terminology management agency. In 1978, when China opened up, the State Council approved the report proposing its establishment to meet the growing need for scientific and technological terms borrowed from abroad. It was formally established on April 25, 1985. CNCTST has passed through four crucial periods, namely establishment, growth, maturity, and continuous development, directed respectively by Academician Sanqiang Qian, Academician Jiaxi Lu, Academician Yongxiang Lu, and Academician Chunli Bai. CNCTST’s management has proved very successful. Keeping pace with recent developments in Science and Technology, CNCTST has constructed its own online platform, TERMONLINE, and is paving the way for more convenient and comprehensive services, from monolingual to multilingual practices.

Acknowledgments. This article is sponsored by China Natural Science Fund (61877013), “Thirteen-Five Major Project” for China National Committee for Terms in Sciences and Technologies (CNCTST-ZD2019001), Guangdong Social Science Project (GD20CWY01).
References

1. L’Homme, M.C.: Being a privileged witness of twenty years of research in terminology. Terminology 26(1), 1–6 (2020)
2. Zorrilla-Agut, P., Fontenelle, T.: IATE 2. Terminology 25(2), 146–174 (2019)
3. Vàzquez, M., Oliver, A., Casademont, E.: Using open data to create the Catalan IATE e-dictionary. Terminology 25(2), 175–197 (2019)
4. Du, J.L., Alexandris, C.K., Yu, P.F.: Comparative research on terminology databases in Europe and China. In: IHIET, pp. 252–257. Springer, Switzerland (2020)
5. Du, J.L., Alexandris, C.K.: Research on European terminology operation mechanism. In: New Exploration of Terminology Translation Research from the Perspective of Interdisciplinary, pp. 474–492. Nanjing University Press, Nanjing (2020)
6. Bertels, A., Speelman, D.: Clustering for semantic purposes: exploration of semantic similarity in a technical corpus. Terminology 20, 279–303 (2014)
7. Dubois, D.: How words for sensory experiences become terms. Terminology 23(1), 9–37 (2017)
8. Chen, Y., Chen, W.: English translation of long traditional Chinese medicine terms. Terminology 24(2), 181–209 (2018)
9. Wei, X.Q.: Conceptualization and theorization of terminology translation in humanities and social sciences. Terminology 24(2), 262–288 (2018)
10. Yu, P.F., Du, J.L., Li, X.G.: English education for young learners in China and Europe. In: Goossens, R., Murata, A. (eds.) AHFE 2019, pp. 578–586. Springer, Washington (2019)
11. Cárdenas, B.S., Ramisch, C.: Eliciting specialized frames from corpora using argument-structure extraction techniques. Terminology 25(1), 1–31 (2019)
12. Alexandris, C.K.: Managing Implied Information and Connotative Features in Multilingual Human-Computer Interaction. Nova Science Publishers, Inc. (2013)
13. Alexandris, C.K.: A speech-act oriented approach for user-interactive editing and regulation processes applied in written and spoken technical texts. In: Human-Computer Interaction. Novel Interaction Methods and Techniques, pp. 645–653. Springer, Heidelberg (2009)
14. Pecman, M.: Variation as a cognitive device: how scientists construct knowledge through term formation. Terminology 20(1), 1–24 (2014)
The Effect of Outdoor Monitor on People’s Attention

Chau Tran (corresponding author), Ahmad Bilal Aslam, Muhammad Waqas, and Islam Tariq

Faculty of Computer Sciences, Østfold University College, B R A Veien 4, 1757 Halden, Norway
{Chaut,Ahmad.B.Aslam,Muhammad.W.Hameed,Islam.Tariq}@hiof.no
Abstract. Since the end of 2019, COVID-19 has affected people’s lives in many ways. To reduce the chance of infection, the authorities have recommended avoiding crowded places. This raises the need for an effective way to avoid high occupancy levels inside public places. One approach is to use a screen located outside to display the occupancy information. This study examines the effect of this method, and of screen size, on people’s attention. For the evaluation, various screen sizes were investigated in a natural setting, in which people’s activities and behaviors were observed and recorded. The results show that placing a monitor outside is sufficient to inform customers about the occupancy level, and that a larger screen increases the chance of getting people’s attention. Our findings benefit those seeking a suitable way to communicate the occupancy level of public places.

Keywords: Human-Computer Interaction (HCI) · COVID-19 · Social Distance · Outdoor Monitor · People’s attention · Crowd Monitoring System · Affective aspects
1 Introduction

After the first coronavirus outbreak in Wuhan at the end of 2019, the World Health Organization declared a public health emergency of international concern on January 30, 2020, and later declared a pandemic on March 11, 2020 [1]. COVID-19 has affected human life as well as the global economy, politics, and social culture. According to [2], the situation has changed the behavior of consumers and businesses in industries such as retail, higher education, and tourism. To reduce the spread of COVID-19, the government has introduced various restrictions and regulations, including social distancing and avoiding crowded places. These rules create a need for an efficient way to communicate the occupancy level of public places. There are two primary approaches to displaying the number of people inside: manual and automated. The manual approach uses a person as counter and informer, while the automated approach implements a system that monitors occupancy and transfers the information to a peripheral device such as a mobile application or an outdoor monitor. In the context of the latter solution, this study aims to examine its effect on people’s attention and to determine a sufficient screen size.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. T. Ahram et al. (Eds.): IHIET-AI 2021, AISC 1378, pp. 246–253, 2021. https://doi.org/10.1007/978-3-030-74009-2_31
The next four sections describe how to design an outdoor display that provides occupancy levels and captures people’s attention, following the “User-Centered Design” concept. This concept comprises a collection of processes that put users at the center of product design and development: Informing in Sect. 3, Designing & Prototyping in Sect. 4, and Evaluation in Sect. 5. Lastly, the conclusion and future work are provided in Sect. 6.
2 Background

2.1 Literature Review

The first display was invented in 1897 by the physicist and inventor Karl Ferdinand Braun, using a cathode-ray tube. However, it took decades of research and technological innovation before the display became the popular, general-purpose device it is today [3]. Since then, the display screen has changed the way information is absorbed. For instance, it has shifted reading habits from paper to electronic books, especially with e-ink, which yields reading results similar to those of printed paper [4]. Furthermore, [5–7] studied the use of on-screen feedback in teaching suprasegmentals such as stress, rhythm, linking, and intonation. These studies suggest that electronic visual displays can alter people’s behavior.

In the last decade, various studies have focused on the effect of display size, motivated by the growing diversity of screens and the interest in user experience. Despite some contradictory findings, there is broad support for the view that large screens promote attention and memory. [8] showed that the largest screen (56-inch) produced greater heart rate deceleration and skin conductance than the medium and small ones, supporting their hypotheses that users pay more attention and are more aroused when using larger screens. The work in [9] assessed the impact of different monitors on text memorization. Although screen size and text layout made no difference to the cognitive effort required to acquire the information, there was a significant effect on learning time: students reading from a 15-inch computer learned faster than those reading from a smaller 12-inch one. Moreover, viewers preferred a large monitor for content with point-of-view movement, such as video, compared to images [10]. Their review also found that a bigger screen can reduce mindful consideration of persuasive content or educational messages. These studies therefore point to the potential impact of outdoor display size on people’s attention and actions before entering public places during the COVID-19 situation.

2.2 Research Questions and Hypotheses

Based on the review of past literature, this research tries to answer two questions: (1) how an outdoor monitor helps people comply with the guidance; (2) how the monitor’s size affects people’s attention. They can be represented as two hypotheses, stated as follows:
H1: A public display system increases the likelihood that people follow room occupancy guidelines.
H2: A larger screen produces a higher chance of grabbing people’s attention.
3 Informing

3.1 Method

To better understand users’ opinions of a public display system showing the occupancy level, a survey was conducted in two sessions: an online questionnaire and an interview. Fourteen volunteers participated. The questions included:
• In this pandemic, how did they feel when entering a crowded place such as a grocery store?
• Were they in favor of placing a public display informing them about the occupancy level inside a grocery store?
• If such a system were designed, would they like to have any sort of tangible interaction with it?
• What information did they think a crowd monitoring system should display?
The answers to these questions show the users’ needs, perspectives, and expectations for the proposed system.

3.2 Result

The survey conducted in the informing stage revealed that the participants visited the grocery store frequently and felt uncomfortable when entering a crowded place. Almost 80% of them considered it necessary to put a display screen outside the shop because it helps them stay updated about the crowd situation inside. They also suggested keeping the information simple so that it is easier to comprehend. Lastly, three-fourths of the participants did not want any physical interaction with the display screen because of the risk of being infected with COVID-19.
4 Designing and Prototyping

4.1 Method

Red is generally perceived as signalling alarming conditions and green as signalling safety [11]. Considering this relationship, red was used in the design to alert people when the threshold (the maximum number of people allowed) was met, while green was used to let people know that it was safe to enter the public space. Following the participants’ feedback, the design was kept simple and limited to relevant information. A study suggested that people apprehend graphical information better than numerical information; therefore, the designs also include a bar displaying the capacity, which makes for a better user experience [12]. Since the study focused mainly on interaction design, more emphasis was given to the display and the user experience, and several sketches were created following this direction.

4.2 Result

After the literature review and the survey, two final designs are presented in this section. Figure 1 illustrates the “Allowed” design, shown when the occupancy level is below the threshold, whereas Fig. 2 shows the “Not Allowed” design, which instructs people to wait.
Fig. 1. “Allowed” design informing that people can enter.
Fig. 2. “NOT Allowed” design informing that people cannot enter.
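The colour rule and capacity bar described in Sect. 4.1 can be sketched in a few lines of Python. This is only an illustrative sketch: the names `DisplayState` and `display_state`, the colour strings, and the clamping of the bar at 100% are assumptions made for the example, not details taken from the prototype.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DisplayState:
    label: str         # text shown on the screen
    color: str         # colour cue: red for "wait", green for "safe to enter"
    fill_ratio: float  # filled share of the capacity bar, between 0.0 and 1.0


def display_state(occupancy: int, capacity: int) -> DisplayState:
    """Pick the design to show for a given occupancy (hypothetical helper)."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    occupancy = max(occupancy, 0)
    # The bar never overflows, even if the count exceeds the capacity.
    fill = min(occupancy / capacity, 1.0)
    if occupancy >= capacity:
        # Threshold met: alert people to wait.
        return DisplayState("NOT Allowed", "red", fill)
    return DisplayState("Allowed", "green", fill)
```

For example, `display_state(3, 15)` selects the green “Allowed” design with one fifth of the bar filled, while `display_state(15, 15)` selects the red “NOT Allowed” design with a full bar.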
5 Evaluation

5.1 Method

To evaluate the hypotheses, an experiment was carried out in a natural setting, meaning that the participants’ actions were observed without controlled scripts. Three screen sizes were used for the observation: 45-inch, 32-inch, and 24-inch. The test was held at the entrance of the university cafeteria, whose capacity was limited to 15 people. The participants were chosen randomly, without regard to age, gender, or nationality. To examine the two hypotheses, two scenarios were orchestrated based on the Wizard of Oz method, in which the screen’s responses are generated by a human without the subject’s awareness.

In the first scenario, each participant was observed to see whether they looked at the display and followed the instruction. A wireless remote was used to change the occupancy level on the screen by switching between designs. Whenever a person entered the cafeteria, the design on the screen was updated to increase the number of people inside, and it was decreased whenever a person left. When the limit was reached, the “NOT Allowed” design was displayed. The setup is shown in Fig. 3.

In the second scenario, participants passing within a 25-foot radius of the entrance were observed to see whether they glanced at the display. Three screen sizes were used for this test, and the screen was changed after the number of subjects reached 25.
Fig. 3. Screen setup outside the cafeteria.
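The update rule the operator followed in the first scenario can be replayed from a log of entry and exit events. The sketch below is a hypothetical illustration: the function name `replay`, the event strings, and the idea of logging events are assumptions made for the example, while the limit of 15 comes from the cafeteria capacity stated in Sect. 5.1.

```python
def replay(events, limit=15):
    """Return the design shown after each event in a Wizard of Oz session.

    events: iterable of "enter" / "leave" strings (hypothetical log format).
    limit:  maximum number of people allowed inside.
    """
    inside = 0
    shown = []
    for event in events:
        if event == "enter":
            inside += 1
        elif event == "leave":
            inside = max(inside - 1, 0)  # the count cannot go below zero
        else:
            raise ValueError(f"unknown event: {event!r}")
        # The operator switches to the waiting design once the limit is reached.
        shown.append("NOT Allowed" if inside >= limit else "Allowed")
    return shown
```

With fifteen consecutive entries the last design shown is “NOT Allowed”; a single departure afterwards switches the screen back to “Allowed”.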
5.2 Result

5.2.1 The First Experiment

For the data analysis, a binomial test was applied with the input values listed in Table 1: the number of successes (k = 26), the total number of trials (N = 30), and the hypothesized probability (p = 0.5). The observed proportion of success is 26/30 = 0.87, and the p-value of the binomial test is below 0.05 (p-value = 5.948e-05 < 0.05). Therefore, the null hypothesis is rejected and the first hypothesis is accepted, which means that people are more likely than not to follow the information on the outdoor screen.

Table 1. The number of people following the outdoor screen guidance and its binomial test values.
Screen size | Subject no | Follow | Not follow | p-value | Exact proportion
45-inch | 30 | 26 | 4 | 5.948e−05 | 0.87
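The binomial test for the first experiment can be reproduced with the Python standard library alone. The sketch below assumes the common two-sided definition of the exact test (summing the probabilities of all outcomes that are no more likely than the observed one); the function name `binomial_two_sided_p` is hypothetical, and the paper does not state which software was used.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test p-value (assumed definition, see lead-in)."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    # Small relative tolerance so that ties with the observed outcome count.
    threshold = pmf[k] * (1 + 1e-7)
    return min(1.0, sum(q for q in pmf if q <= threshold))


k, n = 26, 30  # 26 of 30 observed people followed the screen
p_value = binomial_two_sided_p(k, n)
print(f"proportion = {k / n:.2f}, p-value = {p_value:.3e}")
# prints: proportion = 0.87, p-value = 5.948e-05
```

Under these assumptions the computed values agree with the proportion and p-value reported for the first experiment.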
5.2.2 The Second Experiment

To examine the collected data, a proportion test, which is based on the binomial distribution, was used. The observed probabilities of success for the 45-inch, 32-inch, and 24-inch screens are 0.96, 0.88, and 0.60 respectively, as listed in Table 2. The p-value of the test is 5.591e-08, which is smaller than 0.05, indicating a difference in the success proportions of the three screen size groups. Moreover, z-tests were conducted to determine whether the proportions of each pair of screens differ significantly. The z-values in Table 3 are all positive, indicating that the success probability drops when changing to a smaller screen. The p-values of the comparisons 32-inch vs 24-inch and 45-inch vs 24-inch are less than 0.05, while that of the pair 45-inch vs 32-inch is larger than 0.05. The success probability is therefore significantly higher for the large and medium screens (45-inch or 32-inch) than for the small screen (24-inch), whereas there is no significant difference between the 45-inch and 32-inch screens. This indicates that more people pay attention to the display as the screen size increases, so the second hypothesis is accepted.

Table 2. The result of three different screens (45-inch, 32-inch, and 24-inch) on people’s attention and its proportion test values.
Screen size | Subject no | Notice | Hypothesized probability of success | The exact probability of success
45-inch | 25 (A) | 24 | 0.5 | 0.96
32-inch | 25 (B) | 22 | 0.5 | 0.88
24-inch | 25 (C) | 15 | 0.5 | 0.60
p-value: 5.591e-08
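The figures reported for the second experiment can be re-derived from the counts in Table 2 with the standard library. Two assumptions are made here and should be read as such: the overall p-value is taken to come from a chi-square statistic that compares each group's count with the hypothesized probability 0.5 (three degrees of freedom), and the pairwise comparisons are taken to be two-proportion z-tests with a pooled standard error. The function names are hypothetical.

```python
from math import erfc, exp, pi, sqrt

# Counts from Table 2: people who noticed the screen, out of 25 per screen size.
noticed = {"45-inch": 24, "32-inch": 22, "24-inch": 15}
n = 25
p0 = 0.5  # hypothesized probability of success


def chi2_sf_df3(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)


# Overall test (assumed form): each group's count is compared with n * p0.
chi2 = sum((k - n * p0) ** 2 / (n * p0 * (1 - p0)) for k in noticed.values())
overall_p = chi2_sf_df3(chi2)


def pooled_z_test(k1: int, n1: int, k2: int, n2: int):
    """Two-proportion z-test with pooled standard error; returns (z, one-tailed p)."""
    pooled = (k1 + k2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (k1 / n1 - k2 / n2) / se
    return z, 0.5 * erfc(z / sqrt(2))


z_45_32, p_45_32 = pooled_z_test(noticed["45-inch"], n, noticed["32-inch"], n)
z_32_24, p_32_24 = pooled_z_test(noticed["32-inch"], n, noticed["24-inch"], n)
z_45_24, p_45_24 = pooled_z_test(noticed["45-inch"], n, noticed["24-inch"], n)

print(f"chi2 = {chi2:.1f}, overall p = {overall_p:.3e}")
for name, z, p in [("45 vs 32", z_45_32, p_45_32),
                   ("32 vs 24", z_32_24, p_32_24),
                   ("45 vs 24", z_45_24, p_45_24)]:
    print(f"{name}: z = {z:.4f}, one-tailed p = {p:.5f}")
```

Under these assumptions the overall p-value and the z-values for the 45-inch vs 32-inch and 32-inch vs 24-inch pairs agree with the reported ones. The one-tailed p-values computed from the unrounded z may differ from the tabulated values in the third or fourth decimal place, which would be consistent with the tabulated values having been read off for z rounded to two decimals.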
Table 3. The multiple comparison z tests between 2 screens.
Comparison | z-value | One-tailed p-value | Note
45-inch vs 32-inch | 1.0426 | 0.14917 | >0.05
32-inch vs 24-inch | 2.2569 | 0.01191