Human Centred Intelligent Systems: Proceedings of KES-HCIS 2020 Conference [1st ed.] 9789811557835, 9789811557842

This book highlights new trends and challenges in intelligent systems, which play an important part in the digital trans


English Pages XVI, 450 [438] Year 2021

Table of contents :
Front Matter ....Pages i-xvi
Front Matter ....Pages 1-1
Visual Analytics Methods for Eye Tracking Data (Nordine Quadar, Abdellah Chehri, Gwanggil Geon)....Pages 3-12
Challenges of Adopting Human-Centered Intelligent Systems: An Organizational Learning Approach (Fons Wijnhoven)....Pages 13-25
Multi-level Evaluation of Smart City Initiatives Using the SUMO Ontology and Choquet Integral (Nil Kilicay-Ergin, Adrian Barb)....Pages 27-39
Assistance App for a Humanoid Robot and Digitalization of Training Tasks for Post-stroke Patients (Peter Forbrig, Alexandru Bundea, Thomas Platz)....Pages 41-51
A Novel Cooperative Game for Reinforcing Obesity Awareness Amongst Children in UAE (Fatema Alnaqbi, Sarah Alzahmi, Ayesha Alharmoozi, Fatema Alshehhi, Muhammad Talha Zia, Sofia Ouhbi et al.)....Pages 53-63
A Survey of Visual Perception Approaches (Amal Mbarki, Mohamed Naouai)....Pages 65-75
Analysis of Long-Term Personal Service Processes Using Dictionary-Based Text Classification (Birger Lantow, Kevin Klaus)....Pages 77-87
Toward a Smart Town: Digital Innovation and Transformation Process in a Public Sector Environment (Johannes Wichmann, Matthias Wißotzki, Kurt Sandkuhl)....Pages 89-99
Automatic Multi-class Classification of Tiny and Faint Printing Defects Based on Semantic Segmentation (Takumi Tsuji, Sumika Arima)....Pages 101-113
A Novel Hand Gesture Recognition Method Based on Illumination Compensation and Grayscale Adjustment (Dan Liang, Xiaocheng Wu, Junshen Chen, Rossitza Setchi)....Pages 115-125
Architecting Intelligent Digital Systems and Services (Alfred Zimmermann, Rainer Schmidt, Kurt Sandkuhl, Yoshimasa Masuda)....Pages 127-137
A Human-Centric Perspective on Digital Consenting: The Case of GAFAM (Soheil Human, Florian Cech)....Pages 139-159
An Industrial Production Scenario as Prerequisite for Applying Intelligent Solutions (Andreas Speck, Melanie Windrich, Elke Pulvermüller, Dennis Ziegenhagen, Timo Wilgen)....Pages 161-172
Spectrum Management of Power Line Communications Networks for Industrial Applications (Abdellah Chehri, Alfred Zimmermann)....Pages 173-182
Innovations in Medical Apps and the Integration of Their Data into the Big Data Repositories of Hospital Information Systems for Improved Diagnosis and Treatment in Healthcare (Mustafa Asim Kazancigil)....Pages 183-192
Automatic Classification of Rotating Machinery Defects Using Machine Learning (ML) Algorithms (Wend-Benedo Zoungrana, Abdellah Chehri, Alfred Zimmermann)....Pages 193-203
Front Matter ....Pages 205-205
Potentials of Emotionally Sensitive Applications Using Machine Learning (Ralf-Christian Härting, Sebastian Schmidt, Daniel Krum)....Pages 207-219
A Brief Review of Robotics Technologies to Support Social Interventions for Older Users (Daniela Conti, Santo Di Nuovo, Alessandro Di Nuovo)....Pages 221-232
The Human–Robot Interaction in Robot-Aided Medical Care (Umberto Maniscalco, Antonio Messina, Pietro Storniolo)....Pages 233-242
Experiment Protocol for Human–Robot Interaction Studies with Seniors with Mild Cognitive Impairments (Gabriel Aguiar Noury, Margarita Tsekeni, Vanessa Morales, Ricky Burke, Marco Palomino, Giovanni L. Masala)....Pages 243-253
Designing Robot Verbal and Nonverbal Interactions in Socially Assistive Domain for Quality Ageing in Place (Ioanna Giorgi, Catherine Watson, Cassiana Pratt, Giovanni L. Masala)....Pages 255-265
Front Matter ....Pages 267-267
IoT in Smart Farming Analytics, Big Data Based Architecture (El Mehdi. Ouafiq, Abdessamad Elrharras, A. Mehdary, Abdellah Chehri, Rachid Saadane, M. Wahbi)....Pages 269-279
Review of Internet of Things and Design of New UHF RFID Folded Dipole with Double U Slot Tag (Ibtissame Bouhassoune, Hasna Chaibi, Abdellah Chehri, Rachid Saadane, Khalid Menoui)....Pages 281-291
Smart Water Distribution System Based on IoT Networks, a Critical Review (Nordine Quadar, Abdellah Chehri, Gwanggil Jeon, Awais Ahmad)....Pages 293-303
Performance Analysis of Mobile Network Software Testbed (Ali Issa, Nadir Hakem, Nahi Kandil, Abdellah Chehri)....Pages 305-319
Front Matter ....Pages 321-321
Method for Assessing the Applicability of AI Service Systems (Hironori Takeuchi, Shuichiro Yamamoto)....Pages 323-334
How Will 5G Transform Industrial IoT: Latency and Reliability Analysis (Ahmed Slalmi, Rachid Saadane, Abdellah Chehri, Hatim Kharraz)....Pages 335-345
Real-Time 3D Visualization of Queues with Embedded ML-Based Prediction of Item Processing for a Product Information Management System (Alina Chircu, Eldar Sultanow, Tobias Hain, Tim Merscheid, Oğuz Özcan)....Pages 347-358
Business Process-Based IS Development as a Natural Way to Human-Centered Digital Enterprise Architecture (Václav Řepa)....Pages 359-368
Digital Architecture in Startups (Veronika Kohoutová, Václav Řepa)....Pages 369-379
Internet of Robotic Things with Digital Platforms: Digitization of Robotics Enterprise (Yoshimasa Masuda, Alfred Zimmermann, Seiko Shirasaka, Osamu Nakamura)....Pages 381-391
Front Matter ....Pages 393-393
Wireless Positioning and Tracking for Internet of Things in Heavy Snow Regions (Abdellah Chehri, Paul Fortier)....Pages 395-404
Text-Dependent Closed-Set Two-Speaker Recognition of a Key Phrase Uttered Synchronously by Two Persons (Toshiyuki Ugawa, Satoru Tsuge, Yasuo Horiuchi, Shingo Kuroiwa)....Pages 405-413
Enabling Digital Co-creation in Urban Planning and Development (Claudius Lieven, Bianca Lüders, Daniel Kulus, Rosa Thoneick)....Pages 415-430
Interaction Effects of Environment and Defect Features on Human Cognitions and Skills in Visual Inspections (Zhuo Zhao, Yusuke Nishi, Sumika Arima)....Pages 431-448
Back Matter ....Pages 449-450



Author / Uploaded
Alfred Zimmermann
Robert J. Howlett
Lakhmi C. Jain


Smart Innovation, Systems and Technologies 189

Alfred Zimmermann, Robert J. Howlett, Lakhmi C. Jain (Editors)

Human Centred Intelligent Systems: Proceedings of KES-HCIS 2020 Conference

Smart Innovation, Systems and Technologies Volume 189

Series Editors
Robert J. Howlett, Bournemouth University and KES International, Shoreham-by-sea, UK
Lakhmi C. Jain, Faculty of Engineering and Information Technology, Centre for Artificial Intelligence, University of Technology Sydney, Sydney, NSW, Australia

The Smart Innovation, Systems and Technologies book series encompasses the topics of knowledge, intelligence, innovation and sustainability. The aim of the series is to make available a platform for the publication of books on all aspects of single and multi-disciplinary research on these themes in order to make the latest results available in a readily-accessible form. Volumes on interdisciplinary research combining two or more of these areas are particularly sought. The series covers systems and paradigms that employ knowledge and intelligence in a broad sense. Its scope is systems having embedded knowledge and intelligence, which may be applied to the solution of world problems in industry, the environment and the community. It also focusses on the knowledge-transfer methodologies and innovation strategies employed to make this happen effectively. The combination of intelligent systems tools and a broad range of applications introduces a need for a synergy of disciplines from science, technology, business and the humanities. The series will include conference proceedings, edited collections, monographs, handbooks, reference books, and other relevant types of book in areas of science and technology where smart systems and technologies can offer innovative solutions. High quality content is an essential feature for all book proposals accepted for the series. It is expected that editors of all accepted volumes will ensure that contributions are subjected to an appropriate level of reviewing process and adhere to KES quality principles.

** Indexing: The books of this series are submitted to ISI Proceedings, EI-Compendex, SCOPUS, Google Scholar and Springerlink **

More information about this series at http://www.springer.com/series/8767


Editors
Alfred Zimmermann, Faculty of Informatics, Reutlingen University, Reutlingen, Baden-Württemberg, Germany
Robert J. Howlett, KES International and Bournemouth University, Shoreham-by-sea, UK
Lakhmi C. Jain, University of Technology Sydney, Sydney, NSW, Australia

ISSN 2190-3018
ISSN 2190-3026 (electronic)
Smart Innovation, Systems and Technologies
ISBN 978-981-15-5783-5
ISBN 978-981-15-5784-2 (eBook)
https://doi.org/10.1007/978-981-15-5784-2

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

Organisation

Honorary Chairs
T. Watanabe, Nagoya University, Japan
L. C. Jain, University of Technology Sydney, Australia, and Liverpool Hope University, UK

General Chair
Alfred Zimmermann, Reutlingen University, Germany

Executive Chair
Robert J. Howlett, University of Bournemouth, UK

Programme Chair
Rainer Schmidt, Munich University of Applied Sciences, Germany

International Program Committee
Prof. Witold Abramowicz, Poznan University of Economics and Business, Poland
Prof. Marco Aiello, University of Stuttgart, Germany
Prof. Jalel Akaichi, University of Tunis, Tunisia
Prof. Rainer Alt, University of Leipzig, Germany
Prof. Marco Anisetti, University of Milan, Italy
Prof. Koichi Asakura, Daido University, Japan
Prof. Ahmad Taher Azar, Prince Sultan University, Saudi Arabia
Prof. Monica Bianchini, University of Siena, Italy
Prof. Karlheinz Blank, T-Systems International Stuttgart, Germany
Dr. Oliver Bossert, McKinsey & Company, Germany
Prof. Lars Brehm, Munich University of Applied Sciences, Germany
Prof. Giacomo Cabri, University of Modena and Reggio, Italy
Dr. Giuseppe Caggianese, National Research Council, Italy
Prof. Abdellah Chehri, University of Quebec in Chicoutimi, Canada
Dr. Dinu Dragan, University of Novi Sad, Serbia
Prof. Margarita Favorskaya, Reshetnev Siberian State University of Science and Technology, Russia
Prof. Peter Forbrig, University of Rostock, Germany
Assoc. Prof. Gwanggil Jeon, Incheon National University, Korea
Prof. Christos Grecos, National College of Ireland, Ireland
Prof. Giancarlo Guizzardi, Free University of Bozen-Bolzano, Italy
Dr. Michael Herrmann, Daimler Financial Services, Germany
Prof. Robert Hirschfeld, Hasso Plattner Institute Potsdam, Germany
Prof. Katsuhiro Honda, Osaka Prefecture University, Japan
Prof. Hsiang-Cheh Huang, National University of Kaohsiung, Taiwan
Prof. Emilio Insfran, Universitat Politècnica de València, Spain
Prof. Reza N Jazar, RMIT University, Australia
Prof. Bjorn Johansson, Lund University, Sweden
Dr. Dierk Jugel, Reutlingen University, Germany
Prof. Da-Yu Kao, Central Police University, Taiwan
Dr. Dimitris Kanellopoulos, University of Patras, Greece
Assist. Prof. Mustafa Asim Kazancigil, Yeditepe University, Turkey
Prof. Marite Kirikova, Riga Technical University, Latvia
Prof. Boris Kovalerchuk, Central Washington University, USA
Dr. Birger Lantow, University of Rostock, Germany
Prof. Michael Leyer, University of Rostock, Germany
Prof. Kelly Lyons, University of Toronto, Canada
Prof. Chengjun Liu, New Jersey Institute of Technology, USA
Dr. Giovanni Luca Masala, Manchester Metropolitan University, UK
Dr. Yoshimasa Masuda, Carnegie Mellon University, USA
Prof. Lyudmila Mihaylova, University of Sheffield, UK
Dr. Michael Möhring, Munich University of Applied Sciences, Germany
Prof. Vincenzo Moscato, Università degli Studi di Napoli Federico II, Italy
Dr. Selmin Nurcan, University Paris Pantheon-Sorbonne, France
Prof. Andreas Oberweis, Karlsruhe Institute of Technology (KIT), Germany
Prof. Sofia Ouhbi, UAE University, UAE
Prof. Oscar Pastor Lopez, Universitat Politecnica de Valencia, Spain
Prof. Radu-Emil Precup, University of Timisoara, Romania
Prof. Carlos Ramos, ISEP/IPP, Portugal
Prof. Manfred Reichert, University of Ulm, Germany
Prof. Vaclav Repa, University of Economics, Czech Republic
Dr. Patrizia Ribino, National Research Council, Italy
Prof. Alexander Rossmann, Reutlingen University, Germany
Prof. Mohammed Sadgal, Cadi Ayyad University, Morocco
Prof. Kurt Sandkuhl, University of Rostock, Germany
Prof. Rainer Schmidt, Munich University of Applied Sciences, Germany
Dr. Christian Schweda, Reutlingen University, Germany
Prof. Sabrina Senatore, University of Salerno, Italy
Prof. Alberto Silva, University of Lisbon, Portugal
Dr. Stefano Silvestri, ICAR CNR, Italy
Dr. Milan Simic, RMIT University, Australia
Prof. Andreas Speck, University of Kiel, Germany
Dr. Maria Spichkova, RMIT University, Australia
Dr. Jim Spohrer, IBM Almaden Research, USA
Prof. Ulrike Steffens, Hamburg University of Applied Sciences, Germany
Prof. Janis Stirna, Stockholm University, Sweden
Prof. Eulalia Szmidt, Systems Research Institute Polish Academy of Sciences, Poland
Prof. Hironori Takeuchi, Musashi University, Japan
Prof. Edmondo Trentin, University of Siena, Italy
Prof. Taketoshi Ushiama, Kyushu University, Japan
Prof. Rosa Vicari, Federal University of Rio Grande do Sul, Brazil
Prof. Toyohide Watanabe, Nagoya University, Japan
Dr. Jaroslaw Watrobski, University of Szczecin, Poland
Dr. Alicja Wieczorkowska, Polish-Japanese Academy of Information Technology, Poland
Prof. Fons Wijnhoven, University of Twente, Netherlands
Prof. Matthias Wißotzki, Wismar University of Applied Sciences, Germany
Prof. Shuichiro Yamamoto, Nagoya University, Japan
Prof. Cecilia Zanni-Merk, INSA Normande University, France
Prof. Alfred Zimmermann, Reutlingen University, Germany

Preface

This volume contains the proceedings of the KES International Conference on Human-Centred Intelligent Systems (HCIS 2020), held as part of the multi-theme conference KES Smart Digital Futures 2020 and organized as a virtual conference. We have gathered a multi-disciplinary group of contributors from both research and practice to discuss how human-centred intelligent systems are today architected, modelled, constructed, verified, tested, and applied in various domains.

Human-Centred Intelligent Systems (HCIS) are information systems applying artificial intelligence in order to support humans and to interact with people. Today intelligent systems play an important role in digital transformation in many areas of science and practice. Artificial intelligence defines core techniques of modern computer science that lead to a rapidly growing number of intelligent services and applications in practice.

The objective of HCIS includes a deep understanding of the human-centred perspective of artificial intelligence, of intelligent value co-creation, ethics, value-oriented digital models, and transparency, together with intelligent digital architectures and engineering to support digital services and intelligent systems, the transformation of structures of digital businesses and intelligent systems based on human practices, as well as the study of interaction and the co-adaptation of humans and systems. HCIS especially consider human work when supporting digital services and building intelligent systems, which consists of optimizing knowledge representations and algorithms, collecting and interpreting the data, and even deciding what to model.

All submissions were carefully reviewed by at least two members of the International Program Committee. Finally, we have accepted 35 scientific publications to be included in this proceedings volume.

The major areas are organized as follows:
• Human-Centred Intelligent Systems,
• Technologies to Improve Senior Care,
• Real-time Data Processing in Industrial and IoT Applications,
• Digital Enterprise Architecture for Manufacturing Industry, Financial Industry, and others, and
• Innovative Information Services for Advanced Knowledge Activity.

We are satisfied with the quality of the program and would like to thank the authors for choosing KES-HCIS 2020 as a forum for presentation of their work. Also, we gratefully acknowledge the hard work of the members of the International Program Committee and the Organization team.

Editors
Alfred Zimmermann, Reutlingen, Germany
Robert J. Howlett, Shoreham-by-sea, UK
Lakhmi C. Jain, Sydney, Australia

Contents

General Track of Human-Centred Intelligent Systems or Human-Centred Intelligent Systems
Visual Analytics Methods for Eye Tracking Data (Nordine Quadar, Abdellah Chehri, and Gwanggil Geon) .... 3
Challenges of Adopting Human-Centered Intelligent Systems: An Organizational Learning Approach (Fons Wijnhoven) .... 13
Multi-level Evaluation of Smart City Initiatives Using the SUMO Ontology and Choquet Integral (Nil Kilicay-Ergin and Adrian Barb) .... 27
Assistance App for a Humanoid Robot and Digitalization of Training Tasks for Post-stroke Patients (Peter Forbrig, Alexandru Bundea, and Thomas Platz) .... 41
A Novel Cooperative Game for Reinforcing Obesity Awareness Amongst Children in UAE (Fatema Alnaqbi, Sarah Alzahmi, Ayesha Alharmoozi, Fatema Alshehhi, Muhammad Talha Zia, Sofia Ouhbi, and Abdelkader Nasreddine Belkacem) .... 53
A Survey of Visual Perception Approaches (Amal Mbarki and Mohamed Naouai) .... 65
Analysis of Long-Term Personal Service Processes Using Dictionary-Based Text Classification (Birger Lantow and Kevin Klaus) .... 77
Toward a Smart Town: Digital Innovation and Transformation Process in a Public Sector Environment (Johannes Wichmann, Matthias Wißotzki, and Kurt Sandkuhl) .... 89
Automatic Multi-class Classification of Tiny and Faint Printing Defects Based on Semantic Segmentation (Takumi Tsuji and Sumika Arima) .... 101
A Novel Hand Gesture Recognition Method Based on Illumination Compensation and Grayscale Adjustment (Dan Liang, Xiaocheng Wu, Junshen Chen, and Rossitza Setchi) .... 115
Architecting Intelligent Digital Systems and Services (Alfred Zimmermann, Rainer Schmidt, Kurt Sandkuhl, and Yoshimasa Masuda) .... 127
A Human-Centric Perspective on Digital Consenting: The Case of GAFAM (Soheil Human and Florian Cech) .... 139
An Industrial Production Scenario as Prerequisite for Applying Intelligent Solutions (Andreas Speck, Melanie Windrich, Elke Pulvermüller, Dennis Ziegenhagen, and Timo Wilgen) .... 161
Spectrum Management of Power Line Communications Networks for Industrial Applications (Abdellah Chehri and Alfred Zimmermann) .... 173
Innovations in Medical Apps and the Integration of Their Data into the Big Data Repositories of Hospital Information Systems for Improved Diagnosis and Treatment in Healthcare (Mustafa Asim Kazancigil) .... 183
Automatic Classification of Rotating Machinery Defects Using Machine Learning (ML) Algorithms (Wend-Benedo Zoungrana, Abdellah Chehri, and Alfred Zimmermann) .... 193

Technologies to Improve Senior Care
Potentials of Emotionally Sensitive Applications Using Machine Learning (Ralf-Christian Härting, Sebastian Schmidt, and Daniel Krum) .... 207
A Brief Review of Robotics Technologies to Support Social Interventions for Older Users (Daniela Conti, Santo Di Nuovo, and Alessandro Di Nuovo) .... 221
The Human–Robot Interaction in Robot-Aided Medical Care (Umberto Maniscalco, Antonio Messina, and Pietro Storniolo) .... 233
Experiment Protocol for Human–Robot Interaction Studies with Seniors with Mild Cognitive Impairments (Gabriel Aguiar Noury, Margarita Tsekeni, Vanessa Morales, Ricky Burke, Marco Palomino, and Giovanni L. Masala) .... 243
Designing Robot Verbal and Nonverbal Interactions in Socially Assistive Domain for Quality Ageing in Place (Ioanna Giorgi, Catherine Watson, Cassiana Pratt, and Giovanni L. Masala) .... 255

Real-Time Data Processing in Industrial and IoT Applications
IoT in Smart Farming Analytics, Big Data Based Architecture (El Mehdi. Ouafiq, Abdessamad Elrharras, A. Mehdary, Abdellah Chehri, Rachid Saadane, and M. Wahbi) .... 269
Review of Internet of Things and Design of New UHF RFID Folded Dipole with Double U Slot Tag (Ibtissame Bouhassoune, Hasna Chaibi, Abdellah Chehri, Rachid Saadane, and Khalid Menoui) .... 281
Smart Water Distribution System Based on IoT Networks, a Critical Review (Nordine Quadar, Abdellah Chehri, Gwanggil Jeon, and Awais Ahmad) .... 293
Performance Analysis of Mobile Network Software Testbed (Ali Issa, Nadir Hakem, Nahi Kandil, and Abdellah Chehri) .... 305

Digital Enterprise Architecture for Manufacturing Industry, Financial Industry, and others
Method for Assessing the Applicability of AI Service Systems (Hironori Takeuchi and Shuichiro Yamamoto) .... 323
How Will 5G Transform Industrial IoT: Latency and Reliability Analysis (Ahmed Slalmi, Rachid Saadane, Abdellah Chehri, and Hatim Kharraz) .... 335
Real-Time 3D Visualization of Queues with Embedded ML-Based Prediction of Item Processing for a Product Information Management System (Alina Chircu, Eldar Sultanow, Tobias Hain, Tim Merscheid, and Oğuz Özcan) .... 347
Business Process-Based IS Development as a Natural Way to Human-Centered Digital Enterprise Architecture (Václav Řepa) .... 359
Digital Architecture in Startups (Veronika Kohoutová and Václav Řepa) .... 369
Internet of Robotic Things with Digital Platforms: Digitization of Robotics Enterprise (Yoshimasa Masuda, Alfred Zimmermann, Seiko Shirasaka, and Osamu Nakamura) .... 381

Innovative Information Services for Advanced Knowledge Activity
Wireless Positioning and Tracking for Internet of Things in Heavy Snow Regions (Abdellah Chehri and Paul Fortier) .... 395
Text-Dependent Closed-Set Two-Speaker Recognition of a Key Phrase Uttered Synchronously by Two Persons (Toshiyuki Ugawa, Satoru Tsuge, Yasuo Horiuchi, and Shingo Kuroiwa) .... 405
Enabling Digital Co-creation in Urban Planning and Development (Claudius Lieven, Bianca Lüders, Daniel Kulus, and Rosa Thoneick) .... 415
Interaction Effects of Environment and Defect Features on Human Cognitions and Skills in Visual Inspections (Zhuo Zhao, Yusuke Nishi, and Sumika Arima) .... 431

Author Index .... 449

About the Editors

Alfred Zimmermann is a Professor at Reutlingen University, Germany, Director of Research and Speaker of the Doctoral Program for Services Computing at the Herman Hollerith Center, Boeblingen, Germany. His research chiefly focuses on digital transformation and digital enterprise architectures with decision analytics in close connection with digital strategies and governance, software architectures and engineering, artificial intelligence, data analytics, the Internet of Things, services computing, and cloud computing. He graduated with a degree in Medical Informatics from Heidelberg University, Germany, and obtained his Ph.D. in Informatics from the University of Stuttgart, Germany. Besides his academic experience, he has a strong practical background as a Technology Manager and Leading Consultant at Daimler AG, Germany. Professor Zimmermann also maintains academic ties between his home university and the German Computer Science Society (GI), the Association for Computing Machinery (ACM), and the IEEE, where he is involved in various research groups, programs, and initiatives. He serves on numerous editorial boards and program committees and has published the results of his research at conferences, workshops, and in books and journals. Additionally, he supports industrial cooperation research projects and public research programs.

Robert J. Howlett is the Executive Chair of KES International, a non-profit organization that facilitates the dissemination of research results in areas including intelligent systems, sustainability, and knowledge transfer. A Visiting Professor at Bournemouth University, UK, his technical expertise is in the use of intelligent systems to solve industrial problems. He has been successful in applying artificial intelligence, machine learning, and related technologies to sustainability and renewable energy systems; condition monitoring, diagnostic tools, and systems; and automotive electronics and engine management systems. His current research focuses on the use of smart microgrids to achieve reduced energy costs and lower carbon emissions in areas such as housing and protected horticulture.

Dr. Lakhmi C. Jain, Ph.D., M.E., Fellow (Engineers Australia) is affiliated with the University of Technology Sydney, Australia, and Liverpool Hope University, UK. Professor Jain serves with KES International, which provides the professional community with opportunities for publication, knowledge exchange, cooperation, and teambuilding. Involving over 5,000 researchers drawn from universities and companies worldwide, KES facilitates international cooperation and generates synergy in teaching and research. KES regularly provides networking opportunities for the professional community through one of the largest conferences of its kind.

General Track of Human-Centred Intelligent Systems or Human-Centred Intelligent Systems

Visual Analytics Methods for Eye Tracking Data

Nordine Quadar, Abdellah Chehri, and Gwanggil Geon

Abstract Eye tracking data have become an important and valuable source of information for understanding user behavior. Gathering these data is no longer an issue. The problem lies in the analysis process, and especially in how these raw data can be converted into understandable and useful information. Visual analytics can address this issue by combining human analytical skills with advanced computer analytics. This leads to novel discoveries and helps humans take control of the analytical process. These visualizations can be used to solve difficult problems by discovering previously unknown patterns in the available data. In this work, we discuss different methods that are used for eye tracking data, and we address the challenges of visual analytics in this context.

1 Introduction

Eye tracking systems are improving fast as the hardware for these applications becomes accessible to everybody at affordable prices. As a result, the data gathered from eye tracking devices tend to be considered big data. These data are characterized by three parameters: volume, velocity, and variety. We describe each parameter in detail later in this paper.

N. Quadar School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, ON K1N 6N5, Canada e-mail: [email protected] A. Chehri (B) Department of Applied Sciences, University of Québec in Chicoutimi, Chicoutimi, QC G7H 2B1, Canada e-mail: [email protected] G. Geon Department Embedded Systems Engineering, Incheon National University, Incheon, Korea e-mail: [email protected] © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 A. Zimmermann et al. (eds.), Human Centred Intelligent Systems, Smart Innovation, Systems and Technologies 189, https://doi.org/10.1007/978-981-15-5784-2_1


The challenge behind this evolution is how to analyze this tremendous amount of data. Statistical methods are limited in their ability to find existing correlations between the different collected data. Also, traditional visualization methods such as scan paths and attention maps become restrictive as the number of people being eye tracked increases. Various studies have been conducted to investigate and test new approaches and methods. Visual analytics is one of these alternative solutions for analyzing and finding patterns in the enormous volumes of collected data. This method involves different key components, such as analytic reasoning, visualization, and interaction between computers and humans. In this paper, we study the possibility of applying visual analytics methods to eye tracking data. Moreover, examples of possible future applications are provided, and the last section discusses the future challenges of this field.

2 Eye Tracking Data Toward Big Data

Big data has many different definitions. One of them, according to Ward and Barker, is: “a term describing the storage and analysis of large and complex data sets using series of techniques including, but not limited to NoSQL, MapReduce, and machine learning” [1]. Another definition, according to De Mauro et al., is: “Big Data represents the Information assets characterized by such a high volume, velocity, and variety to require specific technology and analytic methods for its transformation into value” [2]. According to these definitions and as mentioned previously, the volume, velocity, and variety of eye tracking data are increasing rapidly; therefore, they can be considered big data. Figure 1 shows how big data and eye tracking data are connected. Eye tracking data are becoming big data, and since visual analytics can be applied to big data, these methods are a natural candidate for analyzing this new kind of emerging data [3]. The characteristics shared by big data and eye tracking data are their three V’s: as these parameters increase, the collected data become more complex and harder to analyze. A detailed explanation of these parameters is given below.

Fig. 1 Eye tracking data and big data connection [3]

Velocity: Different hardware and devices are now available to gather data from eye movements. Smartphones and eye tracking glasses are some examples of technology that allows data to be gathered faster. Currently, data move in real time, and within a fraction of a second a new update can arrive that should be processed and analyzed [4].

Volume: As the number of people with access to eye tracking systems increases, and as the recording speed of new devices improves considerably (e.g., the SMI RED 500 with a recording rate of 500 Hz) [3], the volume of data becomes enormous. With crowdsourcing solutions and platforms, people can send their eye tracking data online from their homes. Also, the high-resolution cameras that have been developed and continue to be improved will help to collect more data in places where many people are present, such as soccer games.

Variety: The data gathered during eye tracking are not limited to eye movements; other data can also be collected, such as motion tracking, verbal data, and so on. These data help to understand the behavior of the tracked person and complement the information obtained from the eye movements. This variety may cause problems such as data synchronization, so new methods should be developed that take all these various data into consideration in order to find the best patterns.
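To make the volume argument concrete, the following minimal sketch estimates how many raw gaze samples a study produces. Only the 500 Hz recording rate comes from the text; the session length, the number of participants, the bytes per sample, and the function name `gaze_samples` are illustrative assumptions.

```python
def gaze_samples(rate_hz: float, seconds: float, participants: int = 1) -> int:
    """Number of raw gaze samples produced by a set of recordings."""
    return int(rate_hz * seconds * participants)


# 500 Hz is the recording rate cited in the text. The 10-minute session,
# the 100 participants, and the sample size are assumptions for illustration.
samples = gaze_samples(rate_hz=500, seconds=10 * 60, participants=100)
BYTES_PER_SAMPLE = 24  # assumed: timestamp, x, y stored as 8-byte floats

print(samples)                            # 30000000
print(samples * BYTES_PER_SAMPLE / 1e6)   # 720.0 (megabytes)
```

Under these assumptions, a single modest study already yields tens of millions of gaze points before any additional data sources are added.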

3 Eye Tracking Data Representation

Nowadays, eye tracking is used in various applications that require analyzing the user’s behavior, such as marketing or human–computer interaction. Depending on the application, different types of data are generated by the recording devices. The available tools can record at rates of up to 500 Hz [3], depending on the characteristics of each device. The recording rate measures how many gaze points are detected per second. Figure 2 shows the different data types that can be gathered by an eye tracking system, such as saccades, gaze, areas of interest (AOIs), fixations, transitions, and stimulus. Each type is described below in detail.

Fig. 2 Eye tracking data type [5]

Fixation: A fixation is an aggregation of gaze points. Two parameters define this aggregation: its area and its timestamps. The fixation duration is the difference between the time when the eye enters a specific point and the time when the eye leaves this point (between 200 ms and 300 ms) [5]. The aggregation area is usually between 20 and 50 pixels.

Saccade: The rapid movement of the eye when it jumps from a fixation on one point to another is called a saccade; this movement can last up to 80 ms. Saccade amplitude, duration, and velocity are typical metrics for this type of data.

Gaze: The sum of the fixation durations within a specific area.

Stimulus: The full region or area of study; in other words, the visual content on which the eye movement is tracked. It can be static or dynamic content in 2D or 3D.

Areas of interest: Eye tracking applications are usually used to study the behavior of people toward specific regions. The areas of interest (AOIs) are those regions in the stimulus that have high importance. The AOIs are defined based on the stimulus semantics. The movement between different areas of interest is called a transition. In the case of a 3D stimulus, the AOIs can be seen as objects and are called objects of interest (OOIs).

Data metrics and additional data sources: Using the data types described previously, various more complex data metrics can be generated, such as saccade directions, number of fixations, saccade amplitude, and so on. For instance, analyzing how a metric such as fixation duration varies over time can help to understand the learning effect while viewing a stimulus. Additional data sources can be used to understand the tracked person’s behavior; these data can be mouse movements or keyboard interactions when a computer is used to show the stimulus. Social media interaction can also be used in the case of smartphone use. These data are synchronized with the data collected from the eye tracking devices and can be useful for finding connections between tracked people.
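The definitions of fixation and gaze above can be sketched in a few lines. The paper does not name a detection algorithm, so this example assumes a simple dispersion-threshold approach: consecutive gaze points form a fixation when they stay within 50 pixels (the upper bound of the aggregation area given above) for at least 200 ms. The function names, the sample layout, and the synthetic data are hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, float, float]  # (time in ms, x in px, y in px)


def dispersion(window: List[Sample]) -> float:
    """Spread of a window of gaze points: x range plus y range, in pixels."""
    xs = [s[1] for s in window]
    ys = [s[2] for s in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def detect_fixations(samples, max_dispersion_px=50.0, min_duration_ms=200.0):
    """Return fixations as (start_ms, end_ms, centroid_x, centroid_y)."""
    fixations = []
    i, n = 0, len(samples)
    while i < n:
        # Find the first sample that makes the window long enough.
        j = i
        while j < n and samples[j][0] - samples[i][0] < min_duration_ms:
            j += 1
        if j >= n:
            break  # the remaining samples are too short to form a fixation
        if dispersion(samples[i:j + 1]) <= max_dispersion_px:
            # Extend the window while the points stay spatially compact.
            while j + 1 < n and dispersion(samples[i:j + 2]) <= max_dispersion_px:
                j += 1
            window = samples[i:j + 1]
            cx = sum(s[1] for s in window) / len(window)
            cy = sum(s[2] for s in window) / len(window)
            fixations.append((window[0][0], window[-1][0], cx, cy))
            i = j + 1
        else:
            i += 1  # too spread out (e.g. a saccade); slide the window
    return fixations


def dwell_time(fixations, aoi):
    """Gaze: summed fixation duration inside a rectangular AOI (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = aoi
    return sum(end - start for start, end, cx, cy in fixations
               if x0 <= cx <= x1 and y0 <= cy <= y1)


# Synthetic 100 Hz recording (assumed data): two stable clusters separated
# by two samples that belong to a saccade.
samples = [(t * 10.0, 100.0 + (t % 3), 100.0 + (t % 2)) for t in range(30)]
samples += [(300.0, 200.0, 170.0), (310.0, 300.0, 240.0)]
samples += [(320.0 + t * 10.0, 400.0 + (t % 3), 300.0) for t in range(25)]

fixations = detect_fixations(samples)
print(len(fixations))                                # 2
print(dwell_time(fixations, (50, 50, 150, 150)))     # 290.0
```

Saccade metrics and AOI transitions could be derived in the same way from the gaps between consecutive fixations; they are omitted here to keep the sketch short.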

4 Visual Analytics Methods

This section discusses different methods based on visual analytics and explains the steps taken before and during the data analysis. As mentioned before, various data are collected from eye tracking devices. The ordered sequence of these data over time is called the trajectory or the scan path; it can also be defined as how the eye movement changes over time while the person is being eye tracked. The following methods will be described [6]:

• Heat map
• Space–time cube
• Gaze plot

4.1 Data Pre-processing

Before analyzing eye tracking data, various transformations should be applied in order to make the analysis easier and more efficient [7]. The first transformation is the adjustment of the time reference: in the context of eye tracking data analysis, it aligns the start or end times of multiple trajectories by shifting their timelines to a common start or end without changing anything else. The second transformation is spatial generalization, which replaces the trajectories’ spatial positions with new ones having the same space units (points by areas). The last transformation is spatiotemporal aggregation; in this step, the paths are transformed into a series of visited places and moves between these places. For this purpose, various statistical parameters are computed, such as the count of visits and users, the average time spent per place, and so on.
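The three transformations can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of [7]: trajectories are lists of (time, x, y) tuples, places are square grid cells of 100 pixels, and the function names and sample data are hypothetical.

```python
from collections import Counter


def align_start(trajectory):
    """Time-reference adjustment: shift timestamps so the trajectory starts at 0."""
    t0 = trajectory[0][0]
    return [(t - t0, x, y) for t, x, y in trajectory]


def generalize(trajectory, cell_px=100):
    """Spatial generalization: replace each point by the grid cell containing it."""
    return [(t, (int(x // cell_px), int(y // cell_px))) for t, x, y in trajectory]


def aggregate(trajectories):
    """Spatiotemporal aggregation: count visits per place and moves between places."""
    visits, moves = Counter(), Counter()
    for trajectory in trajectories:
        previous = None
        for _, cell in trajectory:
            if cell != previous:
                visits[cell] += 1
                if previous is not None:
                    moves[(previous, cell)] += 1
                previous = cell
    return visits, moves


# Two recordings that started at different absolute times (assumed data).
user_a = [(1000, 20, 30), (1100, 40, 35), (1200, 250, 40), (1300, 260, 45)]
user_b = [(5000, 10, 10), (5100, 220, 20), (5200, 30, 15)]

prepared = [generalize(align_start(t)) for t in (user_a, user_b)]
visits, moves = aggregate(prepared)
print(visits[(0, 0)], visits[(2, 0)])   # 3 2
print(moves[((0, 0), (2, 0))])          # 2
```

Other statistics named in the text, such as the average time spent per place, could be accumulated in the same loop from the aligned timestamps.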

4.2 Methods

Visual analysis methods are used to fill the gaps left by statistical methods and to provide additional insights. Visualization is a well-suited tool for spatiotemporal data. As mentioned above, three techniques are used in visual analysis: the heat map, the space–time cube, and the gaze plot.


Fig. 3 Example of attention map outputs [27]

• Heat map: The attention map, or heat map, provides an overview of the aggregation of gaze points within the stimulus in order to reveal the distribution of visual attention. The heat map therefore helps to identify the main areas of interest. As Fig. 3 shows, the attention map output is color-coded: red areas show a high number of gaze points and thus indicate essential areas of interest, whereas green areas have fewer gaze points. The results are obtained by aggregating the gaze positions over the observation time. However, this method has some limitations: it does not take the temporal component of the data into consideration, and it is inefficient for dynamic images [8].
• Space–time cube: This method is an alternative approach for spatiotemporal data; it extends the 2D spatial domain by adding time as the third axis. It can be used with static and dynamic stimuli, and it gives a direct overview of the data of multiple users. Figure 4 shows the result of the space–time cube (STC) method applied to video eye tracking data; the green line represents the time dimension. The gaze points are projected onto the side walls of the STC.


Fig. 4 Example of space-time cube outputs [8]

• Gaze plot: The gaze plot, also known as the scan path, uses the positions of the fixation points and the fixation times to produce an overview of the sequence of a user’s fixations. This method represents each fixation by a circle whose size is proportional to the fixation duration, as can be seen in Fig. 5. Moreover, it connects the fixation circles by lines to give the exact sequence of the trajectory.

Fig. 5 Example of gaze plot outputs [28]
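The data behind two of the visualizations described above can be computed with very little code. The sketch below is an assumption-laden illustration: the heat map is reduced to plain grid binning of gaze points (published attention maps typically also smooth the counts), and the gaze plot is reduced to its geometry, namely numbered circles with a radius proportional to fixation duration plus the connecting line segments. All names, the cell size, and the scale factor are hypothetical.

```python
import math


def heat_map(points, width, height, cell_px):
    """Count gaze points per grid cell; higher counts mean more attention."""
    cols = math.ceil(width / cell_px)
    rows = math.ceil(height / cell_px)
    grid = [[0] * cols for _ in range(rows)]
    for x, y in points:
        if 0 <= x < width and 0 <= y < height:
            grid[int(y // cell_px)][int(x // cell_px)] += 1
    return grid


def gaze_plot(fixations, ms_per_px=10.0):
    """Turn (x, y, duration_ms) fixations into circles and connecting segments."""
    circles = [(order + 1, x, y, duration / ms_per_px)
               for order, (x, y, duration) in enumerate(fixations)]
    segments = [((a[1], a[2]), (b[1], b[2]))
                for a, b in zip(circles, circles[1:])]
    return circles, segments


# Assumed sample data on a 200 x 200 pixel stimulus.
grid = heat_map([(10, 10), (15, 12), (110, 10), (12, 190)],
                width=200, height=200, cell_px=100)
print(grid)       # [[2, 1], [1, 0]]

circles, segments = gaze_plot([(50, 50, 200), (150, 80, 300)])
print(circles)    # [(1, 50, 50, 20.0), (2, 150, 80, 30.0)]
print(segments)   # [((50, 50), (150, 80))]
```

A space–time cube would use the same gaze points with the timestamp kept as a third coordinate, which is why it preserves the temporal component that the heat map discards.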


5 Example of Future Applications

(1) Shopping case study
Eye tracking data can help supermarkets understand their customers’ buying behavior and strategy by analyzing their eye movements and finding patterns between different data using the methods described above. The tracking can be done using high-definition cameras or by giving eye tracking glasses to customers at the entrance of the supermarket. Devices such as eye tracking glasses can be used to analyze real-time data and make live decisions, such as guiding users to products on sale or suggesting what to buy. Using a high-definition camera would be challenging, as the stimulus would have to be modeled in 3D and different users would have different stimuli.

(2) Driving case study
As the number of cars and of driving-related accidents increases, new solutions such as eye tracking are necessary to face this challenge. In the future, cars can become smarter by integrating an eye tracking system into their control system. This would make it possible to analyze the traffic situation and allow the car to act depending on the situation. With such a solution, driving can become safer and more comfortable.

6 Future Challenges

As discussed in the previous sections, the amount of eye tracking data is increasing day after day, and the number of people with access to this technology is increasing as well. Combining and synchronizing these data present a big challenge, especially when the stimulus content is dynamic and changes with time [9]. Below is a summary of the main problems that this technology will face in the future:

Stimuli: Content is increasingly 3D, and the stimuli can differ from user to user, so more advanced technologies are needed to satisfy all requirements.

Users: As mentioned, the number of users is increasing, and more data will be generated. New procedures should therefore be set up, as the users will be non-experts and will need more reliable methods to analyze their data.

Hardware: New hardware devices are entering the market and keep improving, so the analysis techniques should take into consideration the different aspects of the newly available equipment [10–26].

Privacy: One of the significant upcoming issues is people’s privacy as the number of users increases. New mechanisms should be applied to protect the personal data of millions of people, which should not be available to just anyone.

Table 1 shows the evolution of these challenges over time.

Table 1 Eye tracking evolution per category

Category | Past | Present | Future
Hardware and costs | Stationary eye tracking devices | Professional glasses (>$30,000) | Smart phones and personal eye tracking glasses
Stimuli | 2D static stimuli (images) | 2D/3D dynamic stimuli (virtual reality, video) | Unconstrained real-world scenarios
Users | 1000000
Recorded data and metrics | Fixations and saccades | Video, high-resolution gaze data (smooth pursuits) | Numerous additional data sources
Evaluation methods | Visual inspection | Statistics and visualization | Big data visual analytics
Privacy | Not an issue | Signed forms | Consent needed

7 Conclusion

In this paper, we have discussed the trend of eye tracking data and how they can be considered big data. We described the different types of data that can be generated using this technology and their principal characteristic parameters: velocity, variety, and volume. Moreover, we discussed the different visual analytics methods that can be applied in different scenarios. Based on the reviewed literature, this field still has many challenges to face, such as the need for new analysis techniques that can cope with the variety and the huge volume of the collected data.

References

1. Ward, J.S., Barker, A.: Undefined by data: a survey of big data definitions. CoRR, abs/1309.5821, pp. 1–2 (2013)
2. Hashem, I.A.T., Yaqoob, I., Anuar, N.B., Mokhtar, S., Gani, A., Khan, S.U.: The rise of ‘big data’ on cloud computing: review and open research issues. Inform. Syst. 47, 98–115 (2015)
3. Blascheck, T., Burch, M., Raschke, M., Weiskopf, D.: Challenges and perspectives in big eye-movement data visual analytics. In: Big Data Visual Analytics (BDVA), pp. 1–8, 22–25 (2015)
4. De Mauro, A., Greco, M., Grimaldi, M.: What is big data? A consensual definition and a review of key research topics. In: AIP Conference Proceedings, pp. 97–104. AIP Publishing (2015)
5. Blascheck, T., Kurzhals, K., Raschke, M., Burch, M., Weiskopf, D., Ertl, T.: State-of-the-art of visualization for eye tracking data. In: Borgo, R., Maciejewski, R., Viola, I. (eds.) EuroVis–STARs, pp. 63–82 (2014)
6. Andrienko, G.L., Andrienko, N.V., Burch, M., Weiskopf, D.: Visual analytics methodology for eye movement studies. IEEE Trans. Vis. Comput. Graph. 18(12), 2889–2898 (2012)
7. Andrienko, G., Andrienko, N., Bak, P., Keim, D., Kisilevich, S., Wrobel, S.: A conceptual framework and taxonomy of techniques for analyzing movement. J. Vis. Lang. Comput. 22(3), 213–223 (2011)
8. Kurzhals, K., Weiskopf, D.: Space-time visual analytics of eye-tracking data for dynamic stimuli. IEEE Trans. Vis. Comput. Graph. 19(12), 2129–2138 (2013)
9. Stellmach, S., Nacke, L., Dachselt, R.: Advanced gaze visualizations for three-dimensional virtual environments. In: Proceedings of the Symposium on Eye Tracking Research & Applications, pp. 109–112 (2010)
10. Kim, Y., Varshney, A.: Persuading visual attention through geometry. IEEE Trans. Vis. Comput. Graph. 14(4), 772–782 (2008)
11. Klingner, J., Kumar, R., Hanrahan, P.: Measuring the task-evoked pupillary response with a remote eye tracker. In: Proceedings of the Symposium on ETRA, pp. 69–72 (2008)
12. Kurzhals, K., Heimerl, F., Weiskopf, D.: ISeeCube: visual analysis of gaze data for video. In: Proceedings of the Symposium on ETRA, pp. 43–50 (2014)
13. Kurzhals, K., Hoferlin, M., Weiskopf, D.: Evaluation of attention-guiding video visualization. Comput. Graph. Forum 32(3), 51–60 (2013)
14. Lam, H., Bertini, E., Isenberg, P., Plaisant, C., Carpendale, S.: Empirical studies in information visualization: seven scenarios. IEEE Trans. Vis. Comput. Graph. 18(9), 1520–1536 (2012)
15. Liu, G., Austen, E.L., Booth, K.S., Fisher, B.D., Argue, R., Rempel, M., Enns, J.T.: Multiple-object tracking is based on scene, not retinal, coordinates. J. Exp. Psychol. Hum. Percept. Perform. 31(2), 235–247 (2005)
16. Loftus, G.R., Mackworth, N.H.: Cognitive determinants of fixation location during picture viewing. J. Exp. Psychol. Hum. Percept. Perform. 4(4), 565–572 (1978)
17. Milner, A.D., Goodale, M.A.: Two visual systems reviewed. Neuropsychologia 46(3), 774–785 (2008)
18. Plaisant, C.: The challenge of information visualization evaluation. In: Proceedings of the Working Conference on Advanced Visual Interfaces, pp. 109–116 (2004)
19. Po, B.A., Fisher, B.D., Booth, K.S.: Pointing and visual feedback for spatial interaction in large-screen display environments. In: Proceedings of the Third International Symposium Smart Grid. Lecture Notes in Computer Science, pp. 22–38. Springer (2003)
20. Poole, A., Ball, L.: Eye tracking in human-computer interaction and usability research: current status and future prospects. In: Ghaoui, C. (ed.) Encyclopedia of Human-Computer Interaction, pp. 211–219. Idea Group Inc. (2006)
21. Ribarsky, W., Fisher, B.D., Pottenger, W.M.: Science of analytical reasoning. Inform. Vis. 8(4), 254–262 (2009)
22. Ristovski, G., Hunter, M., Olk, B., Linsen, L.: EyeC: coordinated views for interactive visual exploration of eye-tracking data. In: Proceedings of the Conference on Information Visualization (IV), pp. 239–248 (2013)
23. Song, H., Yun, J., Kim, B., Seo, J.: GazeVis: interactive 3D gaze visualization for contiguous cross-sectional medical images. IEEE Trans. Vis. Comput. Graph. 20(5), 726–739 (2014)
24. Spence, B.: The broker. In: Ebert, A., Dix, A., Gershon, N.D., Pohl, M. (eds.) Human Aspects of Visualization. Lecture Notes in Computer Science, vol. 6431, pp. 10–22. Springer (2011)
25. Star, S.L.: The structure of ill-structured solutions: boundary objects and heterogeneous distributed problem solving. In: Huhns, M., Gasser, L. (eds.) Readings in Distributed AI. Morgan Kaufmann (1988)
26. Swindells, C., Tory, M., Dreezer, R.: Comparing parameter manipulation with mouse, pen, and slider user interfaces. Comput. Graph. Forum 28(3), 919–926 (2009)
27. Hurter, C., Ersoy, O., Fabrikant, S., Klein, T., Telea, A.: Bundled visualization of dynamic graph and trail data. IEEE Trans. Vis. Comput. Graph. (2013)
28. Jarodzka, H., Holmqvist, K., Nyström, M.: A vector-based, multidimensional scanpath similarity measure. In: Proceedings of the 2010 Symposium on Eye-Tracking Research & Applications, pp. 211–218 (2010)

Challenges of Adopting Human-Centered Intelligent Systems: An Organizational Learning Approach Fons Wijnhoven

Abstract Clinical decision support systems (CDSSs) are human-centered intelligent systems (HCISs) that use codified medical expertise or large data sets for medical decision recommendations. Most analytical CDSSs that exploit the opportunities of large data sets and analytic techniques remain within a research and development environment and lack adoption in clinical contexts. To understand this, we analyse CDSS adoption as an organizational learning process. We apply a model of organizational learning to the case of an analytical CDSS implementation that analyses medical data to predict the probability of sepsis in prematurely born babies in order to support physicians’ decision-making on administering antibiotics. In our discussion, we then compare our case findings with possible organizational learning challenges for the adoption of other (medical) HCISs, and we draw consequences for projects of HCIS adoption in organizations.

1 Introduction

Clinical decision support systems (CDSSs) are systems for clinical decision making [1]. A CDSS can contain multiple techniques to support medical decision-making, such as visualization of clinical data [2, 3]; zooming, sorting and filtering to dive deep into specific sections of relevant patient data, images and video recordings [4–6]; and analysis of data from multiple sources, like the patient medical record and genomic data, for diagnosis and treatment selection [7]. Natural language processing (NLP) can also be used to extract the meaning from natural language text notes in medical records [8–10]. More complex forms of technology involve machine learning, i.e. methods that can automatically detect patterns in data [11]. CDSSs like Watson for Oncology incorporate a form of prescriptive analytics by ranking treatment alternatives by predicted effectiveness for a given diagnosis, supported by mining knowledge from 600 medical journals, hundreds of different medical data sources and statistical evidence [12]. Mehta and Pandit summarize analytic techniques for CDSS in Table 1 [13].

Table 1 Analytics techniques in clinical DSS [13]

Technique | Healthcare application examples
Cluster analysis | Detecting high-risk obesity groups
Machine learning | Predicting disease risk; Detecting epidemics
Neural networks | Diagnosing chronic diseases; Predicting patients’ future diseases
Pattern recognition | Improving public health surveillance

Mehta and Pandit [13] state that all the studies they found describe, develop and test a model or algorithm to show its added value but do not mention anything about implementation. Mehta and Pandit [13] suggest that a reason for the lack of these CDSS implementations is that the current body of literature does not provide adequate quantitative evidence that these techniques can be trusted by medical practitioners in their clinical use. Other review articles also state that, unlike research contexts, clinical contexts have very high demands for ethical, legal and reasoning transparency that are difficult to meet in practice [14]. A lack of trust in the systems’ recommendations is also an important reason for physicians not adopting CDSS [14]. Also highly problematic is the claim by medical journalists that Watson for Oncology’s training resulted in a bias towards Memorial Sloan Kettering physicians’ preferences, which is not unlikely because these physicians were involved in the development of Watson for Oncology [15, 16]. Other technology companies such as Google and Microsoft have developed similar CDSSs [17]. These systems remain within the R&D environment and have not been implemented in clinical practice either [18].

Despite these adoption problems, McNut et al. [19] and Rumsfeld et al. [20] state that analytics holds great promise for the fields of oncology and cardiovascular diagnoses and treatments, respectively, but that this promise can only be realized when sufficiently large and reliable data sets are available. Unfortunately, this is difficult to achieve. To realize such large data sets, hospitals will have to share their data [21–23]. Lack of systems interoperability and differing data taxonomies prevent such inter-hospital data sharing, and thus further structuring and standardization of systems and data is needed [7, 24].

F. Wijnhoven (B) University of Twente, Enschede, The Netherlands e-mail: [email protected]
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 A. Zimmermann et al. (eds.), Human Centred Intelligent Systems, Smart Innovation, Systems and Technologies 189, https://doi.org/10.1007/978-981-15-5784-2_2
NLP of informal language that describes the patient’s status and the medical doctor’s thoughts is a new opportunity for CDSS. These data are available via patient medical records, but accessibility of these data for analytics is legally and ethically complicated, and classifying natural language in medical terminology is still not very reliable [25]. Besides all these analytical workflow challenges, CDSSs are also difficult to use from the perspective of the medical practitioner. Medical practitioners are especially concerned about the accuracy of classifications and predictions when the data sets are too small [26–30] and the algorithms used are non-transparent or incomprehensible [31]. Non-transparent algorithms may result in feelings of loss of reasoning control, which is unacceptable for medical professionals during their diagnosis and treatment decision-making [14, 32, 33]. Experiences with unreliable registrations in patient files also do not contribute much to trust in CDSS [23]. Many of the reasons for not trusting and resisting CDSS are thus not technical or psychological (i.e. the medical professionals’ risk perception) but are rooted in ethical, legal and managerial requirements being insufficiently met [34, 35]. Finally, the actual realization of a clinical CDSS also requires new knowledge, expertise and training in the use and interpretation of results, as well as new IT personnel who can provide proper support and conditions for use. Given the increasing costs of healthcare, these extra resources are difficult to fund.

In this article, we present an organizational learning approach to these HCIS adoption challenges, motivated by the belief that the most fundamental change an HCIS can bring to professionals is that it may change their competences of learning and of finding knowledge and solutions for problems that the average person will not be able to handle. Following a classic article on organizational learning [36], we describe organizational learning as a cyclic process of knowledge development. This process starts with externalizing and codifying the tacit knowledge of individuals and placing it on a person-independent medium such as an information system. Second, this explicit and codified knowledge can be combined with other explicit knowledge, e.g. by the creation of data warehouses and extensive rule-based expert systems. Third, the more advanced combined knowledge can produce advanced recommendations to decision makers, but the decision maker must be able to internalize the recommendation by combining it with his or her personal experiences, values, and skills, i.e. tacit knowledge, to be able to take responsibility for the decision. Finally, professionals often discuss problems and recommendations with colleagues for verification or group decision-making, or they share the decision, recommendation and cases with colleagues after the decisions to learn from each other or to develop new standards and guidelines. See Fig. 1 for the Nonaka model, which presents this learning process as a continuous cycle.

Fig. 1 Nonaka’s SECI model. Remake after [36]


This brings us to our research questions:

1. What challenges can happen in the externalization stage of HCIS adoption?
2. What challenges can happen in the combination stage of HCIS adoption?
3. What challenges can happen in the internalization stage of HCIS adoption?
4. What challenges can happen in the socialization stage of HCIS adoption?

The next section explains the methodology for detecting relevant adoption challenges. Section 3 presents our results, and Sect. 4 discusses these results in the broader context of other variants of HCISs. Lastly, Sect. 5 presents the conclusion and reflects on further research.

2 Methodology

As stated in the research questions, this research aims to register the organizational learning challenges for a concrete implementation of a CDSS. To realize this, we use a rich case description in which key stakeholders were interviewed regarding their view on the adoption of an HCIS. We first describe this case in the following subsection, after which we go into detail about the stakeholders and our analysis.

2.1 The BD4SB Case

The Utrecht University Medical Centre (UMCU) initiated the Applied Data Analytics in Medicine (ADAM) project in spring 2017 to make healthcare more personalized with analytics, in collaboration with external partners such as Siemens, Philips, SAS and Accenture. This is a hospital-wide project with a special team of clinicians and data scientists. ADAM enables pilots from four departments within the UMCU, among which is the BD4SB pilot within the Neonatology Department, which cares for premature babies. Babies that are born too early are sensitive to infection. Regrettably, treatment with invasive procedures, such as intravenous lines, blood samples and ventilation, creates potential entry points for bacteria and adds to the risk of illness. The Neonatology Department wants to know as early as possible when a patient will become ill and what treatment is most appropriate. The current healthcare process is as follows: (1) the physician suspects an infection (e.g., skin colour change, blood pressure or temperature instability), (2) the physician takes a blood culture, (3) this blood culture is examined for bacteria in the laboratory and (4), when the culture is positive, the blood is examined by Gram staining (bacteria are coloured to make them visible under the microscope to identify the species). This process can take up to 48 h, which can be crucial in the development of the infection and the administration of antibiotics. Not administering antibiotics must be considered carefully since sepsis has negative consequences for the patient; however, administering antibiotics can also have negative consequences, such as an increased chance of other diseases such as asthma, cancer, intestinal diseases or obesity.

The BD4SB CDSS aims to support the physicians when they consider administering antibiotics. The CDSS focuses on predicting with a minimum of false negatives. False negatives are the most dangerous situations because the advice is then not to give antibiotics when they are needed. The BD4SB CDSS uses different data sources from the neonatology database, which covers 6,000 children born between 24 and 32 weeks. These data originate from several systems whose data must be integrated and prepared within data management before analysis. The model development method applied for analysis is the ‘gradient boosting’ technique of predictive machine learning.
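The emphasis on a minimum of false negatives can be illustrated with a small sketch of decision-threshold selection. This is not the BD4SB implementation: the scores below stand in for the predicted sepsis probabilities of any classifier (gradient boosting in the case described), and the labels, scores, function names, and tolerated false-negative rate are all hypothetical.

```python
def confusion(y_true, scores, threshold):
    """Return (tp, fp, tn, fn) when predicting positive for score >= threshold."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted = score >= threshold
        if label and predicted:
            tp += 1
        elif label and not predicted:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def pick_threshold(y_true, scores, max_fn_rate=0.0):
    """Highest threshold whose false-negative rate stays within max_fn_rate.

    Choosing the highest admissible threshold keeps false positives
    (unnecessary antibiotics) as low as the false-negative limit allows.
    """
    positives = sum(y_true)
    if positives == 0:
        raise ValueError("at least one positive case is required")
    for threshold in sorted(set(scores), reverse=True):
        counts = confusion(y_true, scores, threshold)
        if counts[3] / positives <= max_fn_rate:
            return threshold, counts


# Hypothetical validation data: 1 = sepsis confirmed, 0 = no sepsis.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.35, 0.3, 0.2, 0.1]

print(pick_threshold(y_true, scores, max_fn_rate=0.0))   # (0.3, (3, 3, 2, 0))
print(pick_threshold(y_true, scores, max_fn_rate=0.34))  # (0.6, (2, 1, 4, 1))
```

The two calls show the trade-off discussed in the case: tolerating no false negatives triples the number of patients flagged without sepsis in this toy data set, whereas accepting one missed case reduces that number to one.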

2.2 Data Collection and Analysis

Twenty-three key expert stakeholders were interviewed in a semi-structured way about challenges for the CDSS implementation. The unscripted narrative enabled the researcher to explore the respondents’ expertise. In Sect. 3, the statements quoted in this article are categorized as representative of adoption challenges per organizational learning process. The stakeholders and their functions are given in the case descriptions. The respondents were visited or approached at an event.

3 Results: BD4SB Adoption Challenges

This section presents data for the four organizational learning categories of Fig. 1.

3.1 Externalization Challenges

Externalization is the transformation of tacit knowledge to explicit knowledge that can be stored on a person-independent medium. The term ‘knowledge’ is used here in a very broad sense, covering the many ways in which people say they know something, and thus includes facts and figures, theories and explanations, methods and skills, and experiences [37]. We found many statements indicating that CDSS adoption is challenging. Data quality in healthcare in general suffers from inaccurate registrations. Inaccurate registration is partially caused by the high registration load within healthcare and users’ inability to see the benefits of registration. R5 (professor health informatics) says: “I believe that a lot of professionals are not satisfied with the registration load….” R1 (ex-chairman board of AMC) says: “There is a tension between how we can standardize the systems input and how can we keep the input requirements user friendly.” Even if the data were available and correct, a lot of data processing and analytics is needed, which unfortunately is not always compliant with CE¹ quality norms. R11 (ex-physician and data scientist UMCU) says: “Firstly, some data producing machines within the UMCU are validated and CE approved for research and not for healthcare which is needed for BD4SB, a new CE approval of the data warehouse is required to realize this transition. Secondly, data source HIX (EHR) is currently updated once a day, however, only converted to usable input for analytics once a week.” Physicians also hold professional expectations that make them hesitant to work with a CDSS. R20 (ex-physician and analytical entrepreneur) says: “CDSS technology enforces strict working according to guidelines and thus may deprive physicians from their sense of added value. This perhaps is the single biggest reason why working with CDSS technology is so slowly being adopted - it makes physicians feel less valuated.” This hesitation is not necessarily a defensive or conservative attitude, but could be based on sincere professional doubts about whether knowledge externalization and its contributions to medical care have a positive cost–benefit relation. R17 (Business engineer CDSS developer) says: “It is not clear if the improvement of care is worth the costs of data scientists, medical trial, infrastructure and maintenance of the BD4SB project.” One of these doubts may also be based on the infeasibility of some essential parts of a CDSS. The optimal design for a clinical evaluation study is a randomized controlled trial (RCT), but R4 (clinical owner BD4SB) says: “The best way would be a randomized experiment where half will be exposed to the algorithm and the other half not. This is seriously hard, randomizing thousands of patients and the algorithm might be only suitable for our own population, we have to look at how we can show the clinical relevance without a randomized study.”

¹ CE stands for Conformité Européenne and is the European Union quality certification label.

3.2 Combination Challenges

Combination is the process of bringing different externalized parts (like databases and applications) together in one larger system. R12 (clinical owner BD4SB) says: “All the data the CDSS requires can be extracted from the hospital wide critical data layer for analytical proceedings by means of an API call. This data layer collects patient specific data from each individual by means of patient ID. We need this step or else we will still be looking at retrospective data. This data layer is currently under construction, technologically feasible, however, realization depends on commitment and budget.” This layer would automatically collect, integrate and prepare the data from different data sources, but there is a lack of data sharing protocols. R8 (Senior technical consultant SAS) says: “Data protocols describe among other things what data was used, where the algorithm was developed, which version it was and how it should be utilized. This increases the controllability and auditability. They are not used currently. Every system generates its own data in a way that is most easy for this system. Which is not wrong of course. Only when you want to join the data from these systems, you might realize that you should have done it in a different manner.” Data are also poorly shared among different applications, as R12 (clinical owner BD4SB CDSS) says: “There is no automated process that collects data for analysis and makes it available to other solutions, all the data is still in its original source and has to be extracted and integrated manually by somebody to enable the following analysis.”
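The kind of integration step that respondents find missing can be sketched in a few lines. The example below is only an illustration under stated assumptions: the source system names, variable names and values are hypothetical, and it says nothing about how the UMCU data layer is actually built. It merges records from several systems per patient ID and keeps, for each value, the system it came from.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRecord:
    patient_id: str
    source: str      # hypothetical system name, e.g. "EHR" or "lab"
    variable: str    # hypothetical variable name
    value: float


def integrate_by_patient(records):
    """Merge records from several source systems into one view per patient ID.

    Each merged value keeps the name of the system it came from, so that a
    later audit can see which data were used.
    """
    merged = defaultdict(dict)
    for rec in records:
        merged[rec.patient_id][rec.variable] = {"value": rec.value, "source": rec.source}
    return dict(merged)


# Hypothetical records from two systems for two patients.
records = [
    SourceRecord("p1", "EHR", "temperature_c", 38.4),
    SourceRecord("p1", "lab", "crp_mg_l", 42.0),
    SourceRecord("p2", "EHR", "temperature_c", 36.9),
]
merged = integrate_by_patient(records)
print(merged["p1"])
```

Keeping the source next to every value is one simple way to record "what data was used", which is part of what R8 expects from a data protocol.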

3.3 Internalization Challenges

Internalization is the process of human interpretation of the knowledge, insights and recommendations given by a system. For recommendations to be acceptable to decision makers, they have to rely on transparent data analysis procedures. R10 (ex-physician and sales manager SAS) says: “The analytical process often requires transparency. For example: which algorithm was used six months ago, which data was used and who entered or changed the data, were they entitled to do so, does the data sources supply the required quality or how it was analysed. If a physician stands in front of a judge, he/she must be able to exactly explain how the process was executed.” However, the system must be compliant with the standards for CE approval. R12 (clinical owner BD4SB) says: “However, the system still misclassifies some patients because they had other symptoms or something special, only 6 of the 500 children. This is not much, however, if all 6 babies die, this is hard to explain, of course.” And R13 (Program manager ADAM) says: “It does not matter if the algorithm says to intervene or not to intervene, the physician is always responsible and therefore is hesitant in using these systems.” And R10 (ex-physician and senior sales executive healthcare SAS) says: “When a complaint is made, a hospital/clinician has to justify every step within the treatment process.”
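R10's question of which algorithm was used six months ago suggests keeping a dated record of model releases. The following is a minimal sketch of such a record, assuming that each release is stored with its deployment date and a description of the data it was trained on; the class name, version labels and dates are hypothetical and not taken from the BD4SB project.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ModelRelease:
    version: str         # hypothetical version label
    deployed_on: date
    training_data: str   # hypothetical description of the data extract used


def release_in_use(releases, on_day):
    """Return the release that was active on a given day, or None if none was deployed yet."""
    active = [r for r in releases if r.deployed_on <= on_day]
    return max(active, key=lambda r: r.deployed_on) if active else None


# Hypothetical release history.
releases = [
    ModelRelease("v1", date(2019, 1, 10), "extract A"),
    ModelRelease("v2", date(2019, 9, 1), "extract B"),
]
print(release_in_use(releases, date(2019, 6, 1)))
```

Such a lookup answers only the "which algorithm, which data" part of the question; who entered or changed the data would need a separate log in the source systems.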

3.4 Socialization Challenges

Socialization is the process of sharing tacit knowledge among people. This tacit knowledge may be any insight or experience that is not fully formally defined and not made person-independent; it may play a role before externalization but also in the process of human decision-making.

R7 (healthcare director SAS) says: “Hospitals have another problem. A cultural problem, the physicians are too distant from the IT department.” Involving physicians in the development stage of the analytical CDSS will be beneficial for trust in and acceptance of an analytical CDSS. Therefore, R11 (ex-physician and data scientist UMCU) says: “It is important to bring along a group of physicians within this process or else you will get the ‘not invented by me syndrome’.” Whereas the full responsibility for medical decisions lies with the physician, there is a certain kind of moral responsibility that lies with the developer. R16 (Ethicist and member medical ethical commission, UMCU) says: “If an algorithm makes a mistake or if a device makes a mistake, the developer is partially responsible since they built this into the algorithm, when there is a causal link between the mistake and the algorithm.” The Medical Ethical Assessment Committee (METC) has no clear legal framework for a predictive algorithm within the WMO. R11 (physician and data scientist UMCU) says: “CE approval requires the description of risks when it goes wrong for which you need a test period. To execute this test period, you need to deliver a risk assessment to the METC, this is a vicious circle.” Legislation only requires that a good risk assessment be carried out, but does not specify requirements for predictive algorithms within such an assessment (R17, business engineer developer CDSS). According to R19 (inspector e-health), the EU is currently developing norms for artificial intelligence which are also applicable to analytical CDSS. These norms could also give the METC more foundation in the medical trial approval since they provide a greater understanding within a legal framework. The interpretation of the MDR is still under construction according to R22 (global clinical director notified body): “Dekra has updated its procedures to the new legislation on European level, the MDR. The inspection healthcare (IGJ) and representatives of other EU member states validated the new MDR procedures during joined assessments. Very often during these joined assessments the EU member state representatives had different interpretations of the regulations. Hence, it took 7 years to write the new legislation, then you have a meeting with representatives of a notified body from several countries and the healthcare inspection (IGJ) and they are still discussing what the MDR exactly says. The new legislation leaves room for interpretation probably because a 100% alignment of all EU member states is difficult to reach.” Including a regulatory expert in the R&D team might help the BD4SB project team prepare for the CE approval process. R22 (Clinical CEO notified body, Dekra) says: “A regulatory professional within a R&D team involved from step one can think of what Notified Body or food and drug administration (FDA) market approval conditions are. Large companies see this and include a regulatory professional from the concept stage.” Thus, the socialization process involves not only physicians and their teams, but also other technical, ethical and legal stakeholders, who all have to be able to understand each other.

Fig. 2 Summary of CDSS development challenges

3.5 Case Summary and Discussion

Figure 2 summarizes the BD4SB development challenges, drawn as boxes and ovals, within the analytical CDSS development project. Nearly all challenges make it harder for physicians to justify their diagnostic or treatment selection decisions and could reduce their motivation to engage with the CDSS.

4 Discussion

In this section, we aim to find general lessons for the development of HCIS on the basis of this case. In grounded theory development, this is called the formalization of theory [38] by generalizing over key factors. As key factors, we identify the different knowledge types that a CDSS/HCIS can handle and different environments outside the context of hospitals. In Sect. 1, we introduced Mingers’ classification of “knowledge”: facts and figures, theories and explanations, methods and skills, and experiences [37]. The CDSS case discusses knowledge as facts and figures, so that physicians know the status of a patient exactly and more quickly. This type of classification knowledge is not the knowledge that physicians find in their books, their methods or skills and experiences. A CDSS that focuses on delivering knowledge as theories and explanations is different and has a different organizational learning process, organized in academic research, textbooks and medical journals. Such knowledge can be stored in expert systems and medical guidelines, although experience has shown that developing and maintaining such a system is complex and time-consuming [39, 40]. An interface with publishers’ content for very fast search, not just on key terms but especially on the basis of diagnostic data as input, could be a very useful HCIS application here. “Methods & skills” knowledge is hard to place in a CDSS. Medical guideline systems do give some indications of what to do when, but these systems are of course not the professional skills themselves, which are part of tacit knowledge developed through much training. Medical guideline systems, however, do need a very extensive and active maintenance process. This is possible only if the governance structure for such a system is well developed and well managed. “Experiences” knowledge can nowadays be shared easily via social media and problem-solving fora. There may not be many active platforms for physicians, who exchange experiences mainly via conferences and inter-colleague consultations, which also enable high-level socialization. Very interestingly, patients do share experiences, especially about rare diseases, which enables professionals to mine insights [41]. There seems to be a fundamental difference between systems that create or manage rules and systems that classify. Rule-based systems bundle and combine experiences, methods, theories and explanations in what-if expressions. Classification systems only process facts and figures to state what state someone or something is in. Depending on these classifications, the rules prescribe what to do, and so they are related, but analytical CDSS (like the BD4SB case) are only classifiers. Whereas expert systems or rule-based systems are transparent when rules are created and maintained by people, analytic systems apply algorithms that are often black boxes to their users. The lack of insight into how the algorithms reach their conclusions leads to a much higher level of risk perception than with manually defined rule systems.
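The distinction between a classifier and a rule base can be made concrete with a toy sketch. Everything in it is hypothetical: the weights stand in for parameters that an analytic system would learn from data, the feature names and cut-off are invented, and the rule texts are illustrative only and not clinical guidance.

```python
# Classification part: numbers learned from data, hard to read for a user.
# The weights and bias below are hypothetical stand-ins for a trained model.
WEIGHTS = {"temperature_c": 0.8, "heart_rate_bpm": 0.02}
BIAS = -34.0


def classify_state(features):
    """Turn facts and figures into a state label (the classifier's only job)."""
    score = BIAS + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return "sepsis_suspected" if score > 0 else "no_suspicion"


# Rule part: explicit what-if expressions that people can read and maintain.
# The actions are illustrative only.
RULES = {
    "sepsis_suspected": "administer antibiotics",
    "no_suspicion": "continue monitoring",
}


def advise(features):
    """Classify first, then look up what the rule base prescribes for that state."""
    state = classify_state(features)
    return state, RULES[state]


print(advise({"temperature_c": 38.5, "heart_rate_bpm": 190}))
print(advise({"temperature_c": 36.8, "heart_rate_bpm": 140}))
```

A reader can check the rule table at a glance, whereas the meaning of the weights is not obvious without knowing how they were obtained, which is the transparency gap discussed above.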
The question remains whether this would be valid for non-hospital settings. Clearly, the professional context of a physician is different from fields like manufacturing, services, education, construction or logistics. How different? They all may use large data sets, explicit expert knowledge, experience-sharing platforms and methods and skills. So, is it all really that different? These fields may be less regulated by law and quality norms, as people’s lives do not depend that much on these services; they are largely a matter of business or financial risks that are just part of the regular business game. Therefore, we identify two critically different variables that determine the adoption challenges of HCIS: (1) the level of impact on life and (2) the risks of a lack of transparency of reasoning. Regarding impact on life, the dichotomy between low and high is of course a gross simplification, and many nuances on this scale are needed. At the low extreme, people do not care whether computers or people are making decisions for them, e.g. in the application of administrative rules; at the other extreme, the decisions may be a matter of life, death or happiness, as in many medical decisions, personnel recruitments or work evaluations. In the middle of this scale are decisions that apply human fairness principles, i.e. the decisions should handle the interests of women, ethnic minorities or other demographic groups in a fair way (Table 2).

Table 2 HCIS adoption challenge dimensions and example systems

Transparency of reasoning    Impact on life: Low (e.g. business services)    Impact on life: High (e.g. medical)
Low (e.g. data mining)       Customer segmentation                           Analytic CDSS algorithms
High (e.g. rules applied)    Banking rules loan application system           Medical expert (guidelines) system

Regarding the reasoning transparency dimension, the distinction between low and high again lacks nuance. The application of rules may need a clear classification of cases, whereas class boundaries are sometimes hard to identify. The BD4SB case is an example: it is not certain whether a baby has sepsis, but when it does, the rule is clear and antibiotics should be administered. Social programmes against poverty meet a similar problem. Whereas in these programmes money is available for people lacking financial means, one may debate whether these funds should be available for university students with unemployed middle-class parents. Both the algorithms that perform the classifications and the data themselves can bias the cut-offs for the application or non-application of rules. In both cases, HCIS researchers and users should be aware of fairness impacts and algorithmic biases, which means that HCIS with social impact need the influence of affected stakeholders in a democratic way.

5 Conclusions

This study aimed at understanding the challenges of HCIS adoption as an organizational learning process. We found many challenges, summarized in Fig. 2, for a clinical DSS, and we placed this case in the context of fairness and biases, where the CDSS case combines a high impact on life with low transparency of reasoning, making adoption of CDSS a complex case. Given the high responsibility of physicians in medical decision making, resistance to the use of CDSS may well be wise abstention [42]. The lack of trust is rooted not just in technical or organizational difficulties, but especially in legal and interorganizational frameworks that are not yet well in place. A new kind of decision-making process must also be developed, with new roles for physicians, patients and CDSS developers. Although business applications of HCIS, like product recommenders and customer services management, may be less a matter of life or death, companies and governments also have to treat their customers fairly and without bias. This also implies that HCIS development should not be the sole work of knowledge engineers but requires the explicit and intensive involvement of business people who can translate the norms of handling customers fairly and without bias. HCIS therefore should also be validated on fairness and absence of bias. This may require the involvement of representatives of the people affected by these systems.

Acknowledgements The case for this article is based on data collected by Rick Klein Koerkamp. This article, which extends the analysis to the more general domain of HCIS, is however the full responsibility of the author.

References

1. Marakas, G.: Decision support systems in the 21st century. ACM SIGSOFT Softw. Eng. Notes 27, 104 (1999)
2. Turkay, C., Jeanquartier, F., Holzinger, A., Hauser, H.: On Computationally-Enhanced Visual Analysis of Heterogeneous Data and Its Application in Biomedical Informatics, pp. 117–140 (2014)
3. Wongsuphasawat, K., Guerra Gómez, J.A., Plaisant, C., Wang, T., Taieb-Maimon, M., Shneiderman, B.: LifeFlow. In: CHI 2011 Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 1747–1756. ACM, New York (2011)
4. Korotkov, K., Garcia, R.: Computerized analysis of pigmented skin lesions: a review. Artif. Intell. Med. 56, 69–90 (2012)
5. Faggella, D.: Where healthcare’s big data actually comes from. Tech Emerg, 11 (2018)
6. Exastax: Top Five Use Cases of Tensorflow. https://www.exastax.com
7. Raghupathi, W., Raghupathi, V.: Big data analytics in healthcare: promise and potential. Heal. Inf. Sci. Syst. 2, 3 (2014)
8. Holzinger, A., Geierhofer, R., Mödritscher, F., Tatzl, R.: Semantic information in medical information systems: utilization of text mining techniques to analyze medical diagnoses. J. Univ. Comput. Sci. 14, 3781–3795 (2008)
9. Ivanović, M., Budimac, Z.: An overview of ontologies and data resources in medical domains. Expert Syst. Appl. 41, 5158–5166 (2014)
10. Sivarajah, U., Kamal, M.M., Irani, Z., Weerakkody, V.: The university of bradford institutional repository. J. Bus. Res. 70, 263–286 (2017)
11. Murphy, K.P.: Machine Learning-A Probabilistic Perspective. Table-of-Contents (2012)
12. Somashekhar, S.P., Sepúlveda, M.J., Puglielli, S., Norden, A.D., Shortliffe, E.H., Rohit Kumar, C., Rauthan, A., Arun Kumar, N., Patil, P., Rhee, K., Ramya, Y.: Watson for Oncology and breast cancer treatment recommendations: agreement with an expert multidisciplinary tumor board. Ann. Oncol. 29, 418–423 (2018)
13. Mehta, N., Pandit, A.: Concurrence of big data analytics and healthcare: a systematic review. Int. J. Med. Inform. 114, 57–65 (2018)
14. Eberhardt, J., Bilchik, A., Stojadinovic, A.: Clinical decision support systems: potential with pitfalls. J. Surg. Oncol. 105, 502–510 (2012)
15. Petitjean, F.: IBM Watson kiest foute kankerbehandeling
16. Ross, C.: IBM’s Watson supercomputer recommended ‘unsafe and incorrect’ cancer treatments, internal documents show. https://www.statnews.com/wp-content/uploads/2018/09/IBMs-Watson-recommended-unsafe-and-incorrect-cancer-treatments-STAT.pdf
17. Nijhof, K.: Watson for oncology maakt forse stappen
18. Electronics, A., Batra, B.G., Queirolo, A., Santhanam, N.: Artificial intelligence: the time to act is now, 1–16 (2018)
19. McNutt, T.R., Moore, K.L., Quon, H.: Needs and challenges for big data in radiation oncology. Int. J. Radiat. Oncol. Biol. Phys. 95, 909–915 (2016)
20. Rumsfeld, J.S., Joynt, K.E., Maddox, T.M.: Big data analytics to improve cardiovascular care: promise and challenges. Nat. Rev. Cardiol. 13, 350 (2016)
21. Dinov, I.D.: Methodological challenges and analytic opportunities for modeling and interpreting Big Healthcare Data. Gigascience 5, 12 (2016)
22. Peek, N., Holmes, J.H., Sun, J.: Technical challenges for big data in biomedicine and health: data sources, infrastructure, and analytics. Yearb. Med. Inform. 23, 42–47 (2014)
23. Salas-Vega, S., Haimann, A., Mossialos, E.: Big data and health care: challenges and opportunities for coordinated policy development in the EU. Heal. Syst. Reform. 1, 285–300 (2015)
24. Asokan, G.V., Asokan, V.: Leveraging “big data” to enhance the effectiveness of “one health” in an era of health informatics. J. Epidemiol. Glob. Health 5, 311–314 (2015)
25. Auffray, C., Balling, R., Barroso, I., Bencze, L., Benson, M., Bergeron, J., Bernal-Delgado, E., Blomberg, N., Bock, C., Conesa, A.: Making sense of big data in health research: towards an EU action plan. Genome Med. 8, 71 (2016)
26. Budhiraja, R., Thomas, R., Kim, M., Redline, S.: The role of big data in the management of sleep-disordered breathing. Sleep Med. Clin. 11, 241–255 (2016)
27. Cox, M., Ellsworth, D.: Application-controlled demand paging for out-of-core visualization. In: Proceedings. Visualization 1997 (Cat. No. 97CB36155), pp. 235–244. IEEE (1997)
28. Geerts, H., Dacks, P.A., Devanarayan, V., Haas, M., Khachaturian, Z.S., Gordon, M.F., Maudsley, S., Romero, K., Stephenson, D., Initiative, B.H.M.: Big data to smart data in Alzheimer’s disease: the brain health modeling initiative to foster actionable knowledge. Alzheimer’s Dement. 12, 1014–1021 (2016)
29. Kruse, C.S., Goswamy, R., Raval, Y.J., Marawi, S.: Challenges and opportunities of big data in health care: a systematic review. JMIR Med. Inform. 4, e38 (2016)
30. Szlezak, N., Evers, M., Wang, J., Pérez, L.: The role of big data and advanced analytics in drug discovery, development, and commercialization. Clin. Pharmacol. Ther. 95, 492–495 (2014)
31. Maia, A.-T., Sammut, S.-J., Jacinta-Fernandes, A., Chin, S.-F.: Big data in cancer genomics. Curr. Opin. Syst. Biol. 4, 78–84 (2017)
32. Holden, R.J., Karsh, B.T.: The technology acceptance model: its past and its future in health care. J. Biomed. Inform. 43, 159–172 (2010)
33. Maillet, É., Mathieu, L., Sicotte, C.: Modeling factors explaining the acceptance, actual use and satisfaction of nurses using an Electronic Patient Record in acute care settings: an extension of the UTAUT. Int. J. Med. Inform. 84, 36–47 (2015)
34. Andreu-Perez, J., Poon, C.C.Y., Merrifield, R.D., Wong, S.T.C., Yang, G.Z.: Big data for health. IEEE J. Biomed. Health Inform. (2015)
35. Abouelmehdi, K., Beni-Hssane, A., Khaloufi, H., Saadi, M.: Big data security and privacy in healthcare: a review. Procedia Comput. Sci. (2017)
36. Nonaka, I.: A dynamic theory of organizational knowledge creation. Organ. Sci. 5, 14–37 (1994)
37. Mingers, J.: Management knowledge and knowledge management: realism and forms of truth. Knowl. Manag. Res. Pract. 6, 62–76 (2008)
38. Glaser, B.G., Strauss, A.L.: The Discovery of Grounded Theory: Strategies for Qualitative Research. Aldine, Chicago, Illinois (2009)
39. Bensoussan, A., Mookerjee, R., Mookerjee, V., Yue, W.T.: Maintaining diagnostic knowledge-based systems: a control-theoretic approach. Manag. Sci. (2008)
40. Grosan, C., Abraham, A.: Rule-based expert systems. Intell. Syst. Ref. Libr. (2011)
41. Dirkson, A., Verberne, S., Van Oortmerssen, G., Gelderblom, H., Kraaij, W.: Open knowledge discovery and data mining from patient forums. In: 2018 IEEE 14th International Conference on e-Science (e-Science), pp. 397–398. IEEE (2018)
42. Althuizen, N., Reichel, A., Wierenga, B.: Help that is not recognized: harmful neglect of decision support systems. Decis. Support Syst. 54, 713–728 (2012)

Multi-level Evaluation of Smart City Initiatives Using the SUMO Ontology and Choquet Integral

Nil Kilicay-Ergin and Adrian Barb

Abstract Smart city initiatives from different administrative levels have varying perspectives. Evaluating initiatives at isolated administrative levels limits understanding of their systemic implications for policy analysis and development. This paper focuses on the use of natural language processing and knowledge elicitation techniques as the means to evaluate smart city initiatives at various administrative levels, including multinational, national, and local city levels. By building ontological knowledge maps, the study evaluates the alignment of multi-level smart city initiatives, which ultimately supports smart city governance and policy analysis.

N. Kilicay-Ergin (B) · A. Barb
Penn State Great Valley School of Graduate Professional Studies, Malvern, PA 19355, USA

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
A. Zimmermann et al. (eds.), Human Centred Intelligent Systems, Smart Innovation, Systems and Technologies 189, https://doi.org/10.1007/978-981-15-5784-2_3

1 Introduction

Smart city initiatives are proliferating in urban planning and development, but governments are challenged by transforming these initiatives into practice. Local and central governments lack a consistent and complete plan for the governance of smart city development. This is partly because smart city development involves multiple levels of agencies, perspectives, and sectors. This multiplicity of perspectives can be observed in the way cities around the world define smart cities. For example, Barcelona’s perspective of a smart city is to transform the city using advanced digital technologies in order to create a sustainable city with competitive and innovative commerce, whereas Amsterdam’s perspective is to address climate change using smart infrastructure technologies. Other cities have differing perspectives that range from using advanced technologies for urban design to improving the political participation of their citizens [30]. Understanding and identifying the synergies as well as differences between multidomain, multi-level contextual perspectives in an evolving smart city concept is confusing. Decision makers and policy analysts need various tools to support the governance process. Several studies emphasize the need for holistic approaches to evaluating interconnected initiatives and policies [15]. Initiatives and policies are communicated across different levels of administrations through text. Structural representation of text using formalized semantic networks allows decision makers and analysts at different levels of administration to communicate and understand each other. However, identifying relevant parts of a text for structural representation is not a straightforward task due to the variability in the language of free text. To reduce complexity and ambiguity, words are categorized into more abstract categories using ontologies. Ontologies provide conceptual models of a domain and can be used in automatic semantic evaluation of text [11]. In this paper, we evaluate three smart city initiatives published at the international, national, and city administrative levels. We use a formal ontology to abstract knowledge found in text. Experiments evaluate the relevance, coherence, and alignment of multi-level smart city initiatives to ultimately provide a holistic view of the problem for decision makers and policy analysts. The rest of the paper is organized as follows: Sect. 2 reviews related work on smart city governance and natural language processing techniques. Section 3 describes the methodology used to evaluate and analyze multi-level smart city initiatives. Section 4 provides details on the experimental evaluation, and finally Sect. 5 concludes the paper with a discussion of future research.

2 Related Work

The effectiveness of smart city governance is influenced by the availability of knowledge and local conditions among interested stakeholders [18]. Studies emphasize the need to focus on contextual factors and to integrate disparate and ambiguous information to support the governance process [2, 18]. Cities need to align their smart city initiatives with agendas of higher-level administrations including states, regions, governments, and international treaties. Smart city governance is a complex, multi-level and multidomain problem which necessitates a proper understanding and mapping of the knowledge structure. One of the key aspects of knowledge understanding is the evaluation of contextual characteristics of knowledge representation [27]. For example, a smart city initiative generated at the United Nations level will focus mostly on a broad spectrum of issues that relate to international development of communities. At the same time, a smart city initiative at a country or city level will focus more on specific areas of applicability such as transportation, commerce, housing, entertainment, etc. However, although expressed at different levels of granularity, these initiatives have some key areas in common which relate to safe, sustainable, and inclusive smart cities. Consumption of knowledge is difficult due to its representation at different granularity levels. It is important to achieve consensus among decision makers at different levels. The research by Miller [20] shows that consensus in a community is knowledge based and is a very good indicator of the amount of shared knowledge of that community. Some of the most important methods of evaluating consensus in decision-making communities are formalization [33] and abstraction [26]. By using an ontology, it is possible to consistently capture, represent, exchange, and interpret knowledge in textual documents. Data from various sources and formats can be mapped into an ontology to compare knowledge structures [11]. By using an ontology, it is possible to achieve both formalization and abstraction of knowledge in a domain [32]. This allows the use of formal vocabularies to describe knowledge from a specific domain and facilitate communication among decision-making entities. Classification schemes for linguistic content have been developed in several studies. Levin [14] classified English verbs into top-level categories. Another widely used classification system, the WordNet semantic network [19], includes nouns, verbs, adjectives, and adverbs. The Suggested Upper Merged Ontology (SUMO) [23] is an open-source formal ontology consisting of an upper ontology upon which other domain ontologies are or can be built. The SUMO ontology consists of more than 20,000 terms [24]. Due to this ontological richness and logical expressiveness, SUMO provides a greater variety and completeness of definitions than other formal ontologies such as DOLCE [10] or BFO [29]. SUMO has been mapped to WordNet and allows automatic conversion of English words to formal terms. This also makes SUMO suitable for reducing ambiguity and interpreting text meaning using formal terms [24]. The process of mapping free text into an ontology is complex and difficult to generalize.
For example, in the smart city literature, the relationship between “transportation” and “safety” or “education” is stronger than in general English conversation which is most closely related to “bus,” “rail,” or “commuter.” Also, developers of specific policies may choose to use synonym words to express similar concepts which may not be found in the smart city literature that was used for our evaluation. This complexity of this process can be addressed using multi-criteria decision-making strategies that provide a solution for ordering multiple alternatives. The Choquet integral is one of these methods of aggregation method that follows the minimum variance principle [12] to determine the utility of different alternatives. In this article, we will use the Choquet integral to determine the relevance of synonymity among different concepts. Studies acknowledge the challenges of knowledge extraction in smart city domain due to interoperability problems among semantics of data [5]. A knowledge-based model for smart city services is proposed in [5] where reconciliation processes are used to map static and dynamic data to a smart city ontology. A multi-phase correlation framework is used in [25] to automatically extract non-taxonomic relations from smart city domain documents. In this study, a semantic graph-based method is combined with context information of terms to identify non-taxonomic relations. Formal concept analysis is a clustering technique used to extract knowledge by constructing formal concept lattices. Formal concept analysis is applied to smart city problems in [7] to extract emergent knowledge in order to design smart city applications and services.

In this paper, we provide a methodology to evaluate smart city-related text written at different levels of abstraction. We accomplish this by first mapping the text into WordNet [19] and then mapping WordNet terms into SUMO concepts. We use the knowledge mapping between WordNet and SUMO provided in [23]. We apply our approach to three texts: United Smart Cities (USC) from the United Nations [8], the United States Smart City fact sheet [31], and the Philadelphia Smart City Road Map [22].

3 Methodology

In this paper, we propose a quantitative method for the formal evaluation of text using the principles of ontological formalization and abstraction. By applying these principles, we can evaluate the similarities and differences among initiatives by associating different concepts with specific areas of the smart city topic. For example, a sentence like “US will invest in new grants to build a research infrastructure for smart cities” should be more relevant to the area of infrastructure than to the area of environment. The methodology is summarized in Algorithm 1, which is shown in Table 1. We explain the algorithm in detail in the following subsections.

The first step in our procedure is to generate a dictionary related to smart cities. We focus on the frequency of words commonly used by domain experts. For this article, we crawled 12 textbooks related to the area of smart cities and built a custom dictionary, as shown in lines 1–3 of the algorithm. To apply our methodology, we also need other general information such as the general frequency of English words, the root form of words, and the similarity of English words. For word inflection, we used the Automatically Generated Inflection Database [3]; for English word frequency, we used the n-grams data [21]; and for word similarity, we used the DIRECT system [13].
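The dictionary-building step can be illustrated with a minimal sketch. It assumes a naive tokenizer and a two-sentence stand-in corpus; the function name `relative_frequencies` and the sample texts are hypothetical and are not taken from the authors' implementation, which also relies on inflection and n-gram resources not reproduced here.

```python
import re
from collections import Counter


def relative_frequencies(texts):
    """Return {word: relative frequency} over all texts (lowercased words)."""
    counts = Counter()
    for text in texts:
        # Naive tokenizer (assumption): runs of ASCII letters, no lemmatization.
        counts.update(re.findall(r"[a-z]+", text.lower()))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.items()}


# Hypothetical miniature "domain corpus" standing in for the crawled textbooks.
domain_texts = ["Smart city transportation", "Smart grid and city safety"]
domain_freq = relative_frequencies(domain_texts)
```

Comparing such domain frequencies with general English frequencies is what later allows a term to be judged as more or less typical of the smart city context.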

3.1 Triplet Extraction

For each document, we extract the relevant triplets (lines 4–6 in Algorithm 1). Triplet extraction is performed using the Stanford NLP package [16]. Triplets were extracted with the open information extraction algorithm [9], which generates triplets from maximally shortened sentence fragments. These triplets were further reduced by separating typed dependencies such as adverbial or adjectival modifiers. Table 2 shows an example of triplet extraction for a sentence in the US Smart City fact sheet [31]. At each stage of the process, the sentence is simplified until all the dependencies are accounted for and removed.

Table 1 Algorithm for document relevance analysis

Named entity recognition [6] identifies countries, cities, organizations, currency, etc. In our example, we have identified “US” as an instance of a country and generated the subject–action–object triplet “Country has instance US.” We subsequently replace all the instances of “US” with “country.” Further, we identify compound words, adjectives, adverbs, and other modifiers present in each sentence. Finally, we identify other triplets, such as “country invest more,” using the open information extraction method [1].
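The entity-abstraction step can be sketched as follows. This is a simplified illustration that assumes the named entities have already been recognized and are passed in as (surface text, class) pairs; the function `abstract_entities` is hypothetical and uses plain string replacement instead of the Stanford NLP pipeline.

```python
def abstract_entities(sentence, entities):
    """Replace recognized entities by their class and emit 'has instance' triplets.

    `entities` is a list of (surface_text, class_name) pairs, assumed to come
    from a named entity recognizer that is not part of this sketch.
    """
    triplets = []
    for surface, class_name in entities:
        triplets.append((class_name.capitalize(), "has instance", surface))
        # Plain substring replacement (assumption): adequate for this example,
        # but it would also match the surface text inside longer words.
        sentence = sentence.replace(surface, class_name)
    return sentence, triplets


simplified, found = abstract_entities(
    "US will invest $35 million in grants for Smart Cities",
    [("US", "country"), ("$35 million", "money"), ("Smart Cities", "city")],
)
```

The returned sentence contains only class names, so later steps operate on general concepts rather than on individual instances.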

Table 2 Example of concept extraction from a sentence using natural language processing techniques

| NLP task | Text | Triplets |
|---|---|---|
| Name entity recognition | US will immediately invest more than $35 million in new grants and $10 million in proposed investments to build a research infrastructure for Smart Cities by the National Science Foundation and National Institute of Standards and Technology | Country has instance US; Money has instance $35 million; Money has instance $10 million; City has instance Smart City |
| Determinant removal | Country will immediately invest more than money in new grants and money in proposed investments to build a research infrastructure for city by the organization | |
| Compound word identification | Country will immediately invest more than money in new grants and money in proposed investments to build research infrastructure for city by organization | Research relates to infrastructure |
| Adjective identification | Country will immediately invest more than money in new grants and money in proposed investments to build infrastructure for city by organization | New characterize grant |
| Adverb identification | Country will immediately invest more than money in grants and money in investments to build infrastructure for Cities by organization | Country invest immediately |
| Modifiers identification | Country will invest more than money in grants and money in investments to build infrastructure for city by organization | More relates to money; Money relates to grant; Money relates to investment; Money relates to infrastructure; Infrastructure relates to city; Infrastructure relates to organization |
| Other | Country will invest more | Country invest more |

Fig. 1 Semantic network generated from the example sentence

3.2 Triplet Preprocessing

The next stage is to apply further preprocessing to all the components of a triplet, as shown in lines 9–15 of Algorithm 1. This consists of action abstraction and dictionary base conversion of subjects and objects. The abstraction of a triplet action is performed using the WordNet lexical classes [19]. For example, the action in the triplet “Country invest immediately” describes a change in possession of money, and hence the action type has class “possession.” On the other hand, the action in the “Research relates to infrastructure” triplet describes a state of facts. As seen in the paragraph above, the object in the triplet “Country invest immediately” is not a noun but rather an adverb. It may not have an equivalent in the SUMO ontology, so we need to convert it to its noun form by identifying its dictionary root. By using this procedure, the triplet is converted to “Country-possession-imminence.” The result of this stage is shown in Fig. 1, which is a semantic network generated by combining the triplets in a visual graph.
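Combining triplets into a semantic network can be sketched with a plain adjacency structure. The helpers `build_network` and `density` are hypothetical names, and the density formula (edges divided by the number of possible directed edges) is an assumption, since the paper does not state which definition it uses.

```python
from collections import defaultdict


def build_network(triplets):
    """Map each subject to the set of (action, object) pairs leaving it."""
    graph = defaultdict(set)
    for subject, action, obj in triplets:
        graph[subject].add((action, obj))
        graph[obj]  # touching the key registers objects as nodes as well
    return dict(graph)


def density(graph):
    """Directed graph density: edges / (n * (n - 1)); an assumed definition."""
    nodes = len(graph)
    edges = sum(len(targets) for targets in graph.values())
    return edges / (nodes * (nodes - 1)) if nodes > 1 else 0.0


network = build_network([
    ("money", "relates to", "grant"),
    ("money", "relates to", "investment"),
    ("infrastructure", "relates to", "city"),
])
```

With this representation, merging several concepts into one more general concept reduces the number of nodes, which is one way a network can become smaller and denser at the same time.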

3.3 Concept Mapping into the SUMO Ontology

Mapping concepts into the SUMO ontology is performed in three steps, as shown in lines 12–20 in Algorithm 1. First, we perform query expansion to find similar words using the DIRECT system. Next, we remove the irrelevant synonyms using the Choquet integral to find the domain-specific relevant terms. Finally, we map the entailed words into SUMO using the approach in [17] and retain the most relevant mapping. Our goal is to map our terms at the highest level of granularity, so that we provide the highest level of information detail. This is realized by giving a higher weight to mappings into SUMO concepts at a greater depth in the ontology. This allows us more flexibility in presenting the data to the analyst at levels ranging from specific to very abstract representations. For more information on mapping, the reader is directed to [4].

Table 3 shows an example of mapping the term “money” into SUMO. As seen from this table, the term “money” has seven related terms, including itself, with different degrees of similarity. Each of these similar terms has some term frequency both in the general English language and in the smart city-related literature. These are used as input for determining their relevance to the context using the Choquet integral. The result shows that only two terms are relevant to our case: “investment,” which can be mapped into the SUMO concept “process” using a subsumption relation, and “money,” which can be mapped into the SUMO concept “currency” using an equivalence relation.

Fig. 2 shows the semantic network after it was mapped into the related SUMO concepts. In this figure, the network was both formalized into a formal ontology and abstracted to contain only general usage terms. As a result, the network size is decreased, and its density is increased.

Table 3 Example of determining the SUMO equivalence for the term “money”

| Entailed word | Similarity | Term frequency | Contextual frequency | Choquet | Relevant | SUMO equivalent | SUMO relation |
|---|---|---|---|---|---|---|---|
| Money | 1 | 0.0002 | 0.000736 | 0.0338 | Yes | Currency | Equivalence |
| Investment | 1 | 0.000069 | 0.001409 | 0.0221 | Yes | Process | Subsumed |
| Budget | 0.798 | 0.000066 | 0.00038 | 0.0045 | No | | |
| Resource | 0.320 | 0.000031 | 0.00100 | 0.0023 | No | | |
| Expense | 0.611 | 0.000016 | 0.000097 | 0.00022 | No | | |
| Currency | 0.432 | 0.000023 | 0.000011 | 0.00002 | No | | |
| Funding | 1 | 0.000087 | 0.000026 | 0.00052 | No | | |

Fig. 2 Semantic network mapped into the SUMO ontology
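The aggregation used to rank entailed words can be illustrated with a generic discrete Choquet integral. This is only a sketch: the criteria names, the example scores, and the capacity values are hypothetical, the scores are assumed to be rescaled to [0, 1], and the capacity actually used to produce the values in Table 3 is not given in the text.

```python
from itertools import combinations


def choquet_integral(scores, capacity):
    """Discrete Choquet integral of non-negative `scores` (criterion -> value)
    with respect to `capacity` (frozenset of criteria -> weight in [0, 1])."""
    ordered = sorted(scores, key=scores.get)  # criteria by ascending score
    total, previous = 0.0, 0.0
    for index, criterion in enumerate(ordered):
        # Coalition of all criteria scoring at least as high as this one.
        coalition = frozenset(ordered[index:])
        total += (scores[criterion] - previous) * capacity[coalition]
        previous = scores[criterion]
    return total


def additive_capacity(weights):
    """Capacity in which a coalition weighs the sum of its members' weights."""
    names = list(weights)
    return {
        frozenset(subset): sum(weights[name] for name in subset)
        for size in range(1, len(names) + 1)
        for subset in combinations(names, size)
    }


# Hypothetical, rescaled criteria values for one entailed word.
scores = {"similarity": 0.8, "term": 0.2, "context": 0.5}
additive = additive_capacity({"similarity": 0.5, "term": 0.2, "context": 0.3})
# A capacity that rewards only the full coalition turns the integral into min().
strict = {subset: (1.0 if len(subset) == 3 else 0.0) for subset in additive}
```

With an additive capacity the integral reduces to a weighted mean; non-additive capacities are what let the method model interaction among criteria, for example by demanding that a word score well on all criteria at once.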

We apply this procedure to each text under investigation and generate a semantic map for each document. Each map is then simplified using PathFinder networks [28].

4 Evaluation

Using the procedure above, we have generated a conceptual map for each of the three documents. The generated maps varied in size and level of detail. For example, the map generated from the UN document contained 61 concepts and 130 action edges. By comparison, the US policy document contained 104 concepts and 208 action edges. The most complex map was the one generated for Philadelphia (PHL), with 114 concepts and 285 action edges. The granularity of the concepts varied across the three maps as well. Fig. 3 shows a histogram of the depth of concepts in the SUMO ontology. The PHL map contains concepts with a higher depth in the ontology, which suggests that decision makers at this level are more detail oriented. Similarly, the concepts in the UN document have a lower depth in the ontology, which is consistent with their role of guiding organizations.

Fig. 3 Histogram of concept depth in the SUMO ontology for the three smart city policies

Fig. 4 shows the histogram of actions that were used in each document. The documents from the US and PHL align very well in their distribution of actions, which hints at a coordination of efforts between the two entities. The UN document differs in many aspects. For example, the UN document focuses more on communication and less on consumption, which hints at a higher degree of operational power of regional initiatives. Also, the UN document focuses more on creation and possession and less on existing state, which is typical for broad-spectrum policies.

Fig. 4 Histogram of action types for each of the three smart city policies

Since the three networks were complex, we applied two postprocessing techniques to each. First, we abstracted each network by walking the subclass relations in the SUMO ontology toward the root. For example, the concept “Passenger Vehicle” has the following parent nodes, in order of generalization: “Vehicle,” “Transportation Device,” “Device,” “Artifact,” etc. In this analysis, we have generalized each concept to at least level five in the SUMO ontology. This resulted in a reduction of 67%, 74%, and 75% in the number of concepts for the UN, the US, and PHL maps, respectively. Further, we applied the PathFinder network analysis to the resulting map to preserve only the most relevant action edges.

Fig. 5 shows partial views of the overlapped maps that reveal interesting similarities and approaches of the three organizations. In Fig. 5a, we can see that, in the UN document, the relationship between artifact and human is central but abstract. The US document adds more detail to the relationship by including specific systems to be used and how humans interact with them. Further, the PHL document adds information about social role and language to this relationship. A similar pattern can be observed for transportation in the context of smart city, as shown in Fig. 5b. While the UN document considers it a relevant but abstract topic, the US document adds more details by including collections. Most detail on the topic of transportation can be found in the PHL document, where financial and organizational information is added to the policy.

Fig. 5 Example segments of conceptual map that overlap over the three smart city policies: a shows the relationship between artifact and human and b shows the existing relationships for transportation
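The generalization step described in this section can be sketched as a walk along parent links. The parent table below is a hypothetical fragment: the chain from “Passenger Vehicle” to “Artifact” follows the example in the text, while the levels above “Artifact” are an assumption made for illustration. The sketch also assumes that generalizing “to at least level five” means replacing any concept deeper than level five by its level-five ancestor.

```python
# Hypothetical fragment of a subclass hierarchy (child -> parent).
PARENT = {
    "Passenger Vehicle": "Vehicle",
    "Vehicle": "Transportation Device",
    "Transportation Device": "Device",
    "Device": "Artifact",
    "Artifact": "Object",      # assumed upper levels, for illustration only
    "Object": "Physical",
    "Physical": "Entity",
}


def generalize(concept, parent=PARENT, max_depth=5):
    """Return the ancestor of `concept` at depth `max_depth` (root depth is 0),
    or `concept` itself if it is not deeper than `max_depth`."""
    chain = [concept]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    depth = len(chain) - 1  # number of steps from the concept to the root
    return chain[depth - max_depth] if depth > max_depth else concept
```

Because several specific concepts share the same ancestor, applying such a function to every node and merging duplicates is one way the number of concepts in a map can shrink substantially.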

5 Conclusions and Future Work

In this article, we developed a model to represent domain-specific knowledge using semantic maps. These maps consist of subject–action–object triplets extracted from text, which are further formalized and abstracted to the SUMO ontology. We applied this model to smart city initiatives at three levels of organizational authority: international, country, and city. Our evaluation shows how each level addresses the subject differently by using different levels of abstraction and focusing on different aspects of the smart city topic.

Our future work includes an expert-in-the-loop assessment with questionnaires to compare our results with other methods and to evaluate the impact of document size on the relevance of our approach. We also plan to work on a more in-depth evaluation of smart city policy development procedures using mereotopology principles. We will evaluate the smart city information topology at different mereological levels to identify gaps or overlaps in policies. The knowledge gained from this process can be used to address policy areas that need more focus for a consistent allocation of financial and human effort. We plan to apply the model to other domain-specific areas such as the analysis of security or privacy characteristics of complex systems. This can be done by aligning text generated by organizations to domain-specific ontologies using the principles explained in this article.

References

1. Angeli, G., Premkumar, M.J., Manning, C.: Leveraging linguistic structure for open domain information extraction. In: Proceedings of the Association of Computational Linguistics (ACL) (2015)
2. Angelidou, M.: Smart city policies: a spatial approach. Cities 41, S3–S11 (2014)
3. Aspell 0.60.8 released. GNU Aspell Website. Kevin Atkinson. Accessed 22 November 2019
4. Barb, A.: Technical debt elicitation in text using natural language processing techniques. In: SERP 2016, Las Vegas, July 25–28, 2016 (2016)
5. Bellini, P., Benigni, M., Billero, R., Nesi, P., Rauch, N.: Km4City ontology building vs data harvesting and cleaning for smart-city services. J. Vis. Lang. Comput. 25, 827–839 (2014)
6. Dingare, D., Finkel, J., Nissim, M., Manning, C., Grover, G.: A system for identifying named entities in biomedical text: how results from two evaluations reflect on both the system and the evaluations. In: The 2004 BioLink Meeting: Linking Literature, Information and Knowledge for Biology at ISMB (2004)
7. De Miguel-Rodriguez, J., Galan-Paez, J., Aranda-Corral, G., Borrego-Diaz, J.: Reasoning as a bridge from data city towards smart city. In: IEEE Conference on Ubiquitous Intelligence & Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People, and Smart World Congress, pp. 968–974 (2016)
8. Eik, K.A.: United Smart Cities (USC)-United Nations Partnership for SDGs Platform. United Nations, 26 October 2015 (2015). https://sustainabledevelopment.un.org/partnership/?p=10009
9. Gabor, A., Premkumar, M.J., Manning, C.D.: Leveraging linguistic structure for open domain information extraction. In: Proceedings of the Association of Computational Linguistics (ACL) (2015)
10. Gangemi, A., Guarino, N., Masolo, C., Oltramari, A., Schneider, L.: Sweetening ontologies with DOLCE. In: Gomez-Perez, V., Benjamins, V. (eds.) Proceedings of the 13th International Conference on Knowledge Engineering and Knowledge Management, vol. 2473, pp. 166–181. Springer (2002)
11. Gruber, T.: A translation approach to portable ontology specifications. Knowl. Acquis. 5, 199–220 (1993)
12. Kojadinovic, I., Marichal, J.-L., Roubens, M.: An axiomatic approach to the definition of the entropy of a discrete Choquet capacity. Inf. Sci. 172, 131–153 (2005)
13. Kotlerman, L., Dagan, I., Szpektor, I., Zhitomirsky-Geffet, M.: Directional distributional similarity for lexical inference. Nat. Lang. Eng. 16(4), 359–389 (2010)
14. Levin, B.: English Verb Classes and Alternations: A Preliminary Investigation. University of Chicago Press, Chicago, IL (1993)
15. Magro, E., Wilson, J.R.: Complex innovation policy systems: towards an evaluation mix. Res. Policy 42, 1647–1656 (2013)
16. Manning, C.D., Surdeanu, M., Bauer, J., Finkel, J.R., Bethard, S., McClosky, D.: The Stanford CoreNLP natural language processing toolkit. In: Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 55–60 (2014)
17. Marichal, J.L.: An axiomatic approach of the discrete Choquet integral as a tool to aggregate interacting criteria. IEEE Trans. Fuzzy Syst. 8, 800–807 (2000)
18. Mayangsari, L., Novani, S.: Multi-stakeholder co-creation analysis in smart city management: an experience from Bandung, Indonesia. Procedia Manuf. 4, 315–321 (2015)
19. Miller, G.A., Fellbaum, C.: WordNet then and now. Lang. Resour. Eval. 41, 209–214 (2007)
20. Miller, B.: When is consensus knowledge based? Distinguishing shared knowledge from mere agreement. Synthese 190(7), 1293–1316 (2013)
21. N-grams data: N-grams: based on 520 million word COCA corpus. United Nations, 15 October 2019 (2019). https://www.ngrams.info/intro.asp
22. Office of Innovation and Technology: SmartCityPHL Roadmap. City of Philadelphia, 04 February 2019 (2019). https://www.phila.gov/media/20190204121858/SmartCityPHL-Roadmap.pdf
23. Pease, A., Niles, I.: IEEE standard upper ontology: a progress report. Knowledge Engineering Review, Special Issue on Ontologies and Agents, vol. 17 (2002)
24. Pease, A.: Ontology: A Practical Guide. Articulate Software Press, Angwin, CA (2011)
25. Qiu, J., Chai, Y., Liu, Y., Gu, Z., Li, S., Tian, Z.: Automatic non-taxonomic relation extraction from big data in smart city. IEEE Access 6, 74854–74864 (2018). Special Section on Collaboration for Internet of Things
26. Reed, S.K.: A taxonomic analysis of abstraction. Perspect. Psychol. Sci. 11(6), 817–837 (2016)
27. Rodick, D.W.: Process re-engineering and formal ontology: a deweyan perspective. Philos. Soc. Crit. 41(6), 557–576 (2015)
28. Schvaneveldt, R.W.: Pathfinder Associative Networks: Studies in Knowledge Organization. Ablex, Norwood, NJ (1990)
29. Smith, B.: Basic concepts of formal ontology. In: Guarino, N. (ed.) Formal Ontology in Information Systems, pp. 19–28. IOS Press, Beijing (1998)
30. Trindade, E.P., Hinnig, M.P.F., Moreira da Costa, E., Marques, J.S., Bastos, R.C., Yigitcanlar, T.: Sustainable development of smart cities: a systematic review of the literature. J. Open Innov. Technol. Market Complex. 3(11) (2017)
31. The White House, Office of the Press Secretary: FACT SHEET: Administration Announces New “Smart Cities” Initiative to Help Communities Tackle Local Challenges and Improve City Services. The White House–President Barack Obama, The White House, September 14, 2015 (2015). https://obamawhitehouse.archives.gov/the-press-office/2015/09/14/fact-sheet-administration-announces-new-smart-cities-initiative-help
32. Valaski, J., Malucelli, A., Reinehr, S.: Ontologies application in organizational learning: a literature review. Expert Syst. Appl. 39(8), 7555–7561 (2012)
33. Woelert, P.: Governing knowledge: the formalization dilemma in the governance of the public sciences. Minerva 53(1), 1–19 (2015)

Assistance App for a Humanoid Robot and Digitalization of Training Tasks for Post-stroke Patients

Peter Forbrig, Alexandru Bundea, and Thomas Platz

Abstract The number of people affected by stroke has increased considerably over the last decades. Unfortunately, there are not enough therapists to meet the demand for specific training for stroke survivors. Within the project E-BRAiN (Evidence-based Robot Assistance in Neurorehabilitation), we want to develop software that allows a social humanoid robot to give instructions for carefully selected exercises, to observe how they are performed, and to provide feedback. In addition, the robot should motivate patients to continue their training tasks. In this paper, we focus on the Arm Ability Training (AAT). Conventionally, some AAT exercises are performed with paper and pencil under the supervision of therapists. To improve the collaboration between a patient and a humanoid robot, the training tasks have to be digitalized. The digitalization of one training task using a tablet computer is discussed. This allows the social humanoid robot to obtain information about the training performance. The robot can provide instructions and show pictures and videos. In addition, the humanoid robot can comment on the training results and can aid and motivate. Comments on the long-term results of the training tasks in relation to the individual goals of stroke survivors can be provided as well.

P. Forbrig (B) · A. Bundea
Department of Computer Science, University of Rostock, Albert-Einstein-Str. 22, 18055 Rostock, Germany
e-mail: [email protected]
A. Bundea
e-mail: [email protected]

T. Platz
Neurorehabilitation Research Group, University Medical Center Greifswald, Fleischmannstraße 44, 17475 Greifswald, Germany
e-mail: [email protected]
Spinal Cord Injury Unit, BDH-Klinik, Center for Neurorehabilitation, Ventilation and Intensive Care, Karl-Liebknecht-Ring 26a, 17491 Greifswald, Germany

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
A. Zimmermann et al. (eds.), Human Centred Intelligent Systems, Smart Innovation, Systems and Technologies 189, https://doi.org/10.1007/978-981-15-5784-2_4


1 Introduction

Robots of different styles exist in our modern world. In domains like industrial production [6] or healthcare [1], a lot of work is already done by robots. The reader is surely aware of many other application domains. Robots are very functional in factories on production lines. They are constructed for a specific purpose. They are obviously machines and might have arms like cranes. Such robots do not look like animated beings. Hence, there is no desire to communicate with such robots, and humans do not have empathy with them. However, some robots look like animals and might then be regarded as pets. Communication and interaction are especially important for elderly people with dementia, and robots that resemble creatures might play a role in stimulating social interaction [3]. Robots that look like humans are characterized as humanoid robots. Socially interactive robots of this kind, like “Pepper” [14], are called social humanoid robots.

We want to use Pepper to support patients after a stroke with their training, which aims to restore brain function. According to [2], every 40 s someone in the United States has a stroke. Every 4 min, someone dies of a stroke. Fortunately, many stroke patients survive. However, they have to cope with functional deficits resulting from brain damage. Often one arm is disabled, which creates a lot of problems in daily life. Recent clinical research has delineated opportunities to train the brain in such a way that patients can recover. Applying a social humanoid robot for this purpose is the aim of our project E-BRAiN (Evidence-based Robot Assistance in Neurorehabilitation).

The clinical effectiveness of the selected Arm Ability Training (AAT) is discussed and shown by Platz and Lotze [7]. We will provide an overview of the exercises and discuss the digitalization of one of them. Additionally, there exist exercises that are executed in front of a mirror, as discussed in [7]. These exercises give the patient the impression that the handicapped arm performs specific movements. This results in changes in the brain such that the ability of the handicapped arm increases. The corresponding application for Pepper is discussed in more detail in [4, 5].

2 Arm Ability Training (AAT)

The AAT was designed to promote the recovery of manual dexterity in stroke patients who have mild to moderate arm paresis [9]. Platz and Lotze report in [10] on its design, clinical effectiveness, and the neurobiology of actions. The idea of the AAT goes back to the identification of sensorimotor deficits of stroke survivors in [11, 12]. Figure 1 provides an overview of the suggested training activities.

Fig. 1 Training tasks of AAT (From [10])

Each of the eight training tasks has to be practiced four times for 1 minute during a training session, and sessions are scheduled for each weekday, usually over 3 weeks. The training is supervised by a therapist on a 1:1 basis. This kind of training is very time-consuming for therapists. Even more important is the fact that there are not enough therapists to provide the therapy to all stroke survivors who need it and would benefit from it. While the training is clinically effective, its widespread application is not feasible because of limited therapeutic resources, and hence functional recovery cannot be supported to the degree that would be medically possible. A social humanoid robot like Pepper might be able to offer additional training sessions once a patient has been instructed and supervised by a therapist at the beginning of the training series. In those further sessions, Pepper could substitute for the therapist to a certain degree and assist stroke survivors during training sessions. Motivating the patient is the most important aspect. Otherwise, patients often do not perform their training. This is similar to the need for a fitness trainer for general exercises. The project will explore whether a social humanoid robot like Pepper can build a good relationship with a patient and can motivate patients to continue their efforts. It is not our intention to replace therapists. Therapists need to assess patients, explore the individual treatment goals, select the appropriate training, and introduce the training exercises to patients. They also need to supervise the first cycles of training performance. Once this level of therapeutic care is achieved, there might be a role for social robots in supporting additional, largely standardized training sessions.

Fig. 2 Part of a document of the CANCELLATION training tasks

Individualized therapeutic goals are best expressed in a positive way, e.g., “I want to be skilled enough to perform fine motor tasks” instead of “I do not want to have to ask for help when I want fine motor tasks to be done.” In the motivational part of the communication, the social humanoid robot could come back to these goals. There could be spoken sentences like: “You are really on the right track to reach your goal to become as skillful with fine motor tasks as you used to be. Some more of the just performed training tasks and you will be there!”

To be able to evaluate this approach, it is necessary to implement an app for the robot. Within our project, we plan to support all eight training tasks mentioned in Fig. 1. In this paper, we focus on task 3 (CANCELLATION), where symbols have to be crossed out with a pencil. Fig. 2 presents a part of a document that was used for a training task. For this exercise, a patient has to cross out as many symbols as possible within 1 minute. The provided document contains only a symbolic example. It shows the crossing out of symbols in one line and the beginning of the second line. A patient has to start at the upper left corner of a document. Symbols are crossed out from left to right. If a symbol is not crossed out well, the therapist asks the patient to repeat the crossing out until it is correct. After finishing the first line, a patient has to cross out symbols from right to left in the second line and from left to right again in the following third line (see, e.g., the second line in Fig. 2). After the time limit of 1 minute for each task, there is a short break. This training task has to be repeated another three times. During the training, it is not allowed to put the hand on the paper or the table. The arm has to be moved freely in the air. This makes the training task more difficult and is a greater challenge for the brain. Therefore, the chances of good training results increase. Another, easier version of the same training task uses a document with only the capital letter O as symbol. It can be used as a “fall back” option in case a patient’s capabilities are too low to cope with the more demanding version.
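The alternating direction in which symbols have to be crossed out can be written down as a short sketch. The function `serpentine_order` and the grid coordinates are hypothetical; the sketch only assumes that the symbols are arranged in a regular grid of rows and columns.

```python
def serpentine_order(rows, cols):
    """Return (row, column) targets: even rows run left to right,
    odd rows right to left, starting at the upper left corner."""
    order = []
    for row in range(rows):
        columns = range(cols) if row % 2 == 0 else range(cols - 1, -1, -1)
        order.extend((row, col) for col in columns)
    return order
```

A digital version of the task could use such a sequence to decide which symbol is the next valid target.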

Before we discuss the digitalization of the training task, we introduce the social humanoid robot Pepper in more detail.

3 The Humanoid Robot Pepper

Pepper is a social humanoid robot from the company SoftBank Robotics [14]. On the website of SoftBank Robotics, Pepper is advertised with the following sentences: “Pepper is a robot designed for people. Built to connect with them, assist them, and share knowledge with them, while helping your business in the process. Friendly and engaging, Pepper creates unique experiences and forms real relationships.” Additionally, it is mentioned: “Pepper brings the digital world and the real world together like never before.” Further, SoftBank Robotics claims that Pepper has an emotion engine: “Pepper has been optimized for engaging with people through conversation and the touch screen. Pepper is an assistant capable of recognizing faces and basic human emotions to welcome, inform, and entertain people in an innovative way.”

Pepper is already used in shopping centers, railway stations, and airports to support customers by providing information (Fig. 3). Pepper has a very nice facial expression, looks at the people it focuses on, can talk, can understand natural language, and can move around. The robot can also blink with its eyes and ears. Additionally, Pepper can move its arms and fingers. It has a lot of sensors for analyzing the environment.

We assume that a socially interactive robot can be helpful for stroke survivors when performing their individual training and hence for their recovery. However, the feasibility of using a humanoid robot for neurorehabilitation has yet to be analyzed within an appropriate research setting. Our E-BRAiN project has the objective of implementing neurorehabilitation training in a digital form using humanoid robot technology. It will show whether Pepper really can bring together the digital world and the real world in the domain of rehabilitation. Even if this does not work for all patients but only for some groups, Pepper would be helpful in the future.

4 Digitalization of the Training Task CANCELLATION

The social humanoid robot Pepper has many sensors. However, it is very difficult to analyze the activities of a patient on paper using the built-in technology of Pepper. Therefore, it makes sense to use digital devices instead. We decided to use tablets. Because Pepper operates with the Android operating system, we chose tablets with the same operating system.


Fig. 3 Humanoid robot Pepper from SoftBank Robotics [14]

4.1 Designing the Tablet App

The following challenges were identified:

• When is a symbol O correctly crossed out?
• How does Pepper react to mistakes?
• How are mistakes visualized on a tablet?
• How can we detect that the hand is not resting on the tablet?

At the top of the screenshot in Fig. 4, one can see the remaining time (50 s) and the number of cross outs (8). The first eight cross outs were correct, while the ninth is not correct. A patient has to repeat an incorrect cross out until it is correct. The symbol that has to be targeted next is highlighted, unless the last attempt was incorrect. In that case, it is marked red. Below the third line, an arrow is visible. It shows the result of the previous training. It has to be evaluated whether such information really supports the training.

Fig. 4 Screenshot from the tablet application

Currently, mistakes are only reported by the tablet application. The robot does not comment on them; Pepper only informs about the result and relates it to previous exercises. This is combined with some motivational sentences that are selected arbitrarily. It is planned for the future that personal goals of the patient are collected and that the comments are related to the progress toward these goals. Additionally, the whole context will be analyzed for machine learning aspects of the dialogue.

Let us come back to the technical details of the crossing out tablet application. For a crossing out, the number of pixels on each side of the line inside the symbol is counted. The minimal percentage of pixels on each side currently has to be one percent. However, this can be changed. From our clinical cooperation partners from neurorehabilitation, we got the information that some white area has to be visible on both sides of the line within a symbol O. This is the rule for therapists to accept the crossing out in the analogue world. In the digital world, we count the pixels on both sides of the line within an O. However, on high-resolution displays a single pixel is not visible, so one pixel is not enough. Therefore, we calculate the percentage of visible pixels on each side. If this percentage is larger than one percent on both sides, the crossing out is accepted. We started with three percent, but recognized that this might be too strict. One percent seems to be appropriate. However, this can be changed if the evaluation of exercises with patients asks for changes. Additionally, extreme cases of crossing out have to be covered by the application as well. Consider the examples in Fig. 5.
According to the requirements provided by our experts, both examples should be accepted. Obviously, one can see enough white space on both sides of the line inside the Os. However, did the experts from medicine really have such examples in mind? Maybe no patient has ever crossed an O in such a way. Nevertheless, we want our application to be as correct as possible. We decided to provide another constraint for a correct crossing out. We specified that the points where the crossing line crosses the letter have to be a certain distance apart. Like the minimal percentage of pixels, this value is flexible and the application can be configured accordingly. Evaluations will show what works best.

Fig. 5 Two examples of extreme crossing out of an O

Start points and end points of lines are only allowed within a certain vicinity of a symbol. Therefore, it is not possible to cross out several symbols with one straight line. Additionally, it is checked whether touch points are recognized within a certain distance of the current symbol. In this case, it is assumed that the hand lies on the display of the tablet and an error is reported.

It is our opinion that Pepper should not react to mistakes directly because the robot should mainly be associated with positive aspects. In this way, patients might be able to establish a positive relation to Pepper. Only if the patient has big problems continuing should the robot react. This leads us to the discussion of the human to Pepper interaction in the following section.
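The two acceptance constraints described above (a minimal share of pixels on each side of the line, and a required distance between the two crossing points) can be illustrated with a short Python sketch. It is not the code of the tablet application. It assumes that the symbol O is a filled disc on an integer pixel grid, that the stroke is idealized as a straight line through two touch points, and that the distance constraint is a lower bound. The function names and the parameters min_fraction and min_chord are hypothetical.

```python
import math

def side_fractions(cx, cy, r, p0, p1):
    """Share of the O's interior pixels on each side of the line p0 -> p1."""
    (x0, y0), (x1, y1) = p0, p1
    left = right = 0
    for y in range(math.floor(cy - r), math.ceil(cy + r) + 1):
        for x in range(math.floor(cx - r), math.ceil(cx + r) + 1):
            if (x - cx) ** 2 + (y - cy) ** 2 > r * r:
                continue  # pixel lies outside the symbol
            # The sign of the 2D cross product tells on which side the pixel is.
            cross = (x1 - x0) * (y - y0) - (y1 - y0) * (x - x0)
            if cross > 0:
                left += 1
            elif cross < 0:
                right += 1
            # Pixels exactly on the line are treated as covered by the stroke.
    total = left + right
    if total == 0:
        return 0.0, 0.0
    return left / total, right / total

def chord_length(cx, cy, r, p0, p1):
    """Distance between the two points where the line crosses the circle."""
    (x0, y0), (x1, y1) = p0, p1
    length = math.hypot(x1 - x0, y1 - y0)
    if length == 0:
        return 0.0
    # Perpendicular distance from the circle centre to the line.
    d = abs((x1 - x0) * (cy - y0) - (y1 - y0) * (cx - x0)) / length
    return 2 * math.sqrt(r * r - d * d) if d < r else 0.0

def is_crossed_out(cx, cy, r, p0, p1, min_fraction=0.01, min_chord=0.0):
    """Accept if both sides exceed min_fraction and the chord is long enough."""
    left, right = side_fractions(cx, cy, r, p0, p1)
    if min(left, right) <= min_fraction:
        return False
    return chord_length(cx, cy, r, p0, p1) >= min_chord
```

With a symbol of radius 20 pixels centred at (50, 50), a line through the centre is accepted, while a line that only grazes the edge leaves far less than one percent of the pixels on one side and is rejected. Raising min_chord rejects strokes whose crossing points lie too close together, which is one possible reading of the additional constraint.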

4.2 Designing the Human to Robot Interaction

We already mentioned that the training tasks are introduced by therapists to patients. Nevertheless, Pepper should provide an introduction as well. This should definitely be presented in the first training session with the robot and could be optional in the following ones. Currently, a patient is asked whether an introduction is wanted. In the future, this might be based on previous training results. In addition, a video is presented on the tablet of the robot that shows an example of the training task. We have to analyze whether the video should be managed like the introduction or whether it should be shown each time. In the second case, one has to analyze whether such a video can distract from the training task. However, if Pepper recognizes a big problem in the task performance of the patient, a hint to look at the video might be useful. Another possibility is to skip the current symbol that causes problems (an example for the introduced training task “cancellation”). In this case, the robot should say something like: “There is a problem with the current symbol. I have caused the skipping of the problematic letter. Please continue with the next one.”


The motivation of the patient is the most important issue. Experience shows that success in the training tasks is hard for patients to achieve. Therefore, they might at times question whether the training will deliver the expected result. Accordingly, Pepper should point out success even when it is minimal. From the psychologists in our project team, we got the hint that Pepper should refer to the personal goals of the patient to support the motivation to engage in the training. Examples could be:

• We are on a very good track for you to reach your goal to become again skillful with fine motor tasks.
• I am sure that only few more training sessions are necessary to reach your goal to become as skillful with fine motor tasks as you used to be.
• Today you did a really good job. I am sure we can proceed in the same way.

However, there might also be situations when patients need to be pushed a little bit. In this case, there could be a comment like: “Not bad, yet I am sure that you can perform better. Let us try again.” If Pepper recognizes that a patient is exhausted, it would be possible to suggest some relaxation or the playing of a game. Ideas from recommender systems like [8] could be helpful for controlling the interaction. Maybe the idea of using bio signals [3] is applicable as well.

Currently, Pepper presents on its tablet the results of the last eight exercises as a graph. We will have to investigate whether this graph has to be omitted if the results get worse. Motivational sentences and other suggestions, like taking a break, might be better in this case. However, this has to be studied for different patient groups.

Technically, the communication between Pepper and the tablets is realized using MQTT [13].
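MQTT transports opaque byte payloads on hierarchical topics, so the tablet and the robot have to agree on a message format. The paper does not specify this format. The following Python sketch only illustrates one possibility, a JSON-encoded event for a single crossing out. The topic name and all field names are assumptions made for illustration, and no MQTT client library is used here.

```python
import json

# Hypothetical topic name; the topic layout of the project is not given in the text.
TOPIC = "ebrain/cancellation/events"

def encode_event(symbol_index, correct, remaining_s):
    """Serialize one crossing-out event as a UTF-8 JSON payload (bytes)."""
    payload = {
        "symbol": symbol_index,      # position of the targeted symbol
        "correct": correct,          # result of the acceptance check
        "remaining_s": remaining_s,  # remaining time in seconds
    }
    return json.dumps(payload, sort_keys=True).encode("utf-8")

def decode_event(raw):
    """Parse a payload produced by encode_event back into a dictionary."""
    return json.loads(raw.decode("utf-8"))
```

A subscriber on the robot side could decode such a payload and use the fields to decide whether a comment or a hint is appropriate.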

5 Summary

The paper discussed the idea of an app for a social humanoid robot to assist post-stroke patients in performing their training tasks. The idea of the AAT was briefly discussed. A specific training task, CANCELLATION, was selected to demonstrate the digitalization. It was shown which problems had to be solved and how the tablet application supports the human–robot interaction. It allows a precise analysis of performed actions and a technical communication with a robot based on message exchange. Challenges for further research were outlined. It has to be evaluated whether all patients, certain groups of patients, or no patients accept the collaboration with the social humanoid robot Pepper. Techniques from artificial intelligence might improve the interaction dialogue in natural language in the future.


Acknowledgments This joint research project “E-BRAiN—Evidence-based Robot Assistance in Neurorehabilitation” is supported by the European Social Fund (ESF), reference: ESF/14-BM-A550001/19-A01, and the Ministry of Education, Science and Culture of Mecklenburg-Vorpommern, Germany. This work was further supported by the BDH Bundesverband Rehabilitation e.V. (charity for neurodisabilities) by a non-restricted personal grant to TP. The sponsors had no role in the decision to publish or any content of the publication. Additionally, we want to thank the students Lukas Großehagenbrock, Jonas Moesicke, and Joris Thiele for implementing the tablet application for CANCELLATION.

References

1. Ahn, H.S., Choi, J.S., Moon, H., Lim, Y.: Social human-robot interaction of human-care service robots. In: Companion of the 2018 ACM/IEEE International Conference on Human-Robot Interaction (HRI 2018), pp. 385–386. ACM, New York, NY, USA (2018). https://doi.org/10.1145/3173386.3173565
2. Benjamin, E.J., Blaha, M.J., Chiuve, S.E., et al.: On behalf of the American Heart Association Statistics Committee and Stroke Statistics Subcommittee. Heart disease and stroke statistics–2017 update: a report from the American Heart Association. Circulation 135(10), e146–e603. https://doi.org/10.1161/cir.0000000000000485
3. Cruz-Sandoval, D., Favela, J., Parra, M., Hernandez, N.: Towards an adaptive conversational robot using biosignals. In: Proceedings of the 7th Mexican Conference on Human-Computer Interaction (MexIHC 2018), Article 1, 6 pages. ACM, New York, NY, USA (2018). https://doi.org/10.1145/3293578.3293595
4. Forbrig, P., Bundea, A.-N.: Modelling the collaboration of a patient and an assisting humanoid robot during training tasks. In: HCII 2020, Copenhagen, Denmark, July 19–24, 2020 (2020)
5. Forbrig, P., Platz, T.: Supporting the arm ability training of stroke patients by a social humanoid robot. In: Accepted for IHIET Conference, Lausanne, Switzerland, pp. 1–6 (2020)
6. Forbrig, P.: Challenges in multi-user interaction with a social humanoid robot Pepper. In: Workshop HCI-Engineering, EICS 2019 Conference, Valencia, Spain, pp. 10–17 (2019). http://ceur-ws.org/Vol-2503/
7. Hallam, J.: Haptic mirror therapy glove: aiding the treatment of a paretic limb after a stroke. In: Adjunct Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2015 ACM International Symposium on Wearable Computers (UbiComp/ISWC 2015 Adjunct), pp. 459–464. ACM, New York, NY, USA (2015). https://doi.org/10.1145/2800835.2801648
8. Mahmood, T., Ricci, F.: Learning and adaptivity in interactive recommender systems. In: Proceedings of the Ninth International Conference on Electronic Commerce (ICEC 2007), pp. 75–84. ACM, New York, NY, USA (2007). https://doi.org/10.1145/1282100.1282114
9. Platz, T.: Impairment-oriented training (IOT)–scientific concept and evidence-based treatment strategies. Restor. Neurol. Neurosci. 22(3–5), 301–315 (2004)
10. Platz, T., Lotze, M.: Arm Ability Training (AAT) promotes dexterity recovery after a stroke – a review of its design, clinical effectiveness, and the neurobiology of the actions. Front. Neurol. 9, 1082 (2018). https://doi.org/10.3389/fneur.2018.01082
11. Platz, T., Denzler, P., Kaden, B., Mauritz, K.-H.: Motor learning after recovery from hemiparesis. Neuropsychologia 32(10), 1209–1223 (1994). https://doi.org/10.1016/0028-3932(94)90103-1
12. Platz, T., Prass, K., Denzler, P., Bock, S., Mauritz, K.-H.: Testing a motor performance series and a kinematic motion analysis as measures of performance in high-functioning stroke patients: reliability, validity, and responsiveness to therapeutic intervention. Arch. Phys. Med. Rehabil. 80(3), 270–277 (1999)
13. Pulver, T.: Hands-On Internet of Things with MQTT: Build connected IoT devices with Arduino and MQ Telemetry Transport (MQTT). Packt Publishing Ltd., Birmingham (2019)
14. SoftBank Robotics. https://www.softbankrobotics.com/corp/robots/. Accessed 11 November 2019

A Novel Cooperative Game for Reinforcing Obesity Awareness Amongst Children in UAE

Fatema Alnaqbi, Sarah Alzahmi, Ayesha Alharmoozi, Fatema Alshehhi, Muhammad Talha Zia, Sofia Ouhbi, and Abdelkader Nasreddine Belkacem

Abstract This paper presents a fun cooperative game that makes kids interact with their parents to indirectly educate both of them about the importance of their own choices between healthy and unhealthy food. Game-based learning and the cultivation of informed decision-making were used throughout our proposed game design to achieve the obesity awareness objectives. This paper describes the design, implementation, and evaluation of our proposed cooperative game “ObeseGo”, which was developed to enhance obesity awareness among kids at an early age and to educate them about their choices concerning eating a balanced meal so that they can adopt a healthy lifestyle. A questionnaire survey was conducted to evaluate our game. The results of the survey allowed us to analyze the impact and usefulness of the proposed game and to plan for improvement in future work.

Part of this study was funded by the United Arab Emirates University (SURE Plus grant G00003173).

F. Alnaqbi · F. Alshehhi · S. Ouhbi: Department of Computer Science and Software Engineering, CIT, United Arab Emirates University, Al Ain, UAE
S. Alzahmi · M. T. Zia · A. N. Belkacem (B): Department of Computer and Network Engineering, CIT, United Arab Emirates University, Al Ain, UAE
A. Alharmoozi: Department of Information Technology, CIT, United Arab Emirates University, Al Ain, UAE

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 A. Zimmermann et al. (eds.), Human Centred Intelligent Systems, Smart Innovation, Systems and Technologies 189, https://doi.org/10.1007/978-981-15-5784-2_5

1 Introduction

The terms “overweight” and “obesity” refer to body weight that is greater than what is considered normal or healthy for a certain height. Overweight is generally due to extra body fat that might affect our health. However, overweight may also be due to extra muscle, bone, or water. People who are obese usually have too much body fat [3]. Most people who have overweight/obesity issues have a higher risk of heart disease. These issues are now dramatically on the rise in low- and middle-income countries. The rising obesity rate among kids in the Middle East still requires a lot of attention. Obesity among children is a cause of concern in the UAE. The UAE government aims to reduce the prevalence of obesity among children from 13.17% (as of 2014) to 12% by the year 2021 [2]. Private sectors, especially schools, have taken initiatives [1] to nip this problem in the bud before it takes hold in the lives of children.

The school-going kids in the UAE are the main target group of this research. The kids are not aware of their options in terms of food selection, and they are unable to make informed decisions about their food choices because nobody has educated them early in life about the importance of eating healthy food. The best way to educate them is to teach them while they are playing. Children love to play interactive or cooperative video games. A number of approaches have been adopted in the past to educate students on this topic [4, 9], but these approaches have not generated the required outcomes so far [10]. Our approach is different in that the kids can easily play the game in their leisure time at home, interacting with their family members (especially their parents), wherever they have access to a mobile device or computer.
Some studies [5, 8] show that the psychological impact of game-based learning is more powerful and has equal or greater potential to enhance knowledge than other forms of education. The ObeseGo avatars are very close to the UAE’s cultural depiction in real life, and kids can relate the game to their everyday life. The study [11], carried out at Stanford University, concluded that the representation of oneself in digital media has a direct impact on a person’s behavior in daily life. When a kid is involved in the game environment, he or she is represented by the avatar while playing. When the kid finds out that his or her choices have a direct impact on the ability to play the game, he or she is likely to become, unconsciously, more careful in real life. As per our game’s flow, if the number of unhealthy items picked by the player is greater than the number of healthy items, the player’s speed gradually slows down, the points do not increase as expected, and the happiness indicator gradually decreases. In the aforementioned study [11], it is shown that if the avatar in the video game feels confident due to certain human behavior or appearance characteristics, the player in the real world would try to replicate those attributes. This phenomenon is referred to as the Proteus effect [11], in which players try to demonstrate their digital world representation in real life. A number of studies [7, 12] have been carried out to find out the effect of computer games on improving human skills, correcting behavior, and building individual identity characteristics. The game avatar has the potential to address the corrective aspect of human behavior as well as knowledge and skills building. In ObeseGo, we tried to combine the benefits of playing a game with the psychological aspect of human development and the demonstration of those characteristics in real life.

2 The Theme of the Game – ObeseGo

This paper presents the outcomes of the game-based study conducted in the home environment. The name of the game, “ObeseGo”, was inspired by our research objective, which is enhancing awareness about overweight/obesity among children. The main objective of ObeseGo is to develop and test a novel, fun, and attractive cooperative game to be played in the comfortable environment of home. Two persons should play together: player 1 (the player) and player 2, who is called the supporter. The game might be used to enhance children’s knowledge about certain food items commonly available in their home and to strengthen the bond between the kids and their parents in a culturally appropriate manner through the video game. This game can be played on a computer or on a laptop. By targeting kids and their parents as player and supporter, respectively, we are actually trying to strengthen the bond within families. In ObeseGo, the player’s avatar is an Emirati kid, played by the kid, and the supporter’s avatar is an Emirati in a wheelchair. The significance of the UAE culture is reflected in the look of the game. We have given special importance to both avatars of the game and also to the game floor and background, which are discussed in a later section of this paper. In this game, the player finds him/herself in a kitchen with many different kinds of food, as shown in Fig. 1. The food items appear randomly and are placed at different locations. They could be healthy or unhealthy depending on the quantity of the food being consumed.
At the beginning of the game, the player will be overwhelmed by the many different kinds of food items that are scattered all over the game floor and keep appearing one after another. Like a hungry kid, he or she would try to eat as many food items as possible, which might earn some points. What the player does not know in the beginning is that there are many twists in the game logic and that not every item will help him/her win the game. We have segregated the food items appearing in the game into two major categories: (i) food items like fruits and vegetables, which are rich in nutritional value such as proteins, vitamins, and minerals, are placed in the healthy category; (ii) food items like processed food, including pizza, burgers, and fizzy drinks, which are full of carbs, fats, and sugary ingredients, are placed in the unhealthy category [6]. Here, the items in the unhealthy category are edible, and there will not be any major effect on the player’s behavior if they are consumed reasonably during the game, just to get a quick boost of energy. However, there will be consequences if they are consumed unnecessarily or more often than a certain number of times.

Fig. 1 One scene of the proposed 3D game environment of ObeseGo game

In the game, five different counters are running: timer, score, happiness, number of healthy items, and number of unhealthy items. The maximum duration of the game is 80 s. Within these 80 s, the player needs to reach a score of 100, which is the maximum score and the only score that wins the game. The player needs to find out at the beginning which food items will help him/her win the game and which will not. The supporter, who plays the game together with the player, is responsible for keeping an eye on the activities of the player during the game. The supporter can encourage the player through different on-screen messages if the player is heading in the right direction in terms of healthy eating, happiness, and speed of movement. If the supporter finds out that the player is not picking healthy food items, that his/her speed is slowing down, and that the happiness indicator is not at a satisfactory level, then the supporter can have a small talk with the player and can instruct him/her about making the right choices, again through different on-screen messages and advice.

Before starting to play the game, the player and the supporter are required to fill in a survey. The answers are used to create an information database for the final evaluation of the game with the pre- and post-survey questionnaires. The player and the supporter can log in to the game on the same computer. The controls for the player and the supporter are different keys on the keyboard. In the beginning, the player and the supporter are provided with some basic information about the game controls.
The shared environment/platform will also encourage shared learning. ObeseGo uses advanced and interactive video game technology and is developed on the Unity platform. The game floor/stage is the kitchen, which is the most common place in every home where a variety of food items can be found. The game stage is designed creatively and thoughtfully to replicate the actual environment present in every household.


3 Methods

The specific aim of the study is to determine the change produced in the players’ learning about obesity through the game and the change in their perception of food selection. The research does not examine changes in their Body Mass Index (BMI) or weight before and after playing this game. The different stages of this research are as follows:

• A survey is given to each player to fill out before playing the game. This survey contains questions related to their age, gender, weight, and height and their general perspective on different food items available on a daily basis. The prior survey also contains some questions related to their daily habits and how often they play video games in their daily life.
• The development of the game “ObeseGo”. This game has the touch of Middle Eastern culture, which has its own uniqueness in terms of parents’ relation with their kids, and the costumes and landscape. Each segment of this game reflects those cultural values.
• The game evaluation. The game was presented to the kids to play with their parents. Each player was required to play the game at least three times before moving on to the next stage.
• Game assessment is carried out through a post-survey, which has to be filled out by each player after playing the game. The survey contains a list of questions that helped to evaluate the knowledge and approach of each player and his/her understanding of the game’s objective.

Subjects

In this research, the main subjects are the kids (the players), between 5 and 14 years of age, who played with their parents as supporters at their home. Each duo of kid and parent was supervised by one of the first four authors. The task of the supervisor was to set up the game on the laptop or the computer for the kids and parents and to answer any questions related to the controls and the concept behind this game. An Internet connection is not required. So far, 19 duos have come forward to play the game voluntarily. We hope to increase this number to at least 50 duos (player and supporter).

4 Game Development

The game development team, composed of the first five authors, worked on every aspect of this game from inception to completion. ObeseGo is developed on the Unity platform, which is one of the most popular platforms for creating 2D and 3D games. ObeseGo is a 3D game. The design layout creation is divided into the 3D game floor/stage (kitchen), the player and supporter avatars, the game flow, and the background. Unity offers a number of game floors in its library, which provide a jump start to game content development. The design team picked the most appropriate game floor for the concept and started developing the game avatars/characters. To develop the game characters, the open-source 3D computer graphics software Blender was used. The characters were developed in Blender and then imported into the Unity platform for better animation results.

The plot of the game is that the player and the supporter find themselves in the kitchen, where a bunch of different food items are scattered on the floor. The player is required to pick as many food items as possible within the 80 s time frame. The food items on the floor are a mixture of items with high and low nutritional value. The player is required to make wise decisions while collecting the food items. Five counters start as the game starts: Time, Happiness, Score, Healthy, Unhealthy.

Blender Processes in Player, Supporter, and Background Formation

The character formation in Blender involved adjusting image preferences, uploading the character sketch, face and body modeling, wire modeling of the whole character, and the formation and adjustment of the armatures/bones. Once the characters were ready, the ideal body movements were incorporated into the basic design. In our game, two characters were created from scratch, the player and the supporter, mainly involving similar development steps. The difference is that the player has the avatar of a young Emirati kid and the supporter has the avatar of an Emirati sitting in a wheelchair. The prominent features in Blender that helped to create the game characters are listed below.

• Mirroring replicates one half of the character onto the other half, so that one side imitates the other.
• Subdivision helped us to create smoother character surfaces for more realistic avatars.
• Inverse kinematics helped us to make the movement of the character’s legs more realistic.
• Weight paint mode helped us to connect the armature to the mesh and to adjust the zones of the mesh that each bone affects.

A few more background designs were also created in Blender, including famous buildings in the UAE, as shown in Fig. 2.

Unity Process

Unity is the main building platform for our game animation. The Unity processes involved are divided into the following:

• The game floor/stage (kitchen) and the food items (high and low nutritional value items) are added from the Unity store.
• The game avatars/characters and background buildings are imported from Blender; movements and other necessary kinematics are incorporated into the avatars through programming in C# (C sharp).
• Animations are added to the food items using C# code.
• Five counters (Time, number of healthy items, number of unhealthy items, happiness, and score) are added through scripting in C#.
• Comments from the supporter and the Game start and Game over messages are displayed using C# code.


Fig. 2 ObeseGo’s Background and Avatars created in Blender Software

Highlights of Coding in C#

• Provided movement to the avatars,
• Provided speed to the avatars,
• Provided animation to the food items,
• Added counters for the game,
• Displayed messages on the screen, and
• Enabled multiplayer option in game mode.

Character Controls and Food Items

Both avatars are controlled from the same keyboard using different keys. For the player, we used the arrow keys (up, down, right, and left), and for the supporter, we used the number keys (1, 2, 3, and 4). Table 1 shows the food items used. In the game, there are five counters. Each counter takes a value in a defined range, as listed below. Each time the healthy food item count increases, the speed increases by a factor of 5. Each time the unhealthy food item count increases, the speed decreases by a factor of 5.


Table 1 Food items segregation in the game ObeseGo

Food item       Healthy   Unhealthy
Apple           Yes       No
Grape           Yes       No
Pineapple       Yes       No
Watermelon      Yes       No
Pizza           No        Yes
Ice cream       No        Yes
Fizzy drinks    No        Yes

• Time = 0–80 s
• Happiness = 0–10 scale (0 is the lowest)
• Score = 0–100 (0 is the least)
• Healthy = 0–10
• Unhealthy = 0–10
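The interplay of the counters can be illustrated with a small sketch. The game itself is scripted in C# in Unity; the following Python code is not taken from it and only mirrors the rules stated in the text. Several details are assumptions made for illustration: the speed change of 5 is treated as an additive step, the starting speed and starting happiness are arbitrary, a healthy item is assumed to be worth 10 points, and happiness is assumed to drop by one whenever the unhealthy count exceeds the healthy count.

```python
from dataclasses import dataclass

MAX_TIME_S = 80   # maximum game duration from the text
MAX_SCORE = 100   # score required to win
MAX_ITEMS = 10    # upper bound of the healthy/unhealthy counters
SPEED_STEP = 5    # from the text; the additive interpretation is an assumption

@dataclass
class GameState:
    speed: float = 20.0   # hypothetical starting speed
    score: int = 0
    happiness: int = 10   # hypothetical starting value on the 0-10 scale
    healthy: int = 0
    unhealthy: int = 0

    def pick(self, is_healthy: bool, points: int = 10) -> None:
        """Update all counters after the player picks one food item."""
        if is_healthy:
            self.healthy = min(MAX_ITEMS, self.healthy + 1)
            self.speed += SPEED_STEP
            self.score = min(MAX_SCORE, self.score + points)  # assumed scoring
        else:
            self.unhealthy = min(MAX_ITEMS, self.unhealthy + 1)
            self.speed = max(0.0, self.speed - SPEED_STEP)
        # Too many unhealthy items lower the happiness indicator.
        if self.unhealthy > self.healthy:
            self.happiness = max(0, self.happiness - 1)

    def has_won(self, elapsed_s: float) -> bool:
        """The player wins by reaching the maximum score within the time limit."""
        return self.score >= MAX_SCORE and elapsed_s <= MAX_TIME_S
```

In this sketch, a player who picks ten healthy items within 80 s wins, while each unhealthy item slows the avatar down and, once unhealthy items outnumber healthy ones, reduces happiness.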

Messages

On certain occasions, the supporter will send some messages to the player, which are listed below.

• “Be careful!”
• “Unhealthy food causes a lot of diseases!”
• “Stop eating unhealthy food”
• “That’s not good for your health!”
• “Eating too much healthy food also not good for your health”
• “Keep Going”
• “You’re eating too much unhealthy food, STOP doing that!”
• “Stop eating unhealthy food please!!”
• “Good Job, keep burning calories!!”
• “Great!!!!”
• “Keep going and avoid eating unhealthy food”
• “Not good for your health”

5 Game Evaluation Using Game Survey

5.1 Evaluation Outcomes

To evaluate the outcomes of our game, we asked the players a set of questions. Figure 3 shows that the design and contents of our proposed game were relevant to the players’ interests and that the game is appropriately challenging, which might promote self-competition. Most answers were “strongly agree” or “partially agree”.


Fig. 3 Game evaluation using pre- and post-survey questionnaire

5.2 Game-Based Learning Outcomes

Figure 4 shows that our game enhanced knowledge about obesity/overweight issues to some extent. All players felt that the proposed game allowed for more efficient learning than traditional ways of learning.

Fig. 4 Game-based efficient learning evaluation


Fig. 5 Food perception evaluation

5.3 Food Perception

Figure 5 shows that our proposed game was able to enhance, and sometimes change, the children’s perception of some healthy and unhealthy foods. We checked the children’s self-reported perception of some specific foods before and after playing our proposed video game. The survey results showed that many children corrected their subjective opinion based on the environment of the game.

6 Conclusion
In this paper, we designed and implemented a novel 3D game to enhance children’s awareness of overweight and obesity, especially among Emirati children. The results showed that 3D games might foster positive habits in children that may help them to live a healthy life. Although further studies are required, the evaluation results suggest that our proposed game might enhance children’s awareness and their ability to make healthy food choices.

Acknowledgements We would like to thank Dr. Maroua Belghali for bringing the obesity issue in UAE to our attention.


A Survey of Visual Perception Approaches
Amal Mbarki and Mohamed Naouai

Abstract The capacity to recognize perceptual organizations is an ordinary human ability that poses an unmatched challenge for computer science researchers. Compared to object recognition and image segmentation, which are considered the primary interests of image processing research, visual perception has fewer generic solutions or evaluated approaches. In this paper, we explain the difficulty of the visual perception process and offer a structured survey of the related approaches. We examine each approach, along with its advantages and disadvantages. We then compare and evaluate the different approaches based on a number of chosen criteria. Finally, we propose an approach to model visual perception using stochastic geometry. To overcome the limitations of pixel-based approaches and to inject high-level knowledge, we propose a marked point process model.

A. Mbarki (B) · M. Naouai
Faculty of Sciences of Tunis, University of Tunis EL MANAR, B.P. No. 94 - Rommana, 1068 Tunis, Tunisia
e-mail: [email protected]
M. Naouai
e-mail: [email protected]
Computer Science Laboratory in Algorithmic and Heuristic Programming, University Campus Farhat Hached, Tunis, Tunisia

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 A. Zimmermann et al. (eds.), Human Centred Intelligent Systems, Smart Innovation, Systems and Technologies 189, https://doi.org/10.1007/978-981-15-5784-2_6

1 Introduction
Visual perception is the process of giving meaning to sensory data and interpreting stimuli. Humans tend to create groups of objects sharing certain characteristics. They can effortlessly interpret the arrangement of visual information, which allows them to infer the perceptual organizations present in a scene. Visual perception is accomplished so easily that its ease belies how difficult it has long been to integrate into computational systems. Perceptual organization and visual perception are two faces of the same coin, in the sense that the goal of the former is to extract and recognize the latter. Despite its challenging aspect, visual perception has been extensively studied by human vision and computer vision researchers. It is a highly appealing subject for diverse research communities. It presents a general concept that may be adapted to various contexts of study.

Gestalt psychologists, among many other researchers, were interested in studying human visual perception. They presented an approach to formalize it, known as the Gestalt theory [24]. This psycho-visual theory introduced a set of laws (proximity, closure, symmetry, similarity, continuity and figure-ground organization) in order to provide a unified framework for visual perception. Inspired by their work, computer science researchers have tackled this challenging problem in order to build machines with performance similar to that of the human visual system.

This manuscript offers a survey of visual perception approaches spanning a century. We present a literature study of visual perception. We consider research from different fields due to the cross-domain character of the topic. We take into account different references that may contribute to a greater understanding of the visual perception process. While it seems impossible to mention all works related to this topic, we try to cover the major highlights. This paper is organized as follows: Sect. 2 defines the motivation for visual perception. Section 3 presents a structured survey of visual perception approaches. Section 4 contains a discussion of a number of the reviewed papers. Section 5 presents our proposed approach for modeling visual perception. Section 6 presents the conclusion and final comments.

2 Motivation
Visual perception is the process of grouping an image’s elements into distinct organizations and objects. It is the act of imposing structural information on sensory data in order to group primitives arising from a common underlying cause. Lying in the middle ground between high-level and low-level processing, visual perception is one of the insufficiently studied subjects in computer science. It is a challenging problem because a “best” grouping interpretation must be chosen among an excessive number of candidates. The number of possible candidates grows exponentially with the number of elements to group. Furthermore, what makes one interpretation better than another is the ambiguous notion of “goodness of form”, or Prägnanz, of Gestalt psychology [24]. This criterion is challenging because of the “chicken and egg” relationship between high-level semantics and low-level features. To overcome noise, low-level processing needs to be directed by high-level knowledge, while the latter relies on the former to reduce the computational complexity. Neither is enough on its own. In other words, in order to interpret small parts of a scene, it is important to consider its larger context, and there is no way to understand the context without interpreting the small parts.

Visual perception has always been attractive to various research areas. Since the mid to late nineteenth century, it has been a thriving research field. Similar to speech recognition, which has become integrated into many broadly available technologies, visual perception will likely be a powerful tool for conceiving more intelligent computers. Different surveys have been conducted on this topic (see Table 1). However, despite all these literature reviews, the primary goals of visual perception, along with the related computational approaches, are insufficiently understood. Moreover, a clear definition of this phenomenon has not yet been achieved.

Table 1 Previous surveys on visual perception
Survey | Scope
Sarkar and Boyer [18] | A classificatory structure for the way of visual perception in image processing
Wagemans et al. [22] | The theoretical foundation of Gestalt theory, figure-ground organization
Wagemans et al. [23] | Models implementing Gestalt laws, holism, emergence, the whole primacy, Prägnanz
Jäkel et al. [10] | Quantitative approaches in Gestalt perception

3 Computational Approaches for Visual Perception
A number of appealing approaches have been proposed to compute visual perception, and several models have been presented. We consider it both a challenge and an aspiration to organize the varied approaches to visual perception into one taxonomy. We have classified each paper by its underlying approach. This organization enables a discussion of the evolution of the reviewed techniques. Our taxonomy is composed of the following five categories of approaches for visual perception.

3.1 Non-accidental Approach
Visual perception is considered an unconscious inference based on the likelihood principle of Helmholtz. The Helmholtz principle states that the visual system interprets a scene through an unconscious inference process. Rooted in this principle, the non-accidental model states that elements that have a low chance of being grouped by accident are associated with the same perceptual organization. In the same spirit, the “Avoidance of coincidence principle” [17] asserts that a perceptual organization is preferred if it has the lowest degree of coincidence. In other words, a scene interpretation is preferred when it is built upon characteristics that are less likely to be accidental consequences of the viewpoint.


Furthermore, the non-accidental principle has been formulated within a mathematical framework [2, 3], where it is better known as the “a contrario” theory. The main idea of non-accidentalness is that perceptual groupings are not found by chance. The lower the probability of chance occurrence, the more perceptually relevant the grouping is. In other words, perceptual organizations are unlikely to emerge by chance in a random arrangement of parts. The “a contrario” model provides a statistical detection method that handles noise points by controlling the number of false alarms. It evaluates the expected number of chance occurrences of an event: the smaller this expectation, the more perceptually meaningful the event is. It is an interesting approach since it provides a mechanism to compute the saliency of a perceptual grouping in terms of its probability of occurring by chance in a random arrangement of primitives. However, it has a weak point: when the order of a given form increases, its probability of chance occurrence decreases regardless of the relation type.
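The false-alarm reasoning described above can be illustrated on a toy grouping test. The sketch below is a simplified illustration, not the formulation of [2, 3]: it assumes a binomial background model in which each of `n` primitives independently agrees with a candidate grouping with probability `p`, and it takes the number of false alarms to be the number of tested candidates times the binomial tail. The function names and the example numbers are assumptions.

```python
from math import comb


def binomial_tail(n, k, p):
    """P[X >= k] for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def number_of_false_alarms(n_tests, n, k, p):
    """Expected number of candidates with >= k agreeing primitives by chance."""
    return n_tests * binomial_tail(n, k, p)


def is_meaningful(n_tests, n, k, p, epsilon=1.0):
    """Keep a grouping when fewer than epsilon chance occurrences are expected."""
    return number_of_false_alarms(n_tests, n, k, p) < epsilon
```

With 1000 tested candidates and p = 0.1, a candidate supported by 15 of 20 primitives would be kept, whereas one supported by only 3 of 20 would be rejected as compatible with chance.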

3.2 Structural Information Theory
Following Shannon’s information theory, quite a number of approaches were proposed to compute perceptual organization. Hochberg introduced the minimum principle, according to which a perceptual grouping is obtained from the simplest description of the stimulus [9]. In his pioneering work, Attneave presented the uncertainties of a perceptual interpretation in a formal way [1]. This gave rise to a representational coding approach. Such an approach does not quantify the information within a message by its probability of occurrence, as Shannon did. Instead, the quantification is based on the number of parameters needed to define its content. Attneave concentrated on the informational aspect of perceptual grouping. Furthermore, a more comprehensive model, based on the length of the stimulus description in a fixed coding language, has been proposed [13]. Leeuwenberg proposed Structural Information Theory as a coding model for the classification of visual patterns. In the same period, mathematicians started to rethink information theory. Information theory was also integrated within a Bayesian framework, connecting the maximization of the Bayesian posterior to the minimization of the data encoding. Rissanen studied the problem of image partitioning using a descriptive language [16]. The simplest description in this language is the shortest one. Such an approach is known as Minimum Description Length (MDL). However, such models suffer from fundamental problems, such as the ad hoc nature of the adopted coding language (for each image a new language is needed). Furthermore, the degree of simplicity depends entirely on the description language employed. Therefore, the notion of simplicity is relative. In addition, the choice of the coding language itself is not guided by information theory. Another major limitation is the assumption that the chosen language encodes the image perfectly.
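The dependence of simplicity on the coding language can be made concrete with a toy example. The sketch below is not Leeuwenberg's coding language or Rissanen's MDL formulation; it assumes two hypothetical codings of a symbol sequence (writing every symbol out, or writing one repeated unit plus a repeat count) and picks the shorter description.

```python
def literal_length(pattern):
    """Description length when every symbol is written out."""
    return len(pattern)


def repeat_length(pattern):
    """Length of the shortest 'count * unit' description: unit symbols plus one count."""
    n = len(pattern)
    for size in range(1, n + 1):
        if n % size == 0 and pattern[:size] * (n // size) == pattern:
            return size + 1
    return n + 1  # only reached for an empty pattern


def simplest_description(pattern):
    """Pick the coding with the shortest description (minimum principle)."""
    candidates = {
        "literal": literal_length(pattern),
        "repeat": repeat_length(pattern),
    }
    return min(candidates.items(), key=lambda item: item[1])
```

Under these two codings, `"abababab"` is simplest as a repetition (length 3), while `"abc"` is simplest written literally (length 3 against 4). Adding or removing a coding rule changes which description counts as simplest, which is the relativity of simplicity noted above.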


3.3 Bayesian Approach
Perceptual organization is unified in a comprehensive mathematical framework using Bayesian theory. A growing trend considers human perception as Bayesian inference and the brain as a Bayesian machine. This theory assumes that modeling all the perceptual beliefs is equivalent to computing a Bayesian posterior probability. According to this hypothesis, the Prägnanz law is equivalent to the unifying Bayes rule. Bayesian inference is a rational process that combines prior beliefs with the available evidence. It allows the perceptual groups to be estimated by combining the available data with the prior knowledge of the observer. The main idea is to develop a suitable likelihood model of the image structure. Such a model has the form p(A|B_i), where A is the data and B_1, ..., B_n are candidate organizations. This formulation models the conditional probability of the data under hypothetical organizations. Bayesian theory has been successfully applied to a number of perceptual tasks, such as shape perception [4, 6] and contour detection [5]. However, a Bayesian formulation is considered a “pure” computational theory that lacks concrete mechanisms. Therefore, procedures were proposed to compute the Bayesian posterior, such as Bayesian networks [18], hidden Markov models [14], and Bayesian mixture estimation [7]. A popular reason to appreciate the Bayesian approach is that it allows visual perception to be expressed as an inference problem. Therefore, considering visual perception as Bayesian inference may provide promising solutions for more challenging aspects. This approach has the benefit of assigning different probabilities to different grouping hypotheses. Moreover, it addresses the bias-variance dilemma by providing a solution of optimal complexity given the prior knowledge and the data. Additionally, a Bayesian formulation provides the optimal way of grouping objects given both the image and the observer’s assumptions.
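As a small worked example of the formulation above, the posterior over candidate organizations follows from the likelihoods p(A|B_i) and the priors p(B_i) by Bayes' rule. The numbers below are hypothetical and serve only to show the computation; they are not taken from any of the cited models.

```python
def posterior(likelihoods, priors):
    """Return p(B_i | A) for each candidate organization B_i via Bayes' rule."""
    joint = [l * p for l, p in zip(likelihoods, priors)]
    evidence = sum(joint)  # p(A), the normalizing constant
    return [j / evidence for j in joint]


likelihoods = [0.30, 0.05, 0.10]  # p(A | B_i), hypothetical values
priors = [0.2, 0.5, 0.3]          # p(B_i), hypothetical values
post = posterior(likelihoods, priors)
best = max(range(len(post)), key=post.__getitem__)  # index of the preferred grouping
```

Here the first organization is preferred even though it has the lowest prior, because the data are much more likely under it.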

3.4 Graph Approach
Graph theory has been used to perform perceptual organization. A graph-based method that uses the eigenvalues of a matrix derived from an image is presented in [19]. Moreover, an interesting grouping technique derived from spectral graph theory was introduced by Shi and Malik [21]. They proposed a cut over an affinity matrix built from proximal distance and pixel similarities. This technique, known as the normalized cut, formulates visual perception as a graph partitioning problem. The graph’s nodes represent the entities to be organized, such as image pixels. The weight of an edge between two nodes expresses how strongly the two nodes belong to the same perceptual group. Intuitively, the graph is divided into perceptual organizations while maximizing the sum of the edge weights within each organization. Sarkar and Soundararajan [20] proposed a modified version of this technique combined with adjacency graphs. These adjacency graphs are constructed using a small group of Bayesian networks. Additionally, the normalized cut technique combined with prior knowledge is used in [25] to perform perceptual organization. The authors made a step toward linking discriminative methods (graph approaches based on local pairwise relationships) to generative methods (Markov random fields within a Bayesian framework). The normalized cut technique has been successful as a grouping technique. However, it does not provide an explicit definition of the Prägnanz law; rather, it adheres to that principle. In addition, the use of graphs requires knowing the number of objects in the image, which is not known in advance because extracting the objects is the goal.
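The normalized cut criterion can be illustrated on a handful of nodes. The sketch below evaluates the criterion by exhaustive search over all bipartitions, which is only feasible for tiny graphs; it is not the spectral relaxation used in [21]. The Gaussian affinity, the 1D points, and all function names are assumptions made for the example.

```python
from itertools import combinations
from math import exp


def affinity(points, sigma=1.0):
    """Gaussian affinity from pairwise distance (assumed similarity measure)."""
    n = len(points)
    return [
        [exp(-((points[i] - points[j]) ** 2) / (2 * sigma**2)) if i != j else 0.0
         for j in range(n)]
        for i in range(n)
    ]


def ncut(w, group_a):
    """Normalized cut value of the bipartition (group_a, remaining nodes)."""
    nodes = range(len(w))
    a = set(group_a)
    b = [v for v in nodes if v not in a]
    cut = sum(w[i][j] for i in a for j in b)
    assoc_a = sum(w[i][j] for i in a for j in nodes)
    assoc_b = sum(w[i][j] for i in b for j in nodes)
    return cut / assoc_a + cut / assoc_b


def best_bipartition(w):
    """Exhaustive search over bipartitions; exponential in the number of nodes."""
    n = len(w)
    candidates = (c for size in range(1, n) for c in combinations(range(n), size))
    return min(candidates, key=lambda c: ncut(w, c))


points = [0.0, 0.2, 0.4, 5.0, 5.2, 5.4]  # two well-separated 1D clusters
groups = best_bipartition(affinity(points))
```

The search returns one of the two clusters as `groups`. The number of candidate bipartitions doubles with every added node, which is why practical methods solve a relaxed eigenvalue problem instead.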

3.5 Optimization Approach
The visual perception literature contains computational approaches that employ minimization and optimization techniques. In [27], a distance function is built in order to weight the possible perceptual groupings. To combine different cues, the authors model each cue as a distance function on a pair of image regions. Two regions are grouped together if the linear sum of the distance functions is the lowest. Simply put, the best grouping is the one with the shortest distance. Similar ideas can be found in [11], where energy minimization techniques are used to model visual perception. Here, perceptual grouping is formulated as an energy minimization problem that is solved using simulated annealing. The first step is to map the perceptual organization to an energy function whose minimum corresponds to the desired perceptual groups. The second step is to optimize this energy function. This idea is distinctive because it pays attention to the influence of higher-level knowledge. Nevertheless, the practical realization of this idea is not clear, and the computational cost is high. Moreover, the simplicity principle was explicitly defined as the minimum of an entropy function based on the probability of pairs being grouped [15]. Hence, simple groups are recognized by minimizing the entropy.
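The two steps described above (define an energy whose minimum is the desired grouping, then optimize it) can be sketched with a toy simulated annealing loop. This is an illustration under stated assumptions, not the method of [11]: the pairwise energy, the 1D points, the cooling schedule, and the final greedy pass are all hypothetical choices.

```python
import math
import random


def energy(points, labels):
    """Low when same-group points are close and different-group points are far."""
    total = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = abs(points[i] - points[j])
            total += d if labels[i] == labels[j] else -d
    return total


def anneal(points, steps=2000, t_start=5.0, cooling=0.995, seed=0):
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in points]
    current, temperature = energy(points, labels), t_start
    for _ in range(steps):
        i = rng.randrange(len(points))
        labels[i] ^= 1  # propose flipping one label
        proposed = energy(points, labels)
        delta = proposed - current
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = proposed  # accept the move
        else:
            labels[i] ^= 1  # reject: undo the flip
        temperature *= cooling
    # Final greedy pass: keep flipping while a single flip lowers the energy.
    improved = True
    while improved:
        improved = False
        for i in range(len(points)):
            labels[i] ^= 1
            proposed = energy(points, labels)
            if proposed < current - 1e-12:
                current, improved = proposed, True
            else:
                labels[i] ^= 1
    return labels, current
```

For two well-separated clusters such as `[0.0, 0.1, 0.2, 5.0, 5.1, 5.2]`, the returned labels separate the clusters. Each proposal re-evaluates the full pairwise energy, which hints at the computational cost mentioned above for realistic images.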

4 Discussion
We introduce eight factors for the categorization of the visual perception models (Table 2).
– Bottom-up (f1) versus top-down (f2) model: we make a distinction between bottom-up and top-down models. Bottom-up features depend mainly on the characteristics of the image, while top-down concepts are determined by the subjective knowledge and expectations of the observer.
– Hierarchical (f3) versus flat (f̄3) model: for many scientists, visual perception is naturally a hierarchical process. Flat models are nonhierarchical ones.
– Model type (f4): as mentioned in Sect. 3, models for visual perception may be considered as a non-accidental model, an information theory based model, a Bayesian model, a graphical model, or an optimization model.
– Synthetic (f5) versus natural (f6) stimuli: the visual stimuli can be categorized as natural, such as natural scenes and photographs, or synthetic, like Gabor patches, virtual environments, and search arrays.
– Performed task (f7): the most recognizable tasks of visual perception are segmentation, object recognition, alignment detection, and grouping.
– Evaluation measures (f8): a common issue in visual perception research is how to evaluate the proposed approach. Each approach is designed to solve a certain problem under specific conditions. The chosen method may depend on the objects themselves, the context of the scene, and the presence or absence of noise. Therefore, consistent evaluation measures for visual perception approaches are not well elaborated. Different authors evaluated their work by comparing it to previous research.

Table 2 Summary of visual perception approaches. Factors in order are: bottom-up (f1), top-down (f2) (if mid-level then (f1)+ and (f2)+), hierarchical (+)/nonhierarchical (–) (f3), model type (f4), synthetic (f5) and natural (f6) stimuli, performed task (f7), model evaluation (f8). In the model type (f4) column: non-accidental model (NA), minimal model (MM), information theory (IT), Bayesian theory (BT), simplicity principle (SP), structural information theory (SIT), minimum description length (MDL), normalized cut (NC), Bayesian mixture estimation (BME), graph theory (GT), Bayesian network (BN), average cut (AC), optimization (OP)
No | Model | Year | f1 | f2 | f3 | f4 | f5 | f6 | f7 | f8
1 | Desolneux et al. [3] | 2008 | + | – | + | NA-IT | + | + | Shape recognition | Experimental validation
2 | Blusseau et al. [2] | 2015 | + | – | + | NA | + | – | Curve detection | Compare with human attentive performance
3 | Hochberg and McAlister [9] | 1954 | – | + | – | SIT-SP | + | – | Shape understanding | No
4 | Attneave [1] | 1954 | – | + | – | SIT-SP | + | – | Pattern perception | Experimental validation
5 | Leeuwenberg [13] | 1969 | + | – | – | SIT | + | – | Pattern perception | Experimental validation
6 | Feldman et al. [6] | 2013 | + | + | + | BT | + | – | Shape representation | No
7 | Erdogan and Jacobs [4] | 2017 | – | + | – | BT | + | – | Shape recognition | Compare with other models
8 | Sarkar and Boyer [18] | 1993 | + | + | + | BT-PIN | – | + | Grouping | Experimental validation
9 | Feldman et al. [7] | 2015 | + | + | + | BT-BME | + | – | Grouping | Qualitative validation
10 | Kim and Nevatia [12] | 1999 | + | + | + | BT | – | + | Building detection | No
11 | Mumford [14] | 2002 | – | + | – | BT | – | + | Grouping | No
12 | Geman and Geman [8] | 1984 | + | – | + | BT | + | – | Image restoration | Experimental validation
13 | Shi and Malik [21] | 2000 | – | + | + | GT-NC | + | + | Segmentation | Compare result with other models
14 | Sarkar and Soundararajan [20] | 2000 | + | + | + | GT-ACBN | – | + | Object recognition | Compare with the normalized cut approach
15 | Yu and Shi [25] | 2004 | + | + | – | GT-NC | – | + | Data clustering | No
16 | Ommer and Buhmann [15] | 2003 | + | – | + | PO-BT | – | + | Grouping | Compare to human segmented image
17 | Kahn et al. [11] | 1990 | + | + | – | OP-BT | + | – | Grouping | No

4.1 Interpretation
In this paper, we presented a literature review of the visual perception problem. First, we notice that most of the approaches are bottom-up, although top-down factors may play an important role in visual perception. Therefore, visual perception modeling requires more unified approaches that use top-down strategies. The interaction of top-down and bottom-up approaches also needs to be elaborated further. Bottom-up approaches yield decent performance, which makes them good heuristics. Top-down approaches, in contrast, usually use learning mechanisms to readapt themselves to a specific perceptual task.

Second, in the majority of approaches, visual perception is considered a hierarchical process. Under a hierarchical interpretation, the goal of visual perception is believed to be partitioning the image into equal classes. However, the visual system needs a more complex description and higher-level relations in order to organize an image. We also notice that bottom-up models are generally hierarchical. Moreover, the perceptual tasks mainly addressed are contour integration, shape recognition, and grouping. Interesting models for contour integration are proposed in [5]. Furthermore, heavily studied models for shape recognition are presented in [4, 6]. The most distinguished computational models of visual perception are structural information theory [1], minimum description length [26], and Bayesian statistics [18]. However, only a few proposals were accompanied by an effort to compare human and machine vision. An interesting exception is Mumford’s work [14].

Despite these efforts, the study of visual perception still has gaps. It could be integrated into all aspects of computer science, but it is still treated in various and divergent ways. The proposed approaches are quite diverse, both theoretically and experimentally. Although different solutions have been proposed, they still lack the quantitative aspect that would facilitate the adoption of visual perception by computer vision scientists. A promising step for future work is to develop a mathematical model that considers high-level expectations of what perceptual organization should be. Mathematical models are known to be efficient, pertinent, and elegant if they are constructed in the right way.

5 Visual Perception Modeling Using Marked Point Process
In this section, we present the general idea of our approach for modeling human visual perception. This approach is based on Bayesian inference. We propose a probabilistic object model combined with stochastic algorithms. We use a marked point process to extract perceptual groups: the objects of interest are those forming perceptual groups. A geometric shape is first defined to represent the objects. Then, the proposed marked point process is defined by a density that comprises an a priori term, which encodes knowledge of the scene, and a data term. The a priori term describes the interactions between the objects of the process, and the data term characterizes how well the objects fit the image. This object model is then sampled by a Reversible Jump MCMC algorithm integrated into a simulated annealing scheme to find the optimal configuration. At each step, a transition of the Markov chain Monte Carlo (MCMC) sampler brings the current configuration of objects to a new state. Such transitions are either accepted or rejected with a probability to be defined. An important aspect of the simulation is defining a set of transitions (proposal kernels) that accelerate the convergence of the algorithm.

The particularity of our approach is that it represents objects within an image by a set of geometric shapes interacting with each other. Our main contribution is that we manipulate objects rather than pixels. An object approach allows low-level vision features to be linked to high-level vision concepts. A major advantage over the pixel approach is that spatial interactions and geometric constraints are taken into account directly at the object level. We choose the marked point process because of its ability to encode the uncertainty of the perceptual interpretation within a stochastic framework and because of its solid mathematical foundations. Besides, unlike approaches using Markov or Bayesian networks, we do not need to fix the number of objects in advance, because such a process is a random process that manipulates a set of random variables. Thus, the definition of a coherent statistical model based on mathematical foundations for visual perception is possible and offers new possibilities for its exploitation.
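To make the roles of the a priori term, the data term, and the annealed acceptance step more concrete, the sketch below evaluates the energy of a configuration of discs and the acceptance probability of a birth move. It is an illustration under stated assumptions, not the authors' model: the disc shape, the overlap penalty, the per-object data score, and all names and constants are hypothetical. The acceptance ratio follows the standard birth-death Metropolis-Hastings form for a density defined with respect to a Poisson reference process, raised to the power 1/T for annealing.

```python
import math


def overlaps(a, b):
    """Two discs (x, y, r) overlap when their centres are closer than r_a + r_b."""
    return math.hypot(a[0] - b[0], a[1] - b[1]) < a[2] + b[2]


def prior_energy(config, penalty=2.0):
    """A priori term: penalize each pair of overlapping objects."""
    return penalty * sum(
        overlaps(config[i], config[j])
        for i in range(len(config))
        for j in range(i + 1, len(config))
    )


def energy(config, data_score):
    """Total energy: prior interactions minus how well each object fits the image."""
    return prior_energy(config) - sum(data_score(obj) for obj in config)


def birth_acceptance(config, new_obj, data_score, temperature,
                     nu_total=1.0, p_birth=0.5, p_death=0.5):
    """Acceptance probability of adding new_obj for the density exp(-U / T)."""
    delta = energy(config + [new_obj], data_score) - energy(config, data_score)
    # Proposal correction: total reference mass over (n + 1) candidates for removal.
    log_ratio = math.log((p_death / p_birth) * nu_total / (len(config) + 1))
    log_ratio -= delta / temperature
    return 1.0 if log_ratio >= 0 else math.exp(log_ratio)
```

A birth that adds a well-fitting, non-overlapping object lowers the energy and tends to be accepted, while one that overlaps an existing object pays the prior penalty. As the temperature decreases, energy-increasing moves are accepted less and less often, which is how the annealing scheme concentrates on low-energy configurations. The number of objects changes from move to move and therefore does not need to be fixed beforehand.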

6 Conclusion
Since Wertheimer came up with the Phi motion problem [24], great advances have been made in visual perception research. In this manuscript, we discussed approaches for modeling visual perception and presented a proposal for the classification of these approaches. A considerable body of research was reviewed, classified, and compared based on a number of criteria. Various fields have revealed a considerable part of the perceptual organization puzzle. Advances made especially in computer science could help to solve challenging issues and to conceive more sophisticated technological applications. Most of the reviewed modeling research focuses on using a bottom-up strategy within a hierarchical framework. Research on visual perception has a long history. Considerable progress has been made in understanding and modeling visual perception in a way close to human visual perception. Moreover, integrating prior knowledge and high-level concepts is crucial for modeling visual perception. Thus, we proposed a stochastic model based on the marked point process that combines top-down and bottom-up strategies in order to model visual perception. As future work, we will further elaborate our proposed approach and move forward with its implementation.

References
1. Attneave, F.: Some informational aspects of visual perception. Psychol. Rev. (1954). https://doi.org/10.1037/h0054663
2. Blusseau, S., Carboni, A., Maiche, A., Morel, J.M., Grompone von Gioi, R.: Measuring the visual salience of alignments by their non-accidentalness. Vis. Res. 126, 192–206 (2015). https://doi.org/10.1016/j.visres.2015.08.014
3. Desolneux, A., Moisan, L., Morel, J.M.: From Gestalt theory to image analysis. Interdiscip. Appl. Math. 34, 285 (2006). https://doi.org/10.1086/425848
4. Erdogan, G., Jacobs, R.A.: Visual shape perception as Bayesian inference of 3D object-centered shape representations. Psychol. Rev. (2017). https://doi.org/10.1037/rev0000086
5. Ernst, U.A., Mandon, S., Schinkel-Bielefeld, N., Neitzel, S.D., Kreiter, A.K., Pawelzik, K.R.: Optimality of human contour integration. PLoS Computat. Biol. 8(5) (2012). https://doi.org/10.1371/journal.pcbi.1002520
6. Feldman, J., Singh, M., Briscoe, E., Froyen, V., Kim, S., Wilder, J.: An integrated Bayesian approach to shape representation and perceptual organization. Shape Percept. Human Comput. Vis., 55–70 (2013). https://doi.org/10.1007/978-1-4471-5195-1
7. Feldman, J., Singh, M., Froyen, V.: Perceptual grouping as Bayesian mixture estimation. Bayesian Hierarchical Group. 122(4), 575–597 (2015)
8. Geman, S., Geman, D.: Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell. PAMI-6(6), 721–741 (1984). https://doi.org/10.1109/TPAMI.1984.4767596
9. Hochberg, J., McAlister, E.: A quantitative approach to figural goodness: erratum. J. Exp. Psychol. (1954). https://doi.org/10.1037/h0049954
10. Jäkel, F., Singh, M., Wichmann, F.A., Herzog, M.H.: An overview of quantitative approaches in Gestalt perception. Vis. Res. (2016). https://doi.org/10.1016/j.visres.2016.06.004
11. Kahn, P., Adam, W., Chee, Y.C.: Perceptual grouping as energy minimization. In: 1990 IEEE International Conference on Systems, Man, and Cybernetics Conference Proceedings (1990)
12. Kim, Z., Nevatia, R.: Uncertain reasoning and learning for feature grouping. Comput. Vis. Image Unders. 76(3), 278–288 (1999). https://doi.org/10.1006/cviu.1999.0803
13. Leeuwenberg, E.L.: Quantitative specification of information in sequential patterns. Psychol. Rev. (1969). https://doi.org/10.1037/h0027285
14. Mumford, D.: Pattern theory: the mathematics of perception. ICM III, 1–21 (2002). arXiv:math/0212400, http://arxiv.org/abs/math/0212400
15. Ommer, B., Buhmann, J.M.: A compositionality architecture for perceptual feature grouping. Energy Minimization Methods in Computer Vision and Pattern Recognition (February), pp. 501–516 (2003). https://doi.org/10.1007/978-3-540-45063-4, http://www.springerlink.com/index/CKCQXJLYDG749FAC.pdf
16. Rissanen, J.: Modeling by shortest data description. Automatica (1978). https://doi.org/10.1016/0005-1098(78)90005-5
17. Rock, I., Nijhawan, R., Palmer, S., Tudor, L.: Grouping based on phenomenal similarity of achromatic color. Perception (1992). https://doi.org/10.1068/p210779
18. Sarkar, S., Boyer, K.L.: Perceptual organization in computer vision: a review and a proposal for a classificatory structure. IEEE Trans. Syst. Man Cybern. 23(2), 382–399 (1993)
19. Sarkar, S., Boyer, K.L.: Quantitative measures of change based on feature organization: eigenvalues and eigenvectors. Comput. Vis. Image Unders. (1998). https://doi.org/10.1006/cviu.1997.0637
20. Sarkar, S., Soundararajan, P.: Supervised learning of large perceptual organization: graph spectral partitioning and learning automata. IEEE Trans. Pattern Anal. Mach. Intell. (2000). https://doi.org/10.1109/34.857006
21. Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 888–905 (2000). https://doi.org/10.1109/34.868688
22. Wagemans, J., Elder, J.H., Kubovy, M., Palmer, S.E., Peterson, M.A., Singh, M., von der Heydt, R.: A century of Gestalt psychology in visual perception: I. Perceptual grouping and figure-ground organization. Psychol. Bull. (2012). https://doi.org/10.1037/a0029333
23. Wagemans, J., Feldman, J., Gepshtein, S., Kimchi, R., Pomerantz, J.R., Van der Helm, P.A., Van Leeuwen, C.: A century of Gestalt psychology in visual perception: II. Conceptual and theoretical foundations. Psychol. Bull. (2012). https://doi.org/10.1037/a0029334
24. Wertheimer, M.: Untersuchungen zur Lehre von der Gestalt. II [Investigations in Gestalt Theory: II. Laws of organization in perceptual forms]. Psychologische Forschung (1923). https://doi.org/10.1007/BF00410640
25. Yu, S.X., Shi, J.: Segmentation given partial grouping constraints. IEEE Trans. Pattern Anal. Mach. Intell. (2004). https://doi.org/10.1109/TPAMI.2004.1262179
26. Zhu, S.C.: Embedding Gestalt laws in Markov random fields. IEEE Trans. Pattern Anal. Mach. Intell. 21(11), 1170–1187 (1999). https://doi.org/10.1109/34.809110
27. Zobrist, A.L., Thompson, W.B.: Building a distance function for Gestalt grouping. IEEE Trans. Comput. (1975). https://doi.org/10.1109/T-C.1975.224292

Analysis of Long-Term Personal Service Processes Using Dictionary-Based Text Classification

Birger Lantow and Kevin Klaus

Abstract Long-Term Personal Services are characterized by long-term objectives, human interaction and multiple service encounters. Examples would be services in Physiotherapy, Psychotherapy, Family Care or Coaching. In this domain, a lot of process knowledge lies in unstructured text data like reports, internal documentation and logged communication for the coordination of service encounters. The applicability and potential of dictionary-based text classification for the analysis of these service processes are investigated. Results are derived from literature review and a case study. The investigations are part of a larger project focusing on smart process management in social care.

1 Introduction

The digital transformation opens more and more domains to data-driven analysis. However, traditional process analysis approaches show limitations when it comes to Personal Service Processes. These are characterized by a large influence of personal interaction, knowledge intensity and a limited set of distinguishable activities [1]. If there are long-term objectives that should be reached based on multiple service encounters, we speak of Long-Term Personal Services. Since a lot of knowledge about these service processes lies in unstructured text documentation, text mining approaches are assumed to have a high potential for process analysis. This work contributes to uncovering this potential. As a starting point, the characteristics of Long-Term Personal Service Processes are described in Sect. 2. Based on that, Sect. 3 shows why traditional process analysis approaches have difficulties in this domain. The prospects of using dictionary-based text classification for the analysis of such processes are discussed in Sect. 4, followed by a case-study-based evaluation in Sect. 5. Finally, Sect. 6 summarizes the findings and provides an outlook on future research directions.

B. Lantow (corresponding author) · K. Klaus
University of Rostock, 18051 Rostock, Germany

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021. A. Zimmermann et al. (eds.), Human Centred Intelligent Systems, Smart Innovation, Systems and Technologies 189, https://doi.org/10.1007/978-981-15-5784-2_7

2 Characteristics of Long-Term Personal Service Processes

The common characteristic of Personal Services is that they are directed at persons instead of things. Halmos [2] already denoted as personal those services which are concerned with the change of the body or the personality of a client. Bieber and Geiger point out that a great challenge of Personal Services lies in the fact that the recipient of the service is at the same time in the role of a co-producer [3]. They state that Personal Services are characterized particularly by the close, indissoluble connection between persons who have to interact with each other. This interaction as a key element in Personal Services reduces the predictability and increases the uncertainty in the corresponding work processes. Therefore, many Personal Service processes show characteristics of knowledge-intensive processes [3]. Modelling and analysing such services requires approaches different from ‘traditional’ Business Process Management. Approaches like Adaptive Case Management (ACM) address the problem of high variability and high autonomy of actors in knowledge-intensive processes [4]. A structured process model can be provided only at high abstraction levels. Fließ et al. [5] describe three phases: pre-service, service and post-service.

Personal Services are often integrated into a longer-term meta-process, which then also has long-term objectives. In this case, we speak of Long-Term Personal Services that involve multiple service encounters. This means that direct service encounters may be intertwined with overarching coordination tasks to achieve the objectives set. The resulting phase model for Long-Term Personal Services has been developed in [1] and can be seen in Fig. 1. Examples would be services in Physiotherapy, Psychotherapy, Family Care or Coaching.
Since the nature of Personal Services is the interaction between people, human interaction and relationships play a strong role for service performance and thus should be considered when analysing service processes. This long-term relationship has great importance, for example, in therapy and coaching settings. Looking at the long-term process, context changes are likely to influence the outcome as well. An example would be a changing life situation caused by a new occupation. Another aspect of these processes is that activities in the service phase tend to be described coarsely in process models. While the type of interaction can be specified, there are problems with regard to the definition of the functional aspects of the activities. For example, what are the pre-conditions, objectives and concrete procedures of a counselling session or a phone call? [6]

Fig. 1 Long-Term personal services phase model [1]

3 Limitations of Process Analysis Approaches

Generally, process analysis involves the creation of an as-is process model and the measurement of process attributes in order to find possibilities for process improvement. For reasons of brevity and simplicity, we name only higher process efficiency and effectiveness as improvements. Thus, process analysis requires an appropriate process model in the first place, as well as the possibility to assess efficiency, effectiveness and the relevant context for the modelled process and its elements.

Personal Service Processes are knowledge-intensive. A structured process model can be provided only at high abstraction levels as, for example, in the phase model shown in Fig. 1 or as described by Cano et al. in [7] for clinical situations. More detailed process models which contain measurable and controllable process elements for such processes can be provided by declarative process models that use, for example, CMMN as a notation. Generally, such models consist of activities and of rules that enable or disable the execution of the activities based on process data. However, there are several problems in the domain of Long-Term Personal Services: (1) limited set of activity types, (2) unstructured process data, (3) lack of process classification. These problems are discussed in the following [6, 8].

Limited Set of Activity Types. Since the main part of Personal Services is human interaction, the definition of activity types can only address the general setting of this interaction. Further attempts to describe the process on a more detailed level fail [9]. Other activity types mainly cover administrative tasks within the other service phases. For example, the development of reference activities for a group of four German family care companies resulted in a list of only 12 activities. Among these, Counselling and General Activity are examples of Service Phase activities [6].

Unstructured Process Data. Relevant information with regard to the process lies in the communication of the involved actors as well as in textual documentation related to clients, describing their condition, behaviour and progress with regard to the long-term goals. First, it is difficult to use unstructured data in rules for process description. Second, there is a high variability in these aspects and thus it might not be clear what is relevant to determine the next possible process activities. Additionally, context and relationship status, which are also described by unstructured data relevant to the long-term process, do not affect only single activities but whole parts of the process. Process variability approaches try to cover such situations. However, so far they rely on structured data (e.g. [8]).

Lack of Process Classification. Since the goals of Long-Term Personal Services remain very abstract and there are often several goals formulated in an unstructured way, service providers have difficulties defining and differentiating processes. As a consequence, process modelling sessions [10] result in very abstract process models, and process discovery using process mining technologies fails because of the variety of execution paths in the event logs. A similar problem has been described by De Weerdt et al. in [10] for incident management processes. Using process mining technologies, the discovered models performed poorly with regard to precision and recall.

In addition to the problem of process model availability, the analysis of Long-Term Personal Services faces challenges with the measurement of efficiency and effectiveness. Business process analysis and optimization approaches often focus on time measurement (e.g. [11] or [12]). Processing times can be used to assess resource utilization. However, the client as a co-producer has a big influence on these times as well as on the achievement of the objectives of a certain activity: activities may end without achieving these objectives, and processing times may lie outside the control of the service provider. Workshops with practitioners revealed that turnaround times (see [11] for a definition) are also viewed critically as performance indicators for the processes. They are barely connected to the quality of service in this domain. Furthermore, if the focus is on a long-term relationship, there are not necessarily turnaround times at all [1]. Finally, the achievement of process objectives must be assessed for the measurement of effectiveness. As stated earlier, these objectives, as well as information about their achievement, are recorded in unstructured text documents. Again, a measurement seems to be difficult.

To sum up the discussion, the analysis of Long-Term Personal Service Processes is difficult because of problems with the availability of appropriate process models as well as problems with the measurement and control of process performance. Generally, the application of text mining approaches seems to be promising, since a lot of information with regard to process performance, activities and context lies in text documents.

4 Dictionary-Based Topic Mining for Personal Services

De Weerdt et al. have shown in [10] that text mining can generally be used to improve the quality of discovered process models in process mining. Using the communication content of trouble tickets allowed a better classification of incident management processes and thus a better accuracy of mined models. We assume that topic mining for Long-Term Personal Services allows a classification of process instances with regard to objectives and context. Furthermore, general activities can be mapped to topics. Thus, a more detailed distinction of activity types (type of interaction + topic) in process models becomes possible. Text documents can be mapped to activities and to stages of the long-term process, so the information can be used on both abstraction levels. The following Sect. 4.1 addresses the general applicability and benefits of dictionary-based text categorization for identifying topics within process documentation. Section 4.2 then describes the process of dictionary creation, as it is a main precondition for the application of the approach.


4.1 Characteristics of the Dictionary-Based Approach

Quinn et al. compare in [13] the general approaches for text categorisation with regard to preconditions for their application and analysis costs. If an automated analysis is required, dictionaries, supervised learning and topic modelling (unsupervised learning) can be applied. While dictionaries and supervised learning impose high preparation costs according to Quinn et al., topic modelling requires more effort for the interpretation of the analysis results in comparison. Furthermore, topic modelling tends to find irrelevant topics [14]. Hence, texts are correctly categorized but the automatically (unsupervised) chosen topic categories are not necessarily relevant. Furthermore, topics that are semantically different but show a high correlation of occurrence in the text corpus cannot be distinguished using this approach. Comparing dictionaries and supervised learning as approaches that both show a higher relevance of analysis results, dictionaries have some advantages, e.g. [15]:

1. Possibly lower preparation costs because less data needs to be collected. A dictionary basically contains categories and indicator words for these categories. For supervised learning, a training data set needs to be prepared that contains a large amount of manually categorized texts.
2. Dictionaries can be created even when only a small text corpus is available.
3. Dictionaries can be adapted to the local terminology of an enterprise.
4. Existing dictionaries in different languages and conceptual models can be re-used.
5. Dictionary creation as a community process can improve knowledge sharing and documentation quality.
6. Depending on the context, dictionaries can be automatically enriched.
7. A good understandability and traceability of text categorization results is provided.
Disadvantages of dictionaries are a possibly higher rate of false negatives [14] and limited quality control if no manually categorized texts are available as a reference.
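As an illustration of the statement that a dictionary basically contains categories and indicator words, and of the traceability named as advantage 7, the following minimal Python sketch may help. The category names follow the case study in Sect. 5, but the English indicator words and the helper names `tokenize` and `matched_words` are hypothetical and not taken from the authors' implementation.

```python
import re

# Hypothetical example dictionary: category -> indicator words.
# The words are illustrative assumptions, not the authors' dictionary.
DICTIONARY = {
    "place of residence": {"flat", "apartment", "landlord", "rent", "moving"},
    "place of education": {"school", "apprenticeship", "teacher", "training"},
}


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-zäöüß]+", text.lower())


def matched_words(text, dictionary=DICTIONARY):
    """Return, per category, the indicator words that occur in the text.

    Listing the matched words keeps every categorization decision traceable.
    """
    tokens = set(tokenize(text))
    return {
        category: sorted(tokens & words)
        for category, words in dictionary.items()
    }


if __name__ == "__main__":
    print(matched_words("The landlord cancelled the flat; school starts Monday."))
```

Because the result lists the matching words themselves, a practitioner can see why a text was assigned to a category, which is harder to obtain from a trained classifier.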

4.2 Dictionary Creation

In [15], a systematic literature analysis has been performed in order to derive a process model for dictionary-based text categorization. Figure 2 shows the core phase of dictionary creation, which consists of seed word generation and dictionary enrichment. There are three approaches for seed word generation, which are explained in the following.

Fig. 2 Process model of dictionary creation [15]

Manual Seed Words. Taboada et al. [16] justify a completely manual approach to seed word determination with the large influence of seed words on the final accuracy of the process. Thus, seed words should be based on a thorough discussion. Seed words are created based on randomly selected texts by identifying class-specific words from them [17]. The first step is to identify relevant content words and to assign the associated known class values or labels [18, 19]. As mentioned above, the discussion of seed words and categories for a dictionary can also have a positive influence on the documentation quality achieved by the involved actors, because their awareness of the utilization, categorization, purpose and analysis of documentation is increased.

Statistical Seed Words. If categorized texts are already available, a lightweight automated discovery of seed words is possible based on word frequencies per category [20].

Re-use of Existing Resources. If they exist, foreign language dictionaries can be translated into the language of documentation [21]. Thus, the effort for seed word generation can be reduced. Furthermore, existing domain-specific resources like data models, vocabularies and ontologies can also be starting points for seed words that assure a high relevance and help to lower the effort for seed word generation.

After generating seed words using one or more of the mentioned approaches, the dictionary should be enriched. This step increases the number of dictionary entries and increases the accuracy of the text categorization, because the number of false negatives is reduced. According to [22], dictionaries that are too small perform significantly worse. Enrichment can be done by adding synonyms, antonyms (if appropriate), hyponyms or hypernyms. Furthermore, new dictionary entries can be generated by adding prefixes and suffixes to existing entries [19]. However, the addition of prefixes and suffixes can change semantics significantly. Thus, semantics need to be controlled during dictionary enrichment.
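The statistical seed word approach and the synonym-based enrichment step can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of [15] or [20]: seed words are simply the most frequent words per category, and the `synonyms` mapping is a hypothetical stand-in for a thesaurus lookup. The function names `statistical_seed_words` and `enrich` are invented for this example.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-zäöüß]+", text.lower())


def statistical_seed_words(labelled_texts, top_n=3, stop_words=frozenset()):
    """Pick the top_n most frequent words per category as seed words.

    labelled_texts: iterable of (category, text) pairs that were
    categorized manually beforehand (assumed to be available).
    """
    counts = defaultdict(Counter)
    for category, text in labelled_texts:
        counts[category].update(
            token for token in tokenize(text) if token not in stop_words
        )
    return {
        category: [word for word, _ in counter.most_common(top_n)]
        for category, counter in counts.items()
    }


def enrich(dictionary, synonyms):
    """Add known synonyms of every entry to its category.

    synonyms: hypothetical mapping word -> iterable of synonyms, e.g.
    filled from a thesaurus. Semantics should still be reviewed manually.
    """
    enriched = {}
    for category, words in dictionary.items():
        expanded = set(words)
        for word in words:
            expanded.update(synonyms.get(word, ()))
        enriched[category] = expanded
    return enriched


if __name__ == "__main__":
    labelled = [
        ("residence", "flat rent flat landlord"),
        ("residence", "flat rent"),
        ("education", "school teacher school"),
    ]
    seeds = statistical_seed_words(labelled, top_n=2)
    print(enrich(seeds, {"flat": ["apartment"]}))
```

A plain frequency ranking favours words that are common in every category, so a stop word list or a manual review of the candidates is advisable before the seed words enter the dictionary.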

5 Case Study

In the following, the qualitative case study evaluation of the dictionary-based approach for process analysis is described. Investigations are based on the internal documentation of long-term assistance processes performed by a small (24 employees) German youth care company. The company’s staff has been required to report its activities digitally on a regular basis since 2016. The resulting collection of time-stamped short reports has been used as the data source for process analysis.


Evaluation is based on interviews of company staff members. Section 5.1 describes the research process and its context, while Sect. 5.2 summarizes the results.

5.1 Methodology

Together with the company staff, two assistance process instances have been selected for analysis (‘Client A’, ‘Client B’). Selection criteria were the availability for evaluation purposes of actors involved in these process instances, the quality of documentation and a time frame of at least a year that is covered by the documentation. For both process instances, the year 2018 has been analysed. According to the involved actors, ‘Client A’ was characterized by problems with finding a permanent residence right from the beginning and in waves throughout the year. Finding an appropriate education institution was connected with the question of residence for this client. ‘Client B’ was characterized by a critical situation in summer when a new place of residence and a new position as an apprentice were needed. In order to use these statements as a reference for evaluation, the categories ‘place of residence’ and ‘place of education’ have been selected for the case study.

For dictionary creation, a manual seed word generation has been performed. This approach has been selected because there were no labelled reports available. Furthermore, this approach can be applied without any preconditions except for the existence of time-stamped digital documentation. The setting for seed word generation was a workshop with ten participants including a brainstorming session of 15 min which resulted in 250 seed words. The ratio of found seed words for ‘place of residence’ to seed words for ‘place of education’ was 2:3. The dictionary has been enriched by fetching synonyms from the German OpenThesaurus platform (https://www.openthesaurus.de/about/api). The result was a dictionary with 900 words for ‘place of residence’ and 1800 words for ‘place of education’. Because of the high correlation of both topics, some words have been assigned to both topics.

The text categorization has been based on a simple word count. Thus, the number of signal words from a category's dictionary that are found in a report is used as an indicator for the probability that the report belongs to that category. Reports have been aggregated for periods of 2 weeks. In consequence, relevant topics for any 2 weeks in the assistance process could be derived.

Quantitative evaluation metrics like precision, recall and F-Measure were not appropriate in the given setting of only two analysed process instances. Furthermore, no experience is available for the given context as to which values of, for example, the F-Measure could be considered good or acceptable. Therefore, a qualitative approach for evaluation has been selected. The two employees responsible for ‘Client A’ and ‘Client B’ have been interviewed separately with regard to perceived understandability, coverage, accuracy and utility of the process analysis results. The following questions were part of the interview guideline:

• Understandability: Can you interpret the results? Are the signal words that have been found in the reports traceable to the categories?
• Coverage: Do the results cover the actual contexts/topics of process execution? Is there something missing?
• Accuracy: Does the occurrence of topics/categories over time in the process analysis results fit the actual development during process execution?
• Utility: How would you personally rate the utility of the analysis results? Are you able to gain new insights from the analysis results?
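The word-count categorization and the aggregation into periods of 2 weeks described in the methodology can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the sample dictionary and the sample reports are hypothetical. Since 26 periods of 14 days cover only 364 days, the sketch assigns the remaining day or days to the last period, which is an assumption of this example.

```python
import re
from collections import Counter, defaultdict
from datetime import date


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-zäöüß]+", text.lower())


def count_signal_words(text, dictionary):
    """Count, per category, how often its signal words occur in the text."""
    tokens = Counter(tokenize(text))
    return {
        category: sum(tokens[word] for word in words)
        for category, words in dictionary.items()
    }


def period_index(day, days_per_period=14, periods=26):
    """Map a date to a zero-based 2-week period within its year.

    Assumption: days beyond period 26 are merged into the last period.
    """
    return min((day.timetuple().tm_yday - 1) // days_per_period, periods - 1)


def topic_distribution(reports, dictionary):
    """Aggregate signal word counts of (date, text) reports per period."""
    totals = defaultdict(Counter)
    for day, text in reports:
        totals[period_index(day)].update(count_signal_words(text, dictionary))
    return {period: dict(counts) for period, counts in sorted(totals.items())}


if __name__ == "__main__":
    dictionary = {"residence": {"flat", "rent"}, "education": {"school"}}
    reports = [
        (date(2018, 1, 3), "Flat viewing, rent too high"),
        (date(2018, 1, 10), "New flat found"),
        (date(2018, 1, 20), "School meeting"),
    ]
    print(topic_distribution(reports, dictionary))
```

The per-period counts correspond to the bar heights of a topic distribution diagram. Raw counts depend on how much was written in a period, so dividing by the number of tokens per period may be a useful variation when documentation intensity varies.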

5.2 Results and Evaluation

Figure 3 shows the analysis results as a distribution of topics over time. The year has been divided into 26 periods of 2 weeks. For each period, the count of signal words is indicated by the bars in the diagram (left side, red: ‘place of residence’; right side, blue: ‘place of education’). Generally, the distribution of the topics over time corresponds to the description of the process instances in the previous section. The interview results are presented in the following.

Fig. 3 Topic distribution over time for Client A and Client B: ‘place of residence’ (left/red) and ‘place of education’ (right/blue)

With regard to understandability, both responsible employees agreed independently that the results as shown in Fig. 3 are easy to understand and interpret. The employee responsible for ‘Client B’ stated that some of the found signal words cannot be traced very well to the categories. An example was ‘counselling’ as an indicator for the category ‘place of residence’. However, the employee responsible for ‘Client A’ generally agreed with the matching signal words. Considering coverage, both interviewees agreed that the analysed topics have been well covered. However, the employee responsible for ‘Client A’ noted that more categories/topics should be part of the analysis because health issues (not analysed) played an important role in 2018. According to the interviewees, the development of topics/categories is depicted accurately by the analysis results. Nevertheless, the employee responsible for ‘Client A’ was not sure of the exact developments in 2018. Finally, utility was rated positively by both interviewees. New insights could be gained with regard to patterns of topic occurrence as well as correlations between different topics. Furthermore, a common understanding of the process situation among employees could be gained. The interviewees also formulated some issues that should be solved in order to increase utility: (1) the analysis showed facts that are already well known (‘Client A’), (2) more relevant topics should be analysed (both interviewees) and (3) the quality of documentation needs to be improved in order to increase analysis quality (‘Client B’).

6 Summary and Outlook

The investigations demonstrated the general applicability of dictionary-based text classification for process analysis. Advantages compared to traditional process mining approaches and to other text categorization techniques like topic modelling or machine learning have been shown. The performed case study provided a practical evaluation of the approach. Understandability, coverage and accuracy have been perceived positively. The main issues with utility resulted from the limited set of topics/categories that has been used for the case study. ‘Place of residence’ and ‘Place of education’ have been central topics for the interviewees in 2018. Thus, evaluation of accuracy was easily possible. In consequence, however, the gain of additional knowledge about the process instances was limited. Additional relevant topics/categories can be added in the future, which will increase utility. The effort of a 15-min brainstorming session for two topics/categories seems reasonable.

Generally, the qualitative nature of our investigations and the small sample set only allow us to show the general potential of the approach as well as possible problems, without being exhaustive. Further investigations on a larger scale are planned within the project network. Investigations for an improvement of the dictionary creation process (problem of traceability) and for an evaluation of the influence on documentation quality are suggested. A next step could be the creation of flow or state models that show the changes of process states as well as more specific activity types based on topics/categories. The application of unsupervised topic modelling approaches should also be evaluated in practice. According to Guo et al. [14], topic modelling has problems with relevancy but has advantages with regard to the discovery of unknown knowledge.

References

1. Lantow, B., Baudis, T., Lambusch, F.: Mining Personal Service Processes. In: International Conference on Business Information Systems (BIS), pp. 61–72. Springer (2009)
2. Halmos, P.: The personal service society. Br. J. Sociol. 18, 13–28 (1967)
3. Bieber, D., Geiger, M.: Personenbezogene Dienstleistungen in komplexen Dienstleistungssystemen: Eine erste Annäherung. Personenbezogene Dienstleistungen im Kontext komplexer Wertschöpfung: Anwendungsfeld „Seltene Krankheiten“, pp. 9–49. Springer VS, Wiesbaden (2014)
4. Motahari-Nezhad, H.R., Swenson, K.D.: Adaptive case management: overview and research challenges. In: 2013 IEEE 15th Conference on Business Informatics, pp. 264–269. IEEE (2013)
5. Fließ, S., Dyck, S., Schmelter, M., et al.: Kundenaktivitäten in Dienstleistungsprozessen-die Sicht der Konsumenten. In: Kundenintegration und Leistungslehre, pp. 181–204. Springer (2015)
6. Lantow, B., Schmitt, J., Lambusch, F.: Mining personal service processes: the social perspective. In: International Conference on Business Process Management (BPM), pp. 317–325. Springer (2019)
7. Cano, I., Alonso, A., Hernandez, C., et al.: An adaptive case management system to support integrated care services: lessons learned from the NEXES project. J. Biomed. Inform. 55, 11–22 (2015). https://doi.org/10.1016/j.jbi.2015.02.011
8. Sandkuhl, K., Koc, H.: On the applicability of concepts from variability modelling in capability modelling: experiences from a case in business process outsourcing. In: Iliadis, L., Papazoglou, M., Pohl, K. (eds.) Advanced Information Systems Engineering Workshops, pp. 65–76. Springer International Publishing, Cham (2014)
9. Herzog, P., Lantow, B., Wichmann, J.: Adaptive case management-creating a case template for social care organizations. Jt. Proc. BIR, 71–83
10. De Weerdt, J., vanden Broucke, S.K.L.M., Vanthienen, J., et al.: Leveraging process discovery with trace clustering and text mining for intelligent analysis of incident management processes. In: 2012 IEEE Congress on Evolutionary Computation, pp. 1–8 (2012)
11. Zur Muehlen, M., Shapiro, R.: Business process analytics. In: Handbook on Business Process Management 2, pp. 243–263. Springer
12. Brucker, P.: Scheduling Algorithms, 5th edn. Springer-Verlag GmbH, Berlin Heidelberg (2007)
13. Quinn, K.M., Monroe, B.L., Colaresi, M., et al.: How to analyze political attention with minimal assumptions and costs. Am. J. Polit. Sci. 54(1), 209–228 (2010)
14. Guo, L., Vargo, C.J., Pan, Z., et al.: Big social data analytics in journalism and mass communication: comparing dictionary-based text analysis and unsupervised topic modeling. Journal. Mass Commun. Q. 93(2), 332–359 (2016)
15. Abel, J., Lantow, B.: A methodological framework for dictionary and rule-based text classification. In: IC3K 2019 - Proceedings of the 11th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management 1
16. Taboada, M., Brooke, J., Tofiloski, M., et al.: Lexicon-based methods for sentiment analysis. Comput. Linguist. 37(2), 267–307 (2011)
17. Bidulya, Y., Brunova, E.: Sentiment analysis for bank service quality: a rule-based classifier. In: 2016 IEEE 10th International Conference on Application of Information and Communication Technologies (AICT), pp. 1–4 (2016)
18. Al-Twairesh, N., Al-Khalifa, H., AlSalman, A.: AraSenTi: large-scale twitter-specific Arabic sentiment lexicons. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), vol. 1, pp. 697–705
19. Neviarouskaya, A., Prendinger, H., Ishizuka, M.: SentiFul: A lexicon for sentiment analysis. IEEE Trans. Affect. Comput. 2(1), 22–36 (2011)
20. Kolchyna, O., Souza, T.T.P., Treleaven, P.C., et al.: Methodology for twitter sentiment analysis. arXiv preprint arXiv:1507.00955 (2015)
21. Baca-Gomez, Y.R., Martinez, A., Rosso, P., et al.: Web service SWePT: a hybrid opinion mining approach. J. Univ. Comput. Sci. 22(5), 671–690 (2016)
22. Abdulla, N.A., Ahmed, N.A., Shehab, M.A., et al.: Arabic sentiment analysis: lexicon-based and corpus-based. In: 2013 IEEE Jordan conference on applied electrical engineering and computing technologies (AEECT), pp. 1–6 (2013)

Toward a Smart Town: Digital Innovation and Transformation Process in a Public Sector Environment

Johannes Wichmann, Matthias Wißotzki, and Kurt Sandkuhl

Abstract Not only companies, but also cities and public administrations are increasingly thinking about complementary and new digital business and corresponding service models due to growing digital networking, smarter services, omnipresent access technologies, and dynamic customer requirements. Thus, the development and implementation of smart city technologies is a unique opportunity for cities of various sizes to make digital innovation potentials more usable for residents, visitors, and local businesses. This paper describes the application of the Digital Innovation and Transformation Process (DITP) in an urban environment: it collects, analyzes, and evaluates the digitalization goals, customer values, and actions of a small city in northern Germany that focuses on digitalization, and combines them with related research work within the area of digital services for smart cities. The investigation aims to analyze and focus the municipal government’s intention for the digitization of urban space by using the first phase of the DITP. Possible business concepts and best practices were gathered, and experiences from the use are used to improve the DITP approach.

1 Introduction

To face the challenge of customer satisfaction in terms of leisure, entertainment, family life, and cultural activities, cities need to modernize their digital service portfolio, such as ticketing, e-mobility, traffic control, public and smart services as well as energy and waste management [3, 15]. Thus, new innovative user experience and public service formats have to be improved continuously, with technological support [7]. To facilitate the required digital transformation, different methodical and technical approaches from various areas of information systems have to be applied, and suitable new digital-based services and business models are necessary. Some of these approaches, which could be selected and applied to specific use cases, are business model management (BMM), capability management (CM), enterprise modeling (EM), and enterprise architecture management (EAM). To ease this methodical and technical integration, the Digital Innovation and Transformation Process (DITP) was proposed [4]. In this paper, we focus on the application of the DITP to a specific use case in the smart city and public administration sector. The relevant target groups are city administrators, business architects, and IT developers for cities who are looking for reference models for introducing smart services to cities. Section 2 presents the background for this research by reviewing related work concerning smart services for cities and introducing the DITP phases and elements. Section 3 reflects related work concerning smart services for cities. Section 4 analyzes and Sect. 5 evaluates identified digitalization potentials with the help of the corresponding DITP phases. Finally, the summary and outlook including further research approaches are presented in Sect. 6.

J. Wichmann (corresponding author) · M. Wißotzki
Wismar University of Applied Sciences, Philipp-Mueller-Str. 14, 23966 Wismar, Germany

J. Wichmann · K. Sandkuhl
University of Rostock, Albert-Einstein-Str. 22, 18059 Rostock, Germany

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021. A. Zimmermann et al. (eds.), Human Centred Intelligent Systems, Smart Innovation, Systems and Technologies 189, https://doi.org/10.1007/978-981-15-5784-2_8

2 Research Background

2.1 Related Work

Digitalization of cities around the world is becoming increasingly important [15, 17] and is also advancing in Germany, in large and medium-sized cities as well as in small cities (also called “towns”). The term “smart cities” is closely related to the digitalization topic and characterizes the aim of various stakeholders (e.g., citizens, businesses, government agencies, NGOs) to achieve an urban transformation that addresses the problem areas of urban living, for example by increasing the quality of life and the sustainability of cities through technological solutions [21]. As part of the research in this area, several scholars have investigated the requirements of individual stakeholders via qualitative and quantitative analyses and combined them with technology approaches for smart cities. This research focuses above all on major cities [15, 16, 18, 19]. For small towns, especially in rural areas, the number of published papers is small. This is criticized by Hosseini et al. [3], who propose a transformation approach to smart services for towns as well as an innovation process that addresses the needs of smaller cities. Due to structural similarities between the process approach of Hosseini et al. [3] and the DITP (see Sect. 2.2), we selected the DITP for our work on smart cities in this paper. The paper by Hosseini et al. served as an inspiration for this research. The same holds true for the study by Visvizi and Lytras [20], whose structural approach determines different temporal dimensions for the need for digital change. Accordingly, the implications of that research, especially the emerging research agenda derived from the temporal dimensions, have been used as guidance concerning the urgency of approaches for a digital transformation of services for smart cities.


2.2 Digital Innovation and Transformation Process (DITP)

The Digital Innovation and Transformation Process (DITP) was developed to support companies in implementing technology- or innovation-based changes [4]. The DITP distinguishes between three different scenarios: (1) an enterprise forced to make quick changes due to market pressure, (2) a startup company, and (3) an enterprise interested in exploring previously unexploited potential to complement or expand its business activities [4, 12]. The starting activity in the DITP is called Analyze & Focus Your Intention, and it determines the essential aspects of new digital services: ideas, goals, visions, and strategies. The output of this step must have a clear business purpose. Therefore, the elaboration follows a participative approach based on goal and process modeling, which helps to identify and formulate requirements, internal and external factors, and implementation approaches, among others [11, 21]. In order to ensure that a common, methodical starting point exists, a comprehensively documented business concept is developed. This business concept is the basis for creating a business model, which is later operationalized by means of a corporate architecture [21]. If a company is inexperienced in modeling, the components required to design a business model are not always obvious and manageable. For this reason, an alternative version is used during this process, which was adapted to the specifications according to [4, 22]. The goal is to identify the basic intentions for digitization projects. The process therefore considers four aspects to focus a company’s intentions: DITP1.1 What, DITP1.2 Who, DITP1.3 How, and DITP1.4 Value.

DITP1.1 What develops the basic motivation for a digitization project. Important elements of corporate governance are the vision, the strategy, and the corresponding motivational goals [21, 23, 24]. DITP1.2 Who deals with the relevant customer groups for the aforementioned goals [22]. DITP1.3 How focuses on identifying digital trends, inspirations, and best practices, and helps to determine potential digital approaches that may be relevant for the development of the business concept [22]. Derived from the project and the respective motivation, each digitization project should provide added value for the organization. Whether it does is verified in DITP1.4 Value; the added value may accrue to internal (e.g., employees) or external stakeholders (e.g., customers). For example, newly created digital functions (Fig. 1) can improve working conditions, increase product quality, network and automate work processes, or save costs. Consequently, this could enable the customer to buy products faster, cheaper and, if necessary, equipped with new functions [22].
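As a minimal sketch of how the four aspects of the first DITP phase could be recorded as a business concept, the following Python fragment uses a small data structure. The class name `BusinessConcept`, its fields, and the method `missing_aspects` are hypothetical names introduced here for illustration; they are not part of the DITP specification in [4]. The example entries are taken from the examples named in the text above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BusinessConcept:
    """Hypothetical container for the four aspects of DITP phase 1."""
    what: List[str] = field(default_factory=list)   # DITP1.1: vision, goals, strategies
    who: List[str] = field(default_factory=list)    # DITP1.2: relevant customer groups
    how: List[str] = field(default_factory=list)    # DITP1.3: digital trends, best practices
    value: List[str] = field(default_factory=list)  # DITP1.4: added value for stakeholders

    def missing_aspects(self) -> List[str]:
        """Return the names of the aspects that have not been filled in yet."""
        return [name for name in ("what", "who", "how", "value")
                if not getattr(self, name)]


# Illustrative entries only, based on the examples mentioned in the text.
concept = BusinessConcept(
    who=["customers"],
    value=["improved working conditions", "cost savings"],
)
print(concept.missing_aspects())
```

A check of this kind reflects the requirement that the output of the first phase covers all four aspects before a business model is derived from it.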


Fig. 1 Digital innovation and transformation process, based on [4]

3 Use Case: German Town of Grevesmühlen

The starting point for the research is a cooperation on digitization efforts between the Wismar University of Applied Sciences and the town of Grevesmühlen, which has about 10,500 inhabitants and lies in the northwestern part of Mecklenburg-Vorpommern [5]. The cooperation aims at the identification and implementation of digitization potentials. The DITP was started in cooperation with the public administration of Grevesmühlen, and its first phase, “Analyze & Focus Your Intention”, has been completed so far. Grevesmühlen can be classified as a case exploring the potential of digital transformation (scenario 3 in the DITP). In order to obtain appropriate content for this phase, various analyses were evaluated jointly with the parties involved to ensure that an adaptation of existing or new approaches is achieved. This determines the essential aspects of new digital services: ideas, goals, visions, and strategies.

Altogether, the local businesspeople in Grevesmühlen are interested in and open to new ideas, but are currently suboptimally prepared. Due to the lack of expertise and the resulting low involvement of retailers in digitization, many potentials are currently unused, especially in the federal state of Mecklenburg-Vorpommern [7]. Thus, an as-is analysis is necessary that generates a to-be vision, which should then be operationalized appropriately for the local businesspeople. Currently, only a few digitization measures exist; most of the companies have no significant competences regarding this topic. By contrast, the city administration and municipal enterprises have been pursuing digitization strategies for years. For example, the Association for Water and Wastewater Treatment has operated a GIS for more than a decade in cooperation with the city administration [26]. Furthermore, e-records [8] and e-invoices [9] have been introduced in the public administration. A broadband connection is available to 70% of the households, with a federally supported expansion closing the gap for all areas by 2020 [27]. In addition, the administration unions, the municipal utilities, and the city administration have their own fiber-optic networks for their own data transmission [6].

The representatives were interviewed, and the interviews were analyzed with regard to stakeholders, goals, processes, revenue model, and possible digitization efforts according to [25]. The results were combined with the experiences of other cities regarding their digital maturity. The results of the investigation of related cities and research work are presented in the next section. Both analyses examined the following questions: What are the current objectives in the context of digitization for smart cities (What)? How are the customers grouped, now and in the future (Who)? How is the subject of digitization currently handled (How)? Where have innovative digital approaches been implemented so far (How)? What does the average digital service portfolio of a city look like (Value)? The results are summarized in Sect. 5.

4 Analysis of Existing Digitalization Activities in the Use Case

In the following, the results of the workshops are combined with the investigation of related work presented in Sect. 2.1 in order to start the first phase of the DITP, called “Analyze & Focus Your Intention”, with its respective subsections (DITP1.1, DITP1.2, DITP1.3, and DITP1.4).

DITP1.1: Vision, Goals, and Strategies

In workshops with various representatives of the city, different theses for the future of the city were defined: (1) A digital city is defined by measures beyond a fast internet connection; (2) A digital city is a very lively subject with many projects and the need for continuous improvements, unlike the digitization of a single project, such as the town hall; (3) The digital city is defined by us (as people) as we communicate and interact with each other; (4) Grevesmühlen is increasingly digitally connected. Compelling advantages of the analog, such as interpersonal contact, must be maintained; (5) The digital city is not a luxury but a social necessity; (6) The digital city is an evolutionary process that urgently needs and lives from participation [6].

DITP1.2: Target Group

The primary target group is Grevesmühlen’s citizens [6]. The average age of the residents is 48.5 years (with a median of 51.5 years in 2016), and the age quotient of the town is very high [1]. From a digitalization perspective, the target group is divided into two sub-groups, the silver surfers and the digital natives [10]. Tourists are the secondary target group, as they are economically very important for the city, with about 1,030,000 overnight stays in 2017 [2]. The tertiary target group consists of the people who live within a radius of about 25 km around Grevesmühlen, as they do their shopping there [6].


DITP1.3: Digital Potentials and Actions

This section covers the research of digital actions and digital potentials. Based on the documented aspects of DITP1.1 and DITP1.2, the first activity (research of digital actions) allows the investigation of different data sources in order to determine the status quo based on the seven issues of DITP1.1.

(1) As an appropriate infrastructure is a foundation for digitalization approaches, Grevesmühlen pursues a broadband extension, including free Wi-Fi in the city center.
(2) The goal is to achieve comparable standards, especially regarding customer portals. The city administration of Grevesmühlen sees the city as a model for portal solutions. Internal processes are digitized continuously.
(3) The central development is a city portal called “Experience Grevesmühlen” [6]. The aim of the portal is to provide information and services from the retail and tourism industries to the target groups.
(4) To constantly adapt the portal to customers’ needs, a digitization representative was appointed by the Stadtwerke Grevesmühlen in December 2018 to keep the information and services up to date.
(5) As the digital city project should succeed over the long term, the involvement of all stakeholders is necessary. Therefore, three workshops have been held, to which students, retirees, citizens, business executives, and interested parties were invited and which had an average attendance of 60 people.
(6) For the secondary target group (tourists), a digital tour via a RoadMap through Grevesmühlen is planned. It should contain the digitization of sights, QR codes for additional information, online bookings for hotels, leisure activities, and restaurants (including online menu cards), as well as online introductory games [6]. A best practice for the latter is the city of Karlsruhe [14].
(7) The analog trading will be digitized. Therefore, the following measures are considered: digital payment systems in retail, delivery services with drones, the marketing of independent and regional products, digital signage measures, and the development of digital shops (“enter the shop, ask for advice, compare prices online, order and deliver”).
(8) Concerning the area of housing and social affairs, social and residential aspects are important for the citizens. The allocation and control of online appointments as well as online consultations with doctors and authorities are just as relevant for them as a central allocation of kindergarten and nursing home places. In addition, a regional pharmacy app as well as a progressive digitization of the school, both technically and pedagogically, are planned.
(9) Within the area of traffic, the demands are digital traffic signs, an app for the regional public transportation company “Nahbus,” route optimization for public transportation, a regional car park, and a fellow passenger app [6].

DITP1.4: Gratification

In order to structure the determined digital potentials and to set a focus for their selection, it is important to clarify what kind of added value the different possibilities can offer for the various stakeholders mentioned in Sect. 2. Therefore, the examined potentials were subjected to a future (To-Be) value analysis [11]. The To-Be analysis aims to represent the issues that improve the economic performance of a facility and is therefore derived from the digital potentials. The results of this analysis are summarized in Table 1.


Table 1 Evaluation results of digital potentials in Grevesmühlen

Categories | Digital potentials | Value (generated by)
Tourism and shopping | City platform | Online product distribution, pre-order, and pick-up service
Tourism and shopping | Digital signage | Navigation
Housing and social affairs | Ultrasonic indoor positioning | Health services
Housing and social affairs | Online appointments and consultation with doctors and authorities | Health services
Housing and social affairs | Regional pharmacy app | Health services
Housing and social affairs | Central allocation of kindergarten and nursing home availabilities | Social services
Housing and social affairs | Progressive digitization of schools | Social services
Traffic | Autonomous driving | Public transportation
Traffic | Smart parking system | Digital parking guidance system
Traffic | Regional car park | Personal
Traffic | Fellow passenger app | Personal
Infrastructure | 5G Connection | Mobile internet

5 Focus the Intention for Smart Towns

Derived from the results of the analysis, a participatory workshop [11] was held. The aim was to evaluate the digital potentials (Table 1) and their priority with the project partners in order to develop options for the next phase of the DITP (Design Your Business).

(1) Health and Social Services: Concerning the health structure, Grevesmühlen has a DRK hospital, several physicians, health insurance companies, and a comprehensive nursing structure in place. One-third of the city’s total economic output is generated by healthcare services [13]. As part of digitization, a cooperation with the DRK hospital called Clinic Navigation (CliNav) is planned. The CliNav project provides a digital mobility solution that helps people in clinics navigate through buildings to their destination [31]. In addition, the city of Grevesmühlen plans to set up online appointments and consultations for doctors and authorities, which is especially relevant for the silver surfers, who are less mobile. Since the demand for medicine has increased, it is planned to realize a network of regional pharmacies via an app for the Grevesmühlen area. The aim of this network is to ensure a comprehensive supply of essential medicines to the population. Furthermore, its geographical proximity and the speed and reliability of its deliveries should give it an important competitive advantage over large online warehouses [6]. Beyond that, the city also considers digital social services. A central allocation of kindergarten and nursing home places is an important aspect. A comparison within Germany showed that the coverage rate for kindergarten places in Mecklenburg-Vorpommern is very high. Likewise, the demand for those places is above the German average [27]. The same holds true for the number of care-dependent people, which is also above the German average. In contrast to this demand for nursing home places, Mecklenburg-Vorpommern ranks only 13th out of the 16 federal states of Germany concerning the number of nursing homes [28]. A central online platform for the allocation of kindergarten and nursing home places is a way to counter these bottlenecks. Another measure is the progressive digitization of schools, both on a technical and a pedagogical basis.

(2) Autonomous Driving, Smart Parking System and Private Traffic: It is planned to procure an autonomous bus. For this purpose, two fields of application are being considered: on the one hand, the bus should provide a shuttle service between the city and leisure activities; on the other hand, it should offer a shopping tour through the two main shopping streets, from shop door to shop door, for people with limited mobility [6]. Another service to be displayed via the city portal would be a smart parking system for the target groups (DITP1.2).

(3) City Platform and Digital Signage: The combination of digital signage solutions with a semi-digital ordering process is envisaged. The customer should be able to buy the product online and pick it up at the store. Furthermore, the platform should contain the aforementioned information and services for hotels, restaurants, and leisure activities.

(4) Mobile Internet: The aim is to obtain a 5G connection for Grevesmühlen. In conjunction with the other formulated requirements, this is a meaningful addition, since the functionality of many digital achievements depends directly on network coverage [29].


6 Conclusion and Outlook

Regarding the described process, it was determined that the first phase of the DITP, called “Analyze & Focus Your Intention”, is applicable to the context of smart cities. Concerning the research subject, Grevesmühlen is already pursuing initial approaches to smart services for the city. Important efforts in this regard are the Wi-Fi infrastructure that already exists in the city center as well as the 5G connection that policymakers are considering for the region. In view of the basic architecture elements of digitization (BAEoD) [4], these initiatives are particularly important, as the research approach defines such technologies (in terms of infrastructure, data, and information systems) as basic prerequisites for follow-up digitization projects for cities. In addition, the city of Grevesmühlen benefits from the research results in categorizing and prioritizing its digitization efforts. During the participative workshops, it became clear that the research results of the first phase of the DITP corresponded to the ideas of the city administration members. Furthermore, the requirements of the stakeholders (e.g., locals, tourists, and retailers) were considered.

As a theoretical contribution to knowledge, this article applies the DITP to the context of smart cities and thus supplements the presented related work. Practitioners benefit from this research by gaining a framework for implementing smart services in cities as well as impressions from the other research projects discussed in Sect. 2.1. Furthermore, this paper contains suggestions for developing smart services for cities, with a corresponding prioritization for the requirements of the city of Grevesmühlen. Regarding the target group of this research mentioned in the introduction, city administrators as well as business architects and IT developers for cities benefit in different ways from this research.

The outcome of the research is limited, as only the first phase of the DITP has been implemented so far for Grevesmühlen. As part of further investigations, the second phase of the DITP (Design Your Business) will be implemented for the city. In this regard, it is conceivable that additional service requirements could arise for the city of Grevesmühlen as part of further research work. Examples include the energy and waste management measures mentioned in the introduction. In addition, the process was used for the first time in the smart city context. In order to validate the results presented in this research, further studies need to be carried out for other cities.

References

1. Demographic Report Grevesmühlen. http://www.wegweiser-kommune.de/kommunale-berichte/demographiebericht/grevesmuehlen.pdf. Accessed 27 May 2019
2. Mecklenburg-Vorpommern Office of Statistics, Tourism Report in Mecklenburg-Vorpommern. https://www.laiv-mv.de/static/LAIV/Statistik/Dateien/Publikationen/G%20IV%20Tourismus%2C%20Gastgewerbe/G%20413/2017/G413%202017%2012.pdf. Accessed 27 May 2019
3. Hosseini, S., Frank, L., Fridgen, G., Heger, S.: Do not forget about smart towns: how to bring customized digital innovation to rural areas. Bus. Inf. Syst. Eng. 60(3), 243–257 (2018)
4. Wißotzki, M., Sandkuhl, K.: The digital business architect – towards method support for digital innovation and transformation. In: Poels G., Gailly F., Serral Asensio E., Snoeck M. (eds.) The Practice of Enterprise Modeling: Proceedings of the 10th IFIP WG 8.1. Working Conference, PoEM 2017. Lecture Notes in Business Information Processing, vol. 305, pp. 352–362. Springer International Publishing, Cham, Basel (2017)
5. Mecklenburg-Vorpommern Office of Statistics. https://www.laiv-mv.de/static/LAIV/Statistik/Dateien/Publikationen/A%20I%20Bev%C3%B6lkerungsstand/A123/2018/A123%202018%2022.xls. Accessed 24 Sept 2019
6. Digital City Grevesmühlen. https://www.grevesmuehlen.eu/2019/09/05/grevesm%C3%BChlen-die-digitale-stadt/. Accessed 01 Oct 2019
7. Wittman, G., Listl, C., Stahl, E., Seidenschwarz, H.: Der deutsche Einzelhandel 2017 – erste IHK-ibi-Handelsstudie. https://www.ihk-muenchen.de/ihk/documents/Branchen/Handel/Studie_IHK-ibi-Handelsstudie-2017.pdf. Accessed 16 Oct 2019
8. Federal Ministry for Energy, Infrastructure and Digitalization Mecklenburg-Vorpommern Service Portal. https://www.mv-serviceportal.de/. Accessed 16 Oct 2019
9. eGo MV e-Invoice. https://www.ego-mv.de/projekte-themen/themen/erechnung/. Accessed 16 Oct 2019
10. Fietkiewicz, K.: Jumping the digital divide: how do “silver surfers” and “digital immigrants” use social media? Netw. Knowl. 10(1) (2017)
11. Sandkuhl, K., Stirna, J., Persson, A., Wißotzki, M.: Enterprise Modeling: Tackling Business Challenges with the 4EM Method. Springer, Heidelberg (2014)
12. Wißotzki, M., Wichmann, J.: “Analyze & focus your intention” as the first step for applying the digital innovation and transformation process in zoos. Complex Syst. Inform. Model. Q. 20, 89–105 (2019)
13. Mecklenburg-Vorpommern Office of Statistics, Statistical Yearbook 2018. https://www.laiv-mv.de/static/LAIV/Statistik/Dateien/Publikationen/Statistisches%20Jahrbuch/Z011%202018%2000.pdf. Accessed 06 Nov 2019
14. City of Karlsruhe, City&Quest. https://www.karlsruhe-erleben.de/blog/aktionen-und-angebote/City-Quest-Die-eigene-Stadt-als-Spiel-entdecken. Accessed 06 Nov 2019
15. Mora, L., Deakin, M., Reid, A.: Strategic principles for smart city development: a multiple case study analysis for European best practices. Technol. Forecast. Soc. Chang. 142, 70–97 (2019)
16. Sánchez-Corcuera, R., Nunez-Marcos, A., Sesma-Solance, J. et al.: Smart cities survey: technologies, application domains and challenges for the cities of the future. Int. J. Distrib. Sens. Netw. 15(6) (2019)
17. German Federal Ministry of Interior, Building and Community, Smart Cities: Urban development in the digital age. https://www.bmi.bund.de/EN/topics/building-housing/city-housing/national-urban-development/smart-cities-en/smart-cities-en-artikel.html. Accessed 07 Nov 2019
18. Yigitcanlar, T., Han, H., Kamruzzaman, M. et al.: The making of smart cities: are Songdo, Masdar, Amsterdam, San Francisco and Brisbane the best we could build. Land Use Policy 88, 104187 (2019)
19. Joss, S., Sengers, F., Schraven, D. et al.: The smart city as global discourse: storylines and critical junctures across 27 cities. J. Urban Technol. 26(1), 3–34 (2019)
20. Visvizi, A., Lytras, M.D.: It’s not a fad: smart cities and smart villages research in European and global contexts. Sustainability 10(8), 2727 (2018)
21. Martynov, V.V., Shavaleeva, D.N., Salimova, A.I.: Designing optimal enterprise architecture for digital industry: state and prospects. In: Global Industry Conference (GloSIC) Proceedings, pp. 1–7 (2018)
22. Gassmann, O., Frankenberger, K., Csik, M.: The St. Gallen business model navigator. Working Paper (2013)
23. Rusnjak, A.: Entrepreneurial business modeling. In: Rusnjak, A. (ed.) Entrepreneurial Business Modeling: Definitionen – Vorgehensmodell – Framework – Werkzeuge – Perspektiven, pp. 81–108. Springer Gabler, Wiesbaden (2014)
24. Wirtz, B.W., Pistoia, A., Ullrich, S., Göttel, V.: Business models: origin, development and future research perspectives. Long Range Plan. 49(1), 36–54 (2016)
25. Mayring, P.: Qualitative Inhaltsanalyse. In: Mey, G., Mruck, K. (eds.) Handbuch Qualitative Forschung in der Psychologie, pp. 601–613. VS Verlag für Sozialwissenschaften, Wiesbaden (2010)
26. Zweckverband Grevesmühlen. https://www.zweckverband-gvm.de/page/umweltpolitik/erneuerbare-energien.php. Accessed 17 Nov 2019
27. German Federal Motor Transport Authority. https://www.kba.de/DE/Statistik/Kraftfahrer/Fahrerlaubnisse/Fahrerlaubnisbestand/2019_fe_b_geschlecht_alter_fahrerlaubniskl.html;jsessionid=311AB367B6C4C182ED5F88ECA85C8F2F.live21302?nn=652036. Accessed 19 Nov 2019
28. German Federal Ministry of Family Affairs, Senior Citizens, Women and Youth. https://www.bmfsfj.de/blob/113848/bf9083e0e9ad752e9b4996381233b7fa/kindertagesbetreuungkompakt-ausbaustand-und-bedarf-2016-ausgabe-2-data.pdf. Accessed 19 Nov 2019
29. German Employers’ Liability Insurance Association for Health Service and Welfare Care. https://www.bgw-online.de/SharedDocs/Downloads/DE/Medientypen/Wissenschaft-Forschung/BGW55-83-110_Trendbericht-Altenpflege_2018_Download.pdf?__blob=publicationFile. Accessed 19 Nov 2019
30. German Federal Government. https://www.bundesregierung.de/resource/blob/992814/1605036/61c3db982d81ec0b4698548fd19e52f1/digitalisierung-gestalten-download-bpa-data.pdf?download=1. Accessed 19 Nov 2019
31. Koopango. https://koopango.com/en/. Accessed 24 Jan 2020

Automatic Multi-class Classification of Tiny and Faint Printing Defects Based on Semantic Segmentation

Takumi Tsuji and Sumika Arima

Abstract This paper describes an approach for automatic classification of multiclass printing defects based on semantic segmentation models. Classification of printing defects currently depends strongly on visual inspection by skilled workers. Therefore, we developed an application that captures the expert’s perception and knowledge directly in the teaching image data and classifies the data automatically using semantic segmentation. We compared U-Net, SegNet, and PSPNet by benchmarking to find the best model for our situation, where the number of input images for every defect type is set in the range of 10–120 by applying data augmentation. As a result, we found that SegNet is the best model for our tiny and faint images. Finally, we added another grayscale channel to the input layer of SegNet to improve sensitivity to obscurity and show its effect.

1 Introduction

This study aims at automated visual inspection based on accurate multiclass classification of defects. The classes are defined by industrial knowledge, including the cause of a defect in addition to its visual appearance [1], and annotation is based on skilled experts’ recognition and knowledge [2]. As a first step, this paper examines the possibilities and limits of an image processing approach when only the features of a product image and annotated pixel-wise class labels are used.

T. Tsuji · S. Arima (B): University of Tsukuba, Ibaraki Pref, Tsukuba, Japan
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021. A. Zimmermann et al. (eds.), Human Centred Intelligent Systems, Smart Innovation, Systems and Technologies 189, https://doi.org/10.1007/978-981-15-5784-2_9


1.1 Target Area of the Research

We chose gravure printing as the target and cooperated with a company. Gravure printing is one major kind of intaglio printing process, which uses an engraved plate as an image carrier. Each printing unit is composed of an engraved cylinder, an ink pan of one color, an impression roll, and an ink scraping blade, and the image on the cylinder is transferred as a film passes between the cylinder and the roll (Fig. 1). Gravure printing can realize very high-speed production (several hundred meters per minute), and its application has widened beyond wrapping paper for various foods and daily necessities to printed matter for industrial use, such as wallpaper and paper money. A defect inspection process [3] for the printing film is usually placed at the end of the printing process, since defects such as ink splashes and linear scratches can occur during printing. The mainstream of the current inspection process is a method in which a person in charge determines through visual inspection whether candidates detected by image subtraction processing are defects or not. There are two main problems with this method. One is that the relationship between the cause and the defects is unknown, since defect candidates are evaluated only by their size and color differences. The other is that the judgment depends heavily on the knowledge and experience of each skilled worker and may not be uniform. Some approaches, including autoencoders and deep learning, have been studied to overcome these problems [3, 4]. However, those tend to require a template image to train the model before each printing process and are thus not suited to a flexible manufacturing system. We propose a user-friendly data collection tool using a tablet device and a classification approach based on pixel-wise semantic segmentation. This paper is organized as follows. In Sect. 2, we review some related techniques on this topic. Next, the whole system of our approach is described in Sect. 3. Experimental results on 1.5 thousand defect images acquired at our partner company are presented in Sects. 4 and 5. Finally, we conclude the paper with a brief summary, discussion, and future work.

Fig. 1 Conceptual diagram of printing unit

2 Research Subject and Related Works

2.1 Research Subject: Tiny and Faint Defects of Multiclass

In the target industry of this study, eight main defect types appear, as follows:

1. Ink Splash: ink splash from ink pan
2. Dot Skipping: appearance of small unprinted holes
3. Pouring: appearance of excess ink on the film
4. Line Marks: sudden linear mark of undesirable ink
5. Doctor Line: continuous linear mark of undesirable ink
6. Mottling: poor lay of ink on the substrate and print looks non-uniform
7. Swimming: curved lines due to the clogging of the cylinder
8. Fish eye: nylon mass formed during film making

The area of a defect occupies only 0.068% of the extracted image, and the contrast of the defect is low, with a color difference of around 15. In addition, defect types are defined as different if the causes of the defects differ, even though their appearances are similar. For example, defect types #4 and #5, #2 and #8, and #6 and #7 have similar appearances, as shown in Fig. 2. Therefore, the tasks of detecting and classifying the defects are more difficult than classifying objects in nature (e.g., cat or dog).
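To give a feeling for how small an area fraction of 0.068% is, the following short calculation converts it into pixel counts. The image sizes are hypothetical, since the size of the extracted images is not stated at this point in the text; only the fraction 0.068% is taken from the paper.

```python
# Defect area fraction stated in the text: 0.068% of the extracted image.
DEFECT_AREA_FRACTION = 0.068 / 100


def defect_pixels(width, height, fraction=DEFECT_AREA_FRACTION):
    """Expected number of defect pixels in an image of the given size."""
    return width * height * fraction


def background_per_defect_pixel(fraction=DEFECT_AREA_FRACTION):
    """Number of non-defect pixels for every defect pixel."""
    return (1.0 - fraction) / fraction


# Hypothetical square image sizes, chosen only for illustration.
for size in (128, 256, 512):
    print(size, round(defect_pixels(size, size), 1))

print(round(background_per_defect_pixel()))
```

Under these assumptions a defect covers only a few dozen pixels of a 256 × 256 image, and non-defect pixels outnumber defect pixels by more than a thousand to one, which illustrates why pixel-wise classification of such defects is difficult.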

2.2 Image Subtraction
Image subtraction (or background subtraction) is a basic object detection method that is still widely used on the production sites of many gravure printing companies to detect defect candidates. In this method, the pixel-wise difference between the target image and a background image (or the preceding frame in a time series) is calculated. A blob whose difference exceeds a certain threshold is detected as an object.
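The subtraction and thresholding steps described above can be sketched as follows. This is a minimal illustration only, assuming single-channel images stored as nested lists; the function names, the toy image, and the threshold of 30 are hypothetical and are not taken from the inspection machine used in this study.

```python
def subtract_and_threshold(target, background, threshold):
    """Return a binary mask marking pixels whose absolute difference
    from the background image exceeds the threshold."""
    return [
        [1 if abs(t - b) > threshold else 0 for t, b in zip(t_row, b_row)]
        for t_row, b_row in zip(target, background)
    ]


def bounding_box(mask):
    """Return (top, left, bottom, right) of all flagged pixels, or None."""
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(rows), min(cols), max(rows), max(cols)


# Toy example: a uniform background with a small dark speck.
background = [[200] * 6 for _ in range(5)]
target = [row[:] for row in background]
target[2][3] = 150
target[2][4] = 160

mask = subtract_and_threshold(target, background, threshold=30)
print(bounding_box(mask))
```

The mask locates the candidate but carries no information about its defect type, which is the limitation discussed next.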


T. Tsuji and S. Arima

Fig. 2 Examples of the printing defects (upper left to right: #1–4, bottom left to right: #5–8)

The problem is that this method cannot associate a detected object with its defect type, which is the information needed to prevent an outbreak of defects quickly. Another method (in practice, a double-check by a person in charge) is therefore needed to classify the objects.

2.3 Early Machine Learning
Object detection using machine learning has been studied extensively since around 2002. A variational autoencoder [5] trained only on defect-free images is one such technique, and it is actually used in the inspection of machine parts such as screws. An autoencoder is composed of an encoder, which compresses the input image and extracts features, and a decoder, which reconstructs the image from the extracted features. If the encoder–decoder network is trained only on defect-free images, it cannot reconstruct images with defects because it has no information about them. Areas where reconstruction fails are therefore regarded as defects. However, this method does not classify the detected defects, and it requires the inspected images to be reasonably similar to the training images. A Support Vector Machine (SVM) with Histogram of Oriented Gradients (HOG) features [6] is another approach, used for example to measure traffic volume. Because the HOG feature captures the shape of an object, this method can to some extent identify what the detected object is. Still, it is weak at detecting extremely small objects or objects whose shape changes greatly, and a multi-class classification task requires building as many models as there are classes. In summary, the early machine learning approaches have limitations in applicable area and accuracy.


2.4 Convolutional Neural Network
Since the proposal of AlexNet [9], the Convolutional Neural Network (CNN) has been widely studied in various fields as a powerful method for image recognition and classification. In the area of defect detection, many applications have been reported, such as detection of road cracks [8] and inspection of semiconductor surfaces [9]. A CNN extracts features using a combination of convolutional layers and pooling layers. A fully connected layer then transforms these features into feature variables, and the output layer assigns the class. The convolutional layers apply many filters to the input image so that detailed texture features can be extracted. Hence, a CNN can classify objects based on their attributes. The difficulties in applying a CNN to a real task are that the network requires many labeled images and that the background texture may act as noise if the target object occupies only a very small part of the image. To overcome these, many approaches, including transfer learning and learning with small patches, have been proposed. Many methods apply and extend CNNs. R-CNN [10] and its derivatives can detect multiple objects simultaneously in a single image by combining a region proposal method with a CNN. Semantic segmentation is the task of detecting objects in an image by classifying every pixel in it (e.g., tumor detection in medical images, or classification of traffic scenes for autonomous driving). In recent years, performance on this task has improved rapidly owing to the application of CNNs. Several models have been proposed, such as the fully convolutional network [11], U-Net [12], SegNet [13], and PSPNet [14]. Like an autoencoder, they have an encoder–decoder structure in which a CNN works as the encoder to extract features, and a deconvolution network with skip connections works as the decoder to map those features onto the output image.
While these CNN-based methods keep producing excellent results, their drawbacks are a much higher annotation cost and a need for more computational power in training. Considering the works described above and the complexity of printing defects, we decided to use semantic segmentation in our study.

3 Proposed Approach
In this section, we describe the proposed approach to annotate and classify printing defects in a gravure printing system. First, we collect labeled teacher images using an annotation tool we have developed. Next, semantic segmentation is performed using the collected data to classify defects based on their shape and appearance.


Fig. 3 User interface of the annotation tool

3.1 Annotation Tool
Training a semantic segmentation model requires input RGB images and the corresponding segmentation images. Tools such as labelme [15] exist for preparing such a dataset. However, they are cumbersome for experts who are unfamiliar with computers and are therefore unsuitable here. For practical use, we developed an annotation tool with which users can directly mark what they perceive using a touch pen on a tablet device (Fig. 3). Experts can easily point out the defects in around 30 s. The dataset is also stored on a cloud server, so it can be used for learning immediately.

3.2 Semantic Segmentation Models
In this study, U-Net, SegNet, and PSPNet are selected as candidate models for segmenting defects. They are compared in terms of stability, accuracy, and training speed in order to choose one of them. U-Net [12] was originally proposed to segment grayscale biomedical images. In U-Net, the skip connections simply concatenate the feature maps from each encoding layer with the feature maps of the corresponding decoding layer, which lets the network produce segmentation maps with localized information. SegNet [13] is another semantic segmentation model, in which the skip connections are arranged more efficiently by passing max-pooling indices, that is, the positions the features come from, instead of passing whole feature maps as U-Net does. PSPNet [14] has another unique architecture, in which localized information is obtained by pooling layers of several sizes, called the “pyramid pooling module”: a layer with a small size extracts smaller, more localized features, while a layer with a large size extracts global context. In the following experiments, VGG16 was also used to initialize the encoder parameters as transfer learning.
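To illustrate what passing max-pooling indices means in SegNet, the following sketch pools a single-channel feature map while recording where each maximum came from, and then uses those positions to place the values back during upsampling. It is a simplified illustration with hypothetical function names, assuming a map whose height and width are multiples of the window size; it is not the implementation used in the experiments.

```python
def max_pool_with_indices(fmap, size=2):
    """Non-overlapping max pooling that also records the position of each maximum."""
    pooled, indices = [], []
    for r in range(0, len(fmap), size):
        pooled_row, index_row = [], []
        for c in range(0, len(fmap[0]), size):
            window = [
                (fmap[r + dr][c + dc], (r + dr, c + dc))
                for dr in range(size)
                for dc in range(size)
            ]
            value, position = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            index_row.append(position)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def max_unpool(pooled, indices, height, width):
    """Put each pooled value back at its recorded position; zeros elsewhere."""
    out = [[0] * width for _ in range(height)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (r, c) in zip(pooled_row, index_row):
            out[r][c] = value
    return out


fmap = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [2, 6, 3, 1],
]
pooled, indices = max_pool_with_indices(fmap)
restored = max_unpool(pooled, indices, height=4, width=4)
```

Only one position per window has to be stored, whereas a concatenating skip connection of the U-Net kind keeps the whole encoder feature map.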

4 Dataset
The dataset of printing defects was obtained and stored using the annotation tool in cooperation with a printing company and representatives of its inspection staff. The original RGB images were obtained from an on-site defect detection machine based on the image subtraction method. Each source image and its annotation image (segmentation map) have the same size of 256 px × 256 px. Owing to the specifications of the detection machine, the source images are extracted by cutting out the periphery of the detected defect, so the background area differs greatly between images. Furthermore, the defect area is much smaller than the background (Table 1). The dataset includes eight defect types as follows:
1. Ink Splash: ink splash from ink pan
2. Dot Skipping: appearance of small unprinted holes
3. Pouring: appearance of excess ink on the film
4. Line Marks: sudden linear mark of undesirable ink
5. Doctor Line: continuous linear mark of undesirable ink
6. Mottling: poor lay of ink on the substrate and print looks non-uniform
7. Swimming: curved lines due to the clogging of the cylinder
8. Fish eye: nylon mass formed during film making.

Table 1 Distribution of acquired data
Class | # of images | Pixels in the whole dataset | Pixels per image | Percentage per image
Background | 1343 | 75341874 | 56100 | 85.601
Ink splash | 64 | 26678 | 417 | 0.636
Dot skipping | 9 | 561 | 62 | 0.095
Pouring | 415 | 1532972 | 3694 | 5.636
Line marks | 67 | 40426 | 603 | 0.921
Doctor line | 66 | 136275 | 2065 | 3.151
Mottling | 8 | 14418 | 1802 | 2.750
Swimming | 406 | 9248584 | 22780 | 34.759
Fish eye | 152 | 6804 | 44 | 0.068
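The percentage column of Table 1 follows from the other columns and the 256 px × 256 px image size. The short check below recomputes it for four rows; the dictionary and function names are illustrative only, and the figures are copied from Table 1.

```python
IMAGE_PIXELS = 256 * 256  # every image is 256 px x 256 px

# (number of images, labeled pixels in the whole dataset), copied from Table 1
TABLE_1 = {
    "Background": (1343, 75341874),
    "Pouring": (415, 1532972),
    "Swimming": (406, 9248584),
    "Fish eye": (152, 6804),
}


def share_per_image(n_images, total_pixels):
    """Average percentage of one image covered by the class."""
    return 100.0 * total_pixels / n_images / IMAGE_PIXELS


for name, (n_images, total_pixels) in TABLE_1.items():
    print(f"{name}: {share_per_image(n_images, total_pixels):.3f}%")
```

The recomputed shares make the imbalance explicit: a Fish eye defect covers well under 0.1% of an image on average, while background covers more than 85%.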


Since some defect types have an insufficient number of images, we use only Pouring, Line Marks, Swimming, and Fish Eye.

5 Numerical Experiment
5.1 Experimental Setup
As mentioned earlier, the dataset described in Sect. 4 includes some defect types with an insufficient number of images, so only four types are used for network training and testing: Pouring (3), Line Mark (4), Swimming (7), and Fish Eye (8). In the first experiment, the images were resized to a resolution of 512 × 512 px to match the input resolution of the networks. In the second experiment, we applied 30-fold data augmentation by randomly cropping 384 × 384 px images from the original images, flipping them vertically and horizontally, and adding salt-and-pepper noise. The experiments were implemented in TensorFlow with the Keras wrapper, and each network was trained for 50 epochs using the RMSprop optimizer with a learning rate of 10⁻³. We train each model with 100 epochs of training. To deal with the imbalance between background and defects, we adopted the focal loss (FL, Eq. 1) [16] as the loss function and the Dice coefficient (DC, Eq. 2) [17] as the evaluation function, both modified for multi-class segmentation and defined as follows. Here, $y_c^{(t)}$ is the one-hot label of the ground-truth class and $\hat{y}_c^{(t)}$ is the model prediction. $\alpha$ and $\gamma$ are weighting factors, and we set $\alpha = 0.25$, $\gamma = 2$ in our experiment. Also, $T_c$ and $S_c$ are the sets of true and predicted values for class $c$, respectively.
$$\mathrm{FL} = -\sum_{t}\sum_{c} \alpha\, y_c^{(t)} \left(1 - \hat{y}_c^{(t)}\right)^{\gamma} y_c^{(t)} \log \hat{y}_c^{(t)} \tag{1}$$
$$\mathrm{DC} = 2\,\frac{\sum_{c} |T_c \cap S_c|}{\sum_{c} |T_c \cup S_c|} \tag{2}$$
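The loss and evaluation functions of Eqs. (1) and (2) can be sketched in plain Python as below. This is an illustration under stated assumptions rather than the code used in the experiments: the index t is taken to run over pixels, the function names are hypothetical, and the Dice function uses the common form with |T_c| + |S_c| in the denominator, for which a perfect prediction scores 1. Whether that form matches the evaluation used in the experiments is an assumption.

```python
import math


def focal_loss(y_true, y_pred, alpha=0.25, gamma=2.0, eps=1e-7):
    """Multi-class focal loss summed over pixels t and classes c.

    y_true: one-hot label vectors, one per pixel.
    y_pred: predicted class-probability vectors, one per pixel.
    Because the labels are one-hot, the repeated label factor of Eq. (1)
    reduces to a single factor y.
    """
    total = 0.0
    for labels, probs in zip(y_true, y_pred):
        for y, p in zip(labels, probs):
            p = min(max(p, eps), 1.0 - eps)  # keep log() finite
            total -= alpha * y * (1.0 - p) ** gamma * math.log(p)
    return total


def dice_coefficient(true_labels, pred_labels, classes):
    """Dice coefficient pooled over classes (assumed |T_c| + |S_c| denominator)."""
    intersection = 0
    size_sum = 0
    for c in classes:
        t = {i for i, label in enumerate(true_labels) if label == c}
        s = {i for i, label in enumerate(pred_labels) if label == c}
        intersection += len(t & s)
        size_sum += len(t) + len(s)
    return 2 * intersection / size_sum if size_sum else 1.0


unsure = focal_loss([[1, 0]], [[0.5, 0.5]])
confident = focal_loss([[1, 0]], [[0.9, 0.1]])
score = dice_coefficient([0, 0, 0, 1, 1, 2], [0, 0, 1, 1, 1, 2], classes=[0, 1, 2])
```

The factor (1 − ŷ)^γ shrinks the contribution of pixels that are already classified confidently, which is why the confident prediction above yields a much smaller loss than the uncertain one.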

The computational environment is an Intel Core i5-9600 CPU and an NVIDIA RTX 2070 GPU (with 8 GB of RAM).
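The data augmentation used in the second experiment (random cropping, vertical and horizontal flips, and salt-and-pepper noise) can be sketched as follows. The sketch assumes single-channel images stored as nested lists, and the function name, the noise ratio, the flip probability of 0.5, and the toy sizes are all hypothetical, since the text does not specify them. The same crop and flips are applied to the annotation image so that labels stay aligned with pixels.

```python
import random


def augment_pair(image, mask, crop_size, rng, noise_amount=0.01):
    """Return one augmented (image, mask) pair.

    `image` holds grayscale values in [0, 255]; `mask` holds class indices.
    Both are equally sized lists of rows.
    """
    top = rng.randint(0, len(image) - crop_size)
    left = rng.randint(0, len(image[0]) - crop_size)

    def crop(array):
        return [row[left:left + crop_size] for row in array[top:top + crop_size]]

    # Geometric operations are shared by the image and its annotation.
    image, mask = crop(image), crop(mask)
    if rng.random() < 0.5:  # vertical flip
        image, mask = image[::-1], mask[::-1]
    if rng.random() < 0.5:  # horizontal flip
        image = [row[::-1] for row in image]
        mask = [row[::-1] for row in mask]
    # Salt-and-pepper noise changes the image only, never the labels.
    image = [
        [rng.choice((0, 255)) if rng.random() < noise_amount else value for value in row]
        for row in image
    ]
    return image, mask


rng = random.Random(0)
mask = [[1 if 2 <= r <= 3 and 2 <= c <= 4 else 0 for c in range(8)] for r in range(8)]
image = [[200 * label for label in row] for row in mask]
# 30 augmented copies of one original; noise is disabled to show alignment.
pairs = [augment_pair(image, mask, crop_size=6, rng=rng, noise_amount=0.0) for _ in range(30)]
```

With the noise disabled, every augmented image still equals 200 times its mask, which confirms that cropping and flipping keep the annotation aligned.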

5.2 Results
Without Data Augmentation
The results of the first experiment are shown in Tables 2 and 3. PSPNet has the best Dice coefficient and the shortest training time among the three models, although its Dice coefficient (0.83) is not sufficient. Its Dice coefficient also tends to decrease as the number of training images increases. This may be caused by the growing number of images with tiny defects such as Fish Eye and Line Marks. Since PSPNet applies the pyramid pooling module after several stages of convolution, feature extraction is considered to fail if the features are lost during those stages. U-Net and SegNet have higher loss values than PSPNet. U-Net performs better than SegNet, and it performs best with 60 images per class. For these two models, learning also appears to be insufficient with 10 or 30 input images per class.

Table 2 Experimental results (without data augmentation); columns give the number of images per class
Model | Metric | 10 | 30 | 60 | 90 | 120
U-Net | Dice coef (train) | 0.75 | 0.76 | 0.82 | 0.81 | 0.80
U-Net | Dice coef (validation) | 0.79 | 0.77 | 0.82 | 0.80 | 0.79
U-Net | Focal loss (train) | 91.91 | 71.09 | 40.66 | 49.16 | 65.39
U-Net | Focal loss (validation) | 161.31 | 102.57 | 70.50 | 67.40 | 69.30
SegNet | Dice coef (train) | 0.75 | 0.69 | 0.77 | 0.77 | 0.80
SegNet | Dice coef (validation) | 0.76 | 0.71 | 0.81 | 0.78 | 0.79
SegNet | Focal loss (train) | 104.77 | 100.44 | 82.27 | 61.14 | 72.64
SegNet | Focal loss (validation) | 176.26 | 121.41 | 104.29 | 67.63 | 93.19
PSPNet | Dice coef (train) | 0.83 | 0.83 | 0.79 | 0.74 | 0.75
PSPNet | Dice coef (validation) | 0.82 | 0.83 | 0.72 | 0.76 | 0.77
PSPNet | Focal loss (train) | 11.48 | 13.05 | 18.37 | 23.80 | 24.32
PSPNet | Focal loss (validation) | 47.01 | 39.99 | 29.47 | 26.60 | 27.96

Table 3 Computational time for learning phase [seconds] (without data augmentation); columns give the number of images per class
Model | 10 | 30 | 60 | 90 | 120
U-Net | 650 | 850 | 1250 | 1600 | 2700
SegNet | 550 | 950 | 1650 | 2000 | 2800
PSPNet | 450 | 750 | 1050 | 1550 | 2300

With Data Augmentation
The results of the second experiment are shown in Tables 4 and 5. The best model is SegNet, with a Dice coefficient of 0.99. Unlike in the first experiment, all models achieve high accuracy. The number of original images should apparently be 30 or more, since the loss is too high when the number is 10. On the other hand, with 120 original images, every model takes about 10 h to train, which is not practical. PSPNet has much lower loss values and shorter
computational time, as in the first experiment. However, its advantage in processing time shrinks as the number of images increases.

Table 4 Experimental results (with data augmentation); columns give the number of original images per class
Model | Metric | 10 | 30 | 60 | 90 | 120
U-Net | Dice coef (train) | 0.92 | 0.96 | 0.98 | 0.96 | 0.97
U-Net | Dice coef (validation) | 0.84 | 0.86 | 0.92 | 0.90 | 0.92
U-Net | Focal loss (train) | 13.29 | 4.55 | 2.17 | 4.45 | 3.77
U-Net | Focal loss (validation) | 111.65 | 96.03 | 80.49 | 43.82 | 76.02
SegNet | Dice coef (train) | 0.92 | 0.92 | 0.99 | 0.99 | 0.98
SegNet | Dice coef (validation) | 0.86 | 0.85 | 0.92 | 0.94 | 0.93
SegNet | Focal loss (train) | 10.14 | 10.09 | 1.37 | 1.17 | 2.36
SegNet | Focal loss (validation) | 100.78 | 116.22 | 50.16 | 48.36 | 46.75
PSPNet | Dice coef (train) | 0.84 | 0.98 | 0.98 | 0.98 | 0.98
PSPNet | Dice coef (validation) | 0.75 | 0.93 | 0.95 | 0.94 | 0.94
PSPNet | Focal loss (train) | 10.90 | 1.03 | 0.93 | 0.81 | 0.94
PSPNet | Focal loss (validation) | 35.89 | 15.14 | 8.86 | 11.78 | 16.14

Table 5 Computational time for learning phase [seconds] (with data augmentation); columns give the number of original images per class
Model | 10 | 30 | 60 | 90 | 120
U-Net | 7600 | 9000 | 16350 | 24486 | 35025
SegNet | 7400 | 8500 | 17200 | 23222 | 35249
PSPNet | 2750 | 6750 | 15937 | 19048 | 31604

Results for each class
Table 6 shows the Dice score for each class on the test dataset (20 images per class) using the models trained with data augmentation (90 original images). Although the overall score of each model was 0.90 or higher (Table 4), none of the models reached 0.90 for any individual class. There seem to be two reasons for this. One is that about 85% of each image is background, so the models are more likely to classify pixels as background. The other is that many test images have low contrast between defects and background, which makes it difficult for the models to spot the targets. A comparison between the models suggests that SegNet is the best. Therefore, we adopted it as the base model of our approach (Table 6).
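The gap between the overall score and the per-class scores can be reproduced with a toy example. The sketch below assumes that the overall Dice coefficient pools all classes including background and uses the common |T| + |S| denominator; both are assumptions, and the labels and prediction are hypothetical.

```python
def dice(true_labels, pred_labels, classes):
    """Dice pooled over the given classes, with the common |T| + |S| denominator."""
    intersection = 0
    size_sum = 0
    for c in classes:
        intersection += sum(
            1 for t, p in zip(true_labels, pred_labels) if t == c and p == c
        )
        size_sum += true_labels.count(c) + pred_labels.count(c)
    return 2 * intersection / size_sum


# 100 pixels: 85 background (class 0) and 15 defect (class 1).
true_labels = [0] * 85 + [1] * 15
# Hypothetical prediction: background is perfect, 10 of 15 defect pixels are missed.
pred_labels = [0] * 85 + [0] * 10 + [1] * 5

overall = dice(true_labels, pred_labels, classes=[0, 1])
defect_only = dice(true_labels, pred_labels, classes=[1])
print(round(overall, 2), round(defect_only, 2))
```

Under these assumptions the pooled score stays at 0.90 even though two thirds of the defect pixels are missed, while the defect class alone scores 0.50.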

Table 6 Dice score for each defect class
Model | Pouring | Line mark | Swimming | Fish eye
U-Net | 0.60 | 0.40 | 0.88 | 0.59
SegNet | 0.69 | 0.37 | 0.87 | 0.63
PSPNet | 0.66 | 0.10 | 0.86 | 0.01

Table 7 Dice score for each defect class (with grayscale channel)
Model | Dice coef (train/val) | Focal loss (train/val) | Pouring | Line mark | Swimming | Fish Eye
Proposed | 0.92/0.89 | 10.63/18.51 | 0.72 | 0.33 | 0.90 | 0.66
SegNet | 0.99/0.94 | 1.17/48.36 | 0.69 | 0.37 | 0.87 | 0.63

6 Improvement by the Grayscale Channel
To address the low-contrast problem, we modified the SegNet model to take, in addition to the regular RGB channels, a grayscale channel with enhanced color differences. The grayscale value is computed by Eq. (3) with parameters α = 4, β = −450, and γ = 3. Since pixel values are limited to [0, 255], the value of Eq. (3) is clipped to that range.
$$\mathrm{GR}_{\mathrm{value}} = \mathrm{Clip}\left(\alpha \left(\mathrm{RGB}_{\mathrm{value}}\right)^{\gamma} + \beta,\ [0, 255]\right) \tag{3}$$
We evaluated the modified model using the same dataset as in Sect. 4 (90 images per class, augmented 30 times). Table 7 shows the overall Dice coefficient and focal loss on both the training and validation data. The Dice coefficient is lower than that of SegNet in Table 4. Noise generated in the enhancement process and not removed appears to have degraded the detection performance. Table 7 also shows the Dice score for each class with the modified model. Compared with the scores in Table 6, Pouring, Swimming, and Fish Eye improve slightly, whereas the score of Line Mark drops. The drop may be caused by the noise, since Fig. 5 shows an improvement even for Line Mark.
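Eq. (3) can be sketched as a small function. The text does not state how the three color channels are combined into RGBvalue or on what scale it is measured, so the input here is a hypothetical scalar, and the helper that appends the enhanced value as a fourth input channel is likewise only an illustration.

```python
def enhance(value, alpha=4.0, beta=-450.0, gamma=3.0):
    """Eq. (3): alpha * value**gamma + beta, clipped to [0, 255]."""
    raw = alpha * value ** gamma + beta
    return min(255.0, max(0.0, raw))


def add_enhanced_channel(rgb_pixel, value):
    """Append the enhanced grayscale value to an (R, G, B) pixel."""
    return (*rgb_pixel, enhance(value))


print(enhance(0.0), enhance(5.0), enhance(10.0))
```

With these parameters small inputs are clipped to 0 and large inputs saturate at 255, so only a narrow band of input values is spread over the full output range.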

Fig. 4 Examples of the printing defects
(a) Input (b) Teacher (c) SegNet (d) Enhanced grayscale (e) Proposed
Fig. 5 Output example for low-contrast image

7 Conclusions
This paper described an approach for automatic classification of multi-class printing defects based on pixel-wise semantic segmentation models, chosen because the defects are tiny. We developed an application that captures the inspector’s perception and knowledge directly as teaching data. We also benchmarked U-Net, SegNet, and PSPNet with 10–120 input images per defect type, with and without data augmentation. As a result, we found that at least 30 images per defect type are required to obtain a highly accurate model. The best of the three models in this study is SegNet if there are 60 or more images per defect type, though PSPNet is the best if there are only 30 images per type. Finally, we addressed the low-contrast problem in our dataset by adding a grayscale input channel to SegNet and showed the improvement. As future work, we need to deal with the class imbalance between defects and background. The models should also be evaluated on a variety of products, because the results shown here are only for images cropped from the printing films of one product design. In addition, we will build a system that retrains our model during actual use, in order to refine it and enable it to find more subtle defects and the defect types for which we obtained few images. All contents of this manuscript are included in a pending patent (JP-2020-038172).
Acknowledgments This research is supported by the Cross-ministerial Strategic Innovation Promotion Program (SIP), “Big-data and AI-enabled Cyberspace Technologies” (Funding Agency: NEDO). We appreciate the support. The authors also appreciate all reviewers’ constructive comments.


References
1. Ministry of Economy, Trade and Industry of Japan: White Paper on Manufacturing Industries (Monodzukuri) (2019). (English version) https://www.meti.go.jp/english/press/2019/0611_001.html 2019/1/22
2. Chugoku Industrial Innovation Center: An investigation into possibility to promoting the automation of inspection process in the manufacturing company (2016)
3. Gollisch, T., Meister, M.: Eye smarter than scientists believed: neural computations in circuits of the retina. Neuron 65(2), 150–164 (2010)
4. Teppei, T.: POODL–Image recognition cloud platform for printing factory. https://www.slideshare.net/TeppeiTamaki/poodl-a-image-recognition-cloud-platform-for-every-printing-factory. Accessed 22 Jan 2019
5. Shinichi, H., Takeshi, U., Toshinori, M., Nobuyuki, I.: Image recognition AI to promote the automation of visual inspections. Fujitsu 69(4), 42–48 (2018)
6. Kingma, D.P., Welling, M.: Auto-encoding variational bayes. arXiv:1312.6114 (2013)
7. Llorca, D.F., Arroyo, R., Sotelo, M.A.: Vehicle logo recognition in traffic images using HOG features and SVM. In: 16th International IEEE Conference on Intelligent Transportation Systems (ITSC 2013), pp. 2229–2234. IEEE, Hague (2013)
8. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp. 1097–1105. Neural Information Processing Systems Conference (NIPS), Nevada (2012)
9. Cha, Y.J., Choi, W., Büyüköztürk, O.: Deep learning-based crack damage detection using convolutional neural networks. Comput. Aided Civ. Infrastruct. Eng. 32(5), 361–378 (2017)
10. Imoto, K., Nakai, T., Ike, T., Haruki, K., Sato, Y.: A CNN-based transfer learning method for defect classification in semiconductor manufacturing. IEEE Trans. Semicond. Manuf. 32(4), 455–459 (2019)
11. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 580–587. IEEE, Ohio (2014)
12. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3431–3440. IEEE, Massachusetts (2015)
13. Olaf, R., Philipp, F., Thomas, B.: U-Net: convolutional networks for biomedical image segmentation. In: Medical Image Computing and Computer-Assisted Intervention (MICCAI), LNCS, vol. 9351, pp. 234–241. Springer, Munich (2015)
14. Badrinarayanan, V., Kendall, A., Cipolla, R.: SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39(12), 2481–2495 (2017)
15. Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J.: Pyramid scene parsing network. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2881–2890. IEEE, Honolulu (2017)
16. Image Polygonal Annotation with Python. https://github.com/wkentaro/labelme. Accessed 01 Jan 2019
17. Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: Proceedings of the IEEE international conference on computer vision, pp. 2980–2988. IEEE, Honolulu (2017)
18. Dice, L.R.: Measures of the amount of ecologic association between species. Ecology 26(3), 297–302 (1945)

A Novel Hand Gesture Recognition Method Based on Illumination Compensation and Grayscale Adjustment
Dan Liang, Xiaocheng Wu, Junshen Chen, and Rossitza Setchi

Abstract Gesture recognition is a challenging research problem in human–machine systems. Uneven illumination and background noise significantly contribute to this challenge by affecting the accuracy of hand gesture recognition algorithms. To address this challenge, this paper proposes a novel gesture recognition method based on illumination compensation and grayscale adjustment, which can significantly improve gesture recognition in uneven and backlighting conditions. The novelty of the method is in the new illumination compensation algorithm based on luminance adjustment and Gamma correction, which can reduce the luminance value in the overlit image region and enhance the area with low illumination intensity. The grayscale adjustment is used to detect the skin color and hand area accurately. The binary image of the hand gesture is extracted through iterative threshold segmentation, image dilation, and erosion process. Five gesture features including area, roundness, finger peak number, hole number, and average angle are used to recognize the input gesture. The experimental results show that the proposed method can reduce the influence of uneven illumination and effectively recognize the hand gestures. This method can be used in applications involving human–machine interactions conducted in poor lighting conditions.

D. Liang (B) · X. Wu
Faculty of Mechanical Engineering and Mechanics, Ningbo University, Ningbo 315211, China
e-mail: [email protected]
J. Chen · R. Setchi
Research Centre in AI, Robotics and Human-Machine Systems, School of Engineering, Cardiff University, Cardiff CF24 3AA, UK
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
A. Zimmermann et al. (eds.), Human Centred Intelligent Systems, Smart Innovation, Systems and Technologies 189, https://doi.org/10.1007/978-981-15-5784-2_10

1 Introduction
As a common way of interaction in daily life, gesture communication has the advantages of simplicity, good visualization, and convenience. With the popularity of cameras and other image acquisition devices, vision-based gesture recognition methods have attracted much attention [1] in various areas such as real-time video conferencing, intelligent device control, and human–robot interaction. Vision-based gesture recognition includes static gesture recognition and dynamic gesture recognition [2]. Static gesture recognition can be used to rapidly identify hand shapes, such as a clenched fist or a scissors gesture, in human–machine interaction applications. This paper focuses on static gesture recognition, which uses a simple RGB camera to realize rapid recognition of hand gestures for human–computer interaction.
The gesture recognition process is usually influenced by the lighting and background environment, especially the illumination intensity, which has a distinct effect on the accuracy of gesture segmentation and recognition. During image acquisition, mutual shielding between objects and variable environmental lighting often lead to uneven scene illumination. The luminance in bright areas of an image is sometimes too strong, while that in dark areas is insufficient. As a result, some important details are not highlighted or are even covered up, which seriously affects image clarity and the recognition process. Therefore, it is of great significance to study gesture recognition methods based on illumination correction under uneven lighting conditions.
Various gesture recognition methods based on skin color extraction have been presented in order to reduce the illumination effect [3]. For example, a framework for real-time hand gesture recognition in uncontrolled environments has been proposed, which first constructs a face skin color model to inhibit the luminance factor and then uses threshold segmentation to extract the gesture area [4]. A skin-color-based gesture detection method has been presented that uses an improved Kalman filter to predict the hand position and then obtains the gesture region with a TSL (Tint, Saturation, and Lightness) skin color model [5]. In addition, optical flow information has been combined with skin color to extract the hand contour under varying illumination and complicated backgrounds [6]. These methods use skin color to detect the hand region, which reduces the influence of illumination variation. However, it is still rather difficult to extract the gesture region effectively under uneven and backlighting conditions.
Using depth data is another way to inhibit the illumination effect during hand gesture recognition. By combining depth data and infrared laser speckle pattern images captured from a Kinect device [7], static hand gestures can be recognized accurately. Finger skeleton and hand texture have also been combined with depth information to extract the gesture area [8, 9]. Depth information can effectively eliminate lighting effects, but it requires additional depth acquisition devices, which increases system complexity, cost, and inconvenience.
This paper proposes a novel gesture recognition method based on illumination compensation and grayscale adjustment to facilitate static gesture recognition under uneven and backlighting conditions with a simple RGB camera. The main contributions are as follows. First, an illumination compensation algorithm is proposed that integrates luminance adjustment with Gamma correction, which helps to reduce the luminance in overlit regions and enhance image areas with low illumination intensity. Then, a grayscale adjustment algorithm is designed to detect the skin color and hand area accurately. Furthermore, five hand features are constructed to recognize the input gestures commonly used for communication. In the following sections, the specific procedures of image illumination compensation and gesture recognition are presented in detail, as well as the recognition experiments and analyses.
Fig. 1 The proposed gesture recognition method

2 Method
2.1 Concept
When interacting through gestures, the illumination of the external environment is often uneven, for example backlit or dark, which makes some key areas dim and indistinct and in turn affects the subsequent gesture segmentation and detection. This paper proposes a novel gesture recognition method based on illumination compensation and grayscale adjustment to facilitate static gesture recognition under uneven and backlighting conditions. The proposed method consists of four steps (Fig. 1): illumination compensation based on luminance adjustment and Gamma correction, grayscale adjustment, threshold segmentation of the gesture image, and gesture feature computation and recognition.

2.2 Illumination Compensation Based on Luminance Adjustment and Gamma Correction The flowchart of the proposed illumination compensation algorithm is shown in Fig. 2, including luminance adjustment, Gamma correction, and image fusion. Luminance Adjustment. Firstly, the input image I(x,y) is converted from RGB space to YUV space to extract the Y component Bw (x,y).

118

D. Liang et al.

Fig. 2 The proposed illumination compensation algorithm

Bw (x, y) = 0.299 × I R (x, y) + 0.587 × IG (x, y) + 0.114 × I B (x, y).

(1)

Then, adjust the luminance value. Let Bw as the log-average luminance,

1 Bw = exp log(α + Bw (x, y)) , m × n x,y

(2)

where m and n are the row and column of the image. The α is an adjustment coefficient used to avoid the singularity of black pixels in image, which is set to 0.001. The image global luminance Bg (x,y) can be expressed as Bg (x, y) =

log(Bw (x, y)/Bw + 1) log(Bwmax /Bw + 1)

,

(3)

where Bwmax is the maximum luminance of the image after YUV conversion. Let the normalized luminance Bu (x,y) be Bu (x, y) = Bg (x, y)/Bw (x, y),

(4)

and let Bu (x, y) = 0, when Bw (x,y) equals 0. Then, the RGB image I O (x,y) after luminance adjustment can be got by ⎧ ⎪ ⎨ I O R (x, y) = Bu (x, y) × I R (x, y), I OG (x, y) = Bu (x, y) × IG (x, y), ⎪ ⎩ I O B (x, y) = Bu (x, y) × I B (x, y),

(5)

A Novel Hand Gesture Recognition Method Based on Illumination …

119

where I OR (x,y), I OG (x,y), I OB (x,y) are the R, G, B value of image I O (x,y). After the above image processing, the luminance of some shaded areas can get enhanced, but sometimes the image may get over-adjusted. Gamma correction is used in the following step to further improve the image compensation effect. Gamma Correction. The input image I(x,y) is converted from RGB to HSV color space to obtain the I OH (x,y), and I OV (x,y) is the corresponding V value. Let the multi-scale Gaussian function F(x,y) satisfy F(x, y) = λ exp(−

x 2 + y2 ), c2

(6)

where c is the scale factor, λ is the normalized constant. By convolving the Gaussian function with the original image, the estimated luminance value Bpre (x,y) can be obtained by Bpre (x, y) = I O V (x, y)F(x, y).

(7)

The scale factor c of the Gaussian function determines the convolution kernel scope. A larger value of c means a larger scope of the convolution kernel, and better global characteristics of the extracted illumination value. In order to balance both the global and local characteristics of the extracted illumination, Gaussian functions with different scales are used to extract each illumination component. Then, the illumination components are summed up with different weights, in order to obtain the estimated luminance value Bpre (x,y). Let Bpre (x, y) =

3

wn [I O V (x, y)Fn (x, y)],

(8)

n=1

where $w_n$ is the weight coefficient of the Gaussian function at the nth scale. Let n = 1, 2, and 3 to extract the illumination components under multi-scale Gaussian functions. The scale factor c is set to 15, 65, and 245, respectively, and the weight factors are $w_1 = w_2 = w_3 = 1/3$. After extracting the illumination component, a correction function can be constructed based on the illumination distribution characteristics to reduce the intensity of the overlit area and compensate for the low-illumination area. Construct a two-dimensional Gamma function

$$\gamma = \left(\frac{1}{2}\right)^{\frac{m - B_{pre}(x, y)}{m}}, \tag{9}$$

where m is the average luminance value of $B_{pre}(x,y)$. The luminance $B_{gama}(x,y)$ after Gamma correction is obtained by

$$B_{gama}(x, y) = 255 \left(\frac{I_{OV}(x, y)}{255}\right)^{\gamma}. \tag{10}$$
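As an illustration of Eqs. (6) to (10), the following pure-Python sketch estimates the luminance as a weighted sum of Gaussian-blurred copies of the V channel and then applies the pixel-wise Gamma correction. The function names, the kernel truncation at a radius of 3c, and the edge-replication boundary handling are assumptions made for this example and are not specified in the text. The code favors readability over speed and is not the authors' Matlab implementation.

```python
import math


def gaussian_kernel_1d(c, radius):
    """Samples of exp(-x^2 / c^2), normalised to sum to 1 (the role of lambda in Eq. 6)."""
    weights = [math.exp(-(i * i) / (c * c)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def gaussian_blur(image, c):
    """Convolve a 2-D list with the Gaussian of scale c (Eq. 7) in two 1-D passes.

    Assumptions: kernel truncated at radius 3c, image edges replicated.
    """
    height, width = len(image), len(image[0])
    radius = int(3 * c)
    kernel = gaussian_kernel_1d(c, radius)

    def clamp(i, n):
        return min(max(i, 0), n - 1)

    horizontal = [
        [sum(kernel[j + radius] * image[y][clamp(x + j, width)]
             for j in range(-radius, radius + 1))
         for x in range(width)]
        for y in range(height)
    ]
    return [
        [sum(kernel[j + radius] * horizontal[clamp(y + j, height)][x]
             for j in range(-radius, radius + 1))
         for x in range(width)]
        for y in range(height)
    ]


def estimate_luminance(v, scales=(15, 65, 245), weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of the multi-scale Gaussian responses (Eq. 8)."""
    height, width = len(v), len(v[0])
    b_pre = [[0.0] * width for _ in range(height)]
    for c, w in zip(scales, weights):
        blurred = gaussian_blur(v, c)
        for y in range(height):
            for x in range(width):
                b_pre[y][x] += w * blurred[y][x]
    return b_pre


def gamma_correct(v, b_pre):
    """Apply Eq. 9 and Eq. 10 pixel by pixel; v holds V values in 0..255.

    Assumes the mean estimated luminance m is greater than zero.
    """
    height, width = len(v), len(v[0])
    m = sum(map(sum, b_pre)) / (height * width)
    corrected = []
    for y in range(height):
        row = []
        for x in range(width):
            gamma = 0.5 ** ((m - b_pre[y][x]) / m)
            row.append(255.0 * (v[y][x] / 255.0) ** gamma)
        corrected.append(row)
    return corrected


# Toy V channel: dark left half, bright right half.
v_channel = [[50, 50, 50, 200, 200, 200] for _ in range(2)]
corrected = gamma_correct(v_channel, estimate_luminance(v_channel))
print([round(p) for p in corrected[0]])
```

In this toy input the dark pixels lie below the mean estimated luminance, so their exponent is smaller than one and they are brightened, while the bright pixels receive an exponent larger than one and are darkened.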


D. Liang et al.

The obtained $B_{gama}(x,y)$ is taken as the V channel value of the image in HSV space. The image is then converted from HSV back to RGB to obtain $I_G(x,y)$.

Image Fusion. The images $I_O(x,y)$ and $I_G(x,y)$ obtained after correction are combined by weighted fusion, and a new output image $I_N(x,y)$ is obtained by

$$I_N(x, y) = j \times I_O(x, y) + k \times I_G(x, y), \tag{11}$$

where j and k are the weights used for image fusion; j = 0.2 and k = 0.8 were selected after a series of optimizations.

2.3 Grayscale Adjustment

Gray Transformation. Considering the extraction of the skin color of gestures, the image $I_N(x,y)$ is transformed from RGB into YCbCr to get $I_Y(x,y)$. The three channels Y, $C_b$, and $C_r$ of $I_Y(x,y)$ are obtained by

$$\begin{cases} Y(x, y) = 0.299\,R(x, y) + 0.587\,G(x, y) + 0.114\,B(x, y), \\ C_b(x, y) = -0.1687\,R(x, y) - 0.3313\,G(x, y) + 0.5\,B(x, y) + 128, \\ C_r(x, y) = 0.5\,R(x, y) - 0.4187\,G(x, y) - 0.0813\,B(x, y) + 128, \end{cases} \tag{12}$$

where R(x,y), G(x,y), and B(x,y) are the R, G, and B values of $I_N(x,y)$, respectively. Let $C_t(x,y) = g \times C_r(x,y) + h \times C_b(x,y)$, where g and h are adjustable parameters that depend on the specific skin color and lie between zero and one. Then, $C_t(x,y)$ is used to generate a grayscale image $I_{ct}(x,y)$.

Grayscale Enhancement and Integration. The gray image $I_{ct}(x,y)$ is enhanced by the following process:

(1) Let the enhanced gray value $C_{ab}(x,y)$ satisfy

$$C_{ab}(x, y) = \begin{cases} a, & \text{if } C_r(x, y) \le a, \\ b, & \text{if } C_r(x, y) \ge b, \\ \dfrac{255}{b - a} \times (C_r(x, y) - a), & \text{if } a < C_r(x, y) < b, \end{cases} \tag{13}$$

where a and b are the optimal lower and upper threshold values, respectively. They are chosen so that the pixels with lower gray values ($C_r(x,y) < a$) and the pixels with larger gray values ($C_r(x,y) > b$) each make up 1% of the total image.

(2) Process the image $I_{ct}(x,y)$ using a log transformation to obtain the new image $I_L(x,y)$, which can be expressed by

$$I_L(x, y) = \frac{\log(1 + v \times I_{ct}(x, y))}{\log(v + 1)}, \tag{14}$$


where v is the adjustment coefficient, and 1 < v < 2.

(3) The enhanced images are then integrated to obtain the final grayscale image $I_F(x,y)$ by

$$I_F(x, y) = H_1 C_{ab}(x, y) + H_2 I_L(x, y), \tag{15}$$

where $H_1$ and $H_2$ are the integration coefficients. The initial values $H_1 = 0.3$ and $H_2 = 0.7$ were chosen after repeated experiments.
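The per-pixel operations of Eqs. (12) to (15) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are invented here, the sample pixel and the values g = 0.6 and h = 0.4 are hypothetical, and the log transformation assumes its input is normalized to [0, 1], since the text does not state the value range of the grayscale image in Eq. (14). The stretch function follows Eq. (13) exactly as printed.

```python
import math


def rgb_to_ycbcr(r, g, b):
    """Eq. 12: convert one RGB pixel (0..255) to (Y, Cb, Cr)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.1687 * r - 0.3313 * g + 0.5 * b + 128
    cr = 0.5 * r - 0.4187 * g - 0.0813 * b + 128
    return y, cb, cr


def chroma_gray(cr, cb, g, h):
    """C_t = g * C_r + h * C_b, with g and h between zero and one."""
    return g * cr + h * cb


def stretch(value, a, b):
    """Eq. 13 as printed: return a or b outside (a, b), rescale inside."""
    if value <= a:
        return a
    if value >= b:
        return b
    return 255.0 / (b - a) * (value - a)


def log_transform(value, v):
    """Eq. 14; assumes value is normalised to [0, 1] and 1 < v < 2."""
    return math.log(1 + v * value) / math.log(v + 1)


def fuse(c_ab, i_l, h1=0.3, h2=0.7):
    """Eq. 15 with the integration coefficients reported in the text."""
    return h1 * c_ab + h2 * i_l


# Hypothetical skin-like pixel and hypothetical g, h values.
y, cb, cr = rgb_to_ycbcr(220, 170, 140)
c_t = chroma_gray(cr, cb, g=0.6, h=0.4)
print(round(y, 2), round(cb, 2), round(cr, 2), round(c_t, 2))
print(round(log_transform(c_t / 255.0, v=1.5), 4))
```

Under the normalization assumption, the log transformation maps 0 to 0 and 1 to 1, so it lifts mid-range values without changing the end points.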

2.4 Threshold Segmentation of the Hand Gesture

In this step, threshold segmentation is used to extract the gesture area in $I_F(x,y)$.

(1) Let the initial threshold $T = 0.5 \times (V_{min} + V_{max})$, where $V_{min}$ and $V_{max}$ are the minimum and maximum gray levels, respectively. Then, calculate the final T by iterative comparison and adjustment.

(2) The gray image $I_F(x,y)$ is then segmented using threshold T to generate a binary image $I_B(x,y)$. Sequential opening and closing operations are executed to eliminate void regions and retain the maximum connected region, in order to extract the final gesture region.
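The text does not spell out the iteration used to refine T. The sketch below assumes the common iterative-mean scheme: starting from the midpoint of the minimum and maximum gray levels, T is repeatedly replaced by the midpoint of the mean gray levels of the two classes it separates. The function names, the tolerance, and the iteration limit are assumptions, and the morphological opening and closing step is omitted.

```python
def iterative_threshold(pixels, tol=0.5, max_iter=100):
    """Refine T from 0.5 * (V_min + V_max) until it moves by less than tol."""
    t = 0.5 * (min(pixels) + max(pixels))
    for _ in range(max_iter):
        low = [p for p in pixels if p <= t]
        high = [p for p in pixels if p > t]
        if not low or not high:
            break  # all pixels fall on one side; keep the current T
        new_t = 0.5 * (sum(low) / len(low) + sum(high) / len(high))
        if abs(new_t - t) < tol:
            return new_t
        t = new_t
    return t


def binarize(image, t):
    """Return a binary image: 1 where the gray level exceeds T, else 0."""
    return [[1 if p > t else 0 for p in row] for row in image]


gray = [[10, 12, 200],
        [11, 220, 210]]
threshold = iterative_threshold([p for row in gray for p in row])
print(threshold)            # 110.5
print(binarize(gray, threshold))
```

For this toy image the initial threshold is 115, the class means are 11 and 210, and the iteration settles at 110.5 after one update.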

2.5 Gesture Feature Computing and Recognition

The binary image $I_B(x,y)$ of the gesture region needs further processing to calculate the gesture features. In this paper, the proposed gesture recognition method is used for uneven and backlighting situations, and the distance between the hand and the camera is kept fixed. Five gesture features are calculated: region area a, finger peak number p, area roundness r, average angle g, and area hole number k.

Area a is defined as the number of pixels in the gesture area of the binary image. For a maximum point $I_m(x,y)$, if the point satisfies $dis(y) > m \times y_{cen}$ or $dis(x) > n \times x_{cen}$, the point is considered a finger peak. Here $x_{cen}$ and $y_{cen}$ are the x and y coordinates of the pixel barycenter of the binary image, respectively; dis() represents the Euclidean distance between the maximum point and the barycenter; m and n are adjustable parameters ranging from 1 to 1.5. The region roundness r is obtained by

$$r = \frac{4\pi \times a}{l^2}, \tag{16}$$

where l represents the perimeter of the connected region. The average angle g is the average of the angles between the horizontal line and the lines connecting each finger peak point to the barycenter.


The hole number k is the number of holes with an area larger than a threshold p in the gesture binary image, where p is set to one-tenth of the total number of pixels. Through the above image enhancement, grayscale adjustment, gesture extraction, and detection, the gesture can finally be classified.
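Three of the features above, the region area, the barycenter, and the roundness of Eq. (16), can be sketched in a few lines. The function names are illustrative, and the perimeter is passed in as an argument because the text does not say how it is measured on the pixel grid.

```python
import math


def region_area(binary):
    """Area a: number of foreground pixels (value 1) in the binary image."""
    return sum(sum(row) for row in binary)


def barycenter(binary):
    """Pixel barycenter (x_cen, y_cen) of the foreground pixels."""
    points = [(x, y) for y, row in enumerate(binary)
              for x, value in enumerate(row) if value]
    count = len(points)
    return (sum(x for x, _ in points) / count,
            sum(y for _, y in points) / count)


def roundness(area, perimeter):
    """Eq. 16: r = 4 * pi * a / l^2, equal to 1 for an ideal disc."""
    return 4 * math.pi * area / perimeter ** 2


square = [[1] * 4 for _ in range(4)]
a = region_area(square)
print(a, barycenter(square))
print(round(roundness(a, perimeter=16), 4))
```

An ideal disc of radius R has area πR² and perimeter 2πR, which gives r = 1. A square gives r = π/4, so elongated or spiky regions such as an open hand score well below 1.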

3 Experiments and Analyses

The experiment is conducted on a PC platform under the Windows 7 operating system. The programming tool is Matlab 2016a, and the camera is an ordinary RGB webcam of the PC. The proposed illumination compensation algorithm integrates luminance adjustment with Gamma correction in order to balance uneven illumination effects. The experimental result of the proposed illumination compensation algorithm is compared with the Multi-Scale Retinex (MSR) [10], Space-variant Luminance Map (SVLM) [11], and Histogram Equalization (HE) [12] algorithms. Figure 3 shows the results of the different algorithms. The HE algorithm brings almost no improvement to the input gesture image and leaves the gesture region rather uneven and obscure. The proposed illumination compensation algorithm can simultaneously reduce the overlit region and compensate the over-dark area in the image, making the gesture region more evident and homogeneous. Meanwhile, the images obtained by MSR seem over-enhanced. SVLM shows a similar compensation effect, but its result is a little dimmer than that of the proposed algorithm. After illumination compensation, the gesture area is well enhanced and balanced, which facilitates the subsequent image gray scaling and threshold segmentation.

In order to evaluate the compensation result quantitatively, the images processed by each algorithm are transformed into gray space to calculate the corresponding mean value, average gradient, and Peak Signal-to-Noise Ratio (PSNR), as shown in Table 1. The mean value reflects the overall luminance of the image. The average gradient reflects the sharpness and texture features; a higher average gradient means a higher image sharpness. The PSNR is used to evaluate the image distortion or noise level, and a larger PSNR represents less distortion and better identifiability.
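The three indices can be sketched in plain Python. The text gives neither the formula for the average gradient nor the reference image used for the PSNR, so the definitions below are assumptions for illustration only: forward differences combined as sqrt((dx² + dy²)/2) for the gradient, a peak value of 255, and a caller-supplied reference image.

```python
import math


def mean_value(image):
    """Overall luminance: arithmetic mean of all gray levels."""
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels)


def average_gradient(image):
    """Assumed definition: mean of sqrt((dx^2 + dy^2) / 2) over forward differences."""
    height, width = len(image), len(image[0])
    total = 0.0
    for y in range(height - 1):
        for x in range(width - 1):
            dx = image[y][x + 1] - image[y][x]
            dy = image[y + 1][x] - image[y][x]
            total += math.sqrt((dx * dx + dy * dy) / 2.0)
    return total / ((height - 1) * (width - 1))


def psnr(reference, test, peak=255.0):
    """PSNR in dB between two equally sized gray images."""
    diffs = [(r - t) ** 2
             for ref_row, test_row in zip(reference, test)
             for r, t in zip(ref_row, test_row)]
    mse = sum(diffs) / len(diffs)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)


reference = [[0, 100], [100, 200]]
shifted = [[1, 101], [101, 201]]
print(mean_value(reference))
print(average_gradient([[0, 2], [2, 4]]))
print(round(psnr(reference, shifted), 2))
```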
Obviously, the proposed algorithm shows the best PSNR (about 42.68) and a moderate mean value, which can effectively enhance the gesture region. The average gradient of the

Fig. 3 The experimental results of each algorithm. a The initial input image. b The image obtained by MSR. c The image obtained by SVLM. d The image obtained by HE. e The image obtained by the proposed illumination compensation algorithm


Table 1 Evaluation of the objective index of the images obtained by different algorithms

Algorithm         Initial   SVLM      MSR       HE        Proposed
Mean value        90.641    114.182   175.671   128.012   129.513
Average gradient  1.151     1.413     1.072     1.840     1.421
PSNR              40.814    41.521    42.173    38.455    42.682

proposed algorithm is also better than that of SVLM and MSR. The image obtained by HE has the largest average gradient, but its PSNR is the smallest.

As shown in Fig. 4, eight types of gestures are used to conduct the recognition experiment under uneven and backlighting conditions. Table 2 shows the recognition accuracy of the different gestures. The proposed method can recognize each gesture effectively, and the average recognition accuracy reaches 95.625%. Figure 5 shows the results of each step during the recognition process of gesture “5” under an uneven and backlighting environment. It can be seen that the gesture area can be extracted effectively after sequential illumination compensation and grayscale adjustment, and the intensity of the gesture area can be significantly improved. Utilizing the defined parameters, the gesture can be detected accurately. During the experiment process,

Fig. 4 The eight typical gestures used for the gesture recognition

Table 2 The recognition accuracy of different gestures

Gesture type          1     2     3     4     5     OK    Left  Right
Test times            20    20    20    20    20    20    20    20
Correctly recognized  20    20    18    19    20    18    19    19
Accuracy (%)          100   100   90    95    100   90    95    95
Average accuracy (%)  95.625
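The accuracy figures in Table 2 follow directly from the counts. The short check below recomputes them; the variable names are illustrative.

```python
gestures = ["1", "2", "3", "4", "5", "OK", "Left", "Right"]
test_times = [20] * 8
correct = [20, 20, 18, 19, 20, 18, 19, 19]

# Per-gesture accuracy in percent, then the unweighted mean over the eight gestures.
accuracy = [100.0 * c / n for c, n in zip(correct, test_times)]
average_accuracy = sum(accuracy) / len(accuracy)

for name, value in zip(gestures, accuracy):
    print(f"{name:>5}: {value:.0f}%")
print(average_accuracy)  # 95.625
```

Because every gesture was tested the same number of times, this mean equals the pooled rate of 153 correct recognitions out of 160 trials.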


Fig. 5 The results during each step of the recognition process of gesture “5”. a The image obtained after illumination compensation. b The image obtained by the proposed grayscale adjustment algorithm. c The image obtained after threshold segmentation. d The final gesture recognition result

there are also some issues that need further adjustment and optimization. After image thresholding, the edge of the gesture area is sometimes irregular and serrated, which might lead to misdetection of the finger peaks. Under different backgrounds, a series of experiments is also needed to optimize the parameters of the image enhancement and threshold segmentation processes. Overall, the proposed method effectively helps to reduce the effects of uneven illumination and backlighting, and facilitates the image segmentation and gesture recognition process.

4 Conclusion and Future Work

This paper proposes a novel gesture recognition method based on illumination compensation and grayscale adjustment, which can effectively recognize hand gestures using a simple RGB camera. A new illumination compensation algorithm is designed by integrating luminance adjustment with Gamma correction in order to process images captured under uneven and backlighting environments. A grayscale adjustment algorithm based on grayscale enhancement and integration is used to detect the skin color and hand area accurately. Five gesture features, namely area, roundness, finger peak number, hole number, and average angle, are constructed to recognize the input gesture. Experimental results show that the proposed method can effectively reduce the influence of uneven illumination and recognize the hand gestures. The method shows practical application value in low-light image processing, intelligent device control, and human–computer interaction.

In future work, the illumination compensation and grayscale adjustment algorithms will be further optimized to improve execution efficiency. The gesture features will also be supplemented to identify more common gesture types, and an adaptive selection and adjustment algorithm will be designed for the parameters used in each procedure to improve the adaptability and efficiency of gesture detection. Furthermore, integrating the proposed illumination compensation algorithm with deep learning is also considered, in order to achieve accurate recognition of various kinds of gestures under complex backgrounds.


Acknowledgements The authors would like to acknowledge the Centre for Artificial Intelligence, Robotics and Human–Machine Systems (IROHMS) operation C82092, part-funded by the European Regional Development Fund (ERDF) through the Welsh Government. This research is also supported by the National Natural Science Foundation of China (51805280), the Natural Science Foundation of Zhejiang province (LQ18E050005), the Natural Science Foundation of Ningbo (2019A610158), and the Ningbo Technology Innovation 2025 Project (2018B10005).

References

1. Liu, H., Wang, L.: Gesture recognition for human-robot collaboration: a review. Int. J. Ind. Ergon. 68, 355–367 (2018)
2. Rautaray, S.S., Agrawal, A.: Vision based hand gesture recognition for human computer interaction: a survey. Artif. Intell. Rev. 43(1), 1–54 (2015)
3. Qiu-yu, Z., Jun-chi, L., Mo-yi, Z., Hong-xiang, D., Lu, L.: Hand gesture segmentation method based on YCbCr color space and K-means clustering. Int. J. Signal Process. Image Process. Pattern Recognit. 8(5), 105–116 (2015)
4. Yao, Y., Li, C.T.: A framework for real-time hand gesture recognition in uncontrolled environments with partition matrix model based on hidden conditional random fields. In: 2013 IEEE International Conference on Systems, Man, and Cybernetics, pp. 1205–1210 (2013)
5. Mo, S., Cheng, S., Xing, X.: Hand gesture segmentation based on improved Kalman filter and TSL skin color model. In: Proceedings of the 2011 International Conference on Multimedia Technology, pp. 3543–3546 (2011)
6. Liu, K., Kehtarnavaz, N.: Real-time robust vision-based hand gesture recognition using stereo images. J. Real-Time Image Proc. 11(1), 201–209 (2016)
7. Leite, D.Q., Duarte, J.C., Neves, L.P., De Oliveira, J.C., Giraldi, G.A.: Hand gesture recognition from depth and infrared Kinect data for CAVE applications interaction. Multimed. Tools Appl. 76(20), 20423–20455 (2017)
8. Wang, C., Liu, Z., Chan, S.C.: Superpixel-based hand gesture recognition with Kinect depth camera. IEEE Trans. Multimed. 17(1), 29–39 (2014)
9. Plouffe, G., Cretu, A.M.: Static and dynamic hand gesture recognition in depth data using dynamic time warping. IEEE Trans. Instrum. Meas. 65(2), 305–316 (2015)
10. Bin, S., Xiongzhu, B.U., Zhengcheng, W., Minjie, G.: The defect image enhancement based on multi-scale retinex. Nondestruct. Test. 39(6), 25–27 (2017)
11. Lee, S., Kwon, H., Han, H., Lee, G., Kang, B.: A space-variant luminance map based color image enhancement. IEEE Trans. Consum. Electron. 56(4), 2636–2643 (2010)
12. Russ, J.C.: The Image Processing Handbook, 4th edn. CRC Press, Boca Raton (2002)

Architecting Intelligent Digital Systems and Services

Alfred Zimmermann, Rainer Schmidt, Kurt Sandkuhl, and Yoshimasa Masuda

Abstract Our paper gives first answers to a fundamental question: How can the design of architectures of intelligent digital systems and services be accomplished methodologically? Intelligent systems and services are the goals of many current digitalization efforts and part of massive digital transformation efforts based on digital technologies. Digital systems and services are the foundation of digital platforms and ecosystems. Digitalization disrupts existing businesses, technologies, and economies and promotes the architecture of open environments. This has a strong impact on new value-added opportunities and the development of intelligent digital systems and services. Digital technologies such as artificial intelligence, the Internet of Things, services computing, cloud computing, big data with analytics, mobile systems, and social enterprise network systems are important enablers of digitalization. The current publication presents our research on the architecture of intelligent digital ecosystems of products and services influenced by the service-dominant logic. We present original methodological extensions and a new reference model for digital architectures with an integral service and value perspective to model intelligent systems and services that effectively align digital strategies and architectures with artificial intelligence as main elements to support intelligent digitalization.

A. Zimmermann (B)
Herman Hollerith Center, Reutlingen University, Boeblingen, Germany

R. Schmidt
Munich University of Applied Sciences, Munich, Germany

K. Sandkuhl
University of Rostock, Rostock, Germany

Y. Masuda
Carnegie Mellon University, Pittsburgh, USA

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
A. Zimmermann et al. (eds.), Human Centred Intelligent Systems, Smart Innovation, Systems and Technologies 189, https://doi.org/10.1007/978-981-15-5784-2_11


A. Zimmermann et al.

1 Introduction

Today, digital transformation [1] deeply disrupts existing enterprises and economies. The potential of the Internet and related digital technologies, such as the Internet of Things, artificial intelligence, data analytics, services computing, cloud computing, mobile systems, collaboration networks, blockchains, cyber-physical systems, and Industry 4.0, is both a strategic driver and an enabler of digital platforms with fast-evolving ecosystems of intelligent systems and services. Digitalization [2] fosters the development of IT systems with many globally available and diverse, rather small and distributed cooperating structures, like the Internet of Things or mobile systems. This has a strong impact on architecting intelligent digital services and products that integrate highly distributed intelligent systems. Data, information, and knowledge are fundamental core concepts of our everyday activities and are driving the digital transformation [3] of today’s global society. New services and smart connected products expand physical components by adding information and connectivity services using the Internet. Influenced by the transition to digitalization, many enterprises are presently transforming their strategy, culture, processes, and information systems to advance in digitalization and adopt systems and services with artificial intelligence. Human-centered intelligent systems are information systems applying artificial intelligence (AI) [4, 5] in order to support humans and to interact with people. Contemporary advances in the field of artificial intelligence have led to a rapidly growing number of intelligent services and applications. Unfortunately, the current state of the art in research and practice of architecting digital systems and services lacks a solid methodological foundation, as the established methodical approaches do not fully accommodate requirements caused, e.g., by product-IT integration [6] or digital manufacturing.
Therefore, our current paper focuses on the main research question: How can a value-oriented digital architecture for intelligent digital systems and services be designed in a methodological way? Our goal is to extend previous quite static approaches of enterprise architecture to fit for a flexible and adaptive digitalization of intelligent products and services. We start from the perspective that enterprise architecture lacks a clear understanding of an integral value and service perspective of architectural models supporting intelligent digital systems and services. Our goal shall be achieved by introducing new mechanisms for an integral design of intelligent digital systems and services, considering a service-dominant logic perspective [7] and a value-oriented modeling approach, as part of an effective AI-enabled digital architecture. We will proceed as follows. First, we are setting the fundamental architectural context for digitalization and digital transformation of digital products and services. We give insights to our current digital modeling approach for systematically defining value-oriented service-dominant digital products and services providing multilayer mappings from digital strategies to digital architectures. Then we present a fundamental AI-based mechanism for a selected suitable support of intelligent systems and services. We are focusing on an extended reference architecture for intelligent digital platforms and ecosystems and provide methods and mechanisms for our view


of a multi-perspective architectural decision management. Finally, we summarize our research findings in the last section and outline our future work.

2 Digital Transformation

Digital transformation is the current dominant type of business transformation [2, 3], having IT both as a technology enabler and as a strategic driver. Digital technologies are the main strategic drivers [1] for digitalization because they are changing the way business is conducted and have the potential to disrupt existing businesses. SMACIT, as defined in [1], names the strategic core of digital technologies and abbreviates Social, Mobile, Analytics, Cloud, and the Internet of Things. From today’s view, we have to enlarge this technological core by artificial intelligence and cognition, biometrics, robotics, blockchain, 3-D printing, and edge computing. Digital technologies deliver three core capabilities for a fundamentally changing business [1]: ubiquitous data availability, unlimited connectivity, and massive processing power. In the beginning, digitization was considered a primarily technical term [8]. Thus, a number of technologies are often associated with digitalization [2]: cloud computing, big data combined with advanced analytics, social software, and the Internet of Things. New technologies, such as deep learning, are strategic enablers and are strongly related to advances in digitalization. They allow computers to be applied to activities that were considered exclusive to human beings. Therefore, the present emphasis on intelligent digitalization becomes an important area of research. Digital services and associated products are software-intensive [2] and therefore malleable and usually service-oriented [9]. Digital products are able to increase their capabilities by accessing cloud services and to change their current behavior. We are at a turning point in the development and application of intelligent digital systems.
We see great future prospects for digital systems with artificial intelligence (AI) [4, 5], with the potential to contribute to improvements in many areas of work and in society through digital technologies. We understand digitalization based on new methods and technologies of artificial intelligence as a complex integration of digital services, products, and related systems. For years, we have been experiencing a hype about digitalization, in which the terms digitization, digitalization, and digital transformation are often confusingly used. The origin of digitalization is the concept of digitization. According to [8], we distinguish in Fig. 1 four levels of digitalization. Classical industrial products are static. You can only change them to a limited extent, if at all. On the contrary, digitized products are dynamic [2]. They contain both hardware and software with (cloud-)services. They can be upgraded via network connections. In addition, their functionality can be extended or adapted using external services. Therefore, the functionality of products is dynamic and can be adapted to changing requirements and hitherto unknown customer needs. In particular, it is possible to create digital products and services step-by-step or provide temporarily


Fig. 1 From digital enhancement to digital transformation. Adapted from [8]

unlockable functionalities. So, customers whose requirements are changing can add and modify service functionality without hardware modification. When we use the term digitalization, we mean more than just digital technologies. Digitalization [2, 8] bundles the more mature phase of a digital transformation from analog over digital to fully digital. Through digital substitution (digitization), initially only analog media are replaced by digital media considering the same existing business values, while augmentation functionally enriches related transformed analog media. In a further step of the digital transformation, new processing patterns or processes are made possible by a digitally supported modification of the basic terms (concepts). Finally, completely new forms of value propositions for disruptive businesses, services, products, processes, and systems result from the digital redefinition (digitalization) of processes, services, and systems. Digitalization is, therefore, more about shifting processes to attractive, highly automated digital business operations and not just about communication using the Internet. The digital redefinition usually causes disruptive effects on business. Going beyond the value-oriented perspective of digitalization, digital business requires a careful adoption of human, ethical, and social principles. Considering the closely related concepts of digitization, digitalization, and digital transformation [3, 8], we conclude that digitization and digitalization are about digital technology, while digital transformation is about the changing role of digital customers and the digital change process based on new value propositions. We digitize information, we digitalize processes and roles for extended platform-based business operations, and we digitally transform the business by promoting a digital strategy, customer-centric and value-oriented digital business models, and an architecture-driven digital change.


3 Digital Service Design

Digital technologies change the way we communicate and collaborate for value co-creation with customers and other stakeholders, even with competitors. Digital technologies have changed our view on how to analyze and understand the magnitude of real-time accessible data from multiple perspectives. Digital transformation has also changed our understanding of how to innovate in global processes to architect and develop intelligent digital products and services faster than ever, aiming for the best available digital technology and quality. Digitalization forces us to look differently at value creation for and together with customers. While digital technologies are the main strategic enablers [10] for new customer-focused digital businesses, it is necessary to solve real customer problems and encourage customer engagement by creating new value propositions, promoting customer co-creation, and offering new digital solutions and services. Digital systems and interfaces enable customer experiences over digital channels and make new and improved service-dominant product features possible. Better customer experiences and new product features may lead to increased customer satisfaction and increased revenues.

First, we model the digital strategy [10, 11], as in Fig. 2, which gives the digital modeling direction and sets the base and a value-oriented framing for the business definition models, with the business model canvas [12] and the value proposition canvas [13]. Having the base models for a value-oriented digital business, we map these base service and product models to a digital operating model [1]. The value perspective of the business model canvas [12] yields suitable mappings to enterprise architecture value models [14] supported by ArchiMate [15]. Finally, we set the frame for digital services and products by modeling digital product compositions.

Fig. 2 Integral value and service perspective


Value is commonly associated with worth and aggregates potentially required categories like importance, desirability, and usefulness. The concept of value is important in designing adequate digital services with their associated digital products, and to align their digital business models with value-oriented enterprise architectures. From a financial perspective, the value of the integrated resources and the price defines the main parts of the monetary worth. The digital business discipline nowadays shifted to a nominal use of the value perspective [7] considering customer experience and customer satisfaction as important value-related concepts. Characteristics of value modeling for a service ecosystem were elaborated by [16]. Value has important characteristics [7]: value is phenomenological, co-created, multidimensional, and emergent. Value is phenomenological, which means that value is perceived experimentally and differently by various stakeholders in the varying context within a service ecosystem. Value is co-created through the integration and exchange of resources between multiple stakeholders and related organizations.

4 Intelligent Systems and Services

From today’s view, probably no digital technology is more exciting than artificial intelligence, which offers massive automation capabilities for intelligent digital systems and services. Artificial intelligence (AI) [4, 17] is often used in conjunction with other digital technologies, like analytics, ubiquitous data, the Internet of Things, cloud computing, and unlimited connectivity. Fundamental capabilities of AI concern solutions generated automatically from previous useful cases and solution elements, solutions inferred from causal knowledge structures like rules and ontologies, and solutions learned from data analytics with machine learning and deep learning with neural networks. Artificial intelligence receives a high degree of attention due to recent progress in several areas such as image detection, translation, and decision support [4]. It enables interesting new business applications such as predictive maintenance, logistics optimization, and improved customer service management. Artificial intelligence supports decision-making in many business areas. Most companies expect to gain competitive advantage from AI. Today’s advances in the field of artificial intelligence [18, 19] have led to a rapidly growing number of intelligent services and applications. The joint development of competencies via intelligent digital systems promises great value for science, economy, and society and is driven by data, calculations, and advances in algorithms for machine learning, perception and cognition, planning, and natural language. Artificial intelligence is often characterized as impersonal: From this point of view, intelligent systems operate completely automatically and independently of human intervention. The public discourse on autonomous algorithms working on passively collected data contributes to this view.
However, this perspective of huge automation obscures the extent to which human work necessarily forms the basis for modern AI systems [17] and makes them possible in the first place. The human element of


intelligent systems includes tasks like optimizing knowledge representations, developing algorithms, collecting and tagging data, and deciding what to model and how to interpret the results. The study of artificial intelligence from a human-centric perspective requires a deep understanding of the role of human ethics, human values and customs, and the practices and preferences for development and interaction with intelligent systems. With the success of AI, new concerns and challenges regarding the impact of these technologies on human life are emerging. These include issues of security and trustworthiness of AI technologies in digital systems, the fairness and transparency of systems, and the conscious and unintended impact of AI on people.

Symbolic AI [4], which predominated until the 1990s, uses a deductive, expert-based approach. By interviewing one or more experts, knowledge is collected in the form of rules and other explicit representations of knowledge, such as Horn clauses. These rules are applied to facts that describe the problem to be solved. The solution of a problem is found by successively applying one or more rules using the mechanisms of an inference engine [4]. An inference path can usually also be traced backward and forward, which offers transparency and rationality about instantiated inference processes through “how” and “why” explanations. Symbolic AI proved to be very effective for highly formalized problem spaces like theorem proving. After the last wave of enthusiasm at the end of the 1980s, the focus of research shifted [17–19].

Ontologies [4] represent the second wave of semantic technologies to support explicit knowledge representations. Ontologies have a foundation in the philosophy of being and existence. From the perspective of symbolic AI, ontologies [4] are explicit machine-readable representations of basic categories of concepts and their relations.
The Web Ontology Language (OWL) defines a family of knowledge representation languages for ontologies to represent the formal semantics of concepts and relations.

Neural networks are inspired by the metaphor of the human brain: artificial neurons are connected in a network that receives input data and produces output data. Together with genetic algorithms, fuzzy systems, rough sets, and the study of chaos, they are examples of new approaches to artificial intelligence [19]. Deep learning is considered a subclass of machine learning approaches. Even more than machine learning in general, neural networks and deep learning are able to capture tacit knowledge [18]. The basic mechanism of neural networks is the adaptation of weights, which represent the strength of connections between neurons, until the conversion of input signals into output signals shows the desired behavior. The adaptation of weights using training data is called learning.

In contrast to symbolic AI, machine learning [17] uses an inductive approach based on a large amount of analyzed data. We distinguish three basic approaches to machine learning [18, 19]: supervised learning, unsupervised learning, and reinforcement learning. In supervised machine learning approaches, the target value is part of the training data and is based on sample inputs. Typically, unsupervised learning is used to discover new hidden patterns within the analyzed data. Reinforcement learning (RL) is an area of machine learning in which software agents [5] work to maximize cumulative rewards. The exploration environment is specified in terms of a Markov decision process because many reinforcement learning algorithms use dynamic programming techniques. Reinforcement learning does not require labeled input/output pairs, and suboptimal actions do not need to be explicitly corrected.

Combining product components of hardware and software with cloud-provided intelligent services enables new ways of intelligent interaction with customers, as in [20]. The lifecycle of digitized products is extended by intelligent services. An example is Amazon Alexa, which combines a physical device that has a microphone and speaker with services called Alexa skills. Users can enhance Alexa’s capabilities with skills, which are similar to apps. The set of Alexa skills is dynamic and can be tailored to the customer’s requirements at runtime. Alexa supports voice interaction, music playback, to-do lists, alarms, podcast streaming, and audiobook playback, and it provides weather, traffic, sports, and other real-time information such as news. Using programmed skills, Alexa can also connect to and control intelligent products and devices.

5 Digital Enterprise Architecture

Digital business architecture [1] is part of a digital enterprise architecture [6, 21, 22], which provides a comprehensive view of integrated elements from both business and IT. More precisely, we integrate configurations of stakeholders (roles, accountabilities, structures, knowledge, skills), business and technical processes (workflows, procedures, programs), and technology (infrastructure, platforms, applications) to execute digital strategies and compose value-proposition-oriented digital products and services. Digital business is foremost an aspect that is currently in use and constantly changing. Therefore, digital business design is not an end state.

We have extended our service-oriented enterprise architecture reference model for the evolving context of digital transformation with micro-granular structures, considering associated multi-perspective architectural decision-making models [21], which are supported by viewpoints and functions of an architecture management cockpit. The Digital Enterprise Architecture (DEA) Reference Cube provides our holistic architectural reference model for the bottom-up integration of dynamically composed micro-granular architectural services and their models (Fig. 3), as in the usage scenario [23].

Enterprise Architecture Management, as defined today by several standards like [15], uses a quite large set of different views and perspectives for managing current IT. An effective and agile architecture management approach for digital enterprises should additionally support the intelligent digitalization of products and services and be both holistic [21, 22] and easily adaptable [24]. As an effective architecture management approach in a digital IT era, the “Adaptive Integrated Digital Architecture Framework (AIDAF)” [22] is now being applied in several global companies, for example in the healthcare and manufacturing industries, and in information societies. The proposed AIDAF model is shown in Fig. 4.
Fig. 3 Digital enterprise architecture reference cube [21]

Fig. 4 Adaptive integrated digital architecture framework (AIDAF) [22]

The traditional operational backbone of core IT services does not offer the preconditions for the transition speed and flexibility needed for continuous, rapid, and agile digital innovation because it is designed for stability, reliability, and efficiency. As a consequence, digital companies are designing a second backbone for leveraging and hosting digital services. The digital services backbone [10] bundles a set of business and technology capabilities that enable rapid development of digital innovation.

A successful digital architecture should use a service platform [2] which supports an actor-to-actor network and hosts a set of loosely coupled services as part of a fast-growing digital ecosystem [1]. A service platform is a modular structure that connects and integrates resources and actors sharing institutional logics [21] and promotes value co-creation by service exchange, according to the service-dominant logic [7]. The value of a platform for users [16] results from the number of platform and service adopters. A digital platform [2] and ecosystem should enable value co-creation [7, 16]. A platform’s ability to grow fast is based on the principle of network effects [1, 7] and frictionless entry points for a large number of new participants. Maturing platforms often evolve toward greater openness. A platform’s value results from the community it serves. The design of a platform should begin by making its core interaction easily available and inevitable. Therefore, a digital platform [2] should provide three core functions: pull, facilitate, and match. As the participants and resource base of the platform grow, participants will find new interactions that expand the core interaction. Digital platforms are superior to traditional fixed value chains because of the value produced by network effects, leading to disruptive business transformations.

6 Conclusion

In our research, we address the question of how to design architectures of intelligent digital systems and services in a methodological way. Based on this fundamental research question, we have first set the context, proceeding from digitalization and digital transformation to a systematic value-oriented digital service design, according to the service-dominant logic. Then we have selected suitable AI mechanisms to enable intelligent digital systems and services. To support the dynamics of digital transformation with flexible software and systems compositions, we have leveraged an adaptive architectural approach for open-world integrations of globally accessed systems and services with their local architecture models.

We contribute to the literature in different ways. Looking at our results, we have identified the need and solution mechanisms for a value-oriented integration of digital strategy models through digital business models and a digital operating model up to models for intelligent digital systems and services as part of a value-oriented enterprise architecture for digital ecosystems. Strengths of our research result from our novel approach of an integral multilevel model mapping of digital strategies to value-oriented digital operating models for intelligent digital systems and services on a closely related digital platform, supported by a uniform digital architecture reference model. Limitations of our work result from the ongoing validation of our research and from open issues in investigating extended AI approaches and managing inconsistencies and semantic dependencies. Future research will cover mechanisms for flexible and adaptable integration of intelligent digital architectures. We are working to extend human-controlled dashboard-based decision-making by AI-based intelligent systems for decision support.


References

1. Ross, J.W., Beath, C.M., Mocker, M.: Designed for Digital. How to Architect Your Business for Sustained Success. The MIT Press (2019)
2. McAfee, A., Brynjolfsson, E.: Machine, Platform, Crowd. Harnessing Our Digital Future. W. W. Norton (2017)
3. Rogers, D.L.: The Digital Transformation Playbook. Columbia University Press (2016)
4. Russell, S., Norvig, P.: Artificial Intelligence. A Modern Approach. Pearson (2015)
5. Poole, D.L., Mackworth, A.K.: Artificial Intelligence. Foundations of Computational Agents. Cambridge University Press (2018)
6. Sandkuhl, K., Seigerroth, U., Kaidalova, J.: Towards Integration Methods of Product-IT into Enterprise Architectures. In: 2017 IEEE EDOCW, pp. 23–28 (2017)
7. Vargo, S.L., Akaka, M.A., Vaughan, C.M.: Conceptualizing value: a service-ecosystem view. J. Creat. Value 3(2), 1–8 (2017)
8. Hamilton, E.R., Rosenberg, J.M., Akcaoglu, M.: Examining the Substitution Augmentation Modification Redefinition (SAMR) Model for Technology Integration. Tech. Trends 60, 433–441 (2016)
9. Newman, S.: Building Microservices: Designing Fine-Grained Systems. O’Reilly (2015)
10. Ross, J.W., Sebastian, I.M., Beath, C., Mocker, M., Moloney, K.G., Fonstad, N.O.: Designing and Executing Digital Strategies. Proceedings of ICIS Dublin (2016)
11. Bones, C., Hammersley, J., Shaw, N.: Optimizing Digital Strategy: How to Make Informed, Tactical Decisions That Deliver Growth. Kogan Page (2019)
12. Osterwalder, A., Pigneur, Y.: Business Model Generation. Wiley (2010)
13. Osterwalder, A., Pigneur, Y., Bernarda, G., Smith, A., Papadakos, T.: Value Proposition Design. Wiley (2014)
14. Meertens, L.O., Iacob, M.E., Nieuwenhuis, L.J.M., van Sinderen, M.J., Jonkers, H., Quartel, D.: Mapping the BMC to ArchiMate. SAC, ACM, 1694–1701 (2012)
15. Open Group: ArchiMate 3.0 specification. The Open Group (2016)
16. Blaschke, M., Haki, M.K., Riss, U., Aier, S.: Design principles for business-model-based management methods: a service-dominant logic perspective. In: Maedche, A. et al. (eds.) DESRIST, pp. 179–198. Springer (2017)
17. Hwang, K.: Cloud Computing for Machine Learning and Cognitive Applications. The MIT Press (2017)
18. Skansi, S.: Introduction to Deep Learning. Springer (2018)
19. Munakata, T.: Fundamentals of the New Artificial Intelligence. Neural, Evolutionary, Fuzzy and More. Springer (2008)
20. Warren, A.: Amazon Echo: The Ultimate Amazon Echo User Guide 2016 Become an Alexa and Echo Expert Now!. CreateSpace Independent Publishing, Scotts Valley (2016)
21. Zimmermann, A., Schmidt, R., Sandkuhl, K., Jugel, D., Bogner, J., Möhring, M.: Decision-oriented composition architecture for digital transformation. In: Czarnowski, I., Howlett, R., Jain, L.C., Vlacic, L. (eds.) Intelligent Decision Technologies, pp. 109–119 (2018)
22. Masuda, Y., Viswanathan, M.: Enterprise Architecture for Global Companies in a Digital IT Era. Springer (2019)
23. Sandkuhl, K., Wißotzki, M., Smirnov, A., Shilov, N.: Digital Innovation Based on Digital Signage: Method, Categories and Examples. In: BIR, pp. 126–139. Springer (2018)
24. Masuda, Y., Shirasaka, S., Yamamoto, S., Hardjono, T.: An adaptive enterprise architecture framework and implementation: Towards global enterprises in the era of cloud/mobile IT/digital IT. Int. J. Enterp. Inf. Syst. (IJEIS) 13(3), 1–22 (2017)

A Human-Centric Perspective on Digital Consenting: The Case of GAFAM

Soheil Human and Florian Cech

Abstract According to different legal frameworks such as the European General Data Protection Regulation (GDPR), an end-user’s consent constitutes one of the well-known legal bases for personal data processing. However, research has indicated that the majority of end-users have difficulty making sense of what they are consenting to in the digital world. Moreover, it has been demonstrated that marginalized people are confronted with even more difficulties in dealing with their own digital privacy. In this paper, using an enactivist perspective in cognitive science, we develop a basic human-centric framework regarding digital consent. We argue that the action of consenting is a sociocognitive action and includes cognitive, collective, and contextual aspects. Based on this theoretical framework, we present our qualitative evaluation of the practice of gaining consent conducted by the five big tech companies, i.e., Google, Amazon, Facebook, Apple, and Microsoft (GAFAM). The evaluation shows that these companies are lacking in their efforts to empower end-users by considering the human-centric aspects of the action of consenting. We use this approach to argue that the consent-gaining mechanisms violate principles of fairness, accountability, and transparency, and suggest that our approach might even raise doubts regarding the lawfulness of the acquired consent, particularly considering the basic requirements of lawful consent within the legal framework of the GDPR.

S. Human (B)
Sustainable Computing Lab & Institute for Information Systems and New Media, Vienna University of Economics and Business (WU Wien), Vienna, Austria
e-mail: [email protected]

F. Cech
Centre for Informatics and Society, Vienna University of Technology (TU Wien), Vienna, Austria
e-mail: [email protected]

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
A. Zimmermann et al. (eds.), Human Centred Intelligent Systems, Smart Innovation, Systems and Technologies 189, https://doi.org/10.1007/978-981-15-5784-2_12


1 Introduction

According to the European General Data Protection Regulation (GDPR), “[t]he protection of [all] natural persons in relation to the processing of personal data is a fundamental right” [20, Rec. 1]. The concept of end-user-given consent plays an important role in digital regulations such as the GDPR. As a result of its enforcement and other legal obligations, data controllers have widely developed consent-obtaining mechanisms in recent years. Although consent is only one of the possible bases for the lawful practice of data processing under the GDPR (see [20, Article 6]), obtaining consent is still widely practiced, and it can be perceived as an important means that can potentially enable end-users’ agency regarding the management and ownership of their personal data. While the GDPR sets specific prerequisites for lawful consent, which should be valid, freely given, specific, informed, and active, it is clear that this requires consent-giving agents, i.e., the end-users, to be able to provide and manage such consents [31].

Previous research, however, shows that the majority of people do not seem to be empowered to practice their digital right to privacy and lawful consenting. As an example, a European Commission Special Eurobarometer with the title “Data Protection” [16], which was conducted in 28 EU member states with nearly 28,000 participants, sheds some light on how people deal with their digital privacy. Most end-users do not seem to have the necessary legal and technological information required to actively protect their personal data [16]. Only 20% of respondents to the European Commission’s survey said that they are always informed about the conditions and further uses of data collection. 41% of the respondents said they are only sometimes informed, 22% said that they are rarely informed about these issues, and around 11% replied that they are never informed.
On the topic of legal authorities, 37% said that they knew about their national authority for data protection, while 61% said they did not. Furthermore, reading privacy policies and terms of use before starting to use a service is time-consuming. When asked how much control they feel they have over the information they provide online, 15% of participants said they have complete control, 50% said they have partial control, and 31% said they feel no control at all. In an environment where users are connected to many digital technologies, keeping track of every privacy policy involved becomes an impossible task. Therefore, many users are unwilling and poorly equipped to deal with the numerous privacy policies they face. In the aforementioned Eurobarometer, participants were asked about the reasons why they do not fully read privacy policies: 67% of them found policies too long to read, while 38% responded that the policies were unclear or difficult to understand [16].

Considering the disappointing results of the Eurobarometer and the vast amount of other research endeavors with similar implications [46, 47, 50, 52], we propose that end-users need to be empowered in terms of their right to consent (including their right to withdraw their consent at any time) by interdisciplinary and multidimensional socio-technical means and approaches. Based on the enactivist [26, 32, 51] perspective in cognitive science, in this paper, we propose a basic human-centric framework for end-user empowerment wherein consenting is considered a sociocognitive action which includes cognitive, collective, and contextual dimensions. We subsequently apply this framework to evaluate the practice of consent-obtaining by the five biggest tech companies, i.e., Google, Amazon, Facebook, Apple, and Microsoft, whose consent-obtaining mechanisms affect the lives of billions of humans around the globe. After presenting our evaluation methodology and results, which show that these companies do not follow a human-centric approach toward obtaining end-users’ consent, we provide a concise discussion of the research presented and outline future work.

2 The Need for a Human-Centric Perspective on Digital Consenting

While research shows that many of the people who are involved in online activities are aware that their personal data is being collected and shared, this provides no concrete grounds to assume that people are willing to give away their personal data [44]. On the contrary, the importance of digital privacy to people has been demonstrated. People even “develop innovative strategies to achieve privacy while participating in the systems that allow them to access information, socialize with friends, and interact with contemporary entertainment platforms” [44] (see also [11, 12, 42, 43]). A similar line of thought is expressed by Busch [13], who emphasizes that while people’s professed interests in maintaining privacy do not always line up with their actual behavior, we might want to consider that (a) individual choices are not fully rational and (b) small decisions of individuals accumulate into large consequences that all users (i.e., all humans) have to face. As [4, p. 27] reminds us on this matter, the question “do consumers care?” is a different question from “does privacy matter?”. We answer both of these questions in the affirmative. However, we believe that answering a third question is even more important: if most end-users do care about their privacy and if privacy does matter, how can people be empowered to really practice (enact) their privacy values? If we accept that “the individual end-users and their needs and values, as well as the environment (including socio-economical contexts, other actors, etc.) and technologies they interact with, continuously co-create the [...] end-user empowerment” [29] (see also [6, 30]), only an approach which considers all these different involved dimensions can truly enable human empowerment.
We call such an approach human-centric, wherein individual (cognitive) and social (collective & contextual) dimensions of every single end-user and all end-users combined are taken into account when an information system (a consent-obtaining mechanism in our case) is designed, implemented, evaluated, and released.

We propose that considering humans as cognitive systems enacting in their sociocontextual environments provides a framework for empowering them based on their sociocognitive needs, values, capabilities, and limits. Using a human-centric perspective will not only enable designers and developers to consider the sociocognitive aspects of consenting agents (i.e., end-users) in the development of new consent-obtaining mechanisms, but can also provide a framework for evaluating the existing mechanisms designed to obtain consent on the Internet. In the latter case, which is how we use the developed basic human-centric perspective (presented in Sect. 3), the human-centric framework can be used to evaluate whether an existing consent-obtaining mechanism (e.g., a cookies form) is able to empower end-users by considering different cognitive, collective, and contextual dimensions of the consenting action (i.e., giving consent as a multidimensional action) that is expected to be conducted via that mechanism. We propose that without a human-centric perspective, which considers the multi-dimensionality of human actions (or enactions), research or development on empowering technologies (including consent-obtaining mechanisms) is hardly an achievable task.

2.1 Fairness Matters: A Human-Centric Perspective and Marginalized People

We propose that end-user empowerment using human-centric perspectives should be considered a universal approach in the design, implementation, evaluation, and release of consent-obtaining mechanisms for everyone. However, when marginalized and underprivileged people are concerned, the urgent need for a human-centric shift in the design and implementation of consent-obtaining mechanisms becomes even more explicit: Marwick and Boyd [44] point out that while it is increasingly challenging for all individuals to maintain digital privacy due to our “networked age”, it is even more challenging for people at the margins. They argue that people who are structurally and systematically oppressed (for example, immigrants, LGBTQ+ communities, people of color) experience privacy differently than people with more privilege. Providing an example of such a scenario, they write that an ill person without adequate insurance who is seeking treatment will share their personal information much more readily than a person with full health coverage. Tene and Polonetsky [48] demonstrate a similar scenario wherein parent–children relationships can become strained due to privacy issues. They claim that parents have always tended to control their children’s online activity. Before the invention of parental filters that monitor and control every activity, children had the opportunity to shut the door and afford themselves some privacy.
Given today’s state of technology, the authors ask “what social norms constrain parents from persistently peering into their children’s lives?” To describe how the “right to be left alone” is exercised in the current digital world, Marwick and Boyd [44] write that “as data-based systems become increasingly ubiquitous, and companies that people entrust frequently fail to protect personal data, the lines between choice, circumstance, and coercion grow increasingly blurry.”

In order to manage their privacy in this ever less private digital world, people seek technological and social information. Therefore, people who are socially disadvantaged will presumably have a harder time maintaining their privacy [41]. Companies serving different demographics, including very young and elderly people who are technologically less adept, could take advantage of these users. In a recent scandal involving Facebook’s underage users, news reports in January 2019 stated that Facebook tried to increase its online gaming revenue by accepting payments from children unaware that their parents’ credit cards were being charged [33]. In this predatory behavior, Facebook actively exploited children’s lack of knowledge and refused to make proposed changes that could overcome the problem. Therefore, a “one-size-fits-all” approach to privacy can be ignorant of social inequalities, and in the worst case it will allow for the exploitation of the least protected users. The legal framework differentiates between data subjects, controllers, and processors, but it would be a mistake to treat any of these groups as homogeneous, since data subjects have varying privacy attitudes [36], just as controllers and processors may collect different types of information for their services and products [39]. Although privacy is a legal right granted equally to citizens, the exercise of this right in real-life conditions is usually more complicated, and not everyone benefits equally from protection. We, therefore, propose that a human-centric perspective, wherein the individual needs, values, capabilities, and limits of every single end-user are taken into account [30], is a significant aspect that needs to be considered in consent-obtaining digital mechanisms, as one of the most important information systems which are expected to protect human rights and values. Such human-centric consent-obtaining mechanisms will not only empower privileged end-users but also empower and protect underprivileged and marginalized ones.
If we consider the right to privacy as a basic human right and accept that consent-obtaining mechanisms need to be fair regarding the services they provide by respecting their users’ needs, values, capabilities, and limits, human-centricity particularly gains prevalence.

3 Enacting Consent: Consenting as a Sociocognitive Action

In the previous section, we argued for a need for human-centric approaches toward consent-obtaining. In order to develop a basic human-centric framework for the evaluation of existing consent-obtaining mechanisms, we use one of the current paradigms in cognitive science, enactivism [32, 51]. This paradigm is supported by some of the most recent advancements and findings in cognitive science (see e.g., [5, 15]). According to enactivism, cognition arises through the continuous interaction of a cognitive system and its environment [51]. By taking an enactivist perspective, we propose that instead of reflecting on consent as a symbolic and abstract concept, we need to consider the action of consenting as one, which is the result of continuous and dynamic interactions between the end-user and the consent-obtaining system, performed in a social (and environmental) context. Based on this understanding, consenting is a sociocognitive action involving cognitive as well as social dimensions and processes.


Figure 1 provides a simple visualization of the sociocognitive dimensions of consenting. From an enactivist perspective, it is difficult to draw a line between different dimensions as they overlap. Moreover, all dimensions are in continuous interaction. Considering the state of the art in cognitive science (e.g., [32]) as well as the literature on digital privacy and the current consenting behavior of end-users, the social dimension is divided into two ever-interacting and overlapping sub-dimensions, i.e., a collective dimension and a contextual dimension. Collective refers to collective ownership of data, collective management of consents, and support of individuals regarding their privacy and consent-giving by other humans or communities. Contextual refers to contexts such as location, time, emotional state, importance, etc., wherein consent is being given. Each of these dimensions is briefly discussed below in the specific context of consenting.

Fig. 1 A simple visualization of sociocognitive dimensions of consenting; the social dimensions are colored in khaki.

3.1 The Cognitive Dimension of Consenting

In an ideal world, everyone who agrees to use the same services should be digitally protected in the same ways, but there appear to be a number of obstacles to attaining this goal. The users, often treated as a homogeneous group, comprise agents with different needs, preferences, abilities, skills, knowledge, and resources. As Marwick and Boyd [44] have recently discussed, maintaining privacy requires not just setting preferences on a digital service but also an active protection of information. Managing privacy by controlling when and where to share information, as well as who has access to it, is a difficult task. Users need increasingly complex technical and social skills to navigate these digital spaces, to assess the costs and benefits of their actions, and to strategically decide to share or hide information. When using consent-obtaining mechanisms as they are implemented today, users need to have a good understanding of their consent-related decisions (e.g., being able to consider their individual, contextual, and collective short- and long-term consequences), as well as the ability to perceive and understand the visual elements of the user interfaces, in order to eventually comprehend and make sense of the texts and information provided in, among others, the terms of service, privacy policy, and guiding documents. Taking the complexity of consent-obtaining mechanisms into account, one may doubt a normal human being’s ability to perform these tasks, given finite cognitive capacities and limited time, expertise, knowledge, resources, and so on. A detailed discussion of this subject exceeds the scope of this paper. It is, however, clear that consenting has a cognitive dimension, which should be taken into account. As proposed in the literature [30], consent-obtaining mechanisms need to give consideration to humans’ cognition and processes such as needs, values, capabilities, and limits if they aim at empowering them regarding the management of their privacy.

3.2 The Collective Dimension of Consenting

Lehtiniemi and Kortesniemi [37] note that people reveal information not only about themselves but also about others. They emphasize that “privacy self-management frames the decision-making on personal data as an individual choice based on private cost-benefit analysis, despite personal data often also conveying information about others” [37]. For instance, when group photos or the location and time of events are posted on the Internet, people are publishing information about their peers as well. Similarly, by downloading a mobile application that has access to their contacts, users are giving away personal information that essentially belongs to other people. Considering that social data can be extracted from personal data and that individual privacy decisions have social impacts, we must turn our attention to collective attitudes toward privacy protection.

Other aspects of the collective dimension of consenting include peer influence and expert influence. In a study conducted by Das et al. [17], the frequency of adoption of three Facebook security features by a group of 1.5 million users was analyzed. The results have shown that if a user has many friends, especially from different social groups, who have adopted a security feature, they are more likely to adopt that measure as well. In a more recent study conducted by Emami Naeini et al. [19], 1000 online participants were asked about scenarios where the collection of personal data was allowed or rejected. In the study group, the researchers gave the participants information about the decisions of their peers and experts on the topic. In the control group, this information was not available. When participants were aware of the attitudes of other people, they were quicker to decide. This effect was larger in scenarios where the task was more difficult and required more time to decide. Granovetter [22] (as cited in [38]) summarizes this point by proposing that “private cost-benefit decisions to disclose data are embedded in a network of social relations, and looking at them from an individuated, under-socialized point of view is misleading.”

In summary, the collective dimension of consenting has different aspects: on the one hand, since many data types have collective privacy-related attributes (e.g., group photos), the consent associated with them needs to be given (or obtained) in a collective manner; on the other hand, since (1) many people do not have enough expertise or abilities to manage their own privacy in an appropriate way, and (2) the social consequences of privacy invasion go beyond single individuals, people need to be supported by other peers and experts in their consenting and personal data management [8].
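As a minimal sketch of what obtaining consent “in a collective manner” could look like in software, the following example models a shared data item such as a group photo. It is our illustration rather than a mechanism proposed in the cited literature: the `SharedItem` class, its method names, and the policy that every data subject must agree before processing are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class SharedItem:
    """A data item that conveys information about several people."""
    description: str
    data_subjects: frozenset
    consents: dict = field(default_factory=dict)  # person -> True / False

    def record(self, person, decision):
        """Store one data subject's decision for this item."""
        if person not in self.data_subjects:
            raise ValueError(f"{person} is not a data subject of this item")
        self.consents[person] = decision

    def missing(self):
        """Data subjects who have not decided yet."""
        return sorted(self.data_subjects - set(self.consents))

    def may_process(self):
        """Assumed policy: processing requires agreement from everyone."""
        return all(self.consents.get(p) is True for p in self.data_subjects)


# Hypothetical example: a group photo showing two people.
photo = SharedItem("group photo", frozenset({"alice", "bob"}))
photo.record("alice", True)
print(photo.may_process(), photo.missing())
```

Other collective policies are conceivable, for instance majority decisions or delegation to a trusted peer or expert; the unanimous rule is used here only because it is the simplest to state.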

3.3 The Contextual Dimension of Consenting

Consenting, as a sociocognitive action, always happens within and in relation to contextual dimensions [35]. Time, location, emotions, other involved people, purpose, trust, urgency, contextual self-identity, and many other factors can influence the action of consenting. An individual from a specific social minority might have no issue with the processing of geographical information during the week while she is in her home city, but she might have serious concerns if the same data is collected while she is traveling on the weekend (e.g., due to potential consequences). The contextual aspect of a human-centric perspective toward the development of consent-obtaining mechanisms reminds us of at least two things: (1) the need to provide fine-grained control possibilities for individuals, who are able to consider their own contextual specifications such as their situated needs, values, concerns, attitudes, capabilities, and limits; and (2) the need to perceive consent as a contextual entity which should always be obtained in relation to contexts.
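The idea of consent as a contextual entity can be made concrete with a small sketch that mirrors the travel example above. All names (`Context`, `ContextualConsent`, `permits`), the choice of context attributes, and the cities are hypothetical and chosen only for illustration; real contexts would include many more factors, such as emotional state or urgency, that are much harder to encode.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Context:
    """The situation in which a processing request occurs."""
    purpose: str
    city: str
    when: datetime


@dataclass(frozen=True)
class ContextualConsent:
    """Consent for one purpose, valid only in the listed contexts."""
    purpose: str
    allowed_cities: frozenset
    weekdays_only: bool = False

    def permits(self, ctx):
        """Check whether this consent covers the given context."""
        if ctx.purpose != self.purpose:
            return False
        if ctx.city not in self.allowed_cities:
            return False
        # datetime.weekday(): Monday is 0, Saturday is 5, Sunday is 6.
        if self.weekdays_only and ctx.when.weekday() >= 5:
            return False
        return True


# Hypothetical consent: geolocation only in the home city, on weekdays.
consent = ContextualConsent("geolocation", frozenset({"Vienna"}), weekdays_only=True)
at_home = Context("geolocation", "Vienna", datetime(2020, 6, 17, 10, 0))   # a Wednesday
travelling = Context("geolocation", "Graz", datetime(2020, 6, 20, 10, 0))  # a Saturday

print(consent.permits(at_home), consent.permits(travelling))
```

The point of the sketch is only that a consent decision is evaluated against a context at processing time instead of being stored as a single context-free yes or no.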

4 A Human-Centric Evaluation of the GAFAM Practice of Consent Obtaining

Following the conceptualization of consenting as cognitive, collective, and contextual enactions, we turn to the specific mechanisms of giving and obtaining consent in an online environment in the form of privacy options and cookie consent forms. The following sections investigate the nature of these three aspects of consenting in the context of the “Big 5” (Google, Amazon, Facebook, Apple, and Microsoft; GAFAM) and their efforts to obtain end-user consent. The main reasons for choosing these websites are: (1) they are ranked among the websites with the highest numbers of users, (2) the enterprises running these websites not only have the legal obligations to obtain lawful consents from their diverse end-users (same as

other websites), but also—as some of the most technologically advanced endeavors in the world—they possess sufficient resources to develop highly sophisticated consent infrastructures if they choose to, (3) they extensively use data subjects’ personal data for a variety of purposes, including targeted advertising and profiling, which are specifically regulated by the GDPR. Starting with a critical analysis of web practices aimed at artificially increasing the cognitive load on the user, we investigate the process of giving and withdrawing consent for the use of private data, tracking cookies, and targeted advertising.

4.1 Evaluating the Cognitive Aspect of Consenting

The process of giving or withdrawing consent in the digital space (i.e., digital consenting or online consenting) is becoming an increasingly difficult cognitive task. Not only is the potential amount of information to be disclosed growing massively, but the number of service providers and companies that collect data, and are thus subject to the GDPR, is also growing. Consequently, users are confronted with the necessity to either accept the companies’ terms of service as-is or to undertake the arduous task of choosing when to share which of their private data through web interfaces provided by the data collectors themselves, namely privacy and cookie consent forms. These forms are located in a paradoxical space between legal compliance and the interests of the company to collect as much data as possible on the one side, and the users’ needs and rights to take control of their personal data on the other. Therefore, in this context, it seems plausible to assume that data controllers would not only provide interfaces that are first and foremost geared toward legal compliance, but also utilize design elements in these digital consent mechanisms that actively discourage their use. Determining just how common these presumed practices are is the empirical focus of our work, which we see as a necessary step toward establishing both the necessity and applicability of the framework presented before.

As described both in academic literature and by professionals in the web design industry [14, 23], a variety of design measures exist that are tailored to complicate interaction processes and increase the cognitive strain on the user, commonly referred to as dark patterns. Dark patterns, a concept introduced by Brignull et al. in 2011, are [...]
a type of user interface that appears to have been carefully crafted to trick users into doing things [where these user interfaces] are carefully crafted with a solid understanding of human psychology, and they do not have the user’s interests in mind (Brignull et al., cited by Greenberg et al. [24, p. 2]).

To compare the utilization of such user interface design patterns in consent forms of the main webpages of the GAFAM, we utilize a taxonomy of these patterns introduced by Mathur et al. [45]: Asymmetry, Covertness, Deceptiveness, Hidden Information and Restrictive Design.

4.1.1 Methodology

In order to investigate the use of these patterns, discover other commonalities and discrepancies, and subsequently evaluate the current practices in the context of our human-centric framework of consenting, we utilized a scenario-based critical interaction and design analysis as interpretative ethnography [7, 18]. Borrowing from Blackmon et al. [9], we asked usability and web design experts to conduct a cognitive walkthrough of the GAFAM cookie and privacy consent mechanisms to identify the abovementioned dark patterns, conceptualizing them as usability problems. For each of the five main web pages, this meant enacting a single user story: “opt out of and withdraw consent for as many data collection and privacy practices as possible”. Starting at the top-level domain (footnote 1) for each company, we set the following goals:

1. Locate and document a cookie consent banner, notice, or similar.
2. Follow the available links to reach all available opt-out/consent withdrawal options.
3. Document design patterns supporting or hindering the cognitive process of withdrawing consent.

The process was completed separately by five experts in user interface design and information architecture in August 2019 while being observed, and was subsequently cross-validated by the authors to resolve discrepancies and questions. Choosing experts with both industry and UI/UX research backgrounds for this task, in lieu of a larger, user-centric study, allowed uncovering deeper design patterns and subtle mechanisms that average users might not be familiar with. Each process was conducted using a pristine browser in “private” mode (with no cookies present prior to the process). When necessary, user accounts were created to explore the internal privacy options provided only for registered users. This allowed starting from a tabula rasa state for each web page, while also preserving the necessary cookie information as long as the session lasted.

Given that the browsers utilized for the test (footnote 2) both prohibit the detection of whether or not the private mode was used (Chrome by default, Firefox through an extension), this approach was most promising in creating a pristine, comparable environment for each of the tests. No significant difference between the two browsers with regard to behavior or consent mechanism design was noticeable to the evaluators. Evaluators were familiarized with the concepts of dark patterns and their taxonomy as described in the previous section prior to the cognitive walkthrough. Not all the evaluators reported the same patterns. However, during the ex-post discussions, the evaluators did not express any significant disagreement with the other evaluators’ observations.

Footnote 1: To ensure the webpage was specifically designed to be compliant with the GDPR, we chose the local top-level domain of Austria where applicable. All consent forms mentioned the GDPR in one way or the other.
Footnote 2: Firefox 68.0.1 and Chrome 76.0.3809, respectively.


Table 1 Consent mechanism evaluation: key results

Company   | Notice   | Targeted ads: opt-out | Shortest path (clicks) | Asymmetry | Covert | Deceptive | Information hiding | Restrictive design
Amazon    | Explicit | Third party           | 10 (footnote 3)        | Yes       | Yes    | Yes       | Yes                | Yes
Apple     | Implicit | No                    | 7                      | Yes       | No     | Yes       | Yes                | Yes
Facebook  | Explicit | Third party           | 9                      | Yes       | No     | Yes       | Yes                | Yes
Google    | Explicit | Yes                   | 8 (+3)                 | Yes       | Yes    | Yes       | Yes                | No
Microsoft | Explicit | Yes                   | 9                      | Yes       | No     | Yes       | Yes                | No
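The “Shortest path (clicks)” measure in Table 1 can be read as a shortest-path length in the link graph of a website: pages are nodes, clickable links are edges, and the measure is the fewest clicks from the landing page to the opt-out control. The sketch below illustrates that reading with a breadth-first search. It is only an illustration under assumptions: the evaluators counted clicks manually during the walkthrough, the function name `shortest_click_path` is hypothetical, and the page names in `links` are invented and do not describe any of the evaluated websites.

```python
from collections import deque


def shortest_click_path(links, start, target):
    """Return the shortest list of pages from start to target, or None if unreachable."""
    parents = {start: None}  # page -> page it was first reached from
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if page == target:
            # Walk back through the parents to rebuild the path.
            path = []
            while page is not None:
                path.append(page)
                page = parents[page]
            return path[::-1]
        for nxt in links.get(page, []):
            if nxt not in parents:  # visit each page at most once
                parents[nxt] = page
                queue.append(nxt)
    return None


# Hypothetical site structure, NOT taken from any of the evaluated websites.
links = {
    "home": ["cookie_notice", "help"],
    "cookie_notice": ["privacy_statement"],
    "help": ["privacy_statement"],
    "privacy_statement": ["ad_settings"],
    "ad_settings": ["opt_out"],
}

path = shortest_click_path(links, "home", "opt_out")
clicks = len(path) - 1  # one click per link followed
print(path, clicks)
```

Because breadth-first search explores pages in order of increasing distance from the start page, the first time the target is reached is guaranteed to be along a path with the fewest clicks.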

4.1.2 Results

While observing the process and documenting the observed design patterns, three aspects were particularly in focus: interaction design, visual design, and textual descriptions. In terms of interaction design, the minimum number of clicks and page jumps necessary to reach the required options, animation, and hidden features (such as the “read more” accordion or collapsible panel pattern [49]) received particular attention. For the visual design aspects, the size and location of interaction buttons were of primary importance. The textual descriptions were compared across different pages in terms of the consistency of the description and terminology as well as their semantic relation to the interaction options (e.g., answering the question “Does the action promised by the ‘opt-out’ button correlate with the textual description underneath?”). Significant observations were collected and evaluated against the taxonomy of dark patterns noted above.

The following analysis presents an overview of our findings (Table 1). Two overall distinctions emerge from our empirical analysis (footnotes 3 and 4). First, only four of the five analyzed sites (Amazon, Facebook, Google, and Microsoft) make it explicit that users are giving consent to the use of their data, in the form of a cookie consent banner or a pop-up. Apple required looking for a footer link titled “Use of cookies” in order to access information about collected data and find further links to the opt-out process. Second, not all the sites provide an opt-out option for users that either do not have an account or are not signed in. Specifically, Apple does not offer these options and instead urges users to sign in or create an account in order to set privacy preferences.
Of the remaining four, Microsoft and Google allow opting out of targeted advertising through their own consent forms, and Amazon and Facebook only provide a link to three consent intermediaries: the Digital Advertising Alliance (for the U.S.; footnote 5), the Digital Advertising Alliance of Canada (footnote 6), and the European Interactive Digital Advertising Alliance (footnote 7), which provide cookie-based opt-out settings that should span multiple third-party websites and data collectors. While consent intermediaries promise some potential as alternatives to the privacy self-management model, Lehtiniemi et al. point out that they do not represent a solution to the underlying “insuperable problems” that stem from an individual-centric approach to privacy negotiation [38, p. 10].

Footnote 3: Through third-party tool.
Footnote 4: Including YouTube.
Footnote 5: http://optout.aboutads.info/.
Footnote 6: https://youradchoices.ca/.

Fig. 2 Asymmetric choices in Google’s consent banner and forms

Asymmetry describes a user interface design that “[...] impose[s] unequal weights or burdens on the available choices [...]” [45, p. 2], in most cases by subtle means, such as the size or placement of a preferred interaction (e.g., an “Accept” button). It is worth noting that the default settings of all five web pages, in most cases, lie strictly in the interests of the data collectors, which assume that end-users consent to any and all types of data collection and processing. In these data-controller-centric designs, the users are often expected to actively opt out. With regard to specific user interface measures utilizing asymmetrical patterns, Facebook presents a large “Accept” button for its terms of service agreement next to a comparatively small “See my options” link upon logging in. Google’s interface design, at first, seems to try to nudge the user to “Review Now” their pop-up titled “A privacy reminder from Google”, with the only other option being “Remind me later”; in the following review window, the “Accept now” button is prominently placed next to a more subdued “Other Options” button (see Fig. 2). Both examples illustrate a strategy of presenting two interaction choices in a graphical way that suggests to the user a binary, either-or decision (“I consent” vs. “I do not consent”), while in reality the choices represent separate, non-binary functionalities (such as “consenting” vs. “exploring other options”).
This strategy exploits a user’s cognitive bias by violating expectations of regularity [25] and increases the cognitive strain to accomplish the task of withdrawing consent.

Footnote 7: http://www.youronlinechoices.eu/.

Covertness patterns aim at hiding the effects of a user’s choice in order to steer them in a certain direction. Not all consent form interfaces employ this strategy per se, but they may exploit similar effects that can be subsumed under “covertness” strategies. Amazon, for instance, allows users to opt out of “interest-based ads”, but informs them in the small print underneath that “[e]ven if you choose not to see interest-based ads, you may still see personalized product recommendations and other similar features unless you’ve adjusted Your Recommendations in your Account Settings or Your Browsing History” [1]. The link to “Account Settings” leads the user to a page listing their recent purchases. On that page, the user can choose manually (and one by one) to exclude their purchases from being used for further recommendations, but no “reject all” button is provided. “Your browsing history”, on the other hand, presents a similar list that shows recently viewed items and does provide options to remove all items from the history and to turn off browsing history tracking altogether, though both options are hidden behind a “Manage history” button. The multitude of different interaction methods complicates the cognitive process of comprehending the available choices and their effects, and can therefore also be subsumed under covertness strategies. Similarly, Microsoft’s opt-out page for personalized ads only mentions data utilization for personalization, but not data collection itself [2]. Only through the long-form help pages on “How to access and control your personal data” [3] is the user informed that opting out of ad personalization does not stop the data collection, and that exercising their rights to data sovereignty as mandated by the GDPR requires contacting Microsoft via email or another form (an automated opt-out is not offered).

While some of the covertness strategies of complicating the process exemplified above represent a more subtle way of dark pattern design, a number of interface design choices are obviously misleading and leave the impression of purposeful deceptiveness.
Google’s consent pop-up, for instance, presents the following “tip”: “When you sign in with your Google Account, you can control what’s saved to your account and manage past searches.” The language used suggests that signing in with an account might actually increase one’s control over the data collected, while omitting the fact that signing in also increases both the amount of (meta-)data collected and its value within the culture of surveillance capitalism that ad-revenue-dependent companies like Google thrive in [40]. Another example is Microsoft’s “Privacy Statement” [3], which users reach by clicking on a cookie consent banner presented at microsoft.com. This link leads to the section for “Cookies or similar technologies”, which in general terms explains what cookies are and how they are used by Microsoft. The paragraph concludes:

You have a variety of tools to control the data collected by cookies, web beacons, and similar technologies. For example, you can use controls in your internet browser to limit how the websites you visit are able to use cookies and to withdraw your consent by clearing or blocking cookies.

Notably, this does not mention any opt-out functionalities to withdraw consent. Underneath the paragraph, a small “Learn more” link opens a much more detailed long-form version of this short paragraph, which includes a reference to “Interest-based advertising”, which in turn leads the user to yet another summary on the same page titled “How to access and control your personal data”. Here, finally, the user can find a link to the general opt-out page for targeted advertisements as well as a link to the Microsoft privacy dashboard for users with a Microsoft account. Even here, after disclosing the link to the opt-out page, the corresponding “Learn more” link leads to a much longer list of Microsoft products collecting and utilizing personal data (such as Microsoft Cortana, Skype, or the Microsoft Store) and the respective


links to control privacy settings and data collection. The numerous redirecting links and hidden pages suggest that the goal of this page is not to empower the users to control their (implicitly assumed) consent, but to keep all but the most persistent users from finding and exercising their right to withdraw consent for data collection and utilization.

A dark pattern that is employed by all GAFAM companies is Hidden Information. Functionalities that allow users to opt out of data collection practices are often obscured through lengthy texts, multiple pages, and synonymous descriptions and terms, or they are visually hidden and require specific interaction to be found. Microsoft’s “Privacy Statement” with its hidden paragraphs and Amazon’s distribution of privacy controls over four different pages are just two examples of overly convoluted information architecture and design. Vague language on which personal data are collected and which personal data are required to provide the services, and a reluctance to clarify the difference between data collection and utilization (in the case of targeted ads, for instance) can be found consistently throughout the privacy and consent mechanisms of all five company ecosystems. One specific example of particular obfuscation is Apple’s privacy information page: a lengthy text explains Apple’s use of targeted ads in its Apple News and App Store products and informs the user of the “Limit Ad-Tracking” setting for opting out of this feature. There is no direct link to any settings page that allows this. The text also does not mention that this setting is only available on Apple devices like iPhone, iPad, or Apple TV. This means that the users themselves need to find the instructions on Apple’s support pages.

Finally, Restrictive Design plays a role for all five companies. A tendency to initially push the responsibility for opting out of tracking, data collection, and targeted advertising toward the user by suggesting browser-based measures to block cookies as a first solution can be observed in all cases. The fact that this inevitably leads to restrictions not only in the use of the webpage in question, but of any webpage that requires cookies to function, can be seen as leverage against this global opt-out. Three of the five companies (Amazon, Apple, and Facebook) explicitly mention that blocking cookies would prevent the user from using certain parts of their sites, but Apple and Facebook remain vague about the exact nature of these restrictions. Only Amazon clarifies:

[...] if you block or otherwise reject our cookies, you will not be able to add items to your Shopping Basket, proceed to Checkout, or use any Amazon Services that require you to Sign in

The combination of primarily suggesting that privacy-conscious users should block all cookies, and the reluctance to provide clear instructions and fine-grained control over which type and use of cookies the user consents to, results in an underlying theme of coercing the user towards giving consent to all data use.


5 Discussion

5.1 Dark Patterns and the Assault on the Cognitive Dimension of Consenting

Considering the empirical results, little evidence has surfaced that suggests the GAFAM web pages were designed with a human-centric perspective to empower users to give their informed consent. On the contrary, the nature of the techniques employed suggests that empowering users is not the main focus of these consent mechanisms. Patterns of coercion that nudge the user toward consenting, strategies of information hiding, and covert and confusing interface behavior have been shown to exploit human cognitive weaknesses more than to support the complex process of consenting to a multitude of data collection and utilization practices [10, 14]. Current research literature (e.g., [5, 32]) from the cognitive sciences supports the theory that the high-level patterns we observed have an adverse effect on the cognitive efforts involved in enacting consent through the mechanisms studied, as detailed in Sect. 3.1.

5.2 The Missing Aspects: Collectiveness and Contextuality

Current approaches to privacy management presume that users are informed and capable of deciding these matters individually. As discussed above, this perspective ignores that many users find privacy policies complex and that the decisions of other people influence individual users. Empowering end-users should not only aim at overcoming the mechanisms that actively discourage them from withdrawing their consent or push them to choose specific types of consent. A human-centered approach should also regard the collective dimension of consenting, an aspect for which the existing solutions provide no support at all. For instance, the difficulties of grasping the complex technical consequences of consenting to targeted ads might be alleviated by presenting the users with expert opinions on what their potential choice entails, or by providing an overview of choices made by comparable peers. Friends, family members, peers, trusted enterprises, or even trusted AI systems can support users in managing their online consent and tracking decisions.

Furthermore, the current model of individual consent stands in stark contrast with the very broad, all-encompassing options presented to the user (in some cases even simply blocking any and all cookies), which often results in the exclusion from a variety of services implemented to be dependent on the use of cookies. Such “one-size-fits-all” approaches to consenting or objecting ignore the heterogeneous contexts and multiplicity of the human experience, and represent another coercive strategy to encourage the user to engage in the bargain of consenting to everything to get something. A human-centric approach to consent would thus imply providing users with more fine-grained controls: this approach would not only include the


question of what the user consents to share, but also the time, location, or other contextual information. As mentioned in Sect. 2.1, for instance, a user might be willing to share certain information while at home, but withdraw that consent while working on sensitive information at their office. Similarly, users might generally consent to the use of targeted ads, but want an emergency opt-out mechanism in case of a personal crisis. Considering some of these contextual dimensions could alleviate the strain that the implicit, “always on” consent approach puts on members of marginalized populations.

Finally, although some of the consent-obtaining systems are quite elaborate in terms of interaction and visual design, none of the surveyed systems informed users by any means other than lengthy textual descriptions of their data and privacy practices. Since the legal framework provided by the GDPR does not specify how users should be informed, one answer to the heterogeneous needs of a diverse user base should be the presentation of information relevant to the consenting actions in different forms. For instance, animations, video explanations, interactive visualizations, or negotiation-based interactive approaches (combined with personal privacy management systems) can be used to reach a diverse audience and could address the contextual dimension of enacting consent on a more individual level. Moreover, pluralist approaches to knowledge representation (see e.g., [27, 28]) can be used to represent privacy-related information to end-users based on their diverse needs.

5.3 Fairness, Accountability, and Transparency of GAFAM Consent-Obtaining Mechanisms

Evaluating these current approaches to consenting shows obvious shortcomings with regard to transparency. On the one hand, the use of patterns like covertness or hidden information directly contradicts standards of transparency and disclosure regarding the data utilization practices employed by the data collectors. On the other hand, the common practice of providing extensive texts to explain data use and the company’s privacy policy in the name of transparency leads to little more than burying the few relevant pieces of information, for instance, instructions on how to withdraw or adapt consent. Regarding the language used in explaining the practices and available options to the users, as Kemper [34] argues, transparency alone is not enough to provide greater accountability of such systems: without a critical audience, accountability remains an “empty signifier”. But even with an engaged critical audience, it is highly questionable whether this kind of transparency automatically leads to greater accountability of such systems: these strategies of over-disclosing would qualify as “opaque transparency”, according to Fox [21].

Another dimension of transparency in the consent-obtaining mechanisms concerns user feedback on the choices made. In this case, transparency is a necessary precondition for accountability: if there are no ways for the users to see concrete


evidence of the effect of their choices, the resulting system can hardly be described as accountable. Given the increasingly subtle mechanisms of targeted advertising and the fact that, even after an opt-out, ads might still be tailored to the users based on intermediate activities without data collection or processing that would fall under the regulations in the GDPR [20], it might be increasingly difficult for a user to tell the difference their choice made. Illustrative examples and more transparent traceability of data collection related activities—similar to the “Why am I seeing this ad?” utilized by Facebook—might provide illumination in these cases. As discussed in Sect. 2.1, any implementation of privacy and consent-obtaining mechanisms carries the danger of affecting members of marginalized groups negatively in disproportional numbers if the implementation does not consider the context of the user. The current examples seem woefully lacking in this regard, as explained above. As long as the consent-obtaining mechanisms and their implementations are framed within the perspective and goals of the data controllers (i.e., the company providing services in exchange for data collection), fairness in privacy will not improve, but may be actively compromised. Finally, given that obtaining consent is just one of the various legal bases for a lawful practice of data processing as outlined by article 6 of the GDPR [20], it stands to reason that—should consent be used to justify data processing over other options—companies should be held accountable for the way they elicit that consent.

5.4 Lawfulness of Non-human-centric Consent-Obtaining Mechanisms?

While this paper does not aim at proposing any legal claim, a basic reflection on the potential legal implications of our human-centric framework can be helpful for other researchers. According to the GDPR (in particular Art. 6 [20]), end-user consents, if obtained, need to be valid, freely given, specific, informed and active. Considering our evaluation of the cognitive dimensions of consenting, justifying the consents obtained by GAFAM as valid, freely given, specific, informed and active seems very difficult from a human-centric perspective. As a result, one can raise doubts regarding the lawfulness of these practices from a human-centric perspective. Moreover, given that, under the GDPR, the data processing purposes and the privacy policies of data controllers must be understandable for the end-users (at least where their consent is obtained), one can question the willingness and attempts of GAFAM to develop understandable consent-obtaining mechanisms. Furthermore, since the GDPR allows complementary approaches such as visualizations, an open question is why these data controllers do not use other means such as videos, animations, or interactive media in their consent-obtaining mechanisms.


6 Limitations & Future Work

The basic human-centric framework presented in this paper opens up many potential research directions that we aim to follow in the next steps. Given the state of the art in cognitive science and, in particular, the predictive processing [32] account of cognition, we aim to conduct empirical experiments on the cognitive aspects of consenting in different social groups. Moreover, we aim to develop a simple collective consent-management prototype and evaluate how this could support marginalized people in dealing with their privacy on the Internet.

Regarding our GAFAM evaluation, the study is limited by both its scope and methodology. A further empirical and user-centric evaluation to verify the specific impact of the observed patterns on cognitive load (and on the other aspects of human cognitive systems), including the collection of quantitative evidence such as site structure or tree analysis of the necessary steps as well as timed scenarios, could provide further evidence for the lack of human-centricity in the current implementations. Additionally, the study of region-specific or social group-specific differences as well as a separate evaluation of consent mechanism behavior on mobile devices were outside the possible scope of this study, although they could yield further insight into the landscape of consent-obtaining mechanisms at large. Nevertheless, the results of the current study are sufficient to be used to design and conduct a user-based evaluation study of the target consent-obtaining (and consent-management) mechanisms. Moreover, we aim to conduct a set of qualitative and user-based evaluations of consent-obtaining intermediaries (cookie consent forms) and to compare the existing solutions from a human-centric perspective. Furthermore, we hope to be able to conduct more in-depth research on the potential legal consequences of our proposed human-centric approach. Finally, we propose that more research is needed on the socioeconomic aspects of consent-obtaining mechanisms, such as the business models of the involved enterprises as well as the consequences of the current practices.

7 Conclusion

In this research, we proposed a shift toward human-centric end-user empowerment regarding obtaining digital consent. We argued that, based on the recent advancements in cognitive science, consenting needs to be considered a sociocognitive action. Such action includes dynamically interacting cognitive, collective, and contextual dimensions that should be taken into account in the design and implementation of consent-obtaining mechanisms and consent-management systems. Based on the developed framework, we evaluated the mechanisms of consent-obtaining in GAFAM’s main websites. Our results show that the collective and contextual aspects are almost completely ignored in their design. Moreover, we showed that human-centric cognitive dimensions of consenting are not only not used in a positive manner (which could lead to end-user empowerment), but have also been suppressed by implicit dark patterns that need to be avoided. While we are aware that further research is needed on how the developed framework can be used to design and implement new mechanisms (not just to evaluate the existing ones), we still think that the developed basic framework can provide a useful evaluative approach for consent-obtaining mechanisms. Finally, we propose that while many of the legal frameworks (such as the GDPR) follow a very individual-centric approach toward consenting, considering different aspects of the collective and contextual dimensions of consenting in the design and implementation of consent-obtaining mechanisms (and consent-management systems) can contribute substantially to human empowerment in the digital era.

Acknowledgments

This work is partially funded through the EXPEDiTE project (Grant 867559) by the Austrian Federal Ministry for Climate Action, Environment, Energy, Mobility, Innovation and Technology under the program “ICT of the Future” between September 2018 and February 2020. We would like to express our great appreciation for valuable criticism and ideas contributed by Gustaf Neumann, Seyedeh Anahit Kazzazi, Seyedeh Mandan Kazzazi, Stefano Rossetti, Kemal Ozan Aybar, Rita Gsenger, and Niklas Kirchner.

References 1. Amazon.de: Advertising Preferences. https://www.amazon.de/adprefs?ref_=ya_d_l_advert_ prefs (2019). Accessed 22 Aug 2019 2. Microsoft account | Privacy. https://account.microsoft.com/privacy/ad-settings/ (2019). Accessed 23 Aug 2019 3. Microsoft Privacy Statement—Microsoft privacy. https://privacy.microsoft.com/en-us/ privacystatement (2019). Accessed 23 Aug 2019 4. Acquisti, A.: Privacy in electronic commerce and the economics of immediate gratification. In: Proceedings of the 5th ACM Conference on Electronic Commerce, pp. 21–29. ACM (2004) 5. Allen, M., Friston, K.J.: From cognitivism to autopoiesis: towards a computational framework for the embodied mind. Synthese 195(6), 2459–2482 (2018) 6. Alt, R., Human, S., Neumann, G.: End-user empowerment in the digital age. In: Proceedings of the 53rd Hawaii International Conference on System Sciences, pp. 4099–4101 (2020) 7. Anderson, R.J.: Representations and requirements—the value of ethnography in system design. Hum.-Comput. Interact. 9(2), 151–182 (1994). https://doi.org/10.1207/s15327051hci0902_1 8. Aybar, K.O., Human, S., Gesenger, R.: Digital inequality: call for sociotechnical privacy management approaches. Workshop on Engineering Accountable Information Systems. European Conference on Information Systems—ECIS 2019 (2019) 9. Blackmon, M.H., Polson, P.G., Kitajima, M., Lewis, C.H.: Cognitive walkthrough for the web. CHI p. 463 (2002). https://doi.org/10.1145/503376.503459 10. Bösch, C., Erb, B., Kargl, F., Kopp, H., Pfattheicher, S.: Tales from the dark side: privacy dark strategies and privacy dark patterns. Proc. Priv. Enhanc. Technol. 2016(4), 237–254 (2016). https://doi.org/10.1515/popets-2016-0038 11. Boyd, D.: It’s Complicated: The Social Lives of Networked Teens. Yale University Press (2014) 12. Boyd, D., Marwick, A.: Social privacy in networked publics: teens attitudes, practices, and strategies. In: Decade in Internet Time: Symposium on the Dynamics of the Internet and Society. 
Oxford, UK (2011)

158

S. Human and F. Cech

13. Busch, A.: Privacy, technology, and regulation: why one size is unlikely to fit all. Social Dimensions of Privacy: Interdisciplinary Perspectives, pp. 303–323. Cambridge University Press, Cambridge (2015) 14. Chromik, M., Eiband, M., Völkel, S.T., Buschek, D.: Dark Patterns of Explainability, Transparency, and User Control for Intelligent Systems. IUI Workshops, vol. 2327 (2019) 15. Clark, A.: Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behav. Brain Sci. 36(3), 181–204 (2013) 16. Commission, E.: Special Eurobarometer 431: Data Protection (2015) 17. Das, S., Kramer, A.D., Dabbish, L.A., Hong, J.I.: The role of social influence in security feature adoption. In: Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing, pp. 1416–1426. ACM (2015) 18. Dourish, P.: Implications for design. In: The SIGCHI Conference, p. 541. ACM Press, New York, NY, USA (2006). https://doi.org/10.1145/1124772.1124855 19. Emami Naeini, P., Degeling, M., Bauer, L., Chow, R., Cranor, L.F., Haghighat, M.R., Patterson, H.: The influence of friends and experts on privacy decision making in iot scenarios. In: Proceedings of the ACM on Human-Computer Interaction, vol. 2(CSCW), p. 48 (2018) 20. EU: Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing. Off. J. European Union 1–88 (2016) 21. Fox, J.: The uncertain relationship between transparency and accountability. Dev. Pract. 17(4– 5), 663–671 (2007). https://doi.org/10.1080/09614520701469955 22. Granovetter, M.: Economic action and social structure: the problem of embeddedness. Am. J. Sociol. 91(3), 481–510 (1985) 23. Gray, C.M., Kou, Y., Battles, B., Hoggatt, J., Toombs, A.L.: The dark (patterns) side of UX design. In: Proceedings of the Conference on Human Factors in Computing Systems. 
Purdue University, West Lafayette, United States (2018). https://doi.org/10.1145/3173574.3174108 24. Greenberg, S., Boring, S., Vermeulen, J., Dostal, J.: Dark patterns in proxemic interactions—a critical perspective. In: Proceedings of the Conference on Designing Interactive Systems, pp. 523–532 (2014). https://doi.org/10.1145/2598510.2598541 25. Huber, J., Payne, J.W., Puto, C.: Adding asymmetrically dominated alternatives: violations of regularity and the similarity hypothesis. J. Consum. Res. 9(1), 90–98 (1982) 26. Human, S., Bidabadi, G., Peschl, M.F., Savenkov, V.: An enactive theory of need satisfaction. In: Müller, V.C. (ed.) Philosophy and Theory of Artificial Intelligence 2017, pp. 40–42. Springer International Publishing, Cham (2018) 27. Human, S., Bidabadi, G., Savenkov, V.: Supporting pluralism by artificial intelligence: conceptualizing epistemic disagreements as digital artifacts. In: Müller, V.C. (ed.) Philosophy and Theory of Artificial Intelligence 2017. Springer, Cham (2018) 28. Human, S., Fahrenbach, F., Kragulj, F., Savenkov, V.: Ontology for representing human needs. In: Ró˙zewski, P., Lange, C. (eds.) Knowledge Engineering and Semantic Web, pp. 195–210. Springer International Publishing, Cham (2017) 29. Human, S., Gsenger, R., Neumann, G.: End-user empowerment: an interdisciplinary perspective. Hawaii Int. Conf. Syst. Sci. 2020, 4102–4111 (2020) 30. Human, S., Neumann, G., Peschl, M.: [how] can pluralist approaches to computational cognitive modeling of human needs and values save our democracies? Intellectica 70, 165–180 (2019) 31. Human, S., Wagner, B.: Is informed consent enough? considering predictive approaches to privacy. In: CHI2018 Workshop on Exploring Individual Differences in Privacy. Montréal (2018) 32. Hutto, D.D.: Surfing uncertainty: prediction, action and the embodied mind, by andy clark, pp. xviii+ 401,£ 19.99 (hardback), 2016. Oxford University Press, New york (2018) 33. 
Kain, E.: Facebook Turned a Blind Eye to ‘Friendly Fraud’ as Kids Racked up Thousands on Games. Forbes (2019) 34. Kemper, J., Kolkman, D.: Transparent to whom? No algorithmic accountability without a critical audience. Inf. Commun. Soc. 1–16 (2018). https://doi.org/10.1080/1369118X.2018. 1477967

A Human-Centric Perspective on Digital Consenting: The Case of GAFAM

159

35. Kirchner, N., Human, S., Neumann, G.: Context-sensitivity of informed consent: the emergence of genetic data markets. Workshop on Engineering Accountable Information Systems. European Conference on Information Systems—ECIS 2019 (2019) 36. Kumaraguru, P., Cranor, L.F.: Priv. Indexes : A Surv. Westin’s Stud. (2005). https://doi.org/10. 1184/R1/6625406.v1 37. Lehtiniemi, T., Kortesniemi, Y.: Can the obstacles to privacy self-management be overcome? Exploring the consent intermediary approach. Big Data Soc. 4(2), 2053951717721,935 (2017) 38. Lehtiniemi, T., Kortesniemi, Y.: Can the obstacles to privacy self-management be overcome? Exploring the consent intermediary approach. Big Data & Soc. 4(2), 205395171772,193 (2017). https://doi.org/10.1177/2053951717721935 39. Lupton, D.: Personal data practices in the age of lively data. Digit. Sociol. 335–50 (2016) 40. Lyon, D.: Surveillance Capitalism, Surveillance Culture and Data Politics, pp. 1–15 (2018) 41. Madden, M.: Privacy, security, and digital inequality: how technology experiences and resources vary by socioeconomic status, race, and ethnicity. Data Soc. (2017) 42. Marwick, A., Fontaine, C., Boyd, D.: Nobody sees it, nobody gets mad: social media, privacy, and personal responsibility among low-ses youth. Soc. Media+ Soc. 3(2), 2056305117710,455 (2017) 43. Marwick, A.E., Boyd, D.: Networked privacy: how teenagers negotiate context in social media. New Media Soc. 16(7), 1051–1067 (2014) 44. Marwick, A.E., Boyd, D.: Privacy at the margins understanding privacy at the marginsintroduction. Int. J. Commun. 12, 9 (2018) 45. Mathur, A., Acar, G., Friedman, M., Lucherini, E., Mayer, J., Chetty, M., Narayanan, A.: Dark patterns at scale—findings from a crawl of 11K shopping websites. CoRR 1907 (2019). arXiv:1907.07,032 46. Obar, J.A., Oeldorf-Hirsch, A.: The biggest lie on the internet: ignoring the privacy policies and terms of service policies of social networking services. Inf. Commun. Soc. (2018) 47. 
Rudolph, M., Feth, D., Polst, S.: Why users ignore privacy policies–a survey and intention model for explaining user privacy behavior. In: International Conference on Human-Computer Interaction, pp. 587–598. Springer (2018) 48. Tene, O., Polonetsky, J.: A theory of creepy: technology, privacy and shifting social norms. Yale J. Law Technol. 16, 59 (2013) 49. Tidwell, J.: Designing interfaces. Patterns for Effective Interaction Design. O’Reilly Media, Inc. (2005) 50. Van Dijck, J., Poell, T., De Waal, M.: The Platform Society: Public Values in a Connective World. Oxford University Press (2018) 51. Varela, F.J., Thompson, E., Rosch, E.: The Embodied Mind: Cognitive Science and Human Experience. MIT Press (2017) 52. Zuboff, S.: The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power. Profile Books (2019)

An Industrial Production Scenario as Prerequisite for Applying Intelligent Solutions

Andreas Speck, Melanie Windrich, Elke Pulvermüller, Dennis Ziegenhagen, and Timo Wilgen

Abstract Despite many initiatives for making manufacturing and production more intelligent, the processes in industrial production are still mostly traditional. The paper presents a scenario providing prerequisites for applying intelligent solutions. The lean universal control concept presented in the paper supports almost any kind of automation device, such as robot arms, PLCs, and numerical control machine tools. Due to its open architecture, the universal control system serves as a base for intelligent solutions supporting production. A graphical workflow notation for modeling the control programs is combined with automated checking, which helps human developers identify and prevent rule violations already during the design of the control application programs. In a further step, the control programs can first be tested with simulated devices. The visualization blends simulated and real devices controlled by the control application programs. This helps human users monitor and validate the behavior of the devices and supports a safe commissioning of real devices.

The work is funded by the Interreg 5a Programm, Deutschland-Danmark.

A. Speck · M. Windrich · D. Ziegenhagen · T. Wilgen: Christian-Albrechts-University of Kiel, 24118 Kiel, Germany
E. Pulvermüller · D. Ziegenhagen: University of Osnabrück, 49090 Osnabrück, Germany

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 A. Zimmermann et al. (eds.), Human Centred Intelligent Systems, Smart Innovation, Systems and Technologies 189, https://doi.org/10.1007/978-981-15-5784-2_13


1 Introduction

Manufacturing and production are a crucial part of our economy. Digitalization and initiatives like Industry 4.0 intend to make these branches more intelligent. New technologies such as robots as co-workers (cobots) emerge to support humans, and intelligent production control systems optimize production processes. All of this looks very promising. However, a look into the production of local SMEs and a survey conducted by our regional business development institute [5] show that the reality is quite different: production and manufacturing are still organized in a traditional way, humans drive and control the processes, the level of automation is mediocre at best, efforts to introduce a higher degree of automation (e.g., by putting more industrial robots into service) turn out to be quite tough, and the lack of personnel is considered a highly critical problem.

Starting from this background, the question is how to provide a higher level of automation in these industries and how to introduce additional support for the deployment of intelligent systems. The focus is not on introducing AI directly, but on providing the means for introducing intelligent systems in a further step. We focus on production techniques that are prototypical for the current structure of production, are used in challenging but typical high-mix, low-volume production, and are open to new concepts including the application of AI techniques.

Our prototypical production scenario is 3D printing on an industrial level. 3D printers are establishing themselves in their niche; their high flexibility compensates for their disadvantage of low production capacity and speed. Nevertheless, 3D printers share control concepts with typical machine tools. For example, 3D printers accept G-code, which is the characteristic input of numerical control (NC) machines. Examples of the successful use of 3D printing systems include the manufacturing of highly individual medical products, individual add-ons for devices (such as personalized handles adapted to the person using the device), spare parts, and many more. It may be expected that this new kind of production will establish itself alongside the traditional methods.

Our starting point is a universal control architecture as the base for human-centered intelligent support: simulation of the device behavior on the real control system, visualization of simulated and real monitored devices, and workflow-based control program modeling including rule checking and, in the future, a recommendation support system.

1.1 Current State

In traditional architectures, each device usually has its own individual control: a robot arm has its robot control, a machine tool has its numerical control, and material flow or transfer systems use PLCs (programmable logic controllers) as their control. Since in most cases different devices collaborate on production or manufacturing
tasks, these controls may be connected either by direct signaling (e.g., a digital signal on a single cable connection) or by a bus (e.g., fieldbus or LAN). For programming, each control system usually has its own concepts. Since different programming languages or models are available for many control systems, only a very limited number of systems can be programmed in the same language. From the hardware point of view, such systems are very modular in their possible combinations. For example, different types of robot arms can supply different machine tools and are supported by different transfer systems. If a device needs to be replaced, the effort is marginal. The use of standardized interfaces, e.g., digital I/O devices connected via standardized fieldbus interfaces, increases the compatibility even further. The programming, and in particular the synchronization of the devices, is less simple, however. With different possibilities for realizing the synchronization hardware (signal cable, bus systems, etc.) and numerous programming concepts supporting synchronization, the effort grows even further. Programming each device in its own specific programming paradigm makes matters worse: an automation engineer needs to be experienced in all of these programming languages and methods.

1.2 Production Scenario with 3D Printer

The production scenario we have chosen is typical of industrial 3D printing. Since the printing speed is limited, printing should ideally take place automatically overnight. The results of the printing process should be cooled and automatically removed from the printer chamber. The removal is done by a robotic arm. This is not an expensive industrial robot, however, but an inexpensive arm that is itself produced by 3D printing. This arm does not have the payload of typical small industrial robot arms, but due to the low weight of the printed workpieces this is not a problem. It is important that the device for removing the printed parts is significantly cheaper than the 3D printer itself (cf. Fig. 3).

2 Lean Universal Control Architecture as Base

In contrast to the existing control systems, the universal control is open for any kind of support and improvement by intelligent production control systems [4]. For example, new scanning systems may be integrated as the base of an intelligent control program improvement, e.g., in the case of milling. Moreover, it may provide data for any kind of operation. Such data may be the base of further improvement by AI. Because different control programs can be executed in parallel on the same control hardware, aggregated data with correct time stamps and real temporal correlations may be extracted directly. The control processes, or rather the behavior of the
controlled devices may be simulated in a realistic way. They may be improved in an intelligent way on the actual control platform while the same control platform also controls real devices.

As depicted on the left side of Fig. 1, the abstract architecture splits into different layers. In general, the components on each layer may be exchanged freely, and new components may be added. The layers of the universal control architecture are the following:

• Control application layer: These are the control programs implemented by the user. A control program may control more than one device and may, therefore, invoke different components of the control type layer representing the control of these devices.
• Control type layer and device specific layer: Both layers implement the control functionality for the devices, split into two separate layers:
  – The control type layer consists of components implementing the abstract control algorithms, e.g., the open-loop control of a robot arm, machine tool, or 3D printer. These functions may be the trajectory planning or the control of machine parameters like the heating of the printer head. Another example is the set of control functions of the software PLC. The control type layer components make use of the components of the device specific layer.
  – The components in the device specific layer address the devices. Hence, these components need to be parameterized in order to control their devices correctly. For example, a concrete drive control component needs to know the number of increments of a full revolution, the maximal and minimal speed, as well as the limits of acceleration and deceleration of the real drive. An I/O component needs to know the number of ports of the I/O device and their real addresses.
  The arrangement in these two layers offers a high degree of changeability: if new drives are mounted on a robot arm, only the drive control components need to be replaced or repaired. If drives are reused in a new device, the drive control components can also be reused.
• Coordination layer: This layer provides the means for internal communication and coordination of all processes. This communication and the scheduling of the processes are directed by the control loop. The control loop coordinates the control application processes and the processes in the device layer.
• Device layer: This layer consists of components that are needed to connect the control hardware to the devices, e.g., a direct connection to the frequency converters of the drives or to the I/O devices, or a connection via a bus system like a fieldbus or Ethernet. Furthermore, device simulation components simulate the behavior of the controlled devices. For example, a simulation component for a drive delivers the simulated current position of the drive, which changes while the simulated drive is running.
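The layering listed above can be illustrated with a minimal Python sketch. It is not the authors' implementation: all class names, parameter names, and values (`DriveParams`, `SimulatedDrive`, `DriveControl`, `RobotArmControl`, 4096 increments, a 10 ms cycle) are assumptions chosen for illustration, and acceleration limits and the coordination layer are left out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DriveParams:
    """Hypothetical parameter set of one real drive (device specific layer)."""
    increments_per_rev: int   # encoder increments of one full revolution
    max_speed: float          # revolutions per second


class SimulatedDrive:
    """Device layer: stands in for a real drive and tracks a simulated position."""

    def __init__(self) -> None:
        self.position = 0     # current position in increments

    def step(self, increments: int) -> None:
        self.position += increments


class DriveControl:
    """Device specific layer: moves one drive within its parameterized speed limit."""

    def __init__(self, params: DriveParams, device: SimulatedDrive) -> None:
        self.params = params
        self.device = device

    def move_to(self, revolutions: float, cycle_time: float = 0.01) -> int:
        """Drive to an absolute target; returns the number of control cycles used."""
        target = round(revolutions * self.params.increments_per_rev)
        # Largest step per cycle that respects the speed limit (at least 1).
        max_step = max(1, int(self.params.max_speed
                              * self.params.increments_per_rev * cycle_time))
        cycles = 0
        while self.device.position != target:
            delta = target - self.device.position
            self.device.step(max(-max_step, min(max_step, delta)))
            cycles += 1
        return cycles


class RobotArmControl:
    """Control type layer: abstract point-to-point motion over several drives."""

    def __init__(self, drives: list[DriveControl]) -> None:
        self.drives = drives

    def ptp_move(self, targets: list[float]) -> None:
        if len(targets) != len(self.drives):
            raise ValueError("one target per drive expected")
        for drive, target in zip(self.drives, targets):
            drive.move_to(target)


# Control application layer: the user program only talks to the control type layer.
params = DriveParams(increments_per_rev=4096, max_speed=2.0)
arm = RobotArmControl([DriveControl(params, SimulatedDrive()) for _ in range(5)])
arm.ptp_move([0.5, -0.25, 1.0, 0.0, 0.125])
positions = [d.device.position for d in arm.drives]
```

In this sketch, exchanging a drive only means constructing `DriveControl` with different `DriveParams`, and replacing `SimulatedDrive` by a component that talks to real hardware would leave `RobotArmControl` and the application code unchanged, which mirrors the changeability argument made for the two-layer arrangement.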

Fig. 1 Conceptual architecture and realization in two control processes

(a) Layered architecture model for universal control systems (b) Processes of a real universal control system


The right side of Fig. 1 shows a real example of a universal control system (cf. Sect. 1.2). Two control application processes are executed: the robot arm control process and the 3D printer control process. Both processes consist of control application programs and the characteristic control components. The robot arm control process contains the user-developed control application program robot arm control program, which uses functions of the component 5 axis robot arm specific open-loop control. This component uses the specialized servo drive component. The 3D printer control process contains the user-developed control application program 3D printer control program, which uses functions of the component 3D printer, 3 linear axis G-Code execution. This component uses a different specialized servo drive component, which is adapted to the drives of the 3D printer. For their coordination, both control application programs use the component SoftPLC coordination, which itself uses the component digital I/O. In this case, the digital I/O is used only for internal synchronization, i.e., the signals are used solely to synchronize the two control application processes. Both control application processes communicate with the control loop process via the internal communication components. The control loop process controls the real devices with the help of the device connection process. The simulated behavior of the devices is provided by the simulation process.
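The synchronization of the two control application processes through internal digital I/O can be sketched as follows. This is a simplified illustration under assumptions: Python threads stand in for the control processes, the signal names `part_ready` and `part_removed` are hypothetical, and the class `InternalDigitalIO` is not part of the system described in the paper.

```python
import threading


class InternalDigitalIO:
    """Hypothetical internal digital I/O: named bits used only for synchronization."""

    def __init__(self, names):
        self._bits = {name: threading.Event() for name in names}

    def set(self, name):
        self._bits[name].set()

    def wait(self, name, timeout=5.0):
        # Returns True once the bit is set, False if the timeout expires first.
        return self._bits[name].wait(timeout)


def printer_program(io, log):
    log.append("printer: execute G-code")
    log.append("printer: cool down workpiece")
    io.set("part_ready")          # signal the robot arm process
    io.wait("part_removed")       # block until the chamber is empty
    log.append("printer: ready for next job")


def robot_arm_program(io, log):
    io.wait("part_ready")         # block until the print has finished
    log.append("robot: remove part from chamber")
    io.set("part_removed")


io = InternalDigitalIO(["part_ready", "part_removed"])
log = []
workers = [threading.Thread(target=program, args=(io, log))
           for program in (printer_program, robot_arm_program)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
```

Because each program blocks on the bit set by the other, the order of the logged actions is fixed regardless of which thread is scheduled first, which is the purpose of the internal synchronization signals.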

3 Visualization for Simulation and Monitoring

The visualization is a core element of the human-centered intelligent support: it serves for monitoring real devices, for representing simulated devices, and for blending reality and simulation (provided by the lean universal control, cf. Fig. 1). The modular architecture of the universal control system is the base of a modular visualization framework, which makes the generation of the graphical presentation lean. In our approach, the CAD data created during the device design are reused for the visualization (cf. Fig. 2). The CAD data of the different device elements have to be transformed into an intermediate format (in our case STL, which is also used for 3D printing). These STL documents of the devices, plus the geometry data of the arrangement of joints/links and their potential elongations, are the base of the visualization. The STL documents are then presented as 3D meshes by the JavaFX 3D engine. No further effort for redrawing specific parts for the visualization is required.

Fig. 2 Use of CAD Model as mesh in our visualization

This concept of a low-cost visualization may compete in presentation quality with commercial systems. Furthermore, it supports the visualization of specific parts in a smart and cost-efficient way. Since it supports monitoring as well as simulation of almost any kind of device, it also supports intelligent improvements of production processes for almost arbitrary devices.

The screenshot in Fig. 3 shows the robot arm and parts of the 3D printer in our production scenario as described in Sect. 1.2. Both devices are not complete but show a certain status in the hardware development process. Many parts of the 3D printer are realized by printing. The red parts are from a standard design. The yellow parts are improvements driven by the experience gained in extensive trials of the printing system. With the visualization, newly designed parts can be tested in the simulation, and after the parts are printed, the simulation can be compared with the real functionality of the parts.

Fig. 3 Visualization of real universal control system controlling a robot arm and a 3D printer (the visualization is not completed yet)
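The step from an STL document to a 3D mesh, as described in this section, can be sketched in a few lines. The paper uses the JavaFX 3D engine for rendering; the Python sketch below only illustrates, under assumptions, how an ASCII STL document could be turned into the indexed vertex and triangle lists that mesh-based 3D engines typically consume. The function name `parse_ascii_stl` and the sample geometry are hypothetical, and binary STL files as well as facet normals are ignored.

```python
def parse_ascii_stl(text):
    """Return (points, faces): unique vertices and triangles as index triples."""
    points, index_of, faces, current = [], {}, [], []
    for line in text.splitlines():
        tokens = line.split()
        if tokens[:1] == ["vertex"]:
            vertex = tuple(float(t) for t in tokens[1:4])
            if vertex not in index_of:        # share vertices between facets
                index_of[vertex] = len(points)
                points.append(vertex)
            current.append(index_of[vertex])
        elif tokens[:1] == ["endfacet"]:
            if len(current) != 3:
                raise ValueError("STL facets must have exactly three vertices")
            faces.append(tuple(current))
            current = []
    return points, faces


# Hypothetical sample: a unit square plate made of two triangles.
SAMPLE_STL = """\
solid plate
 facet normal 0 0 1
  outer loop
   vertex 0 0 0
   vertex 1 0 0
   vertex 1 1 0
  endloop
 endfacet
 facet normal 0 0 1
  outer loop
   vertex 0 0 0
   vertex 1 1 0
   vertex 0 1 0
  endloop
 endfacet
endsolid plate
"""

points, faces = parse_ascii_stl(SAMPLE_STL)
```

For the sample, the two facets share two corners, so the result contains four points and two index triples. Animating a device would additionally require applying the joint and link transformations mentioned above to each part's points, which this sketch does not attempt.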


4 Programming and Validation of the Control System

A further element of human interaction with the control system is the control application program. The basis of our approach is the formal representation of the program as a workflow model. Early approaches applying formal workflow models to describe processes in industrial automation date back to the beginning of this engineering discipline. For example, Herrscher [3] demonstrated in his thesis how to model automation systems with statecharts.

4.1 Workflow-Based Control Programming

The control program model we use in this paper (an example is depicted on the left side of Fig. 4) is similar to the SFC (sequential function chart) model used in PLC programming. For example, almost all large PLC vendors and product lines, such as Allen-Bradley PLCs, the Siemens S7 PLC line, or Mitsubishi MELSEC, support SFC programming. Besides proprietary, vendor-specific robot programming languages, there are further approaches to state-based robot programming languages. Jackman [4] presents SFC-based robot programming. Furthermore, there are approaches that use program models for the integrated programming of different devices such as a PLC and a robot arm, as described in [6]. Our state-based programming builds on this background: the language has semantics similar to SFC (besides some minor differences in graphical representation) and is enriched by positioning and trajectory elements as required for robot or NC programming. The core elements of our programming model are the following:

• Step (colored green): describes the functionality to be executed,
• Transition (colored magenta): the condition for transiting to the next step,
• Position (colored yellow): contains the position data of a drive or a set of drives of, e.g., a robot arm (target position),
• I/O data (colored gray): input and output data of an I/O device; used for PLC functionality,
• Besides the control flow there are
  – parallel operations executed as concurrent activities, represented by the logical AND,
  – choices between two or more alternatives, represented by the logical XOR.

In general, the proposed notation is quite similar to sequential function charts (SFC). However, the symbols used are slightly different and colored. Furthermore, we added positions in order to support the modeling of movements of devices like robot arms.


Fig. 4 Validation of the graphical control program

An example of an activity in a step is the motion of a robot arm, e.g., PTPmove(), which performs a point-to-point motion of the robot arm to a specific target position. PLC logic operations need input and output data to be processed by Boolean operations. When executed, the control application program of Fig. 4 may run in a single process, or the parallel paths may be split into different parallel processes. The right side of Fig. 1 shows a split into two parallel processes.
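The core elements of the programming model (steps, transitions, positions, and I/O data) can be sketched as a small data structure. This is an illustration under assumptions, not the authors' notation or editor: the class names, the step and signal names (`approach`, `part_gripped`, `grip_failed`), and the action `closeGripper` are hypothetical; only `PTPmove` is taken from the text. Parallel (AND) branches are omitted, and an XOR choice is resolved by taking the first enabled transition.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Step:
    name: str
    action: str                        # e.g. "PTPmove" or "closeGripper"
    position: Optional[tuple] = None   # target position for motion steps


@dataclass
class Transition:
    source: str
    target: str
    condition: str = "true"            # name of a Boolean I/O signal, or "true"


@dataclass
class Workflow:
    steps: dict = field(default_factory=dict)
    transitions: list = field(default_factory=list)

    def add_step(self, step):
        self.steps[step.name] = step

    def connect(self, source, target, condition="true"):
        self.transitions.append(Transition(source, target, condition))

    def run(self, start, io, max_steps=20):
        """Return the visited step names, following the first enabled transition."""
        trace, current = [], start
        for _ in range(max_steps):
            trace.append(current)
            enabled = [t for t in self.transitions
                       if t.source == current
                       and (t.condition == "true" or io.get(t.condition, False))]
            if not enabled:
                break
            current = enabled[0].target
        return trace


wf = Workflow()
wf.add_step(Step("approach", "PTPmove", position=(120.0, 40.0, 15.0)))
wf.add_step(Step("grip", "closeGripper"))
wf.add_step(Step("retreat", "PTPmove", position=(0.0, 0.0, 200.0)))
wf.add_step(Step("release", "openGripper"))
wf.connect("approach", "grip")
wf.connect("grip", "retreat", condition="part_gripped")   # XOR branch 1
wf.connect("grip", "approach", condition="grip_failed")   # XOR branch 2: retry
wf.connect("retreat", "release")

trace = wf.run("approach", io={"part_gripped": True})
```

With the input `part_gripped` set, the run visits approach, grip, retreat, and release in that order. Keeping the model as explicit data rather than code is what makes the automated checking discussed in the next section possible.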

4.2 Checking of Control Application Programs

Automated (model) checking validates the control applications. The validation system is embedded in the editor and is able to check Boolean logic as well as temporal logic rules. The temporal logic is expressed in Graphical Computation Tree Logic (G-CTL) [1] (based on Computation Tree Logic, CTL). In general, the temporal logic allows formulating rules or specifications that state the sequential order in which elements of a workflow model should occur. A rule may, for example, require that something appears somewhere on the subsequent path or directly after a specific event. Another example is that a certain property holds exactly until another defined property becomes true. Furthermore, such rules may be required to hold on at least one path or on all paths.


In the right window of the screenshot in Fig. 4, an example of a typical rule (gripper rule) is depicted. The rule is modeled in the graphical G-CTL style, using the elements of the control application program models and combining them with the temporal operators of G-CTL. In this case, the rule expresses an error that should not occur in a control application program: the gripper of a robot fails to grab a piece, and the robot arm simply goes on without retrying to grip the piece. Two check results are possible. If the error path is found (which is the actual result here, shown by the red highlighting of the elements on the error path in the left window of Fig. 4), the program violates the rule: the desired behavior of retrying the grip is not part of the control application program and needs to be added. Otherwise, the program is correct with respect to this rule.
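The idea behind the gripper rule can be illustrated with a small explicit-state check in plain CTL. The sketch below is not the G-CTL tooling of [1]: the state graph is written by hand rather than derived from a graphical program, the state names are hypothetical, and only the existential operators EX, EU, and EF are implemented using the standard fixed-point computation. The error pattern is encoded as EF(grip_failed ∧ EX move_to_tray), i.e., some path reaches a failed grip that is directly followed by moving on.

```python
def ex(succ, target):
    """States with at least one successor in `target` (CTL operator EX)."""
    return {state for state, nxt in succ.items() if nxt & target}


def eu(succ, hold, goal):
    """States satisfying E[hold U goal], computed as a least fixed point."""
    result = set(goal)
    while True:
        grown = result | (ex(succ, result) & hold)
        if grown == result:
            return result
        result = grown


def ef(succ, goal):
    """EF goal is equivalent to E[true U goal]."""
    return eu(succ, set(succ), goal)


def gripper_error_states(succ):
    """States from which a failed grip followed by moving on is reachable."""
    bad = {"grip_failed"} & ex(succ, {"move_to_tray"})
    return ef(succ, bad)


# Hypothetical faulty program: after a failed grip the arm simply moves on.
faulty = {
    "approach": {"grip"},
    "grip": {"grip_ok", "grip_failed"},
    "grip_ok": {"move_to_tray"},
    "grip_failed": {"move_to_tray"},
    "move_to_tray": {"release"},
    "release": set(),
}
# Repaired program: a failed grip leads back to another grip attempt.
repaired = dict(faulty, grip_failed={"grip"})

error_in_faulty = "approach" in gripper_error_states(faulty)
error_in_repaired = "approach" in gripper_error_states(repaired)
```

For the faulty graph the error pattern is reachable from the initial state, whereas for the repaired graph it is not. The set returned by `gripper_error_states` corresponds loosely to the highlighted error path in the editor, although a real checker would also report a concrete witness path.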

5 Related Work

Robotics in general has been the subject of AI research for a long time. Reference [8] presents a compendium of concepts. The problem with these approaches is the effort required to apply them in an industrial context. Approaches to combining different control systems in one system are promising. NC control functionality is quite similar to robot control functions. For example, Wu et al. [9] present an approach to integrating an NC kernel in a robot control. First commercial solutions are already available today. The Beckhoff industrial PC combines the control of I/Os as well as drives. The Universal Robots control is also based on an industrial PC and provides an interface for the execution of self-developed control routines.

Some approaches generate the robot arm trajectories from CAD models, as presented in [7]. Such transformations are supported by the simulation of the robot movements. These simulations are made for the same reason as the simulation in this paper. However, they use virtual robots on program editing systems rather than the real robot control. In these approaches, the temporal behavior of the control can only be estimated.

Numerous visualization systems exist, e.g., research approaches like the ROS-based simulation Gazebo [2] or visualization systems of commercial (robot) controls (e.g., Universal Robots). All share the problem of a high effort for building the model to be visualized. Commercial (control) visualizations support only a predefined set of devices. A further alternative are design systems like Siemens NX or Dassault Catia. These systems provide CAD functionality and also allow importing CAD models, which are then animated. However, these systems simulate an entire system that includes the device as well as its control. The (temporal) behavior of the control can again only be estimated. Monitoring of arbitrary real devices is hard to realize.


6 Conclusion and Future Work

Our lean universal control architecture supports almost any kind of automation device by providing robot arm control functions, PLC functions, and numerical control functions. The system is open and provides any kind of data as the base for intelligent support. A realistic play-back device simulation is part of the system and may interact with intelligent systems optimizing the (production) processes. The visualization may then serve as the interface to the human users. Furthermore, the rule checking provides intelligent support for the development of control application programs. This helps engineers and machine tool workers in developing control programs. A first step is the graphical notation for modeling the programs. The design of these control programs is supported by automated checking. The rules are expressed in Boolean or temporal logic. This kind of intelligent checking helps to identify and prevent rule violations already while designing the control application programs.

Besides the checking and the simulation/monitoring, an intelligent recommendation system may support the development of control applications [11]. Proven and successful solutions selected by intelligent systems may be recommended to the developers, making their work easier. We are currently taking further steps towards such recommendation systems by investigating the tracing of developer actions as a base [10]. Due to the low adaptation effort (the user works with the accustomed program editor as usual, the only difference being that intelligent recommendations appear when needed), the acceptance in industrial production businesses may be quite high.

References

1. Feja, S., Witt, S., Speck, A.: BAM: A requirements validation and verification framework for business process models. In: Proceedings of 11th International Conference On Quality Software (QSIC 2011). pp. 186–191. IEEE Computer (July 2011)
2. Gazebo (accessed in Jan 2020), http://gazebosim.org
3. Herrscher, A.: Flexible Fertigungssysteme, Entwurf und Realisierung prozessnaher Steuerungsfunktionen. Springer, Berlin; Heidelberg; New York, NY (1982)
4. Jackman, J.: Robotic control using sequential function charts. In: Proceedings of the SPIE, the international society of optics and photonics. SPIE, vol. 2911, pp. 120–128 (December 1996)
5. Jakobs, R.: internal report. SME skilled labor and automation level survey, KielRegion (2019)
6. Maglica, R.: On Programming and Control of Robots in Flexible Manufacturing Cells. Ph.D. thesis, Chalmers tekniska högskola, Göteborg (1996)
7. Toquica, J., Zivanovic, S., Alvares, A., Bonnard, R.: A STEP-NC compliant robotic machining platform for advanced manufacturing. The International Journal of Advanced Manufacturing Technology 95, 3839–3854 (2018)
8. Tzafestas, S., Verbruggen, H.: Artificial Intelligence in Industrial Decision Making, Control and Automation. International series on microprocessor-based and intelligent systems engineering, 14, Springer Netherlands, Dordrecht (1995)
9. Wu, K., Krewet, C., Bickendorf, J., Kuhlenkötter, B.: Dynamic performance of industrial robot with CNC controller. The International Journal of Advanced Manufacturing Technology 90, 2389–2395 (2017)


10. Ziegenhagen, D., Pulvermüller, E., Speck, A.: Capturing tracing data life cycles for supporting traceability. In: Proceedings of the 15th International Conference on Evaluation of Novel Approaches to Software Engineering, ENASE 2020, Prague, Czech Republic, 2020, to appear (2020)
11. Ziegenhagen, D., Speck, A., Pulvermüller, E.: Using developer-tool-interactions to expand tracing capabilities. In: Proceedings of the 14th International Conference on Evaluation of Novel Approaches to Software Engineering, ENASE 2019, Heraklion, Crete, Greece, May 4-5, 2019. pp. 518–525 (2019)

Spectrum Management of Power Line Communications Networks for Industrial Applications Abdellah Chehri and Alfred Zimmermann

Abstract Power line communications (PLC) reuse the existing power-grid infrastructure for the transmission of data signals. Because power line communication technology does not require a dedicated network setup, it can be used to connect a multitude of sensors and Internet of Things (IoT) devices. Those IoT devices could be deployed in homes, streets, or industrial environments for sensing and for controlling related applications. The key challenge faced by future IoT-oriented narrowband PLC networks is to provide a high quality of service (QoS). The power line channel has traditionally been considered too hostile. Combined with the scarcity of spectrum and the interference from other users, the QoS requirement calls for means to increase spectral efficiency radically and to improve link reliability. However, the research activities carried out in the last decade have shown that PLC is a suitable technology for a large number of applications. Motivated by the relevant impact of PLC on IoT, this paper proposes a cooperative spectrum allocation in IoT-oriented narrowband PLC networks using an iterative water-filling algorithm.

1 Introduction

The principle of power line communication (PLC) is based on the use of an existing electrical installation as a physical medium for communication. By superimposing a higher-frequency signal on the 50 Hz home AC (or 60 Hz, depending on the region), the information can propagate through the electrical installation to be decoded remotely. Thus, a PLC receiver anywhere on the same network can decode a frame sent from a PLC transmitter on that network [1–4].

A. Chehri (✉) Department of Applied Sciences, University of Québec, Chicoutimi, QC G7H 2B1, Canada e-mail: [email protected]
A. Zimmermann Faculty of Informatics, Reutlingen University, Alteburgstraße 150, 72762 Reutlingen, Germany e-mail: [email protected]
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 A. Zimmermann et al. (eds.), Human Centred Intelligent Systems, Smart Innovation, Systems and Technologies 189, https://doi.org/10.1007/978-981-15-5784-2_14


PLC is also known as power-line digital subscriber line (PDSL), powerline carrier, powerline telecommunications, and power-line networking (PLN). PLC can be preferable to wireless infrastructure because, in emergencies, conventional networking technologies often become congested as the collision rate spikes. Unlike wireless solutions based on ZigBee or WiFi, PLC-based solutions have a proven track record of avoiding network congestion when cooperative schemes are employed [5]. However, since the electrical installation is not designed for transporting high-frequency signals, part of the energy is radiated as radio waves because the installation lacks shielding. Different frequency ranges are used: single-carrier modulations at low frequencies and multi-carrier modulations at higher frequencies [6].

While the full potential of PLC is yet to be exploited in the market, PLC is expected to play a significant role as an enabler of efficient communications in the emerging smart grid, home automation, and “Internet of things” networks. By adapting some features of IEEE 802.15.4 and IPv6 over low-power wireless personal area network (6LoWPAN) to power lines, the authors in [7] demonstrate a low-rate PLC system over the IPv6 network [8]. The technology is referred to as 6LoPLC and can be used for home energy management systems, smart grid, and industrial applications. A model is developed in the NS-3 simulator, and the network performance is validated with several measurements. The obtained results provide useful insights for system designers and application developers. Furthermore, the 6LoPLC model is also feasible for smart grid applications and other cyber-physical systems, where high reliability and low cost have higher priority than high throughput [9].

Non-cooperative behavior often leads to inefficient outcomes. For example, in the IEEE 802.11 distributed medium access control protocol, the distributed coordination function (DCF), and its enhanced version, EDCF, competition among selfish users can lead to inefficient use of the shared channel. This topic has been largely neglected, and to the best of our knowledge, game-theoretic algorithms for power allocation in IoT-oriented narrowband power line communication have not been studied before. In this paper, we propose a cooperative spectrum allocation in IoT-oriented narrowband PLC networks using an iterative water-filling algorithm.

The paper is organized as follows: Sect. 2 provides a brief overview of narrowband PLC networks. Section 3 presents the problem formulation, together with the proposed solution based on iterative multiuser water-filling. Results and discussion are provided in Sect. 4. The conclusions are drawn in Sect. 5.


2 Narrowband PLC Networks

PLC network applications are mainly related to home automation, industrial applications, and public services, where they monitor or control electrical devices such as electric meters or lighting systems. PLC networks are classified into two categories according to the frequency range used:

• Broadband PLC (BBPLC): Operates at high frequencies (1.8–250 MHz) and high data rates (up to 100 Mbps). It is mainly used for short-range applications such as home networks.
• Narrowband PLC (NBPLC): Uses low frequencies (3–500 kHz) and low bit rates (up to hundreds of kbps) over a range of several kilometers, with low energy consumption. It is used in particular for the automation of sensors.

Current research tends to define a so-called medium-frequency band, which would lie below 12 MHz (Table 1).

In 2013, the IEEE 1901.2 standard was finalized. It is intended for power line carrier communication between devices at frequencies below 500 kHz. It is, in turn, based on several standards, including the two standards adopted by the ITU for low-speed PLC communications, G.9903 and G.9904 [10]. The ITU G.9904 standard provides a robust communication network. Figure 1 presents the reference model of communication used in the ITU G.9904 specification.

Table 1 Main standards for PLC

Broadband PLC    Narrowband PLC    Medium frequency
ITU-T G.hn       IEEE 1901.2       ITU-T G.hn
IEEE 1901        ITU-T G.9902      IEEE 1901.1
                 ITU-T G.9903
                 ITU-T G.9904

Fig. 1 Reference model of communication in the PRIME specification


The convergence layer classifies the traffic and associates each traffic flow with its own MAC connection. This layer maps any traffic to a MAC service data unit (MSDU). The MAC layer provides functionality for channel access management, frequency allocation, connection establishment and maintenance, and topology resolution. The physical layer is based on the orthogonal frequency-division multiplexing (OFDM) technique. It uses 97 subcarriers for data transmission. The signal is modulated according to one of three constellations: DBPSK, DQPSK, or D8PSK. The corresponding theoretical bit rates are 47 kbps, 94 kbps, and 141 kbps, respectively.

The ITU G.9903 standard specifies the parameters for PLC communication in the CENELEC (European Committee for Electrotechnical Standardization) and Federal Communications Commission (FCC) bands. The physical layer (PHY) is based on OFDM modulation. The use of digital modulation on each subcarrier simplifies demodulation at the receiver. The maximum number of subcarriers that can be used is fixed at 128, as a result of an inverse fast Fourier transform (IFFT) of size N = 256. The spacing between the OFDM subcarriers is 1.5625 kHz for the CENELEC bands and 4.6875 kHz for the FCC bands. The number of subcarriers is 36 for the CENELEC A band, 16 for the CENELEC B band, and 72 for the FCC band [10]. The system supports two error correction modes (the Reed–Solomon coder and the convolutional coder).

The reference model of the MAC sublayers is presented in Fig. 2. This reference model has two functional blocks:

• The MAC Common Part Sublayer (MCPS) is responsible for communication with neighboring nodes.
• The MAC Layer Management Entity (MLME) is responsible for managing the MAC sublayers. It is essentially based on the MAC PAN Information Base (MAC PIB), the main element of which is the neighbor table, which contains all the information that the MAC layer and the PHY layer need to establish two-way communication with the neighbors.

Fig. 2 The reference model of MAC sublayers
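The numerical values quoted in this section can be cross-checked with a little arithmetic. The following Python sketch is illustrative only: the sampling frequencies (400 kHz and 1.2 MHz), the count of 96 payload subcarriers, and the 488.28125 Hz spacing used for the bit-rate check are assumptions chosen because they reproduce the quoted figures; they are not stated in this paper, and the cyclic prefix and coding overhead are ignored.

```python
# Illustrative arithmetic only. Every ASSUMED_* constant is an assumption that
# reproduces the numbers quoted in the text; none of them is given in the paper.

IFFT_SIZE = 256  # IFFT size N stated in the text
ASSUMED_SAMPLING_HZ = {"CENELEC": 400_000.0, "FCC": 1_200_000.0}


def subcarrier_spacing_hz(sampling_hz, ifft_size=IFFT_SIZE):
    """OFDM subcarrier spacing: sampling frequency divided by the IFFT size."""
    return sampling_hz / ifft_size


ASSUMED_PAYLOAD_SUBCARRIERS = 96   # assumption: 97 subcarriers minus one pilot
ASSUMED_SPACING_HZ = 488.28125     # assumption: spacing of the 97-subcarrier PHY
BITS_PER_SUBCARRIER = {"DBPSK": 1, "DQPSK": 2, "D8PSK": 3}


def raw_bit_rate_kbps(constellation):
    """Uncoded rate: bits per OFDM symbol times symbols per second (no cyclic prefix)."""
    bits_per_symbol = ASSUMED_PAYLOAD_SUBCARRIERS * BITS_PER_SUBCARRIER[constellation]
    return bits_per_symbol * ASSUMED_SPACING_HZ / 1000.0


if __name__ == "__main__":
    for band, fs in ASSUMED_SAMPLING_HZ.items():
        print(band, subcarrier_spacing_hz(fs) / 1000.0, "kHz")
    for name in BITS_PER_SUBCARRIER:
        print(name, round(raw_bit_rate_kbps(name)), "kbps")
```

Under these assumptions the three constellations differ only in the number of bits carried per subcarrier, which is why the quoted rates stand in the ratio 1:2:3.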

3 Problem Formulation

We consider a set $\mathcal{L} = \{1, \ldots, I\}$ of $I$ fixed transmitter–receiver IoT devices communicating using OFDM modulation over a PLC channel. Suitable IoT-oriented narrowband PLC networks for this purpose are those defined in the ITU-T Recommendations G.9902 (known as G.hnem), G.9903, and G.9904, and in IEEE P1901.2. ITU-T G.9903 and G.9904 are based on the industry specifications G3-PLC and Powerline Intelligent Metering Evolution (PRIME), respectively [11]. Communications are assumed to take place at the same time and in the same frequency band $B$, divided into $N$ subcarriers spaced by $\Delta f$ hertz. To communicate, these smart IoT devices operate in a narrowband PLC channel. Each player (IoT device) then chooses a strategy to maximize its payoff function (or utility function) $u_i$. Each transmitting IoT node is subject to its own power constraint:

$$\sum_{n=1}^{N} p_{i,n} \le P_i^{\max} \tag{1}$$

Water-filling is a well-known algorithm for deciding the power allocation and the information distribution of a communication system, with or without coordination [12]. As one of the most successful algorithms, water-filling uses fast bit-loading techniques based on the channel signal-to-noise ratio, described as the signal-to-noise ratio (SNR) with unit signal power across the entire frequency band [13]. The water-filling technique is an optimal power allocation strategy that improves the transmission channel capacity.

Multiuser resource allocation can be performed using water-filling-based techniques. In [14], an iterative water-filling algorithm is proposed to maximize the sum capacity of a Gaussian multiple access channel. In [15], the data rate is maximized differently: each subcarrier is assigned to its best user (i.e., the user with the highest channel gain), and water-filling is then applied individually to the subcarriers assigned to each user. To optimize the resource allocation in a multiuser environment, functions for optimal bit rates are investigated. A cost function for optimal bit rates is presented by Van den Bogaert et al. [16] and reproduced in the following formula:


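$$
\begin{aligned}
J\bigl(P_1(k), P_2(k)\bigr) ={} & \sum_k \log_2\!\left(1 + \frac{h_{11}^2(k)\,P_1(k)}{\Gamma_1\bigl(h_{12}^2(k)\,P_2(k) + N_1(k)\bigr)}\right) \\
& + \sum_k \log_2\!\left(1 + \frac{h_{22}^2(k)\,P_2(k)}{\Gamma_2\bigl(h_{21}^2(k)\,P_1(k) + N_2(k)\bigr)}\right) \\
& + \lambda_1\Bigl(P_1 - \sum_k P_1(k)\Bigr) + \lambda_2\Bigl(P_2 - \sum_k P_2(k)\Bigr)
\end{aligned}
\tag{2}
$$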

The optimum solution for user $n$ is then found by taking the derivative of the cost function with respect to $P_n(k)$ and setting it equal to zero:

$$\frac{\partial J}{\partial P_n(k)} = \frac{1}{P_n(k) + \dfrac{\Gamma_n N_n(k)}{h_{nn}^2(k)}} - \lambda_n = 0 \tag{3}$$

Thus the water-filling equation is found by rewriting Eq. (3) as presented in Eq. (4):

$$P_n(k) = \frac{1}{\lambda_n} - \frac{\Gamma_n N_n(k)}{h_{nn}^2(k)} \tag{4}$$
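Equation (4) fixes the shape of the allocation but not the water level $1/\lambda_n$, which has to be chosen so that the carrier powers add up to the power budget and no carrier receives negative power. A minimal Python sketch of one common way to find that level is given below, assuming the per-carrier term $\Gamma_n N_n(k)/h_{nn}^2(k)$ has already been computed; the function name `water_fill` and the example values are hypothetical and not taken from the paper.

```python
def water_fill(noise_to_gain, total_power):
    """Single-user water-filling.

    noise_to_gain[k] is the per-carrier term margin * N(k) / h(k)**2 of Eq. (4).
    Returns (water_level, powers) such that powers[k] = max(0, level - term[k])
    and the powers sum to total_power.
    """
    if total_power <= 0 or not noise_to_gain:
        raise ValueError("need a positive power budget and at least one carrier")

    ordered = sorted(noise_to_gain)
    active = len(ordered)
    # Drop the worst carriers until the water level lies above every carrier
    # that is still active; with a positive budget this always terminates.
    while True:
        level = (total_power + sum(ordered[:active])) / active
        if level > ordered[active - 1]:
            break
        active -= 1

    powers = [max(0.0, level - term) for term in noise_to_gain]
    return level, powers


if __name__ == "__main__":
    # Hypothetical example: three carriers, the third one too noisy to be used.
    level, powers = water_fill([1.0, 2.0, 5.0], total_power=2.0)
    print(level, powers)
```

In the example the third carrier lies above the water level and is switched off, so the whole budget is shared between the two better carriers.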

By substituting the expression for the background noise power $N_n(k)$ with an expression that includes the interfering cross talk, the formula for power allocation in iterative water-filling is obtained. It is presented here for user 1 in a two-user environment:

$$P_1(k) = K_1 - \Gamma_1\,\frac{h_{12}^2(k)\,P_2(k) + N_1(k)}{h_{11}^2(k)} \tag{5}$$

$K_1$ is the upper limit of the transmit power in each carrier and can consequently be seen as the water level of the water-filling algorithm. The fraction following the margin $\Gamma_1$ is the inverse signal-to-noise ratio. It is sometimes denoted the channel noise-to-signal ratio (NSR) or simply the noise-to-channel ratio (NCR), and it concisely characterizes the channel.

The utility function (the full expression for the total transmission rate) is chosen based on the theoretical information rate, consistent with the current usage of OFDM modulation [17]. The average mutual information of the link between the transmitter and receiver devices can be expressed by the ergodic capacity [18]:

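$$R_i\bigl(p_i, p_{-i}\bigr) = \alpha \sum_{n=1}^{N} \mathbb{E}\!\left[\log\!\left(1 + \frac{\lvert H_{ii,n}\rvert^2\, p_{i,n}}{\sigma_{i,n}^2 + \sum_{j \ne i} \lvert H_{ji,n}\rvert^2\, p_{j,n}}\right)\right] \tag{6}$$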

The expression of $R_i(p_i, p_{-i})$ given by Eq. (6) neglects the inter-symbol interference (ISI) between OFDM symbols and the inter-carrier interference (ICI), i.e., the cross talk between the subcarriers. The synchronization hypothesis assumed above is therefore crucial. However, the work of [19] shows that including the ISI and ICI caused by non-synchronization does not fundamentally change the formulation of the problem, since these interferences simply add to the external noise and interference.
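To make the structure of Eq. (6) concrete, the Python sketch below evaluates the rate of one link for a single, fixed channel realization, that is, without the expectation. It is an illustration under stated assumptions rather than the simulation code of this paper: the function name `link_rate`, the nested-list layout `gains[j][i][n]` for $\lvert H_{ji,n}\rvert^2$, and the use of a base-2 logarithm are all choices made here.

```python
import math


def link_rate(i, powers, gains, noise, alpha=1.0):
    """Rate of link i for one channel realization, following the form of Eq. (6).

    powers[j][n] : transmit power of device j on subcarrier n
    gains[j][i][n]: power gain |H_ji,n|^2 from transmitter j to receiver i
    noise[i][n]  : noise variance sigma_i,n^2 at receiver i
    alpha        : constant scaling factor of Eq. (6) (assumed 1.0 here)
    """
    users = len(powers)
    total = 0.0
    for n in range(len(powers[i])):
        # Interference is the power received from every other transmitter.
        interference = sum(
            gains[j][i][n] * powers[j][n] for j in range(users) if j != i
        )
        sinr = gains[i][i][n] * powers[i][n] / (noise[i][n] + interference)
        total += math.log2(1.0 + sinr)  # base 2 is an assumption (bits)
    return alpha * total


if __name__ == "__main__":
    # Hypothetical two-device, one-subcarrier example.
    gains = [[[1.0], [0.5]], [[0.5], [1.0]]]
    noise = [[1.0], [1.0]]
    print(link_rate(0, [[3.0], [0.0]], gains, noise))  # no interference
    print(link_rate(0, [[3.0], [2.0]], gains, noise))  # device 1 interferes
```

The second call shows the coupling that makes the problem a game: the rate of device 0 drops as soon as device 1 transmits on the same subcarrier.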


4 Simulation Results

First, we assume a PLC architecture typical of IoT-oriented narrowband PLC networks (Fig. 3). We then consider three typical cables that differ in length and quality and are used to run the simulations. References [20–22] provide the number of paths for modeling each category of cable. The propagation speed of the transmitted signal is taken to be $v_p = 1.53 \times 10^8$ m/s for all channels. A single transmit spectrum is allocated for all calculations under static spectrum management; it is referred to as the spectrum at nominal value, or simply the fixed spectrum. All frequencies used throughout this paper are chosen within the typical narrowband PLC range (see Fig. 4), which extends from 3 kHz to 148.5 kHz in Europe and up to 500 kHz in the USA [11].

Fig. 3 Example of typical narrowband PLC deployment for industrial applications

Fig. 4 G.hnem band plans for CENELEC and FCC bands with the number of carriers
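The path-based cable description mentioned above can be illustrated with the multipath form of [21], in which the channel response is a sum of attenuated and delayed path contributions. The Python sketch below assumes that form; the path weights, path lengths, and attenuation parameters `a0`, `a1`, and `k` are hypothetical placeholders, not the values used in the simulations of this paper. Only the propagation speed is taken from the text.

```python
import cmath
import math

V_P = 1.53e8  # propagation speed in m/s, as stated in the text


def channel_response(freq_hz, paths, a0=0.0, a1=1e-9, k=1.0):
    """Multipath frequency response H(f), in the form assumed from [21].

    paths is a list of (weight, length_m) pairs. Each path contributes
    weight * exp(-(a0 + a1 * f**k) * length) * exp(-j * 2*pi * f * length / V_P).
    The defaults for a0, a1 and k are hypothetical placeholders.
    """
    response = 0j
    for weight, length_m in paths:
        attenuation = math.exp(-(a0 + a1 * freq_hz ** k) * length_m)
        delay_s = length_m / V_P
        response += weight * attenuation * cmath.exp(-2j * math.pi * freq_hz * delay_s)
    return response


if __name__ == "__main__":
    # Hypothetical two-path cable evaluated across part of the narrowband range.
    example_paths = [(0.6, 150.0), (0.4, 260.0)]
    for f in (50e3, 100e3, 150e3):
        print(f, abs(channel_response(f, example_paths)))
```

The squared magnitude of such a response is the kind of per-subcarrier channel gain that enters Eqs. (2) to (6).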


The water-filling algorithm presented in the previous section can be executed autonomously on each IoT device in an iterative manner, which is the iterative water-filling described there. The objective of the algorithm is to maximize the bit rate while maintaining an acceptable bit rate on any interfering PLC line. Figure 5 compares the transmit power spectral density (PSD) obtained with water-filling with the PSD of the fixed spectrum, where no rate constraint is applied. The water-filling algorithm is then performed on the longest PLC line, as illustrated by the transmit spectrum in Fig. 5. Figure 6 shows that the transmit PSD is, on average, significantly lower than the fixed spectrum at nominal value, indicated by the dashed line in the figure. Still, the system maintains the same bit rate as found in the fixed-spectrum analysis. With a dynamic spectrum, this superfluous power can instead be shifted to carriers where an increase in bit loading is possible.

Fig. 5 Transmitted power spectral density


Fig. 6 Power spectral density in the presence of simultaneous transmitters
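The iterative procedure described in this section, in which every device repeatedly water-fills against the noise plus the interference currently produced by the others, can be sketched as follows. This is a simplified, fixed-power variant written for illustration: it assumes each device spends its full power budget and omits the rate constraint on interfering lines mentioned in the text. The function names, the `gains[j][i][n]` layout (transmitter j to receiver i), and the example numbers are hypothetical.

```python
def water_fill_powers(levels, budget):
    """Powers max(0, mu - levels[n]) that sum to budget (budget must be > 0)."""
    ordered = sorted(levels)
    active = len(ordered)
    while True:
        mu = (budget + sum(ordered[:active])) / active
        if mu > ordered[active - 1]:
            break
        active -= 1
    return [max(0.0, mu - level) for level in levels]


def iterative_water_filling(gains, noise, budgets, margin=1.0, iterations=50):
    """Each device in turn water-fills, treating the others' signals as noise.

    gains[j][i][n]: power gain from transmitter j to receiver i on carrier n
    noise[i][n]   : background noise power at receiver i on carrier n
    budgets[i]    : total transmit power of device i
    margin        : SNR margin (Gamma in Eq. (5)); 1.0 is an assumption
    """
    users, carriers = len(budgets), len(noise[0])
    powers = [[0.0] * carriers for _ in range(users)]
    for _ in range(iterations):
        for i in range(users):
            levels = []
            for n in range(carriers):
                crosstalk = sum(
                    gains[j][i][n] * powers[j][n] for j in range(users) if j != i
                )
                # Per-carrier term of Eq. (5): margin * (crosstalk + noise) / direct gain.
                levels.append(margin * (crosstalk + noise[i][n]) / gains[i][i][n])
            powers[i] = water_fill_powers(levels, budgets[i])
    return powers


if __name__ == "__main__":
    # Hypothetical two-device, two-carrier scenario with weak cross talk.
    gains = [[[1.0, 1.0], [0.1, 0.1]], [[0.1, 0.1], [1.0, 1.0]]]
    noise = [[0.1, 0.2], [0.2, 0.1]]
    print(iterative_water_filling(gains, noise, budgets=[1.0, 1.0]))
```

Because each update needs only the locally observed noise-plus-interference level, the loop body for device `i` could run on that device alone, which is what makes the scheme decentralized.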

5 Conclusion

Power line communication is an emerging technology in the field of communications that aims to use the power line as a medium for bidirectional data transfer. However, communication still suffers from various distortions, which cause inefficient data transmission. A model of multiuser, non-cooperative transmissions, including many realistic constraints for IoT-oriented narrowband PLC networks, has been studied. From this model, a spectrum sharing method between OFDM communication links has been developed. The game theory approach made it possible to propose a non-cooperative, decentralized solution that requires only limited statistical knowledge to be shared between IoT devices. Simulation examples illustrate the performance of the proposed schemes.

References

1. Pau, G., Collotta, M., Ruano, A., Qin, J.: Smart home energy management. Energies 10, 382 (2017)
2. Ikpehai, A., Adebisi, B., Rabie, K.M.: Broadband PLC for clustered advanced metering infrastructure (AMI) architecture. Energies 9, 569 (2016)


3. IEEE Draft Standard for Low Frequency (less than 500 kHz) Narrow Band Power Line Communications for Smart Grid Applications. In: IEEE P1901.2/D0.08.00, pp. 1–336 (2013)
4. Chiti, F., Fantacci, R., Marabissi, D., Tani, A.: Performance evaluation of an efficient and reliable multicast power line communication system. IEEE J. Sel. Areas Commun. 34(7), 1953–1964 (2016)
5. Qian, Y., et al.: Design of hybrid wireless and power line sensor networks with dual-interface relay in IoT. IEEE Internet of Things J. 6(1), 239–249 (2019)
6. Ahiadormey, R.K., Anokye, P., Jo, H., Lee, K.: Performance analysis of two-way relaying in cooperative power line communications. IEEE Access 7, 97264–97280 (2019)
7. Ikpehai, A., Adebisi, B., Rabie, K.M., Haggar, R., Baker, M.: Experimental study of 6LoPLC for home energy management systems. Energies (2016)
8. Ikpehai, A., Adebisi, B., Anoh, K.: Can 6LoPLC enable indoor IoT? In: IEEE International Symposium on Power Line Communications and its Applications (ISPLC), Praha, Czech Republic, pp. 1–6 (2019)
9. Ikpehai, A., Adebisi, B.: 6LoPLC for smart grid applications. In: IEEE International Symposium on Power Line Communications and Its Applications (ISPLC), Austin, TX, pp. 211–215 (2015)
10. Oksman, V., Zhang, J.: G.HNEM: the new ITU-T standard on narrowband PLC technology. IEEE Commun. Mag. 49 (2011)
11. Cortés, J.A., Sanz, A., Estopiñán, P., et al.: Analysis of narrowband power line communication channels for advanced metering infrastructure. EURASIP J. Adv. Signal Process. 27 (2015)
12. Starr, T., et al.: DSL Advances. Prentice Hall, Upper Saddle River, NJ (2002)
13. Song, K.B., et al.: Dynamic spectrum management for next-generation DSL systems. IEEE Commun. Mag. 40(11), 101–109 (2002)
14. Yu, W., Rhee, W., Boyd, S., Cioffi, J.M.: Iterative water-filling for Gaussian vector multiple-access channels. IEEE Trans. Inf. Theory 50(1), 145–152 (2004)
15. Jang, J., Lee, K.B.: Transmit power adaptation for multiuser OFDM systems. IEEE J. Sel. Areas Commun. 21(2), 171–178 (2003)
16. Van den Bogaert, E., et al.: DSM in practice: performance results of iterative water-filling implemented on ADSL modems. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 337–340 (2004)
17. Zhao, Y., Pottie, G.J.: Optimal spectrum management in multiuser interference channels. IEEE Trans. Inf. Theory 59(8), 4961–4976 (2013)
18. Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley (2006)
19. Scutari, G., Palomar, D.P., Barbarossa, S.: Distributed totally asynchronous iterative waterfilling for wideband interference channel with time/frequency offset. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 4, pp. 1325–1328 (2007)
20. Chehri, A.: A Low Complexity Turbo Equalizer for Power-Line Communication with Applications to Smart Grid Networks. IEEE ISPLC, Prague (2019)
21. Zimmermann, M., Dostert, K.: A multipath model for the powerline channel. IEEE Trans. Comm. 50, 553–559 (2002)
22. Channel Model Sub-committee of P1901. Electrical Network and Topology Channel and Noise Model, P1901-10-0356-00 (2010)

Innovations in Medical Apps and the Integration of Their Data into the Big Data Repositories of Hospital Information Systems for Improved Diagnosis and Treatment in Healthcare

Mustafa Asim Kazancigil

Abstract The common use of tablets, smartphones, and smartwatches, which today are equipped with HD digital cameras, touchscreen displays, and sensors, has enabled software developers to use new algorithms and methods for creating medical apps. These apps can perform tests for diagnosing a large variety of diseases, including skin cancer, cardiovascular disorders, and diabetes. This paper focuses mainly on the use of smartphone digital cameras for testing and diagnosing dermatological diseases, with comparisons to previous research on apps for measuring blood pressure, diabetes, and ocular anomalies. The research aims to identify areas in which the capabilities of mobile apps can converge by integrating their data into the Cloud-based data warehouses or Big Data repositories of online hospital information systems. This will make it possible to improve the performance of the diverse medical apps used in the testing, diagnosis, and treatment of a multitude of diseases. Thanks to the similarities in the tools, methods, and parameters used for measuring and diagnosing various types of medical disorders, creating a single multipurpose medical app is becoming increasingly feasible. Another area of focus is the architecture of Cloud-based data warehouses or Big Data repositories, through which these apps can exchange data with online hospital information systems and thereby help physicians make more accurate decisions.

1 Introduction

The standard use of HD digital cameras, touchscreen displays, and sensors in contemporary mobile computing devices such as tablets, smartphones, and smartwatches has enabled software developers to apply new methodologies and algorithms in the design of mobile medical apps. These apps can be used for testing and diagnosing a large array of diseases, including dermatological disorders, cardiovascular disorders, and diabetes. This research mainly focuses on the use of smartphone digital cameras for testing and diagnosing dermatological diseases, with comparisons to previous research on apps for measuring blood pressure, diabetes, and ocular anomalies. The research aims to identify areas in which the capabilities of mobile apps can converge by integrating their data into the Cloud-based data warehouses or Big Data repositories of online hospital information systems.

M. A. Kazancigil (✉) Yeditepe University, 34755 Istanbul, Turkey e-mail: [email protected]
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 A. Zimmermann et al. (eds.), Human Centred Intelligent Systems, Smart Innovation, Systems and Technologies 189, https://doi.org/10.1007/978-981-15-5784-2_15

1.1 Innovations in Medical Apps for Diagnosing Skin Diseases

Numerous diseases and health disorders can be tested, diagnosed, and monitored with smartphones and other mobile computing devices. Examples include skin cancer, cutaneous health disorders, cancerous tissues, blood pressure, cardiovascular disorders, blood glucose levels, diabetes, ocular health disorders, neurological health disorders, and kidney and urinary tract diseases (through urinalysis). In 2019, a team of Yeditepe University students developed a medical app as part of their graduation project for testing and diagnosing skin cancer and other types of cutaneous health disorders. The app compares images obtained via HD smartphone cameras with the images in its dataset, which are tagged with the disease names and were used for training the computer vision algorithms [1] (Fig. 1).

Fig. 1 The methodology used in the medical app developed by Aydin and Ermertcan (Yeditepe University, 2019) for testing and diagnosing cutaneous disorders [1]

In digital image processing, a convolution is a weighted sum that is computed for each pixel of an image. Each element of the image is added to its local neighbors, weighted by the kernel:

result = sum(k[n, p] × I[x, y])

Max pooling is a sample-based discretization process in digital image processing. Its objective is to down-sample an image by reducing its dimensionality, which allows assumptions to be made about the features contained in the binned sub-regions (Fig. 2).

Fig. 2 Max pooling is a sample-based discretization process in digital image processing

In medical apps for testing and diagnosing cancerous tissues, skin cancer, and other cutaneous health disorders, the accuracy of the results is influenced by:

• The image resolution of the HD camera on the smartphone or other mobile computing device (photos taken with higher-resolution cameras yield more accurate results);
• The correct tagging of the images in the dataset used for training the computer vision algorithms;
• The correct application of the convolution and max pooling methods;
• The correct application of the thresholding method for image segmentation in order to optimize the test results and diagnosis (Figs. 3 and 4).

Fig. 3 Applying the convolution and max pooling methods on the images for testing and diagnosing cutaneous disorders (Aydin and Ermertcan, Yeditepe University, 2019) [1]

Fig. 4 The image is compared with the correctly tagged images in the dataset to obtain a list of the most probable skin disorders (Aydin and Ermertcan, Yeditepe University, 2019) [1]

In order to optimize the diagnosis, the thresholding method of image segmentation is applied to create binary (black and white) images from grayscale images [1] (Figs. 5 and 6).

Similar smartphone and smartwatch apps for diagnosing skin cancers and other cutaneous disorders have become available in recent years. UMSkinCheck, developed by the University of Michigan, is compatible with mobile devices using the iOS/iPadOS or Android operating systems [2, 3]. It is a free mobile application intended for skin cancer self-examination, used to identify suspicious moles or pre-cancers, i.e., lesions that may be cancer or growths that may develop into skin cancer [2, 3]. The app allows users to complete and store a full-body photographic library, track changes in detected moles and lesions, download informational literature and videos, and locate the nearest skin cancer specialists [2, 3]. Other popular mobile apps for the detection of skin cancers and cutaneous disorders include SkinVision, developed in the Netherlands [4] (Fig. 7), Miiskin, developed in Denmark [5], and MoleScope, developed in Canada [6]. The SkinVision mobile app uses a machine-learning algorithm to analyze spots on the skin [4]. Miiskin enables users to take high-resolution photos of their skin and store the image files in an online database [5]. The app uses artificial intelligence techniques to compare these with follow-up photos and identify any changes [5] (Fig. 8).
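The three image operations named earlier in this section (convolution, max pooling, and thresholding) can be illustrated with a small pure-Python sketch on nested lists. This is not the code of the app described in [1]; the function names, the choice not to flip the kernel (as in most neural network libraries), and the cutoff value of 128 are assumptions made here for illustration.

```python
def convolve2d(image, kernel):
    """Weighted sum over each pixel neighborhood ('valid' region only).

    The kernel is applied without flipping, an assumption that matches the
    usual convention in convolutional neural network libraries.
    """
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    return [
        [
            sum(
                kernel[n][p] * image[x + n][y + p]
                for n in range(k_rows)
                for p in range(k_cols)
            )
            for y in range(out_cols)
        ]
        for x in range(out_rows)
    ]


def max_pool(image, size=2):
    """Down-sample by keeping the maximum of each size x size block."""
    rows, cols = len(image) // size, len(image[0]) // size
    return [
        [
            max(
                image[r * size + dr][c * size + dc]
                for dr in range(size)
                for dc in range(size)
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def threshold(image, cutoff=128):
    """Binarize a grayscale image: 255 (white) at or above cutoff, else 0 (black)."""
    return [[255 if pixel >= cutoff else 0 for pixel in row] for row in image]


if __name__ == "__main__":
    image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    print(convolve2d(image, [[1, 0], [0, 1]]))
    print(max_pool(image))
    print(threshold([[0, 127], [128, 255]]))
```

In a real pipeline these steps would run on camera images through an optimized library; the nested-list version only makes the arithmetic of each step visible.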


Fig. 5 Code for the thresholding method of image segmentation, which is applied for creating binary (black and white) images from grayscale images in order to optimize the diagnosis (Aydin and Ermertcan, Yeditepe University, 2019) [1]

Fig. 6 Improved diagnosis after the thresholding method of image segmentation, which is applied for creating binary (black and white) images from grayscale images in order to optimize the results (Aydin and Ermertcan, Yeditepe University, 2019) [1]

Fig. 7 SkinVision mobile app developed in the Netherlands uses a machine-learning algorithm to analyze spots on the skin and lists the results with the highest probability [4]


Fig. 8 Miiskin mobile app developed in Denmark enables users to take high-resolution photos of their skin and store the image files in an online database. The app uses artificial intelligence techniques for making comparisons with the follow-up photos to identify any changes [5]

1.2 Innovations in Medical Apps for Diagnosing Other Types of Diseases Smartphone apps can detect heart rhythm irregularities as accurately as a hospital electrocardiogram (ECG), according to recent studies in the field [7]. They also have the capability to detect if someone is having a STEMI (ST-Elevation Myocardial Infarction) heart attack [7]. Muhlestein et al., who developed the AliveCor KardiaMobile smartphone app in 2015, demonstrated that a smartphone can be used to obtain similar results with a 12-lead ECG (electrocardiography) device for the initial evaluation of myocardial ischemia [7]. By 2018, the study was conducted on 204 patients with chest pain to evaluate the effectiveness of a 12-lead ECG generated from a twowire attachment of electrodes connected to AliveCor KardiaMobile, and to compare its performance to the standard 12-lead ECG devices used in hospitals [7]. The tests revealed that the 12-lead ECG outputs generated from the two wires of AliveCor KardiaMobile corresponded to those obtained from a standard 12-lead ECG device used in hospitals [7]. To reach a similar level of accuracy, the two wires of AliveCor KardiaMobile are moved around the bodies of the patients to examine all 12 parts [7] (Fig. 9). Approved by the FDA in 2017, AliveCor KardiaBand for Apple Watch records an electrocardiogram and checks the pulses to determine the user’s heart rate to make

Innovations in Medical Apps and the Integration of Their Data …

189

Fig. 9 AliveCor KardiaMobile smartphone app has demonstrated that a smartphone can be used to obtain similar results with a 12-lead ECG (electrocardiography) device for the initial evaluation of myocardial ischemia [7]

sure the upper and lower chambers of the heart are in rhythm [8]. If they are out of rhythm, this may indicate atrial fibrillation [8] (Fig. 10). In 2017, the PupilScreen mobile app, which aims to allow anyone with a smartphone to screen for concussion and other brain injuries, was developed by researchers at the University of Washington [9]. PupilScreen can detect changes in a pupil’s response to light by using a smartphone’s video camera and applying artificial intelligence techniques to determine the results [9]. The app works by stimulating the patient’s eyes using the smartphone’s flashlight and records the pupil’s response using the smartphone’s video camera [9]. The PupilScreen box, which resembles a head-mounted virtual reality display, controls the eyes’ exposure to light [9]. The

Fig. 10 AliveCor KardiaBand for Apple Watch was approved by the FDA in 2017 [8]


Fig. 11 The PupilScreen box, which resembles a head-mounted virtual reality display, controls the eyes’ exposure to light and is used for improving the visual data obtained by the smartphone’s video camera for the mobile app [9]

recorded video is processed using convolutional neural networks that track the pupil’s diameter as it responds to changes of light over time, generating clinically relevant measures, with a median error of 0.30 mm, according to tests [9] (Fig. 11). The Epic Health app developed in 2017 in the United Kingdom for smartphones using the iOS and Android operating systems replaces the need for diabetics to prick their fingers for making tests [10]. Suitable for both Type 1 and 2 diabetics, the application works by placing a fingertip over the camera lens of a smartphone and capturing a series of close-up images that convey information about the user’s heart rate, body temperature, blood pressure, respiration rate, and blood oxygen saturation level [10]. GlucoWise is another smartphone app developed in the U.K. which aims to measure blood glucose levels by placing the sensor device on the earlobe or the skin between the thumb and forefinger [11]. The real-time measurements are then sent to the smartphone application for evaluation and visualization [11]. The device measures blood glucose levels through the use of radio waves [11].

2 Hospital Information Systems

2.1 Integrating Medical Apps to Hospital Information Systems

An online hospital information system (HIS) which can receive real-time patient data from mobile medical apps and integrate them with its data warehouses or Big Data


Fig. 12 A basic architecture for integrating data from mobile medical apps with the Cloud-based data warehouses or Big Data repositories of online Hospital Information Systems [12]

repositories can provide a centralized source of information for physicians about the health history of their patients and the results of their treatment process [12–14]. The most advanced hospital information systems today have online databases and are connected with Big Data repositories for storing and sharing massive quantities of patient information [12, 13]. Through the spread of IoT devices and the ubiquity of information and data files, thanks to cloud computing, hospitals and physicians today can remotely track the well-being of their patients in real time [12, 13] (Fig. 12).

3 Conclusion

Mobile computing devices equipped with HD cameras, touchscreen monitors, and sensors have paved the way for the development of multipurpose healthcare apps which can help physicians in testing and diagnosing a multitude of health problems and diseases, thanks to refined algorithms and artificial intelligence techniques. Therefore, it is of utmost importance to integrate data from trustworthy medical apps into online hospital information systems. Data obtained from various types of medical apps (for cardiovascular diseases, diabetes, ocular anomalies, etc.) can be collected in a centralized hospital information system to improve the accuracy of the decisions made by physicians regarding the treatment of their patients. Through


the use of Big Data repositories, data warehouses, cloud computing, and IoT, it is possible to quickly transfer real-time data obtained via mobile medical apps into the databases of hospital information systems. This will also enable physicians in remote locations to monitor the treatment process of their patients more effectively.

References

1. Aydin, R., Ermertcan, M., K.: Gradual approach to skin cancer classification with artificial intelligence and OpenCV draw contouring. Yeditepe University, Istanbul (2019)
2. University of Michigan, Michigan Medicine Homepage: UMSkinCheck App. https://www.uofmhealth.org/patient%20and%20visitor%20guide/my-skin-check-app. Accessed 26 Jan 2020
3. University of Michigan, Rogel Cancer Center, UMSkinCheck App. https://www.rogelcancercenter.org/skin-cancer/melanoma/prevention/app. Accessed 26 Jan 2020
4. SkinVision Homepage. https://www.skinvision.com. Accessed 26 Jan 2020
5. Miiskin Homepage. https://miiskin.com/app. Accessed 26 Jan 2020
6. MoleScope Homepage. https://www.molescope.com. Accessed 26 Jan 2020
7. Muhlestein, J.B., Le, V., Albert, D., Moreno, F.L., Anderson, J.L., Yanowitz, F., Vranian, R.B., Barsness, G.W., Bethea, C.F., Severance, H.W., Ramo, B., Pierce, J., Barbagelata, A.: Smartphone ECG for evaluation of STEMI: results of the ST LEUIS pilot study. J. Electrocardiol. 48(2), 249–259. Elsevier, Amsterdam (2015)
8. AliveCor Homepage. https://www.alivecor.com/press/press_release/fda-clears-first-medicaldevice-for-apple-watch. Accessed 10 Mar 2020
9. Mariakakis, A., Baudin, J., Whitmire, E., Mehta, V., Banks, M.A., Law, A., McGrath, L., Patel, S.N.: PupilScreen: Using Smartphones to Assess Traumatic Brain Injury. University of Washington, Seattle (2017)
10. Miglierini, G.: An app to measure glucose without the need of blood samples. Pharma World Mag. 19 February 2018
11. Fernandez, C.R.: Needle-free diabetes care: 7 devices that painlessly measure blood glucose. Labiotech, 23 July 2018
12. Winter, A., Haux, R., Ammenwerth, E., Brigl, B., Hellrung, N., Jahn, F.: Health Information Systems: Architectures and Strategies. Health Informatics, 2nd edn. Springer, London (2011)
13. Li, J.S., Zhang, Y.F., Tian, Y.: Medical big data analysis in hospital information system. In: Soto, S.V., Luna, J., Cano, A. (eds.) Big Data on Real-World Applications, London (2016)
14. Kazancigil, M.A.: Innovations and convergence in mobile medical applications and cloud-based hospital information systems for the real-time monitoring of patients and early warning of diseases. In: Proceedings of the 2019 IEEE World Congress of Services (SERVICES), Milan, Italy, pp. 301–306. IEEE, Piscataway, NJ (2019)

Automatic Classification of Rotating Machinery Defects Using Machine Learning (ML) Algorithms

Wend-Benedo Zoungrana, Abdellah Chehri, and Alfred Zimmermann

Abstract Electric machines and motors have been the subject of enormous development. New concepts in design and control allow their applications to expand into different fields. Vast amounts of data have been collected in almost every domain of interest. Such data can be static, that is to say, they represent real-world processes at a fixed point in time. Vibration analysis and vibration monitoring, including the detection and monitoring of anomalies in vibration data, are widely used techniques for predictive maintenance in high-speed rotating machines. However, accurately identifying the presence of a bearing fault can be challenging in practice, especially when the failure is still at its incipient stage and the signal-to-noise ratio of the monitored signal is small. As a consequence, there has been a dramatic increase of interest in applying Machine Learning (ML) algorithms to this task. The main objective of this work is to design a system that analyzes the vibration signals of a rotating machine, based on recorded sensor data, in the time/frequency domain. An ML system is used to classify and detect abnormal behavior and to recognize the different levels of machine operation modes (normal, degraded, and faulty). The proposed solution can be deployed for predictive maintenance in Industry 4.0.

W.-B. Zoungrana · A. Chehri (B) Department of Applied Sciences, University of Québec, Chicoutimi, QC G7H 2B1, Canada e-mail: [email protected] W.-B. Zoungrana e-mail: [email protected] A. Zimmermann Faculty of Informatics, Reutlingen University, Alteburgstraße 150, 72762 Reutlingen, Germany e-mail: [email protected] © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 A. Zimmermann et al. (eds.), Human Centred Intelligent Systems, Smart Innovation, Systems and Technologies 189, https://doi.org/10.1007/978-981-15-5784-2_16

1 Introduction

Rotating machines represent the most significant part of the mechanisms created by designers. The rotational movement can be used to store energy (for example, in a flywheel), to transfer power (by driving a belt or a gearbox), or to recover kinetic energy from external sources (for example, thermal energy using a turbine, or wind energy using a wind turbine). The rotating parts, also known as “rotors,” play a central role in these processes and are also the primary source of disturbance in these systems. Geometric or material faults, as well as faults at the interfaces, cause a loss of power in various forms. As in all mechanisms, thermal dissipation is observed, but the mechanical energy losses are also often significant, and vibrations accompany the movement of the rotor. First, the rotor can be excited into vibration in the axial or radial direction, or in torsion about its axis of rotation. In addition, oscillations appear at the guide supports (the bearings). Through these paths, energy waves are transmitted to the non-rotating parts, or even to neighboring equipment. Rotating machines are important components in many industrial applications such as power systems, manufacturing plants, power plants, electric vehicles, and home appliances. In certain applications, these machines may operate under unfavorable conditions, such as high temperature or dust (mining applications), which can eventually result in motor malfunctions that lead to high maintenance costs and safety hazards [1–3]. The integration of different sensors, the Internet of Things (IoT), and intelligent software into electric machines facilitates diagnostics and allows technical problems to be reported in advance. The integration of intelligent software also makes maintenance more proactive [4–7]. Predictive analyses identify the potential options to be favored for anticipation purposes. The predictive maintenance philosophy consists of scheduling maintenance activities only when a functional failure is detected [4]. The machine would then be shut down at a time when it is most convenient, and the damaged components would be replaced.
If left unattended, these failures could result in costly secondary failures. One of the advantages of this approach is that the maintenance events can be scheduled in an orderly fashion. With predictive maintenance, devices participate proactively in their own maintenance in collaboration with operators. Significant progress has been made in the vibration signal analysis techniques used for rotating machinery monitoring [8–11]. Vibration analysis is one of the most widely used techniques for predictive maintenance in high-speed rotating machines. Since the bearing is the most vulnerable component in a motor drive system, accurate bearing fault diagnostics has been a research frontier for engineers and scientists for the past decades. However, accurately identifying the presence of a bearing fault can be challenging in practice, especially when the fault is still at its incipient stage and the signal-to-noise ratio of the monitored signal is small. In fact, there may exist many unique features or patterns hidden in the data themselves that can potentially reveal a bearing fault. Consequently, it is almost impossible for humans to identify these convoluted features through manual observation or interpretation. The monitoring therefore requires an intelligent method to analyze the vibration data generated by the sensors. An important step in intelligent data analysis is to identify the signal signatures and use them in the learning method. In this work, we use machine learning classification methods to recognize the different levels of machine operation modes. The goal is to distinguish normal from faulty signals in the extracted vibration signal.


The rest of this paper is organized as follows. Section 2 describes the methodology. The predictive maintenance of rotating machines is described in Sect. 3. Section 4 provides the simulation results. Section 5 concludes this paper.

2 Methodology

By using sensors to determine when equipment verification is required, it is possible to prevent breakdowns and reduce routine maintenance costs. Thanks to integrated sensors connected to the Internet, production equipment is monitored remotely and in real time. Recommendations are sent to the operators so that they can remedy problems before they even occur. This method reduces operating and capital costs by promoting repair and proactive maintenance of equipment to improve capacity utilization and productivity. The developed method is tested on reference data extracted from the “NASA prognostic data repository” and relating to several bearing failure experiments carried out under different operating conditions. In addition, the method is compared with traditional forecasts based on time and frequency characteristics. Figure 1 shows the main steps of our methodology.

2.1 Data Acquisition Protocol

Data was generated by the National Science Foundation (NSF) Industry-University Cooperative Research Centers (IUCRC) for Intelligent Maintenance Systems (IMS, www.imscenter.net) with support from Rexnord Corporation in Milwaukee,

Fig. 1 Data analysis steps (classification)


Fig. 2 Illustration of the bearing test rig and sensor placement (IMS data set) [12]

Wisconsin, United States. Four bearings were installed on the shaft. The rotational speed was kept constant at 2000 rpm by an alternating current (AC) motor coupled to the shaft via friction belts. All bearings are force lubricated. Rexnord ZA-2115 double row bearings were installed on the shaft, as shown in Fig. 2. PCB 353B33 high-sensitivity quartz ICP accelerometers were installed on the bearings (two accelerometers per bearing for data set 1, one accelerometer per bearing for data sets 2 and 3). Figure 2 also shows the position of the sensors. All failures occurred after exceeding the expected lifetime of the bearing, which is more than 100 million revolutions [12].

2.2 Data Pre-processing

Pre-processing condition monitoring data is a fundamental step when developing data-driven models. The first step in the pre-processing stage is to filter out noise from the data. We applied a median filter to the vibration signals to keep the useful information while eliminating the high-frequency noise components. Since the spectrograms P(t, ω) of the data for good and defective bearings are different, representative characteristics can be extracted from the spectrograms and used to calculate the average peak frequency as an indicator of the state of the system. Once the data filtering is complete, the next step is to segment the data to eliminate the nonlinear effects. There are three distinct regions: (1) normal mode, (2) degraded mode, and (3) faulty mode. Figure 3 shows the extracted signal for different modes (in the time domain).
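The median filtering step can be illustrated with a minimal sketch. It assumes a sliding window of odd length whose size shrinks at the signal edges; the function name `median_filter`, the window length, and the sample values are hypothetical and are not taken from the study.

```python
from statistics import median

def median_filter(signal, window=5):
    """Sliding-window median; the window shrinks at the signal edges."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    filtered = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        filtered.append(median(signal[lo:hi]))
    return filtered

# Illustrative samples with one isolated spike (not real bearing data).
noisy = [0.0, 0.1, 0.0, 5.0, 0.1, 0.0, 0.1]
smoothed = median_filter(noisy, window=3)
```

With a three-sample window the isolated spike is replaced by the median of its neighbourhood, while the slowly varying part of the signal is left largely unchanged.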


Fig. 3 Classification of the vibration signal into three modes according to the presence or absence of faults. a Normal signal; b Degraded signal; c Faulty signal

2.3 Time Signal Based Approach

Time series data are by nature more complicated than static data, so it is more challenging to extract insightful knowledge from them. As a consequence, the ability of time series algorithms to provide comprehensible classification results becomes extremely important. The temporal characteristics capable of identifying the precise location of the faults are as follows:

Root Mean Square (RMS): RMS relates to the power of the vibration signal and is sensitive to load and speed changes (Eq. 1). RMS indicates the general condition at the last stages of degradation, and it is one of the important factors for machinery status diagnosis.

$$V_{\mathrm{RMS}} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} x_t(i)^2} \tag{1}$$


Crest Factor (CF): The crest factor is the ratio of the peak value of the signal to its RMS value (Eq. 2). It indicates early signs of damage, especially when the vibration signals have impulsive characteristics, and is used to assess the deterioration of bearings by relative comparison.

$$\mathrm{CF} = \frac{\max\left(x_t(i)\right)}{V_{\mathrm{RMS}}\left(x_t(i)\right)} \tag{2}$$

Kurtosis: Kurtosis is increasingly used for fault detection in electrical machines because of the simplicity of the algorithm and its ability to detect non-stationary events. Kurtosis measures how peaked the amplitude distribution of the vibration signal is (Eq. 3). Figure 4 shows the kurtosis of different types of signals. For example, a good bearing with no flaws that cause impulses in the signal will have a kurtosis value of approximately 3; in general, a kurtosis value above 4 is a sign of a bad condition.

$$K(x) = \frac{E\left[\left(x_t(i) - E[x_t(i)]\right)^4\right]}{\left(E\left[\left(x_t(i) - E[x_t(i)]\right)^2\right]\right)^2} \tag{3}$$
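The three time-domain features can be computed as in the following minimal sketch. It assumes population (1/N) moments for the kurtosis and takes the peak as the largest absolute sample, which is a common convention but an assumption here; the function names and the synthetic test signals are hypothetical.

```python
import math

def rms(x):
    """Root mean square of the samples (Eq. 1)."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def crest_factor(x):
    """Peak value divided by RMS (Eq. 2); peak taken as max |x| (assumption)."""
    return max(abs(v) for v in x) / rms(x)

def kurtosis(x):
    """Fourth central moment over squared variance (Eq. 3), 1/N moments."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / (m2 ** 2)

# Synthetic signals for illustration only: a pure sinusoid, and the same
# sinusoid with a periodic impulse added every 100 samples.
smooth = [math.sin(2 * math.pi * i / 50) for i in range(1000)]
impulsive = list(smooth)
for i in range(0, 1000, 100):
    impulsive[i] += 8.0

features = {
    "smooth": (rms(smooth), crest_factor(smooth), kurtosis(smooth)),
    "impulsive": (rms(impulsive), crest_factor(impulsive), kurtosis(impulsive)),
}
```

A pure sinusoid has a kurtosis of 1.5, so it is not a model of a healthy bearing (for which the text gives a value near 3); the comparison only shows that adding impulses raises both the crest factor and the kurtosis sharply.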

Fig. 4 The kurtosis in the time domain of the signals. a Normal signal; b Degraded signal; c Faulty signal


3 Predictive Maintenance of Rotating Machines

In this part of the article, we set up a system that shows how to extract characteristics from measured data in order to carry out monitoring and forecasts. Based on the extracted features, dynamic models are generated, validated, and used to predict the time of breakdowns so that actions can be taken before the actual breakdowns occur.

3.1 Classical Machine Learning Based Approaches

Supervised learning is a type of machine learning that uses a known data set to make predictions. The training data set includes input data and response values. Supervised learning algorithms seek to create a model capable of predicting the response values of a new data set. Using larger training data sets and optimizing the model’s hyperparameters can often increase the predictive power of the model and allow it to generalize well to new data sets. A set of test data is often used to validate the model. The k-NN, Decision Tree, and Support Vector Machine (SVM) algorithms are widely used for Time Series Classification in conjunction with one or a few similarity measures [13]. Below, we briefly explain these techniques.

3.2 Support Vector Machines (SVMs)

SVMs are supervised learning models that analyze data for non-probabilistic classification or regression analysis. In its basic form, the SVM algorithm is a linear classifier, which means that, in the ideal case, the data must be linearly separable. It finds the separator (line, plane, or hyperplane) that best separates the two classes. The SVM method is based on the statistical learning theory introduced by Vapnik in the 1990s [14].

3.3 Decision Trees

If providing comprehensible classification results is of interest, Decision Trees are usually recommended. This method uses a top-down approach in order to build a tree recursively. At each internal node, there is a split test evaluation, and each leaf node contains information about a class to be assigned to the new test instance [15].
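A single split test at an internal node can be sketched as follows. The sketch assumes a threshold test on one scalar feature scored by Gini impurity, which is a generic criterion and not the time-series split test of [15]; the function names and the RMS values are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best binary split."""
    pairs = sorted(zip(values, labels))
    best_score, best_threshold = float("inf"), None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between two equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_score, best_threshold = score, threshold
    return best_threshold, best_score

# Illustrative RMS values for two classes (not taken from the IMS data set).
rms_values = [0.08, 0.09, 0.10, 0.45, 0.48, 0.50]
labels = ["normal"] * 3 + ["faulty"] * 3
threshold, impurity = best_split(rms_values, labels)
```

A full tree would apply this test recursively to the left and right subsets until the leaves are pure or a depth limit is reached.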


3.4 k-Nearest Neighbor (k-NN) Classifier

The k-NN is one of the most commonly used algorithms in Time Series Classification [16]. It is a simple, robust, and accurate classifier that depends on very few parameters and requires no explicit training phase. Several distance functions are available as distance measures, all of which apply to continuous variables only.
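The voting rule of k-NN can be illustrated with a minimal sketch. It assumes Euclidean distance and a majority vote among the k closest training samples; the function name `knn_predict` and the (RMS, kurtosis) feature values are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Label of the majority among the k nearest training samples."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Illustrative (RMS, kurtosis) pairs, not taken from the IMS data set.
train_X = [
    (0.08, 3.0), (0.09, 3.1), (0.10, 2.9),   # normal
    (0.20, 3.8), (0.22, 4.1), (0.21, 4.0),   # degraded
    (0.45, 6.5), (0.50, 7.2), (0.48, 6.9),   # faulty
]
train_y = ["normal"] * 3 + ["degraded"] * 3 + ["faulty"] * 3

prediction = knn_predict(train_X, train_y, (0.47, 7.0), k=3)
```

In practice the features would usually be scaled to comparable ranges first, since an unscaled Euclidean distance is dominated by the feature with the largest numeric spread.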

3.5 Naive Bayes Classifier

The Naive Bayes classifier uses Bayesian methods to classify data. During the training phase, it calculates the probability of every class based on the input variables, which are assumed to be independent. These probabilities are then used to assign unlabeled data to the most likely class [17].
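The two phases described above can be sketched as follows. The sketch assumes a Gaussian likelihood for each feature, which is one common choice and is not stated in the text; the function names, the small variance floor, and the (RMS, kurtosis) values are hypothetical.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Per class: log prior, feature means, and feature variances."""
    grouped = defaultdict(list)
    for features, label in zip(X, y):
        grouped[label].append(features)
    model = {}
    for label, rows in grouped.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small floor keeps the variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / len(X)), means, variances)
    return model

def predict_gaussian_nb(model, features):
    """Class with the highest log posterior under feature independence."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(features, means, variances)
        )
        return log_prior + log_likelihood
    return max(model, key=log_posterior)

# Illustrative (RMS, kurtosis) pairs, not taken from the IMS data set.
X = [
    (0.08, 3.0), (0.09, 3.1), (0.10, 2.9),
    (0.20, 3.8), (0.22, 4.1), (0.21, 4.0),
    (0.45, 6.5), (0.50, 7.2), (0.48, 6.9),
]
y = ["normal"] * 3 + ["degraded"] * 3 + ["faulty"] * 3

nb_model = fit_gaussian_nb(X, y)
nb_prediction = predict_gaussian_nb(nb_model, (0.47, 7.0))
```

Working in log space avoids numerical underflow when the per-feature likelihoods are multiplied together.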

4 Results and Discussion

With MATLAB’s Diagnostic Feature Designer toolbox, we imported and visualized all of the bearing vibration signals in order to extract the essential characteristics of each signal. Other signals were generated randomly to emulate a faulty bearing vibration signal. Machine learning algorithms were used to classify those signals into different classes. Before looking at the details of the Time Series Classification algorithms, we discuss the problem of interpretable and comprehensible ML. Figure 5 shows the

Fig. 5 Histogram of the characteristics of the vibration signals (Kurtosis vs. RMS)


Fig. 6 Classification of signals with the Cross-Validation method

characteristics obtained with Kurtosis and RMS. We can observe that RMS is the better feature, because it separates the three groups of data according to the level of failure more clearly than kurtosis does.

4.1 Test Results with Cross-Validation

The Cross-Validation technique was used in this study to select the classifier models with the highest accuracy and the lowest prediction errors. As shown in Fig. 6, the main data set was divided into a training and validation data set (85%) and a testing data set (15%). The testing data set was created randomly. The training and validation data set was partitioned into k (a positive integer) equal-sized subsets. Subsequently, k iterations of training and validation were performed. In each iteration, a different subset was held out for validation, while the remaining k−1 subsets were used for training. At the end of each training cycle, each classifier model of Sect. 3 was run against the validation data set to make predictions and calculate the errors. Each model was stored in a models array together with its error rate. Upon completion, all observations had been used for both training and validation, and the model with the lowest error rate was selected as the final model. In the Confusion Matrix analysis, the model accuracy is approximately 99.3% with the Naive Bayes, SVM, and k-NN methods, compared with 98% for the decision tree.
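The procedure above can be sketched in a few lines. The sketch assumes a random 85/15 hold-out split followed by k-fold cross-validation with k = 5; the two candidate classifiers (a nearest-centroid rule and a majority-class baseline), all function names, and the synthetic RMS values are hypothetical stand-ins for the models and data of the study.

```python
import random
from collections import Counter

def holdout_split(n_samples, test_fraction=0.15, seed=0):
    """Randomly split sample indices into (train+validation, test)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]

def k_fold_splits(indices, k):
    """Yield (train, validation) index lists; each fold is held out once."""
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        train = [j for f in range(k) if f != held_out for j in folds[f]]
        yield train, folds[held_out]

def cross_val_error(fit, predict, X, y, indices, k=5):
    """Fraction of validation samples misclassified over all k folds."""
    errors = 0
    for train, val in k_fold_splits(indices, k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        errors += sum(predict(model, X[i]) != y[i] for i in val)
    return errors / len(indices)

# Candidate 1: nearest class centroid on a single RMS feature.
def fit_centroid(X, y):
    groups = {}
    for value, label in zip(X, y):
        groups.setdefault(label, []).append(value)
    return {label: sum(v) / len(v) for label, v in groups.items()}

def predict_centroid(model, value):
    return min(model, key=lambda label: abs(model[label] - value))

# Candidate 2: always predict the most frequent training label.
def fit_majority(X, y):
    return Counter(y).most_common(1)[0][0]

def predict_majority(model, value):
    return model

# Synthetic RMS values (illustrative only, not taken from the IMS data set).
X = ([0.08 + 0.0005 * i for i in range(40)]
     + [0.20 + 0.0005 * i for i in range(40)]
     + [0.45 + 0.0005 * i for i in range(40)])
y = ["normal"] * 40 + ["degraded"] * 40 + ["faulty"] * 40

trainval_idx, test_idx = holdout_split(len(X))
candidates = {
    "centroid": (fit_centroid, predict_centroid),
    "majority": (fit_majority, predict_majority),
}
cv_errors = {
    name: cross_val_error(fit, predict, X, y, trainval_idx, k=5)
    for name, (fit, predict) in candidates.items()
}
best_name = min(cv_errors, key=cv_errors.get)
```

The held-out test indices are never touched during model selection, so they remain available for a final, unbiased accuracy estimate of the selected model.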


5 Conclusion

This paper investigated the prospect of using vibration signals to analyze the condition of rotating machinery and, possibly, to diagnose its faults. The analysis was done by comparing randomly chosen measurements from different bearing signals and assigning each of them one of three outputs: “normal,” “degraded,” or “faulty.” The goal was to establish whether the time-domain feature RMS or the statistical measure kurtosis would provide information that makes each condition distinguishable. The following conclusions were drawn:

• Vibration analysis can be used in the predictive monitoring of rotating machines.
• RMS does seem to work for fault detection but should be complemented with some form of statistical kurtosis analysis.

The information obtained through the machine learning and monitoring system can help operators to prepare the necessary material before a failure occurs. Thus, traditional maintenance strategies involving corrective and preventive maintenance can be replaced by predictive maintenance, which is at the core of Industry 4.0. The recognition rate of nearly 100% demonstrates that the recognition of vibration signals can allow a more precise prediction of the various elements that fail on machines.

References

1. Nembhard, A.D., Sinha, J.K., Elbhbah, K., Pinkerton, A.J.: Fault diagnosis of rotating machines using vibration and bearing temperature measurements. Diagnostyka 14(3), 45–52 (2013)
2. Tabaszewski, M.: Optimization of a nearest neighbours classifier for diagnosis of condition of rolling bearings. Diagnostyka 15(1), 37–42 (2014)
3. Astolfi, D., Castellani, F., Terzi, L.: Fault prevention and diagnosis through SCADA temperature data analysis of an onshore wind farm. Diagnostyka 15, 71–78 (2014)
4. Chehri, A., Jeon, G.: The Industrial Internet of Things: Examining How the IIoT will Improve the Predictive Maintenance. Lecture Notes of the Institute for Computer Sciences, Smart Innovation Systems and Technologies. Springer, Heidelberg (2019)
5. Chehri, A., Jeon, G.: Routing Protocol in the Industrial Internet of Things for Smart Factory Monitoring. Lecture Notes of the Institute for Computer Sciences, Smart Innovation Systems and Technologies. Springer, Heidelberg (2019)
6. Jeon, G., Awais, A., Chehri, A., Cuomo, S.: Special Issue on Video and Imaging Systems for Critical Engineering Applications. Multimedia Tools and Applications, Springer (2020)
7. Jeon, G., Chehri, A., Cuomo, S., Din, S., Jabbar, S.: Special Issue on Real-time Behavioral Monitoring in IoT Applications using Big Data Analytics. Concurrency and Computation: Practice and Experience. Wiley (2019)
8. Saufi, S.R., Ahmad, Z.A.B., Leong, M.S., Lim, M.H.: Challenges and opportunities of deep learning models for machinery fault detection and diagnosis: a review. IEEE Access 7, 122644–122662 (2019)
9. Zhang, S., Zhang, S., Wang, B., Habetler, T.G.: Machine learning and deep learning algorithms for bearing fault diagnostics: a comprehensive review (2019)
10. Jaafar, A.: Vibration analysis and diagnostics guide. College of Engineering, University of Basrah (2012)


11. Yang, W., Court, R., Jiang, S.: Wind turbine condition monitoring by the approach of SCADA data analysis. Renew. Energy 53 (2013)
12. Lee, J., Qiu, H., Yu, G., Lin, L.: Bearing data set. NASA Ames prognostics data repository, NASA Ames Research Center, Moffett Field, University of Cincinnati (2004)
13. Saber, M., Saadane, R., Aroussi, H., Chehri, A.: An optimized spectrum sensing implementation based on SVM, KNN and tree algorithms. In: IEEE 15th International Conference on Signal Image Technology & Internet Based Systems, Sorrento, NA, Italy (2019)
14. Boser, B., Guyon, I., Vapnik, V.: A training algorithm for optimal margin classifiers. In: Proceedings of the Fifth Annual Workshop on Computational Learning Theory. ACM, Pittsburgh, Pennsylvania, USA, pp. 144–152 (1992)
15. Yamada, Y., Suzuki, E., Yokoi, H., Takabayashi, K.: Decision-tree induction from time-series data based on a standard-example split test. In: Machine Learning, Proceedings of the Twentieth International Conference, Washington, DC, USA (2003)
16. Weinberger, K.Q., Blitzer, J., Saul, L.K.: Distance metric learning for large margin nearest neighbor classification. In: Advances in Neural Information Processing Systems, pp. 1473–1480 (2006)
17. Friedman, N., Geiger, D., Goldszmidt, M.: Bayesian network classifiers. Mach. Learn. 29, 131–163 (1997)

Technologies to Improve Senior Care

Potentials of Emotionally Sensitive Applications Using Machine Learning

Ralf-Christian Härting, Sebastian Schmidt, and Daniel Krum

Abstract This paper focuses on the potentials of emotionally sensitive applications using machine learning. Artificial intelligence is a topic that has become increasingly relevant in recent years. Virtual assistants such as Alexa, Siri, and the Google Assistant in particular have brought the topic to the public eye. The aim of this paper is to determine the emerging potential of artificial intelligence, especially in the area of emotionally sensitive applications. To determine these potentials, a literature review was carried out and ten experts in relevant fields were interviewed in semi-structured interviews. Based on a Grounded Theory approach, six influencing factors have been identified. As an additional result, possible areas of application for emotionally sensitive artificial intelligence were identified. These were mainly areas with a high level of customer contact, but areas such as personnel management and personnel coaching can also benefit from emotionally sensitive AI.

1 Introduction

Everyone will associate sentences like “Ok Google, how will the weather be tomorrow” or “Alexa, what appointments do I have today” with the language assistants of Google and Amazon. These language assistants are based on Artificial Intelligence (AI), which is already an indispensable part of our daily lives.

R.-C. Härting (B) · S. Schmidt · D. Krum Aalen University, Beethovenstraße 1, 73430 Aalen, Germany e-mail: [email protected] S. Schmidt e-mail: [email protected] D. Krum e-mail: [email protected] © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 A. Zimmermann et al. (eds.), Human Centred Intelligent Systems, Smart Innovation, Systems and Technologies 189, https://doi.org/10.1007/978-981-15-5784-2_17


The current breakthrough in all areas of AI can be explained by an increase in computing power, a large amount of available data, and the further development of the algorithms used by Machine Learning (ML) systems [1]. Virtual assistants in particular, which have long since become far more than a mere technical gimmick on our smartphones, benefit from this progress. Computer programs that imitate human communication will change existing business models in any area of customer contact. The fact that large companies such as Deutsche Post, Deutsche Bank, and Telekom are experimenting with chatbots suggests the future relevance of this topic [2]. A paradigm shift can already be seen today in the example of customer service: 70% of customers already prefer communicating via short messages to calling customer service, 80% of customer questions can already be answered by chatbots, and 80% of all customer conversations are expected to be conducted via a chatbot by 2020 [3]. There are many interesting approaches to what chatbots can be used for in the future. The potential area of application is much larger than simply answering questions in customer service. There are different approaches to integrating emotions into AI applications [4]. Language is used not only to transmit information at the content level but also carries a great deal of information, such as emotions, between the lines, which even humans can only vaguely recognize. This makes it an interesting application area for AI, because AI is particularly successful at recognizing patterns in the data provided to it that may remain hidden from humans [5]. An example of this is e-mail traffic. If the sentic (emotional) modulation is lacking, misunderstandings can arise. There is probably nothing that has made this point clearer than the enormous dependence of many people on e-mail (electronic mail), which is at present largely limited to text.
Most people who use e-mail have found themselves misunderstood at some point: their comments were received with the wrong tone [6]. Emotional intelligence is of central importance in areas such as marketing, trustworthiness, fluctuation, organizational awareness, and departmental management, because the success of a company depends on whether it can build long-term and sustainable relationships with customers, employees, and business partners [7]. Therefore, the aim of this paper is to identify different areas that offer potential for emotionally sensitive applications in the future. Detailed information regarding the questionnaire can be found at http://www.kmu-aalen.de/kmu-aalen/?page_id=169.


2 Research Method and Data Collection

To investigate the potentials of emotionally sensitive applications, the authors developed a qualitative research study. This study is based on the methodology of Grounded Theory (GT) according to Glaser. This research approach originates from social science research and is still used to gain insights into a research question [8]. The focus is on the elaboration and development of a theory rather than on its verification. Therefore, the primary goal of GT is to develop a new theory and create a model from the qualitative data collected. It is important to note that a literature review is not part of the concept of GT [9]. Moreover, an intensive examination of existing literature could compromise an impartial treatment of the research question. That is the main reason why an open attitude toward the course of the process is most important. The first step in this process according to GT is to collect the required data [8].

2.1 Data Collection

The data come from a qualitative survey conducted as semi-structured interviews. This is the most effective method for obtaining useful insights and knowledge from the interviewed experts. In the first step, the authors looked for leading experts in the area of emotionally sensitive applications. To this end, suitable companies were selected and then contacted, and eligible candidates for the expert interviews were identified within them. For reasons of anonymity, the names of the interviewed experts have been changed; the pseudonyms are shown in Table 1. Within the framework of the research project, the experts were interviewed regarding their assessment of emotionally sensitive applications and their potentials. Their fields ranged from IT experts and corporate strategists to leadership and management functions. As one of the criteria, all experts had to have several years of professional experience in their positions. Such a broad range of positions was chosen in order to obtain the widest possible range of views (Table 1). The authors were able to find ten participants who agreed to be interviewed. This sample is already big enough to develop a concept from the collected data: in GT, the researcher extends the sample until further data collection (e.g., interviews) no longer provides any new data, which could require 10, 20, 30, or more interviews [10].
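The stopping rule described above (extend the sample until interviews yield no new data) can be illustrated with a minimal sketch. It is not the procedure used in this study: the function name, the `patience` parameter, and the example codes are all hypothetical, and saturation is simply assumed to be reached once a given number of consecutive interviews contribute no previously unseen code.

```python
from typing import Iterable, Set


def interviews_until_saturation(
    coded_interviews: Iterable[Set[str]], patience: int = 2
) -> int:
    """Count interviews processed until saturation is assumed.

    Hypothetical rule: stop once `patience` consecutive interviews
    add no code that has not been seen in an earlier interview.
    """
    seen: Set[str] = set()
    without_new = 0  # consecutive interviews that added nothing new
    count = 0
    for codes in coded_interviews:
        count += 1
        new_codes = codes - seen
        seen |= codes
        without_new = 0 if new_codes else without_new + 1
        if without_new >= patience:
            break
    return count


# Invented example codes for illustration only, not data from the study.
example = [
    {"cost savings", "automation"},
    {"quality", "automation"},
    {"cost savings"},
    {"quality"},
    {"customer service"},
]
print(interviews_until_saturation(example))  # prints 4
```

With a larger `patience`, the fifth interview in this invented example would still be processed, which shows how strongly the final sample size depends on how strictly saturation is defined.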


Table 1 Interviewed experts

Pseudonym   Job position
Martin      HR manager
Patricia    Head of Personnel and Organizational Development
Paul        Software developer
Marius      AI Expert
Andy        HR manager
Esther      NLP Human Factors Specialist
Sarah       Recruiting manager
Marcel      IT strategist
Manuel      Head of Information Processing
Torben      Graduate psychologist

2.2 Data Analysis

In order to analyze the data afterward, it is necessary to transcribe the semi-structured interviews. Transcription means recording the conversation in writing, with the literal statements of all interlocutors [11]. The data obtained were analyzed with the coding method of Glaser and Corbin. This allows correlations between the individual interviews to be filtered out and different categories to be derived [9]. In the first step, the interview data were compared and examined for relationships [12]. The results of the influencing factors are shown in Fig. 1. These influencing factors were then analyzed in more detail, and subcategories were formed within the respective categories [13]. Finally, all categories were checked for their relationship to the core category, which made it possible to eliminate categories without a relationship to the core category [12].

Fig. 1 Empirical model of potentials of emotionally sensitive applications

3 Potentials of Emotionally Sensitive Applications (Hypothesis Model)

By analyzing the data using Grounded Theory, six influencing factors with associated indicators were developed. These influencing factors form the basis for identifying the potentials of emotionally sensitive applications or emotionally sensitive artificial intelligence. The model representing the influencing factors, a so-called empirical model, is shown in Fig. 1. In addition, hypotheses are generated to enable further investigations. An overview of the influencing factors and the associated indicators is presented in Table 2 of the appendix. In the following, the potentials of each identified influencing factor are explained in detail, based on the expert statements collected in this qualitative study.

3.1 Quality Improvement

The first influencing factor identified within the scope of the study is quality improvement. The experts were able to make clear statements about the extent to which the quality of processes in particular is improved by the use of emotionally sensitive artificial intelligence. It should be noted that the quality improvement relates to three specific thematic areas, which were therefore identified as indicators of the influencing factor. These include reducing errors, increasing accuracy, and increasing productivity. The experts’ statements confirm the relevance of emotionally sensitive artificial intelligence for the quality of processes. Software developer Paul says: […] Especially in the medical field (diagnosis of diseases). Here, Artificial Intelligence far surpasses human capabilities in terms of accuracy. Due to the increasing digitalization of all areas of life, data that can be used by Artificial Intelligence is generated in masses. […]

While the accuracy of hits is increased, the use of artificial intelligence, e.g., in the form of emotionally sensitive applications, can also lead to a reduction in errors. This is confirmed by HR manager Andy: […] In the context of standard requests, or tiring activities that can be learned through artificial intelligence. Thus error reduction. […]


The IT strategist Marcel also recognizes the potential of artificial intelligence in error detection. The third indicator identified was the increase in productivity, which was highlighted by Patricia, Head of Personnel and Organizational Development: […] Increased productivity, i.e. in the case of highly standardized activities, the use of the language assistant could possibly lead to a higher number of units being produced. The language assistant is impartial. There is no more subjectivity, and there are no more exceptions. The language assistant can be used 24/7. […]

From the survey results, it can be concluded that emotionally sensitive applications have the potential to improve the quality of processes. The resulting hypothesis is that the use of emotionally sensitive applications can improve the quality of processes (e.g., by reducing errors).

3.2 Cost Savings

Cost savings are an important influencing factor that was identified in the study. This factor was cited by over 50% of respondents. The experts identified several reasons for the cost savings. It should be noted that the cost savings essentially relate to two concrete areas, which were therefore determined as indicators of the influencing factor. These include the automation of processes and the use of emotionally sensitive artificial intelligence in administrative activities. Human Resources manager Andy sees great potential in extending the use of artificial intelligence to administrative activities: […] The mechanical robots, which have taken over mechanical work processes from skilled workers and replaced them, will also extend with Artificial Intelligence to technical and administrative, i.e. less value-adding administrative topics, and offer savings potential here. […]

Until now, robots have mainly been used for mechanical activities in production. With the help of artificial intelligence, however, the field of application is increasingly being extended to administration. The expert for artificial intelligence, Marius, also identifies potential cost savings through the use of artificial intelligence, and software developer Paul says: […] I am convinced that especially in customer service there will be hardly any real people left in the future. Similar to robotics in production lines, a few people will be responsible for monitoring them. […]

For personnel manager Martin, artificial intelligence in the field of automation can contribute to cost savings: […] We often receive requests for appointments or vacancies. These are easy to answer but cost us a lot of time and could be solved by artificial intelligence. This would give employees more time to deal with more complex issues. […]


Through the automation of relevant business processes, for example, rationalizations can be carried out in the administrative area and costs can be saved. Manuel, head of information processing, sees the advantage of emotionally sensitive artificial intelligence in the fact that such systems can assist by taking over part of the data acquisition and that suggestions for further procedures can be made via sets of rules. In conclusion, emotionally sensitive applications have potential in the area of cost savings, which mainly result from increased automation and use in administration. The resulting hypothesis is that the use of emotionally sensitive applications can reduce a company’s costs (e.g., through automation).

3.3 Improvement of Individual Customer Service

The third influencing factor in the empirical model is the improvement of individual customer care. Some of the experts attach particular importance to this factor, assuming a fully functional technical solution. It should be noted that the improvements in individual customer service essentially relate to three concrete areas, which were therefore determined as indicators of the influencing factor. These include the increase in individual customer satisfaction, the decoupling of actual employee capacities, and the customers’ feeling that their criticism has been taken seriously. According to the expert Marcel, emotionally sensitive artificial intelligence can potentially increase the acceptance of virtual assistants. He adds: […] may an emotion-sensitive language assistant bring advantages in terms of customer satisfaction, provided that content and performance are right. […]

It is emphasized that the content and performance of emotionally sensitive artificial intelligence must be mature. This is essential when dealing with customers. Nevertheless, Marius, an expert in artificial intelligence, also sees potential in the area of customer satisfaction. Customer care can be improved by using emotionally sensitive artificial intelligence and, in particular, language assistants. This is reflected, for example, in the all-day availability of artificial intelligence. This is not limited in time by any legal regulations and customer care can, therefore, be decoupled from the actual employee capacity. Software developer Paul, among others, believes that this is the case: […] Full-featured emotion-sensitive language assistants relieve the strain on real customer service employees. […]

The NLP Human Factors Specialist, Esther, also suspects the possibility of a work relief: […] Potentials through the use of Artificial Intelligence arise in the area of human factors, e.g. in reducing the workload of the operator. […]


If employees are relieved by artificial intelligence, their manpower is freed up for more important, value-adding activities. They can focus on important topics, e.g., in the field of science. Recruiting manager Sarah sees this advantage in emotionally sensitive artificial intelligence. The experts identify a significant potential in the fact that customers feel that artificial intelligence takes their criticism seriously. Software developer Paul mentioned: […] Through an emotional component that understands the needs and mood of the opposite, customers would feel taken seriously in their criticism and no longer, as before, when it comes to complaints only accept a human customer care representative. […]

IT strategist Marcel adds that the acceptance of virtual assistants can be significantly increased by emotion sensitivity and thus contributes to work relief and higher automation. Personnel manager Martin identifies another important aspect of improving individual customer service: […] In my experience, emotions and needs are topics that are often misinterpreted by people. If they have this ability, machines could possibly judge situations more objectively than humans and identify dissatisfied or angry customers at an early stage. […]

The conclusion is that emotionally sensitive artificial intelligence has manifold potentials in individual customer care. The prerequisite for this, however, is that the technology is fully mature in terms of content and performance. The resulting hypothesis is that individual customer care can be improved by using emotionally sensitive applications (e.g., by decoupling actual employee capacities).

3.4 New Business Models

Another influencing factor is the emergence of new business models. From an economic point of view, digitization is changing business models [14]. Through the use of emotionally sensitive artificial intelligence, digitization can be promoted and thus also the change of business models. In some cases even completely new business models can emerge. It should be noted that the emergence of new business models essentially relates to three specific areas, which were therefore identified as indicators of the influencing factor. These include new services, changes in the value chain, and the digitization of processes. Since emotionally sensitive artificial intelligence can support or relieve employees in customer service, for example, employee capacities are freed up. Patricia, head of personnel and organizational development, notes […] On the other hand, this can create time for advisory activities and thus be used in a broader and more value-adding way. […]

The use of emotionally sensitive artificial intelligence can, therefore, change the value chain. For example, personnel who were tied to the end of the value chain as part of customer service can now be deployed more widely or elsewhere in the value chain. Esther and Sarah, for example, state that their companies are already using artificial intelligence in the form of projects in human resources. Artificial intelligence is also used in the context of IT services. In addition, completely new services can be created with emotionally sensitive artificial intelligence. Software developer Paul gives the following example: […] Artificial intelligences have special competences in the field of diagnostics, which could be well adapted here (medication/recognition of symptoms of diseases). […]

The head of information processing, Manuel, also sees the possibility that interpersonal therapy interactions can be selectively replaced by human–machine interaction. This could create alternatives for people for whom no therapy can be guaranteed due to the lack of specialists. According to Recruiting manager Sarah, it would also be conceivable to offer services that support people in their everyday lives or guarantee them long-term independence. She also sees a further service in the promotion of globality through support in the translation of languages. This means that emotionally sensitive artificial intelligence can create a spectrum of new service offerings. Emotionally sensitive artificial intelligence can also be of great help in digitizing processes. If it is mature and fully functional, it will also be possible in the future to digitize processes that previously had to be completed primarily by humans. Personnel manager Martin comments on the potential: […] especially in the digitization of business processes. Another interesting approach for more mature systems would be to digitize and automate the application process to a certain point. […]

As a conclusion, it can be deduced that emotionally sensitive artificial intelligence can change existing business models of a company and even create completely new business models. The resulting hypothesis is that new business models can be created through the use of emotionally sensitive applications (e.g., new services).

3.5 Customization

Individualization was identified as the fifth influencing factor. Individualization has become an increasingly important topic for companies in recent years. Consumers usually prefer products that are better suited to their needs. As a result, more and more companies are addressing the issue and driving individualization forward [15, p. 66]. The potential of individualization essentially relates to two concrete areas, which were therefore determined as indicators of the influencing factor. These include the reaction to the needs of customers and individualized services. Software developer Paul sees a possible benefit of emotionally sensitive artificial intelligence in the recognition of needs:


[…] A language component would also be conceivable here to inquire about the needs of the person to be cared for and to react accordingly. […]

Having recognized needs, artificial intelligence can take them into account in its responses. This enables the customer, for example, to be looked after individually. Individualized services can also be provided. Recruiting manager Sarah, for example, mentions empathic training companions that can be used as support for the application process and can respond individually to the respective applicant. The conclusion is that emotionally sensitive artificial intelligence has manifold potentials for individualization. The resulting hypothesis is that emotionally sensitive applications can improve and further advance individualization (e.g., by responding to the user’s needs).

3.6 Simplification of Processes

The last influencing factor identified in the study is process simplification. Artificial intelligence is already being used to simplify human–machine communication in automation processes [16, p. 24]. Emotionally sensitive artificial intelligence can contribute to the simplification of processes, since it can be used for processes that have not yet been automated. Andy, HR manager, recognizes this potential: […] In the context of a service orientation of personnel processes, a facilitation and simplification of highly standardized processes can be achieved. […]

The ability of emotionally sensitive artificial intelligence to recognize needs and respond to them can further improve human–machine communication and increasingly simplify processes. Torben, graduate psychologist, mentions […] Applicant management and customer service are very suitable for emotionally sensitive applications. I also see potential for internal processes. […]

The conclusion is that emotionally sensitive artificial intelligence can be used to simplify processes, especially in processes that were previously excluded from automation (e.g., customer service). The resulting hypothesis is that emotionally sensitive applications can simplify processes.

4 Summary and Outlook

As a result of this study, potentials of emotionally sensitive applications based on artificial intelligence were identified using empirical data from German experts. The qualitative study was conducted by means of expert interviews and evaluated according to the Grounded Theory concept. Six decisive influencing factors were identified during the evaluation of the interview data: quality improvement, cost savings, improvement of individual customer service, new business models, individualization, and process simplification. Together with the associated indicators, these influencing factors form the basis for the identified potentials. A great potential of emotionally sensitive artificial intelligence lies in the fact that processes that were previously excluded from automation, such as administrative processes, can be simplified and automated or supported by artificial intelligence. In addition, an improvement in quality can be achieved, since the increasing automation results in error reductions and increased accuracy of hits. In the area of customer care, emotionally sensitive artificial intelligence can be used to ensure more comprehensive support, as it is not bound by statutory working hours. Since the artificial intelligence can respond to needs, individual customer satisfaction can still be ensured, for example because customers feel that their criticism has been taken seriously. Emotionally sensitive applications can also change existing business models of companies or even create completely new business models, such as alternatives for therapies that previously could only be provided on an interpersonal basis. A further potential lies in individualization, which emotionally sensitive artificial intelligence can promote by responding to user needs and providing individualized services.

Notwithstanding the extensive and useful results obtained by the study, limitations of the research should be mentioned. First of all, all but one of the interviewed experts were from Germany. Furthermore, not all areas of the economy were reflected by the interviewed experts. Some of the interview partners work in human resources, while no interview partner could be recruited who works in management. In addition, two of the interviewed experts come from the healthcare industry, which places very high demands on applications based on artificial intelligence [17]. Finally, it should be noted that the number of respondents can lead to variances in the results.

The hypotheses derived from the study form the basis for further research in theory and practice. The aim of the expert survey was an initial identification of the potentials of emotionally sensitive applications or emotionally sensitive artificial intelligence. A holistic view of the study results revealed that the systems developed so far are not mature enough to be used in dealings with stakeholders. Interpersonal contact is preferred, as complex and non-standardized content is often communicated. A reliable representation of this complex communication via artificial intelligence is not yet possible. In addition to the identification of potentials, a risk analysis as well as an ethical and economic analysis should also be carried out. Finally, it can be summarized that the qualitative method used, as well as the empirical findings gained from it, could make an important contribution to answering the leading research question: what potentials are there for emotionally sensitive applications or emotionally sensitive artificial intelligence?

Table 2 Summary of influencing factors and indicators

Influence                                    Indicators
Quality improvement                          Error reduction
                                             Increased accuracy of hits
                                             Increase in productivity
Cost savings                                 Use in administrative activities
                                             Automation of processes
Improvement of individual customer service   Increase individual customer satisfaction
                                             Customer feels taken seriously in criticism
                                             Decoupling of actual employee capacities
New business models                          New services
                                             Change in the value chain
                                             Digitization of processes
Customization                                Responding to needs
                                             Individualized services

Appendix See Table 2.

References

1. Chui, M., Manyika, J., Henke, N., Chung, R., Nel, P., Malhotra, S.: Notes from the AI frontier: insight from hundreds of use cases (2018)
2. Ivanov, S.H., Webster, C.: Adoption of robots, artificial intelligence and service automation by travel, tourism and hospitality companies: a cost-benefit analysis. In: Artificial Intelligence and Service Automation by Travel, Tourism and Hospitality Companies–A Cost-Benefit Analysis (2017)
3. Business Insider: 80% of businesses want chatbots by 2020. https://www.businessinsider.de/80-of-businesses-want-chatbots-by-2020-2016-12?r=US&IR=T. Accessed 20 Mar 2019
4. Calvo, R., D’Mello, S.: Affect detection: an interdisciplinary review of models, methods, and their applications. Trans. Affect. Comput. 1, 18–37 (2010)
5. Wolfangel, E.: Die Seele auf der Zunge. Die Zeit, pp. 27–28 (2008)
6. Picard, R.W.: Affective Computing. MIT Press (2000)
7. Holt, S., Jones, S.: Emotional intelligence and organizational performance: implications for performance consultants and educators. Perform. Improv. 44(10), 15–21 (2005)
8. Glaser, B., Strauss, A.: The Discovery of Grounded Theory: Strategies for Qualitative Research. Routledge, London and New York (2017)
9. Strauss, A., Corbin, J.: Grounded Theory Methodology: An Overview. In: Denzin, N., Lincoln, Y.: Handbook of Qualitative Research. Thousand Oaks (1994)
10. Thomson, S.B.: Sample size and grounded theory. JOAAG 5(1), 45–52 (2011)
11. Stadler, E.: Mündliche Befragung. In: Aeppli, J., et al.: Empirisches wissenschaftliches Arbeiten: Ein Studienbuch für die Bildungswissenschaften. utb GmbH (2014)
12. Grounded Theory Institut. http://www.groundedtheory.com/what-is-gt.aspx. Accessed 8 Feb 2019
13. Strauss, A., Corbin, J.: Basics of Qualitative Research: Procedures and Techniques for Developing Grounded Theory. Sage, Thousand Oaks, CA (1998)
14. Wirtschaftskammer Österreich: Digitalisierung der Wirtschaft. Bedeutung, Chancen und Herausforderungen. Stabsabteilung Wirtschaftspolitik, Wien (2016)
15. Chung, T.S., Wedel, M., Rust, R.T.: Adaptive personalization using social networks. J. Acad. Mark. Sci. 44(1) (2016)
16. Eschweiler, J., Tiefemann, M., Möller, T., Rameseder, S., Hechler, E.: Lernen aus Daten. Digitale Welt 3(1) (2019)
17. Chung, K., Park, R.: Chatbot-based healthcare service with a knowledge base for cloud computing. Clust. Comput. 22 (2019)

A Brief Review of Robotics Technologies to Support Social Interventions for Older Users

Daniela Conti, Santo Di Nuovo, and Alessandro Di Nuovo

Abstract In the last few decades, various studies demonstrated numerous robotics applications that can tackle the problem of the aging population by supporting older people to live longer and independently at home. This article reviews the scientific literature and highlights how social robots can help the daily life of older people, and be useful also as assessment tools for mild physical and mental conditions. It will underline the aspects of usability and acceptability of robotic solutions for older persons. Indeed, the design should maximize these to improve the users’ attitude toward the actual use of the robots. The article discusses the advantages and concerns about the use of robotics technology in the social context with a vulnerable population. In this field, success is to assist social workers, not to replace them. We conclude recommending that care benefits should be balanced against ethical costs.

1 Introduction

Robotics is a broad field covering different aspects of the creation and use of robots, from familiar technologies like automated vacuum cleaners and smart appliances to more advanced robots that look like humans or animals. While there is no universally accepted definition of a robot [1], they typically comprise “sensors” that gather information about the robot’s environment, for example by monitoring temperature; “actuators,” such as hoists, that provide physical motion to the robot in response to input from the sensors and controllers; and “controllers” that respond to data from the sensors and allow parts of one or more robots to operate together [2].

D. Conti (B) · A. Di Nuovo
Sheffield Robotics, Sheffield Hallam University, Sheffield S1 1WB, UK

S. Di Nuovo
University of Catania, Via Teatro Greco 84, 95124 Catania, Italy

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 A. Zimmermann et al. (eds.), Human Centred Intelligent Systems, Smart Innovation, Systems and Technologies 189, https://doi.org/10.1007/978-981-15-5784-2_18


While robots are typically thought to comprise all three components, sensors and actuators can also be employed on their own in social care, for example sensors that detect falls and actuators in the form of stairlifts. Robots can operate with varying levels of autonomy by making use of artificial intelligence (AI) and machine learning technologies [3]. A wide range of robotic technologies can be used in social care to provide physical, social, and cognitive assistance, and several studies report positive impacts on users’ education, mobility, mental health, and cognitive skills [4–10].

1.1 Social Robots as Support to the Social Care Systems

The promise of Artificial Intelligence (AI) and Robotics in health care offers substantial opportunities to improve patient and clinical team outcomes, reduce costs, and influence population health. Robots can free up time for caregivers, enabling them to focus on delivering a better service for care recipients. Meanwhile, more advanced robots can help to reduce loneliness and isolation by facilitating connectivity with friends and family, and even by simulating social interaction. Robots providing physical assistance have been shown to increase users’ autonomy and dignity by assisting with tasks like feeding, washing, and walking, and are being developed to support physiotherapy [11]. Prototypes of robotic toilets have also been developed that can raise, tilt, recognize the user, and adjust settings according to their preferences. Novel opportunities are provided by Socially Assistive Robots (SAR), which include robots that aid daily living activities, such as those that remind users when to take their medicine and those that detect and prevent falls [12]. SAR can also include robots designed to provide companionship, assist with loneliness and social engagement, monitor and improve well-being, and help educate preschool children [13, 14]. For instance, a pilot study conducted by Allen [15] found that the use of Amazon Echo resulted in a reduction in users’ self-reported feelings of isolation and loneliness. Specifically, in a sample of participants aged 16–93 (60% female, 40% male), 70% had Parkinson’s disorder, 16% had a visual impairment, and 14% were frail elderly. Results showed that 72% “agree Echo helps improve their life,” 68% “agree Echo helps maintain their independence,” 64% “agree Echo gives more access to information,” 62% “agree Echo helps them feel less isolated,” and 48% “agree Echo reduces their reliance on others” [15]. Also, robotic pets introduced in a UK care home were reported to bring happiness and comfort to residents. While much has been written about the potential uses of such technology, the development and use of robotics in social care is still relatively new and, currently, there is limited evidence of robotic technology being used in social care outside of some small-scale trials. Many of the robotic services developed for social care appear to be at the conceptual or design phase because of the technical limitations to the tasks that they can undertake. However, this might change with the increasing adoption of technology in social care and investment in robotics research.

A Brief Review of Robotics Technologies …

1.2 Social Robots and Ethical Issues

So far, most research on robots in social care has focused only on evaluating how well the technology functions, without really considering the deeper socio-economic impact. A key question is whether robots and robotic technology can integrate into existing social care environments and with current technology, or whether they will replace them altogether [16]. Currently, there are technical limitations on the tasks that robots can undertake. Crucially, increasing the use of robotics in social care will require training for existing staff so that they can work alongside the new technology. Analysts suggest that there may be more jobs in other sectors, such as for those with skills in robotics, including data analysts and programmers, but these skills are already in very high demand and it is unclear how the new positions can be filled. Furthermore, ethical issues have been raised about the degree to which robots could prevent people from engaging in risky behaviors, e.g., smoking; the extent to which robots could persuade users to do something they did not wish to do, like taking scheduled medication; and the potential for users to become dependent on robots, undermining their ability to do things for themselves and reducing their independence [17]. Other challenges to the use of robotics in social care include privacy, security, and legal and regulatory concerns [18]. However, these are similar to those we currently face when using smartphones and computers, and they can be addressed by applying similar solutions [19]. For example, robots capable of processing personal data are subject to the General Data Protection Regulation (GDPR), which requires “privacy-by-design,” whereby data protection safeguards are built into technology early on.

1.3 Acceptance or Acceptability Toward Robots

A component of robotics research has recognized the importance of understanding general attitudes toward robots, including social acceptance and social acceptability [20]. Some conceptualizations of social acceptance imply its importance for situations in which participants are exposed to specific robots, such as when they define it as “attitudes and perspectives about the socialness of the robot and…willingness to accept it as a social agent” [21] (p. 547). Social acceptance is also known to be culture-specific [21], and it is thus more frequently used to refer to relatively constant attitudes and general evaluations of robots as opposed to reactions in specific situations [22]. Busch and colleagues [23] divided acceptability into social acceptability, that is, how society views robots, and practical acceptability, that is, how people perceive a robot during or after interacting with it. The difficulty of distinguishing between general opinions about robots and perceptions during specific encounters with robots is evident in usage that equates acceptability with user acceptance [24]. Nam [25] suggests a

D. Conti et al.

clearer distinction regarding timing: “acceptability” as an evaluation before the use or implementation of robots, and “acceptance” as the evaluation after implementation. The literature suggests that acceptance is influenced by the psychological variables of individual users [26, 27] and by their social and physical environment [28]. The evidence on how gender, education, age, and prior computer experience affect anxiety and attitude toward robots presents a complex picture. Examining technology acceptance is closely related to the research fields of social acceptance and attitudes in general. In detail, the deployment of new technology with respect to social and human factors has been studied under the concept of technology acceptance [29], based on the theory of reasoned action [30]. Currently, robots are starting to become a part of working life in many areas, including journalism [31], agriculture [32], the military [33], medicine such as surgery [34], education [14, 35], and care [36, 37]. One factor influencing the attitude toward robots may indeed be concern over the risk of unemployment caused by robots [38], considering that certain occupations are at risk of being replaced by robots or other technology [39]. In recent decades, many studies have been conducted on the factors that can influence acceptance by potential users and on how such acceptance can be increased. Public attitudes toward robots are mixed. According to a Eurobarometer survey [40], Europeans (n = 26,751) generally have a positive view of robots, but they do not feel comfortable about having robots in domains such as caring for children, the elderly, and the disabled. In fact, 60% of Europeans surveyed thought robots should be banned from such care activities [40]. 
Specifically, robots are seen as technically powerful but potentially dangerous machines, which are mainly useful in space exploration, in military applications, and in industries where human beings are not present. For this reason, recent robotics research focuses on users’ attitudes and on usability and acceptability, aspects which are often not correlated [41–45]. The recent Eurobarometer survey also shows public concern about robots, a technology that “require careful management” (88%), and about robots replacing humans and stealing jobs (72%). However, the analysis underlines that attitude is related to exposure to information in the last year: those exposed to such information are more likely to have a positive view of artificial intelligence and robots (75% vs. 49% of those who have not been). Negative attitudes are shaped by people’s previous experience and expectations and may be indicated by their attitudes to computers and related technologies more generally [46]. Taipale et al. [47] specified further that people are reluctant to use robots in the fields of childcare and elderly care, leisure, and education. Nor did people favor robots for “jobs that require artistry, evaluation, judgement and diplomacy” [48]. On the other hand, a project by the Isle of Wight council suggested that robots were perceived more positively when they were designed to operate alongside people or with human input, because they are then less likely to replace human caregivers [49].

2 Social Robotics for Older People

The growing number of older people living alone and in need of care is one of the great societal challenges of the most developed countries (e.g., Japan, USA, Europe, and Australia) [50]. Indeed, high-income countries have the oldest population profiles, with more than 20% of the population predicted to be over 65 in 2050, when the number of citizens older than 80 will be three times what it is today. This is likely to increase social isolation and loneliness, which are associated with several health hazards, e.g., cognitive deterioration and increased mortality [51]. This is a challenge for social care systems, which are already struggling to meet the demand for assistance for vulnerable adults because of limited budgets and, moreover, the difficulty of recruiting new skilled workers. New technologies, in particular social robotics, are seen as a way to address human resource and economic pressures on social care systems. Humanoid robots can provide greater support to older people because they can pick things up, move around on their own, and offer a more natural, intuitive way of interacting, e.g., through gestures with the hands and arms. Usually, the more advanced humanoid platforms embed additional sensors and devices such as touchscreens to provide easier-to-use interfaces through multimodality: it has been observed that older users preferred to send commands to the robot using speech because they found the touchscreen difficult to use, whereas they liked to have visual feedback on screen when the robot was speaking. The availability of multiple ways of interacting is indispensable in the case of age-related hearing loss or visual impairments, which can reduce the ability of the elderly to interact [52]. 
The increasing evidence from scientific research is driving the growth of the robotics market focused on services for aging well, with robots increasingly available to assist and accompany older users. One of the most developed commercial examples is Mobile Robotic Telepresence (MRT) [53]: systems that incorporate audio and video communication equipment onto mobile robot devices which can be steered from remote locations. MRT systems facilitate social interaction between people by eliminating the need to travel while still providing a physical presence, which has a greater positive influence on the social perception of the interaction [54]. Thanks to MRT technology, relatives can visit their older family members more often, and social workers can engage more clients per day, especially in sparsely populated rural areas. An MRT robot can be steered from a simple smartphone app; meanwhile, the local user remains free while interacting with the pilot user, who can also use the robot to inspect the home. This freedom is particularly beneficial for people with disabilities, who can have difficulty reaching a phone. However, MRT systems still require a human operator for the social interaction, and the operator can be present only for a limited amount of time during the day; for the rest of the time, current MRT systems risk remaining just a modern piece of furniture with no use.

Another solution could be robot companions, which embody advanced Artificial Intelligence (AI) functionalities to conduct social interaction in complete autonomy. Such completely autonomous robots are not yet available on the market, but the underlying idea of a robot companion has been extensively investigated with pet-like robots, e.g., Aibo and MiRo, and with humanoid robots, e.g., Pepper and Care-O-bot, which resemble the shape of the human body. Pet robots are programmed with limited interaction abilities, but they have proved to be as effective as real pets, or even more so, in reducing loneliness [55] for the elderly in care homes, while overcoming the concerns about live animals. Humanoid robots are more ambitious systems, which support complex functionalities such as dexterous manipulation, advanced navigation, and a natural, more intuitive interface; thanks to the multimodality of the stimulation they provide, they can overcome some of the difficulties currently experienced, especially by the elderly [52]. Social robots can provide a solution for the aging population challenge, in particular by reducing social isolation and loneliness. Solutions like MRT systems or pet-like companions are already on the market and ready to be deployed. More sophisticated humanoid companions with humanlike social capabilities are being studied and seem a promising solution for more comprehensive quality care. Nevertheless, researchers and service providers must address public anxiety and make clear that the robots are being designed to improve productivity by assisting social workers, who will be supported in their work and not replaced. Moreover, the programmed autonomy of robots has to be limited, and humans must always be in full control so that any dangerous or accidental situation can be avoided. Scientific research is also exploring multi-robot systems to favor independent living and to improve the quality of life and the efficiency of care for older people. 
For instance, this was the case in the Robot-Era project [52], where a multinational European consortium of academic and industrial partners developed a range of complete advanced robotic services integrated into intelligent environments. The project conducted one of the largest experiments ever carried out using multiple service robots, developing eleven different services to support older users individually at home, or collectively in the building and outside. In summary, the experimental results [6] showed that robot companions can be effective at home as an instrument to help the family with care and in case of need (e.g., illness). Researchers are also exploring the use of multi-robot systems, which would enable more independent living for seniors because the robots are able to coordinate with each other to better perform their tasks, also outside the home.

2.1 Acceptance and Acceptability in Older People

The acceptance of robots by older people has been examined in many studies. However, older users have usually expressed an opinion without interacting directly with a robot, which is a strong limitation of these studies [6].

In one study, a robot was used as a physical exercise coach with 33 older participants [56]. The results showed that most of the users were pleased with the robot as an exercise motivator [56]. In another recent study with 16 adults, the acceptability of robots for partner dance-based exercise was investigated. The results suggested that the robot was perceived as useful, easy to use, and enjoyable [57]. In a study with 32 older participants, the authors [58] investigated how the human-likeness of a robot’s face could influence perceptions of the robot. However, no real robots were used in the study; the participants’ imagination was stimulated with robot images. The results, obtained with interviews and questionnaires, showed that older adults had a greater preference for a humanlike appearance in robots [58]. In the European Robot-Era project, the results of the experiments indicated that older participants were keen to accept robot companions at home as a way to help the family with their care [52]. Specifically, experiments were conducted in a domestic environment, a condominium, and outdoor areas. Eleven robotic services were provided by the Robot-Era system, and each service was tested by older adults who interacted directly and extensively with three robots to accomplish tasks [6]. The perception of usability, measured using the System Usability Scale [59], was very high (the median score for 67 users was 82 out of 100, above the cutoff score of 68) and significantly correlated (0.32; p < 0.05) with acceptability, measured using the Unified Theory of Acceptance and Use of Technology (UTAUT) questionnaire [60]. More specifically, perceived ease of use correlated with usability (0.50; p < 0.001) more strongly than intention to use did (0.31; p < 0.05) [52]. Moreover, the actual usability of the system influences the perception of ease of use only when the user has little or no experience, while expert users’ perception is related to their attitude toward the robot [6]. 
This finding should be analyzed more deeply because it may have a strong influence on the design of future interfaces for elderly–robot interaction, since the number of elderly people who possess and use technological devices is expected to grow. Finally, the authors suggest that a positive perception of the robots’ aesthetics could play a role in increasing the acceptance of robotic services by older users [6, 61].
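
As an illustration of how a usability figure like the one reported above is obtained, the System Usability Scale [59] is conventionally scored from ten items rated 1 to 5: odd-numbered items contribute (rating - 1), even-numbered items contribute (5 - rating), and the sum is multiplied by 2.5 to give a value between 0 and 100. The Python sketch below applies that standard rule. The function name `sus_score` and the three sets of answers are hypothetical, invented for illustration only; they are not data from the Robot-Era study, and only the cutoff of 68 is taken from the text.

```python
from statistics import median

SUS_CUTOFF = 68  # cutoff score quoted in the text


def sus_score(responses):
    """Return the 0-100 SUS score for one respondent's ten ratings (each 1-5)."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item ratings")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each rating must be an integer from 1 to 5")
    total = 0
    for index, rating in enumerate(responses):
        if index % 2 == 0:
            # Items 1, 3, 5, 7, 9 are positively worded: contribute rating - 1.
            total += rating - 1
        else:
            # Items 2, 4, 6, 8, 10 are negatively worded: contribute 5 - rating.
            total += 5 - rating
    return total * 2.5


# Hypothetical ratings from three imaginary respondents (illustration only).
participants = [
    [5, 1, 5, 1, 5, 1, 5, 1, 5, 1],
    [4, 2, 4, 2, 4, 2, 4, 2, 4, 2],
    [3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
]

scores = [sus_score(p) for p in participants]
median_score = median(scores)
above_cutoff = median_score > SUS_CUTOFF

print(scores, median_score, above_cutoff)
```

With these invented ratings the individual scores are 100, 75, and 50, so the median is 75, which lies above the cutoff of 68. The median is used here because the text reports a median score; any other summary statistic could be substituted.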

2.2 Social Robots as Assessment Tools

Considering that diagnosis is often affected by the evaluator’s subjective bias, some recent studies have investigated how social robots could support the clinician during psychological diagnosis. Studies indicated that the support of robots could lead to a more objective assessment, guarantee standardized administration and assessor neutrality, especially with regard to gender and ethnicity, and allow micro-longitudinal evaluations [44, 62]. Indeed, robots can be a useful tool for large-scale screening of cognitive functions. Such screening still requires further examination by clinical psychologists, who must always be responsible for the final diagnosis. This is possible if the robotic administration of a cognitive test is supervised by a professional expert [63, 64].

In a recent study, 21 elderly Italian participants were involved [44]. The aim was to compare the prototype of a robotic cognitive test with a traditional paper-and-pencil psychometric tool and to investigate the influence of personality factors and technology acceptance on the tests. The authors tested the validity of the robotic assessment conducted under professional supervision. Some factors, such as Anxiety (0.47; p < 0.05), Trust (–0.49; p < 0.05), and Intention to use (0.47; p < 0.05), were related to performance in the psychometric tests. Finally, the results showed a positive influence of Openness to experience on the interaction with the robot’s interfaces (0.58; p < 0.01) [44].

3 Conclusion

Though research into social robots is just beginning, we know so far that they can provide some solutions to the challenge of society’s aging population, and might also help in reducing social isolation and loneliness, if society is willing to adopt them. MRT systems and “pet” companions are already on the market. Humanoid companions are still being studied, but seem a promising solution for more comprehensive quality care [65]. In a recent review, Savela and colleagues [22] found that the social acceptance of robots is still a relatively new but growing field of research, as most of the 42 selected studies were published in the 2010s and focused on the fields of social and health care. The literature suggests that young people are more in favor than older people of using robots in caring [66]. Differences have also been found by gender, country, place of residence, and education, with males, city dwellers, and more educated people being more favorable. Besides the importance of psychosocial variables for user acceptance of social robots and technology in the context of everyday functioning, the level of psychosocial functioning could either hinder or promote robot acceptance [67]. The observation of an “uncanny valley,” a phenomenon in which highly humanlike entities provoke aversion in human observers, has had an important role in recent research [68]. To understand the uncanny valley and the visual factors that contribute to an agent’s uncanniness, the relationship between human similarity and people’s aversion toward humanlike robots has been studied by manipulating the agents’ appearances [69]. The authors showed a clear and consistent “uncanny valley,” and found that category ambiguity and atypicality provoke aversive responding, thus shedding light on the visual factors that drive people’s uneasiness [69]. 
Also, time and/or exposure to robots is unlikely to mitigate the “uncanny valley” effect, because no relationship exists between people’s aversion and any pre-existing attitudes toward robots [69]. The acceptance of robots in intervention and diagnostic evaluation will be essential for employing robots for social purposes, particularly with older users. However, it is evident that the research is still in progress and, as usual in the diffusion of innovation, success is mostly shown with early adopters. For this reason, future studies should focus on managing and acting upon adverse user responses to maximize the effectiveness of robots with the general population as well. Furthermore, longitudinal studies are needed to assess the long-term effects, positive and negative, on how older people perceive social robots. In conclusion, academics and service providers must address public anxiety and clarify that robots are designed to assist social workers, not to replace them. Indeed, before social robots can be fully integrated on a wider scale into practices and care homes, most robotics researchers sensibly recommend that care benefits should be balanced against ethical costs. As long as humans remain in full control to avoid any negative outcome, robots might well be the future of care.

References

1. Harper, C., Dogramadzi, S., Tokhi, M.O.: Developments in vocabulary standardization for robots and robotic devices. In: Mobile Robotics: Solutions and Challenges, pp. 155–162. World Scientific (2010)
2. Siciliano, B., Khatib, O.: Springer handbook of robotics. Springer (2016)
3. Conti, D., Di Nuovo, S., Cangelosi, A., Di Nuovo, A.: Lateral specialization in unilateral spatial neglect: a cognitive robotics model. Cogn. Process. 17, 321–328 (2016). https://doi.org/10.1007/s10339-016-0761-x
4. Di Nuovo, A., McClelland, J.L.: Developing the knowledge of number digits in a child-like robot. Nat. Mach. Intell. 1, 594–605 (2019). https://doi.org/10.1038/s42256-019-0123-3
5. Di Nuovo, A., Jay, T.: Development of numerical cognition in children and artificial systems: a review of the current knowledge and proposals for multi-disciplinary research. Cogn. Comput. Syst. 1, 2–11 (2019). https://doi.org/10.1049/ccs.2018.0004
6. Cavallo, F., Esposito, R., Limosani, R., Manzi, A., Bevilacqua, R., Felici, E., Di Nuovo, A., Cangelosi, A., Lattanzio, F., Dario, P.: Robotic services acceptance in smart environments with older adults: user satisfaction and acceptability study. J. Med. Internet Res. 20, e264 (2018)
7. Matarić, M.J.: Socially assistive robotics: human augmentation versus automation. Sci. Robot. 2, eaam5410 (2017)
8. Wood, L.J., Zaraki, A., Robins, B., Dautenhahn, K.: Developing kaspar: a humanoid robot for children with autism. Int. J. Soc. Robot. (2019). https://doi.org/10.1007/s12369-019-00563-6
9. Wang, N., Di Nuovo, A., Cangelosi, A., Jones, R.: Temporal patterns in multi-modal social interaction between elderly users and service robot. Interact. Stud. 20, 4–24 (2019)
10. Conti, D., Cattani, A., Di Nuovo, S., Di Nuovo, A.: Are Future Psychologists Willing to Accept and Use a Humanoid Robot in Their Practice? Italian and English Students’ Perspective. Front. Psychol. 10, 1–13 (2019). https://doi.org/10.3389/fpsyg.2019.02138
11. Bowling, A.: Quality of life: Measures and meanings in social care research. (2014)
12. Pedersen, I., Reid, S., Aspevig, K.: Developing social robots for aging populations: a literature review of recent academic sources. Soc. Compass. 12, e12585 (2018)
13. Belpaeme, T., Kennedy, J., Ramachandran, A., Scassellati, B., Tanaka, F.: Social robots for education: a review. Sci. Robot. 3, eaat5954 (2018)
14. Conti, D., Cirasa, C., Di Nuovo, S., Di Nuovo, A.: “Robot, tell me a tale!”: a social robot as tool for teachers in kindergarten. Interact. Stud. 21, 221–243 (2020)
15. Allen, M.: Alexa, Can You Support People With Care Needs? (2018)
16. Prescott, T.J., Caleb-Solly, P.: Robotics in social care: a connected care EcoSystem for independent living (2017)
17. Sharkey, A., Sharkey, N.: Granny and the robots: ethical issues in robot care for the elderly. Ethics Inf. Technol. 14, 27–40 (2012)

18. Leenes, R., Palmerini, E., Koops, B.-J., Bertolini, A., Salvini, P., Lucivero, F.: Regulatory challenges of robotics: some guidelines for addressing legal and ethical issues. Law, Innov. Technol. 9, 1–44 (2017). https://doi.org/10.1080/17579961.2017.1304921
19. Draper, H., Sorell, T.: Ethical values and social care robots for older people: an international qualitative study. Ethics Inf. Technol. 19, 49–68 (2017)
20. Krägeloh, C.U., Bharatharaj, J., Kutty, S., Kumar, S., Nirmala, P.R., Huang, L.: Questionnaires to measure acceptability of social robots: a critical review. Robotics 8, 88 (2019)
21. Charisi, V., Davison, D., Reidsma, D., Evers, V.: Evaluation methods for user-centered child-robot interaction. In: 2016 25th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), pp. 545–550. IEEE (2016)
22. Savela, N., Turja, T., Oksanen, A.: Social acceptance of robots in different occupational fields: a systematic literature review. Int. J. Soc. Robot. 10, 493–502 (2018)
23. Busch, B., Maeda, G., Mollard, Y., Demangeat, M., Lopes, M.: Postural optimization for an ergonomic human-robot interaction. In: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 2778–2785. IEEE (2017)
24. Salvini, P., Laschi, C., Dario, P.: Design for acceptability: improving robots’ coexistence in human society. Int. J. Soc. Robot. 2, 451–460 (2010)
25. Nam, T.: Citizen attitudes about job replacement by robotic automation. Futures 109, 39–49 (2019)
26. Stafford, R.Q., MacDonald, B.A., Jayawardena, C., Wegner, D.M., Broadbent, E.: Does the robot have a mind? Mind perception and attitudes towards robots predict use of an eldercare robot. Int. J. Soc. Robot. 6, 17–32 (2014)
27. Heerink, M., Albo-Canals, J., Valenti-Soler, M., Martinez-Martin, P., Zondag, J., Smits, C., Anisuzzaman, S.: Exploring requirements and alternative pet robots for robot assisted therapy with older adults with dementia. In: International Conference on Social Robotics, pp. 104–115. Springer (2013)
28. Wu, Y., Wrobel, J., Cornuet, M., Kerhervé, H., Damnée, S., Rigaud, A.-S.: Acceptance of an assistive robot in older adults: a mixed-method study of human–robot interaction over a 1-month period in the Living Lab setting. Clin. Interv. Aging 9, 801 (2014)
29. Venkatesh, V., Davis, F.D.: A theoretical extension of the technology acceptance model: four longitudinal field studies. Manage. Sci. 46, 186–204 (2000)
30. Fishbein, M., Ajzen, I.: Belief, attitude, intention, and behavior: an introduction to theory and research (1977)
31. Jung, J., Song, H., Kim, Y., Im, H., Oh, S.: Intrusion of software robots into journalism: the public’s and journalists’ perceptions of news written by algorithms and human journalists. Comput. Hum. Behav. 71, 291–298 (2017)
32. Suprem, A., Mahalik, N., Kim, K.: A review on application of technology systems, standards and interfaces for agriculture and food sector. Comput. Stand. Interfaces 35, 355–364 (2013)
33. Marchant, G., Allenby, B., Arkin, R., Barrett, E., Borenstein, J., Gaudet, L., Kittrie, O., Lin, P., Lucas, G., O’Meara, R.: International governance of autonomous military robots. Columbia Sci. Technol. Law Rev. 12, 272 (2010)
34. Palep, J.H.: Robotic assisted minimally invasive surgery. J. Minimal Access Surg. 5, 1 (2009)
35. Mubin, O., Stevens, C.J., Shahid, S., Mahmud, A. Al, Dong, J.-J.: A review of the applicability of robots in education. Technol. Educ. Learn. 1 (2013)
36. Di Nuovo, A., Conti, D., Trubia, G., Buono, S., Di Nuovo, S.: Deep learning systems for estimating visual attention in robot-assisted therapy of children with autism and intellectual disability. Robotics 7, 25 (2018). https://doi.org/10.3390/robotics7020025
37. Conti, D., Trubia, G., Buono, S., Di Nuovo, S., Di Nuovo, A.: Evaluation of a robot-assisted therapy for children with autism and intellectual disability. In: Annual Conference Towards Autonomous Robotic Systems, pp. 405–415. Springer (2018). https://doi.org/10.1007/978-3-319-96728-8_34
38. Manyika, J., Chui, M., Bughin, J., Dobbs, R., Bisson, P., Marrs, A.: Disruptive technologies: advances that will transform life, business, and the global economy. McKinsey Global Institute San Francisco, CA (2013)

39. Frey, C.B., Osborne, M.A.: The future of employment: how susceptible are jobs to computerisation? Technol. Forecast. Soc. Chang. 114, 254–280 (2017)
40. European Commission: Special Eurobarometer 382-Public Attitudes Towards Robots. Belgium, Brussels (2012)
41. Kanda, T., Miyashita, T., Osada, T., Haikawa, Y., Ishiguro, H.: Analysis of humanoid appearances in human–robot interaction. IEEE Trans. Robot. 24, 725–735 (2008)
42. Conti, D., Di Nuovo, S., Buono, S., Di Nuovo, A.: Robots in education and care of children with developmental disabilities: a study on acceptance by experienced and future professionals. Int. J. Soc. Robot. 9, 51–62 (2017). https://doi.org/10.1007/s12369-016-0359-6
43. Conti, D., Commodari, E., Buono, S.: Personality factors and acceptability of socially assistive robotics in teachers with and without specialized training for children with disability. Life Span Disabil. 20, 251–272 (2017)
44. Rossi, S., Santangelo, G., Staffa, M., Varrasi, S., Conti, D., Di Nuovo, A.: Psychometric evaluation supported by a social robot: personality factors and technology acceptance. In: 2018 27th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), pp. 802–807. IEEE (2018)
45. Broadbent, E., Stafford, R., MacDonald, B.: Acceptance of healthcare robots for the older population: review and future directions. Int. J. Soc. Robot. 1, 319–330 (2009). https://doi.org/10.1007/s12369-009-0030-6
46. European Commission: Special Eurobarometer 460-Attitudes Towards the Impact of Digitisation and Automation on Daily Life. Belgium, Brussels (2017)
47. Taipale, S., de Luca, F., Sarrica, M., Fortunati, L.: Robot shift from industrial production to social reproduction. In: Social Robots from a Human Perspective, pp. 11–24. Springer (2015)
48. Takayama, L., Ju, W., Nass, C.: Beyond dirty, dangerous and dull: what everyday people think robots should do. In: 2008 3rd ACM/IEEE International Conference on Human-Robot Interaction (HRI), pp. 25–32. IEEE (2008)
49. Isle of Wight Council: Social Care Digital Innovation Programme. Discovery phase report for exploring the potential for Cobots to support carers (2018)
50. Shishehgar, M., Kerr, D., Blake, J.: A systematic review of research into how robotic technology can help older people. Smart Health. 7, 1–18 (2018)
51. Steptoe, A., Shankar, A., Demakakos, P., Wardle, J.: Social isolation, loneliness, and all-cause mortality in older men and women. Proc. Natl. Acad. Sci. 110, 5797–5801 (2013)
52. Di Nuovo, A., Broz, F., Wang, N., Belpaeme, T., Cangelosi, A., Jones, R., Esposito, R., Cavallo, F., Dario, P.: The multi-modal interface of Robot-Era multi-robot services tailored for the elderly. Intel. Serv. Robot. 11, 109–126 (2018). https://doi.org/10.1007/s11370-017-0237-6
53. Kristoffersson, A., Coradeschi, S., Loutfi, A.: A review of mobile robotic telepresence. Adv. Hum. Comput. Interact. 2013, 3 (2013)
54. Li, J.: The benefit of being physically present: a survey of experimental works comparing copresent robots, telepresent robots and virtual agents. Int. J. Hum. Comput. Stud. 77, 23–37 (2015). https://doi.org/10.1016/j.ijhcs.2015.01.001
55. Robinson, H., MacDonald, B., Kerse, N., Broadbent, E.: The psychosocial effects of a companion robot: a randomized controlled trial. J. Am. Med. Dir. Assoc. 14, 661–667 (2013). https://doi.org/10.1016/j.jamda.2013.02.007
56. Fasola, J., Matarić, M.J.: A socially assistive robot exercise coach for the elderly. J. Hum. Robot Interact. 2, 3–32 (2013)
57. Chen, T.L., Bhattacharjee, T., Beer, J.M., Ting, L.H., Hackney, M.E., Rogers, W.A., Kemp, C.C.: Older adults’ acceptance of a robot for partner dance-based exercise. PLoS ONE 12, e0182736 (2017)
58. Prakash, A., Rogers, W.A.: Why some humanoid faces are perceived more positively than others: effects of human-likeness and task. Int. J. Soc. Robot. 7, 309–331 (2015)
59. Brooke, J.: SUS-A quick and dirty usability scale. Usability evaluation in industry. 189, 4–7 (1996)
60. Heerink, M., Kröse, B., Wielinga, B., Evers, V.: Of an interface robot and a screen agent by elderly users Categories and Subject Descriptors. People Comput. 430–439 (2009). https://doi.org/10.1163/016918609X12518783330289

61. Robot-Era project. http://www.robot-era.eu/robotera/
62. Varrasi, S., Di Nuovo, S., Conti, D., Di Nuovo, A.: A social robot for cognitive assessment. In: HRI’18 Companion: Conference on ACM/IEEE International Conference on Human-Robot Interaction, 5–8 March 2018, pp. 269–270. Chicago, IL, USA (2018). https://doi.org/10.1145/3173386.3176995
63. Di Nuovo, A., Varrasi, S., Lucas, A., Conti, D., McNamara, J., Soranzo, A.: Assessment of cognitive skills via human-robot interaction and cloud computing. J. Bionic Eng. 16, 526–539 (2019)
64. Varrasi, S., Lucas, A., Soranzo, A., McNamara, J., Di Nuovo, A.: IBM cloud services enhance automatic cognitive assessment via human-robot interaction. In: Carbone, G., Ceccarelli, M., Pisla, D. (eds.) New Trends in Medical and Service Robotics, pp. 169–176. Springer International Publishing, Cham (2019)
65. Dahl, T., Boulos, M.: Robots in health and social care: a complementary technology to home care and telehealthcare? Robotics 3, 1–21 (2014)
66. Hudson, J., Orviska, M., Hunady, J.: People’s attitudes to robots in caring for the elderly. Int. J. Soc. Robot. 9, 199–210 (2017)
67. Baisch, S., Kolling, T., Schall, A., Rühl, S., Selic, S., Kim, Z., Rossberg, H., Klein, B., Pantel, J., Oswald, F.: Acceptance of social robots by elder people: does psychosocial functioning matter? Int. J. Soc. Robot. 9, 293–307 (2017)
68. Mori, M.: Bukimi no tani [The uncanny valley]. Energy, 7(4) 33–35. (Translated by Karl F. MacDorman and Takashi Minato in 2005) within Appendix B for the paper Androids as an Experimental Apparatus: Why is there an uncanny and can we exploit it? In: Proceedings of the CogSci-2005 Workshop: Toward Social Mechanisms of Android Science, pp. 106–118 (1970)
69. Strait, M.K., Floerke, V.A., Ju, W., Maddox, K., Remedios, J.D., Jung, M.F., Urry, H.L.: Understanding the uncanny: both atypical features and category ambiguity provoke aversion toward humanlike robots. Front. Psychol. 8, 1366 (2017)

The Human–Robot Interaction in Robot-Aided Medical Care
Umberto Maniscalco, Antonio Messina, and Pietro Storniolo

Abstract This article deals with the problem of the interaction and engagement between humans and humanoid Robots in circumstances where it is essential to be sure that engagement has occurred and that it persists for the required time. In particular, in Robot-aided medical care (but also in other critical situations), it is essential to be sure that the flow of information between the human and the humanoid Robot effectively occurs between the actual patient and the Robot. Several of the Robot's sensory data streams are combined by a data fusion algorithm to provide robustness and stability in the Human–Robot interaction. The methodology described in this article has been realized on real Robots (Nao and Pepper) and implemented through Python scripts in ROS (Robot Operating System).

U. Maniscalco (B) · A. Messina · P. Storniolo
Istituto di Calcolo e Reti ad Alte Prestazioni - C.N.R., Human Robot Interaction Group, Via Ugo La Malfa, 153, Palermo, Italy
e-mail: [email protected]
URL: http://www.icar.cnr.it

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
A. Zimmermann et al. (eds.), Human Centred Intelligent Systems, Smart Innovation, Systems and Technologies 189, https://doi.org/10.1007/978-981-15-5784-2_19

1 Introduction

In Anglo-Saxon languages, the common meaning of the term engagement indicates marriage or a relationship between individuals with characteristics of stability and durability. In the field of Robotics, a definition of engagement used in numerous scientific articles, and first introduced by Sidner et al. [1], is “the process by which individuals in an interaction start, maintain and end their perceived connection to one another”. In the field of Human–Robot Interaction, engagement [2] is often thought of as a binary concept; that is, a subject is considered either wholly engaged or not engaged. However, human behavior shows different types of engagement and, to some extent, different intensities of engagement can be distinguished.

Regarding the types of engagement, we can make a first distinction based on the number of people engaged. In addition to the typical situation in which the engagement concerns only two subjects, circumstances may arise in which several people are engaged with one another in different roles that may also vary over time. In this case, it is what is called “affiliation” that determines the rules of engagement. Affiliation [3] represents the role that is acknowledged for each individual in the social group. If one member of the group is the speaker and the others are bystanders, then the engagement can be considered unidirectional. The speaker does not need to verify that all onlookers are engaged while continuing to talk. Nevertheless, every onlooker must be, in some form, engaged in understanding the speech. If, instead, the members of a group are on an equal footing, for example friends chatting with each other, the affiliations of speakers and listeners will vary with time and so will the engagement. In both these cases, the continuity of the engagement is not a determining element for communication: some members of the group could be distracted without the fundamental requirements for communication being lost.

In other cases, and in particular when only two subjects are involved, the continuation of the engagement is fundamental for the progress of communication. This is especially true in the case of a Robot employed in support of medical care. If the Robot has the task of making the patient perform a therapeutic act, it must have reasonable certainty that this action is executed. Although it may seem trivial to us humans, the Robot must first be sure that it has engaged the correct patient and that the engagement remains at a sufficient level for the duration of the assistance; then there should be feedback, either positive or negative, from the patient. In this article, we focus on the latter case, and we try to define both when the Robot can be considered engaged with the human and when this engagement must be considered interrupted.
As we will demonstrate, the concept of maintaining engagement cannot be taken literally. Here, we consider only the multi-modal information flow that goes from human to Robot. However, we have also developed and implemented a multi-modal information flow that goes from Robot to human, so that the human interlocutor also has sufficient basic information to decide whether an engagement exists and persists. If the state of engagement is indispensable during communication, the set of preliminary activities that the user must or can perform to attract the attention of the Robot and obtain the engagement is equally important. Likewise, the activities that the Robot must carry out to obtain the user’s attention are essential if the initiative to reach an engagement starts from the Robot. This paper also analyzes these preparatory activities for achieving engagement in the context of Robot-Aided Medical Care. The next section presents the sensory data and the communication channels that constitute the necessary elements for achieving and managing the engagement. In Sect. 3, we describe how sensory data can be merged into a suitable model and used to verify the conditions of the engagement and its persistence. Section 4 describes the ROS implementation, and Sect. 5 reports conclusions and some notes on future developments.


2 The Sensory Data

In relations among humans, each individual has an attention model through which he or she deduces whether the interlocutors are attentive and are following the speech. This model is not the same for all individuals. For example, it may be influenced by cultural or geographic aspects. Moreover, it may also vary slightly within the individual, depending on the social circumstances. Despite this variability, it is always based on a composition of some invariant elements. One of the most significant aspects that humans take into consideration during a social engagement is nonverbal behavior based on face-to-face interaction, through which they communicate quite a lot about purpose [4, 5]. Other fundamental concepts that humans use to manage social engagement are related to visibility (e.g., facial recognition and expression, body gesture), to audibility (e.g., voice, intonation, sound), and to the social distance that separates the interlocutors [6, 7]. Thus, humans decide, from time to time, whether engagement exists and persists based on the composition of multi-modal information.

In the case of Human–Robot engagement, to make the interaction as similar and natural as possible, we should try to reproduce the human attention model in the Robot as well. We therefore have to arrange the sensory data in a suitable model manageable by the Robot. Furthermore, we should try to make visible in the Robot the nonverbal signals that we usually perceive in our interlocutor. We wish to underline that the term Robot is used here to refer to anthropomorphic Robots or Robots with anthropomorphic capabilities. Thus, the Robot has auditory and visual abilities, and it can also measure, in some way, the distance that separates it from objects. Let us consider the Pepper and Nao humanoid Robots by SoftBank Robotics (footnote 1), used in our experiments. As Fig. 1 shows for the Pepper Robot (the same holds for the Nao Robot), the capabilities involved in the human model are available in the Robot. More specifically, the ability to measure the distance between the Robot and objects is entrusted to the sonars, specifically the one in the front position. The vision skills are provided by the RGB camera, and the audio-related abilities by the microphones and speakers.

Fig. 1 Pepper Robot. The ability to measure the distances is entrusted to sonar, the vision skills are made possible by the RGB camera, and the audio-related skills are made possible by the microphones and speakers

1 https://www.softbankRobotics.com/us/Robots.

2.1 Visual Information

By using its RGB camera, the Robot can acquire much of the information that is crucial to determine whether it can be considered engaged with the user. This information flows from the human to the Robot and therefore concerns the user. In more detail, the RGB camera provides information about the presence or absence of a human in front of the Robot, his or her name or ID, and the gaze direction (here, we use Boolean information to indicate whether the human is looking the Robot in the eye or not). The direction of the gaze has great importance both as a social signal and as an element of synchronization of the conversation [8]. From a theoretical point of view, it would be quite simple to merge these data to say that a specific user is in front of the Robot and is looking at it. However, this basic information varies over time and is noisy, so composing the single components produces an even more variable and noisy result. To overcome the variability and noise of the data, we do not use the instantaneous values of the three information flows; instead, we keep the recent values in a FIFO queue and evaluate its content.
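The FIFO-based filtering of a Boolean flow (for example, the gaze flag) might be sketched as follows. The text states only that the content of the queue is evaluated; the class name `BooleanFifoFilter`, the queue length of 5, and the majority-vote rule are illustrative assumptions, not the authors' implementation.

```python
from collections import deque


class BooleanFifoFilter:
    """Smooth a noisy Boolean signal with a fixed-length FIFO queue."""

    def __init__(self, size=5):
        # deque(maxlen=...) discards the oldest sample once the queue is full.
        self.samples = deque(maxlen=size)

    def update(self, value):
        """Store the newest sample and return the filtered value."""
        self.samples.append(bool(value))
        # Assumed rule: True only if more than half of the stored samples are True.
        return 2 * sum(self.samples) > len(self.samples)


gaze = BooleanFifoFilter(size=5)
readings = [True, True, False, True, True, False, False, False]
filtered = [gaze.update(r) for r in readings]
```

With this rule, a single spurious `False` does not flip the filtered value, while a sustained run of `False` readings eventually does.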

2.2 Proxemics Information

As previously mentioned, the social distance between two interlocutors is also an important element in determining whether an engagement exists between the two. Robots measure distances either via lasers or via sonars. Since sonars have a wider cone of irradiation, they are generally employed to measure distances from objects, even moving ones. Sonar measurements are often noisy and not very precise, so in this case too a filtering operation is necessary before using them to determine the social distance of an interlocutor. The instantaneous distance values are not used as-is; the median of the content of a FIFO queue of distance values is used instead.
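A minimal sketch of the median filter over a FIFO queue of sonar readings, assuming distances in metres. The class name `MedianDistanceFilter` and the queue length are hypothetical; the text does not specify them.

```python
from collections import deque
from statistics import median


class MedianDistanceFilter:
    """Median of the most recent sonar readings kept in a FIFO queue."""

    def __init__(self, size=5):
        self.readings = deque(maxlen=size)  # oldest reading is dropped when full

    def update(self, distance_m):
        """Add a new reading and return the median of the queue content."""
        self.readings.append(float(distance_m))
        return median(self.readings)


sonar = MedianDistanceFilter(size=5)
raw = [1.20, 1.18, 3.90, 1.22, 1.19]  # 3.90 stands for a spurious echo
smoothed = [sonar.update(d) for d in raw]
```

The median is a natural choice here because an isolated outlier such as the 3.90 m reading leaves the filtered distance close to the true value.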


2.3 Auditory Information

Auditory information is essential, and it alone is often enough to establish whether there is an engagement between two (or more) individuals. For example, think about a telephone conversation: as long as the audio channel carries information between one subject and another, we can say that there is an engagement. Conversely, prolonged silence will arouse suspicion in one of the two interlocutors that the engagement has, for some reason, terminated. In our model, we use a dual audio channel to establish the conditions of the engagement. The first audio channel uses an array of 4 microphones, which allows the direction of origin of the sound to be located with respect to the Robot frame. This channel is used to attract the attention of the Robot through auditory signals, and therefore it can be considered, as we will see, a proper tool to achieve engagement. The second audio channel allows real communication between the human and the Robot. In this case, we analyze the power of the audio signal. For each chunk into which the audio stream is divided, the root-mean-square (RMS) power is calculated, and if it exceeds a certain threshold t_r, the Robot considers that its interlocutor is speaking to it. Similarly, if, after activation of the audio channel, the RMS power of the chunks falls below the established threshold for a specific time t, then the Robot considers that its interlocutor has stopped talking to it.
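The thresholding on the second audio channel can be illustrated with the sketch below. It assumes that the RMS is computed over the sample amplitudes of each chunk and that all chunks have the same duration; the names `chunk_rms` and `SpeechDetector` and the numeric values are hypothetical.

```python
import math


def chunk_rms(samples):
    """Root-mean-square amplitude of one audio chunk."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


class SpeechDetector:
    """Tracks, chunk by chunk, whether the interlocutor is speaking."""

    def __init__(self, threshold, silence_timeout_s, chunk_duration_s):
        self.threshold = threshold                  # t_r in the text
        self.silence_timeout_s = silence_timeout_s  # t in the text
        self.chunk_duration_s = chunk_duration_s
        self.speaking = False
        self.silence_s = 0.0

    def process(self, samples):
        """Update the state with one chunk and return the speaking flag."""
        if chunk_rms(samples) > self.threshold:
            self.speaking = True
            self.silence_s = 0.0
        elif self.speaking:
            # Below threshold: accumulate silence until the timeout expires.
            self.silence_s += self.chunk_duration_s
            if self.silence_s >= self.silence_timeout_s:
                self.speaking = False
        return self.speaking


detector = SpeechDetector(threshold=0.1, silence_timeout_s=1.0, chunk_duration_s=0.5)
loud = [0.5, -0.5, 0.5, -0.5]
quiet = [0.01, -0.01, 0.01, -0.01]
states = [detector.process(c) for c in (quiet, loud, quiet, quiet, quiet)]
```

In this example a short pause (one quiet chunk) does not end the speech phase, whereas silence lasting the full timeout does.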

3 The Robot Model of Interaction

In our model of Robot-aided medical care, we use the Health Level Seven (HL7) protocol (footnote 2) as a knowledge base from which the Robot draws information for its tasks. HL7 is a set of international standards used to manage both clinical and administrative data. Its extensive use of XML to encode both information and exchange messages makes HL7 particularly suitable for use in our model. Thus, for each patient, we have a specific therapeutic protocol, coded in HL7, which requires the patient to perform a series of actions during the day. Although the HL7 protocol does not define a degree of importance (i.e., mandatory, suggested, optional, and so on), we have introduced this possibility for each act of therapy in our model. These actions can be essential, such as taking a life-saving pill, or they can be voluntary, such as checking the body weight. Our model can manage three levels of urgency in the management of therapeutic tasks: mandatory, suggested, and voluntary. The introduction of these levels of urgency does not compromise compliance with the HL7 standard: everything prescribed by the therapeutic plan encoded in HL7 can have a mandatory level. At the same time, it allows us to insert in the protocol activities not strictly related to the therapeutic protocol, such as entertainment activities used to study the mood of the patient.

2 http://www.hl7.org/.


Fig. 2 The task timeline that the Robot must perform during the day to manage each therapeutic act
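One possible in-memory representation of the task timeline of Fig. 2 is sketched below. The fields follow the task parameters named in this section (type, level of urgency, start time, expected duration, required feedback, type of feedback); the class and function names, the example tasks, and the ordering by start time are assumptions, and the parsing of the HL7 protocol is not shown.

```python
from dataclasses import dataclass
from datetime import time
from enum import Enum


class Urgency(Enum):
    MANDATORY = "mandatory"
    SUGGESTED = "suggested"
    VOLUNTARY = "voluntary"


@dataclass
class TherapeuticTask:
    task_type: str
    urgency: Urgency
    start: time
    expected_duration_min: int
    feedback_required: bool
    feedback_type: str = ""


def build_timeline(tasks):
    """Order the day's tasks by start time (assumed ordering criterion)."""
    return sorted(tasks, key=lambda task: task.start)


# Hypothetical tasks, for illustration only.
tasks = [
    TherapeuticTask("weight check", Urgency.VOLUNTARY, time(18, 0), 5, False),
    TherapeuticTask("take pill", Urgency.MANDATORY, time(8, 30), 10, True, "verbal"),
]
timeline = build_timeline(tasks)
```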

Therefore, starting from the therapeutic protocol encoded in HL7, the model builds a task timeline that the Robot must perform during the day to manage each therapeutic act (see Fig. 2). A set of parameters completes each of these tasks: type, level of urgency, start time, expected duration, required feedback (yes/no), type of feedback, and others. In Robot-aided medical care, if we consider the most severe case, in which the therapeutic act is labeled as mandatory, such as reminding the patient to take a life-saving drug and explaining how this should be done, the Robot must:
– be able to attract the patient’s attention in some way;
– be sure that the interlocutor is the patient and that the interlocutor does not change during the whole engagement period;
– be sure that what has been communicated has been received and completed by the interlocutor;
– require definite feedback from the patient that the therapeutic act has been performed as prescribed;
– be able to handle failure.

Figure 3 shows how our model manages and implements the search for the engagement by the Robot when a therapeutic activity must be performed. This moment coincides with the label “Start” in Fig. 3. The finite-state automaton consists of five states: “Waiting4Person”, “PersonRecongition”, “Call”, “Error”, and “Action”. The Robot switches from one state to another according to sensory events or when a timeout expires. From the “Waiting4Person” state, the Robot goes into the “Call” state if nobody is present. In this state, the Robot tries to attract the patient’s attention via an audio message. Once the audio message is finished, the Robot returns to the “Waiting4Person” state. In the case of a mandatory task, the Robot repeats the call procedure three times, waiting for some time before repeating the message. For the other levels of urgency, the attempts to call the patient’s attention are two or one, depending on the level.


Fig. 3 The finite-state automaton used to obtain the attention of the patient. The five states are Waiting4Person, PersonRecongition, Call, Error, and Action
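The automaton of Fig. 3, as described in this section, might be simulated along the following lines. This is one reading of the text, not the authors' code: observations are abstracted as `None` (nobody present) or a person's name, the result label "Succeeded" is hypothetical ("Aborted" is from the text), and treating the end of the observation sequence as a timeout is an assumption.

```python
from enum import Enum


class State(Enum):
    WAITING = "Waiting4Person"
    RECOGNITION = "PersonRecongition"  # spelled as in Fig. 3
    CALL = "Call"
    ERROR = "Error"
    ACTION = "Action"


# Number of attempts per urgency level (three, two or one, as in the text).
MAX_ATTEMPTS = {"mandatory": 3, "suggested": 2, "voluntary": 1}


def seek_patient(observations, patient, urgency="mandatory"):
    """Run the automaton over a sequence of observations.

    Each observation is None (nobody in front of the Robot) or the name
    of the person seen. Returns (result, visited_states).
    """
    limit = MAX_ATTEMPTS[urgency]
    calls = wrong_person = 0
    trace = [State.WAITING]
    for seen in observations:
        if seen is None:
            if calls == limit:  # every call has been used and nobody came
                trace.append(State.ERROR)
                return "Aborted", trace
            calls += 1
            trace += [State.CALL, State.WAITING]
            continue
        trace.append(State.RECOGNITION)
        if seen == patient:
            trace.append(State.ACTION)
            return "Succeeded", trace
        wrong_person += 1
        if wrong_person == limit:
            trace.append(State.ERROR)
            return "Aborted", trace
        trace.append(State.WAITING)  # the visitor was asked to call the patient
    trace.append(State.ERROR)  # assumption: running out of observations is a timeout
    return "Aborted", trace
```

For example, `seek_patient([None, "Bob", "Anna"], "Anna")` passes through one call and one failed recognition before reaching the "Action" state.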

At the end of the third iteration, if the Robot has not met anyone, the state changes to “Error”: the whole task is completed and the result is “Aborted”. Otherwise, if the patient (or another person) appears in front of the Robot, it switches to the “PersonRecongition” state. In this state, the Robot verifies whether the face coincides with that of the patient. If the person in front of the Robot is not the patient, the Robot asks that person to call the patient and then returns to the “Waiting4Person” state. This loop can be iterated three times before deciding that the task has ended with the “Aborted” result. Here too, for the other levels of urgency, the attempts are two or one, depending on the level. On the other hand, if the person who appears while the Robot is in the “Waiting4Person” state is recognized as the patient, the Robot enters the “Action” state, in which it can start performing its therapeutic task. From this state, the Robot always ends with a successful result. This result does not mean that the therapeutic task was successful, but only that the search for and the first contact with the patient were successful, which are the necessary conditions to start the therapeutic activity. This procedure allows the Robot to complete some of the activities previously described. In fact, this way, the Robot is
– able to attract the patient’s attention in some way;
– sure that the patient is in front of it;
– able to handle failure (in this case, the overall failure of the task).

Fig. 4 The finite-state automaton used to manage the engagement and the dialog with the patient

Let us now consider the case in which the Robot is certain that the patient is in front of it, so that the therapeutic act can begin. Figure 4 shows that the initial state, once the patient’s presence in front of the Robot has been established, is “Waiting4Eng”. In this state, the Robot is waiting to perceive an engagement with the patient. In the transition from the “Waiting4Eng” state to the “Dialog” state, the sensory data described in Sect. 2 are involved, in particular those described in Sects. 2.1 and 2.2. The triggering conditions of the engagement (see Fig. 5) are assessed continuously (at a certain frequency) as the logical AND of
– the persistence of the presence of a person in front of the Robot;
– the identification of the person in front of the Robot as the patient;
– the patient’s gaze directed toward the Robot;
– the presence of the patient at a distance from the Robot smaller than the established one.
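The four triggering conditions listed above can be combined as in the following sketch, assuming the inputs have already been filtered. The function name, the argument names, and the default maximum distance of 1.2 m are hypothetical; the text refers only to an established distance.

```python
def engagement_triggered(person_present, recognized_name, gaze_on_robot,
                         distance_m, patient_name, max_distance_m=1.2):
    """Logical AND of the four triggering conditions of the engagement."""
    return bool(
        person_present                       # a person persists in front of the Robot
        and recognized_name == patient_name  # the person is identified as the patient
        and gaze_on_robot                    # the patient's gaze is toward the Robot
        and distance_m < max_distance_m      # the patient is close enough
    )
```

For instance, `engagement_triggered(True, "Anna", True, 0.8, "Anna")` holds, while changing any single argument so that its condition fails makes the whole expression false.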

Fig. 5 The triggering conditions of the engagement

In our model, the achievement of engagement (and of the other significant states) is made explicit by the Robot through iconic information shown on the Robot tablet and through different eye colors [9]. This way, the patient has enough information to manage the engagement and, more generally, the interaction. Once the engagement has been reached, the state of the Robot is “Dialog”. From this state, depending on who begins to speak, the state switches to the “Robot Speaks” or “Patient Speaks” state. During any speech phase, the triggering conditions of the engagement are not taken into account; that is, we consider the engagement to hold as long as one of the two parties is talking. At the end of each speech phase, the state returns to “Dialog” and the conditions of engagement are re-verified as explained above. During these dialog phases, the patient receives information from the Robot on the therapeutic act to be received. The patient may also ask for further information and explanations about the therapeutic act. The dialog takes place in natural language, with the additional help of visual information shown on the Robot tablet. The Robot always tries to get feedback from the patient regarding both the understanding of what it has explained about the therapeutic act and the effective execution of the therapeutic act by the patient. The dialog in natural language has been developed by colleagues of CNR-ICAR, and the details of this critical part are described in [10, 11].
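The state transitions of the engagement and dialog automaton (Fig. 4) could be sketched as a single step function. The state names come from the text; the function signature is hypothetical, and returning to “Waiting4Eng” when the conditions fail in the “Dialog” state is an assumption, since the text says only that the conditions are re-verified.

```python
WAITING, DIALOG, ROBOT_SPEAKS, PATIENT_SPEAKS = (
    "Waiting4Eng", "Dialog", "Robot Speaks", "Patient Speaks")


def next_state(state, engaged, speaker=None):
    """One step of the automaton.

    engaged: outcome of the triggering conditions (ignored while speaking).
    speaker: "robot", "patient", or None when nobody is speaking.
    """
    if state == WAITING:
        return DIALOG if engaged else WAITING
    if state in (ROBOT_SPEAKS, PATIENT_SPEAKS):
        # During a speech phase the engagement is taken for granted.
        return DIALOG if speaker is None else state
    # state == DIALOG
    if speaker == "robot":
        return ROBOT_SPEAKS
    if speaker == "patient":
        return PATIENT_SPEAKS
    # Assumption: if the conditions no longer hold, go back to waiting.
    return DIALOG if engaged else WAITING


state = WAITING
visited = []
for engaged, speaker in [(True, None), (False, "patient"), (False, "patient"),
                         (False, None), (False, None)]:
    state = next_state(state, engaged, speaker)
    visited.append(state)
```

The example run shows that the patient looking away while speaking does not end the engagement; it is lost only after the speech phase ends and the re-verification fails.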

4 The ROS Implementation

The described methodology has been implemented as a set of Python scripts in ROS (Robot Operating System) running on real Robots (Nao and Pepper). Each information channel was implemented through a ROS topic that publishes the related information. We have implemented several topics, each of which reports one of the following:
– True, if there is a person in front of the Robot (otherwise False);
– True, if the interlocutor’s gaze is facing the Robot (otherwise False);
– the name of the person recognized in front of the Robot (otherwise “no name”);
– True, if someone is present within the established social zone (otherwise False).

All the information that flows through the topics is filtered by the process described in Sect. 2.1. This way, we have at our disposal a ROS node that subscribes to all these topics and obtains from them the information necessary to establish when engagement begins and whether it persists. As is easy to see, this implementation reflects the scheme of Fig. 5. Furthermore, as described from a theoretical point of view, in the ROS implementation too, if the audio channel is active, the engagement is always considered active regardless of the values of the previous topics.
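The fusion logic of such a node is sketched below in plain Python so that it runs without a ROS installation. The topic names and the class `EngagementMonitor` are hypothetical and are not taken from the paper; in a real rospy node, `on_message` would play the role of the callbacks registered with `rospy.Subscriber`, one per topic.

```python
class EngagementMonitor:
    """Caches the latest value of each topic and fuses them."""

    def __init__(self, patient_name):
        self.patient_name = patient_name
        # Hypothetical topic names with the default values described in the text.
        self.latest = {
            "/person_present": False,
            "/gaze_on_robot": False,
            "/person_name": "no name",
            "/in_social_zone": False,
        }
        self.audio_active = False

    def on_message(self, topic, value):
        """Store the newest (already filtered) value published on a topic."""
        if topic not in self.latest:
            raise KeyError(f"unknown topic: {topic}")
        self.latest[topic] = value

    def engaged(self):
        """True while the audio channel is active or all conditions hold."""
        if self.audio_active:
            return True
        values = self.latest
        return bool(
            values["/person_present"]
            and values["/gaze_on_robot"]
            and values["/person_name"] == self.patient_name
            and values["/in_social_zone"]
        )


monitor = EngagementMonitor("Anna")
for topic, value in [("/person_present", True), ("/gaze_on_robot", True),
                     ("/person_name", "Anna"), ("/in_social_zone", True)]:
    monitor.on_message(topic, value)
```

Keeping only the latest value per topic makes the engagement check independent of the rate at which each topic publishes.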

5 Conclusion

In this paper, we have addressed the problem of the interaction and the engagement between humans and humanoid Robots in Robot-aided medical care. This is an example of a circumstance where it is essential to be sure that a real engagement has occurred and has persisted for the right amount of time. The flow of information between the human and the humanoid Robot must occur between the actual patient and the Robot. Robustness and stability of the human–Robot interaction are provided by a data fusion algorithm that works on the Robot’s sensory data.

Acknowledgements This research was partially supported by the project AMICO (Assistenza Medicale In COntextual Awareness), with funding from the National Programs of the Italian Ministry of Education, Universities and Research (code: ARS01_00900).

References

1. Sidner, C.L., Lee, C., Kidd, C.D., Lesh, N., Rich, C.: Explorations in engagement for humans and Robots. Artif. Intell. 166(1–2), 140–164 (2005). ISSN 0004-3702
2. Ehrlich, S., Wykowska, A., Ramirez-Amaro, K., Cheng, G.: When to engage in interaction and how? EEG-based enhancement of robot’s ability to sense social signals in HRI. In: 2014 14th IEEE-RAS International Conference on Humanoid Robots (Humanoids). IEEE, pp. 1104–1109 (2014)
3. Bartl, C., Dorner, D.: Psi: a theory of the integration of cognition, emotion and motivation. In: Proceedings of the 2nd European Conference on Cognitive Modelling. DTIC Document, pp. 66–73 (1998)
4. Patterson, M.L.: Nonverbal Behavior. A Functional Perspective. Springer, New York (1983)
5. Cassell, J.: Nudge nudge wink wink: elements of face-to-face conversation for embodied conversational agents. In: Cassell, J., Sullivan, J., Prevost, S., Churchill, E. (eds.) Embodied Conversational Agents, pp. 1–18. MIT Press, Cambridge, MA (2000)
6. Rauterberg, M., Dtwyler, M., Sperisen, M.: From competition to collaboration through a shared social space. In: Blumental, B., Gornostaev, J., Unger, C. (eds.) Proceedings of the East-West International Conference on Human-Computer Interaction (EWHCI95), vol. II, pp. 94–101 (1995)
7. Hall, E.T.: Proxemics. Curr. Anthropol. 9, 83–108 (1968)
8. Argyle, M., Cook, M.: Gaze and Mutual Gaze. Cambridge University Press, New York (1976)
9. Miyauchi, D., Sakurai, A., Makamura, A., Kuno, Y.: Active eye contact for human–robot communication. In: Proceedings of CHI 2004–Late Breaking Results, vol. CD Disc 2, pp. 1099–1104. ACM Press, New York (2004)
10. Minutolo, A., Esposito, M., De Pietro, G.: A conversational chatbot based on kowledge-graphs for factoid medical questions. In: SoMeT, pp. 139–152 (September 2017)
11. Caggianese, G., De Pietro, G., Esposito, M., Gallo, L., Minutolo, A., Neroni, P.: Discovering Leonardo with artificial intelligence and holograms: a user study. Pattern Recognit. Lett. (2020)

Experiment Protocol for Human–Robot Interaction Studies with Seniors with Mild Cognitive Impairments
Gabriel Aguiar Noury, Margarita Tsekeni, Vanessa Morales, Ricky Burke, Marco Palomino, and Giovanni L. Masala

Abstract While assistive robotics (AR) has shown promise in supporting seniors with daily life activities and psycho-social development, the evaluation of AR systems presents novel challenges. From a technical point of view, reproducing human–robot interaction (HRI) experiments has been problematic due to the lack of protocols, standardization and benchmarking tools, which ultimately impairs the evaluation of previous experiments. In addition, working with seniors with cognitive decline presents a major design challenge for researchers, since the communication skills, state of mind and attention of participants are compromised. To address these challenges, this paper presents practical recommendations and a protocol for conducting HRI experiments with seniors with mild cognitive impairment (MCI).

G. Aguiar Noury (B) · M. Tsekeni · V. Morales · R. Burke · M. Palomino
University of Plymouth, Plymouth, UK
e-mail: [email protected]

G. L. Masala
Manchester Metropolitan University, Manchester, UK

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
A. Zimmermann et al. (eds.), Human Centred Intelligent Systems, Smart Innovation, Systems and Technologies 189, https://doi.org/10.1007/978-981-15-5784-2_20

1 Introduction

The worldwide elderly population is expected to reach approximately one billion in 2030 and 1.5 billion in 2050 [1]. This global population ageing phenomenon is increasing the burden on healthcare systems, which are looking for innovative solutions to satisfy this new demand while maintaining the quality and affordability of care delivery. One area of technology that shows promise in solving these challenges is assistive robotics (AR) [2]. AR assumes the primary role of providing help to carers or directly to patients. From automating physical tasks that a senior can no longer do to encouraging social behaviour, AR is a growing area of research with potential benefit for eldercare.

However, effective methods and tools to evaluate human–robot interactions (HRIs) with seniors with cognitive decline are lacking. We have adopted and modified methods of testing and evaluating robots from the field of human–computer interaction, but HRI is not identical [3]. Most importantly, designing HRI research studies that produce verifiable, reliable and reproducible results has been a major challenge in the last decade [4]. Among other reasons, this is because the experiment procedures have not been standardized [5].

Moreover, conducting studies with patients with cognitive impairment is always a challenge, even more so when considering psychological factors such as state of mind, concentration and technology dexterity (e.g., [6, 7]). Cognitive impairment is a common problem within the elderly population, with an occurrence rate of approximately 21.5 to 71.3 per 1,000 person-years in seniors [8]. Elderly people with cognitive impairments find it difficult to distinguish between simultaneous sensory stimulations and become confused easily. They also develop communication disorders, which hinder their ability to express their views. In addition, ageing can reduce the ability to see, hear and touch. All of these factors make it challenging for researchers to gather useful and unbiased data.

This highlights the need for developing common protocols as an open research issue in HRI with seniors with cognitive decline. Therefore, the objective of this paper is to propose a protocol for evaluating HRI with seniors with mild cognitive impairment (MCI). This study is for researchers working on live interactions, interaction with products of reduced functionality, mock-ups operated in Wizard-of-Oz mode and acted demos [9]. In Sect. 2, we describe the methodology used in this study. Then, Sect. 3 explores the design of the University of Plymouth’s Robot Home, a lab for the evaluation of HRI. Next, Sect. 4 presents practical guidelines for selecting senior participants with MCI. Section 5 explores the different techniques for collecting data during HRI experiments. Finally, Sect. 6 proposes a protocol for conducting HRI experiments with seniors with MCI. Conclusions and further research are presented in Sect. 7.

2 Methodology

This paper builds upon the lessons learned from setting up the University of Plymouth Robot Home lab for senior participants with cognitive decline, and from the twenty-two dropout sessions held during the EHealth Productivity and Innovation in Cornwall and the Isles of Scilly (EPIC) project [10]. The lab resembles a living room and follows care environmental guidelines derived from desk research and expert consultation with occupational therapists. The study does not focus on patients with severe dementia, hearing or visual impairments; this includes patients who suffer from hallucinations or a low consciousness level. Instead, this paper gives recommendations for working with participants with MCI, the “stage between the expected cognitive decline of normal ageing and the more serious decline of dementia” [11]. MCI can involve a deterioration of memory, attention and cognitive function that is greater than expected based on age and educational level. These subjects are becoming the focus of many studies and early intervention trials since MCI is about four times more common than dementia [12]. Moreover, this paper does not discuss ethical concerns since they have been covered in depth by previous studies [13].

3 An Example of Robotics Lab for HRI: The Robot Home

The Robot Home set-up started in May 2019 and finished in September 2019. It was funded by the Interreg 2 Seas Mers Zeeën Ageing Independently (AGE’In) project. The aim of the lab design was to create a facility that follows strict care environmental guidelines for the evaluation of HRI with seniors and vulnerable participants: a lab that allows researchers to evaluate the acceptability and usability of AR technologies while supporting the integration of third-party devices for the simulation of different scenarios related to smart homes and independent living. Figure 1 shows the lab, which resembles a living room: a relatable but secure place that reduces cognitive bias in experiment participants. The project had the support of two interior designers, one architect, one multimedia engineer and three occupational therapists from the University of Plymouth. The work was divided into four main activities:
• Market Study: to identify AR, sensors and IoT devices for the evaluation and enhancement of HRI.
• Multimedia Study: to identify the cameras, microphones and accessories needed and their location in the room.
• Care Environment Design: to generate and evaluate different concepts for the design of the lab.
• Implementation: including room adaptation and setting up the equipment.

Fig. 1 Robot Home, University of Plymouth. From left to right, the robots in the picture are AMY A1, NAO, and QBo One

The interior design team generated more than 25 different concepts. The most promising solutions were then evaluated by the architect and occupational therapist, and the design was refined through their feedback. The sofas in the room follow regulations on seat height and depth and on arm height, to allow participants to sit down and stand up without difficulty. In the same way, the colour pattern of the room was chosen to create a calm environment, but also to reduce light reflection and colour interference while tracking people’s faces. The lab’s carpet is non-slip and soft enough to provide some protection against injury from falls, for both people and robots, without interfering with the mobility of the systems. Finally, blinds were installed to control the amount of natural light that enters the room and to hide the cameras placed behind them.

For the data collection, the room has four GoPro Hero 7 cameras and four modified GoPro cameras with different lenses (Table 1). The cameras allow us to capture at 1440 resolution and 60 frames per second, with a 4:3 aspect ratio and a wide field of view. The integrated application of the cameras allows us to control and monitor several cameras in real time. The position of the cameras can be adjusted depending on the interaction setting, and they will support further studies. To analyse participants’ behaviour, the room has four Azure Kinect cameras for building computer vision and speech models (Table 1). The cameras provide instance segmentation, 2D key points and 3D joints, which gives fully articulated body tracking of multiple participants. In addition, through Azure cognitive services, researchers will be able to detect and identify people’s emotions during the experiments. Table 1 presents a list of the sensors used in the lab.
Table 1 Robot home sensors

Sensor                      Item            Description
Camera                      GoPro Hero 7    1440 resolution, 60 fps, 4:3 ratio
Body tracker                Azure Kinect    2D and 3D joint extraction
Microphone                  Azure Kinect    On-board microphone array
Facial/Emotion recognition  Azure Kinect    Azure cognitive services
Heart rate                  Apple Watch 4   ECG monitor
Room temperature            Microbot Alert  Temperature, humidity, air pressure, light intensity, noise level

In terms of robotic platforms, the room has a NAO robot (commonly used as an example of socially assistive robots [14]), the AMY A1 telepresence robot (a commercially available telepresence robot used to explore how RAS could address social isolation issues [15]) and the Qbo One robot (a research platform used for its potential as a robot companion at home [16]). These robotic platforms will allow researchers to conduct different studies. For the integration of smart devices, the room has both a Google Assistant and an Alexa hub. This is complemented by two smartphones, a Pixel 3a and an iPhone XR, that allow researchers to evaluate AR technologies that work with mobile phones. Wearable devices such as the Apple Watch Series 4 and the Samsung Galaxy Watch allow researchers to monitor participants’ resting heart rate (while the user is

Experiment Protocol for Human–Robot Interaction Studies …


not performing a physical activity), as a channel for gathering psychophysiological measurements. The room also has four smart switches that control the heating and air conditioning, to be used in home automation scenarios. The smart switches can also control indoor weather, light and noise sensors for further applications.

4 Guidelines for Participant Selection

It is important that researchers report the cognitive level of their experiment participants. However, identifying the cognitive level of seniors is a difficult task [17]. In the UK, only a general practitioner, or a specialist at a memory clinic or hospital, can diagnose MCI [18]. Therefore, before assessing the cognition of research participants, first review their cognitive impairment records. If these are not available, there are tools to assess the cognitive impairment of seniors, such as the General Practitioner Assessment of Cognition [19] or the Mini-Cog test [20]. These are short, validated tests that a researcher can use. Running these assessments provides a baseline for homogeneous samples.
• We recommend using the Montreal Cognitive Assessment (a screening tool broadly used for detecting MCI [21]). It takes around ten minutes to administer and it also assesses the attention and verbal fluency of the senior.
In the same way, it is important to measure the participants’ hearing and visual impairments, since these will influence the HRI.
• We recommend using the Hearing Handicap Inventory for the Elderly Screening Version [22] and the Amsler grid for assessing visual loss [23], which are tests that are easy to conduct and assess.

5 Guidelines for Data Collection

Traditionally in HRI studies, there are three main methods to collect participants’ feedback: self-report, behavioural and psychophysiological measures [4].

5.1 Self-report Measures

Self-report measures are among the most widely used methods in HRI. With questionnaires, researchers explore participants’ views on the robot’s appearance, the interaction and their overall satisfaction. These measures are easy to gather and analyse using simple statistical techniques.


• When using self-report questionnaires with seniors with MCI, researchers should take into consideration the time between the interaction and the assessment. We have seen that between 15 and 30 min after the interaction, some seniors tend to forget important details.
• MCI affects the communication skills of the participants, who are no longer able to describe in depth their feelings, attitudes and recommendations towards the technology being assessed.
Questionnaires must be carefully designed and tested before beginning an experiment. The following clinical questionnaires are a useful source for researchers:
• ICECAP-O [24] or the WEMWBS [25]: to assess seniors’ general wellbeing.
• CES-D [26] or the de Jong Gierveld scale [27]: to explore depression and loneliness in seniors.
• Duke Social Index or the Lubben Social Scale [27]: to assess social isolation.
• SF-36 questionnaire [28]: to study the impact of HRI on seniors’ general health.
It is useful for HRI researchers to explore these tools, since their wording and question structure allow researchers to frame their own assessments under the protocols of validated questionnaires.

5.2 Behavioural Measures

These measures focus on the conduct, functioning and actions performed by the participants during experiments. The data is gathered through video recording or researchers’ observations. Examples include valence and arousal, the time spent looking at the robot, the time spent mutually looking at a specific cue and the time spent in open interaction with the robot. The analysis of the data frequently requires independent coders. For instance, [29] and [30] are two of the many HRI studies that gather behavioural measures. Non-verbal communication is essential for evaluations. As ageing progresses, body language and physical contact become the main communication channel.
• Gestures, facial expression and body language can be recorded, and the video can be analysed by different coders.
• We recommend using an open coding system applied by independent researchers, with a five-point Likert scale assessing valence and arousal.
Arousal and valence scales can label the quality and intensity of affective body language across a large range of affective states [31], regardless of whether the participants are standing or seated. These scales are effective in describing a person’s affective behaviours during social interactions. Finally, valence and arousal have characterized experimental and clinical findings better than categorical models of emotion [32].
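When two independent coders rate the same video clips on a five-point scale, it can help to quantify how far their ratings agree before pooling them. The sketch below uses unweighted Cohen's kappa, which is an assumption on our part and not a statistic named in this protocol; the function name and the example ratings are hypothetical.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two coders rating the same clips."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both coders must rate the same non-empty set of clips")
    n = len(ratings_a)
    # Proportion of clips on which the two coders gave the same rating.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each coder's rating frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical valence ratings (1 = very negative, 5 = very positive) for eight clips.
coder_1 = [4, 4, 2, 5, 3, 1, 4, 2]
coder_2 = [4, 3, 2, 5, 3, 1, 4, 3]
print(round(cohen_kappa(coder_1, coder_2), 3))
```

Because Likert ratings are ordinal, a weighted variant that penalises near misses less than large disagreements may be a better fit; the unweighted form is shown only because it is the simplest to check by hand.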


5.3 Psychophysiology Measures

Psychophysiology measures focus on the interaction between the mind and body [33]. The most common measures used in controlled HRI experiments are electroencephalography, heart rate variability, skin conductance response, interbeat interval, blood pressure, respiratory sinus arrhythmia and electromyography.
• The use of psychophysiological measures is challenging when working with seniors. For instance, with electromyography techniques, locating the electrode placement and making sure that the appropriate amount of conducting gel or paste is used are difficult tasks with seniors. While some devices offer alternatives to traditional EEG (e.g., Emotiv EPOC+), these intrusive devices will only interfere with the results of the evaluation.
• We do recommend the use of smartwatches capable of reading resting blood pressure. Seniors have used watches before; thus, the technology will not overwhelm them.

6 Experiment Protocol

Performing short-term pilots in the field of HRI has a major shortcoming: participants are often excited about interacting with a robot for the first time (the novelty effect). In addition, due to the loneliness that residents experience at care homes, seniors are eager to interact with researchers and provide positive feedback [13]. On the other hand, uncertainty, drastic changes to the daily routine or loss of control deeply affect seniors with MCI. To address these issues, we propose a protocol focussed on four pillars (Fig. 2). During the initial work,
• First, contact the healthcare organization or family member regarding the ethical concerns of the study.

Fig. 2 Summary of the experiment protocol


• Then, schedule with the seniors’ caregivers an appropriate time for the experiment. At certain times, seniors are more lucid or in a better mood.
• One week before the experiment, ask the caregivers to talk with the seniors about the experiment, about the robot that is visiting them and about the day and hour when this will take place. If possible, ask the carers to show pictures or videos of the robot.
• Visit the site before the evaluation. Choose the room where the study will take place, select the research participants and have the first interaction with them.
On the day of the experiment, we recommend reducing contact with the participants while setting up:
• Do the initial set-up of the robot outside the experiment room.
• Once ready, let the participant or participants enter the experiment room and sit down or stand up according to the experiment design.
• Researchers should introduce themselves and explain the activity to the senior.
• Once the senior feels at ease, start the recording equipment to be used in the research.
At this point, the robot will be ready to enter the experiment room:
• Position it where the person can see it as clearly as possible. If the senior is sitting down, we recommend having the robot at the same level.
• Let the subject interact freely with the robot, allowing the participant to become familiar with the device while recording any feedback.
• During this initial interaction, allow the carers to be in the experiment room.
• Make sure that no application is running, except for those which are the focus of the experiment (i.e., pre-programmed apps from the manufacturers).
• If the participant gets upset or distressed to the point where the experiment cannot begin, take the robot out of the room, allow the carers to calm the participant down and, with their approval, repeat the robot introduction.
This initial interaction could take between one and five minutes, depending upon the senior’s engagement. If the senior’s level of consciousness is low, it is unlikely that the senior will react to the technology. Once this initial interaction has been completed,
• Run the designed experiment.
• The experiment time with each senior may vary, and rushing the senior will influence the evaluation.
• Immediately after concluding the interaction, with the robot still in the room, proceed with any self-report method chosen for the collection of data.
• Prompt seniors to elaborate on their answers by asking open-ended questions.
• Listen patiently and work through to the deeper questions of the evaluation. If the senior gets confused or upset by the question or by their communication skills, change the subject. The researcher can rephrase the question and ask it again later on.
• The researcher should build upon the participants’ answers so that the senior does not feel she/he is being interrogated.
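The session-day steps above are order-sensitive: recording starts only once the senior is at ease, and the robot introduction is repeated if the participant becomes distressed. A minimal sketch of a checklist that enforces this order follows. The step identifiers and the class are hypothetical paraphrases of the bullets, not part of any published tool.

```python
from dataclasses import dataclass, field

# Step names paraphrase the protocol bullets; the identifiers are hypothetical.
SESSION_STEPS = (
    "robot_set_up_outside_room",
    "participant_settled",
    "researchers_introduced",
    "recording_started",
    "robot_introduced",
    "free_interaction_done",
    "experiment_run",
    "self_report_collected",
)


@dataclass
class SessionChecklist:
    completed: list = field(default_factory=list)

    def next_step(self):
        """Return the next pending step, or None when the session is done."""
        if len(self.completed) < len(SESSION_STEPS):
            return SESSION_STEPS[len(self.completed)]
        return None

    def complete(self, step):
        """Mark a step as done, refusing steps taken out of order."""
        expected = self.next_step()
        if step != expected:
            raise ValueError(f"expected {expected!r}, got {step!r}")
        self.completed.append(step)

    def restart_introduction(self):
        """Participant distressed: the robot leaves and is introduced again."""
        if "robot_introduced" in self.completed:
            del self.completed[self.completed.index("robot_introduced"):]


checklist = SessionChecklist()
for step in SESSION_STEPS[:5]:
    checklist.complete(step)
checklist.restart_introduction()
print(checklist.next_step())
```

Keeping the checklist separate from any robot control code means it can be used on paper-based or tablet-based session logs alike.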


Finally, in long-term pilots, it has been reported that once the experiment has concluded, participants feel depressed due to the departure of the robot or the researchers. It is therefore unethical for the researcher to overlook this effect of the intervention. For pilots that last more than two weeks, we strongly recommend debriefing seniors on the last day of the intervention, explaining that the researchers and the robotic platform will leave.

7 Conclusion

The aim of the paper was to guide HRI researchers conducting experiments with seniors with MCI. Besides supporting researchers without prior training in clinical sciences, we raise awareness of the importance of experiment standardization. Conducting experiments with seniors with MCI is challenging. Communication deterioration, poor eyesight and hearing difficulties make the evaluation difficult. Seniors get easily irritated with voice recognition technologies, and new technologies can frighten them. On the other hand, evaluating how a senior reacts to and is affected by AR is a methodological conundrum. The senior could be happy due to the novelty effect of the robot, or because of the interaction with people. Equally, seniors could be upset due to the change in their daily routine, or confused by the presence of a robot. Researchers should follow guidelines to mitigate these effects. Using care questionnaires to assess our participants’ cognition and their hearing and visual impairments allows readers to understand and recreate our experiments. Clinical tests also help researchers to understand the wording and framing of questions that support our evaluations. Finally, the HRI community needs to establish methods for the first contact between AR and seniors, to ensure the integrity of the data collected and the seniors’ wellbeing. It is unethical for researchers to focus only on the technology, and not on the effect that they are directly having on the experiment participants. Every time a researcher enters a care establishment, they should follow a conduct protocol. To address these issues, this research has established recommendations presented as practical steps for researchers to follow. This is by no means an exhaustive list, and it will evolve with the state of the art and the new opportunities that AR will bring.
Acknowledgements Authors of this paper acknowledge the funding provided by the Interreg 2 Seas Mers Zeeën AGE’In project (2S05-014) to support the work in the research described in this publication.


References

1. Lee, Y., Hwang, J., Lim, S., Kim, J.T.: Identifying characteristics of design guidelines for elderly care environments from the holistic health perspective. Indoor Built Environ. 22, 242–259 (2013)
2. UK-RAS: Robotics in Social Care: A Connected Care EcoSystem for Independent Living (2017)
3. Kidd, C., Breazeal, C.: Human-robot interaction experiments: lessons learned. In: Proceeding of AISB’05 Convention (2005)
4. Bethel, C.L., Burke, J.L., Murphy, R.R., Salomon, K.: Psychophysiological experimental design for use in human-robot interaction studies. In: Proceedings of the 2007 International Symposium on Collaborative Technologies and Systems, CTS, pp. 99–105 (2007)
5. EURobotics: Strategic Research Agenda For Robotics in Europe 2014-2020. IEEE Robot. Autom. Mag. 24, 171 (2014)
6. Desideri, L., Ottaviani, C., Malavasi, M., di Marzio, R., Bonifacci, P.: Emotional processes in human-robot interaction during brief cognitive testing. Elsevier (2019)
7. Di Nuovo, A., Varrasi, S., Lucas, A., Conti, D., McNamara, J., Soranzo, A.: Assessment of cognitive skills via human-robot interaction and cloud computing. J. Bionic Eng. 16, 526–539 (2019)
8. Tricco, A.C., Soobiah, C., Lillie, E., Perrier, L., Chen, M.H., Hemmelgarn, B., Majumdar, S.R., Straus, S.E.: Use of cognitive enhancers for mild cognitive impairment: protocol for a systematic review and network meta-analysis. Syst. Rev. 1, 25 (2012)
9. Xu, Q., Ng, J., Tan, O., Huang, Z., Tay, B., Park, T.: Methodological issues in scenario-based evaluation of human-robot interaction. Int. J. Soc. Robot. 7, 279–291 (2015)
10. Jones, R., Asthana, S., Walmsley, A., Sheaff, R., Milligan, J., Paisey, M., Aguiar Noury, G.: Developing the eHealth sector in Cornwall, Plymouth (2019)
11. Petersen, R.C., Smith, G.E., Waring, S.C., Ivnik, R.J., Tangalos, E.G., Kokmen, E.: Mild cognitive impairment. Arch. Neurol. 56, 303 (1999)
12. Eshkoor, S.A., Hamid, T.A., Mun, C.Y., Ng, C.K.: Mild cognitive impairment and its management in older people. Clin. Interv. Aging 10, 687–693 (2015)
13. Berghmans, R.L.P., Meulen, R.H.J.T.: Ethical issues in research with dementia patients. Int. J. Geriatr. Psychiatry 10, 647–651 (1995)
14. Comito, C., Caniot, M., Lagrue, E., Coignard, P., Fattal, C.: Psychological and symbolic determinants relating to the first meeting with a humanoid robot. Ann. Phys. Rehabil. Med. 59, e87 (2016)
15. Robot Center: Amy A1 Robot–Collaborative Robotics. https://www.robotcenter.co.uk/products/amy-a1-robot. Accessed 11 Feb 2020
16. Corpora: An Interactive Open Source Robot for Kids, Developers and Eldercare. http://thecorpora.com/. Accessed 11 Feb 2020
17. Alzheimer’s Society: Assessing cognition in older people: a practical toolkit for health professionals, pp. 1–2 (2016)
18. NHS Choices: Tests for diagnosing dementia-Dementia guide-NHS Choices. https://www.nhs.uk/conditions/dementia/diagnosis-tests/. Accessed 24 Jan 2020
19. Dementia Collaborative Research Centre-Assessment and Better Care: GPCOG|Home. http://gpcog.com.au/. Accessed 24 Jan 2020
20. Hartford Institute for Geriatric Nursing: Mental status assessment of older adults: the mini-cog. Alzheimer’s Dement. 13, 325–373 (2017)
21. MoCA: MOCA Montreal Cognitive Assessment (2004)
22. Servidoni, A.B., Conterno, L. de O.: Hearing loss in the elderly: is the hearing handicap inventory for the elderly-screening version effective in diagnosis when compared to the audiometric test? Int. Arch. Otorhinolaryngol 22, 1–8 (2018)
23. Schuchard, R.A.: Validity and interpretation of amsler grid reports. Arch. Ophthalmol. 111, 776–780 (1993)


24. Makai, P., Brouwer, W.B.F., Koopmanschap, M.A., Nieboer, A.P.: Capabilities and quality of life in Dutch psycho-geriatric nursing homes: an exploratory study using a proxy version of the ICECAP-O. Qual. Life Res. 21, 801–812 (2012)
25. Tennant, R., Hiller, L., Fishwick, R., Platt, S., Joseph, S., Weich, S., Parkinson, J., Secker, J., Stewart-Brown, S.: The warwick-edinburgh mental well-being scale (WEMWBS): development and UK validation. Health Qual. Life Outcomes 5, 63 (2007)
26. Radloff, L.S.: The use of the center for epidemiologic studies depression scale in adolescents and young adults. J. Youth Adolesc. 20, 149–166 (1991)
27. de Jong Gierveld, J., van Tilburg, T.G.: Social isolation and loneliness. In: Encyclopedia of Mental Health, 2nd edn., pp. 175–178 (2016)
28. Ware, J.E., Sherbourne, C.D.: The MOS 36-item short-form health survey (Sf-36): I. Conceptual framework and item selection. Med. Care. 30, 473–483 (1992)
29. Sidner, C.L., Kidd, C.D., Lee, C., Lesh, N.: Where to look: a study of human-robot engagement. In: Proceedings of the 9th International Conference on Intelligent User Interface-IUI’04, pp. 78–84 (2004)
30. Breazeal, C., Kidd, C.D., Thomaz, A.L., Hoffman, G., Berlin, M.: Effects of nonverbal communication on efficiency and robustness in human-robot teamwork. In: 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS, pp. 383–388 (2005)
31. Kapoor, A., Burleson, W., Picard, R.W.: Automatic prediction of frustration. Int. J. Hum. Comput. Stud. 65, 724–736 (2007)
32. Posner, J., Russell, J.A., Peterson, B.S.: The circumplex model of affect: an integrative approach to affective neuroscience, cognitive development, and psychopathology. Dev. Psychopathol. 17, 715–734 (2005)
33. Stern, R.M., Ray, W.J., Quigley, K.S.: Psychophysiological Recording (2012)

Designing Robot Verbal and Nonverbal Interactions in Socially Assistive Domain for Quality Ageing in Place

Ioanna Giorgi, Catherine Watson, Cassiana Pratt, and Giovanni L. Masala

Abstract Endowing robots with the role of social assistance in silver care could be a powerful tool to combat chronic loneliness in ageing adults. These robots can be tasked with functional and affective care to support quotidian living and provide companionship that helps lessen the burden of cognitive decline and impairment emerging from social isolation. To accomplish such imperative tasks, artificial agents must be adept at communicating naturally with the human elder. In this work, we aim to enable human–robot interaction by designing human-like verbal and nonverbal behaviours of an autonomous robot companion. We employed the robot in a trial run using customisable algorithms to address a range of needs, while fostering social and emotional attachment with the potential senior user, with the final intent being that such endeavours can help achieve quality ageing in place.

I. Giorgi
The University of Manchester, Manchester, UK

C. Watson · C. Pratt · G. L. Masala (B)
Manchester Metropolitan University, Manchester, UK
e-mail: [email protected]

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
A. Zimmermann et al. (eds.), Human Centred Intelligent Systems, Smart Innovation, Systems and Technologies 189, https://doi.org/10.1007/978-981-15-5784-2_21

1 Introduction

Robots are increasingly becoming important technical resources. Their ability to understand and communicate in a human-like way allows them to act as social actors and to assist humans in their households, communicating directly with smart devices and in turn being understood by humans and their social networks via friendly interactions. Several demographic studies have reported a fast-ageing population in advanced economies across the world, often referred to as “the silver tsunami” [1], leading to greater attention to designing artificial companions for social purposes and elderly care. Other studies have shown an increasing number of elderly people living alone; although the human lifespan is increasing, for most people quality of life is decreasing, as elderly people suffer from isolation, decreased mobility, impaired vision and hearing, memory loss and several mental health issues [2]. There is a paramount


I. Giorgi et al.

need for technological companions that can help society overcome these obstacles surrounding our elderly neighbours, enabling them to age well in their homes autonomously for longer, combat loneliness and carry out their everyday routines. While seminal contributions have been made in terms of social robots for aged care, certain significant problems remain. On the one hand, the existing platforms and solutions are not yet fully commercially available or can only be accessed at a high cost. Furthermore, most come with pre-programmed services, the range of which is often too narrow to meet the constantly changing needs of elderly people [3]. On the other hand, not much is yet known about what influences the way elderly people perceive social robots and to what extent they welcome them in their homes, in particular robots at an early developmental stage [4]. Most works regarding aged care address elderly residents who suffer from cognitive or physical impairment [5–17], focusing less on the risk of them being isolated and lonely. With this work, we aim to contribute to continuously improving the quality of life of our senior neighbours via programmable social robot companions, by designing simple agent behaviours that account for several requirements and challenges in the elder’s everyday life. We propose the use of an autonomous socially assistive robot to interact naturally with senior end-users and proactively engage them in easy quotidian activities. We trained the agent using simple algorithms that can be further customised and personalised to meet a range of specific needs of the targeted user. The rest of the paper is organised as follows. In Sect. 2, we briefly review the most relevant contributions in the field of social robotics for eldercare. Section 3 introduces the proposed method, which employs the humanoid robot NAO as the agent caregiver, followed by a detailed description and planning of the designed activities. The laboratory-based trials and the estimated behaviour of the autonomous agent are given in Sect. 4. The work is concluded in Sect. 5.

2 Social Robots for Aged Care

Advances in robotics research and human–robot interaction have provided significant solutions for important challenges of ageing populations, acting as an aid in treatment [5], mental health, particularly dementia [6–9], and physical health and therapy [10, 11], and as social assistants and home companions. Pet-like prototypes such as iCat, AIBO, Paro and NeCoRo have proven to be helpful when trialled with the targeted elderly people, in particular those suffering from cognitive impairments such as dementia. Trials and preliminary results have reported increased levels of happiness, pleasure and engagement, whilst significantly reducing the sense of restlessness, agitation and loneliness in older adults [9, 12–14]. Social and mental commitment robots have also been designed, including Matilda, an assistive robot able to recognise voices, faces and emotions among users and to provide services like dancing, telling the news and weather or even making Skype calls [8, 15]. Matilda (also called PaPeRo) showed positive engagement in residential senior care [15] and home-based care [16]. Other robots, like Mabu, are

Designing Robot Verbal and Nonverbal Interactions …


designed to aid the elderly with regular intake of medications, learning their use and side effects, remembering dispensing schedules and setting reminders [5]. A core concern when designing artificial companions is the simulation of social interaction. Many of the existing prototypes for aged care lack human-like attributes, such as human voice, gestures or emotion [4]. Researchers aim to have assistive robots that convey compassionate qualities [17], with more studies finding that elderly individuals are more likely to interact with a human-like (more child-like) or animal-resembling agent than with a screen [17]. Furthermore, robots should be able to learn the behavioural patterns of individual patients and adapt to future encounters [6]. To improve the support offered in aged care, humanoid social robots have been designed and broadly used to support the physical and emotional well-being of elderly people, including NAO, Pepper, Brian 2.1, Hobbit, Bandi, Nexi and Matilda [15, 16, 18–21]. Focusing mainly on the emotional well-being of the elderly, these socially assistive robots have shown promising improvements both in older adults with cognitive decline who suffer from loneliness or isolation and in cognitively impaired patients, in particular residents with dementia. For instance, one study aimed to use Pepper as a companion robot that can autonomously understand when an elderly person is in need of attention and willing to start an interaction, and that can answer this need by initiating the conversation. The research targeted loneliness, among other things, which is experienced not only when living alone but also in care homes, where it is more challenging to detect. It revealed the potential of the approach, with the agent able to take the right decision 59% of the time by the end of the experiments [22]. Positive impacts have also been reported from the use of socially assistive robots to detect early signs of mild cognitive impairment by guiding users through cognitive tasks. The prototype developed within the MoveCare project was used to evaluate the acceptability and usability of such agents, with promising results showing that older people were more open to the guidance of a robot, understanding and comfortably accepting its supervision instead of a clinician’s [23]. Other studies using social robots showed that NAO robots can improve communication among older people [24], the Hobbit robot helped the elderly engage more in their daily activities [18], the Brian 2.1 robot could assist them in social activities [19] and the Kabochan Nodding Communication robot showed satisfactory improvements in executive and memory functions when used to assist in cognitive activities [25].

3 Method

3.1 The Robotic Caregiver

A humanoid robot such as NAO is preferred over virtual assistants for silver care because it provides a powerful tool to combat loneliness: the interactions are more personable and there is a physical presence with whom the users


can create social and emotional attachment. Humanoid robots have many human-like attributes, such as a human voice and gestures, and can perform similar activities like walking around, pointing, grasping and so on. Therefore, they can offer more support to the people in their care, as they can assist with or carry out tasks for them, which is especially helpful for elderly people with reduced mobility.

NAO’s Specifications. The academic version of the robotic platform features 25 degrees of freedom (DoF), an inertial measurement unit with an accelerometer and a gyrometer, 4 directional microphones that enhance the sound source location system, a stereo broadcast system equipped with 2 loudspeakers, 2 HD cameras for computer vision, head, hand and foot sensors and bumpers, and four ultrasonic sensors that provide NAO with stability and positioning within space [26]. The version used here includes strong metallic joints, improved grip and a speech recognition module with 3 languages (English, French and Japanese). The robot is controlled by a specialised Linux-based OS, dubbed NAOqi [26]. The interface for communication with the user consists of a graphical programming tool named Choregraphe [26] (the latest release, 2.8.5.10, was used in this work), a simulation software package and a software developer’s kit.

NAO for (Health)Care. The robot is a powerful ambassador in healthcare advocacy, raising disease awareness and helping patients with disease management, and it has largely become an integrated caregiver in healthcare institutions. Typical use cases of NAO in healthcare [27] include: Health Assistant, aiding patients with self-diagnosis, telemedicine and information distribution (alerts, fall detection); Communicator, in prevention care and community activities with children and the elderly, given its multi-language fluency and translation via Cloud Services; Edutainment, supporting people’s physical and mental well-being through exercise and entertainment, while advocating educational healthcare content; Receptionist, assisting human staff in welcoming visitors and patients and managing queues, registration and FAQs; and Service Provider, for example consultations, generating healthcare reports on demand and counting [27].

3.2 Experimental Design

To fulfil the aim of this work, we design and test some functionalities of a social robot (NAO), mainly in two areas: (a) disease management, helping the users with the daily intake of their medicines, setting reminders and providing descriptions and side effects of the medicine; (b) entertainment, not only for the purpose of combating loneliness but also to support independent living, by helping users perform basic tasks such as making phone calls, navigating the web and other uses of technology in their homes. The experiment was designed as a one-to-one interaction between the end-user and the robot in a relatively realistic environment. The human participant role-played the potential elderly person who receives care from the robotic agent through a series of activities. The tasks to be performed in each scene (see Fig. 1) were divided into three categories:


Fig. 1 Setting of the experimental set-up: the participant engages in role-play activities with the robot given three scenarios: (a) interaction in the form of verbal conversations or performing an exercise routine, (b) managing regular intake of medications by reminding, describing or pointing at the correct scheduled medicine and (c) assisting the user to access and navigate a webpage

1. The agent engages with the human in simple verbal interactions and/or in activities with a social or healthcare context (singing, exercising) to help combat loneliness.
2. The artificial agent assists the seniors in managing their medications, remembering dispensing schedules and setting reminders, or helping the less able elder take the right medicine.
3. The agent helps the elderly understand and use different forms of technology in their home, such as making phone calls with their loved ones, navigating the web, using their mobiles/tablets/computers and so on.

Each of the above tasks determined the behaviour of the robotic caregiver and the actions it performed: speech recognition and generation, face and object recognition, static movement or moving in the direction of a target (e.g. towards the elderly person) and pointing.

Interaction activity. NAO begins the interaction with the elderly owner by calling their attention with a greeting, e.g. “Hello! How are you feeling today?” If the response indicates a positive feeling, the robot expresses enthusiasm (“That’s great!”); otherwise, it takes further action to reverse the situation by offering a choice of different options. The elderly person can decide if they want to talk about a certain topic (“Do you want to hear a joke?”, “Do you want me to tell you a story?”), if they want to exercise or if they prefer to do nothing instead.

Movement. In the event of an exercise request, NAO will ask the elderly person to follow along while it performs an easy exercise routine that is beneficial for awakening and stretching their muscles, acknowledging their efforts and encouraging them throughout via verbal interaction. The routine can be personalised for the elderly person, and researched before being performed by the robot, to better suit the physical state and ability of the user. During the routine, NAO speaks (and listens) to the human, prompting the next moves and asking if they are feeling well. If not, NAO


will cease the routine. The routine can also be interrupted if the participant decides to stop.

Facial expression recognition. Finally, if the participant chooses the option “Do nothing right now”, the robot respects their decision; however, it may monitor their emotional state through facial expression recognition and can trigger an interaction if it detects a bad mood (assuming the elderly person remains within NAO’s field of view).

Medication management activity. NAO remembers a list of medicine prescriptions, along with the dose and the dispensing schedule for each medicine. It sets and monitors reminders to help the elderly take their medication regularly. When a reminder is due, NAO asks the elder about their medicine intake: “Have you taken your medicine today?” Should the answer be negative, the robot offers further assistance. Given the time slot, the agent encourages the elder to take their medicine, adding some information on the type of medicine, e.g. “Time to take the pill for diabetes”, and asks “Would you like me to help you choose which medicine?”

Object recognition and pointing. Depending on the user’s decision, the robot performs one of the following actions: (a) nothing, because the user can take the medicine independently; (b) describe the correct medicine box; or (c) point to the correct medicine box for the elder to grasp. The last option can only be performed if the boxes are located in front of the robot (e.g. the robot asks the human to put the boxes on a table in front of it). Regardless of the triggered behaviour, there is always some verbal interaction (e.g. “Remember to take a glass of water”) to convey a sense of care for the elder.

Technology assisting activity. The main designed activities consist of helping the human participant navigate a certain webpage or make a VoIP call from a messenger-like application. NAO is trained to recognise the elder in its care and the basic features of a webpage, such as the URL/search bar, close button and menu button, or the messaging interface, contacts list and call button. The basic working mechanism of the Web navigation task is illustrated in the flowchart in Fig. 2. Initially, a device, preferably a mobile, tablet or computer monitor, is placed in front of NAO. The robot detects which device is being used. The elder triggers the agent’s behaviour by telling it what they want to do (e.g. “I want to shop for some groceries”). For the webpage navigation activity, we selected a supermarket retailer webpage and a retailer webpage for any other shopping purposes. The robot guides the owner through a step-by-step process in which the elder locates and opens the browser and navigates around the website of choice. The user is asked to log in to the website to access their account. If they already have an account, they can start shopping. Once the order is complete, the user can check their email for an order confirmation. Throughout the process, the user can ask NAO to go back a step (Fig. 3).
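The step-by-step guidance with a "go back a step" request can be sketched as a small state holder. This is only an illustrative sketch and not the authors' Choregraphe implementation: the class `StepGuide`, its method names and the wording of the steps are hypothetical, and only the order of the steps follows the description above.

```python
from dataclasses import dataclass


@dataclass
class StepGuide:
    """Walks a user through an ordered list of spoken instructions."""
    steps: list
    index: int = 0

    def current(self) -> str:
        return self.steps[self.index]

    def next(self) -> str:
        # Advance unless the guide is already at the final step.
        if self.index < len(self.steps) - 1:
            self.index += 1
        return self.current()

    def back(self) -> str:
        # "Go back a step"; stays on the first step if there is no earlier one.
        if self.index > 0:
            self.index -= 1
        return self.current()

    def finished(self) -> bool:
        return self.index == len(self.steps) - 1


# Hypothetical wording; the step order follows the shopping scenario in the text.
SHOPPING_STEPS = [
    "Locate and open the browser.",
    "Go to the retailer's website.",
    "Log in to your account.",
    "Add your items and complete the order.",
    "Check your email for the order confirmation.",
]

guide = StepGuide(SHOPPING_STEPS)
```

In a robot behaviour, each returned string would be passed to the speech module, and the recognised user utterance would select between `next()` and `back()`.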


Fig. 2 Flowchart of the working mechanism for the Web navigation task

4 Trial Runs
The pilot activities with the NAO robot were implemented using the Python programming language and the toolboxes in the Choregraphe interface, version 2.8. We carried out live testing experiments with an academic NAO V6 in a home-simulated laboratory environment, some of which are illustrated in Figs. 3, 4 and 5. We ran each scene independently, trying all possible interactions between the user and the robot: engaging in a verbal conversation, following an exercise routine, receiving help with medication intake or searching the web. The final aim was to investigate whether the robot could act autonomously in accordance with the designed behaviour and to assess the potential of these agents to become an integrated part of older adults’ homes.
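Trying all possible interactions within a scene can be illustrated for the medication scene, whose branches (already taken or not; help options (a), (b), (c); boxes in front of the robot or not) are described earlier. The sketch below is a hypothetical reconstruction for illustration only: the action labels, the `Help` enumeration and the "acknowledge" response to a positive answer are assumptions, not the authors' code.

```python
from enum import Enum
from itertools import product


class Help(Enum):
    NONE = "none"          # (a) the user takes the medicine independently
    DESCRIBE = "describe"  # (b) the robot describes the correct box
    POINT = "point"        # (c) the robot points to the correct box


def medication_actions(taken, wants_help, boxes_in_front):
    """Return the ordered list of (hypothetical) robot actions for one branch."""
    if taken:
        return ["acknowledge"]  # assumed response; the text leaves this case open
    actions = ["remind"]
    if wants_help is Help.DESCRIBE:
        actions.append("describe_box")
    elif wants_help is Help.POINT:
        # Pointing is only possible when the boxes are in front of the robot.
        actions.append("point_to_box" if boxes_in_front else "ask_to_place_boxes")
    actions.append("verbal_care")  # e.g. "Remember to take a glass of water"
    return actions


def all_paths():
    """Enumerate every combination of inputs, as a trial run of the scene would."""
    return {
        (taken, wants_help, front): medication_actions(taken, wants_help, front)
        for taken, wants_help, front in product([True, False], Help, [True, False])
    }
```

Enumerating the branches this way gives a checklist against which the observed robot behaviour in each trial can be compared.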

5 Conclusions
In this work, we use the humanoid social robot NAO to design natural verbal and nonverbal interactions with a potential elderly user, to provide personable assistance in their daily living. We programmed the agent using simple customisable algorithms to address two main activities: (a) disease management, helping the users with


Fig. 3 Example of an interaction activity: Tell me a story. The NAO robot is not only trained on narration, but it also actively invites the user to retrieve the context, guess the characters and what happens next in the plot

Fig. 4 Programming movement in NAO: Exercise routine. If the agent receives the “exercise option” as input, it performs a routine for the elderly to follow along. During the exercise, the robot verbally interacts with the user, encouraging them to repeat the same movements, asking them if they feel well and/or ceasing the routine if the user chooses to


Fig. 5 Health assistant activity: Medication intake. NAO uses reminders to manage the medication intake of the elder. It starts by asking them whether they have had their medicine and takes further assistive actions: reminding, describing or pointing to the right medicine for the less able senior

the daily intake of their medicines, setting reminders and selecting the right medicine box for the less able; (b) entertainment, to combat loneliness, support their physical and mental well-being through educational health context and interface the senior with digital technology. The carefully controlled laboratory-based trials suggest promising behaviour of such social agents in supporting independent ageing in place.

Acknowledgements The authors acknowledge the funding provided by the Interreg 2 Seas Mers Zeeën AGE’In project (2S05-014) to support the work described in this publication.


Real-Time Data Processing in Industrial and IoT Applications

IoT in Smart Farming Analytics, Big Data Based Architecture
El Mehdi. Ouafiq, Abdessamad Elrharras, A. Mehdary, Abdellah Chehri, Rachid Saadane, and M. Wahbi

Abstract Interest in Smart Farming is growing, and Internet of Things (IoT) technologies are prominent in the farm management cycle. A large amount of data is generated through different channels such as sensors, Information Systems (IS), and human experience. Making the right decision at the right time by monitoring, analyzing, and creating value from these Big Data is a key element in managing and operating farms smartly, and is also bound by technical and socio-economic constraints. In this research, we therefore study the implications of Big Data technologies, IoT, and Data Analysis for agriculture, and we propose a Smart Farming Oriented Big Data Architecture (SFOBA).

E. Mehdi. Ouafiq (B) · A. Elrharras · A. Mehdary · R. Saadane · M. Wahbi
Hassania School of Public Works, SIRC-(LaGeS), Casablanca, Morocco

A. Chehri
University of Quebec at Chicoutimi, Chicoutimi, Québec G7H 2B1, Canada

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 A. Zimmermann et al. (eds.), Human Centred Intelligent Systems, Smart Innovation, Systems and Technologies 189, https://doi.org/10.1007/978-981-15-5784-2_22

1 Introduction
The strength of data no longer needs to be proven. In parallel, intelligent/smart decision-making becomes crucial in every industry including agriculture, which has not been suitably favored by new advancements of Computer Science and Electronics


unlike the other industries. Agriculture has always been a pivotal part of human civilization and crucial for the evolution of every human society, and new advanced technologies are the key to improving its efficiency. Researchers have proposed various techniques to handle different agricultural problems and to replace traditional farming practices with smart ones, in which IoT technologies are strongly present and designed to establish a decision-making and knowledge-sharing system for farmers. The idea is to allow objects to be connected anywhere, with anything, at any time, over any communication service [11].

The amount of agricultural data generated is increasing day by day, and its storage becomes challenging, since traditional storage systems have proved unable to handle the sudden increase in the size, variety, and complexity of the data produced [1]. These requirements are close to what is called Big Data processing. Big Data applications will change the way farms are managed and operated. This can be seen from two points of view [2]:
1. Business, in which farmers seek to obtain good prices for their products and reduce costs;
2. Public, in which Big Data applications attempt to solve global problems such as water management, sustainability, and food security.

From this perspective, attempts have been made to take advantage of Big Data in areas such as geographical segmentation [3], bioinformatics [1], temperature monitoring [4], mechanical systems maintenance and sustainability, and water sustainability and management [5–8], which is topical and important. Figure 1 represents the allocation of the 13 billion m³ of national water resources in Morocco in 2015 [9], and Fig. 2 represents the estimated water consumption and withdrawal by sector in Morocco in 2020, where agricultural withdrawal is expected to reach 80% [10].

The novelty of this research lies in identifying the requirements in terms of Data Architecture and in proposing an SFOBA for IoT and Big Data technologies, which includes the complex components that should be implemented to handle both batch and real-time processing of data generated from smart farm devices and systems.

Fig. 1 National Water Resource Affectation


Fig. 2 Water Consumption by sector

The SFOBA also takes into consideration the computation and storage performance constraints. This paper is organized as follows: Sect. 2 presents the relevant agricultural domains where data analytics can be applied. Section 3 examines the manner of, and the profit from, IoT and Data Analysis in Smart Farming (SF). Section 4 presents the technical constraints, the high-level architecture, and the software components. Section 5 concludes the paper and outlines future perspectives.

2 IoT and Data Analysis in Smart Farming
2.1 IoT in Smart Farming
An IoT system assumes the pervasive presence in the environment of a variety of things and objects that are capable of forming an interactive ecosystem across wired and wireless connections, alongside unique communication protocols. It provides better coverage, connectivity (things with different characteristics get connected and enabled appropriately), real-time operation, and system monitoring across multiple industries, reducing human resource costs. It also creates a wealth of new opportunities in different domains and provides a massive opportunity for agriculture. The use of smart devices on farms requires Internet connectivity, and the key elements to make it operational are [12]:
1. Identifying smart devices in the farm’s network;
2. Collecting data and sending it to the farm’s Database (DB)/data lake;
3. Connecting variegated objects;
4. Processing units and software applications, which are the brain of IoT, but which will now be replaced by Big Data technologies and algorithms.

Table 1 describes relevant IoT components and their field of use.

Data Analysis in Smart Farming, The Manner, and The Profit. Smart agriculture can be seen as a combination of smart farms, smart farmers, and intelligent consumers. The farming process is considered smart if it includes the following intelligence levels [11]:
1. Readjusting;
2. Sensing;
3. Inferring;
4. Learning;
5. Anticipating;
6. Self-organizing.

Table 1 Relevant IoT components (sensors) and their field of use in Smart Farming
IoT component | Field of use
pH sensors | To measure the content of nutrients in the soil required for irrigation
Temperature sensors | To measure both agriculture equipment and farm temperature
Moisture sensors | To measure and make the correlation between dielectric constant, the resistance of the soil, and moisture content
Water pump | Activated based on the result of analysis on data generated from the smart devices; as a result, water will be supplied to the field/land [11] [5] with a given quantity
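The water pump entry in Table 1 (activation driven by the analysis of sensor data, with a given quantity of water supplied) can be illustrated with a minimal sketch that touches the sensing, inferring and readjusting levels. The function name, the target moisture and the litres-per-point factor are hypothetical tuning values chosen for illustration; they do not come from the cited works.

```python
from statistics import mean


def irrigation_volume(moisture_readings, target=0.30, litres_per_point=50.0):
    """Return the litres of water to supply, or 0.0 if the soil is moist enough.

    moisture_readings: volumetric moisture fractions (0..1) from field sensors.
    target, litres_per_point: hypothetical tuning parameters.
    """
    if not moisture_readings:
        raise ValueError("no sensor readings")
    current = mean(moisture_readings)      # sensing: aggregate the readings
    deficit = max(0.0, target - current)   # inferring: how far below target
    # Readjusting: litres per missing moisture percentage point.
    return round(deficit * 100 * litres_per_point, 1)
```

For example, two readings of 0.20 against a target of 0.30 give a deficit of ten percentage points, so the sketch requests about 500 litres, while readings above the target leave the pump off.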

Big Data can intervene in SF and remedy the limitations of traditional decision-making and storage systems, which cannot handle massive amounts of data. Its applications in agriculture include sensor deployment and analysis, data modeling, and predictive modeling to manage the risk of crop failure, increase feed efficiency in breeding [2], and maintain the efficiency of agricultural practices in different domains. Table 2 indicates some of the perspectives of data analysis.

3 Big Data Architecture for Smart Farming
3.1 Data Sources
The development of IoT is providing farms with massive data that can be accessed in real time and in batch. Robots and intelligent machines also produce unstructured and non-traditional data such as videos and images. Social Media (SM) is still one of the important sources of human-experience/human-made data. In the field of agriculture, three types of data are generally generated [2]:
1. Data Mediated by Process (DMP): Most of this data is generated from IS (e.g., ERP, CRM), as well as primary and secondary bioinformatic DBs. Examples of DMP: Laboratory Information Management System (LIMS) data, customer complaint data from a Consumer IS;
2. Data Generated by Machine (DGM): This includes satellite imagery and sensor data. Most of this data is generated by intelligent machines, robots, and IoT devices. Examples of DGM: pressure and flow sensors, Visible Light Communication (VLC);


Table 2 Perspective behind data analysis in smart farming
Domains | Perspective
Geographical segmentation and periodical segmentation | Discovering why areas perform better than others [3]. Spatial distribution and concentration of crops and their yields and their crop periods
Temperature monitoring and mechanical systems maintenance | Expecting crop prices and precipitation. Monitoring temperature [4]. For irrigation and also equipment maintenance
Bioinformatics | Data analysis and visualization to make biological data available to the specialists in electronic format in a single and specific place [1]
Water management | - Promoting watershed sustainability based on dynamic and spatial interactions between biophysical and human processes
 | - Maintaining water pressure within appropriate bounds to reduce water leaks in the water distribution system and suggest emergency actions
 | - Analyzing the water consumption patterns and visualizing the result. The gathered data is treated as historical consumption data and can be used to predict future water consumption [5]
 | - Smart water dripping will be advantageous for farmers to irrigate farms automatically and effectively using an automated soil temperature based irrigation system [13]

3. Data from Human Origin (DHO): Most of this data is generated from human experience. Collecting only the relevant data from SM can be a challenge.

Technical Constraints. Since the data size is increasing rapidly, the amount and quality of the retrieved data cannot be predicted. This farming Big Data is far too large to be stored on a single node and should be distributed across multiple nodes. In addition, pressure and water flow can be measured in more than two dozen areas [8]. Under these requirements, traditional storage systems have proved incapable of capturing leak location, detailed consumption, and water loss data. With this number of data sources and this volume of collected data, it becomes clear why utilities barely have time to process 40 percent of the data generated, so that valuable data may remain unused [7]. Attempts have been made to involve Big Data technologies in data storage and processing in agricultural fields. In [4], the authors used Hadoop to process and store 1.44 million data records daily for temperature monitoring. In [5], the authors proposed a Hadoop-based application for watershed management that scales with the number of users and allows them to access and run the model without latency problems. In [8], Hadoop and NoSQL were used to collect data from distribution systems, and their performance was compared with that of a MySQL DB.
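The three data types can be captured in a small lookup, which is a convenient first step before deciding how each source is ingested. This is a minimal sketch: the names `DataOrigin`, `SOURCE_ORIGIN` and `group_by_origin` are hypothetical, and the mapping only restates the example sources named in the text.

```python
from enum import Enum


class DataOrigin(Enum):
    DMP = "Data Mediated by Process"
    DGM = "Data Generated by Machine"
    DHO = "Data from Human Origin"


# Example sources named in the text, mapped to their category.
SOURCE_ORIGIN = {
    "ERP": DataOrigin.DMP,
    "CRM": DataOrigin.DMP,
    "LIMS": DataOrigin.DMP,
    "pressure sensor": DataOrigin.DGM,
    "flow sensor": DataOrigin.DGM,
    "satellite imagery": DataOrigin.DGM,
    "social media": DataOrigin.DHO,
}


def group_by_origin(sources):
    """Group source names by data origin; unknown names raise KeyError."""
    groups = {origin: [] for origin in DataOrigin}
    for name in sources:
        groups[SOURCE_ORIGIN[name]].append(name)
    return groups
```

Keeping the classification explicit makes it easy to see at a glance which sources a given farm deployment actually has in each category.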


Hadoop solutions for SF can be effective only alongside a data migration strategy that covers data modeling, processing, storage, and quality, as well as system administration and configuration, and a Data Architecture that meets all these requirements together with the following technical constraints:
• Data files: most SF data comes in small files, which leads to a massive number of small files. The Hadoop Distributed File System (HDFS), on the other hand, is designed to process large volumes of data, with a block size of 128 MB. Small files can make the Name Node run out of memory because it has to handle a large amount of metadata. To address this, the work in [14] proposed a mechanism based on Hadoop Archives, named NHA, to improve the use of memory for metadata and the access to small files. However, reading files directly from HDFS still provides better computation performance, which leads to the second constraint;
• Computation performance for the SF process: the authors in [15] proposed a solution to improve computation performance in Hadoop while compressing large numbers of small files to save storage space. They compared different compression algorithms: bzip2 gave the best computation performance compared to the others (gzip, Snappy, LZ4) and saved more than 70% of the storage space of the raw-text files, although raw inputs still provide better computation performance.

Job scheduling and resource management in Hadoop play a big part in performance [18]. At least the resource, task, data block migration, and network constraints [18] should be considered while a task is being processed on a machine $x$, in order to guarantee that:
1. For all tasks (number of tasks $N$) and all the machine’s slots (map/reduce slots $m_k^s$), no more memory is used (RAM quantity needed by a task $k$: $q_k^r$) than is available (RAM quantity on machine $x$: $m_x^r$):
$$\sum_{s=1}^{m_k^s} \sum_{k=1}^{N} q_k^r \le m_x^r;$$
2. The number of reduce slots $m_x^{rs}$ is no lower than the number of reduce tasks $N^r$ that run on machine $x$ at that time:
$$N^r / m_x^{rs} \le 1;$$
3. For all tasks and machines (number of machines $M$), after the migration of a block $b$ to machine $x$ from another machine $x'$, the hard drive capacity $m_x^h$ of machine $x$ is not exceeded by the disk space required/used (hard drive quantity required by task $k$: $q_k^h$):
$$\sum_{x=1}^{M} \sum_{k=1}^{N} \Big( q_k^h + \sum_{b \in B_k} S(b) \Big) \le m_x^h,$$
where $S$ is the data block size in the cluster and $B_k$ is the list of blocks manipulated by task $k$.

Data Architecture and Processing: The major challenge is building a Data Architecture that meets all data processing requirements for Smart Farming Analytics (SFA). This architecture must:
1. Handle the variety of sources and take into account the particularity and data type of each data source;


2. Favor real-time processing for DGM and DHO, and batch processing for DMP and (sometimes) DHO;
3. Provide a single source of truth/access to all critical Key Performance Indicators (KPIs) for farming analytics, where data is stored in a format that can be processed by data scientists and data visualization tools;
4. Ensure data quality.

In this research, we propose an SFOBA inspired by the Lambda Architecture [17] and aligned with agricultural analytics, where query = function(farming-physical-model’s data), which can be applied across all agriculture fields. This architecture can handle large workloads and suits fault-tolerant systems that are built in layers, where each layer executes particular functions and performs read/write actions upon the previous one. It offers a real-time layer and a batch layer (which can be seen as a data lake system [19]) and can be implemented on on-premise and Cloud Hadoop platforms to take advantage of the scalability of compute power and main memory, and of the on-demand, unlimited storage offered by Cloud Computing [20].

For batch processing/layer:
1. Shared area: not a schema but a set of landing folders structured to meet the requirements of the farming physical data model [22]. For security reasons, when processing less secure systems, it is recommended to pass through an edge node rather than connecting directly to Hadoop’s data lake. Edge nodes are now oriented toward data ingestion and need more storage space and drives;
2. Raw zone: the area where raw data is landed directly into Hadoop or pushed from the shared area. A quality check (QC) and “farming” business rule verifications are performed on the tables;
3. Trusted zone: contains structured and curated data, where tables are stored based on the SFA data model. This zone becomes the “source of truth” of the analytics system. After data is ingested into this zone, the files under the raw zone schema have to be deleted so as not to affect the performance of the Name Node;
4. Access layer: the area where data is stored based on the data model and in a format that can be processed by data scientists and visualization tools. After data is ingested into this zone, and as soon as the trusted zone tables are no longer used in calculations, they can be compressed.

For real-time processing, pipelines are built on top of Hadoop to connect to the sources and collect DGM and DHO to be stored in the access layer. Figure 3 presents our proposed Smart Farming Oriented Big Data Architecture [22, 23]. Our proposed data lake is zoned to meet SFA requirements and to offer access to the data gathered from different sources in various transformation states. The raw zone allows farm data to be tracked in its raw format. The trusted zone provides structured and curated data, ensuring data quality. The access layer provides customized shared data for each field of agriculture in the form of a star schema. This form is composed of fact and dimension tables and


Fig. 3 Smart Farming Oriented Big Data Architecture

joins that link them together, so that we can perform Online Analytical Processing (OLAP) on top of Hadoop using Apache Kylin [16]. Kylin provides random, real-time access to the data by storing OLAP cubes in HBase, which supports real-time analytics [16]. The access layer gives better querying performance for KPI calculation thanks to its analysis-across-time design, the small number of tables to query, the simple joins, and its flexibility to meet smart farming business needs. Figure 4 shows Apache Kylin’s performance on the Star Schema Benchmark dataset [24] with 10, 20, and 40 million rows of data, where the multidimensional queries on top of Hadoop finished in less than one second [25].
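The flow of a batch through the zones described above (landing, quality check, trusted data, access-layer aggregate) can be sketched in memory. This is only a toy illustration under stated assumptions: real zones would be HDFS folders and Hive tables, and the record fields (`sensor_id`, `moisture`), the rule in `passes_quality` and the function `run_batch` are hypothetical.

```python
def passes_quality(record):
    """Hypothetical quality-control and farming business rules for one record."""
    return (
        record.get("sensor_id") is not None
        and isinstance(record.get("moisture"), (int, float))
        and 0.0 <= record["moisture"] <= 1.0
    )


def run_batch(shared_area):
    """Move one batch from the shared area to the access layer."""
    raw_zone = list(shared_area)  # land the data unchanged
    trusted_zone = [r for r in raw_zone if passes_quality(r)]
    rejected = [r for r in raw_zone if not passes_quality(r)]
    raw_zone.clear()  # raw files are deleted once the trusted zone is loaded

    # Access layer: one KPI per sensor (here, the mean moisture).
    totals = {}
    for r in trusted_zone:
        total, count = totals.get(r["sensor_id"], (0.0, 0))
        totals[r["sensor_id"]] = (total + r["moisture"], count + 1)
    access_layer = {sid: total / count for sid, (total, count) in totals.items()}
    return trusted_zone, rejected, access_layer
```

The point of the sketch is the ordering: checks happen before data becomes the source of truth, and the access layer holds only derived, query-ready values.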

Fig. 4 Apache Kylin Fast Multidimensional queries on top of Hadoop
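A star schema and a multidimensional query over it can be illustrated on a very small scale. The sketch below uses SQLite from the Python standard library purely as a stand-in for Hive/Kylin; the table names, columns and sample rows are hypothetical and are not taken from the SFA data model.

```python
import sqlite3


def build_star_schema():
    """Create an in-memory star schema: one fact table and two dimensions."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE dim_field (field_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE dim_date  (date_id  INTEGER PRIMARY KEY, month TEXT);
        CREATE TABLE fact_water (
            field_id INTEGER REFERENCES dim_field(field_id),
            date_id  INTEGER REFERENCES dim_date(date_id),
            litres   REAL
        );
        INSERT INTO dim_field VALUES (1, 'north'), (2, 'south');
        INSERT INTO dim_date  VALUES (1, '2020-01'), (2, '2020-02');
        INSERT INTO fact_water VALUES (1, 1, 100), (1, 2, 150), (2, 1, 80), (2, 1, 20);
        """
    )
    return con


def water_by_field_and_month(con):
    """Aggregate the fact table along both dimensions (a simple OLAP roll-up)."""
    return con.execute(
        """
        SELECT f.name, d.month, SUM(w.litres)
        FROM fact_water w
        JOIN dim_field f ON f.field_id = w.field_id
        JOIN dim_date  d ON d.date_id  = w.date_id
        GROUP BY f.name, d.month
        ORDER BY f.name, d.month
        """
    ).fetchall()
```

The query touches only one fact table and two simple joins, which is the property the text credits for the querying performance of the access layer; an OLAP engine precomputes such aggregates as cubes rather than scanning at query time.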


Table 3 Software Components for Batch and “Near” Real-Time Processing
Processing | Tool | Usage
Batch | Python | To process manual files and perform quality checks and transformations and “Farming” Business Rules (FBR) verifications on Raw Data. And compress trusted zone tables
Batch | Sqoop | To load RDBMS data in Data Lake/HUB
Batch | Hive | To store data in Hadoop under Hive warehouse (for managed tables), and gives metadata to the files pushed in HDFS (for external tables)
Batch | Shell | Can help running HDFS commands and scheduling them. Making connection between the Edge Node and the Hadoop. Can replace Python for preliminary quality checks in Edge Node
Batch | Oozie | To automate storage and process actions and synthesize them as Directed Acyclic Graph
Real-Time | HBase | Random access database; provides real-time reads and writes and can quickly run queries for Dimensional Smart Farming Data Model
Real-Time | Kylin | Gives quick responses to the FBR queries using HBase, and Hive as a source [18]
Real-Time | NiFi | Provide connection with smart devices and build real-time pipelines
Real-Time | Kafka | To build listeners on smart devices, and a scalable, durable, fast collection of data
Real-Time | Spark | Consumes Kafka’s streaming data for real-time ingestion and analysis. It can replace all Python jobs (for batch and real-time) and offers better performance

4 Software Components
The components have to be chosen based on the particularity of each layer and each zone. Table 3 presents the selected software components and their usage. Thanks to the Hadoop MapReduce framework, SF data, after being modeled, is processed in parallel, distributed, and replicated across various nodes, providing scalability and high availability [21]. Ephemeral IoT data can land in the shared area to be checked and then either processed or rejected. As a centralized Hadoop repository, the three data lake schemas work together to prepare, clean, transform, and ingest the data, which saves time and resources. Figure 5 presents an Activity Diagram of water management analytics based on SFOBA.
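Parallel processing across nodes depends on whether a node can accept the tasks assigned to it, which is what the three scheduling constraints under Technical Constraints express. The following sketch checks them for a single machine. It is one possible per-machine reading of those constraints rather than the scheduler of [18], and the `Task`, `Machine` and `can_host` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    ram: float           # RAM needed by the task (q_k^r)
    disk: float          # disk needed by the task (q_k^h)
    block_sizes: list    # sizes S(b) of the blocks b in B_k
    is_reduce: bool = False


@dataclass
class Machine:
    ram: float           # RAM available on the machine (m_x^r)
    disk: float          # hard drive capacity (m_x^h)
    reduce_slots: int    # number of reduce slots (m_x^rs)


def can_host(machine, tasks):
    """Return True if the tasks satisfy all three constraints on this machine."""
    # 1. Total RAM requested must not exceed the RAM available.
    ram_ok = sum(t.ram for t in tasks) <= machine.ram
    # 2. Reduce tasks must not outnumber reduce slots.
    slots_ok = sum(1 for t in tasks if t.is_reduce) <= machine.reduce_slots
    # 3. Task disk needs plus the blocks they manipulate must fit on the drive.
    disk_needed = sum(t.disk + sum(t.block_sizes) for t in tasks)
    disk_ok = disk_needed <= machine.disk
    return ram_ok and slots_ok and disk_ok
```

A scheduler would call such a check before placing a task or migrating a block, and fall back to another machine when it fails.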

5 Conclusion
IoT now constitutes a viable data source in many domains of agriculture. Research has been carried out to take advantage of Big Data technologies to handle the massive amount of data (DMP, DGM, DHO) in many agriculture fields. Our proposed

278

Elmehdi. Ouafiq et al.

Fig. 5 Activity Diagram of water management analytics based on SFOBA

solution is introducing big data technologies to smart farming and descends into the next level of granularity, giving an architecture adapted to meet the SFA requirements. The proposed solution takes into account different data sources, data modeling, software components, and technical constraints. The research is continuing to enrich our proposal to enhance the job scheduling performance, mastering data quality of smart farms, and providing an infrastructure dedicated to the SFA favoring the facilitation of agricultural practices.

References

1. Kumar, V.: Big data analytics: bioinformatics perspective. IJIACS 5(6), 1–7 (2016)
2. Wolfert, S.: Big Data in smart farming–review. Agric. Syst. 153, 69–8 (2017)
3. Vadivu, S., Kiran, V., Devi, P.M.: Big data analysis on geographical segmentations and resource constrained scheduling of production of agricultural commodities for better yield. In: Fourth International Conference on Recent Trends in Computer Science & Engineering, vol. 87, pp. 80–85. Elsevier (2016)
4. Zhou, T.: Temperature monitoring system based on hadoop and vlc. In: 8th ICICT, Procedia Computer Science, pp. 1346–1354 (2018)
5. Koo, D., Piratla, K., Matthews, J.: Towards sustainable water supply: schematic development of big data collection using internet of things (IoT). In: International Conference on Sustainable Design, Engineering and Construction, vol. 118, pp. 489–497. Elsevier (2015)
6. Hu, Y., Cai, X., DuPon, B.: Design of a web-based application of the coupled multi-agent system model and environmental model for watershed management analysis using hadoop. Environ. Model Softw. 70, 149–162 (2015)
7. Thompson, K., Kadiyala, R.: Leveraging big data to improve water system operations. In: 16th Conference on Water Distribution System Analysis, vol. 89, pp. 467–472 (2014)
8. Jach, T., Magiera, E., Froelich, W.: Application of hadoop to store and process big data gathered from an urban water distribution system. In: 13th Computer Control for Water Industry Conference, vol. 119, pp. 1375–1380. Elsevier (2015)
9. OCPPC: https://www.policycenter.ma/publications/morocco%E2%80%99s-water-securityproductivity-efficiency-integrity. Last accessed 15 Jan 2020
10. National Office of Potable: http://www.onep.ma/. Last accessed Nov 2009
11. Roy, S.: IoT, big data science & analytics, cloud computing and mobile app based hybrid system for smart agriculture. In: 8th Annual Industrial Automation and Electromechanical Engineering Conference, pp. 303–304 (2017)
12. Patgiri, R., Nayak, S.: Data of things: the best things since sliced bread. In: International Conference on Communication and Signal Processing, pp. 341–348 (2018)
13. Padalalu, P.: Smart water dripping system for agriculture/farming. In: 2nd International Conference for Convergence in Technology, pp. 659–662 (2017)
14. Vorapongkitipun, C., Nupairoj, N.: Improving performance of small-file accessing in hadoop. In: 11th International Joint Conference on Computer Science and Software Engineering, pp. 200–205 (2014)
15. Rattanaopas, K., Kaewkeeree, S.: Improving hadoop mapreduce performance with data compression. In: 14th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology, pp. 564–567 (2017)
16. Ranawade, S.V.: Online analytical processing on hadoop using apache kylin. Int. J. Appl. Inf. Syst. 12, 1–5 (2017)
17. Lemberger, P., Batty, M., Moral, M., Raffaelli, J.: Big data and machine learning, 2nd edn. Dunod (2019)
18. Jlassi, A., Martineau, P.: Offline scheduling of map and reduce tasks on hadoop systems. In: 5th International Conference on Cloud Computing and Services Science (2015)
19. Persico, V., Pescapé, A.: Benchmarking big data architectures for social networks data processing using public cloud platforms. Elsevier 89, 98–109 (2018)
20. Pargmann, H.: Intelligent big data processing for wind farm monitoring and analysis based on cloud-technologies and digital twins. In: 3rd International Conference on Cloud Computing and Big Data Analysis, pp. 233–237. IEEE (2018)
21. Hadoop: hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html. Last accessed 15 Jan 2020
22. Schneider, R.D.: Hadoop for dummies, special edition, chapter 16: Deploying Hadoop, part IV Administering and Configuring Hadoop, Edge Node, p. 326 (2012)
23. Dzone: dzone.com/guides/big-data-data-science-and-advanced-analytics. Last accessed Jan 2020
24. Star Schema Benchmark: github.com/Kyligence/ssb-kylin. Last accessed 15 Jan 2020
25. Kyligence: kyligence.io/blog/integrating-kyligence-analytics-platform-with-microsoft-azurehdinsight/. Last accessed 15 Jan 2020

Review of Internet of Things and Design of New UHF RFID Folded Dipole with Double U Slot Tag

Ibtissame Bouhassoune, Hasna Chaibi, Abdellah Chehri, Rachid Saadane, and Khalid Menoui

Abstract The Internet of Things (IoT) is a promising technology that enables interaction between humans and things, such as sensors, medical objects, smartphones, and food, and allows smart objects to become part of the Internet environment. The emerging IoT paradigm opens the door to new infrastructures, networks, and smart objects that take part in IoT development and work together to enhance the quality of life of users. This paper first discusses the state of the art of the IoT and its basic aspects, namely connectivity, components, architecture, and applications. In addition, the paper presents an overview of RFID technology, which is considered an essential element of IoT technologies, and develops an example of a novel RFID system component that connects objects for tracking and processing.

I. Bouhassoune · K. Menoui: LRIT Laboratory, Mohammed V University, Rabat, Morocco
A. Chehri: University of Quebec in Chicoutimi, Chicoutimi, QUÉBEC G7H 2B1, Canada
H. Chaibi · R. Saadane (B): SIRC/LaGeS-EHTP, EHTP Km, 7 Route El Jadida, Oasis, Casablanca, Morocco
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 A. Zimmermann et al. (eds.), Human Centred Intelligent Systems, Smart Innovation, Systems and Technologies 189, https://doi.org/10.1007/978-981-15-5784-2_23

1 Introduction

In recent years, the Internet of Things (IoT) has come to provide connectivity for anything at any time and place. This technology is now present in many aspects of human life, such as cities, hospitals and healthcare centers, universities, and industrial environments. The IoT consists of objects, sensors, and electronic devices that exchange data with one another and with users, connecting the physical world to applications. The IoT has grown in size and in number of devices because of the important benefits gained from using this technology. It is therefore regarded as the result of advances in multiple services, such as monitoring, managing, and automating human activities. It satisfies Quality of Service (QoS) metrics, such as security, cost, timing, energy consumption, reliability, and availability, and improves the quality of human life [1].

The IoT comprises three levels: hardware at the first level, topology at the second, and applications and services at the last. It consists of a set of sensors, tags, and applications that enable structured communication between devices and a remote database. The organization of IoT connectivity is essential to provide flexible connections and robust communication protocols, from tiny sensors and tags capable of sensing, tracking, and transmitting information to the servers used for data analysis. The integration of mobile devices, routers, hubs, and humans is also a crucial component of IoT connectivity [2]. The RFID system is one of the important technologies in the IoT, alongside wireless sensor networks, ZigBee, and Bluetooth. These technologies have been presented and developed in many previous works [3, 4]. The technical components of the IoT, such as RFID systems, WSN, and WBAN, have also been surveyed in several studies. However, much less attention has been devoted to IoT connectivity and its components.

In this paper, we discuss key issues and inherent challenges facing IoT approaches, techniques, and architectures. We study existing IoT connectivity from the literature, which uses different technologies, such as RFID, WSN, and WBAN, and is based on different components and topologies. Subsequently, we provide an example of an RFID system by designing a new RFID tag to connect objects. The remainder of this review is organized as follows. Section 2 provides a survey of the different concepts adopted in the IoT. Section 3 deals with different architectures of the IoT. Section 4 tackles the communication used in the IoT. Section 5 presents the applications of the IoT. Finally, Sect. 6 discusses RFID technology components and our proposed RFID tag design, which is placed on an industrial object for identification and tracking.

2 The Internet of Things Connectivity Concept

The IoT concept is characterized by extensive interworking, billions of devices, and several technologies for the distribution of services and applications. The main challenge of this paradigm is the large amount of information that must be analyzed in real time. Smart connectivity is therefore required to maintain the link between devices and humans. Worldwide research has been growing rapidly in wireless, cellular, and satellite technologies on the one hand, and in computing and artificial intelligence on the other. This development has opened new business areas, from which the concept of IoT connectivity has evolved [4]. In this context, services and applications play a determining role in IoT connectivity: they allow new ways of using existing infrastructure and enhance the connectivity of objects in a city, region, country, or worldwide, using ZigBee, Bluetooth, RFID, and Wi-Fi in LANs, and traditional cellular networks, GSM (2G), UMTS (3G), and LTE (4G), in WANs [1].

Several new technologies have recently appeared to connect devices. These novel systems, characterized by low power consumption and wide coverage, are generally known as Low-Power Wide-Area (LPWA) technologies [5]. There are two categories of this technology. The first is proprietary LPWA, such as Sigfox, LoRa, and Ingenu, which operates on unlicensed spectrum. The second is cellular LPWA, such as NB-IoT, LTE Cat-M1, and EC-GSM, which operates on licensed spectrum, was designed to provide a cellular option, and is included in the latest cellular standards [6]. To select the appropriate IoT connectivity technology for a specific application from such a range of options, three essential requirements must be considered: coverage, energy efficiency, and data rate. They represent competing goals among which every IoT technology must make trade-offs. They also fall within the three main IoT concepts: technical, commercial, and ecosystem.

3 The Internet of Things Topology Techniques

The IoT topology is the way in which the nodes are organized and connected to each other to form a network. The two well-known IoT topologies are mesh and star. In a star topology, all nodes are attached to a central node, which acts as a gateway (see Fig. 1). A Wi-Fi network is an example of a star topology, where the central node is called an Access Point (AP) and the other nodes are called stations. In a mesh topology, each node is connected to many other nodes [5]. A ZigBee Light Link network is an example of a mesh topology. One of the ZigBee nodes is called the coordinator, and it usually also serves as an Internet gateway (see Fig. 1a).

Fig. 1 The IoT topologies


A mesh network is more complex than a star network, and routing a message from a distant node through the mesh can incur a longer delay (see Fig. 1b). The benefit of a mesh topology is that it keeps radio transmission power low while extending the range of the network through multiple hops. A mesh can also achieve better reliability by offering more than one path for a message through the network. Network size is also an important consideration in IoT connectivity design. Some technologies, like Bluetooth, support up to 20 connections; others, like ZigBee, can support thousands of connections [6].
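The hop-count trade-off between the two topologies can be made concrete with a small Python sketch. It assumes a deliberately simplified mesh, in which nodes sit on a line and each node reaches only neighbours within a fixed number of positions (`reach`, a hypothetical parameter); real mesh layouts and routing protocols are more involved.

```python
from collections import deque
from typing import Dict, Set


def star(n: int) -> Dict[int, Set[int]]:
    """Star topology: node 0 is the gateway, every other node links only to it."""
    adj: Dict[int, Set[int]] = {i: set() for i in range(n)}
    for i in range(1, n):
        adj[0].add(i)
        adj[i].add(0)
    return adj


def line_mesh(n: int, reach: int) -> Dict[int, Set[int]]:
    """Simplified mesh: nodes on a line, each linked to nodes within `reach` positions."""
    adj: Dict[int, Set[int]] = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(max(0, i - reach), min(n, i + reach + 1)):
            if i != j:
                adj[i].add(j)
    return adj


def hops_from(adj: Dict[int, Set[int]], source: int) -> Dict[int, int]:
    """Breadth-first search returning the minimum hop count to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


star_worst = max(hops_from(star(6), 0).values())
mesh_worst = max(hops_from(line_mesh(6, reach=1), 0).values())
mesh_worst_wider = max(hops_from(line_mesh(6, reach=2), 0).values())
print(star_worst, mesh_worst, mesh_worst_wider)
```

In the star, every node is one hop from the gateway but must be within its radio range. In the mesh, distant nodes need several hops, which costs delay but lets each radio transmit over a shorter distance.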

4 The IoT Connectivity Standards

One of the most important challenges of IoT connectivity is the communication standard. This challenge is the principal objective of many organizations that define communication specifications and organize interoperability between devices, as with the OSI network model. The choice of standards therefore affects the performance of IoT networks and applications. Multiple standards are available for different IoT connectivity environments. We review some of these standards in the following subsections.

4.1 Wireless Fidelity (Wi-Fi)

Wi-Fi (Wireless Fidelity) is a technology used for local area networks and Internet access. It is based on the IEEE 802.11 standard, which currently has four versions, IEEE 802.11 a, b, g, and n [7], with different data rates, signal ranges, and methods of operation. The tremendous success of Wi-Fi is due to the interoperability programs provided by the Wi-Fi Alliance and to its availability in most infrastructures, such as homes, enterprises, and medical centers. It is also integrated in all new laptops, smartphones, tablets, and TVs; Wi-Fi has therefore become the primary communication standard for IoT connectivity [8]. Most Wi-Fi networks operate in the 2.4-GHz ISM band. Wi-Fi can also operate in the 5-GHz band, where more channels and higher data rates are available, to ensure good Wi-Fi coverage [9]. Most Wi-Fi Access Points (APs) support up to 250 simultaneously connected devices. Wi-Fi is the most ubiquitous wireless connectivity technology for the IoT. Its high power consumption and complexity have been a major obstacle for IoT engineers, but new devices reduce many of these challenges and enable the integration of Wi-Fi into several IoT applications.


4.2 Bluetooth

The Bluetooth [10] standard was invented by Ericsson in 1994 for wireless communication embedded in electronic devices, phones, and laptops, and is used for Wireless Personal Area Networks (WPAN). Bluetooth was standardized by the Institute of Electrical and Electronics Engineers (IEEE) as 802.15.1, but today the standard is maintained by the Bluetooth SIG, which has developed several versions, such as 1.0, 2.0, 3.0, 4.0, and 5.0. The most recent versions, 4.0, also known as Bluetooth Low Energy (LE) [10], and 5.0 [11], were developed specifically for IoT networks. Bluetooth operates in the 2.4-GHz ISM band, with 1-MHz channels and a signal range from 10 to 100 m [12]. Bluetooth has become very popular in smartphones and is used in a point-to-point or star network topology. Bluetooth Low Energy (also known as Bluetooth Smart) is a more recent addition to the Bluetooth specification, developed for lower data throughput [6].

4.3 ZigBee

ZigBee [13] is a low-energy WPAN communication standard maintained by the ZigBee Alliance, which ensures interoperability between devices. It is based on the IEEE 802.15.4 standard [14] and is a low-power, low-cost technology. It operates in the 2.4-GHz ISM band and also supports the 868-MHz ISM band, with a data rate of 20 kbps, and the 915-MHz ISM band, with a data rate of 40 kbps [15]. ZigBee can deliver up to 250 kbps, although the data rate achieved in practice is usually lower. It can also maintain very long sleep intervals and low operating duty cycles, so that devices can be powered by batteries for years. ZigBee can be used in several applications and has had its greatest success in smart energy. Another reason why ZigBee is a strong technology for IoT connectivity is its mesh network topology, which can include thousands of nodes [6].

4.4 RFID

RFID technology is one of the most widely used standards in the IoT because it makes it possible to identify and track objects from different distances, depending on the type of RFID application. An RFID system contains two devices, a tag and a reader. The tag can be attached to an object and harvests energy from the reader, which identifies the tag and reads the information stored in it [16]. RFID tags are divided into two types, active (with a battery) and passive (batteryless), and operate in different bands, such as Low Frequency (LF), High Frequency (HF), Ultra High Frequency (UHF), Microwave, and Ultra-Wide-Band (UWB), each with a different signal range and read distance [17].


4.5 SigFox

SigFox [18] is a technology developed by a French company in 2009 for IoT nodes with low energy consumption, low data rate, and low signal power [19]. It operates in the ISM (Industrial, Scientific, and Medical) bands, at 868 MHz in Europe and 902 MHz in the USA. SigFox uses Ultra Narrow Band (UNB) transmission with Differential Binary Phase Shift Keying (DBPSK) modulation at 100 bps [16]. It communicates through short messages, with an uplink payload of 12 bytes and a downlink payload of 8 bytes. The data rate of a message transfer is 100–600 bps. Transmission between devices and the network is unsynchronized: every device broadcasts each message three times on three different frequencies [20].
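The figures quoted above give a rough sense of how long a SigFox device keeps its radio on. The Python sketch below counts payload bits only, which is an assumption made for simplicity: real frames also carry a preamble, identifiers, and authentication fields, so actual airtime is longer.

```python
def payload_airtime_s(payload_bytes: int, bit_rate_bps: float) -> float:
    """Time to send the payload bits alone, ignoring any frame overhead."""
    return payload_bytes * 8 / bit_rate_bps


UPLINK_PAYLOAD_BYTES = 12   # maximum uplink payload quoted in the text
BIT_RATE_BPS = 100          # DBPSK rate quoted in the text
REPETITIONS = 3             # each message is broadcast three times

uplink_s = payload_airtime_s(UPLINK_PAYLOAD_BYTES, BIT_RATE_BPS)
total_s = REPETITIONS * uplink_s

print(f"one transmission: {uplink_s:.2f} s, three repetitions: {total_s:.2f} s")
```

Even under this optimistic assumption, a full 12-byte message occupies the channel for close to a second per transmission, which illustrates why the technology targets infrequent, very small messages.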

5 The IoT Applications

Modern IoT applications are found in different fields of life, such as smart cities, healthcare, smart homes, and smart manufacturing. The use of the IoT supports the management of energy and time and the optimal planning of device distribution. In addition, this management can be used in emergency cases, such as in healthcare centers and restoration services. IoT applications can be divided into three main categories: smart city, healthcare, and industrial [4]. An example of IoT applications is illustrated in Fig. 2.

Fig. 2 The IoT applications


6 Proposed RFID Tag Used in IoT Network

In this section, we present the layout of a folded dipole tag sensor with a double U slot for UHF RFID applications, designed with the goals of miniaturization, the ability to host sensors and other electronic components such as batteries, and integration into IoT systems. The proposed tag is small and uses a flexible PVC plastic substrate (permittivity = 2.7, loss tangent = 0.007, thickness = 2 mm) on the ground surface, covered by adhesive copper. The antenna is designed to connect to the tag chip (Alien Higgs 4, SOT232 package, Z_chip = 34 − j142 Ω). We then placed the tag on a glass substrate to study its radiation performance and to test its suitability for placement on industrial objects. The design and geometrical parameters of the proposed tag are presented in Fig. 3 and Table 1, respectively. All numerical simulations of the tag were performed with the HFSS (High-Frequency Structure Simulator) software [21].

Fig. 3 Geometry of the proposed RFID tag

Table 1 Dimensions of the proposed RFID tag

Geometrical parameter | Dimensions (mm)
L | 120
w | 40
L | 104
w | 23
A | 52
B | 19
a | 20
b | 20


6.1 The Matching Feature and Radiation Performances of the Proposed RFID Tag

To study the matching and radiation performance of the sensor tag, Ansoft HFSS is used to simulate the proposed antenna. The microchip was modeled in this solver by introducing a lumped port that simulates the behavior of the IC, with its complex impedance feed [22]. The return loss of the antenna was calculated from the power reflection coefficient, which takes the complex impedance of the microchip into account. Figure 4 shows the reflection coefficient S11 versus frequency for the proposed RFID tag with a PVC plastic substrate in free space. The simulated S11 reaches its minimum of −24 dB at the resonance frequency of 920 MHz, and the bandwidth is 60 MHz, from 890 to 950 MHz. Figure 5 shows the simulated total gain versus frequency of the proposed tag in free space; the maximum gain is 1.67 dBi, obtained around 926 MHz.
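For readers unfamiliar with matching to a complex chip impedance, the following Python sketch evaluates the standard power reflection coefficient, Γ = (Z_a − Z_c*) / (Z_a + Z_c), for the chip impedance quoted in Sect. 6. The antenna impedance `Z_ANT` is a hypothetical value picked near the conjugate of the chip impedance; it is not a result from the HFSS simulation, and the sketch does not reproduce the authors' solver setup.

```python
import math


def power_reflection(z_antenna: complex, z_chip: complex) -> complex:
    """Power-wave reflection coefficient between an antenna and a complex chip load."""
    return (z_antenna - z_chip.conjugate()) / (z_antenna + z_chip)


def s11_db(z_antenna: complex, z_chip: complex) -> float:
    """Reflection coefficient magnitude in dB; -inf for a perfect conjugate match."""
    magnitude = abs(power_reflection(z_antenna, z_chip))
    return float("-inf") if magnitude == 0 else 20 * math.log10(magnitude)


Z_CHIP = complex(34, -142)   # Alien Higgs 4 impedance quoted in the text (ohms)
Z_ANT = complex(30, 140)     # hypothetical antenna impedance, close to conj(Z_CHIP)

gamma_db = s11_db(Z_ANT, Z_CHIP)
power_transfer = 1 - abs(power_reflection(Z_ANT, Z_CHIP)) ** 2

# Fractional bandwidth from the band edges and resonance quoted in the text.
fractional_bw = (950e6 - 890e6) / 920e6

print(f"S11 = {gamma_db:.2f} dB, power transfer = {power_transfer:.4f}")
print(f"fractional bandwidth = {fractional_bw:.1%}")
```

A reflection coefficient near −23 dB means that more than 99% of the available power reaches the chip, which shows why an inductive antenna impedance close to 34 + j142 Ω is the design target for this capacitive chip.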

Fig. 4 The reflection coefficient of the proposed RFID tag

Fig. 5 Total gain of the proposed RFID tag in UHF band


Fig. 6 Reflection coefficient S11

6.2 The Matching Features and Radiation Performances of the Proposed RFID Tag Placed on the Glass

To test the functionality of the proposed folded dipole RFID tag with double U slots, we placed it on a glass material. According to the simulation results, the proposed tag shows good matching and a good radiation pattern. Figure 6 shows the reflection coefficient S11 versus frequency for the proposed RFID tag with a PVC plastic substrate when it is placed on the glass ground plane. The simulated S11 of the proposed tag reaches a minimum of −27.33 dB around the resonant frequency of 855 MHz. Figure 7 presents the simulated total gain versus frequency of the proposed tag placed on the glass substrate in the UHF band. The gain decreases at higher frequencies but remains good around 960 MHz; the maximum gain is 1.63 dB, obtained around 840 MHz.

Fig. 7 Total gain of the proposed RFID tag on the glass substrate

7 Conclusion

This paper gives a brief review of IoT connectivity and presents its technical, commercial, and ecosystem concepts. It also discusses the predominant communication standards, which are distributed over multiple frequency bands and use different protocols. A solid smart IoT connectivity deployment depends on many parameters: security, privacy, reliability, quality of service, and good coverage in distant areas. The challenges of the IoT are addressed by several connectivity systems. For low-power applications, ZigBee standards are adequate, and for higher-bandwidth applications, Wi-Fi is the best choice. In addition, we have proposed a new design for an RFID system, which represents the best system of IoT connectivity and ensures connection and communication between several objects. The RFID tag is an important component of a connectivity system; it can be placed on objects and humans for medical, commercial, and industrial applications. In our study, we designed a new RFID tag in the UHF band for industrial objects and then tested its performance in different environments, such as glass and plastic materials. The proposed RFID tag shows good radiation performance and good matching in free space and on a glass object. It is therefore considered a good candidate for industrial RFID applications.

References

1. Ahmad, M., Ishtiaq, A., Habib, M.A., Ahmed, S.H.: A review of internet of things (IoT) connectivity techniques. EAI/Springer Innovations in Communication and Computing. Springer (2019)
2. Al-Momani, A.M., et al.: A review of factors influencing customer acceptance of internet of things services. Int. J. Inf. Syst. Serv. Sect. 11(1), 54–67 (2019)
3. Guinard, D., et al.: From the internet of things to the web of things: Resource oriented architecture and best practices. In: Architecting the Internet of things, 97–129 (2001)
4. Din, I.U., et al.: The internet of things: A review of enabled technologies and future challenges. IEEE Access 7, 7606–7640 (2019)
5. Chehri, A.: Energy-efficient modified Dcc-Mac protocol for IoT in E-health applications internet of things (2019)
6. Chehri, A., Saadane, R.: Zigbee-based remote environmental monitoring for smart industrial mining. The Fourth International Conference on Smart City Applications, Casablanca, Morocco (2019)
7. Sharma, P., Chaurasiya, R., Saxena, A.: Comparison analysis between IEEE 802.11 a/b/g/n. Int. J. Sci. Eng. Res. 988–993 (2013)
8. Samuel, S.S.I.: A review of connectivity challenges in IoT-smart home. 3rd MEC International Conference on Big Data and Smart City (ICBDSC) (2016)
9. Pietrosemoli, E.: Setting long distance wifi records: proofing solutions for rural connectivity. J. Commun. Inf. 4(1), 1–10 (2008)
10. Sturman, B.J.: Bluetooth 1.1: Connect without cables. Pearson Education (2001)
11. Raza, S., et al.: Bluetooth smart: An enabling technology for the internet of things. IEEE WiMob, 2005
12. Collotta, M., Pau, G., Talty, T., Tonguz, O.K.: Bluetooth 5: a concrete step forward towards the IoT. arXiv: 1711.00257 (2017)
13. Madakam, S., Ramaswamy, R., Tripathi, S.: Internet of things (IoT): a literature review. J. Comput. Commun. 03(05), 164–173 (2015)
14. Vatkar, N.S., Vatkar, Y.S.: Zigbee: A wireless network (2016)
15. Chehri, A., Jeon, G., Choi, B.: Link-quality measurement and reporting in wireless sensor networks. Sensors (Basel, Switzerland) 13, 3066–3076 (2013)
16. Lee, J.S., Huang, Y.C.: ITRI ZBNode: a zigbee/IEEE 802.15.4 platform for wireless sensor networks. In: IEEE International Conference on Systems, Man and Cybernetics (2006)
17. Weis, S.A.: RFID: principles and applications. System 2(3), 1–23 (2007)
18. Dobkin, D.: The RF in RFID. Elsevier, Burlington, MA (2007)
19. Chehri, A., Mouftah, H.T.: Link adaptation-based optimization for wireless sensor networks routing protocol. In: Proceedings of IEEE 26th Biennial Symposium on Communications, pp. 142–145 (2012)
20. Sigfox, Tech. rep. https://www.sigfox.com
21. ANSYS HFSS 17.1, EM simulation software (2020)
22. Bouhassoune, I., Saadane, R., Menoui, K.: RFID double-loop tags with novel meandering lines design for health monitoring application. Int. J. Antennas Propag. (2019)
23. Bouhassoune, I., Saadane, R., Chehri, A.: Wireless body area network based on RFID system for healthcare monitoring: progress and architectures. In: 15th International Conference on Signal Image Technology & Internet Based Systems, Italy (2019)

Smart Water Distribution System Based on IoT Networks, a Critical Review

Nordine Quadar, Abdellah Chehri, Gwanggil Jeon, and Awais Ahmad

Abstract The purpose of this paper is to discuss existing technologies related to sensing in smart cities. The continuous growth of urban areas is a reality that must be faced by developing more efficient solutions. Smart cities are one remarkable solution; a smart city can be seen as a set of intelligent systems or platforms that work together to ensure better sustainability. Sensors are at the core of smart cities. They collect data from different environments or infrastructures and send them to the cloud using different communication platforms. These data can be used to better manage infrastructures or provide smarter services. However, there are various issues and challenges related to ubiquitous sensors that must be solved. In the last section of this paper, a case study of a smart water distribution system is presented, with an overview of the related issues and challenges, such as reliability, cost, and scalability. A table is also provided in that section to compare the results and challenges of the last five studies on producing smart pipes against the most common challenges.

N. Quadar: University of Ottawa, Ottawa, ON K1N 6N5, Canada
A. Chehri (B): Department of Applied Sciences, University of Québec in Chicoutimi, Chicoutimi, QC G7H 2B1, Canada
G. Jeon: Incheon National University, Incheon, South Korea
A. Ahmad: University of Milan, Milan, Italy
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 A. Zimmermann et al. (eds.), Human Centred Intelligent Systems, Smart Innovation, Systems and Technologies 189, https://doi.org/10.1007/978-981-15-5784-2_24


1 Introduction

According to the World Health Organisation (WHO), the urban population is expected to reach 5 billion by 2030 and will keep increasing [1]. This growth will cause serious urban issues related to congestion, pollution, health and safety risks, resource shortages, and others. Innovative solutions such as smart cities are therefore required. Moreover, climate change has become a reality, and green solutions are necessary to make better use of existing resources and to find alternatives such as renewable energies. A smart city can be seen as a city where all systems (economic, social, and technical) work together in a cohesive and intelligent way that ensures better sustainability. These systems should help to improve infrastructures and make them intelligent, increase the efficiency of energy use, and provide smarter services, buildings, healthcare, and more.

To do so, certain components are involved, such as sensors, communication technologies, and sensing platforms. Sensors are used to monitor different infrastructures, such as bridges and public buildings. They can also be used to gather data from outdoor or indoor environments in order to enable more control and adaptive solutions. These data can be used to predict future conditions and act on the predictions; one example is energy management. With real-time monitoring of energy use, power plants can produce the exact amount of energy needed, which saves a lot of money and reduces the related pollution. However, there are many challenges to the feasibility of these huge infrastructures, such as compatible communication networks, power supply resources, and data storage.

All these technologies and issues related to sensors in the context of smart cities are discussed in this paper. The next section reviews the evolution of sensor technologies. Section 3 discusses existing communication networks and how they can be applied to connect sensor nodes. Section 4 gives an overview of available sensing platforms. Section 5 describes an example application of sensors in smart cities and discusses the challenges related to this application.

2 Sensing Technology Evolution

Nowadays, sensors are the core part of any smart control system. For a process to be intelligent, its control system must be able to detect any change in its environment. To do so, different sensors capture these physical changes and convert them into electrical signals that the control system can use and interpret. These signals can result from a change in parameters such as temperature, humidity, light, pressure, acceleration, and others. The evolution of sensing technologies enables many applications that were impossible to implement in the past [2] because costs were high and such specific sensors had limited market availability. Research in this area has not only revolutionized sensors but also helped reduce the related production costs. For smart cities, the wide availability of these technologies creates a huge number of opportunities for new enabling technologies that can be deployed to make the city smarter, at low cost and with more control.

An example of sensing evolution in the context of smart cities is the meters used to monitor the use of electricity, gas, and water. These meters used to be electromechanical; today they have become smart thanks to a new generation of technologies [2]. The challenge in this switch was to keep the same functionality while guaranteeing high reliability, low cost, and easy maintenance. As can be seen from Fig. 1, electricity meter reading has evolved from manual reading of electromechanical meters to Automatic Meter Reading (AMR), which improved reading accuracy and reduced the costs of manufacturing and installing meters [3]. After that, a new generation called advanced metering infrastructure showed better results because it allows two-way communication, which is a key factor in improving energy management and optimization in a smart grid.

Fig. 1 Evolution of meter reading [3]

Another example of sensing evolution is CMOS-based sensing. The main advantage of this new generation of sensors, which can be used as smart temperature sensors for instance, is their low power consumption, since the CMOS circuit can operate in the subthreshold region of MOSFETs [4]. This type of sensor can be applied in smart cities to monitor air and water quality and other parameters. Nanotechnology sensing has also seen very interesting development. These sensors are mostly used in applications related not only to sensing the surroundings but also to human health. Another very important aspect of the new sensing generation that should not be ignored is the use of smartphones. These devices are fitted with different types of sensors: GPS, accelerometer, compass, and gyroscope. The main benefit in the smartphone scenario is the possibility of developing more crowdsourcing applications, which help to collect more data by outsourcing tasks to a group of people who have smartphones.


These applications will be increasingly in demand as the Internet of Things evolves, and they can be categorized as follows [5]:
• Personal mobile applications, which monitor a person's health (heart rate, for example) and involve only the person concerned [6].
• Group mobile applications, used in the crowdsourcing case, where a group of people share and exchange the data collected by the sensors in their smartphones.
• Community mobile applications, where the whole community is involved in order to better understand urban dynamics, which helps to solve urbanization issues.

3 Communication Technologies

Communication between sensors is one of the major issues. In a smart city, on the order of a million sensors must be connected to the network, so cabling would be costly and complex. For this reason, communication between sensors should be wireless, and new communication standards that can serve a large number of sensors or devices are required [7]:
• Field Area Networks: used in a smart grid to connect customers' houses or buildings to substations.
• Wide Area Networks: used to connect smart-grid houses or buildings to utilities. This communication requires larger coverage, which can be achieved with infrastructure such as 3G, LTE, and fiber-optic lines.
• Home Area Networks: typically used to connect all home devices to the network. They require shorter-range standards such as ZigBee, Dash7, and Wi-Fi (802.11g/n).

3.1 Dash7 Technology

Dash7 is a communication technology for Wireless Sensor Network (WSN) applications. It operates in the 433 MHz frequency band and offers better penetration than ZigBee [8]. In home area networks, for instance, Dash7 penetrates walls better than 2.4 GHz technologies [9]. Moreover, it can connect sensors over long distances (up to 1 km) at low power, which suits applications in building automation and logistics.
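Part of the range advantage of the 433 MHz band can be illustrated with the standard free-space path loss formula, FSPL(dB) = 20 log10(d_km) + 20 log10(f_MHz) + 32.44. The sketch below is only an illustration under a free-space assumption: it does not model wall attenuation, and the 1 km distance is simply the range quoted above.

```python
import math

def fspl_db(distance_m: float, freq_mhz: float) -> float:
    """Free-space path loss in dB (distance in metres, frequency in MHz)."""
    distance_km = distance_m / 1000.0
    return 20 * math.log10(distance_km) + 20 * math.log10(freq_mhz) + 32.44

# Compare the Dash7 band with the 2.4 GHz band at the 1 km range quoted in the text.
loss_433 = fspl_db(1000.0, 433.0)
loss_2400 = fspl_db(1000.0, 2400.0)

# In free space the difference depends only on the frequency ratio, not on distance.
advantage_db = loss_2400 - loss_433
print(f"433 MHz: {loss_433:.1f} dB, 2.4 GHz: {loss_2400:.1f} dB, "
      f"difference: {advantage_db:.1f} dB")
```

Under this assumption the lower band gains roughly 15 dB of link budget at any distance; real indoor links will differ because walls attenuate the two bands differently.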

3.2 ZigBee Technology

In the smart-city context, this low-power standard (IEEE 802.15.4) has some limitations, such as its limited range, which would require installing a huge number of repeaters in the city to ensure full coverage of the millions of sensors to be used. However, it offers an interesting solution to the addressing issue: the 6LoWPAN concept makes it possible to transmit IPv6 packets over IEEE 802.15.4, which solves the problem [10, 11].

3.3 NFC and RFID Technology

Radio Frequency Identification (RFID) and Near Field Communication (NFC) are two attractive technologies for smart-city applications because their short-range communication requires little power and because they are inexpensive. A main application of RFID is localization. As shown in Fig. 3, an RFID localization system consists of [12]:
• RFID tags: tags are either active or passive. Active tags, powered by a battery (lifetime of 7 years [12]), can communicate with the reader at up to 300 m. Passive tags have a limited range and can communicate only up to 1 m from the reader.
• RFID reader: uses a specific radio frequency and protocol to read the data sent by the tags.
• Data processing subsystem: executes localization algorithms on the data received from the reader to make the location available to different applications.

Hence, each tag can be used as a sensor. Moreover, passive tags do not require an active power supply, as they can be powered only when needed. In the context of smart cities, various applications can use these techniques, such as:
• Localization and tracking of objects.
• Healthcare applications.
• Smart parking.

NFC is used for bi-directional communication over a very short range (centimeter scale) and is built into mobile devices such as smartphones. The main difference between NFC and RFID is that NFC can communicate in both directions, which is not the case for RFID. In the context of smart cities, this technology can be used in various applications; some real-world examples are:
• Smart energy metering: in Chongqing, China, a new generation of smart meter, the NFC-enabled post-pay electricity meter, has been deployed. It lets customers read their smart meter with an NFC phone; payment is then made automatically by sending the encrypted data to the banking system [13].
• Data acquisition and control: NFC smartphones can be used as a gateway between a monitored device and the monitoring system in remote control applications. Figure 4 shows a block diagram of the components required for a control system or wireless monitoring system. The NFC-enabled mobile phone acts as the local reader device, providing the short-range link that sends data to and receives data from the sensors. A long-range technology, such as Wi-Fi, carries the communication to and from the back end [14].
• Smart car parks: NFC smartphones can be used as a ticket to enter the car park and as a wallet to pay when leaving [15].

4 Sensing Platforms

As the number of sensors required in smart-city applications increases, various platforms have become available to manage the connectivity between sensors. These platforms gather the sensed data, such as humidity, temperature, and light, from the different sensors connected to them and pre-process the data before transmitting it to a sink node. Because they will be deployed outdoors, the platforms must withstand all kinds of environmental conditions.
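The pre-processing step mentioned above can be pictured as a node collapsing a window of raw samples into one compact record before sending it to the sink. The following is a minimal sketch under assumptions of our own: the field names, the sample values, and the choice of min/max/mean as the summary are all hypothetical and are not taken from any particular platform.

```python
from statistics import mean

def summarize(samples: dict) -> dict:
    """Collapse a window of raw samples into one small record per quantity."""
    summary = {}
    for kind, values in samples.items():
        summary[kind] = {
            "min": min(values),
            "max": max(values),
            "mean": round(mean(values), 2),
        }
    return summary

# Hypothetical window of raw readings gathered by one platform.
raw = {
    "temperature_c": [21.0, 21.5, 22.0],
    "humidity_pct": [40.0, 42.0, 44.0],
    "light_lux": [300.0, 320.0, 310.0],
}

# This compact record, rather than every raw sample, would go to the sink node.
packet = summarize(raw)
print(packet)
```

Sending a summary instead of every sample trades temporal detail for fewer transmissions, which matters on battery-powered nodes.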

4.1 Wireless Sensor Networks

A wireless sensor network consists of a collection of nodes arranged into one network [16]. As shown in Fig. 2, each node is equipped with a processing device such as a microcontroller, memory, an RF transceiver, an analog-to-digital converter, various sensors or actuators, and a power source. These nodes communicate over wireless links to transmit data to the cloud. The advantages of Wireless Sensor Network platforms are their low power consumption (even though the power source remains one of their major limitations) and their low-cost sensor nodes, which can be installed in any environmental conditions. Moreover, sensor nodes are available on the market with open-source code; examples are Mica, Mica2, Telos, TelosB, and IMote2. Regardless of the variety of these techniques, they still face various implementation issues on such low-power devices, and research is ongoing to solve them.

Fig. 2 Wireless sensor node architecture [17]

4.2 Internet of Things Platform

As mentioned before, the sensor nodes send the gathered data to the sink node, which forwards it to the cloud after some processing steps. Some solutions for this process are [17]:
• ioBridge: this platform uses its own hardware, which is connected to the cloud. The data sent is accessible through web interfaces and can be used in web or mobile applications to control and monitor the sensor nodes remotely.
• ThingSpeak: with this platform, users can upload the data gathered from their sensors to the cloud. Users can register and access a customized dashboard where they monitor and control their sensors or actuators.
• HP CeNSE: this platform is a kind of information ecosystem in which a trillion nanoscale sensors and actuators are connected to collect information about the planet, such as seismic activity [18].
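As an illustration of the upload step, the sketch below builds the request a node or gateway could send to ThingSpeak. It assumes the public ThingSpeak update endpoint with `api_key` and `fieldN` query parameters; the API key and the field mapping (field 1 for temperature, field 2 for humidity) are placeholders chosen for this example. The request is only constructed, not sent.

```python
from urllib.parse import urlencode

# Assumed ThingSpeak write endpoint; check the platform documentation before use.
THINGSPEAK_UPDATE_URL = "https://api.thingspeak.com/update"

def build_update_url(write_api_key: str, fields: dict) -> str:
    """Build a channel-update URL from a {field_number: value} mapping."""
    params = {"api_key": write_api_key}
    for index, value in sorted(fields.items()):
        params[f"field{index}"] = value
    return f"{THINGSPEAK_UPDATE_URL}?{urlencode(params)}"

# Placeholder key and a hypothetical mapping: field1 = temperature, field2 = humidity.
url = build_update_url("YOUR_WRITE_API_KEY", {1: 21.5, 2: 42.0})
print(url)

# Sending would be a plain HTTP GET, for example:
#   urllib.request.urlopen(url)
# It is left out here so the sketch runs without network access.
```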

5 Applications

Water, gas, and oil are considered essential services in cities. Distributing them, monitoring them, and feeding the city's infrastructure need to be regulated [19]. For example, an effective control procedure has to be applied in order to deliver safe water to customers. The distribution system consists of a river or lake as the water source, storage, and a distribution network composed primarily of pipes, which may be located above ground, underground, or underwater. However, such a system is not intelligent. One particular problem is that a leak in an underground pipe is difficult to diagnose. Advanced sensing is used to build a more intelligent fault detection system. Figure 3 shows potential sensor locations and areas of interest. Examples of such applications include monitoring the water level in storage, detecting leaks, and observing water quality. A wide field experiment was carried out with the Boston Water and Sewer Commission, in which the components of the three-tier PipeNet prototype were deployed as shown in Fig. 4. The general findings were that the leak localization algorithms were fairly effective. However, local leak detection did not work effectively; a large fraction was detected using this algorithm. That is because the short length of the experimental pipe made the traveling waves form standing waves, which affected the variations in the wave speed.

Fig. 3 Applying sensors in water distribution system

Fig. 4 PipeNet deployment overview [20]
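To see why the wave speed matters for leak localization, consider a generic two-sensor time-difference scheme. This is a textbook-style sketch, not the PipeNet algorithm itself: two sensors A and B are a known distance L apart, the leak noise reaches them at times t_A and t_B, and with wave speed c the leak lies at d_A = (L + c (t_A - t_B)) / 2 from sensor A. All numbers below are hypothetical.

```python
def locate_leak(spacing_m: float, wave_speed_mps: float,
                arrival_a_s: float, arrival_b_s: float) -> float:
    """Distance of the leak from sensor A, from the arrival-time difference."""
    delta_t = arrival_a_s - arrival_b_s
    distance_from_a = (spacing_m + wave_speed_mps * delta_t) / 2.0
    if not 0.0 <= distance_from_a <= spacing_m:
        raise ValueError("arrival times are inconsistent with a leak between the sensors")
    return distance_from_a

# Hypothetical case: sensors 100 m apart, wave speed 1000 m/s, leak 30 m from A,
# so the noise arrives after 0.030 s at A and 0.070 s at B.
position = locate_leak(100.0, 1000.0, 0.030, 0.070)

# The same arrival times with a 5 % error in the assumed wave speed.
position_biased = locate_leak(100.0, 1050.0, 0.030, 0.070)

print(f"estimated: {position:.1f} m, with biased wave speed: {position_biased:.1f} m")
```

Because the estimate scales the time difference by c, any uncertainty in the wave speed shifts the computed position directly, which is consistent with the sensitivity to wave-speed variations noted above.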


Table 1 Comparison between study results and criteria

| Study | Reliability: power supply | Reliability: communication | Cost | Scalability | Technique |
|---|---|---|---|---|---|
| Sadeghioon et al. [21] | Yes | Partially | Yes | No | UWSN + FRS |
| Metje et al. [19] | No | Partially | Yes | No | Off-the-shelf MEMS |
| Stoianov et al. [20] | No | Partially | No | Partially | WSN |
| Rashid et al. [22] | No | Yes/on air only | No | No | WSN |
| Akyildiz et al. [23] | No | Partially | No | No | WUCN |

Several solutions based on Wireless Sensor Networks (WSNs) have been proposed to detect leakage while reducing spatial and temporal noise. The challenges are high-data-rate sampling, time-synchronized data collection, and a limited budget; for long-range pipeline networks, WSNs also need to be combined with a method that can be deployed automatically, quickly, and easily. We compared the results and challenges of five recent studies on smart pipes against the most common challenges in this field: reliability, cost, and scalability. Most of the experiments were small-scale laboratory pilot projects, yet they did not fully reach their goals and still face several challenges. The result of the comparison is given in Table 1. Certain areas need further research if this concept is to become reality [24–27]. These include:
• Powering the sensors: methods of harvesting energy from the environment are important for extending the lifetime of the pipe.
• Communications: work on sensor-to-sensor and sensor-to-server communication is needed.
• Scalability: the solution should work at large scale and in real environments.
• Cost: all the above requirements should be met at low cost.

6 Conclusion

In a world where resources are limited and cities consume most of them, it is necessary to make cities more sustainable. Improving and automating processes within a city is the role that smart cities are required to play. This paper gave an overview of the role of sensors in smart cities, as well as of the communication networks and available platforms. We then gathered different studies on one of the most vital smart-city applications: smart pipelines that supply cities with water or gas. The state of the art of each technique presented in those studies was analyzed and the related challenges were highlighted. Insights into their results and goals were provided by comparing their achievements with each other and with the most common challenges in this area. The main finding was that, although most of the experiments were small-scale laboratory pilot projects, they still did not reach their stated goals or the required success.

References

1. WHO.: World urbanization prospects, The 2014 Revision (2014)
2. Intelligent Sensors for the Smart City (2020)
3. IEEE Power and Energy Magazine.: The path of the smart grid (2020)
4. Ueno, K., Asai, T., Amemiya, Y.: Low-power temperature-to-frequency converter consisting of sub-threshold CMOS circuits for integrated smart temperature sensors. Sens. Actuators 132–137 (2011)
5. Choudhury, T., Campbell, A.: A survey of mobile phone sensing. IEEE Communications Magazine 48(9), 140–150 (2010)
6. CENS/UCLA. Participatory Sensing / Urban Sensing Projects (2020)
7. Hamaguchi, K., Ma, Y., Takada, M., Nishijima, T., Shimura, T.: Telecommunications Systems in Smart Cities. Hitachi Rev. 61, 152–158 (2012)
8. Hwakyung, L., et al.: Performance comparison of DASH7 and ISO/IEC 18000-7 for fast tag collection with an enhanced CSMA/CA protocol (13–15), 769–776 (2013)
9. Tabakov, Y.: DASH7 alliance protocol. http://95.85.41.106/wp-content/uploads/2014/08/005Dash7-Alliance-Mode-technical-presentation.pdf
10. Chehri, A., Mouftah, H.: A practical evaluation of ZigBee sensor networks for temperature measurement. In: Zheng, J., Simplot-Ryl, D., Leung, V.C.M. (eds.) Ad hoc networks. ADHOCNETS 2010. Lecture notes of the institute for computer sciences, social informatics and telecommunications engineering, vol 49. Springer, Berlin (2010)
11. Chehri, A., Saadane, R.: Zigbee-based remote environmental monitoring for smart industrial mining. In: The Fourth International Conference on Smart City Applications. Casablanca, Morocco (2019)
12. Chehri, A., Fortier, P., Tardif, P.M.: UWB-based sensor networks for localization in mining environments. Elsevier Ad Hoc Netw. 7, 987–1000 (2009)
13. Chinese City to Get NFC Smart Meters (2020)
14. Opperman, C., Hancke, G.P.: Using NFC-enabled phones for remote data acquisition and digital control. Proceedings of IEEE AFRICON 1–6, 2001 (2011)
15. Benelli, G.: An automated payment system for car parks based on near field communication technology. In: Proceedings of International Conference for Internet Technology and Secured Transactions (ICITST), pp. 1–6. London, UK (2010)
16. Chehri, A., Jeon, G., Choi, B.: Link-quality measurement and reporting in wireless sensor networks. Sensors (Basel, Switzerland) 13, 3066–3076 (2013)
17. Gerhard, P., et al.: The role of advanced sensing in smart cities. In: Advanced Sensor Networks Research Group, Department of Electrical, Electronic and Computer Engineering, University of Pretoria, South Africa (2012)
18. Wischke, M., Masur, M., Kroner, M., Woias, P.: Vibration harvesting in traffic tunnels to power wireless sensor nodes. Smart Mater. Struct. 20, 1–8 (2011)
19. Metje, N., Chapman, D.N., Cheneler, D., Ward, M., Thomas, A.M.: Smart Pipes: Instrumented Water Pipes, Can This Be Made a Reality? Sensors 11(8), 7455–7475 (2011)
20. Stoianov, I., et al.: PIPENET: A wireless sensor network for pipeline monitoring. In: Proceedings of the 6th International Conference on Information Processing in Sensor Networks (IPSN '07), Cambridge, pp. 264–273 (2007)
21. Sadeghioon, A.M., Metje, N., Chapman, D.N., Anthony, C.J.: SmartPipes: smart wireless sensor networks for leak detection in water pipelines. J. Sens. Actuator Netw. 3, 64–78 (2014)
22. Rashid, S., Oaisar, S., Saeed, H., Felemban, E.: A method for distributed pipeline burst and leakage detection in wireless sensor networks using transform analysis. Int. J. Distrib. Sens. Netw. 1–14 (2014)
23. Akyildiz, I.F., Sun, Z., Vuran, M.C.: Signal propagation techniques for wireless underground communication networks. Phy. Commun. J. 2(3), 167–183 (2009)
24. Chehri, A., Mouftah, H.T., Fortier, P., Aniss, H.: Experimental testing of IEEE801.15.4/ZigBee sensor networks in confined area. In: Annual Communication Networks and Services Research Conference, pp. 244–247 (2010)
25. Chatzigeorgiou, D., et al.: Design and evaluation of an in-pipe leak detection sensing technique based on force transduction. Proc. ASME Int. Mech. Eng. Congr. Expo., 489–497 (2012)
26. Chehri, A., Farjow, W., Mouftah, H.T., Fernando, X.: Design of wireless sensor network for mine safety monitoring. In: 24th Canadian Conference on Electrical and Computer Engineering (CCECE) (2011)
27. Chehri, A., Mouftah, H.: An efficient clusterhead placement for hybrid sensor networks. In: Nikolaidis, I., Wu, K. (eds.) Ad-hoc, mobile and wireless networks. ADHOC-NOW. Lecture notes in computer science, vol. 6288. Springer, Berlin (2010)

Performance Analysis of Mobile Network Software Testbed

Ali Issa, Nadir Hakem, Nahi Kandil, and Abdellah Chehri

Abstract This paper reports a comparative study of the most significant open-source 4G mobile network platforms, namely OAI and srsLTE. The study also includes the Amarisoft 4G software, one of the promising commercial alternatives. The three alternatives were evaluated in the indoor environment of a university building, with the aim of enhancing LTE signal propagation and extending the capacity of the existing Wi-Fi solution. The experimental measurements are compared and analyzed in terms of link Quality of Service (QoS) and the use of processing and computation resources. The results show that the commercial solution outperforms the open-source alternatives, although the performance of OAI appears fairly similar to that of Amarisoft.

1 Introduction

With the increasing popularity of mobile wireless communication, new services and technologies are being established to meet client demand. Performance, coverage, and QoS have been improved to satisfy clients' desire for a robust network based on Long-Term Evolution (LTE). This evolution of mobile broadband will continue with the new fifth-generation (5G) mobile network. In recent years, the emergence of open-source mobile communication software has moved the telecommunication industry from proprietary and legacy technologies toward open-source technologies. Open-source mobile software has also become an integral part of the R&D process and of open-source communities. These communities drive much of the innovation in this space and make it possible to extend service coverage to underserved rural and remote areas.

A. Issa (B) · N. Hakem · N. Kandil, Université du Québec en Abitibi-Témiscamingue (UQAT), Val d'Or, QC, Canada
A. Chehri, Université du Québec à Chicoutimi (UQAC), Chicoutimi, QC, Canada
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021. A. Zimmermann et al. (eds.), Human Centred Intelligent Systems, Smart Innovation, Systems and Technologies 189, https://doi.org/10.1007/978-981-15-5784-2_25


This innovation started in 1999 with an open-source telephone switch called Asterisk. Subsequently, companies and research institutes developed several open-source projects covering both hardware and software. Software Radio Systems, Fraunhofer FOKUS, Range Networks, Sysmocom, OpenBTS, OpenLTE, and EURECOM are some of the most prominent companies and research institutes that have been developing open-source mobile communication software projects. The combination of open-source mobile communication software with Software Defined Radio (SDR) has the potential to deliver a better cellular system in terms of cost, time, and flexibility. SDR technology promises a universal and fully programmable wireless communication system. Based on a universal hardware platform, SDR implements various radio functions in software, instead of following the traditional radio design approach that depends on fixed, special-purpose hardware. The Universal Software Radio Peripheral (USRP) provided by Ettus Research is the best-known Radio Frequency (RF) front end [1]. To build a baseband processing platform, most SDR designers select a General Purpose Processor (GPP) to execute functions such as signal generation, coding/decoding, and modulation/demodulation. The combination of the USRP RF front end and GNU Radio [2], an open-source GPP-based baseband processing platform, forms the leading SDR platform for research communities working on the next generation of telecommunication architecture. Both the research community and the commercial sector are highly interested in this platform, and SDR is now recognized as the third revolution in communication, following the fixed-to-mobile and analog-to-digital transitions [3]. Companies such as Software Radio Systems and EURECOM have developed open-source mobile network projects such as srsLTE [4] and OpenAirInterface (OAI) [5], while other companies sell commercial products such as LTE 100 from Amarisoft [6].

In [3] and [7], the authors carried out a performance analysis comparing the GPP-based SDR systems of the two open-source platforms, OAI and srsLTE. Furthermore, in [7], a methodology was designed to characterize the performance of these alternatives and to quantify their differences in throughput and resource consumption over a range of practical settings. In addition, a mobile relay architecture based on two nested levels of LTE networks was evaluated and adapted to support a public transport system architecture with standard radio interfaces and off-the-shelf terminals. The Amarisoft software alternative was also the subject of a descriptive study in [8]. In this paper, the most popular mobile communication alternatives that provide a full LTE protocol stack (namely Amarisoft, OAI, and srsLTE) were investigated and implemented. An experimental performance test was conducted between the GPP SDR open-source platforms and the Amarisoft commercial LTE platform. Their adequacy for 5G experimentation is also evaluated. Moreover, we studied the capability of the Amarisoft PCIe SDR and the USRP B210 board to support the LTE alternatives. These experimental comparison results can serve as a reference.


This testbed aims to study the effectiveness of a GPP SDR cellular platform in an indoor environment, mainly to enhance LTE signal propagation and to extend the quality of the existing provider's service. We therefore conducted measurements in a corridor on the campus of the University of Quebec in Abitibi-Temiscamingue (UQAT) in Val-d'Or, in the northwest of the province of Quebec, Canada. The rest of this article is organized as follows. The next section presents a brief overview of the LTE alternatives studied. Section 3 describes the testbed setup. The results and analysis are given in Sect. 4. An indoor performance evaluation scenario is presented in Sect. 5. Finally, a conclusion and future work are provided.

2 Overview

2.1 OAI and srsLTE Platforms

OAI is a powerful and flexible wireless technology platform based on the 4G ecosystem. It contains the entire LTE protocol stack, released under the AGPLv3 license, including standard-compliant implementations of the 3GPP LTE access stratum for both the evolved Node B (eNB) and the UE, and a subset of the 3GPP LTE Evolved Packet Core (EPC) protocols. OAI can be used as an emulation and performance evaluation platform for LTE/LTE-A systems [9].

srsLTE is software for SDR applications that provides a full LTE protocol stack for both srsENB (the LTE eNB) and srsUE (the LTE UE). It also has a lightweight LTE core network implementation. srsLTE was released under the AGPLv3 license and uses software from the OpenLTE project for some security functions and NAS parsing. The software is available under both open-source and commercial licenses. The SRS software suite also includes custom products such as AirScope and the over-the-air LTE analysis toolbox.

Both platforms are written in C/C++, target real-time and non-real-time operation, and run on standard Linux-based computing equipment ranging from a simple PC to a sophisticated cluster or even a GPU workstation. Both support distributed deployment on a local IP network. For real-world experimentation and validation, OAI has a custom SDR, the ExpressMIMO2 PCI Express (PCIe) board, while srsLTE includes support for the Sidekiq M.2 SDR from Epiq Solutions. The two platforms support the NI/Ettus Universal Software Radio Peripheral (USRP) B2x0/N210/X3x0, LimeSDR, and BladeRF.

1. OAI operation: The OAI-eNB application includes two main Portable Operating System Interface (POSIX) threads, eNB_thread_rx and eNB_thread_tx, which run under the earliest-deadline-first scheduling supported by the low-latency kernel. The main PHY features are summarized in Table 1. Concerning the Core Network (CN), OAI includes a subset of the 3GPP LTE Evolved Packet Core (EPC) components: the Home Subscriber Server (HSS), Mobility Management Entity (MME), Serving Gateway (S-GW), Packet Data Network Gateway (P-GW), and the Non-Access Stratum (NAS) protocols. GPRS Tunneling Protocol (GTP) support for the user plane is required and is integrated into the P-GW.

Table 1 OAI main PHY features

| Feature | OAI |
|---|---|
| LTE releases | 8.6 compliant and a subset of release 10 |
| Duplexing mode | FDD and TDD |
| Tx modes | 1, 2 (stable), 3, 4, 5, 6, 7 (experimental) |
| Number of antennas | 2 |
| CQI/PMI reporting | Aperiodic, feedback mode 3-0 and 3-1 |
| Downlink (DL) channels | All |
| Uplink (UL) channels | All |
| HARQ support | UL and DL |

2. srsLTE operation: The srsLTE library provides the functions needed to build LTE components, including implementations of different processing functions and physical channels, such as synchronization (Sync) and resampling. Moreover, the three lower protocol layers (L1, L2, and L3) for the eNB and UE are implemented in the srsLTE library. Several example applications are also included, such as pdsch_enodeb, cell-search, and cell-measurement. The main PHY features of the srsLTE open-source and commercial platforms are summarized in Table 2.

Table 2 srsLTE main PHY features

| Feature | srsLTE open source | srsLTE commercial |
|---|---|---|
| LTE releases | 8 compliant, and a subset of release 9 | Release 15 compliant |
| Duplexing mode | FDD | FDD and TDD |
| Tx modes | 1, 2, 3, 4 |
| Number of antennas | 2 |
| CQI/PMI reporting | Periodic and aperiodic feedback support |
| Downlink (DL) channels | All |
| Uplink (UL) channels | All |
| HARQ support | UL and DL |


2.2 Amarisoft Platform

Amarisoft is a mobile communication software company that has developed 4G and, more recently, 5G NR software that runs on generic hardware and complies with the 3GPP specifications. Amarisoft provides a full LTE protocol stack and supports intra-eNB, S1, and X2 handover. The technology contains all LTE components, such as EPC, eNB, NB-IoT, LTE-M, evolved Multimedia Broadcast Multicast Services (eMBMS), and IP Multimedia Subsystem (IMS). At the beginning of 2019, the company announced the release of its 5G software. The new 5G portfolio includes a gNodeB that supports the non-standalone (NSA) mode and the sub-6-GHz spectrum; support for the standalone (SA) mode is to follow [14]. The main Amarisoft PHY features are shown in Table 3. The Amarisoft LTE network is a full 3GPP LTE software solution that can work as an integrated component of a larger network or as a standalone system. The software runs on a standard Linux x86-based machine, and Amarisoft offers its own PCI Express SDR board. The platform also supports LimeSDR, PicoSDR 2x2, and USRP N2x0, B2x0, and X3x0.

3 Testbed Deployment and Configuration

In this work, we studied the performance of three SDR platforms: OAI, srsLTE, and Amarisoft. In terms of RF boards, OAI-eNB and srsENB operate with a USRP B210 RF board connected over USB 3.0, while the Amarisoft eNB uses both the Amarisoft PCIe SDR and the USRP B210 so that the two boards can be compared. The PC is equipped with an Intel Core i7-8700 CPU with 6 cores clocked at 3.20 GHz and 8 GB of DDR4 memory. It ran Ubuntu 16.04 with the 4.15.0-45 low-latency kernel, and all power management, CPU scaling, and firewall features were disabled. The main system parameters are listed in Table 4. To make sure that the EPC computing load would not affect the performance of the eNB, the EPC functionality was hosted on an additional computer, and all experiments were conducted using OAI-CN (develop branch). This computer was connected to the eNB by a Gigabit Ethernet cable. The experimental testbed included two commercial UEs running Android for testing. A BluDrive smart card reader and the GRSIM write application version 3.10 were used to configure the SIM card. In addition, two omnidirectional antennas with a maximum input power of 100 W and a gain of 9 dBi were connected to the TX and RX ports of the SDR using N-to-SMA cables. We conducted the testbed performance experiments with the following software versions:
• OAI: develop branch, tag 2018.w42.
• srsLTE: release_18_12.
• Amarisoft: 2019-02-05.

Table 3 Amarisoft main PHY features

| Feature | Amarisoft |
|---|---|
| LTE releases | 14 compliant |
| Duplexing mode | FDD and TDD |
| Tx modes | 1 (single antenna) and 2–10 (MIMO 4×4) |
| Downlink (DL) channels | All |
| Uplink (UL) channels | All |
| HARQ, CSI-RS, PAPR, PRS, and carrier aggregation support |
| Wideband CQI/PMI reporting |

Table 4 System configuration

| Component | Parameter | Value |
|---|---|---|
| PCIe SDR | Frequency range | 70 MHz–6 GHz |
| PCIe SDR | Antenna | 2 TX * 2 RX (12.3 × 8.3 × 1.5 inch) |
| PCIe SDR | Standard connector | PCIe |
| USRP B210 | Frequency range | 70 MHz–6 GHz |
| USRP B210 | Antenna | 2 TX * 2 RX (12.3 × 8.3 × 1.5 inch) |
| USRP B210 | Standard connector | USB 3.0 |
| USRP B210 | UHD version | 3.12.0 |
| LTE | Duplex mode | FDD |
| LTE | Transmission mode | TM1 (SISO) |
| LTE | Carrier frequency | Band 7 |
| LTE | Downlink frequency | 2660 MHz |
| LTE | Uplink frequency | 2540 MHz |
| LTE | Bandwidth | 10, 20 MHz |
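eNB configuration files usually take the carrier as an EARFCN channel number and the bandwidth as a number of physical resource blocks (PRBs) rather than in MHz. The sketch below converts the Band 7 values of the system configuration using the 3GPP TS 36.101 relation F = F_low + 0.1 (N - N_offs). The Band 7 constants and the PRB counts are quoted from the specification as we recall it, and the function names are our own; none of this is taken from the configuration files actually used in the testbed.

```python
# Band 7 constants as assumed from 3GPP TS 36.101 (E-UTRA channel numbering).
BAND7 = {
    "f_dl_low_mhz": 2620.0, "n_offs_dl": 2750,
    "f_ul_low_mhz": 2500.0, "n_offs_ul": 20750,
}

# Number of physical resource blocks for the two tested channel bandwidths.
PRB_PER_BANDWIDTH_MHZ = {10: 50, 20: 100}

def dl_earfcn(freq_mhz: float, band: dict = BAND7) -> int:
    """Downlink EARFCN for a carrier frequency on a 100 kHz raster."""
    return band["n_offs_dl"] + round((freq_mhz - band["f_dl_low_mhz"]) * 10)

def ul_earfcn(freq_mhz: float, band: dict = BAND7) -> int:
    """Uplink EARFCN for a carrier frequency on a 100 kHz raster."""
    return band["n_offs_ul"] + round((freq_mhz - band["f_ul_low_mhz"]) * 10)

dl = dl_earfcn(2660.0)   # downlink carrier used in the testbed
ul = ul_earfcn(2540.0)   # uplink carrier used in the testbed
print(f"DL EARFCN {dl}, UL EARFCN {ul}, "
      f"PRBs: {PRB_PER_BANDWIDTH_MHZ[10]} (10 MHz), {PRB_PER_BANDWIDTH_MHZ[20]} (20 MHz)")
```

The 120 MHz gap between the two carriers matches the fixed duplex spacing of Band 7, which is a quick sanity check on the configured frequencies.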

4 Results and Analysis

Figure 1 shows the experimental scenario, in which the CN and the eNB run on separate PCs. The USRP B210 and the Amarisoft PCIe SDR were connected to the eNB through USB 3.0 and a PCIe slot, respectively. Each mobile platform was installed on a single M.2 SSD to enable high performance. During the tests, the two SDRs were fixed approximately 1 m from the UE with no obstruction between them. The transmitting power of the SDRs was set at 90 dB. We transmitted UDP and TCP packets in both the uplink (UL) and downlink (DL) directions, at bandwidths of 10 and 20 MHz, to measure the throughput and delay jitter of each mobile platform. The Round Trip Time (RTT) and CPU usage were also measured for each eNB SDR process. Finally, the two SDR RF front ends, the Amarisoft PCIe SDR and the USRP B210, were compared in a performance test.


Fig. 1 Experimental scenario

Finally, each experiment was repeated more than 30 times, and the best results are reported in the figures below. Before each experiment on a mobile platform, the whole network was restarted and the connection between the LTE equipment was verified.

4.1 RTT Test

The time delay of each cellular software platform was measured by sending 120 ICMP packets of 1436 bytes from the UE to the core network. RTT is the time it takes one packet to reach a specific destination plus the time it takes the acknowledgment to return to the source; here it covers the path between the UE and the CN. The Amarisoft and OAI platforms had similar results, with a minimum RTT of 20 ms and a maximum RTT of 43,36 ms. The average RTT was 30 ms for OAI and 32 ms for Amarisoft. The latency of srsLTE was higher than that of the other two platforms, with minimum, maximum, and average RTTs of 27 ms, 51 ms, and 36 ms, respectively. Finally, the standard deviation (mdev) confirms that OAI has the most constant packet transmission (mdev = 3.7), compared with 3.8 for Amarisoft and 5.4 for srsLTE.
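The four figures reported above (minimum, average, maximum, and mdev) are the summary that the Linux ping tool prints. The sketch below shows how they can be recomputed from a list of per-packet RTTs. It assumes mdev is the square root of the mean of squares minus the squared mean, which is our understanding of the iputils ping implementation; the sample values are hypothetical and are not the measured data.

```python
import math

def rtt_summary(rtts_ms: list) -> dict:
    """Return min/avg/max/mdev for a list of round-trip times in milliseconds."""
    n = len(rtts_ms)
    avg = sum(rtts_ms) / n
    mean_of_squares = sum(r * r for r in rtts_ms) / n
    # Guard against a tiny negative value caused by floating-point rounding.
    mdev = math.sqrt(max(mean_of_squares - avg * avg, 0.0))
    return {"min": min(rtts_ms), "avg": avg, "max": max(rtts_ms), "mdev": mdev}

# Hypothetical RTT samples in ms (the real test used 120 packets per platform).
sample = [20.0, 30.0, 40.0]
stats = rtt_summary(sample)
print(f"min/avg/max/mdev = {stats['min']:.1f}/{stats['avg']:.1f}/"
      f"{stats['max']:.1f}/{stats['mdev']:.3f} ms")
```

A lower mdev means the individual RTTs stay closer to the average, which is the sense in which the text calls a platform's packet transmission more constant.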

4.2 CPU Usage

In this section, the CPU usage of the wireless software running on the eNB PC is analyzed for both bandwidths (10 and 20 MHz). To conduct this test, the GNOME System Monitor, a Linux tool that displays system information such as CPU usage, memory usage, and network rates, was used on the eNB side, while RFBENCHMARK was used on the UE side. The latter is an application capable of detecting existing mobile networks in a geographical area and indicating the network to which the cell phone has been


A. Issa et al.

connected. This application also offers several options for testing networks, and we used its performance speed test and HD video streaming test to study the CPU usage. This experiment aimed to evaluate the difference between the cellular platforms in terms of consumed resources. The Amarisoft eNB consumed fewer CPU resources (20% for BW 10 MHz and 40% for BW 20 MHz) than the other two platforms. The results for the OAI-eNB platform were 35% for BW 10 MHz and 50% for BW 20 MHz, while the srsLTE open-source platform consumed 40% for BW 10 MHz and 60% for BW 20 MHz. These results confirm that there is a substantial difference in resource consumption between Amarisoft, OAI-eNB, and srsLTE. In conclusion, Amarisoft and OAI are more capable of handling 5G scenarios.
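
For readers who want to log CPU usage themselves rather than read it from a graphical monitor, the sketch below shows one common way to derive a usage percentage on Linux from two snapshots of the aggregate "cpu" line in /proc/stat. It is an illustration under that assumption and not the procedure used in the paper. The function names and the snapshot values are hypothetical.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into integer tick counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Keep user, nice, system, idle, iowait, irq, softirq, steal.
    # Guest time is already included in user time, so it is skipped.
    return [int(v) for v in fields[1:9]]


def cpu_usage_percent(before, after):
    """CPU usage in percent between two snapshots of the aggregate counters."""
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    if total <= 0:
        raise ValueError("snapshots must be taken at different times")
    idle = deltas[3] + deltas[4]  # idle + iowait ticks
    return 100.0 * (total - idle) / total


# Hypothetical snapshots for illustration only (not testbed data).
s0 = parse_cpu_line("cpu 1000 0 500 8000 500 0 0 0 0 0")
s1 = parse_cpu_line("cpu 1300 0 600 8500 600 0 0 0 0 0")
print(cpu_usage_percent(s0, s1))
```

On a live system the two lines would be read from /proc/stat a fixed interval apart, for example once per second while the speed test or video stream is running.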

4.3 Delay Jitter and Throughput Measurements

To measure jitter and useful bitrate, we used Iperf, a network test tool that can generate UDP and TCP packet streams. In the UL direction, Iperf ran on the UE as the client and data generator, with the CN as the server. We used TCP to measure throughput and UDP to obtain the delay jitter. To test the DL performance of each mobile platform, we generated traffic from the CN (client side) to the UE (server side). As shown in Fig. 2, the test was conducted over 60 s with the bandwidth set at 10 MHz in the UL direction for each cellular platform. The results showed that the Amarisoft eNB gave the most stable delay jitter, while srsLTE gave an unstable one. As for OAI, it had a more stable delay jitter

Fig. 2 Delay jitters for cellular platforms (BW 10 MHz)


than the srsLTE open-source platform. Delay jitter matters because it affects network QoS and Voice over IP (VoIP) quality. Next, we measured the throughput of each cellular platform with 10 and 20 MHz bandwidths in the UL and DL directions over a time interval of 60 s, using Iperf with TCP. In Figs. 3 and 4, for both the 10 and 20 MHz bandwidths in UL, the results clearly show that Amarisoft has a higher data rate than the other platforms, with averages of 19.5 Mb/s and 44.6 Mb/s, respectively. OAI-eNB has a stable throughput for the 10 and 20 MHz bandwidths, with averages of 18 Mb/s and 28 Mb/s, respectively.
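
For the UDP delay jitter discussed above, Iperf documents its jitter figure as the RTP interarrival jitter estimator (RFC 1889, now RFC 3550). The sketch below implements that estimator for a list of send and receive timestamps. The function name and the timestamp values are hypothetical and serve only as an illustration.

```python
def interarrival_jitter(send_times, recv_times):
    """Running jitter estimate per RFC 3550: J += (|D| - J) / 16."""
    if len(send_times) != len(recv_times):
        raise ValueError("need one receive time per send time")
    jitter = 0.0
    for i in range(1, len(send_times)):
        # D is the change in one-way transit time between consecutive packets.
        # A constant clock offset between sender and receiver cancels out.
        d = (recv_times[i] - recv_times[i - 1]) - (send_times[i] - send_times[i - 1])
        jitter += (abs(d) - jitter) / 16.0
    return jitter


# Hypothetical timestamps in milliseconds (not testbed data).
send = [0.0, 10.0, 20.0, 30.0]
recv = [5.0, 15.0, 27.0, 35.0]  # the third packet arrives 2 ms late
print(interarrival_jitter(send, recv))
```

Because each new deviation is weighted by 1/16, a single late packet raises the estimate only slightly, whereas sustained variation in transit time keeps it high.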

Fig. 3 Throughput for cellular platforms in UL direction (BW 10 MHz)

Fig. 4 Throughput for cellular platforms in UL direction (BW 20 MHz)


Finally, srsLTE showed an unstable throughput at a bandwidth of 20 MHz but a stable bitrate at 10 MHz. As shown in Figs. 5 and 6, OAI and Amarisoft have almost the same DL rates for both bandwidths, although Amarisoft was more stable than OAI. For BW 10 and 20 MHz, the average data rates were 33.1 Mb/s and 68.6 Mb/s for OAI, and 34.6 Mb/s and 70.7 Mb/s for Amarisoft.

Fig. 5 Throughput for cellular platforms in DL direction (BW 10 MHz)

Fig. 6 Throughput for cellular platforms in DL direction (BW 20 MHz)


The srsLTE platform had a stable throughput of 30 Mb/s with BW 10 MHz, while with BW 20 MHz the data rate varied between 42 and 55 Mb/s, with an average of 48 Mb/s.
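
One way to compare the platforms is to check how the average throughput scales when the bandwidth doubles from 10 to 20 MHz. The short sketch below computes that ratio from the Amarisoft and OAI averages reported above. The dictionary layout and the function name are illustrative choices and not part of the original analysis.

```python
# Average TCP throughput in Mb/s as reported in this section: (10 MHz, 20 MHz).
reported = {
    ("Amarisoft", "UL"): (19.5, 44.6),
    ("OAI", "UL"): (18.0, 28.0),
    ("Amarisoft", "DL"): (34.6, 70.7),
    ("OAI", "DL"): (33.1, 68.6),
}


def scaling_ratio(rate_10mhz, rate_20mhz):
    """Ratio of 20 MHz throughput to 10 MHz throughput."""
    return rate_20mhz / rate_10mhz


ratios = {key: scaling_ratio(*rates) for key, rates in reported.items()}
for (platform, direction), ratio in sorted(ratios.items()):
    print(f"{platform:9s} {direction}: x{ratio:.2f}")
```

A ratio close to 2 means the throughput grows roughly in proportion to the bandwidth. With these figures, both platforms are close to 2 in DL, while the OAI UL ratio stays well below 2.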

4.4 SDR RF Performance Test

In this section, we measured the throughput of both the USRP-b210 and the Amarisoft PCIe SDR using the Amarisoft software. The experiment was conducted with the BW set at 10 and 20 MHz over a time interval of 60 s in the UL and DL directions. As shown in Fig. 7, for both bandwidths the Amarisoft PCIe SDR displayed a higher and more stable throughput than the USRP-b210. The USRP-b210 gave a stable throughput at BW = 10 MHz but only a semi-stable one at BW = 20 MHz. We also found that the CPU usage was the same for both SDRs. Figure 8 confirms that in the DL direction the type of SDR used had no impact on the throughput performance.

5 Indoor Performance Evaluation

The indoor performance evaluation was conducted in a corridor on the first floor of the UQAT Val-d’Or campus. The corridor is over 70 m long, with a height and width of approximately 3 m. The base stations around

Fig. 7 Throughput performance obtained for USRP b210 and Amarisoft SDR RF using Amarisoft platform and BW 10 and 20 MHz in UL direction


Fig. 8 Throughput performance obtained for USRP b210 and Amarisoft SDR RF using Amarisoft platform for BW 10 and 20 MHz in DL direction

Fig. 9 Photograph of the university corridor

the campus still operate on LTE, so we expect signal interference from other cells to affect the results of our study. A photograph of the university corridor is shown in Fig. 9.

5.1 Setup Scenario

For the cellular platform, we used the configuration and deployment described in Sect. 3 of this article. Figure 10 shows the map of the indoor gallery with the LTE measurement setup. We took measurements at distances of 15, 30, 45, and 50 m from the fixed eNB under unobstructed line-of-sight (LOS) propagation. The antennas were placed 1.5 m above the floor.


Fig. 10 Map of the indoor gallery with the LTE measurement setup

5.2 Coverage and Radio Signal Analysis

In this measurement campaign, the coverage and signal analysis of the LTE mobile platform is presented using the Received Signal Strength Indicator (RSSI), the Signal-to-Noise Ratio (SNR), the path-loss exponent n, and the standard deviation. The LTE platform log files and some smartphone (UE) applications [15] were used to extract the signal parameters. The Samsung service mode, opened by typing *#0011#, was also used to check the RF status. Due to lack of space, we present only the coverage results and some signal parameters. Figure 11 presents the coverage results in terms of RSSI as a radio signal heat map, in which hot colors indicate a strong received signal and cold colors a weak one. Table 5 describes the LTE signal quality with reference to the RSSI and RSRP parameters. In the indoor scenario, the RSSI results are between −72 and −88 dBm. The main indoor signal parameters are shown in Table 6.
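
The path-loss exponent n and the standard deviation mentioned above are commonly estimated by fitting the log-distance model RSSI(d) = RSSI(d0) − 10 n log10(d/d0) to the measured points. The text does not state which model was fitted, so the sketch below is an assumption. It uses the measurement distances from Sect. 5.1 with synthetic RSSI values generated from a known exponent, and these values are not the measured data. The function name and the reference distance d0 = 1 m are also hypothetical.

```python
import math


def fit_path_loss(distances_m, rssi_dbm, d0=1.0):
    """Least-squares fit of RSSI(d) = rssi_d0 - 10*n*log10(d/d0).

    Returns (n, rssi_d0, sigma), where sigma is the population standard
    deviation of the residuals in dB.
    """
    if len(distances_m) != len(rssi_dbm) or len(distances_m) < 2:
        raise ValueError("need at least two (distance, RSSI) pairs")
    xs = [10.0 * math.log10(d / d0) for d in distances_m]
    k = len(xs)
    mean_x = sum(xs) / k
    mean_y = sum(rssi_dbm) / k
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, rssi_dbm))
    slope = sxy / sxx  # the slope equals -n in the log-distance model
    rssi_d0 = mean_y - slope * mean_x
    residuals = [y - (rssi_d0 + slope * x) for x, y in zip(xs, rssi_dbm)]
    sigma = math.sqrt(sum(r * r for r in residuals) / k)
    return -slope, rssi_d0, sigma


# Distances from Sect. 5.1; RSSI values are synthetic (n = 2, -50 dBm at 1 m).
distances = [15.0, 30.0, 45.0, 50.0]
synthetic_rssi = [-50.0 - 10.0 * 2.0 * math.log10(d) for d in distances]
n, rssi_d0, sigma = fit_path_loss(distances, synthetic_rssi)
print(round(n, 3), round(rssi_d0, 3), round(sigma, 3))
```

With real measurements, sigma is non-zero and captures how far the individual readings scatter around the fitted line.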

Fig. 11 Coverage of LTE mobile platform for the indoor scenario in terms of RSSI

Table 5 LTE signal quality

Signal quality    RSSI (dBm)     RSRP (dBm)
Excellent         > −65          > −84
Good              −65 to −75     −85 to −102
Fair              −75 to −85     −103 to −111

Poor