Advances in Intelligent Systems and Computing 1160
Álvaro Rocha · Hojjat Adeli · Luís Paulo Reis · Sandra Costanzo · Irena Orovic · Fernando Moreira Editors
Trends and Innovations in Information Systems and Technologies Volume 2
Advances in Intelligent Systems and Computing Volume 1160
Series Editor Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland Advisory Editors Nikhil R. Pal, Indian Statistical Institute, Kolkata, India Rafael Bello Perez, Faculty of Mathematics, Physics and Computing, Universidad Central de Las Villas, Santa Clara, Cuba Emilio S. Corchado, University of Salamanca, Salamanca, Spain Hani Hagras, School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK László T. Kóczy, Department of Automation, Széchenyi István University, Gyor, Hungary Vladik Kreinovich, Department of Computer Science, University of Texas at El Paso, El Paso, TX, USA Chin-Teng Lin, Department of Electrical Engineering, National Chiao Tung University, Hsinchu, Taiwan Jie Lu, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW, Australia Patricia Melin, Graduate Program of Computer Science, Tijuana Institute of Technology, Tijuana, Mexico Nadia Nedjah, Department of Electronics Engineering, University of Rio de Janeiro, Rio de Janeiro, Brazil Ngoc Thanh Nguyen , Faculty of Computer Science and Management, Wrocław University of Technology, Wrocław, Poland Jun Wang, Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong
The series “Advances in Intelligent Systems and Computing” contains publications on theory, applications, and design methods of Intelligent Systems and Intelligent Computing. Virtually all disciplines such as engineering, natural sciences, computer and information science, ICT, economics, business, e-commerce, environment, healthcare, life science are covered. The list of topics spans all the areas of modern intelligent systems and computing such as: computational intelligence, soft computing including neural networks, fuzzy systems, evolutionary computing and the fusion of these paradigms, social intelligence, ambient intelligence, computational neuroscience, artificial life, virtual worlds and society, cognitive science and systems, Perception and Vision, DNA and immune based systems, self-organizing and adaptive systems, e-Learning and teaching, human-centered and human-centric computing, recommender systems, intelligent control, robotics and mechatronics including human-machine teaming, knowledge-based paradigms, learning paradigms, machine ethics, intelligent data analysis, knowledge management, intelligent agents, intelligent decision making and support, intelligent network security, trust management, interactive entertainment, Web intelligence and multimedia. The publications within “Advances in Intelligent Systems and Computing” are primarily proceedings of important conferences, symposia and congresses. They cover significant recent developments in the field, both of a foundational and applicable character. An important characteristic feature of the series is the short publication time and world-wide distribution. This permits a rapid and broad dissemination of research results. ** Indexing: The books of this series are submitted to ISI Proceedings, EI-Compendex, DBLP, SCOPUS, Google Scholar and Springerlink **
More information about this series at http://www.springer.com/series/11156
Álvaro Rocha · Hojjat Adeli · Luís Paulo Reis · Sandra Costanzo · Irena Orovic · Fernando Moreira
Editors
Trends and Innovations in Information Systems and Technologies Volume 2
Editors Álvaro Rocha Departamento de Engenharia Informática Universidade de Coimbra Coimbra, Portugal
Hojjat Adeli College of Engineering The Ohio State University Columbus, OH, USA
Luís Paulo Reis FEUP Universidade do Porto Porto, Portugal
Sandra Costanzo DIMES Università della Calabria Arcavacata di Rende, Italy
Irena Orovic Faculty of Electrical Engineering University of Montenegro Podgorica, Montenegro
Fernando Moreira Universidade Portucalense Porto, Portugal
ISSN 2194-5357 ISSN 2194-5365 (electronic) Advances in Intelligent Systems and Computing ISBN 978-3-030-45690-0 ISBN 978-3-030-45691-7 (eBook) https://doi.org/10.1007/978-3-030-45691-7 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
This book contains a selection of papers accepted for presentation and discussion at the 2020 World Conference on Information Systems and Technologies (WorldCIST’20). This conference had the support of the IEEE Systems, Man, and Cybernetics Society (IEEE SMC), the Iberian Association for Information Systems and Technologies/Associação Ibérica de Sistemas e Tecnologias de Informação (AISTI), the Global Institute for IT Management (GIIM), the University of Montenegro, the Mediterranean University and the Faculty for Business in Tourism of Budva. It took place at Budva, Montenegro, during 7–10 April 2020.
The World Conference on Information Systems and Technologies (WorldCIST) is a global forum for researchers and practitioners to present and discuss recent results and innovations, current trends, professional experiences and challenges of modern information systems and technologies research, technological development and applications. One of its main aims is to strengthen the drive towards a holistic symbiosis between academia, society and industry. WorldCIST’20 built on the successes of WorldCIST’13 held at Olhão, Algarve, Portugal; WorldCIST’14 held at Funchal, Madeira, Portugal; WorldCIST’15 held at São Miguel, Azores, Portugal; WorldCIST’16 held at Recife, Pernambuco, Brazil; WorldCIST’17 held at Porto Santo, Madeira, Portugal; WorldCIST’18 held at Naples, Italy; and WorldCIST’19, which took place at La Toja, Spain.
The program committee of WorldCIST’20 was composed of a multidisciplinary group of almost 300 experts intimately concerned with information systems and technologies. They had the responsibility for evaluating, in a ‘blind review’ process, the papers received for each of the main themes proposed for the conference: (A) Information and Knowledge Management; (B) Organizational Models and Information Systems; (C) Software and Systems Modelling; (D) Software Systems, Architectures, Applications and Tools; (E) Multimedia Systems and Applications; (F) Computer Networks, Mobility and Pervasive Systems; (G) Intelligent and Decision Support Systems; (H) Big Data Analytics and Applications; (I) Human–Computer Interaction; (J) Ethics, Computers and Security; (K) Health Informatics; (L) Information Technologies in
Education; (M) Information Technologies in Radiocommunications; and (N) Technologies for Biomedical Applications.
The conference also included workshop sessions taking place in parallel with the conference sessions. The workshop sessions covered themes such as (i) Innovative Technologies Applied to Rural; (ii) Network Modelling, Learning and Analysis; (iii) Intelligent Systems and Machines; (iv) Healthcare Information Systems Interoperability, Security and Efficiency; (v) Applied Statistics and Data Analysis using Computer Science; (vi) Cybersecurity for Smart Cities Development; (vii) Education through ICT; (viii) Unlocking the Artificial Intelligence Interplay with Business Innovation; and (ix) Pervasive Information Systems.
WorldCIST’20 received about 400 contributions from 57 countries around the world. The papers accepted for presentation and discussion at the conference are published by Springer (this book) in three volumes and will be submitted for indexing by ISI, EI-Compendex, SCOPUS, DBLP and/or Google Scholar, among others. Extended versions of selected best papers will be published in special or regular issues of relevant journals, mainly SCI/SSCI and Scopus/EI-Compendex indexed journals.
We acknowledge all of those who contributed to the staging of WorldCIST’20 (authors, committees, workshop organizers and sponsors). We deeply appreciate their involvement and support, which were crucial for the success of WorldCIST’20.
April 2020
Álvaro Rocha Hojjat Adeli Luís Paulo Reis Sandra Costanzo Irena Orovic Fernando Moreira
Organization
Conference General Chair
Álvaro Rocha, University of Coimbra, Portugal
Co-chairs
Hojjat Adeli, The Ohio State University, USA
Luis Paulo Reis, University of Porto, Portugal
Sandra Costanzo, University of Calabria, Italy
Local Organizing Committee
Irena Orovic (Chair), University of Montenegro, Montenegro
Milos Dakovic, University of Montenegro, Montenegro
Andjela Draganic, University of Montenegro, Montenegro
Milos Brajovic, University of Montenegro, Montenegro
Snezana Scepanvic, Mediterranean University, Montenegro
Rade Ratkovic, Faculty of Business and Tourism, Montenegro
Advisory Committee
Ana Maria Correia (Chair), University of Sheffield, UK
Benjamin Lev, Drexel University, USA
Chatura Ranaweera, Wilfrid Laurier University, Canada
Chris Kimble, KEDGE Business School and MRM, UM2, Montpellier, France
Erik Bohlin, Chalmers University of Technology, Sweden
Eva Onaindia, Polytechnical University of Valencia, Spain
Gintautas Dzemyda, Vilnius University, Lithuania
Janusz Kacprzyk, Polish Academy of Sciences, Poland
Jason Whalley, Northumbria University, UK
João Tavares, University of Porto, Portugal
Jon Hall, The Open University, UK
Justin Zhang, University of North Florida, USA
Karl Stroetmann, Empirica Communication and Technology Research, Germany
Kathleen Carley, Carnegie Mellon University, USA
Keng Siau, Missouri University of Science and Technology, USA
Manlio Del Giudice, University of Rome Link Campus, Italy
Michael Koenig, Long Island University, USA
Miguel-Angel Sicilia, University of Alcalá, Spain
Reza Langari, Texas A&M University, USA
Vedat Verter, McGill University, Canada
Vishanth Weerakkody, Bradford University, UK
Wim Van Grembergen, University of Antwerp, Belgium
Program Committee Abdul Rauf Adnan Mahmood Adriana Peña Pérez Negrón Adriani Besimi Agostinho Sousa Pinto Ahmed El Oualkadi Ahmed Rafea Alberto Freitas Aleksandra Labus Alexandru Vulpe Ali Idri Amélia Badica Amélia Cristina Ferreira Silva Almir Souza Silva Neto Amit Shelef Ana Isabel Martins Ana Luis Anabela Tereso Anacleto Correia Anca Alexandra Purcarea Andjela Draganic Aneta Polewko-Klim Aneta Poniszewska-Maranda Angeles Quezada
RISE SICS, Sweden Waterford Institute of Technology, Ireland Universidad de Guadalajara, Mexico South East European University, Macedonia Polytechnic of Porto, Portugal Abdelmalek Essaadi University, Morocco American University in Cairo, Egypt FMUP, University of Porto, Portugal University of Belgrade, Serbia University Politehnica of Bucharest, Romania ENSIAS, University Mohammed V, Morocco Universti of Craiova, Romania Polytechnic of Porto, Portugal IFMA, Brazil Sapir Academic College, Israel University of Aveiro, Portugal University of Coimbra, Portugal University of Minho, Portugal CINAV, Portugal University Politehnica of Bucharest, Romania University of Montenegro, Montenegro University of Białystok, Institute of Informatics, Poland Lodz University of Technology, Poland Instituto Tecnologico de Tijuana, Mexico
Anis Tissaoui Ankur Singh Bist Ann Svensson Antoni Oliver Antonio Jiménez-Martín Antonio Pereira Armando Toda Arslan Enikeev Benedita Malheiro Boris Shishkov Borja Bordel Branko Perisic Bruno Veloso Carla Pinto Carla Santos Pereira Catarina Reis Cengiz Acarturk Cesar Collazos Christophe Feltus Christophe Soares Christos Bouras Christos Chrysoulas Christos Troussas Ciro Martins Claudio Sapateiro Costin Badica Cristian García Bauza Cristian Mateos Daria Bylieva Dante Carrizo Dayana Spagnuelo Dušan Barać Edita Butrime Edna Dias Canedo Eduardo Santos Egils Ginters Ekaterina Isaeva Elena Mikhailova Eliana Leite Erik Fernando Mendez Garcea Eriks Sneiders Esteban Castellanos
University of Jendouba, Tunisia KIET, India University West, Sweden University of the Balearic Islands, Spain Universidad Politécnica de Madrid, Spain Polytechnic of Leiria, Portugal University of São Paulo, Brazil Kazan Federal University, Russia Polytechnic of Porto, ISEP, Portugal ULSIT/IMI-BAS/IICREST, Bulgaria Universidad Politécnica de Madrid, Spain Faculty of Technical Sciences, Serbia INESC TEC, Portugal Polytechnic of Porto, ISEP, Portugal Universidade Portucalense, Portugal Polytechnic of Leiria, Portugal Middle East Technical University, Turkey Universidad del Cauca, Colombia LIST, Luxembourg University Fernando Pessoa, Portugal University of Patras, Greece London South Bank University, UK University of Piraeus, Greece University of Aveiro, Portugal Polytechnic of Setúbal, Portugal University of Craiova, Romania PLADEMA-UNICEN-CONICET, Argentina ISISTAN-CONICET, UNICEN, Argentina Peter the Great St.Petersburg Polytechnic University, Russia Universidad de Atacama, Chile Vrije Universiteit Amsterdam, Netherlands University of Belgrade, Serbia Lithuanian University of Health Sciences, Lithuania University of Brasilia, Brazil Pontifical Catholic University of Paraná, Brazil Riga Technical University, Latvia Perm State University, Russia ITMO University, Russia University of Minho, Portugal Autonomous Regional University of the Andes, Ecuador Stockholm University, Sweden ESPE, Ecuador
Faisal Musa Abbas Fatima Azzahra Amazal Fernando Almeida Fernando Bobillo Fernando Molina-Granja Fernando Moreira Fernando Ribeiro Filipe Caldeira Filipe Portela Filipe Sá Filippo Neri Firat Bestepe Francesco Bianconi Francisco García-Peñalvo Francisco Valverde Galim Vakhitov Gayo Diallo George Suciu Gheorghe Sebestyen Ghani Albaali Gian Piero Zarri Giuseppe Di Massa Gonçalo Paiva Dias Goreti Marreiros Graciela Lara López Habiba Drias Hafed Zarzour Hamid Alasadi Hatem Ben Sta Hector Fernando Gomez Alvarado Hélder Gomes Helia Guerra Henrique da Mota Silveira Henrique S. Mamede Hing Kai Chan Hugo Paredes Ibtissam Abnane Igor Aguilar Alonso
Abubakar Tafawa Balewa University Bauchi, Nigeria Ibn Zohr University, Morocco INESC TEC and University of Porto, Portugal University of Zaragoza, Spain National University of Chimborazo, Ecuador Portucalense University, Portugal Polytechnic Castelo Branco, Portugal Polytechnic of Viseu, Portugal University of Minho, Portugal Polytechnic of Viseu, Portugal University of Naples, Italy Republic of Turkey Ministry of Development, Turkey Università degli Studi di Perugia, Italy University of Salamanca, Spain Universidad Central del Ecuador, Ecuador Kazan Federal University, Russia University of Bordeaux, France BEIA Consult International, Romania Technical University of Cluj-Napoca, Romania Princess Sumaya University for Technology, Jordan University Paris-Sorbonne, France University of Calabria, Italy University of Aveiro, Portugal ISEP/GECAD, Portugal University of Guadalajara, Mexico University of Science and Technology Houari Boumediene, Algeria University of Souk Ahras, Algeria Basra University, Iraq University of Tunis at El Manar, Tunisia Universidad Tecnica de Ambato, Ecuador University of Aveiro, Portugal University of the Azores, Portugal University of Campinas (UNICAMP), Brazil University Aberta, Portugal University of Nottingham Ningbo China, China INESC TEC and University of Trás-os-Montes e Alto Douro, Portugal Mohamed V University in Rabat, Morocco Universidad Nacional Tecnológica de Lima Sur, Peru
Imen Ben Said Inês Domingues Isabel Lopes Isabel Pedrosa Isaías Martins Issam Moghrabi Ivan Dunđer Ivan Lukovic Jaime Diaz Jan Kubicek Jean Robert Kala Kamdjoug Jesús Gallardo Casero Jezreel Mejia Jikai Li Jinzhi Lu Joao Carlos Silva João Manuel R. S. Tavares João Paulo Pereira João Reis João Reis João Rodrigues João Vidal Carvalho Joaquin Nicolas Ros Jorge Barbosa Jorge Buele Jorge Esparteiro Garcia Jorge Gomes Jorge Oliveira e Sá José Álvarez-García José Braga de Vasconcelos Jose Luis Herrero Agustin José Luís Reis Jose Luis Sierra Jose M. Parente de Oliveira José Machado José Paulo Lousado Jose Torres José-Luís Pereira Juan M. Santos Juan Manuel Carrillo de Gea Juan Pablo Damato Juncal Gutiérrez-Artacho Kalinka Kaloyanova
Université de Sfax, Tunisia University of Coimbra, Portugal Polytechnic of Bragança, Portugal Coimbra Business School ISCAC, Portugal University of Leon, Spain Gulf University for Science and Technology, Kuwait University of Zabreb, Croatia University of Novi Sad, Serbia University of La Frontera, Chile Technical University of Ostrava, Czech Republic Catholic University of Central Africa, Cameroon University of Zaragoza, Spain CIMAT, Unidad Zacatecas, Mexico The College of New Jersey, USA KTH Royal Institute of Technology, Sweden IPCA, Portugal University of Porto, FEUP, Portugal Polytechnic of Bragança, Portugal University of Aveiro, Portugal University of Lisbon, Portugal University of the Algarve, Portugal Polytechnic of Coimbra, Portugal University of Murcia, Spain Polytechnic of Coimbra, Portugal Technical University of Ambato, Ecuador Polytechnic Institute of Viana do Castelo, Portugal University of Lisbon, Portugal University of Minho, Portugal University of Extremadura, Spain Universidade New Atlântica, Portugal University of Extremadura, Spain ISMAI, Portugal Complutense University of Madrid, Spain Aeronautics Institute of Technology, Brazil University of Minho, Portugal Polytechnic of Viseu, Portugal Universidty Fernando Pessoa, Portugal Universidade do Minho, Portugal University of Vigo, Spain University of Murcia, Spain UNCPBA-CONICET, Argentina University of Granada, Spain Sofia University, Bulgaria
Kamel Rouibah Khalid Benali Korhan Gunel Krzysztof Wolk Kuan Yew Wong Laila Cheikhi Laura Varela-Candamio Laurentiu Boicescu Leonardo Botega Leonid Leonidovich Khoroshko Lia-Anca Hangan Lila Rao-Graham Łukasz Tomczyk Luis Alvarez Sabucedo Luis Cavique Luis Gouveia Luis Mendes Gomes Luis Silva Rodrigues Luiz Rafael Andrade Luz Sussy Bayona Oré Maksim Goman Manal el Bajta Manuel Antonio Fernández-Villacañas Marín Manuel Silva Manuel Tupia Manuel Au-Yong-Oliveira Marciele Bernardes Marco Bernardo Marco Ronchetti Mareca María PIlar Marek Kvet María de la Cruz del Río-Rama Maria João Ferreira Maria João Varanda Pereira Maria José Angélico Maria José Sousa María Teresa García-Álvarez Mariam Bachiri
Kuwait University, Kuwait LORIA University of Lorraine, France Adnan Menderes University, Turkey Polish-Japanese Academy of Information Technology, Poland Universiti Teknologi Malaysia (UTM), Malaysia University Mohammed V, Rabat, Morocco Universidade da Coruña, Spain E.T.T.I. U.P.B., Romania University Centre Eurípides of Marília (UNIVEM), Brazil Moscow Aviation Institute (National Research University), Russia Technical University of Cluj-Napoca, Romania University of the West Indies, Jamaica Pedagogical University of Cracow, Poland University of Vigo, Spain University Aberta, Portugal University Fernando Pessoa, Portugal University of the Azores, Portugal Polythencic of Porto, Portugal Tiradentes University, Brazil Universidad Nacional Mayor de San Marcos, Peru JKU, Austria ENSIAS, Morocco Technical University of Madrid, Spain
Polytechnic of Porto and INESC TEC, Portugal Pontifical Catholic University of Peru, Peru University of Aveiro, Portugal University of Minho, Brazil Polytechnic of Viseu, Portugal Universita’ di Trento, Italy Universidad Politécnica de Madrid, Spain Zilinska Univerzita v Ziline, Slovakia University of Vigo, Spain Universidade Portucalense, Portugal Polytechnic of Bragança, Portugal Polytechnic of Porto, Portugal University of Coimbra, Portugal University of A Coruna, Spain ENSIAS, Morocco
Marijana Despotovic-Zrakic Mário Antunes Marisa Maximiano Marisol Garcia-Valls Maristela Holanda Marius Vochin Marlene Goncalves da Silva Maroi Agrebi Martin Henkel Martín López Nores Martin Zelm Mawloud Mosbah Michal Adamczak Michal Kvet Miguel António Sovierzoski Mihai Lungu Mircea Georgescu Mirna Muñoz Mohamed Hosni Monica Leba Mu-Song Chen Natalia Grafeeva Natalia Miloslavskaya Naveed Ahmed Neeraj Gupta Nelson Rocha Nikolai Prokopyev Niranjan S. K. Noemi Emanuela Cazzaniga Noureddine Kerzazi Nuno Melão Nuno Octávio Fernandes Olimpiu Stoicuta Patricia Zachman Patrick C-H. Soh Paula Alexandra Rego Paulo Maio Paulo Novais
Faculty Organizational Science, Serbia Polytechnic of Leiria and CRACS INESC TEC, Portugal Polytechnic Institute of Leiria, Portugal Polytechnic University of Valencia, Spain University of Brasilia, Brazil E.T.T.I. U.P.B., Romania Universidad Simón Bolívar, Venezuela University of Polytechnique Hauts-de-France, France Stockholm University, Sweden University of Vigo, Spain INTEROP-VLab, Belgium University 20 Août 1955 of Skikda, Algeria Poznan School of Logistics, Poland University of Zilina, Slovakia Federal University of Technology - Paraná, Brazil University of Craiova, Romania Al. I. Cuza University of Iasi, Romania Centro de Investigación en Matemáticas A.C., Mexico ENSIAS, Morocco University of Petrosani, Romania Da-Yeh University, China Saint Petersburg University, Russia National Research Nuclear University MEPhI, Russia University of Sharjah, United Arab Emirates KIET Group of Institutions Ghaziabad, India University of Aveiro, Portugal Kazan Federal University, Russia JSS Science and Technology University, India Politecnico di Milano, Italy Polytechnique Montréal, Canada Polytechnic of Viseu, Portugal Polytechnic of Castelo Branco, Portugal University of Petrosani, Romania Universidad Nacional del Chaco Austral, Argentina Multimedia University, Malaysia Polytechnic of Viana do Castelo and LIACC, Portugal Polytechnic of Porto, ISEP, Portugal University of Minho, Portugal
Paulvanna Nayaki Marimuthu Paweł Karczmarek Pedro Rangel Henriques Pedro Sobral Pedro Sousa Philipp Brune Piotr Kulczycki Prabhat Mahanti Rabia Azzi Radu-Emil Precup Rafael Caldeirinha Rafael M. Luque Baena Rahim Rahmani Raiani Ali Ramayah T. Ramiro Gonçalves Ramon Alcarria Ramon Fabregat Gesa Renata Maria Maracho Reyes Juárez Ramírez Rui Jose Rui Pitarma Rui S. Moreira Rustam Burnashev Saeed Salah Said Achchab Sajid Anwar Sami Habib Samuel Sepulveda Sanaz Kavianpour Sandra Patricia Cano Mazuera Savo Tomovic Sassi Sassi Seppo Sirkemaa Sergio Albiol-Pérez Shahed Mohammadi Shahnawaz Talpur
Kuwait University, Kuwait The John Paul II Catholic University of Lublin, Poland University of Minho, Portugal University Fernando Pessoa, Portugal University of Minho, Portugal Neu-Ulm University of Applied Sciences, Germany Systems Research Institute, Polish Academy of Sciences, Poland University of New Brunswick, Canada Bordeaux University, France Politehnica University of Timisoara, Romania Polytechnic of Leiria, Portugal University of Malaga, Spain University Stockholm, Sweden Hamad Bin Khalifa University, Qatar Universiti Sains Malaysia, Malaysia University of Trás-os-Montes e Alto Douro & INESC TEC, Portugal Universidad Politécnica de Madrid, Spain University of Girona, Spain Federal University of Minas Gerais, Brazil Universidad Autonoma de Baja California, Mexico University of Minho, Portugal Polytechnic Institute of Guarda, Portugal UFP & INESC TEC & LIACC, Portugal Kazan Federal University, Russia Al-Quds University, Palestine Mohammed V University in Rabat, Morocco Institute of Management Sciences Peshawar, Pakistan Kuwait University, Kuwait University of La Frontera, Chile University of Technology, Malaysia University of San Buenaventura Cali, Colombia University of Montenegro, Montenegro FSJEGJ, Tunisia University of Turku, Finland University of Zaragoza, Spain Ayandegan University, Iran Mehran University of Engineering & Technology Jamshoro, Pakistan
Silviu Vert Simona Mirela Riurean Slawomir Zolkiewski Solange N. Alves-Souza Solange Rito Lima Sonia Sobral Sorin Zoican Souraya Hamida Sümeyya Ilkin Syed Nasirin Taoufik Rachad Tatiana Antipova Teresa Guarda Tero Kokkonen The Thanh Van Thomas Weber Timothy Asiedu Tom Sander Tomaž Klobučar Toshihiko Kato Tzung-Pei Hong Valentina Colla Veronica Segarra Faggioni Victor Alves Victor Georgiev Victor Kaptelinin Vincenza Carchiolo Vitalyi Igorevich Talanin Wafa Mefteh Wolf Zimmermann Yadira Quiñonez Yair Wiseman Yuhua Li Yuwei Lin Yves Rybarczyk Zorica Bogdanovic
Politehnica University of Timisoara, Romania University of Petrosani, Romania Silesian University of Technology, Poland University of São Paulo, Brazil University of Minho, Portugal Portucalense University, Portugal Polytechnic University of Bucharest, Romania Batna 2 University, Algeria Kocaeli University, Turkey Universiti Malaysia Sabah, Malaysia University Mohamed V, Morocco Institute of Certified Specialists, Russia University Estatal Peninsula de Santa Elena, Ecuador JAMK University of Applied Sciences, Finland HCMC University of Food Industry, Vietnam EPFL, Switzerland TIM Technology Services Ltd., Ghana New College of Humanities, Germany Jozef Stefan Institute, Slovenia University of Electro-Communications, Japan National University of Kaohsiung, Taiwan Scuola Superiore Sant’Anna, Italy Private Technical University of Loja, Ecuador University of Minho, Portugal Kazan Federal University, Russia Umeå University, Sweden University of Catania, Italy Zaporozhye Institute of Economics and Information Technologies, Ukraine Tunisia Martin Luther University Halle-Wittenberg, Germany Autonomous University of Sinaloa, Mexico Bar-Ilan University, Israel Cardiff University, UK University of Roehampton, UK Dalarna University, Sweden University of Belgrade, Serbia
Contents
Software Systems, Architectures, Applications and Tools
Single Producer – Multiple Consumers Ring Buffer Data Distribution System with Memory Management . . . . . . . . . . . . . . . . 3
Mariusz Orlikowski
Exploring the Innovative Aspects of CV Distributed Ledgers Based on Blockchain . . . . . . . . . . . . . . . . 14
F. O. Silva, Rui Humberto Pereira, Maria José Angélico Gonçalves, Amélia Ferreira da Silva, and Manuel Silva
Characterizing the Cost of Introducing Secure Programming Patterns and Practices in Ethereum . . . . . . . . . . . . . . . . 25
Aboua Ange Kevin N’Da, Santiago Matalonga, and Keshav Dahal
A Survey of AI Accelerators for Edge Environment . . . . . . . . . . . . . . . . 35
Wenbin Li and Matthieu Liewig
Building a Virtualized Environment for Programming Courses . . . . . . . . . . . . . . . . 45
Tuisku Polvinen, Timo Ylikännö, Ari Mäkeläinen, Sampsa Rauti, Jari-Matti Mäkelä, and Jani Tammi
Data Extraction System for Hot Bike-Sharing Spots in an Intermediate City . . . . . . . . . . . . . . . . 56
Walter Remache, Andrés Heredia, and Gabriel Barros-Gavilanes
Software Tools for Airport Pavement Design . . . . . . . . . . . . . . . . 66
Tiago Tamagusko and Adelino Ferreira
Design of a Gravity Compensation for Robot Control Based on Low-Cost Automation . . . . . . . . . . . . . . . .
Dalia Alvarez-Montenegro, Juan Escobar-Naranjo, Geovanni D. Brito, Carlos A. Garcia, and Marcelo V. Garcia
Electronic Payment Fraud Detection Using Supervised and Unsupervised Learning . . . . . . . . . . . . . . . . 88
Lilian Pires Pracidelli and Fabio Silva Lopes
Fog Computing in Real Time Resource Limited IoT Environments . . . . 102 Pedro Costa, Bruno Gomes, Nilsa Melo, Rafael Rodrigues, Célio Carvalho, Karim Karmali, Salim Karmali, Christophe Soares, José M. Torres, Pedro Sobral, and Rui S. Moreira A Power Efficient IoT Edge Computing Solution for Cooking Oil Recycling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113 Bruno Gomes, Nilsa Melo, Rafael Rodrigues, Pedro Costa, Célio Carvalho, Karim Karmali, Salim Karmali, Christophe Soares, José M. Torres, Pedro Sobral, and Rui S. Moreira Supervising Industrial Distributed Processes Through Soft Models, Deformation Metrics and Temporal Logic Rules . . . . . . . . . . . . . . . . . . 125 Borja Bordel, Ramón Alcarria, and Tomás Robles Frameworks to Develop Secure Mobile Applications: A Systematic Literature Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137 Jezreel Mejía, Perla Maciel, Mirna Muñoz, and Yadira Quiñonez Social Media: People’s Salvation or Their Perdition? . . . . . . . . . . . . . . 147 Yúmina Zêdo, João Costa, Viviana Andrade, and Manuel Au-Yong-Oliveira Artificial Intelligence Applied to Digital Marketing . . . . . . . . . . . . . . . . 158 Tiago Ribeiro and José Luís Reis Fact-Check Spreading Behavior in Twitter: A Qualitative Profile for False-Claim News . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170 Francisco S. Marcondes, José João Almeida, Dalila Durães, and Paulo Novais Artefact of Augmented Reality to Support the Treatment of Specific Phobias . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181 Raul Vilas Boas, Lázaro Lima, Greice Zanini, and Pedro Rangel Henriques Social Care Services for Older Adults: Paper Registration Versus a Web-Based Platform Registration . . . . . . . . . . . . . . . . . . . . . . 188 Ana Isabel Martins, Hilma Caravau, Ana Filipa Rosa, Ana Filipa Almeida, and Nelson Pacheco Rocha Enabling Green Building’s Comfort Using Information and Communication Technologies: A Systematic Review of the Literature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197 Ana Isabel Martins, Ana Carolina Oliveira Lima, Paulo Bartolomeu, Lucilene Ferreira Mouzinho, Joaquim Ferreira, and Nelson Pacheco Rocha
eFish – An Innovative Fresh Fish Evaluation System . . . . . . . . . . . . . . . 209 Renato Sousa Pinto, Manuel Au-Yong-Oliveira, Rui Ferreira, Osvaldo Rocha Pacheco, and Rui Miranda Rocha A Virtual Reality Approach to Automatic Blood Sample Generation . . . 221 Jaime Díaz, Jeferson Arango-López, Samuel Sepúlveda, Danay Ahumada, Fernando Moreira, and Joaquin Gebauer Building Information Modeling Academic Assessment . . . . . . . . . . . . . . 231 Miguel Ángel Pérez Sandoval, Isidro Navarro Delgado, and Georgina Sandoval Radar System for the Reconstruction of 3D Objects: A Preliminary Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238 Jeneffer Barberán, Ginna Obregón, Darwin Moreta, Manuel Ayala, Javier Obregón, Rodrigo Domínguez, and Jorge Luis Buele Multimedia Systems and Applications Achieving Stronger Compaction for DCT-Based Steganography: A Region-Growing Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251 Mohammed Baziyad, Tamer Rabie, and Ibrahim Kamel Teaching Computer Programming as Well-Defined Domain for Beginners with Protoboard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262 Carlos Hurtado, Guillermo Licea, Mario García-Valdez, Angeles Quezada, and Manuel Castañón-Puga Computer Networks, Mobility and Pervasive Systems Machine Learning and Data Networks: Perspectives, Feasibility, and Opportunities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275 Raúl Lozada-Yánez, Fernando Molina-Granja, Pablo Lozada-Yánez, and Jonny Guaiña-Yungan Underground Channel Model for Visible Light Wireless Communication Based on Neural Networks . . . . . . . . . . . . . . . . . . . . . . 293 Simona Riurean, Olimpiu Stoicuta, Monica Leba, Andreea Ionica, and Álvaro Rocha mpCUBIC: A CUBIC-like Congestion Control Algorithm for Multipath TCP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306 Toshihiko Kato, Shiho Haruyama, Ryo Yamamoto, and Satoshi Ohzahata Context-Aware Mobile Applications in Fog Infrastructure: A Literature Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 318 Celestino Barros, Vítor Rocio, André Sousa, and Hugo Paredes
Analyzing IoT-Based Botnet Malware Activity with Distributed Low Interaction Honeypots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329 Sergio Vidal-González, Isaías García-Rodríguez, Héctor Aláiz-Moretón, Carmen Benavides-Cuéllar, José Alberto Benítez-Andrades, María Teresa García-Ordás, and Paulo Novais Evolution of HTTPS Usage by Portuguese Municipalities . . . . . . . . . . . 339 Hélder Gomes, André Zúquete, Gonçalo Paiva Dias, Fábio Marques, and Catarina Silva Augmented Reality to Enhance Visitors’ Experience at Archaeological Sites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349 Samuli Laato and Antti Laato Design and Performance Analysis for Intelligent F-PMIPv6 Mobility Support for Smart Manufacturing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 359 Byung Jun Park and Jongpil Jeong Development of Trustworthy Self-adaptive Framework for Wireless Sensor Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 368 Sami J. Habib and Paulvanna N. Marimuthu Intelligent and Decision Support Systems Traffic Flow Prediction Using Public Transport and Weather Data: A Medium Sized City Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 381 Carlos Silva and Fernando Martins Filtering Users Accounts for Enhancing the Results of Social Media Mining Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 391 May Shalaby and Ahmed Rafea From Reinforcement Learning Towards Artificial General Intelligence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 401 Filipe Marinho Rocha, Vítor Santos Costa, and Luís Paulo Reis Overcoming Reinforcement Learning Limits with Inductive Logic Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 414 Filipe Marinho Rocha, Vítor Santos Costa, and Luís Paulo Reis A Comparison of LSTM and XGBoost for Predicting Firemen Interventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 424 Selene Cerna, Christophe Guyeux, Héber H. Arcolezi, Raphaël Couturier, and Guillaume Royer Exact Algorithms for Scheduling Programs with Shared Tasks . . . . . . . 435 Imed Kacem, Giorgio Lucarelli, and Théo Nazé
Automating Complaints Processing in the Food and Economic Sector: A Classification Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 445 Gustavo Magalhães, Brígida Mónica Faria, Luís Paulo Reis, Henrique Lopes Cardoso, Cristina Caldeira, and Ana Oliveira IoT Services Applied at the Smart Cities Level . . . . . . . . . . . . . . . . . . . 457 George Suciu, Ijaz Hussain, Andreea Badicu, Lucian Necula, and Teodora Ușurelu Statistical Evaluation of Artificial Intelligence -Based Intrusion Detection System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 464 Samir Puuska, Tero Kokkonen, Petri Mutka, Janne Alatalo, Eppu Heilimo, and Antti Mäkelä Analyzing Peer-to-Peer Lending Secondary Market: What Determines the Successful Trade of a Loan Note? . . . . . . . . . . . . . . . . . . . . . . . . . . 471 Ajay Byanjankar, József Mezei, and Xiaolu Wang Experience Analysis Through an Event Based Model Using Mereotopological Relations: From Video to Hypergraph . . . . . . . . . . . . 482 Giles Beaudon, Eddie Soulier, and Anne Gayet Level Identification in Coupled Tanks Using Extreme Learning Machine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 493 Alanio Ferreira de Lima, Gabriel F. Machado, Darielson A. Souza, Francisco H. V. da Silva, Josias G. Batista, José N. N. Júnior, and Deivid M. de Freitas Decision Intelligence in Street Lighting Management . . . . . . . . . . . . . . . 501 Diogo Nunes, Daniel Teixeira, Davide Carneiro, Cristóvão Sousa, and Paulo Novais Cardiac Arrhythmia Detection Using Computational Intelligence Techniques Based on ECG Signals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 511 Jean C. C. Lima, Alanio Ferreira de Lima, Darielson A. Souza, Márcia T. Tonieto, Josias G. Batista, and Manoel E. N. de Oliveira Personalising Explainable Recommendations: Literature and Conceptualisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 518 Mohammad Naiseh, Nan Jiang, Jianbing Ma, and Raian Ali Colorectal Image Classification with Transfer Learning and Auto-Adaptive Artificial Intelligence Platform . . . . . . . . . . . . . . . . . 534 Zoltan Czako, Gheorghe Sebestyen, and Anca Hangan Toolbox for Azure Kinect COTS Device to be Used in Automatic Screening of Idiopathic Scoliosis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 544 Dejan Dimitrijević, Vladimir Todorović, Nemanja Nedić, Igor Zečević, and Sergiu Nedevschi
Big Data Analytics and Applications Detecting Public Transport Passenger Movement Patterns . . . . . . . . . . 555 Natalia Grafeeva and Elena Mikhailova A Parallel CPU/GPU Bees Swarm Optimization Algorithm for the Satisfiability Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 564 Celia Hireche and Habiba Drias Comparison of Major LiDAR Data-Driven Feature Extraction Methods for Autonomous Vehicles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 574 Duarte Fernandes, Rafael Névoa, António Silva, Cláudia Simões, João Monteiro, Paulo Novais, and Pedro Melo Where Is the Health Informatics Market Going? . . . . . . . . . . . . . . . . . . 584 André Caravela Machado, Márcia Martins, Bárbara Cordeiro, and Manuel Au-Yong-Oliveira Data Integration Strategy for Robust Classification of Biomedical Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 596 Aneta Polewko-Klim and Witold R. Rudnicki Identification of Clinical Variables Relevant for Survival Prediction in Patients with Metastatic Castration-Resistant Prostate Cancer . . . . . 607 Wojciech Lesiński, Aneta Polewko-Klim, and Witold R. Rudnicki Human-Computer Interaction Digital Technologies Acceptance/Adoption Modeling Respecting Age Factor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 621 Egils Ginters Location-Based Games as Interfaces for Collecting User Data . . . . . . . . 631 Sampsa Rauti and Samuli Laato Application of the ISO 9241-171 Standard and Usability Inspection Methods for the Evaluation of Assistive Technologies for Individuals with Visual Impairments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 643 Ana Carolina Oliveira Lima, Maria de Fátima Vieira, Ana Isabel Martins, Lucilene Ferreira Mouzinho, and Nelson Pacheco Rocha Augmented Reality Technologies Selection Using the Task-Technology Fit Model – A Study with ICETS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 654 Nageswaran Vaidyanathan Towards Situated User-Driven Interaction Design of Ambient Smart Objects in Domestic Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 664 Victor Kaptelinin and Mikael Hansson
The Impact of Homophily and Herd Size on Decision Confidence in the Social Commerce Context . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 672 Mariam Munawar, Khaled Hassanein, and Milena Head Multimodal Intelligent Wheelchair Interface . . . . . . . . . . . . . . . . . . . . . 679 Filipe Coelho, Luís Paulo Reis, Brígida Mónica Faria, Alexandra Oliveira, and Victor Carvalho Innovation and Robots in Retail - How Far Away Is the Future? . . . . . 690 Manuel Au-Yong-Oliveira, Jacinta Garcia, and Cristina Correia Usability Evaluation of Personalized Digital Memory Book for Alzheimer’s Patient (my-MOBAL) . . . . . . . . . . . . . . . . . . . . . . . . . . 702 Anis Hasliza Abu Hashim-de Vries, Marina Ismail, Azlinah Mohamed, and Ponnusamy Subramaniam Web Cookies: Is There a Trade-off Between Website Efficiency and User Privacy? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 713 Pedro Pinto, Romeu Lages, and Manuel Au-Yong-Oliveira On the Need for Cultural Sensitivity in Digital Wellbeing Tools and Messages: A UK-China Comparison . . . . . . . . . . . . . . . . . . . . . . . . 723 John McAlaney, Manal Aldhayan, Mohamed Basel Almourad, Sainabou Cham, and Raian Ali Measurement of Drag Distance of Objects Using Mobile Devices: Case Study Children with Autism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 734 Angeles Quezada, Reyes Juárez-Ramírez, Margarita Ramirez, Ricardo Rosales, and Carlos Hurtado Right Arm Exoskeleton for Mobility Impaired . . . . . . . . . . . . . . . . . . . . 744 Marius-Nicolae Risteiu and Monica Leba Semi-automatic Eye Movement-Controlled Wheelchair Using Low-Cost Embedded System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 755 Gustavo Caiza, Cristina Reinoso, Henry Vallejo, Mauro Albarracín, and Edison P. Salazar Design and Control of a Biologically Inspired Shoulder Joint . . . . . . . . 765 Marius Leonard Olar, Monica Leba, and Sebastian Rosca Modelling and Simulation of 3D Human Arm Prosthesis . . . . . . . . . . . . 775 Sebastian Daniel Rosca, Monica Leba, and Arun Fabian Panaite Ethics, Computers and Security On the Assessment of Compliance with the Requirements of Regulatory Documents to Ensure Information Security . . . . . . . . . . . 789 Natalia Miloslavskaya and Svetlana Tolstaya
Iconified Representations of Privacy Policies: A GDPR Perspective . . . . 796 Sander de Jong and Dayana Spagnuelo Data Protection in Public Sector: Normative Analysis of Portuguese and Brazilian Legal Orders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 807 Marciele Berger Bernardes, Francisco Pacheco de Andrade, and Paulo Novais Common Passwords and Common Words in Passwords . . . . . . . . . . . . 818 Jikai Li, Ethan Zeigler, Thomas Holland, Dimitris Papamichail, David Greco, Joshua Grabentein, and Daan Liang Using Fuzzy Cognitive Map Approach for Assessing Cybersecurity for Telehealth Scenario . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 828 Thiago Poleto, Rodrigo Cleiton Paiva de Oliveira, Ayara Letícia Bentes da Silva, and Victor Diogho Heuer de Carvalho Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 839
Software Systems, Architectures, Applications and Tools
Single Producer – Multiple Consumers Ring Buffer Data Distribution System with Memory Management
Mariusz Orlikowski
Department of Microelectronics and Computer Science, Lodz University of Technology, Lodz, Poland
[email protected]
Abstract. The paper presents a parallel data processing system for data acquisition applications. The solution relies on a ring buffer as the data storage area and a dedicated memory management class that efficiently provides a requested memory region and releases it. Operation is based on an independently working single producer, a number of consumers and a release task. They are intended to allocate a memory buffer and acquire the data, process them, and finally release the buffer. To organize such a sequence of operations a new synchronization object is proposed. All the components allow the system to operate with zero-copy and lock-free data handling, except for the task synchronization, which saves CPU processing power while waiting for the completion of the previous operation. The system was tested and some performance results are presented. The solution is currently intended for multithreading applications, however the design of robust interprocess operation is ongoing, and the presented work includes solutions suitable for this future extension.
Keywords: Ring buffer · Memory management · Lock-free · Zero copy · Synchronization object
1 Introduction
In the world of data acquisition systems, designers often face the situation that they need to handle high-throughput data streams. The data first need to be acquired from the source into the CPU, then they need to be processed, and finally the results usually are sent away. For contemporary systems it is not hard to saturate the system with available devices (e.g. ADCs, video cameras), which can generate an extensive data stream for a single processing unit. Multiple data sources connected to a single CPU are also not rare, as contemporary computers offering increasing performance are able to process high-throughput data. In such cases the data stream can be close to the processing limits of a single system, and proper data handling and distribution among processing tasks becomes an issue. The designed solution is intended to be used in high performance diagnostics systems where different processing algorithms need to be applied to the acquired data. Some processing parts, data archiving or pushing the data out may be enabled or
disabled when needed at any time. Additionally, the ring buffer approach is used to store the required amount of data so that the system works smoothly, without data loss, even when the tasks occasionally need more time for processing. The software implements a single-producer to many-consumers data distribution system with a zero-copy approach implementing dedicated lock-free objects. The existing solutions, e.g. [1–6], implement the ring buffer approach, but in most cases they are intended to work with fixed-size data, also called messages. The solutions also have other restrictions, e.g. the number of ring buffer elements must be a power of 2 [5, 6]. This simplifies the data management part and makes it possible to optimize the latency and the number of possible operations per second. Additionally, to achieve the best results they often use active monitoring, sacrificing CPU processing time. These features may not fit many data acquisition and processing application requirements where the system must work with variable-size data buffers and balance performance against CPU utilization. The presented implementation is intended to work as a multithreading application, but a great part of the code was designed to make it possible to extend it for multiprocess operation in the future. The implementation includes some programming techniques to optimize data management time and minimize latency. The code was prepared for ongoing large scale high energy physics projects such as the ITER tokamak built in France and its diagnostics systems, thus the implementation is done in C++ 11 for Linux 64-bit platforms to operate within CODAC based systems [7].
2 One-to-Many Data Distribution System Proposal
The proposed system [8] was designed to work with a single writer (producer) feeding data into the system and many readers (consumers) which can operate on the same data in parallel threads. The readers can attach to the system and detach at any time without interrupting the system operation.
2.1 System Operation Overview
The idea of the operation is derived from the classical ring buffer approach holding the data coming from the source. The acquisition task reserves successive regions of the buffer, which are filled with the data. When a data region is no longer needed, it is released and may be reused again. Memory management data holding information about the reserved regions are placed in another structure which can also be considered a ring buffer (or a vector indexed by a modulo counter). The information about any specific memory region can thus be identified by a single number, the vector index. A separate synchronization object is used to organize the task communication. It is also designed as a ring buffer where each element holds an individual synchronization object assigned to a memory region when data acquisition is completed. The synchronization objects are used to control the access of a specific task to a given memory area for write, read or release. The data management and synchronization objects with their data dependencies and operation idea are presented in Fig. 1.
Fig. 1. The system operation concept.
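To make this division of work concrete, the loops of a producer task and of a consumer task might take the following shape. This is only an illustrative sketch: the method names used here (alloc, commit, get_next, valid, read_into, process, DataSource, timeout_ns) are hypothetical placeholders and not the actual interface, which is described in the following paragraphs; the release task runs internally and needs no user code.

    // Hypothetical usage sketch in C++; names are illustrative assumptions only.
    void producer_loop(Producer& producer, DataSource& source) {
        while (source.running()) {
            auto block = producer.alloc(source.next_size(), timeout_ns); // reserve a ring buffer region
            if (!block.valid())
                continue;                                  // timeout or out of ring buffer memory
            source.read_into(block.addr(), block.size());  // acquire the data in place (zero copy)
            producer.commit(block);                        // publish the region to the consumers
        }
    }

    void consumer_loop(Consumer& consumer) {
        for (;;) {
            auto block = consumer.get_next();      // wait for the next committed region
            process(block.addr(), block.size());   // user processing on the shared memory
            consumer.commit(block);                // done; the release task may now free it
        }
    }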
The solution always has two tasks running: the acquisition thread, as a part of the user application, and the release thread run in the background. Additionally, during the runtime, reader tasks can be started or stopped at any time to access the data. The synchronization objects are responsible for organizing the proper sequence of operations on the data, i.e. a new data region first must be written, then may be read if any reader exists, and finally must be released when no reads are in progress.
For the acquisition and reading tasks the system interface provides the calls for memory region allocation and committing written data (used in the acquisition loop), and the calls for getting the following data regions for reading and committing when reading is completed (used in the reader loops). The release task is an internal one and follows the readers and the writer, freeing memory regions when they commit. It is started on system initialization and operates until the system stops. To work with the ring buffers, two classes, Producer and Consumer, have been designed to be used in the application. The provided methods create new or attach to existing ring buffers and perform all necessary operations related to memory reservation, committing and receiving data, and graceful cleaning up on exit.
Memory Management. A memory manager is responsible for providing the ring buffer memory areas for the producer and releasing them when they are no longer used. It was designed for (but is not limited to) the data acquisition use case. A standard dynamic memory allocation process is not optimal in terms of its usage in high performance data acquisition applications. Issues such as built-in locks for thread safety, included metadata and aligned allocation, memory fragmentation, extra lazy allocation algorithms etc. may affect the memory usage, data processing performance and smoothness of operation. The problems can be avoided by using no dynamic allocation at data acquisition and processing time. All the data structures and buffers need to be allocated before any acquisition starts, and they are used sequentially to acquire and process data. A sequential operation allows designing a specialized memory management algorithm that keeps free and allocated regions non-fragmented.
The system goes beyond the most common approach with fixed-size buffers. Fixed-size buffers are easier to implement and faster to manage, but not flexible. The designed memory management allows varying-size memory region reservation with byte size resolution. The presented solution also implements the zero-copy and lock-free ideas, which improves overall system performance. The producer and consumers use the same
memory areas for acquiring and processing the data. The locks were replaced with atomic operations without affecting the thread-safety of the operations.
The memory manager operates on two shared memory regions. The first one, later called the memory management data, keeps the information about the reserved data regions. These data are managed using operations on a set of atomic objects identifying the current memory layout. The second region is the ring buffer memory, used directly to store and exchange the data.
The acquired data storage space is organized as a ring buffer structure. It means that acquired data regions are written one after another, and when the writes reach the buffer tail, the writing starts from its beginning again. When the ring buffer memory is allocated in the standard way (a single memory area), we can face the situation that the remaining tail of the memory is too small for the current request. It either needs to be left unused or the memory must be provided as two separate regions. To avoid this problem a special memory allocation routine has been used. It uses a feature of the MMU which allows mapping a given physical memory segment into virtual user space more than once. From the application perspective the available memory area has doubled size, but writes beyond the first part have their corresponding updates in the first one. The described memory layout greatly simplifies the memory management routines, never generates an unused tail and allows a lock-free memory management implementation.
The memory management routines identify ring buffer memory regions using abstract memory addressing in the range [0, ring buffer size – 1] together with the region size. To get the actual address of a specific memory area, the abstract address needs to be added to a process-specific base address of the ring buffer. The approach makes the memory representation process-independent.
The information about reserved memory areas is maintained in the memory management data. They are organized as a vector of structures (later called data slots) identifying each memory region in use. Each element keeps the information about the abstract address, the size and a slot index also used as a status flag. The count of data slots is a parameter of the initialization call and it limits the number of concurrently allocated areas. It needs to be estimated for the specific data acquisition use case using the information about how many data sets need to be buffered. The vector structure also simplifies identification of a memory region by the index of its data slot in the table.
To manage the ring buffer, i.e. allocate and release data regions, a separate class has been designed. The allocation and release calls operate on a dedicated object representing the memory region. The object provides methods to get the ring buffer memory region address and size, and provides the error in case of allocation failure. To make the memory management code suitable for other applications, the algorithm can also operate with a non-ordered release sequence with full thread-safety for allocation and release operations. Nevertheless, the best performance is possible when the allocation and release sequences match.
The memory management is done using atomic operations on the data representing the free data region, the first empty data slot index and the first-to-release data block information. Additionally, a semaphore is used to control the access to the limited number of data slots. Allocation and release operations alter the counter accordingly, additionally providing synchronization for the data slot objects. The alloc() call (Fig.
2) first tries to reserve a single data slot by attempting to decrement the data slots access semaphore. Next, the free memory size is decreased if it is sufficient. The
last step is an atomic increase of the data slot index and of the first free abstract address to adjust for the next allocation, and returning the object representing the currently allocated memory region. Managing the address together with the slot index makes sure that the data slot ring buffer always holds the data regions ordered.
A free() call (Fig. 3) first marks the provided data block as requested for release. The flag allows the release to be completed in following calls of free() if it cannot be done in the current one. Then the first-to-release data block information is altered, but only if it matches the current data to release. A mismatch indicates that the current release is performed in a non-ordered sequence or that another free operation is in progress, so the call exits and the release is deferred. Next, the data slot flag is set to empty, the free memory size is increased and the slots access semaphore is incremented. To prevent a race with other concurrent free() calls, which might increase the semaphore before the current data slot is marked empty, a special flag is set and cleared so that only one thread can complete the operation. The procedure repeats to complete all postponed releases.
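The double virtual mapping described above can be obtained on Linux, for example, by mapping one shared memory object twice into a contiguous virtual address range. The sketch below only illustrates the technique and is not the author's actual allocation code; it assumes a page-aligned size and a glibc providing memfd_create (a POSIX shm_open file would work similarly).

    // Map the same physical pages twice, back to back, so that a region starting
    // near the end of the ring buffer continues seamlessly into its beginning.
    #include <sys/mman.h>
    #include <unistd.h>
    #include <cstddef>

    void* create_mirrored_ring(std::size_t size) {   // size must be page-aligned
        int fd = memfd_create("ringbuf", 0);         // anonymous shared memory object
        if (fd < 0 || ftruncate(fd, size) != 0)
            return nullptr;
        // Reserve 2*size of contiguous virtual address space first.
        char* base = static_cast<char*>(mmap(nullptr, 2 * size, PROT_NONE,
                                             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0));
        if (base == MAP_FAILED) { close(fd); return nullptr; }
        // Overlay both halves with the same file, i.e. the same physical memory.
        if (mmap(base, size, PROT_READ | PROT_WRITE,
                 MAP_SHARED | MAP_FIXED, fd, 0) == MAP_FAILED ||
            mmap(base + size, size, PROT_READ | PROT_WRITE,
                 MAP_SHARED | MAP_FIXED, fd, 0) == MAP_FAILED) {
            munmap(base, 2 * size); close(fd); return nullptr;
        }
        close(fd);      // the mappings keep the memory alive
        return base;    // writes past 'size' land in the first copy as well
    }

With such a mirrored mapping the producer can always obtain one contiguous region of any size up to the buffer length, so no unused tail is ever produced and no allocation ever has to be split in two.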
DataBlock alloc(size, timeoutns) {
  if (sem_wait(slotsSem, timeoutns) != 0)
    return status_timeout;
  free = freeSize;
  do {
    if (free < size) {
      sem_post(slotsSem);
      return status_outofmemory;
    }
  } while (!cmp_xchg(freeSize, free, free - size));
  data = allocData;
  do {
    newData.ID   = (data.ID + 1) % slotsNum;
    newData.addr = (data.addr + size) % rbSize;
  } while (!cmp_xchg(allocData, data, newData));
  dataSlots[data.ID].addr = data.addr;
  dataSlots[data.ID].size = size;
  return DataBlock(dataSlots[data.ID], rbBaseAddr);
}
Fig. 2. Memory allocation pseudocode (atomic operations marked bold).

free(ID) {
  dataSlots[ID].stat = torelease;
  do {
    reqData.ID   = ID;
    reqData.addr = dataSlots[ID].addr;
    newData.ID   = (ID + 1) % slotsNum;
    newData.addr = (dataSlots[ID].addr + dataSlots[ID].size) % rbSize;
    newData.protect = true;
    if (!cmp_xchg(freeData, reqData, newData))
      break;
    dataSlots[ID].stat = empty;
    freeData.protect = false;
    freeSize += dataSlots[ID].size;
    sem_post(slotsSem);
  } while (dataSlots[ID = newData.ID].stat == torelease);
}
Fig. 3. Memory release pseudocode (atomic operations marked bold).
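The double mapping of the ring buffer memory mentioned above is not shown in the paper's listings. The fragment below is a minimal, single-process sketch of that idea on Linux, using memfd_create() and two MAP_FIXED mappings of the same pages; the actual system shares the region between producer and consumer processes, and all names and the reduced error handling here are illustrative.

// Sketch of a "mirrored" ring buffer: the same physical pages are mapped
// twice back to back, so a region that wraps past the end of the buffer
// is still contiguous in virtual memory (requires glibc >= 2.27 for memfd_create).
#define _GNU_SOURCE
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <cstdint>

struct MirroredRing {
    uint8_t* base = nullptr;   // start of the first mapping
    size_t   size = 0;         // ring buffer size (multiple of the page size)
};

MirroredRing create_mirrored_ring(size_t size) {
    // size must be a multiple of the page size for the fixed mappings to line up.
    assert(size % static_cast<size_t>(sysconf(_SC_PAGESIZE)) == 0);

    int fd = memfd_create("ring", 0);          // anonymous, file-backed memory
    assert(fd >= 0);
    int rc = ftruncate(fd, static_cast<off_t>(size));
    assert(rc == 0);

    // Reserve 2*size of address space, then map the same fd into both halves.
    void* addr = mmap(nullptr, 2 * size, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    assert(addr != MAP_FAILED);
    void* lo = mmap(addr, size, PROT_READ | PROT_WRITE,
                    MAP_SHARED | MAP_FIXED, fd, 0);
    void* hi = mmap(static_cast<uint8_t*>(addr) + size, size,
                    PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED, fd, 0);
    assert(lo != MAP_FAILED && hi != MAP_FAILED);
    close(fd);

    return { static_cast<uint8_t*>(addr), size };
}

// A region at abstract address a of length n can now be accessed at base + a
// even when a + n crosses the end of the ring: bytes written past base + size
// are visible at the beginning of the first mapping as well.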
Synchronization. To handle thread synchronization in the system, a new lock object has been designed (called wrflock). The lock operation consists of three steps for each type of operation to be performed (write, read or release/free):
• wacquire(), racquire(), facquire() – these calls initiate the given type of operation, so that the lock will be able to transition to the corresponding state. The racquire() operation blocks when the lock has been acquired for release and write, to prevent the starvation problem.
• wwait(), rwait(), fwait() – wait until the given type of operation can be performed. Read and write can block waiting for their turn. fwait() uses a wait-yield approach, so that the slowest consumer does not have to issue futex wake() calls, which would additionally degrade its speed.
• wrelease(), rrelease(), frelease() – these calls declare the end of the given operation and allow the next allowable operation in the sequence to be unblocked.
The wrflock is based internally on a 64-bit integer value containing a set of flags and counters that track the number of writer, reader and release tasks operating on the lock. For its internal operations, futex wait/wake operations are used. The integer bitwise flags keep the information about the current state of the lock and its next allowed state. The counters for the write and free operations are 1 bit wide, so only one lock of each of these types can be successfully held. For the read operation the counter is 16 bits wide, so up to 65535 concurrent readers can hold the lock. Pseudocode for a set of operations is presented in Fig. 4. Given the above, wrflock operations can also be viewed as a state machine. The possible states and transitions are shown in Fig. 5. The transitions are made in the acquire and release (free) calls, where the related read/write/free counters are incremented and decremented accordingly.
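The text gives the field widths of the 64-bit lock word but not the exact bit positions of the names used in Fig. 4 (WR_NUM, FR_NUM, CURR_FR, CURR_WR, NEXT_WR, NEXT_RDFR). The constants below show one possible packing, purely as a reading aid for the pseudocode; the positions are an assumption, only the widths follow the description above.

#include <cstdint>

// One hypothetical packing of the 64-bit wrflock word (bit positions assumed;
// only the field widths follow the paper: 1-bit write/free counters,
// 16-bit reader counter, plus "current" and "next allowed" state flags).
namespace wrflock_layout {
    constexpr uint64_t WR_NUM    = 1ull << 0;       // writer counter (1 bit)
    constexpr uint64_t FR_NUM    = 1ull << 1;       // free/release counter (1 bit)
    constexpr uint64_t RD_NUM    = 0xFFFFull << 2;  // reader counter (16 bits)
    constexpr uint64_t CURR_WR   = 1ull << 18;      // write is the current state
    constexpr uint64_t CURR_RD   = 1ull << 19;      // read is the current state
    constexpr uint64_t CURR_FR   = 1ull << 20;      // free/release is the current state
    constexpr uint64_t NEXT_WR   = 1ull << 21;      // write is the next allowed state
    constexpr uint64_t NEXT_RDFR = 1ull << 22;      // read/free is the next allowed state
}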
facquire(lock) {
  data = lock.data;
  do {
    newdata = data | FR_NUM;
    if (data & NEXT_RDFR)
      newdata = newdata ^ (NEXT_RDFR | CURR_FR);
  } while (!cmp_xchg(lock.data, data, newdata));
}

fwait(lock) {
  while (!(lock.data & CURR_FR))
    yield();
}

frelease(lock) {
  data = lock.data;
  do {
    newdata = data & ~(FR_NUM | CURR_FR);
    if (data & WR_NUM)
      newdata |= CURR_WR;
    else
      newdata |= NEXT_WR;
  } while (!cmp_xchg(lock.data, data, newdata));
  if (newdata & CURR_WR)
    futex_wake(lock.data);
}
Fig. 4. A pseudocode for facquire, fwait and frelease calls (atomic operations marked bold).
Fig. 5. State machine of wrflock
User Interface. A user of the system needs to use the 3 designed classes:
• Producer – provides a set of methods to create the data structures, reserve a memory region and commit the data when the writing operation is completed. It runs a release thread dedicated to freeing the allocated data regions when they are no longer used.
• Consumer – provides a set of methods to attach to existing producer data structures, get a memory region for reading and commit when the work is completed.
• DataBlock – a class used to access a specific data region.
A producer and consumer operation example is presented in Fig. 6.
Producer p("test"); DataBlock b; p.create(1000000, 100); p.init(); while (!stop) { while (!p.allocateDataBlock(b, 123)); // do data acquisition // of 123 bytes // to address *b p.commit(b); } p.done();
Consumer c("test"); DataBlock b; c.attach(); c.init(); do (!detach){ c.getDataBlock(b); if (b.isStop()) // producer break; // exited // read b.size() data at *b c.commit(b); } c.done();
Fig. 6. Example code of the producer task (left) and consumer task (right).
3 Performance Tests

The system performance has been estimated on a Nehalem dual Quad Core 2.5 GHz CPU under Linux RedHat 7.4, using the library high resolution clock. The tests show the minimal and average times of the operations and the time within which 99% of the operations are completed. The statistics are gathered from 10000000 loop iterations.

3.1 WRF Lock Performance
Similar locking objects available in the GNU C library have been compared to the WRF lock assuming that no locking takes place, i.e. the lock does not block at all. The tests show the overhead of the internal code needed to check, verify and set the lock. This gives an indication of how complex, and thus how efficient, the locking and unlocking algorithm is. The results (Table 1) show how much time specific lock calls require to complete. The tests show that the designed locking object is faster than the remaining ones. In the Linux implementation it takes a relatively small amount of memory, so it can be used to create large ring buffer queues. This proves its usability in the system in place of the other objects. Performance measures including the latency when the lock waits blocked are not shown; as the objects internally use the same futex calls, the results are similar.
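The measurement code itself is not listed in the paper. The loop below only sketches the kind of micro-benchmark implied by the description (minimum, average and 99th-percentile time over 10,000,000 iterations), here applied to a plain std::mutex as a stand-in for the compared locks; the paper times lock and unlock separately, while this sketch times the pair for brevity.

#include <algorithm>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <mutex>
#include <vector>

int main() {
    constexpr size_t kIters = 10'000'000;
    std::mutex m;                          // stand-in for the lock under test
    std::vector<uint64_t> ns(kIters);

    for (size_t i = 0; i < kIters; ++i) {
        auto t0 = std::chrono::high_resolution_clock::now();
        m.lock();                          // operation being measured
        m.unlock();
        auto t1 = std::chrono::high_resolution_clock::now();
        ns[i] = std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
    }

    std::sort(ns.begin(), ns.end());
    uint64_t sum = 0;
    for (uint64_t v : ns) sum += v;
    std::printf("min %llu ns, avg %llu ns, 99%% below %llu ns\n",
                (unsigned long long)ns.front(),
                (unsigned long long)(sum / kIters),
                (unsigned long long)ns[kIters * 99 / 100]);
}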
Table 1. Performance comparison of selected locking objects (non-blocking case).

Lock type          Size [B]  Operation    Lowest time [ns]  Average time [ns]  99% less than [ns]
mutex_t            40        lock         6                 12                 82
                             unlock       7                 11                 13
sem_t              32        post         9                 13                 81
                             wait         10                10                 10
rwlock_t           56        rlock/wlock  13                20                 22
                             unlock       14                18                 19
wrflock_t (w/r/f)  8         acquire      8/19/9            11/21/12           8/20/11
                             wait         1/1/1             3/4/3              2/4/2
                             release      9/9/9             13/11/14           12/11/12

3.2 Memory Manager Performance
The memory manager performance is compared to the standard malloc()/free(). A single test reserves a memory block of 100 bytes, including an allocation success check, and releases it. For the GNUC allocation calls, additional tests were performed to measure memory locking performance, which may be needed to prevent lazy allocation and memory swapping. The influence of memory fragmentation was not tested (Table 2).
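As a reading aid for the GNUC rows in Table 2 below, the snippet sketches the extra locking step mentioned above: after malloc(), mlock() forces the pages to be resident, avoiding lazy allocation and swapping. This is a generic illustration, not the authors' test code.

#include <cstdlib>
#include <sys/mman.h>

int main() {
    constexpr std::size_t kSize = 100;     // same block size as in the test
    void* p = std::malloc(kSize);
    if (p == nullptr) return 1;            // allocation success check

    // Optionally pin the pages so the first write cannot fault or be swapped out.
    if (mlock(p, kSize) == 0) {
        munlock(p, kSize);
    }
    std::free(p);
    return 0;
}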
Table 2. Performance comparison of memory allocation calls.
(Calls compared: GNUC alloc, free, mlock and munlock; memory manager (semi MT) alloc and free; memory manager (fully MT) alloc and free.)
Classification Execution - SVM
The process went through the execution of four steps considering the kernel functions: Linear, Polynomial, Gaussian (RBF) and Sigmoidal. They were tested to see which would give the best result. It can be seen that SVM performed very well and the Gaussian kernel presented the best results. Table 3 shows the obtained results.
Table 3. Comparative results of SVM implementations.

Kernel      Accuracy        Class          Precision  Recall  F1-score  Support
Linear      0.993556854791  0-Legal        0.99       1.00    1.00      8327
                            1-Fraudulent   0.00       0.00    0.00      54
                            Average/Total  0.99       0.99    0.99      8381
Polynomial  0.995823887364  0-Legal        1.00       1.00    1.00      8327
                            1-Fraudulent   0.85       0.43    0.57      54
                            Average/Total  1.00       1.00    1.00      8381
Gaussian    0.998568189953  0-Legal        1.00       1.00    1.00      8327
                            1-Fraudulent   0.88       0.91    0.89      54
                            Average/Total  1.00       1.00    1.00      8381
Sigmoidal   0.989500059659  0-Legal        0.99       1.00    0.99      8327
                            1-Fraudulent   0.03       0.02    0.02      54
                            Average/Total  0.99       0.99    0.99      8381
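For readers interpreting the per-class metrics reported in this section, the small helper below computes precision, recall and F1 from raw confusion counts. It is illustrative only and not part of the authors' pipeline; the example counts are reconstructed from the rounded Gaussian-kernel fraud row and are therefore approximate.

#include <cstdio>

struct ClassMetrics { double precision, recall, f1; };

// tp: fraudulent rides correctly flagged, fp: legal rides flagged as fraud,
// fn: fraudulent rides missed.
ClassMetrics metrics(double tp, double fp, double fn) {
    double precision = tp / (tp + fp);
    double recall    = tp / (tp + fn);
    double f1        = 2.0 * precision * recall / (precision + recall);
    return { precision, recall, f1 };
}

int main() {
    // Approximately the Gaussian-kernel fraudulent class of Table 3:
    // 54 fraudulent rides, about 49 detected, about 7 false alarms.
    ClassMetrics m = metrics(49, 7, 5);
    std::printf("precision %.2f recall %.2f f1 %.2f\n", m.precision, m.recall, m.f1);
}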
Therefore, as indicated by some studies, SVM has shown itself to be an efficient algorithm for fraud detection applications, and the Gaussian kernel, which had the best performance in this sample, will be compared with the classification model based on Neural Networks.

3.6 Classification Execution – ANN
The data set was submitted to the Multilayer Perceptron model, performing the training through the backpropagation algorithm. The activation functions used were Logistic (Sigmoid), Tanh and ReLu. For each activation function, the three optimizers Adam, Sgd and Lbfgs were tested. The process was performed with the execution of nine steps and the results were satisfactory. Table 4 shows the results organized by optimizer and activation function.

Table 4. Comparative results of ANN implementations.

Optimizer and Activation Function  Accuracy        Class          Precision  Recall  F1-score  Support
Adam – Relu                        0.99630115738   0-Legal        1.00       1.00    1.00      8327
                                                   1-Fraudulent   0.70       0.74    0.72      54
                                                   Average/Total  1.00       1.00    1.00      8381
Adam – Tanh                        0.995465934853  0-Legal        1.00       1.00    1.00      8327
                                                   1-Fraudulent   0.74       0.46    0.57      54
                                                   Average/Total  0.99       1.00    0.99      8381
Adam – Logistic                    0.994392077318  0-Legal        1.00       1.00    1.00      8327
                                                   1-Fraudulent   0.56       0.57    0.57      54
                                                   Average/Total  0.99       0.99    0.99      8381
Sgd – Relu                         0.993676172294  0-Legal        0.99       1.00    1.00      8327
                                                   1-Fraudulent   1.00       0.02    0.04      54
                                                   Average/Total  0.99       0.99    0.99      8381
Sgd – Tanh                         0.994272759814  0-Legal        1.00       1.00    1.00      8327
                                                   1-Fraudulent   0.57       0.44    0.50      54
                                                   Average/Total  0.99       0.99    0.99      8381
Sgd – Logistic                     0.993318219783  0-Legal        1.00       1.00    1.00      8327
                                                   1-Fraudulent   0.88       0.91    0.89      54
                                                   Average/Total  1.00       1.00    1.00      8381
Lbfgs – Relu                       0.999045459969  0-Legal        1.00       1.00    1.00      8327
                                                   1-Fraudulent   0.93       0.93    0.93      54
                                                   Average/Total  1.00       1.00    1.00      8381
Lbfgs – Tanh                       0.998926142465  0-Legal        1.00       1.00    1.00      8327
                                                   1-Fraudulent   0.89       0.94    0.92      54
                                                   Average/Total  1.00       1.00    1.00      8381
Lbfgs – Logistic                   0.999045459969  0-Legal        1.00       1.00    1.00      8327
                                                   1-Fraudulent   0.88       0.98    0.93      54
                                                   Average/Total  1.00       1.00    1.00      8381
The results were then compared and it was possible to observe that all optimizers showed good results; however, Lbfgs performed better than Adam and Sgd. In bold lines, it is possible to identify the best results of each optimizer. Comparing the activation functions of the Lbfgs optimizer, it was found that Relu and Logistic had the best results; in terms of accuracy, both presented the same values. The difference between them is related to the Precision and Recall metrics. Due to the greater stability in the metrics of the Relu activation function, highlighted by the blue line, it was compared with the results obtained by SVM.
4 Results and Comparisons

Comparing the obtained accuracy values, there is a slight advantage in the execution of the ANN, especially on the fraudulent data, thus bringing greater confidence in the presented model (see Fig. 3).
Accuracy: SVM – Gaussian 0.99856819; ANN – Lbfgs + Relu 0.99904546.
Fig. 3. Comparative accuracy SVM and ANN.
Comparing the other metrics, there is a slight advantage in the execution of ANN, especially in fraudulent data, thus bringing greater confidence in the model presented (see Fig. 4).
SVM and ANN metrics comparison:

           SVM Gaussian (Legal)  SVM Gaussian (Fraudulent)  ANN Lbfgs + Relu (Legal)  ANN Lbfgs + Relu (Fraudulent)
precision  1.00                  0.88                       1.00                      0.93
recall     1.00                  0.91                       1.00                      0.93
f1-score   1.00                  0.89                       1.00                      0.93

Fig. 4. Comparative SVM and ANN metrics.
5 Conclusions

This experiment proposed the creation of an artifact to detect fraud by assessing the ride behavior of an app-based transport company. For this construction, unsupervised clustering algorithms were suggested, followed by the application of supervised classification algorithms. So that the study would be based on methods that yield satisfactory results, the definitions of machine learning and the characteristics of the methods to be applied were detailed, namely K-means, SVM and ANN. A systematic review was carried out to validate the current importance of the topic and the numerous studies conducted on electronic payments. An application gap of this kind of analysis was identified in app-operated transport companies. With the clustering it was possible to confirm some hypotheses raised by the company's specialists, the first being that most fraud occurred in payments made via credit card, followed by bonus payments. As for service, fraud occurred in all categories. Regarding values, it was possible to identify that the rides causing the greatest losses were those above R$ 100.00. It was not possible, through clustering, to identify an association between drivers and passengers, or a relationship with the cities in which the rides take place, so these items are proposed as future work. After the clusters were computed, the rides received labels classifying them as fraudulent or legitimate, and only then was the execution of the classification algorithms started. SVM processing was carried out with the main kernels, and the one that obtained the best result was used in the comparison with the Neural Networks. For the ANN test, the Multilayer Perceptron model was considered, performing the training through backpropagation. The Adam, Sgd and Lbfgs optimizers were compared, and for each of them the Relu, Tanh and Logistic activation functions were evaluated. Among them the best results were obtained by the Lbfgs optimizer with the Relu and Logistic activation functions; the Relu activation function was chosen for
comparison with SVM because it demonstrates greater stability in the results of the applied metrics. The accuracy of both was identical. The result confirms that it is possible to detect electronic payment fraud with the developed artifact and that it is suitable for implementation in an app-based transport company. Thus, it can be concluded that the K-means and Neural Network algorithms obtained the best performance.
Fog Computing in Real Time Resource Limited IoT Environments

Pedro Costa1, Bruno Gomes1, Nilsa Melo1, Rafael Rodrigues1, Célio Carvalho1,4, Karim Karmali4, Salim Karmali4, Christophe Soares1,2, José M. Torres1,2, Pedro Sobral1,2(B), and Rui S. Moreira1,2,3

1 ISUS Unit, FCT - University Fernando Pessoa, Porto, Portugal {pedro.costa,bruno.gomes,nilsa.melo,rafael.rodrigues,celio.carvalho,csoares,jtorres,pmsobral,rmoreira}@ufp.edu.pt
2 LIACC, University of Porto, Porto, Portugal
3 INESC-TEC, FEUP - University of Porto, Porto, Portugal
4 Hardlevel - Energias Renováveis, Porto, Portugal {karim.karmali,salim.karmali}@ufp.edu.pt
http://isus.ufp.pt
Abstract. Cloud computing is omnipresent and plays an important role in today’s world of Internet of Things (IoT). Several IoT devices and their applications already run and communicate through the cloud, easing the configuration burden for their users. With the expected exponential growth on the number of connected IoT devices this centralized approach raises latency, privacy and scalability concerns. This paper proposes the use of fog computing to overcome those concerns. It presents an architecture intended to distribute the communication, computation and storage loads to small gateways, close to the edge of the network, in charge of a group of IoT devices. This approach saves battery on end devices, enables local sensor fusion and fast response to urgent situations while improving user privacy. This architecture was implemented and tested on a project to monitor the level of used cooking oil, stored in barrels, in some restaurants where low cost, battery powered end devices are periodically reporting sensor data. Results show a 93% improvement in end device battery life (by reducing their communication time) and a 75% saving on cloud storage (by processing raw data on the fog device).
Keywords: Fog computing · Edge computing · Fog analytics · Fog IoT · IoT · M2M

1 Introduction
Nowadays, IoT integration with the physical world has provided the simplest everyday objects with monitoring capabilities for both the environment and users. All the IoT local infrastructure is, most of the times, tailored to accomplish mainly the basic functions of sensing, acting and communicating or relaying raw data to the network. Hence, certain features, such as storing or processing
data, are most of the times absent locally, being associated exclusively with the cloud, where data will be processed and stored for future analysis. Frequently, IoT devices communicate directly with a cloud server using technologies such as GPRS or NB-IoT or, alternatively, through a LoRa or Sigfox gateway, enabling data reporting to the cloud via the Internet. Figure 1 illustrates the most commonly used architecture for IoT.
Currently, cloud computing is heavily adopted for a multitude of applications and IoT is no exception. This type of architectural solution presents interesting dynamics, as all the information collected by the whole ecosystem of existing IoT devices in diverse installations is sent to a central point. This aggregation allows developers to implement algorithms to analyze data on a greater scale. Although cloud computing brings more processing power and storage, some disadvantages become obvious when the increasing number of IoT devices, which have to do the heavy lifting of reporting the collected data, is taken into consideration. This communication process consumes a considerable amount of energy; if the IoT device is battery powered, it will be negatively impacted. Besides energy consumption, scalability is another concern. Since IoT devices generally do not have great processing power or storage, the cloud infrastructure must be able to serve all the devices connected to the system, and the more devices are in use, the more cloud resources are required to support them. IoT devices are often responsible for producing continuous streams of data, and this puts pressure on the existing storage capabilities of cloud computing, which have to adapt to the amount of data being generated.
Fog computing [1] is the distribution of cloud computing resources to small agents closer to the IoT devices, at the edge of the network where they are deployed, taking on the role of cloud computing on a smaller scale. As such, these devices can take the responsibility of storing, processing and reporting the information generated by the IoT devices. As Fig. 2 illustrates, the fog device sits between the IoT end nodes and the cloud. This creates a dynamic where the IoT device does not necessarily know about the existence of a cloud server, and it does not need to. The IoT device reports data more efficiently, since it only has to send data to another, much closer device, resulting in battery life improvements for the IoT end node. Since the fog device stores and processes data from all IoT devices in close proximity, only processed data, or a summary of it, is most of the times sent to the cloud, saving storage space and simultaneously improving the privacy levels of IoT environments. Although fog computing presents advantages in IoT environments, its disadvantages are also important to note, such as the lack of mature software in the field, requiring experienced developers, and the fact that resource allocation on a greater scale can impact the architecture.
One typical application scenario pointed out for fog computing is e-health, when quick decisions must be made. For instance, a monitored user can have his/her information sent to a nearer (fog) device that can store and analyze data, determine if the user is in a critical health state, and trigger the necessary reactions. In contrast, with a cloud server solution the latency involved in waiting for a response could be fatal. As the paper [2] suggests, this can improve
Fig. 1. Cloud computing architecture
Fig. 2. Fog computing architecture
response time and even apply deep learning algorithms to determine the cause of the emergency and act upon it. The working scenario for this paper is to provide a low-cost and extensible IoT solution for monitoring the level of used cooking oil, stored in barrels, in restaurants where low-cost, battery-powered end devices periodically report sensor data. When the oil level in the barrel exceeds a limit, it is collected by a recycling company. This paper addresses the challenges of efficiently managing the deployment of those IoT barrels. The aim is to solve the lack of information about the oil level in the existing restaurant barrels. The original process, to be improved, is the following: an operator from the recycling company calls the restaurant asking if the barrels are full of oil. Frequently, the restaurant personnel give inaccurate information about the oil level in the barrels, forcing the company to send personnel to check/collect barrels, thus wasting resources. The installation of smart oil barrels solves a lot of problems related to monitoring; however, a pure cloud computing solution raises some issues regarding battery life expectancy, privacy of collected sensor data, cost and data traffic. Therefore, this paper presents a smart oil barrels solution, based on a fog computing architecture, that increases battery life, reduces latency in the communication process, and also reduces the amount of data sent to the cloud server while improving privacy. This paper is organized as follows: Sect. 2 reviews the state of the art on fog computing and areas where it can be used. Section 3 presents the fog computing solution, its advantages and disadvantages, and also the Smart Barrels case implemented on the fog architecture. The last section analyzes the performance of the equipment, the efficiency and stability of the network and the scalability of the architecture.
2 Related Work on Fog Computing
Fog computing is still in the early stages of adoption but it is already proving to be a viable solution to problems related to latency and load balancing of the cloud. The idea of using fog computing in this project was first introduced in [2], where edge computing could minimize latency and provide a quick response in health cases that need immediate attention, for example a fall or a heart attack. The authors also point out that the fog layer, since it has capabilities for storing and processing, sensor fusion and machine learning, could be used to improve results and keep personal data private.
The work in [3] was also crucial in defining what fog computing can achieve, since cloud computing can be affected by latency problems. The same authors address the idea of fog-to-fog collaboration, where fog devices can communicate with each other to minimize the latency of processes that require immediate action. As such, that paper presents scenarios where fog devices supervising IoT devices in remote locations have to off-load processing to nearby fog devices so it can be processed and acted upon.
A real-world implementation of a fog computing architecture is presented in [4], where fog computing is used to analyze video footage from street cameras, determine the situation being recorded and act upon the footage using a deep residual network model. After that, the system sends an alert to a server via the MQTT protocol; as such, the system does not save irrelevant footage and only keeps alerts to act upon, saving storage space on the cloud server that would otherwise hold uneventful footage.
In [5] the authors present the idea of using classifiers such as K-NN and ID3 on fog devices for classifying sensor data and determining what data needs to be sent to the cloud directly, not sent to the cloud, or processed and compressed before it is sent to the cloud. The authors claim that this system could address latency concerns and maximize bandwidth.

Table 1. “+” improvement; “−” not applicable
Solutions comparison

Requirements/papers  [2]   [3]   [4]
Latency              +++   +++   +++
Privacy              +++   −     −
Scalability          +++   +++   +++
Resources            ++    +     ++
Although the papers presented discuss real use cases for fog computing, some of them have drawbacks in how fog computing is used. In [3], the authors present an algorithm to load balance the fog devices in order to minimize latency, but it is never mentioned how the network is set up and which protocols are used; security is also a concern since data is transmitted between fog devices. Paper [4] discusses the implementation of a fog device closer to the hardware, the video
camera, but it is never mentioned how much space is saved with this method, whether the Raspberry Pi keeps some footage, and what is done to mitigate the space occupied on the SD card. Table 1 presents a comparison between the papers according to four dimensions. These dimensions (latency, privacy, scalability and resources) are aspects where fog computing can take advantage over cloud computing.
3 Fog Computing and the Smart Barrels Case
Traditionally, in cloud based IoT systems, the application logic is implemented between end devices and the cloud server, using a router to provide Internet connectivity (Fig. 3). The router is only responsible for packet forwarding at the network layer, without any intervention in the application protocol. On the other hand, in fog-based IoT systems, the application protocol operates between the end device, the fog device and the cloud server (Fig. 4). From the end device perspective, the existence of the fog device enables a significantly lower communication latency. As such, the end device can save battery by turning off the radio interface and going into deep sleep much faster and more often [7]. From the cloud server perspective, the raw data processing performed on the fog device can save storage space and improve scalability. This is of paramount importance to cope with the expected exponential growth in the number of connected IoT devices in the near future, each one endlessly generating vast amounts of sensor data. Fog computing handles this problem by saving most of the generated data locally and only sending partial or summarized processed data to the cloud. However, the original raw data can still be kept on the network-accessible fog device. Data integrity and fault tolerance are important requirements, so the data replication between the fog and cloud layers is important [6]. For the system as a whole, the existence of the fog device enables use case scenarios where local sensor fusion is desired and fast feedback to urgent events is of major importance. The fog device is usually not restricted in processing power and energy spending like the end devices. So, it is a suitable place to perform computationally intensive tasks on sensor data like pattern recognition, deep learning, sensor fusion, etc. Since it is located close to the end devices it can also react in near real time to emergency situations, reducing the system response time. Cloud based systems struggle to meet the real time requirements of several application scenarios due to unpredictable communication latency and server load.

3.1 Smart Barrels Project
The described fog computing architecture was applied in the Smart Barrels Project. The main goal of the project was the real time monitoring of the cooking oil level in the barrels delivered to partner restaurants in order to improve the efficiency of the oil collection process.
Fig. 3. Communication in Cloud based IoT systems
Fig. 4. Communication in Fog based IoT systems
The collection process consisted of scheduled or regular visits to the restaurant facilities, to manually check the oil level or to call the restaurant and ask the employees, if the level was high. As such the process was inefficient and with high margin for error. So the solution was implemented with the use of sets of smart oil barrels controlled by a fog device, and deployed together on restaurants premises. A similar fog device was placed also in the oil recycling warehouse, to detect the return of the smart oil barrels when they are collected and sent back for oil storing and barrel cleaning.
Fig. 5. Prototype box.
The oil barrels were equipped with an Espressif SoC ESP32 (Fig. 5 A), a VL53L0X lidar sensor used to measure the oil level (Fig. 5 B) and a 750 mAh battery at 4.2 V. The fog device prototype is presented in Fig. 6. It was developed
using a Raspberry Pi Zero W (Fig. 7 A), that hosted a hidden WiFi network for connecting with the smart oil barrels, and a SIM800L v2.1 modem (Fig. 7 B) that provided a GPRS connection to the Internet to communicate with the cloud server.
Fig. 6. Fog device case
Fig. 7. Fog device internals
The fog device provides a service to the smart oil barrel, using the COAP protocol library [9,10]. This protocol requires minimal device processing and network resources. Other protocols, like MQTT or pure UDP, were also tested for use in this scenario. As presented in [1] we concluded that COAP was the best option for this case. The network application is based on a REST model where the end nodes act as clients and the fog device and the cloud server implement a REST API. The smart oil barrel sends hourly data reports. The data include: barrel identification, oil level, software version, battery level, time required to connect to the WiFi network and the corresponding RSSI value. After the fog device has received at least one data entry from every registered smart oil barrel, it sends a notification to the cloud server with a summary of the most recent data received, every 4 h. This summary reports the barrels' low battery condition, high oil level or communication problems, as well as fog device telemetry such as used memory and processor load histories. The overall smart barrels project architecture is presented in Fig. 8. Communication between the fog devices and the cloud server is always available through a VPN tunnel. The connection is slow due to the limitations of the GPRS connection; however, it was stable enough to allow SSH access. With this setting it is possible to access data in real time and even automatically perform over-the-air firmware updates on the smart oil barrels in every restaurant.
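The end-node firmware is not listed in the paper. The fragment below, written for an Arduino ESP32 target, only sketches the shape of one hourly report with the fields described above; plain UDP to the fog device is used here as a stand-in for the COAP library, and all identifiers (SSID, addresses, field names, port) are illustrative assumptions.

#include <WiFi.h>
#include <WiFiUdp.h>
#include <cstdio>

// Illustrative constants; the deployed system uses the hidden AP hosted by the
// fog device and a CoAP resource rather than this bare UDP port.
const char* kSsid     = "fog-ap";
const char* kPassword = "secret";
const IPAddress kFogIp(192, 168, 4, 1);
const uint16_t kFogPort = 5683;

// One hourly report with the fields described in the text: barrel id,
// oil level, firmware version, battery level, WiFi connect time and RSSI.
void sendReport(uint16_t barrelId, uint16_t oilLevelMm,
                uint16_t batteryMv, uint32_t connectMs) {
  WiFiUDP udp;
  char payload[96];
  int len = snprintf(payload, sizeof(payload),
                     "id=%u;oil=%u;fw=1.0;bat=%u;tconn=%lu;rssi=%d",
                     barrelId, oilLevelMm, batteryMv,
                     (unsigned long)connectMs, (int)WiFi.RSSI());
  udp.beginPacket(kFogIp, kFogPort);
  udp.write(reinterpret_cast<const uint8_t*>(payload), len);
  udp.endPacket();
}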
4 Evaluation
The smart barrel project was a significant breakthrough for the recycling company. Before its implementation all the process was dependent on humans with a
Fig. 8. Smart barrels project architecture
considerable waste of time and resources. Implementing a fog computing architecture was a fundamental decision to make this project feasible in the real world. The first improvement was the significant decrease in the communication time between the smart barrel and the fog device. As presented in Fig. 9, this allowed the smart barrel to be more efficient and save battery life. If reporting were done directly to the cloud server, the smart barrel would need a local Internet connection through WiFi, which is not always available, or a dedicated GPRS connection. Using the restaurant WiFi network, the data reporting process directly to the cloud requires around 4–5 s [8]. With a dedicated GPRS connection, according to our tests, the process took more than 6 s and required a minimum electric current of 1 Ampere, making this option unacceptable since the working time for the smart barrel battery would be less than 10 days. Using the fog device as a middle-man through WiFi we were able to cut down the communication time to an average of 450 ms. This approach represents a reduction of around 93%. As a result, the expected battery life of the smart barrel improved dramatically to around 1.5 years (with a 750 mAh battery). Telemetry on the smart barrels received so far from the field is consistent with our predictions. Once the WiFi connection is established, the fog device prioritizes the smart barrel response time. When a smart barrel sends a data report, it takes around 10 ms for an acknowledgment to arrive back at the barrel. Occasionally the barrel was unable to connect to the WiFi as fast as expected (taking between 1–3 s). In order to preserve battery life in the barrel we check if the connection is taking more than 800 ms. If that is the case, the device aborts the connection, goes to sleep and tries again later. Another improvement was in the amount of data being sent to the cloud. Since every barrel in service sends 4 messages of 54 bytes to the fog device before the synchronization with the cloud takes place, only 1/4 of the raw data is sent, saving storage space on the cloud server. Considering, for example, 5 smart oil barrels, if all the data is sent to the cloud, there will be 43800 messages
Fig. 9. Network connection setup time
(2.36 Mbytes) each year. This optimization saves 75% of cloud storage (to 0.59 Mbytes) as presented in Fig. 10. The ability to perform over the air firmware updates on all the smart barrels through the GPRS connection was also very important for the system management. Although the process is quite slow, it was tested and it worked well even in harsh environments where walls and network interference were present. Mobility support was also an important requirement. From time to time some barrels were reported stolen and their whereabouts unknown. Since every oil barrel is provisionally associated with a specific fog device, it’s possible to track its presence in the restaurant where it was deployed.
Fig. 10. Cloud storage
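The 800 ms connection budget and the deep-sleep cycle described above can be expressed in a few lines of end-node code. The following is a minimal Arduino-style sketch for an ESP32 end node, not the project firmware: it gives the WiFi association at most 800 ms, sends the report if it succeeds, and in any case goes back to deep sleep until the next hourly wake-up. Identifiers and the commented-out report call are placeholders.

#include <WiFi.h>
#include <esp_sleep.h>

const char* kSsid     = "fog-ap";               // hidden AP on the fog device (illustrative)
const char* kPassword = "secret";
const uint32_t kConnectBudgetMs = 800;          // abort threshold from the text
const uint64_t kSleepUs = 3600ULL * 1000000ULL; // wake again in roughly one hour

void setup() {
  uint32_t start = millis();
  WiFi.begin(kSsid, kPassword);

  // Wait for the association, but never longer than the 800 ms budget.
  while (WiFi.status() != WL_CONNECTED && millis() - start < kConnectBudgetMs) {
    delay(10);
  }

  if (WiFi.status() == WL_CONNECTED) {
    // measureOilLevel() / sendReport() stand for the acquisition and CoAP
    // reporting code, which is not listed in the paper.
    // sendReport(barrelId, measureOilLevel(), readBatteryMv(), millis() - start);
  }

  // In either case, radio off and deep sleep until the next report.
  esp_sleep_enable_timer_wakeup(kSleepUs);
  esp_deep_sleep_start();
}

void loop() {}  // never reached: deep sleep resets the sketch on wake-up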
5 Conclusion and Future Work
The main goal of this project was the development of a fog computing architecture for IoT applications. The major requirements were low implementation, operational and maintenance costs, fast response time, scalability and extensibility. In order to fulfill those requirements, several challenges such as battery and storage efficiency, communication latency and data privacy needed to be tackled. The Smart Barrels project was successfully deployed and it truly helped optimize the oil collection process and smart barrel management for the recycling company. A field test in several restaurants is already on the ground and the results are quite promising. This project underscores the potential of the fog computing architecture in stringent IoT scenarios where critical information requires immediate attention from an increasing number of IoT devices. The fog computing architecture can be extended in the future to include more sensors and services, providing value-added information from sensor fusion scenarios. The use of machine learning can improve results and predict events, for example the need to collect an oil barrel in advance. In this enterprise context of potential services expansion, the inclusion of image sensors, for example, which are more demanding from the point of view of the volume of data produced and processed, will enable the implementation of more advanced services.

Acknowledgements. This work was funded by Project P-FECFP-HARDLEVELISUS0002-2018 supported under the scope of a protocol established between Hardlevel Energias Renováveis Lda and Fundação Ensino e Cultura Fernando Pessoa, represented here by its R&D group Intelligent Sensing and Ubiquitous Systems (ISUS).
References
1. Bellavista, P., Berrocal, J., Corradi, A., Das, S.K., Foschini, L., Zanni, A.: A survey on fog computing for the Internet of Things (2019). https://doi.org/10.1016/j.pmcj.2018.12.007
2. Monteiro, K., Rocha, E., Silva, E., Santos, G.L., Santos, W., Endo, P.T.: Developing an e-Health system based on IoT, fog and cloud computing (2018). https://doi.org/10.1109/UCC-Companion.2018.00024
3. Masri, W., Al Ridhawi, I., Mostafa, N., Pourghomi, P.: Minimizing delay in IoT systems through collaborative fog-to-fog (F2F) communication (2017). https://doi.org/10.1109/ICUFN.2017.7993950
4. Mendki, P.: Docker container based analytics at IoT fog (2018). https://doi.org/10.1109/IoT-SIU.2018.8519852
5. Sangulagi, P., Sutagundar, A.V.: Context aware information classification in fog computing (2018). https://doi.org/10.1109/ICAECC.2018.8479464
6. Grover, J., Garimella, R.M.: Reliable and fault-tolerant IoT-edge architecture. International Institute of Information Technology, Hyderabad (2018). https://doi.org/10.1109/ICSENS.2018.8589624
7. Mebrek, A., Merghem-Boulahia, L., Esseghir, M.: Efficient green solution for a balanced energy consumption and delay in the IoT-fog-cloud computing (2017). https://doi.org/10.1109/NCA.2017.8171359
8. Tran, M.A.T., Le, T.N., Vo, T.P.: Smart-config Wifi technology using ESP8266 for low-cost wireless sensor networks (2018). https://doi.org/10.1109/ACOMP.2018.00012
9. https://coap.technology/
10. https://github.com/go-ocf/go-coap
A Power Efficient IoT Edge Computing Solution for Cooking Oil Recycling

Bruno Gomes1, Nilsa Melo1, Rafael Rodrigues1, Pedro Costa1, Célio Carvalho1,4, Karim Karmali4, Salim Karmali4, Christophe Soares1,2, José M. Torres1,2, Pedro Sobral1,2, and Rui S. Moreira1,2,3(B)

1 ISUS Unit, FCT, University Fernando Pessoa, Porto, Portugal {bruno.gomes,nilsa.melo,pedro.costa,rafael.rodrigues,csoares,jtorres,pmsobral,rmoreira}@ufp.edu.pt
2 LIACC, University of Porto, Porto, Portugal
3 INESC-TEC, FEUP, University of Porto, Porto, Portugal
4 Hardlevel - Energias Renováveis, V.N. Gaia, Portugal {celio.carvalho,karim.karmali,salim.karmali}@hardlevel.pt
http://isus.ufp.pt
Abstract. This paper presents an efficient, battery-powered, low-cost, and context-aware IoT edge computing solution tailored for monitoring a real enterprise cooking oil collecting infrastructure. The presented IoT solution allows the collecting enterprise to monitor the amount of oil deposited in specific barrels, deployed country-wide around several partner restaurants. The paper focuses on the specification, implementation, deployment and testing of ESP32/ESP8266-based end-node components deployed as an edge computing monitoring infrastructure. The achieved low-cost solution guarantees more than a year of battery life, reliable data communication, and enables automatic over-the-air end-node updates. The open-source software libraries developed for this project are shared with the community and may be applied in scenarios with similar requirements.

Keywords: Power efficiency management · Context-aware motes · Edge computing · Internet of Things (IoT)

1 Introduction
There is a growing demand for embedded processing and communication facilities in everyday objects, both in home and enterprise scenarios. These IoT trending efforts are compelling and leading researchers into multiple and diverse edge computing architectures. More specifically, in smart cities, there is an enormous effort and many potential scenarios for applying edge computing monitoring infrastructures. This paper focuses on providing a low-cost and extensible IoT edge computing solution, for monitoring cooking oil level on distributed restaurant barrels. This IoT edge computing infrastructure is currently being applied
on a real enterprise for managing its used cooking oil collection network. In the remainder of the paper we discuss the architecture, software and hardware implementation aspects and details. Nowadays, many countries have to cope with demanding environmental protection laws. For example, in Portugal, restaurants are required to adopt strategies for conveying used cooking oil into distribution centers, where it can be recycled and reused in bio-fuels. We work together with a recycling company that is responsible for managing a country-level network of cooking oil containers (aka. barrels), deployed in client restaurants. The recycling company is responsible for managing the entire network of barrels, i.e., for deploying empty barrels and collecting full ones, on-site. Moreover, it is responsible for storing and analyzing the collected cooking oil before forwarding it from the warehouse to bio-fuel recycling partners. The cooking oil barrels network is currently being monitored by the IoT infrastructure that is proposed in this paper. This solution allows our partner enterprise to monitor real-time cooking oil levels inside all deployed barrels. Moreover, it provides valuable information for planning the collection of barrels from the entire distributed network of client restaurants throughout Portugal. This project was technically challenging because it allowed working on a full-stack IoT technical solution, applied to a real enterprise application scenario. The possibility to work on an IoT solution for supporting the environment made it even more motivating, since the proposed solution will contribute to increasing the efficiency of enterprise cooking oil collection, thus diminishing the amount of harmful oil waste left untreated.
Fig. 1. Lid sensor mounting
Usually, each restaurant has between two and five allocated barrels. The proposed solution allows each barrel to be identified and its oil level to be monitored remotely. Thus, the technological solution is partly deployed between the facilities of the restaurants and the managing enterprise. As shown in Fig. 1, inside the lid of the barrels there are ESP32/ESP8266-based motes equipped with sensors for estimating the oil level deposited by restaurants. These sensor motes were planned and prepared to survive in adverse dirty environments. The set of barrels in each restaurant connects to a local ESP32-based WiFi Access Point (AP) that bridges to the enterprise cloud service. The proposed edge infrastructure focused
on reducing the deployment costs per barrel, while allowing their position and cooking oil level to be determined, for planning collection paths. This paper is organized as follows: Sect. 2 presents related projects addressing power efficiency and low cost in sensor monitoring. Section 3 describes the rationale of the architecture and the hardware/software setup to comply with the pre-established business requirements. In Sect. 4 the solution is evaluated, in a real world scenario, with a set of motes implementing the optimizations proposed in Sect. 3. Final remarks and future work are presented in Sect. 5.
2 Related Work on IoT Monitoring Infrastructures
End node sensor monitoring has been applied in several projects like [4] or [10]. These works discuss the idea of applying a nearby gateway offering several wireless technologies. For example, in [10] some power optimization techniques were explored; however, the high sampling rate of sensor data, together with the lack of embedded processing on the gateway, makes the average battery life last for days instead of years. Also, in [4] an ESP32 based air quality monitoring solution was presented, but no power optimizations were analysed. In [1] an ultrasonic distance sensor mote is presented to measure oil/water solutions based on the vertical distance, from the top, inside a tank. This mote solution could be reused to estimate the oil level or even the amount of water, based on the differences in wave propagation in the air-oil-water interface. Although this could be a replicable solution, its integration in our application scenario would be highly subject to damage, mainly due to barrel transportation and handling hazards. Hence, such integration would require complex physical/structural adaptations, thus increasing substantially the unitary cost of motes. The main focus of [11] lies in the analysis of the latency of an end-to-end communication, and on a significant cost reduction by removing the gateway from the network topology. Although it proposes low end node costs, the discussed solution does not achieve low enough latency values, and does not comply with our strict battery autonomy requirements. Table 1 depicts a qualitative comparison between the above-mentioned projects. In this table, the Poor rating describes a feature that is not accessible or compatible with our requirements. The Medium rating considers a solution that could be integrated into our scenario with some adaptations, and Good accounts for a solution that could be considered without changes. For our real world enterprise scenario, the strict business requirements of low cost and long battery life led us to test and further develop the principles presented in [11] and [6]. Our efforts improved their outcomes (relative to our goals) with the integration of the distributed computational paradigm described in [2]. An in-depth power profile analysis was carried out in [6], but all the presented tests were conducted with “off-the-shelf” development boards, leading to sub-optimal energy management at the hardware level.
Table 1. Comparison of related IoT monitoring solutions (P = Poor, M = Medium, G = Good)

Project  Unitary cost  Battery management  OTA updates  Sensor solution
[4]      M             M                   P            P
[10]     G             P                   P            P
[1]      M             P                   P            M
[11]     G             M                   P            P
From a network point of view, the technical analysis presented in [8], comparing different kinds of LAN and PAN wireless technologies, and also the application and transport layers described in [5], provided the basis for the choices made in our project as described in the following sections.
3 Agile Edge Computing System Solution
This section discusses the edge computing infrastructure developed with general business restrictions in mind, namely viable operation and autonomy under battery-powered functioning, large scale low-cost deployment and context-awareness regarding barrel operation depending on location.

3.1 Requirements of Sensor Motes
The proposed approach must comply with the following predefined requirements.
– Long battery life: restaurants may have a yearly recall cycle. The proposed solution should guarantee one year of battery life.
– Context-awareness: barrels should adapt their behavior according to their current location. For instance, the sleeping time between each measurement will vary considering if it is in a restaurant (filling state) or a warehouse (emptying state).
– Reliable data communication: messages sent by the barrel end nodes should be acknowledged by the AP, to confirm reception. Retransmission mechanisms are deployed when network misbehavior is detected.
– Delay tolerant automatic over-the-air updates: after the deployment stage, the barrels will be installed into multiple restaurants, or stored in warehouses. Once a new firmware is available, it should be automatically propagated through all existing end nodes (i.e., barrel motes) to guarantee they remain updated and properly functioning.
– Low unitary cost: the proposed solution must scale to more than fifty thousand barrels, so the unitary cost is a major concern to make this project economically viable.
– Agile support: the hardware should be agile and open enough to cope with future requirements and upgrades. Namely, several aspects should be considered, such as support for additional sensors, processing power capabilities, battery management efficiency, and support for diverse communication technologies.

3.2 Edge Computing System Architecture
The system should provide a generic low-cost solution based on battery-powered end node equipment. Therefore, the option was biased towards the use of an agile IoT edge computing topology supporting different communication technologies. Thus, instead of establishing a direct connection between each barrel end node and the remote server, an intermediate device (cf. local network Access Point) was added next to the end nodes. The local WiFi Access Point (AP) acts as a local server and gateway for the edge devices, i.e., functioning as a local branch and forwarding collected data to the remote server.
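The AP firmware is not listed in the paper. The fragment below sketches, for an Arduino ESP32 target, the local-gateway role just described: host the WiFi network for the barrels, collect their reports over a simple UDP socket (a stand-in for the actual application protocol), and keep them buffered for the periodic upload to the remote server. All names, the port and the payload handling are illustrative assumptions.

#include <WiFi.h>
#include <WiFiUdp.h>
#include <vector>

const char* kApSsid = "fog-ap";      // illustrative; the real AP is hidden
const char* kApPass = "secret";
const uint16_t kPort = 5683;

WiFiUDP udp;
std::vector<String> pendingReports;  // buffered locally until the next upload

void setup() {
  WiFi.softAP(kApSsid, kApPass);     // host the local network for the barrels
  udp.begin(kPort);
}

void loop() {
  int size = udp.parsePacket();
  if (size > 0) {
    char buf[128];
    int len = udp.read(buf, sizeof(buf) - 1);
    buf[len > 0 ? len : 0] = '\0';
    pendingReports.push_back(String(buf));   // store the raw report
  }
  // A periodic task (not shown) would summarize pendingReports and forward
  // only the summary to the remote server, as described in the text.
}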
Fig. 2. IoT edge computing system architecture
From the end node point of view, this architectural approach, when compared with a standard sensor ⇔ server communication, brought some advantages to the proposed solution:
– Less energy consumption (due to fast, short-range communication with the local AP). The real time responsiveness described in [2] is applied to minimize the time interval spent in rx/tx modes and therefore reduce power consumption.
– Reduced end node cost (since there is no need for any kind of WAN hardware, typically more expensive).
– OTA updates become easily implemented (each AP device caches the latest firmware updates to deliver to end nodes).
– No need for local pre-installed third-party network solutions between barrels and server.
– Agile monitoring solution (easy and configurable reporting of sensor data, either by push or pull strategies).
– Context-awareness [2] (inherited by the barrels, since the location of each intermediate local AP is known).

In this architecture, the barrel end nodes are directly connected to the local WiFi AP. This device stores all collected data locally, from all barrels, and sends periodic reports to the server. Moreover, the AP also accepts requests from the server for directly collecting information from the barrel end nodes.

3.3 Barrel Mote Hardware Implementation
Microcontroller. There are several hardware solutions on the market that may be considered for the implementation of the end nodes (cf. barrel motes). The project option fell on the use of the Espressif ESP8266 and ESP32 modules. The choice was mostly based on the versatility and scalability of the modules, the availability of a vast support community, and also on the low price of the hardware when compared to modules for similar wireless protocols such as Bluetooth or ZigBee. The project also considered cheaper RF modules, but the existing solutions lack several interesting capabilities, such as over-the-air updates and out-of-the-box connectivity with off-the-shelf components within the edge device. The Espressif modules also make it possible to programmatically implement REST APIs for standard remote procedure calls (RPC), with an acceptable data range (between 10 and 100 m should be expected) and an interesting bandwidth (Wi-Fi b/g/n standard). In addition, the project considered important the possible future use of other wireless protocols (e.g. Bluetooth, LoRa). For these reasons, other boards were disregarded and the ESP8266 and ESP32 modules were selected for tests and possible use.

One of the major problems of the barrel mote operation was related to its energy consumption and autonomy. WiFi communication does not focus on low power consumption, hence any module using WiFi communication would suffer from autonomy problems. For instance, the ESP8266 and ESP32 often reach peak currents of more than 200 mA during data exchange [6]. A solution to tackle this issue is discussed later in Sects. 3.3 and 3.4.

Sensor Mote Readings. The barrel motes are fixed on the lid of the barrels and measure the distance from the top to the surface of the cooking oil contained at each moment inside the barrel. Several sensors were tested and evaluated for this purpose (cf. IR sensors, ultrasound sensors, lidar, etc.). This parallel work is an ongoing subject whose main focus is sensor readings in dirty environments. Therefore, the next sections assume that sensor motes obtain the readings from distance sensors and transmit them to the AP. This paper will not address the testing and choice of the measuring sensors.

Barrel Mote Hardware Optimization. The use of development boards usually eases the task of quickly implementing a solution. However, when working in scenarios coping with a limited amount of energy, much of the general purpose hardware included in those boards becomes a problem.
For instance, an onboard USB-to-serial converter or a standard voltage regulator might help the development process, but can easily bring the current consumption into the mA range, even when the chip stays in a sleep state (e.g. NodeMcu ESP-12E vs. barebones ESP8266):
1. NodeMcu ESP-12E
   – Chip: ESP8266-12E
   – Board: NodeMcu ESP8266-12E
   – External sensors: none
   – Current draw during deep sleep: 1 mA
2. Barebones ESP8266
   – Chip: ESP8266-12E
   – Board: minimal configuration breakout board
   – External sensors: none
   – Current draw during deep sleep: 20 µA

For example, if we take into consideration two boards with the same chip (1 and 2) in the unlikely scenario of a forever-sleeping node and given C = X·t, where C represents the battery capacity (µAh), X the average current draw (µA) and t the expected battery lifetime in hours, we would achieve the following results (both considering a generic 750 mAh LiPo battery):
1. 750000 = 1000·t ⇔ t = 750 h
2. 750000 = 20·t ⇔ t = 37500 h

Just choosing the right board peripherals therefore yields an increase of around 50 times in the expected battery life. In a real-world scenario, where the battery suffers from degradation over time and the node wakes periodically between pre-defined time intervals, the difference should be slightly lower.
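As a quick sanity check of the numbers above, the lifetime relation can be turned into a few lines of code. This is an illustration only (the helper names are ours): the 750 mAh capacity and the two current draws are the example values from the text, and real batteries degrade over time.

```cpp
#include <cstdio>

// Expected battery lifetime in hours for a constant average current draw,
// following C = X * t  =>  t = C / X  (capacity in µAh, current in µA).
double lifetimeHours(double capacity_uAh, double avgCurrent_uA) {
  return capacity_uAh / avgCurrent_uA;
}

int main() {
  const double capacity = 750000.0;  // generic 750 mAh LiPo battery, in µAh
  std::printf("NodeMcu ESP-12E  : %.0f h\n", lifetimeHours(capacity, 1000.0));  // ~750 h
  std::printf("Barebones ESP8266: %.0f h\n", lifetimeHours(capacity, 20.0));    // ~37500 h
  return 0;
}
```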
3.4 Barrel Mote Software Implementation
As described in Sect. 3.3, both the ESP8266 and the ESP32, communicating over WiFi, are not as efficient as required in our scenario. This led us to explore some software modifications to optimize the power consumption between duty cycles. There are two main working states that can be distinguished during the device operation:
– Sleep state: besides the deactivation of unnecessary components (peripherals, co-processors), not many software optimizations could be performed.
– Awake state: this is where software optimization could impact battery lifetime.
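A minimal sketch of this duty-cycle pattern on the ESP8266 is shown below. It is an illustration only, not the project firmware; readDistanceSensor() and sendReading() are hypothetical placeholders for the sensing and transmission logic.

```cpp
#include <ESP8266WiFi.h>

// Hypothetical placeholders for the sensing and transmission logic.
float readDistanceSensor();
void  sendReading(float value);

// One hour between readings, below the ~71 min limit of the 32-bit RTC timer.
const uint64_t SLEEP_US = 60ULL * 60ULL * 1000000ULL;

void setup() {
  // Deep-sleep wake-ups reset the chip, so the whole duty cycle runs in setup().
  float level = readDistanceSensor();
  sendReading(level);
  // GPIO16 must be wired to RST so the RTC timer can wake the module again.
  ESP.deepSleep(SLEEP_US);
}

void loop() {}  // never reached: the module sleeps at the end of setup()
```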
For example, the ESP8266 has an average current consumption of around 80 mA during the awake state, whereas in sleep mode the barebones chip should not need more than 20 µA under normal operation [3]. Therefore, the answer to an improved battery lifetime should focus on reducing the time spent in high-energy-demand modes. Increased sleep time between duty cycles helps, but the 32-bit real-time clock does not allow the ESP8266 to sleep for more than 4294967295 microseconds, roughly 71 min (the 64-bit Real Time Clock (RTC) present on the ESP32 bypasses this limitation). Thus, increasing the time interval between operation modes may impact positively on battery lifetime, when combined with fine-tuning of the awake state.

ESP Operation Mode Management Library. With the above in mind, we created an Arduino IDE compatible library, called ESP Operation Mode Management library (EOMMlib) [7], that simplifies the task of implementing an energy-efficient functioning pattern: sleep → wakeup → boot → connect → read → communicate → sleep. This library is available under an open source license and works with both the ESP8266 and the ESP32. Some of the implemented optimizations are listed below:
– Pre-defined static network configuration: avoids DHCP, which is especially relevant for the ESP8266 connection time.
– Keep connection details in cache: keeping the BSSID, SSID and channel cached allows connection establishment without any kind of network scan, thus saving almost 2 s of connection time.
– Reduced connection checking interval: allows checking whether the module is connected in 10 ms intervals instead of 500 ms.
– Avoid busy waiting: the Wi-Fi connection task does not lock the CPU; hence, instead of spending time asking if the connection is established, the library runs other useful, time-dependent work, like sensor reading.

All of the above optimizations were encapsulated in a single class library, which is agnostic with respect to sensor readings and data transmission protocols.
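The sketch below illustrates the kind of connection-time optimizations listed above on an ESP8266. It is a simplified approximation in the spirit of EOMMlib, not the library's actual code, and the network settings and the ApCache structure are made-up placeholders.

```cpp
#include <ESP8266WiFi.h>

// Cached association data, kept across deep-sleep cycles (e.g. in RTC memory).
struct ApCache { uint8_t bssid[6]; int32_t channel; };

// Assumed static network settings; real values depend on the local AP.
IPAddress localIp(192, 168, 4, 10), gateway(192, 168, 4, 1), subnet(255, 255, 255, 0);

bool fastConnect(const char* ssid, const char* pass, const ApCache& cache) {
  WiFi.persistent(false);                 // do not rewrite flash on every connection
  WiFi.mode(WIFI_STA);
  WiFi.config(localIp, gateway, subnet);  // static IP: skip DHCP
  // Passing the cached channel and BSSID skips the network scan entirely.
  WiFi.begin(ssid, pass, cache.channel, cache.bssid, true);
  uint32_t start = millis();
  while (WiFi.status() != WL_CONNECTED) {
    if (millis() - start > 5000) return false;  // give up, fall back to a full scan
    delay(10);                                  // poll every 10 ms instead of 500 ms
  }
  return true;
}
```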
Fig. 3. Comparison between duty cycles duration: standard WiFi client (blue) versus EOMMlib-based client (red)
The current draw over time, during barrel mote operation, can be calculated by discretizing the continuous operation of the module into duty cycles of duration $T_{A_n}$ (time awake) and $T_{S_n}$ (time sleeping), with current draws $C_{A_n}$ (current awake) and $C_{S_n}$ (current sleeping):

$$\sum_{n=1}^{\infty} \left( \frac{C_{A_n} \cdot T_{A_n} + C_{S_n} \cdot T_{S_n}}{T_{A_n} + T_{S_n}} \right)$$
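As an illustration of this per-cycle accounting, the small helpers below compute the charge drawn over one duty cycle and the corresponding average current; the function names are ours, and the example values in the comments are the figures discussed next in the text.

```cpp
// Charge drawn over one duty cycle (µAh): CA*TA + CS*TS, currents in µA, times in h.
double chargePerCycle_uAh(double ca_uA, double ta_h, double cs_uA, double ts_h) {
  return ca_uA * ta_h + cs_uA * ts_h;
}

// Average current (µA) over the cycle follows by dividing by its duration TA + TS.
double avgCurrent_uA(double ca_uA, double ta_h, double cs_uA, double ts_h) {
  return chargePerCycle_uAh(ca_uA, ta_h, cs_uA, ts_h) / (ta_h + ts_h);
}

// chargePerCycle_uAh(80000, 6.61e-4, 20, 1.0) ≈ 72.88 µAh  (standard WiFi client)
// chargePerCycle_uAh(80000, 6.19e-5, 20, 1.0) ≈ 24.95 µAh  (EOMMlib-based client)
```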
Based on the values presented in Fig. 3, it is possible to estimate an average time to send of 2377.86 ms in comparison with 222.17 ms (achieved with the above-mentioned operations). Considering an equal sleep time of one hour, the average charge drawn by the module over its duty cycle would decrease from around 73 µAh to 25 µAh. This translates into an almost three times lower power consumption. In a real scenario, using a client implemented with the EOMMlib, the average time to send could stay below 200 ms, when using some techniques to pre-detect the sporadic 3+ second connection times to the base station observed in Fig. 3.
– Standard WiFi connection: 80000 · 6.61e−4 + 20 · 1 = 72.88 µAh
– Proposed approach: 80000 · 6.19e−5 + 20 · 1 = 24.95 µAh

Module Updates and Context Awareness. The capability to support Over The Air (OTA) updates, on thousands of working devices, in an automatic process, must be considered in this project. The chosen hardware provides some ways of dealing with this feature, but only on a manual basis, where one module is updated at a time. Since more than 50,000 running modules may be spread around different restaurants, it was necessary to adapt the default OTA feature to comply with these requirements. The end node modules expose a web form that can be used to upload the latest firmware version. This feature made it possible to update the modules automatically through a script running on the local server side, which is capable of filling in the form provided by the end node and feeding the binary file for the update. This means that the module needs to work both as a client and as a web server every time a new update is ready. Since the binary file is sent over a generic web form, the module does not lose its capability for manual updates either. The context awareness is implemented based on the same principle as the OTA updates. After the module sends the last reading to the local server, it waits for a response. This response brings new information to the mote regarding its current location, or a special action to be performed, like the above-mentioned OTA service.
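A sketch of the web-form update endpoint on the end node is shown below, using the stock ESP8266HTTPUpdateServer that ships with the Arduino core. This is an assumed, simplified setup rather than the project's exact implementation; the AP credentials are placeholders, and the AP-side script would simply POST the new binary to the /update form (the exact form field name depends on the updater's page).

```cpp
#include <ESP8266WiFi.h>
#include <ESP8266WebServer.h>
#include <ESP8266HTTPUpdateServer.h>

ESP8266WebServer server(80);          // end node acting as a small web server
ESP8266HTTPUpdateServer httpUpdater;  // serves the firmware upload form at /update

void setup() {
  WiFi.mode(WIFI_STA);
  WiFi.begin("barrel-ap", "secret");  // placeholder credentials for the local AP
  while (WiFi.status() != WL_CONNECTED) delay(100);

  httpUpdater.setup(&server);         // register the /update form and upload handler
  server.begin();
}

void loop() {
  // While the update window is open, keep answering both manual and scripted uploads.
  server.handleClient();
}
```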
3.5 Communication Protocol Between Motes and AP
The data exchange between the mote end nodes and the WiFi AP requires a full-duplex protocol with message acknowledgement and retransmission capabilities.
Moreover, the time spent sending a message and waiting for the AP's acknowledgement is critical, because it directly impacts battery consumption. The in-depth analysis presented in [5] and the complete specification provided in [9] directed us to work with CoAP. Therefore, the communication between the end node and the AP device uses a UDP-based CoAP implementation, which not only allows sending data with low latency, but also permits directly calling a local API served by the AP device. A publish-subscribe approach using MQTT was also considered but, although MQTT supports full-duplex communication, the need to work at QoS level 1 (cf. at least once) would increase the time for data exchange [5] and therefore the power consumption of the battery-powered mote devices.
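The snippet below only illustrates the send-and-wait-for-ack pattern with a retransmission fallback over plain UDP; the actual implementation uses a CoAP library on top of UDP, whose message IDs and acknowledgements replace the hand-rolled logic sketched here, and the addresses, port and payload are placeholders.

```cpp
#include <ESP8266WiFi.h>
#include <WiFiUdp.h>

WiFiUDP udp;
const IPAddress AP_ADDR(192, 168, 4, 1);  // assumed address of the local AP
const uint16_t  AP_PORT    = 5683;        // CoAP's default UDP port
const uint16_t  LOCAL_PORT = 4210;        // local port used to receive the ack

void setupUdp() { udp.begin(LOCAL_PORT); }  // call once after WiFi is connected

// Send a reading and wait for an acknowledgement, retrying a few times.
bool sendWithAck(const char* payload) {
  for (int attempt = 0; attempt < 3; ++attempt) {
    udp.beginPacket(AP_ADDR, AP_PORT);
    udp.write(payload);
    udp.endPacket();

    uint32_t start = millis();
    while (millis() - start < 500) {          // short ack window to limit awake time
      if (udp.parsePacket() > 0) return true; // any reply is treated as the ack here
      delay(5);
    }
  }
  return false;  // network misbehavior: the caller can flag it and go back to sleep
}
```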
4 Tests and Validation
This section presents a concrete implementation of every aspect and optimization described in Sect. 3. The end nodes were assembled and integrated into the barrel lids. These motes for cooking oil level monitoring used the following components:
– ESP32 - Adafruit Huzzah ESP32 breakout.
– 3.7 V 1000 mAh battery.
– VL53L0X distance sensor.
– 2 MΩ voltage divider.
– 470 µF capacitor.
– 0.1 µF decoupling capacitor.
– Existing WiFi AP and cloud server used in the edge computing architecture.
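For the components listed above, a reading routine could look roughly like the following. It relies on the Adafruit_VL53L0X Arduino library, and it assumes the voltage divider halves the battery voltage into an ADC1 pin (GPIO 34), so the pin choice and scale factor are assumptions that depend on the actual wiring and ADC calibration.

```cpp
#include <Wire.h>
#include "Adafruit_VL53L0X.h"

Adafruit_VL53L0X lox;            // VL53L0X time-of-flight distance sensor
const int BATTERY_ADC_PIN = 34;  // assumed ADC1 input wired to the 2 MΩ voltage divider

bool beginSensors() {
  return lox.begin();            // returns false if the sensor is not found on I2C
}

// Distance from the lid to the oil surface, in millimetres (-1 on a failed reading).
int readOilLevelMm() {
  VL53L0X_RangingMeasurementData_t measure;
  lox.rangingTest(&measure, false);
  if (measure.RangeStatus == 4) return -1;  // phase failure: out of range / dirty lens
  return measure.RangeMilliMeter;
}

// Approximate battery voltage, assuming a 1:2 divider and the 12-bit ESP32 ADC.
float readBatteryVolts() {
  int raw = analogRead(BATTERY_ADC_PIN);
  return (raw / 4095.0f) * 3.3f * 2.0f;
}
```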
The tests involved the assembly of 10 end node prototypes with the abovementioned components. These motes were deployed together with the remaining edge computing components of the architecture, for one whole month in two different restaurants. The time period allowed to fulfill two complete usage cycles (cf. warehouse → restaurant → warehouse).
Fig. 4. Battery voltage level over time (for all 10 tested motes)
The experiment focused mainly on checking the functional requirements and also on analyzing the network usage and energy-consumption related data. Figure 4 shows an approximation of the battery level of the 10 end node modules used along the test operation period. The proposed energy management solution had a direct impact on the autonomy of the end node modules, which was one major requirement. From Fig. 4 it is perceptible that end node 4 had unexpected excessive voltage drops, caused by a hardware malfunction. This problem led to a complete battery drain, but was not related to the programmed module operation mode. Therefore, the experimental results corroborated the theoretical results presented in Sects. 3.3 and 3.4. Modules 7 and 10 also experienced some electrical problems due to defective wire connections between the voltage divider and the micro-controller. These wiring problems interfered with some of the battery voltage readings, but did not cause any observed battery drain. The connection times (counting from the end node boot to the moment right before sending the CoAP packets) were also aligned with the software optimization introduced by the use of the EOMMlib (Sect. 3.4). Only a few sporadic excessive delays were identified, mainly caused by cache misses (occurring on the first connection to a specific AP device). The context awareness part of the problem also achieved positive results, with the motes detecting the movement between the restaurants and the warehouse and adopting different sleep patterns.
5 Conclusion
This paper described the implementation of an IoT edge computing solution for a nationwide cooking oil recycling enterprise. Currently, this enterprise uses a totally manual and human-operated solution, which has several resource management problems and costs. The use of an automatic low-cost remote monitoring infrastructure will allow the enterprise to improve the operation management of the cooking oil barrels, which are spread across hundreds of restaurants. The paper focused on presenting the specification, implementation and tests to ensure the requirements related to module operation and energy efficiency, supporting longer lifetimes. Such a solution provides a win-win scenario between the enterprise and the partner restaurants. The latter stakeholders do not need to manually check the oil level, nor deal with delayed collections. The former may use the IoT infrastructure to monitor the levels of cooking oil on the entire population of deployed barrels, plan collection routes ahead and avoid unnecessary visits to restaurants. The efficiency of the end node modules, achieved through the integrated operation with the edge architecture, will enable the recycling enterprise to explore the low-cost remote monitoring infrastructure capabilities to estimate cooking oil barrel collection periods. This joint project foresees, in the future, using this and further valuable monitoring information to determine the best collection routes. Besides oil monitoring and barrel collection, the presented edge computing infrastructure can also be applied to other waste management networks used in the same company. For example, the company plans to use the same edge
computing solution for monitoring grease traps, installed in the waste-water disposal systems of partner restaurant sinks. These grease traps are also part of the business model of the company and must be washed periodically. Hence, new sensor solutions must be tested and prototyped in future developments.

Acknowledgements. This work was funded by Project P-FECFP-HARDLEVELISUS0001-2018, supported under the scope of a protocol established between Hardlevel Energias Renováveis Lda and Fundação Ensino e Cultura Fernando Pessoa, represented here by its R&D group Intelligent Sensing and Ubiquitous Systems (ISUS).
References 1. Al-Naamany, A.M., Meribout, M., Al Busaidi, K.: Design and implementation of a new nonradioactive-based machine for detecting oil–water interfaces in oil tanks. IEEE Trans. Instrum. Meas. 56(5), 1532–1536 (2007) 2. Bellavista, P., Berrocal, J., Corradi, A., Das, S.K., Foschini, L., Zanni, A.: A survey on fog computing for the Internet of Things. Pervasive Mob. Comput. 52, 71–99 (2019) 3. Espressif Systems. ESP8266EX Datasheet, Version. 6.2, August 2019 4. Flores-Cortez, O.O., Adalberto Cortez, R., Rosa, V.I.: A low-cost IoT system for environmental pollution monitoring in developing countries. In: 2019 MIXDES 26th International Conference “Mixed Design of Integrated Circuits and Systems”, pp. 386–389, June 2019 5. Lagerqvist, A., Lakshminarayana, T.: IoT latency and power consumption: measuring the performance impact of MQTT and CoAP. Master’s thesis, J¨ onk¨ oping University, JTH, Computer Science and Informatics (2018) 6. Mesquita, J., Guimar˜ aes, D., Pereira, C., Santos, F., Almeida, L.: Assessing the ESP8266 WiFi module for the Internet of Things. In: 2018 IEEE 23rd International Conference on Emerging Technologies and Factory Automation (ETFA), vol. 1, pp. 784–791, September 2018 7. ISUS: Intelligent Sensing and Ubiquitous Systems. ESP operation mode management library (2019). https://code.ufp.pt/bruno.gomes/esp fast connect 8. Shahzad, K., Oelmann, B.: A comparative study of in-sensor processing vs. raw data transmission using ZigBee, BLE and Wi-Fi for data intensive monitoring applications. In: 2014 11th International Symposium on Wireless Communications Systems (ISWCS), pp. 519–524, August 2014 9. Shelby, Z., Frank, B., Sturek, D.: Constrained application protocol (CoAP). In: IETF Internet-Draft draft-ietf-core-coap-08, Work in Progress (2011). http://tools. ietf.org/html/draft-ietf-core-coap-08 10. Span´ o, E., Di Pascoli, S., Iannaccone, G.: Low-power wearable ECG monitoring system for multiple-patient remote monitoring. IEEE Sens. J. 16(13), 5452–5462 (2016) 11. Tuan Tran, M.A., Le, T.N., Vo, T.P.: Smart-config WiFi technology using ESP8266 for low-cost wireless sensor networks. In: 2018 International Conference on Advanced Computing and Applications (ACOMP), pp. 22–28, November 2018
Supervising Industrial Distributed Processes Through Soft Models, Deformation Metrics and Temporal Logic Rules Borja Bordel(&), Ramón Alcarria, and Tomás Robles Universidad Politécnica de Madrid, Madrid, Spain [email protected], {ramon.alcarria,tomas.robles}@upm.es
Abstract. Typical control solutions for future industrial systems follows a topdown approach, where processes are completely defined at high-level by prosumers (managers) and, later, the control infrastructure decomposes, transforms and delegates the execution of the different parts and activities making up the process into the existing industrial physical components. This view, although may be adequate for certain scenarios; presents several problems when processes are executed by people (workers) or autonomous devices whose programming already describes and controls the activities they perform. On the one hand, people execute processes in a very variable manner. All these execution ways are valid although they can be very different from the original process definition. On the other hand, industrial autonomous devices cannot be requested to execute activities as desired, and their operations can only be supervised. In this context, new control solutions are needed. Therefore, in this paper it is proposed a new process supervision and control system, focused on industrial processes executed in a distributed manner by people and autonomous devices. The proposed solution includes a soft model for industrial processes, which are latter validated through deformation metrics (instead of traditional rigid indicators). Besides, in order to guarantee the coherence of all executions, temporal logic rules are also integrated to evaluate the development of the different activities. Finally, an experimental validation is also provided to analyze the performance of the proposed solution. Keywords: Intelligent control Real-time control Supervised processes Process models Industrial process Data acquisition
1 Introduction Future industrial systems are envisioned to implement very effective control solutions, so they can increase the efficiency of their operations in a manner never seen before. New concepts and technologies such as Industry 4.0 [1], circular economy [2] and Cyber-Physical Systems [3] are supporting this promising future. However, control technologies are still the more important and essential component in all these new industrial solutions [4], as they are in charge of guaranteeing that low-level executions and operations (performed by hardware devices) are organized and coordinated to meet © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020 Á. Rocha et al. (Eds.): WorldCIST 2020, AISC 1160, pp. 125–136, 2020. https://doi.org/10.1007/978-3-030-45691-7_12
the high-level objectives fixed by managers and prosumers [5] (users that behaves at the same time as service producers and consumers). In this context, typical works imagine industrial solutions as top-down mechanisms where processes are defined in a hard way at high-level [6]. At this level, managers and prosumers create models where final objectives and business restrictions are stablished. The process control system, later, analyzes these process models, translates them into executable code, decomposes them and delegates the different parts and activities to hardware devices through an specific interface exiting in every physical element. This approach guarantees the high-level objectives are easily met, as orchestration engines and similar software elements can manage how processes are executed, as well as the entire infrastructure behavior, in order to carry the business operation to the expected point. However, this scheme (although adequate for certain scenarios) is extremely difficult to implement in applications where people (workers) and/or autonomous devices are in charge of process execution [7]. On the one hand, these elements have not an interface to command them to act as desired. Besides, people tend to execute activities as they consider it is the best way, although that manner does not fit the proposed model by managers [8]. In general, furthermore, these executions are valid, although they are different from expected. On the other hand, autonomous devices have a programming which totally describes and controls the activities and processes they execute, so (actually) managers can only activate or stop the operation of these devices. No more control operations are available. In this scenario, processes are executed in a distributed and choreographed manner, and they can only be slightly manipulated and controlled by managers. Then, control solutions with a bottom-up view are needed and required. In that way, distributed processes executed at low-level should be supervised to ensure they meet the minimum requirements to be acceptable and harmless for other agents (people or devices) in the environment. Although the evolution of processes being executed could not be modified, if necessary, supervisory modules could trigger an alarm, initiate corrective action to mitigate as much as possible the caused inefficiencies, etc. To do that, however, supervisory control modules must implement technologies adapted to the special characteristics of the heterogeneous agents being considered (people and autonomous devices) [9]. The objective of this paper is to fulfill this gap. Therefore, in this paper it is proposed a new industrial solution to control processes through supervisory mechanisms. First, in order to adapt the new solution to people’s behavior, a new soft model for processes is proposed and employed. Besides, this new process model allows us to evaluate the validity of process executions by deformation metrics, instead of rigid indicators (such as the maximum allowed error). Second, in order to guarantee the coherence of the entire execution and integrate autonomous devices in the proposed solution, temporal logic rules will be employed. Prosumers could define the high-level minimum structure every process execution should follow, and a specific engine will be in charge of evaluating and information about the state of these rules (or requirements).
The rest of the paper is organized as follows: Sect. 2 describes the state of the art on industrial control solutions; Sect. 3 describes the proposed solution, including the new process model and architecture; Sect. 4 presents an experimental validation; and Sect. 5 concludes the paper.
2 State of the Art on Industrial Control Mechanisms Almost every existing type of control system has been applied to industrial scenarios. From monolithic solutions to networked control systems and fuzzy mechanisms. However, not all solutions have shown the same performance. Then, below we are reviewing the main reported solutions. The most classic control solution for industrial scenarios is SCADA (Supervisory Control And Data Acquisition) systems [10]. SCADA, in fact, implements supervisory control policies, in order to manage heterogeneous infrastructures (a similar scenario to which we are addressing in this paper). Nevertheless, SCADA is a very complex and heavy technology which rarely can be combined with resource constrained devices [12]. For example, the most common communication protocol in these systems (OPC, OLE for Process Control [11]) is much less efficient than any other existing communication protocol. On the other hand, many security risks and attacks have been detected and reported in relation to SCADA systems [14, 15], so (although it was a good solution in the past), it is not the most adequate technology for future scenarios. SCADA is a monolithic system, although it defines some independent modules [13] and in the last years it has been transformed into a cloud-based solution [16] which even may be combined to Internet-of-Things deployments. However, most modern industrial systems are distributed scenarios including different geographical sites and independent production systems. Thus, networked (or distributed) control solutions have been intensively reported in the last ten years. Contrary to SCADA proposals (most of them focused on practical issues), networked control systems are typically studied from a mathematical point of view [17]. The communication delays, inevitable in any networked solution, have forced the researchers to analyze the stability [18], evolution [19], mathematical models [20], and other similar elements, related to the most classic control theory (based on dynamic systems); but in the context of industrial scenarios. This approach, nevertheless, is based on a small collection of devices, not including people and not including pervasive sensing platforms and other ubiquitous computing schemes. Then, other new and improved control mechanisms are required. In this context fuzzy control solutions were defined. Although fuzzy control and logic is a pretty old technology [21], in the last ten years many researchers have created new solutions based on this paradigm for future industrial scenarios [22]. In this case, as in networked control systems, most works are focused on mathematical analyses [23]. Although some sparse implementations have been reported, especially in the context of nonlinear control and event-driven processes (usually managed using logic rules) [24]. This last mechanism is very interesting for distributed processes, as sometimes they are managed by events. Finally, in the last five years, control solutions for industrial scenarios have been based on modern technologies such as Cyber-Physical Systems, Industry 4.0 or cloud
services [6]. In this case, control is distributed and supported by pervasive physical infrastructures, following the principles of networked and fuzzy control. In this case, moreover, different real implementations have been reported [25, 26]. As said in the Introduction, these new solutions tend to present a top-down approach which is not adequate for distributed processes [4], and people is not usually considered in these schemes. In this paper we are combining the last approaches for control systems based on new paradigms such as Industry 4.0, with other previous technologies with a good performance in distributed process control as logic rules and event-driven processes. As a complete novelty, in this work we are also including a new process model to integrate people (workers) in industrial control systems.
3 New Supervisory Control Solution for Distributed Industrial Processes

In industrial scenarios, very different processes may be executed. However, in this paper we are focusing on those applications where distributed processes are employed, i.e. applications where processes are managed and choreographed by several independent agents (e.g. people, autonomous devices). If all these agents are working correctly, at the end, a coherent and valid process is executed. However, sometimes problems occur, and a global control entity covering and supervising all operations is needed. With this view, we describe in the next subsection a new architecture to implement such a supervisory control entity. First, we analyze the different processes that may appear in an industrial scenario; later, we present the proposed architecture. Finally, in the last two subsections, we describe how atomic activities are validated through soft process models and deformation metrics, and how global processes are evaluated for coherence through temporal logic rules.

3.1 Bottom-up Architecture for Distributed Process Control
Very different processes may be executed in industrial scenarios. Figure 1 shows a taxonomy. As can be seen, two basic types of processes are identified: physical processes and computational processes (hybrid models may also be considered). Physical processes are evolutions of physical variables following a specific path (servo problems) or reaching a certain final value or point (regulator problems). In any case, physical processes are not composed of tasks but of natural flows. On the other hand, computational processes are sequences of activities or tasks which must be executed. Computational processes may be defined at high level and then delegated to the underlying hardware platform (top-down processes); or they may be defined at low level, in a distributed or choreographed manner, among all the existing devices and agents in the infrastructure (bottom-up processes). Finally, bottom-up computational processes may be event-driven (using, for example, models proposed by standards such as IEC 61499), if agents in the infrastructure generate events to inform high-level components about what is happening in the infrastructure. If low-level agents (humans or devices) do not notify the
Fig. 1. Processes in industrial scenarios: classification
main event taken place, then processes are supervised and some activity recognition mechanism, process model and other similar technologies are required. In this paper we are focusing on distributed processes, in general. Thus, the proposed architecture must be adapted to all possible agents: devices generating explicit events about their operation, autonomous devices which are not generating those events, and people (workers). Figure 2 shows the proposed architecture. As can be seen, three different technological domains are considered: people (workers), autonomous devices, and devices generating explicit notifications about their operations (in this case, events). Agents which cannot notify their operations to the high-level control modules, are recovered by an activity recognition module; so, information about the activities taken place at low-level can be collected. These activity recognition technologies are usually based on Artificial Intelligence mechanisms [8], but we are not addressing those solutions in this paper. Any existing solution such as Hidden Markov Models [27] or neural networks [28] could be employed. Recognized activities are injected in the “atomic action control layer”. In this layer, data (sequences of variables describing the operations in the physical infrastructure) are collected. Those data are employed to estimate the deformation of the activities being performed, compared to soft models stored in an activity repository. A decision-making module performs that comparison after a deformation analysis phase. Deformation metrics are, then, employed to generate events about the low-level operations in the “global process control layer”. These events, together with those which come from event-driven devices, are injected into a rule verification engine. In this engine, besides, rules created by managers (or prosumers) in the corresponding creation environment, are verified and evaluated considering the received events. Then, a decision about processes being executed is done. The following subsections are focused on describing both, the atomic action control layer and the global process control layer.
Fig. 2. Proposed architecture for controlling distributed processes
3.2 Atomic Actions Control: Soft Processes and Deformation Metrics
As explained before, the first module in the atomic action control layer is a data collection module. In this module, for each activity $A$ being executed, a set $S_A$ of $N$ relevant variables (and their values) is collected (1).

$$S_A = \left\{ S_A^i \;\middle|\; i = 1, \ldots, N \right\} \qquad (1)$$

These variables are employed to represent each recognized activity. Now, in general, any task, activity or process is associated with a set of objectives which must be met in order to consider the execution valid. If these objectives must be met in a
hard manner, the process or activity is called a "hard process". Sometimes, however, some small tolerance margins or errors (2) are considered and allowed. In this case, the process is said to be "rigid". Typically, these error values $e_i$ can be expressed as relative amounts (2). These approaches, nevertheless, are only valid if the admitted errors are small ($|e_i| \ll 1$). If, as in our case, processes may change in a much more complex manner but still be valid, then a new process definition is required: soft or deformable processes.

$$e_i = \frac{S_A^{i,\max} - S_A^i}{S_A^i} = \frac{\Delta S_A^i}{S_A^i} \qquad (2)$$
In this new process model, an activity is represented as a solid body in a generalized $N$-dimensional space. To do that, it is enough to consider an $N$-dimensional phase space $W \subseteq \mathbb{R}^N$ where each variable $S_A^i$ represents a dimension. Variables evolve with time (3), so after executing a certain task or activity (with a duration of $T$ seconds), an $N$-dimensional trajectory is defined (4).

$$S_A(t) = \left\{ S_A^i(t) \;\middle|\; i = 1, \ldots, N \right\}, \quad t = t_0, \ldots, t_0 + T \qquad (3)$$

$$\vec{n}_A(t) = \left( \overrightarrow{S_A^i O}(t) \;\middle|\; i = 1, \ldots, N \right) = \left( S_A^1, S_A^2, \ldots, S_A^N \right) \qquad (4)$$
Then, to create such a rigid solid, we should consider the volume created by that trajectory (5). The execution of a deformable process may then be understood as a deformation function $D$ (6).

$$V_A = \left\{ \vec{n}_A(t),\; t = t_0, \ldots, t_0 + T \right\} \qquad (5)$$

$$D : W \to W, \qquad D(V_A) = V_A' \qquad (6)$$
In order to estimate the deformation suffered by this solid body (activity), the deformation tensor $F$ is defined as the Jacobian matrix of the application $D$ (7). Finally, from this Jacobian matrix, different metrics $m_i$ may be obtained to evaluate the deformation of the activity (and whether it is admissible). One of these metrics, for example, may be the module (8).

$$F = \nabla D = \begin{pmatrix} \dfrac{\partial S_A^{1\prime}}{\partial S_A^{1}} & \cdots & \dfrac{\partial S_A^{1\prime}}{\partial S_A^{N}} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial S_A^{N\prime}}{\partial S_A^{1}} & \cdots & \dfrac{\partial S_A^{N\prime}}{\partial S_A^{N}} \end{pmatrix} \qquad (7)$$

$$m_i = \| F \| \qquad (8)$$
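As a purely illustrative worked example (the numbers below are assumptions for illustration, not values from the paper), consider an activity with $N = 2$ whose observed execution stretches the first variable by 10% and compresses the second by 10%:

$$D(S_A^1, S_A^2) = \left(1.1\,S_A^1,\; 0.9\,S_A^2\right), \qquad F = \nabla D = \begin{pmatrix} 1.1 & 0 \\ 0 & 0.9 \end{pmatrix}, \qquad m = \|F\|_F = \sqrt{1.1^2 + 0.9^2} \approx 1.42.$$

An undeformed execution would give $F = I$ with $\|I\|_F = \sqrt{2} \approx 1.41$, so the decision-making module would compare the obtained value against the expected metric (and an admissible margin) stored in the soft process repository.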
For each activity, the expected metrics are then stored in a repository (the soft process repository) and compared, at this point, to the obtained metrics $m_i$ in a decision-making module, which decides whether an execution is valid (or not), or takes any other relevant decision considered in the analyzed scenario.

3.3 Evaluating Global Processes: Prosumer Environment and Temporal Logic Rules
At this point we have obtained a set of $M$ relevant pieces of information $p_i$ about the activities being executed at low level. From the event-driven devices, a flow of events $ev_a$ is received. Besides, using the referred pieces of information, an event generation module creates a second flow of events $ev_b$. All these events are received in a new engine where processes are supervised. At this level, processes are controlled through a set of high-level rules which must not be violated in any case. If that happens, the execution must be cancelled, an alarm must be triggered, etc. These rules, in our case, are created by managers in a prosumer environment, using temporal logic rules. These rules may be created graphically in a very easy manner and can be verified in a very simple manner as well. Temporal logic rules are composed of a finite set of atomic propositions, the Boolean connectives (9) and the temporal connectives (10).

$$\neg \;(\text{not}), \quad \wedge \;(\text{and}), \quad \vee \;(\text{or}) \quad \text{and} \quad \rightarrow \;(\text{implies}) \qquad (9)$$

$$U \;(\text{until}), \quad R \;(\text{release}), \quad X \;(\text{next time}), \quad G \;(\text{globally}) \quad \text{and} \quad F \;(\text{in the future}) \qquad (10)$$
Intuitively, $\varphi \; U \; \psi$ states that $\varphi$ remains true until such a time when $\psi$ holds. Dually, $\varphi \; R \; \psi$, or $\varphi$ releases $\psi$, means that $\psi$ must be true now and remain true until such a time when $\varphi$ is true, thus releasing $\psi$. $X\varphi$ means that $\varphi$ is true in the next time step after the current one. Finally, $G\varphi$ commits $\varphi$ to being true in every time step, while $F\varphi$ designates that $\varphi$ must either be true now or at some future time step. We define a computation $\pi$ as a set of executed activities that generated the corresponding events $ev_a$ and $ev_b$. $\pi$ satisfies ($\models$) a temporal logic rule $\varphi$ if the set of actions from $\pi$ makes $\varphi$ evaluate to true (11).

$$\text{We say that } \pi \models \varphi; \text{ otherwise we say that } \pi \not\models \varphi \qquad (11)$$

As an example, if $\varphi : b \rightarrow (\neg b \; U \; a)$, defining that "a" must occur before "b" (precedence property), then $\pi_1 = \{b, b, c\} \not\models \varphi$ but $\pi_2 = \{a, c, b\} \models \varphi$. The evaluation engine analyses the created and loaded temporal logic rules in real time. Having a computation $\pi$ and a set of temporal logic rules (12), we define the following states for the duple $(\pi, C)$:

$$C = \{\varphi_1, \varphi_2, \ldots, \varphi_n\} \qquad (12)$$

• Satisfied: if $\forall \varphi_i \in C : \pi \models \varphi_i$
• Temporarily violated: if there is a conflict, but there can exist a longer trace $\pi^*$ that has $\pi$ as a prefix and for which $(\pi^*, C)$ is satisfied.
• Permanently violated: if there is a conflict and there is no longer trace $\pi^*$ that has $\pi$ as a prefix and for which $(\pi^*, C)$ is satisfied.

The engine evaluates the logic rules and determines whether the global process being executed takes the state satisfied, temporarily violated or permanently violated. In that way, the global coherence of the process is supervised, controlled and evaluated.
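The engine itself is not described at code level in the paper; as a purely illustrative sketch (the function and type names are made up here), a monitor for the precedence rule used in the example above could look like this:

```cpp
#include <string>
#include <vector>

// The three rule states used by the evaluation engine described above.
enum class RuleState { Satisfied, TemporarilyViolated, PermanentlyViolated };

// Illustrative monitor for the precedence property  phi: b -> (not b U a),
// i.e. event "a" must occur before the first occurrence of "b".
RuleState checkPrecedence(const std::vector<std::string>& trace) {
  bool aSeen = false;
  for (const auto& ev : trace) {
    if (ev == "a") aSeen = true;
    // A "b" before any "a" can never be repaired by extending the trace.
    if (ev == "b" && !aSeen) return RuleState::PermanentlyViolated;
  }
  return RuleState::Satisfied;
}

// checkPrecedence({"b", "b", "c"}) -> PermanentlyViolated   (pi_1 does not satisfy phi)
// checkPrecedence({"a", "c", "b"}) -> Satisfied             (pi_2 satisfies phi)
```

For this particular rule a trace can never be only temporarily violated; that state arises for liveness-style rules such as F a ("a must eventually occur"), where a trace that has not yet produced "a" may still be extended into a satisfying one.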
4 Experimental Validation: Implementation and Results

In order to evaluate the performance of the proposed solution, an experimental validation was designed and carried out. The validation was based on a simulation scenario, deployed using the NS3 simulator. The scenario represented a set of one hundred and twenty (120) agents executing actions according to a programming that cannot be modified using any technique. Besides, one hundred (100) users (people) and one hundred (100) event-driven devices were also considered. In order to control the behavior of all these agents in the proposed simulation, behavior models [9] were injected into the simulator using C++ technologies. The proposed architecture was deployed using virtual instances and Linux containers, deployed using LXC technologies and the libvirt interface and API. Using TAP bridges and ghost nodes, the infrastructure based on virtual machines and the NS3 simulation were interconnected.

For this initial evaluation, only one experiment was carried out. In this experiment, the percentage of non-valid activities and processes that are correctly detected and stopped is analyzed and evaluated. The obtained results are compared to existing traditional solutions for supervisory control, in particular a reduced open source SCADA system. Figure 3 shows the obtained results. As can be seen, more than 90% of the cases are correctly evaluated, contrary to traditional supervisory control solutions, which barely reach 75%. This implies, for this first evaluation, an improvement of around 20%. Most of this improvement is caused by the new soft model for processes, which allows a much more flexible understanding of executions.

Despite these good initial results, the proposed technology and validation scenario present some limitations and threats. First, human behavior is typically much more variable and unpredictable than common models, so the performance of the proposed mechanism may get worse in real deployments. Besides, prosumers may create confusing or unclear rules, which severely affects the final performance of the proposed system. Finally, the workers' profile and the devices' firmware may also affect the final performance: devices with a reduced set of operations and workers with a technical profile tend to execute processes in a more stable and predictable way, so process execution systems present a higher performance.
Fig. 3. Simulation results: percentage of executions correctly considered (correct vs. wrong detection) for the proposed solution and a traditional supervisory control solution.
5 Conclusions and Future Works

In this paper we present a new process supervision and control system, focused on industrial processes executed in a distributed manner by people and autonomous devices. The proposed solution includes a soft model for industrial processes, which are later validated through deformation metrics (instead of traditional rigid indicators). Besides, in order to guarantee the coherence of all executions, temporal logic rules are also integrated to evaluate the development of the different activities. An experimental validation is also provided to analyze the performance of the proposed solution, obtaining an improvement of around 20% over existing supervisory control mechanisms. Future works will evaluate the proposed solution in real environments, and real industrial processes will also be considered.

Acknowledgments. The research leading to these results has received funding from the Ministry of Economy and Competitiveness through the SEMOLA (TEC2015-68284-R) project and the European Commission through the DEMETER project (DT-ICT-08-2019, project ID: 857202).
References 1. Bordel, B., de Rivera, D.S., Sánchez-Picot, Á., Robles, T.: Physical processes control in Industry 4.0-based systems: a focus on cyber-physical systems. In: Ubiquitous Computing and Ambient Intelligence, pp. 257–262. Springer, Cham (2016) 2. Geissdoerfer, M., Savaget, P., Bocken, N.M., Hultink, E.J.: The circular economy–a new sustainability paradigm? J. Clean. Prod. 143, 757–768 (2017) 3. Bordel, B., Alcarria, R., Robles, T., Martín, D.: Cyber–physical systems: extending pervasive sensing from control theory to the Internet of Things. Pervasive Mobile Comput. 40, 156–184 (2017)
4. Sánchez, B.B., Alcarria, R., de Rivera, D.S., Sánchez-Picot, A.: Enhancing process control in Industry 4.0 scenarios using cyber-physical systems. JoWUA 7(4), 41–64 (2016) 5. Alcarria, R., Robles, T., Morales, A., López-de-Ipiña, D., Aguilera, U.: Enabling flexible and continuous capability invocation in mobile prosumer environments. Sensors 12(7), 8930– 8954 (2012) 6. Bordel, B., Alcarria, R., de Rivera, D.S., Robles, T.: Process execution in cyber-physical systems using cloud and cyber-physical internet services. J. Supercomput. 74(8), 4127–4169 (2018) 7. de Rivera, D.S., Alcarria, R., de Andres, D.M., Bordel, B., Robles, T.: An autonomous information device with e-paper display for personal environments. In: 2016 IEEE International Conference on Consumer Electronics (ICCE), pp. 139–140. IEEE (2016) 8. Bordel, B., Alcarria, R., Sánchez-de-Rivera, D.: A two-phase algorithm for recognizing human activities in the context of Industry 4.0 and human-driven processes. In: World Conference on Information Systems and Technologies, pp. 175–185. Springer, Cham (2019) 9. Bordel, B., Alcarria, R., Hernández, M., Robles, T.: People-as-a-Service dilemma: humanizing computing solutions in high-efficiency applications. In: Multidisciplinary Digital Publishing Institute Proceedings, vol. 31, no. 1, p. 39 (2019) 10. Boyer, S.A.: SCADA: supervisory control and data acquisition. International Society of Automation (2009) 11. Zheng, L., Nakagawa, H.: OPC (OLE for process control) specification and its developments. In Proceedings of the 41st SICE Annual Conference, SICE 2002, vol. 2, pp. 917–920. IEEE (2002) 12. Ahmed, I., Obermeier, S., Naedele, M., Richard III, G.G.: Scada systems: challenges for forensic investigators. Computer 45(12), 44–51 (2012) 13. Li, D., Serizawa, Y., Kiuchi, M.: Concept design for a web-based supervisory control and data-acquisition (SCADA) system. In: IEEE/PES Transmission and Distribution Conference and Exhibition, vol. 1, pp. 32–36. IEEE (2002) 14. Nazir, S., Patel, S., Patel, D.: Assessing and augmenting SCADA cyber security: a survey of techniques. Comput. Secur. 70, 436–454 (2017) 15. Cherdantseva, Y., Burnap, P., Blyth, A., Eden, P., Jones, K., Soulsby, H., Stoddart, K.: A review of cyber security risk assessment methods for SCADA systems. Comput. Secur. 56, 1–27 (2016) 16. Sajid, A., Abbas, H., Saleem, K.: Cloud-assisted IoT-based SCADA systems security: a review of the state of the art and future challenges. IEEE Access 4, 1375–1384 (2016) 17. Yang, T.C.: Networked control system: a brief survey. IEE Proc.-Control Theory Appl. 153 (4), 403–412 (2006) 18. Walsh, G.C., Ye, H.: Scheduling of networked control systems. IEEE Control Syst. Mag. 21 (1), 57–65 (2001) 19. Chen, T.H., Yeh, M.F.: State feedback control based networked control system design with differential evolution algorithm. Univ. J. Control Autom. 5(1), 12–17 (2017) 20. Goodwin, G.C., Haimovich, H., Quevedo, D.E., Welsh, J.S.: A moving horizon approach to networked control system design. IEEE Trans. Autom. Control 49(9), 1427–1445 (2004) 21. Guerra, T.M., Sala, A., Tanaka, K.: Fuzzy control turns 50: 10 years later. Fuzzy Sets Syst. 281, 168–182 (2015) 22. Aslam, M., Khan, N.: A new variable control chart using neutrosophic interval method-an application to automobile industry. J. Intell. Fuzzy Syst. 36(3), 2615–2623 (2019) 23. Kovacic, Z., Bogdan, S.: Fuzzy Controller Design: Theory and Applications. CRC Press, Boco Raton (2018) 24. 
Pan, Y., Yang, G.H.: Event-triggered fuzzy control for nonlinear networked control systems. Fuzzy Sets Syst. 329, 91–107 (2017)
25. Theorin, A., Bengtsson, K., Provost, J., Lieder, M., Johnsson, C., Lundholm, T., Lennartson, B.: An event-driven manufacturing information system architecture for Industry 4.0. Int. J. Prod. Res. 55(5), 1297–1311 (2017) 26. Golob, M., Bratina, B.: Web-based control and process automation education and Industry 4.0. Int. J. Eng. Educ. 34(4), 1199–1211 (2018) 27. MacDonald, I.L., Zucchini, W.: Hidden Markov models for discrete-valued time series. In: Handbook of Discrete-Valued Time Series, pp. 267–286 (2016) 28. Gu, J., Wang, Z., Kuen, J., Ma, L., Shahroudy, A., Shuai, B., Liu, T., Wang, X., Wang, L., Wang, G., Cai, J., Chen, T.: Recent advances in convolutional neural networks. Pattern Recognit. 77, 354–377 (2018)
Frameworks to Develop Secure Mobile Applications: A Systematic Literature Review

Jezreel Mejía1(B), Perla Maciel1(B), Mirna Muñoz1(B), and Yadira Quiñonez2(B)
1 Centro de Investigación en Matemáticas A.C., Parque Quantum, Ciudad del conocimiento Avenida Lasec andador Galileo Galilei manzana 3 lote 7, C.P. 98160, Zacatecas, Mexico
{jmejia,perla.maciel,mirna.munoz}@cimat.com.mx
2 Universidad Autónoma de Sinaloa, Mazatlán, Mexico
[email protected]

Abstract. Nowadays, smartphones have become the most widely used communication technology because of their portability and relatively low cost. Their use has increased and at least seven billion people (93% of the global population) have access to a mobile-cellular network, according to the ICT Facts and Figures 2019 report. Therefore, mobile security plays an important role in protecting users' information. In this context, it is important to propose frameworks to develop secure mobile applications. To achieve this, a Systematic Literature Review was performed. The result of this protocol allowed establishing the state of the art of secure mobile software development. The findings also encourage the creation of a framework proposal as future work of this research.

Keywords: Mobile · Environment · Issues · Model · Software development · Standard · Methodology

1 Introduction
In the last few years, mobile networks have been on the rise, as well as the number of smartphone consumers. According to the ICT Facts and Figures 2019 report [1], these networks are within reach of at least seven billion people (93% of the world's population), where approximately 3.2 billion of them are smartphone users [2]. Moreover, these devices have become an essential part of a person's everyday life, reshaping their interactions, needs, and way of living, which is reflected in the 194 billion applications downloaded from the mobile platform stores worldwide in 2018 [3]. Being an important part of a person's life, these smartphones have access to personal information, managing sensitive data, defined as a wide quantity of information which embraces the personal life of a person [14]. Thus, the necessity of securing this data has become a great area of improvement for everyone related
to the development of mobile applications. In addition, in organizations, the use of apps has taken a great role, due to be a part of the business process. According to Symantec in 2018, one in 36 smartphones used in organizations had high-risk applications installed, so the need to ensure the security of the sensitive data it’s a priority [4]. To resolve the security issues is crucial to identify the vulnerabilities of each platform and use security standards to resolve them. As the semi-annual balance of mobile security 2019 by Giusto for welivesecurity by eset published [5], the Android Operating System (OS) got for a decrease in the number of vulnerabilities. Howerver, most of it are classified as critical and the malware variants are less that previous years. Nonetheless, this OS is still considered more appealing to attacks. On the other hand, for the iOS the quantity of vulnerabilities found in 2019 until September were 155, being an increase compared to last year. The good news are that most of it are not critical, and the malware variants remains lesser than on the Android OS. In a general context, the most common vulnerabilities that can be found on mobile applications are: back-doors configured, session variable capture, weak encryption, data transference without encryption (man in the middle), malicious APIs, permissions, access to application database, misconfigured settings and application component without restrictions [9,13,14]. Besides taking on consideration these vulnerabilities on each mobile platform, most of the applications security problems are introduced in the development process [11]. To resolve this, there have been various approaches by distinct authors on different areas, but for mobile Software development is not as worked as other areas. So, the purpose of this article is to presents the results of a systematic literature review to establish the state-of-the-art of secure mobile software development. To achieved this, a Systematic Literature review protocol was performed. The purpose of this Systematic Literature Review (SLR) is to obtain the most significant amount of recent studies related to the Mobiles Software development frameworks to avoid the more critical security issues. This article is organized as follows: Sect. 2 details the phases and activities indicated by the SLR. Section 3 presents the results obtained by the SLR after the analysis of the primary studies. Section 4 discuss the findings, while Sect. 5 covers conclusions and future work.
2 Systematic Literature Review
A Systematic Literature Review (SLR) is an investigation protocol proposed by B. Kitchenham that helps to identify and interpret all the information relevant available for a specific research of interest (being a question, a topic or an area). The SLR is based on three main phases: planning the review, conducting the review, and reporting the results [6]. To carry through the review two articles were taken as a baseline to do a SLR [7,8].
2.1 Review Planning
This is the first phase of the SLR, in which the research objectives must be defined, as well as how the review will be completed. The activities included to fulfill this phase are: identify the need to perform the SLR, define the research questions, create the search string and select the data sources.

2.1.1 Identify the Need to Perform the SLR
The use of mobile devices has grown exponentially in recent years, and with it the development of applications that fulfill a need for the users. These apps have access to sensitive information of the people, and the mobile platforms have vulnerabilities that can be exploited by a malicious entity. So it is necessary to secure the personal information of the user in the development stage of the applications that use this information. This SLR aims to find the methodologies, models, standards, security strategies and evaluation methods in which mobile software development frameworks are being worked on to avoid the more critical security issues.

2.1.2 Define the Research Questions
The research questions defined are: (RQ1) Which methodologies or frameworks are being implemented in the development of secure mobile applications?; (RQ2) Which models or standards are being implemented in the development of secure mobile applications?; (RQ3) Which security strategies are being used in the development of secure mobile applications?; and (RQ4) Which evaluation methods are being used in the development of secure mobile applications?

2.1.3 Create the Search String
After the research questions were established, a set of words was selected as keywords and arranged in Table 1, in which synonyms and terms associated with the keywords were considered. After the selection of the keywords, logical connectors (AND and OR) were used to create the search string.

Table 1. Keywords and search string.

Question | Keywords                        | Synonyms or related terms  | Search string
1        | Methodology                     | Agile, Framework           | (Methodology OR Agile OR Framework) AND (Model OR Standard OR Process) AND (mobile applications development OR SDL) AND (Security Strategies OR OWASP OR SDS OR Common Criteria) AND (Evaluation Method)
1,2,3,4  | Mobile applications development | SDL(a)                     |
2        | Standard                        | Model, Process             |
3        | Security Strategies             | OWASP(b), Common Criteria  |
4        | Evaluation Method               |                            |

(a) SDL: Secure Development Lifecycle. (b) OWASP: Open Web Application Security Project.
2.1.4 Select the Data Sources
The list below contains the digital libraries which were chosen as sources to apply the defined string. All of them are in the software engineering area:
– ACM Digital Library.
– SpringerLink.
– IEEE Xplore.

2.2 Conducting the Review
The next part of the SLR is conducting the review, which focuses on collecting a set of studies and selecting the primary studies. It includes the following activities: establish the inclusion and exclusion criteria; primary studies selection; primary studies quality assurance; and data extraction.

2.2.1 Establish the Inclusion and Exclusion Criteria
The first part of conducting the review consists of setting the inclusion and exclusion criteria, as can be seen in Table 2.

Table 2. Inclusion and exclusion criteria.

Inclusion criteria                                                     | Exclusion criteria
Studies in English language                                            | Studies repeated in more than one digital library
Studies published between 2015 and 2019                                | Studies that do not include any keyword
Studies containing at least 3 keywords in title, abstract and keywords |
Studies that were published as an Article or Book Chapter              |
2.2.2 Primary Studies Selection The selection process was implemented following six steps: 1) Apply the search string, adapting it for each digital library. As a result of a second iteration the string underwent a change, the term Evaluation Method was excluded because of the lack of relevant result in regard of what was being searched. Moreover, in one of the digital libraries (SpringerLink) the same string was searched in two disciplines, Computer Science and Engineering; 2) Filter studies considering the first 2 inclusion criteria (language and year); 3) Apply the remaining criteria (keywords, type of publication and repeated studies); 4) Reading titles and abstracts to identify which studies are of relevance, 5) Select the primary studies; 6) Apply the quality criteria to ensure the quality of studies. The SLR results are seen on the Fig. 1, the implementation is shown with the result of studies obtained from the search string, which found 886 studies but only 7 satisfied all inclusion and exclusion criteria.
Fig. 1. Primary studies collected by selection process.
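As an illustration of steps 2 and 3 of the selection process described above, the following Python sketch applies the language, year, keyword-count, publication-type and duplicate criteria to candidate records; the record fields, the keyword threshold and the example entry are assumptions chosen to mirror Table 2, not data from the actual searches.

# Minimal sketch of the filtering steps, assuming each study record carries
# language, year, publication type and free text to match keywords against.
KEYWORDS = {"methodology", "standard", "security strategies",
            "mobile applications development", "evaluation method"}

def passes_criteria(study: dict) -> bool:
    if study.get("language") != "English":
        return False
    if not 2015 <= study.get("year", 0) <= 2019:
        return False
    if study.get("type") not in {"Article", "Book Chapter"}:
        return False
    text = " ".join([study.get("title", ""), study.get("abstract", ""),
                     " ".join(study.get("keywords", []))]).lower()
    hits = sum(1 for kw in KEYWORDS if kw in text)
    return hits >= 3  # at least 3 keywords in title, abstract or keywords

def deduplicate(studies):
    # Drop studies repeated across digital libraries (matched here by title).
    seen, unique = set(), []
    for s in studies:
        key = s.get("title", "").strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(s)
    return unique

candidates = [
    {"title": "Access control approach in development of mobile applications",
     "abstract": "A security strategies standard for mobile applications development",
     "keywords": ["methodology"], "language": "English",
     "year": 2016, "type": "Article"},
]
selected = [s for s in deduplicate(candidates) if passes_criteria(s)]
print(len(selected), "studies pass the criteria")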
2.2.3 Primary Studies Quality Assurance
To assure the quality of the studies, quality criteria were stated and applied to every primary study: 1) Is the study mainly focused on the development of a secure mobile application with a model, standard, methodology, framework or security strategy?; 2) Does the study have tools associated with the proposal?

2.2.4 Data Extraction
The information obtained from the selected primary studies was registered in a spreadsheet (Excel) containing the following data: id (personal control), author, title, year, country, keywords, data source, contributions, type of contribution and type of team (if mentioned).
3 Results

This section contains the results of the SLR, which yielded 7 primary studies (Table 3); with these, the answers to the research questions were obtained.

3.1 Data Obtained
According to the 4 research questions, this section describes how each question is answered by the primary studies listed in Table 3. RQ1: its goal is to find studies that implemented a methodology or framework to help in the development of secure mobile applications. The primary studies that propose a framework are PS2, PS6 and PS7. The first has as a product a framework built with a PIM (Platform Independent Model) and a PDM (Platform Definition Model), the latter being the one where security takes place in the form of a Security Metamodel. PS6
Table 3. Primary studies of the SLR.
PS1: Privacy Vulnerability Analysis for Android Applications: A Practical Approach [9]
PS2: A Framework for Supporting the Context-Aware Mobile Application Development Process [10]
PS3: Management System for Secure Mobile Application Development [11]
PS4: Model Driven Security in a Mobile Banking Application Context [12]
PS5: Process of Mobile Application Development from the Security Perspective [13]
PS6: Access Control Approach in Development of Mobile Applications [14]
PS7: The Study of Improvement and Risk Evaluation for Mobile Application Security Testing [15]
proposes a framework named iSec to help in the implementation of the Secure Development Strategy (SDS) proposed by Poniszewska-Maranda et al. [14], also mentioned in PS5; and PS7 proposes a framework for risk assessment to test the security of a mobile application after its development. Additionally, regarding methodology, the only article that qualifies is PS1 by Argudo et al. They propose a methodology to analyze the security of an application for the Android system before implementing improvements or launching the development process. The article follows four stages: know the device vulnerabilities at the system level, check the login methods and encryption used, examine the data generated in the transmission between devices, and analyze the code of the mobile application. All of this aims to find the vulnerabilities of an already working application; the authors then assess the accuracy of the approach with a case study of a governmental app, for which the results were not favorable. RQ2: it relates to finding which standards or models are being used in the same field. Regarding models, Model Driven Engineering is mentioned in two articles, PS2 and PS4. In the first it is applied to a context-aware type of mobile application and uses a PDM (which is a model) with a Security Metamodel, while the other uses a branch of the approach, Model Driven Security. In the latter, the design and development of a secure system is the main concern, so the authors take a security-by-design approach and use UMLsec and the GraphWalker tool to do so. In PS5 and PS6 the proposed SDS consists of three models to approach the security of sensitive data, namely Storage, Access and Transfer; in PS6, for the access approach, Poniszewska-Maranda et al. propose an Application-based Access Control model to cover the security needs in that aspect. Regarding standards, the ISO/IEC 27000 and 31000 series were used for risk assessment in PS7, mainly to define the three main elements of information security (confidentiality, integrity and availability); the same article also uses NIST SP 800-163/164
and ITU-T YD/T 2407 to define secure data access control and a secure transmission protocol. All of this is defined to be applied after the development of the application. RQ3: its goal is to find strategies used in the development of secure mobile applications. PS3 proposes a Security Baseline as a security strategy for the mobile platforms Android and iOS; the author built a platform to help in the definition of the security and design requirements, which integrates a Security Management Module. PS7 answers RQ3 as well, because it uses the OWASP Top 10 Vulnerabilities, the OWASP Mobile Security Guide and the CSA Mobile Application Security Testing, which can be considered security strategies and are used to define the application and system execution. The previous paragraph already covered the SDS, the three pillars that integrate it, the access model and the framework that helps implement it. Finally, for RQ4, the evaluation methods used to ensure the security of a mobile application were searched, and two articles (PS1 and PS7) were identified. PS7 describes an evaluation oriented to risk assessment, using 46 test items that satisfied the specification priorities of mobile devices. PS1 also answers the question: the authors define four stages to carry out the analysis of a mobile application, already mentioned in the answer to the first question. After answering the questions, several common aspects were identified across the primary studies; they are discussed in the next section.
4 Discussion

In each article, the authors focus on several issues and challenges addressed by their proposal (see Table 4). Five common concerns were identified across the studies, and the solution proposed for each of them, when there was one, is described below.

Table 4. Common concerns in every article.
Authentication: PS1, PS2, PS3, PS4, PS5, PS7
Authorization: PS2, PS3, PS4, PS6, PS7
Data Storage: PS5, PS6, PS7
Data Access: PS4, PS5, PS6, PS7
Data Transfer: PS1, PS5, PS6, PS7

Authentication: this concern appears in more than half of the primary studies. In PS1, the proposal analyzes the security of a mobile application, and one of the tests tries to find a way to capture authentication data, which is a dangerous vulnerability if a malicious entity gets hold of this information; this proposal therefore only checks whether the authentication is secure. In other articles, the
authors suggest what can be done to avoid this kind of security hole. In PS2, proposed by Stefanello et al., a Platform Definition Model (PDM) is used: the authors insert a Security Metamodel that assures authentication and authorization using several elements, such as the user's role and permissions that restrict the actions of each type of user. A similar approach is seen in PS3, where Wei Guo designed a platform with a Security Management Module that includes an identity authentication component. In PS4, a tool named GraphWalker helped with the design of authentication workflows in the Model Driven Security approach. For PS5, the proposal uses digital signatures for authentication in the Data Transfer Security Model. In PS7, the Mobile Top 10 2016 is used to analyze risk assessments regarding insecure authentication, helping to determine the identity authorization for sensitive data access. Authorization: for this concern, PS2 uses the same Security Metamodel mentioned above. The same happens in PS3, where the Security Management Module resolves this concern with an authority management component; in PS4 the authors use UMLsec as the modelling technique, because this model can handle several issues beyond authorization, so it also covers encryption. Moreover, in PS6 the authors propose an Application-Based Access Control model (mABAC), which takes into consideration the permissions of an entity and verifies its identity. Finally, PS7 uses the Mobile Top 10 2016 to analyze risk assessments regarding insecure authorization. Data Storage: to secure sensitive data it is very important to take care of it from storage onwards, so some of the primary studies include a proposal to help with it. PS5 and PS6 give the recurring advice of using encryption for everything, in this case for data storage, and PS5 also proposes the use of key stores and a file system, both encrypted. In PS7, one of the dimensions considered is data storage, for which four test items are specified, all involving encryption. Data Access: for this concern, PS4 specifically uses method calls, because it adds a mechanism for controlling access to the data. PS7 uses the NIST SP 800-163/164 and ITU-T YD/T 2407 standards for the data access security specification. PS5 and PS6 propose, in the Data Access Security Model, a geolocation and device unique identifier solution, in which the device sends its current location every time it requires access to sensitive data, and the identifier should always be paired with a digital signature to assure the integrity of the data. In addition, in the article by Poniszewska-Maranda et al., an Application-based Access Control model tackles this concern specifically and covers the basic aspects of access control: subjects, objects and methods. Data Transfer: this concern is handled in a different way by every article, but they have one thing in common, the use of encryption to ensure the confidentiality and integrity of the data. PS1, which proposes a risk analysis, examines the data generated in the transmission stage, checking whether the encryption methods are secure.
PS7 has 14 test items for the transmission protocol and encryption strength dimension, and uses the NIST SP 800-163/164 and ITU-T YD/T 2407 standards for the transmission protocol specifications. For PS5 and PS6, the authors propose in the second the encryption of data, checking request integrity and signing requests with encrypted signatures. PS5, on the other hand, has a more specific solution, namely the use of digital signatures to ensure that the message received is exactly as it was sent, with no modifications; to secure these signatures there is a mechanism of security keys, which should be sent to the server before a request for access to sensitive data.
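To illustrate the signed-request idea behind the Data Transfer models of PS5 and PS6, the Python sketch below signs a data-access request and verifies it on arrival; the shared key, field names and HMAC construction are simplifying assumptions made for illustration, not the exact scheme defined in those studies (which rely on digital signatures and a security-key exchange).

import hmac, hashlib, json

# Illustrative sketch of signing a data-access request so the server can
# detect tampering in transit. A shared secret stands in for the security-key
# mechanism described in PS5; real deployments would use asymmetric
# signatures and a proper key-exchange protocol.
SECRET_KEY = b"pre-shared-security-key"  # assumed, for illustration only

def sign_request(payload: dict) -> dict:
    body = json.dumps(payload, sort_keys=True).encode()
    signature = hmac.new(SECRET_KEY, body, hashlib.sha256).hexdigest()
    return {"body": payload, "signature": signature}

def verify_request(message: dict) -> bool:
    body = json.dumps(message["body"], sort_keys=True).encode()
    expected = hmac.new(SECRET_KEY, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, message["signature"])

request = sign_request({"device_id": "abc-123", "resource": "sensitive_record"})
assert verify_request(request)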
5 Conclusions and Future Work

The main objective of this article was to establish the state of the art of how mobile software development frameworks are being improved to avoid the most critical security issues. To this end, a Systematic Literature Review (SLR) was carried out. The SLR allowed a more detailed view of the contributions in the field, and the search string was built to focus specifically on methodologies, models, standards, security strategies and evaluation methods. As a result, 5 main concerns were identified, two of which, authentication and authorization, were considered in most of the studies; most of them solved these concerns by implementing a tactic such as encryption or role-based permissions. The other concerns focus on the integrity and confidentiality of the sensitive data provided by the user; from there, three types of data management (storage, access and transfer) were considered and found in half of the articles, either directly or indirectly. Most of the proposals made use of models, such as a Model Driven approach, a model for each type of data management, and an Application-Based Access Control approach. Finally, as future work, the findings of this SLR make it clear that the small number of articles found as primary studies establishes an area of opportunity to work on a proposal for a framework that addresses the main aspects of the areas discussed in this article. Moreover, this framework must consider aspects of the ISO/IEC 27001 standard indicated in the article by Huey-Yeh Lin et al., together with the strategies and frameworks identified in the primary studies.
References

1. ITU: Measuring digital development. https://www.itu.int/en/mediacentre/Documents/MediaRelations/ITUFactsandFigures2019-Embargoed5November1200CET.pdf. Accessed 22 Nov 2019
2. Newzoo: Number of smartphone users worldwide 2016–2021—Statista. https://www.statista.com/statistics/330695/number-of-smartphone-users-worldwide/. Accessed 22 Nov 2019
3. App Annie, Forbes: Annual number of mobile app downloads worldwide—Statista (2018). Accessed 22 Nov 2019
4. Symantec: ISTR Internet Security Threat Report, vol. 24, February 2019. https://www.symantec.com/content/dam/symantec/docs/reports/istr-24-2019-en.pdf. Accessed 22 Nov 2019
5. Giusto Bilic, D.: Semi-annual balance of mobile security—WeLiveSecurity (2019). https://www.welivesecurity.com/2019/09/05/balance-mobile-security-2019/. Accessed 22 Nov 2019
6. Kitchenham, B., Pearl Brereton, O., Budgen, D., Turner, M., Bailey, J., Linkman, S.: Systematic literature reviews in software engineering - a systematic literature review. Inf. Softw. Technol. 51, 7–15 (2009)
7. Mejía-Miranda, J., Melchor-Velasquez, R.E., Muñoz-Mata, M.A.: Detección de Vulnerabilidades en Smartphones: Una Revisión Sistemática de la Literatura. In: Iberian Conference on Information Systems and Technologies, CISTI (2017)
8. Mejía, J., Iñiguez, F., Muñoz, M.: Data analysis for software process improvement: a systematic literature review. In: Rocha, Á., Correia, A., Adeli, H., Reis, L., Costanzo, S. (eds.) Recent Advances in Information Systems and Technologies, WorldCIST 2017. Advances in Intelligent Systems and Computing, vol. 569, pp. 48–59. Springer, Cham (2017)
9. Argudo, A., López, G., Sánchez, F.: Privacy vulnerability analysis for android applications: a practical approach. In: 4th International Conference on eDemocracy and eGovernment 2017, ICEDEG, pp. 256–260 (2017)
10. Stefanello, D., Lopes, D.: A framework for supporting the context-aware mobile application development process. In: Second International Conference on Internet of Things, Data and Cloud Computing, ICC 2017, pp. 1–8. Association for Computing Machinery, New York (2017)
11. Guo, W.: Management system for secure mobile application development. In: ACM Turing Celebration Conference - China, ACM TURC 2019, pp. 1–4. Association for Computing Machinery, New York (2019)
12. Şerafettin, S., Yaşark, H., Soğukpınar, I.: Model driven security in a mobile banking application context. In: 14th International Conference on Availability, Reliability and Security, ARES 2019, pp. 1–7. Association for Computing Machinery, New York (2019)
13. Majchrzycka, A., Poniszewska-Maranda, A.: Process of mobile application development from the security perspective. In: Zamojski, W., Mazurkiewicz, J., Sugier, J., Walkowiak, T., Kacprzyk, J. (eds.) Advances in Dependability Engineering of Complex Systems. DepCoS-RELCOMEX 2017. Advances in Intelligent Systems and Computing, vol. 582, pp. 277–286. Springer, Cham (2018)
14. Poniszewska-Maranda, A., Majchrzycka, A.: Access control approach in development of mobile applications. In: Younas, M., Awan, I., Kryvinska, N., Strauss, C., Thanh, D. (eds.) Mobile Web and Intelligent Information Systems, MobiWIS 2016. Lecture Notes in Computer Science, vol. 9847, pp. 149–162. Springer, Cham (2016)
15. Huey-Yeh, L., Hung-Chang, C., Yung-Chuan, S.: The study of improvement and risk evaluation for mobile application security testing. In: Peng, S.L., Wang, S.J., Balas, V., Zhao, M. (eds.) Security with Intelligent Computing and Big-data Services. SICBS 2017. Advances in Intelligent Systems and Computing, vol. 733, pp. 248–256. Springer, Cham (2018)
Social Media: People's Salvation or Their Perdition?

Yúmina Zêdo (1), João Costa (1), Viviana Andrade (1), and Manuel Au-Yong-Oliveira (1,2)

(1) Department of Economics, Management, Industrial Engineering and Tourism, University of Aveiro, 3810-193 Aveiro, Portugal
{yumina,joaomcosta,viviana.andrade,mao}@ua.pt
(2) GOVCOPP, Aveiro, Portugal
Abstract. This article was written to study and understand how social networks have changed the way we perceive and create relationships. The exponential increase of social media (SM) has become not only convenient, but also a common habit. It has changed people's habits and brought a new way of being since the technology era arrived. Although there are several advantages in using these social networks, the uncontrolled use and abuse can be alarming, thus the importance of this research. There are plenty of papers related to the impact of social media on adolescent behaviour and on young people's academic performance. However, this paper aims to examine the facts and provide a comparison of the effect of social media use between different age sectors: adolescents (between 10 and 17 years old), young adults (18–30), adults (31–45), older adults (46–65) and seniors (+66). In order to explore this issue, an online survey was developed. Using a sample of 358 participants, social media use was analysed by age sectors, focusing on the dependence, habits, the reasons/motivations for people to have these accounts, and how different generations perceive the future of social media.

Keywords: Social Media (SM) · Facebook · Adolescents · Millennials · Adults
1 Introduction

Information technology (IT) has transformed society, affecting populations of all ages [1]. Mobile communications technology enabled users to form social networks that overcome barriers of time and geographic distance, and provide sociability, sense of belonging, and social identity [2]. Social media networks and their usages are increasing day by day. They have become the best media for interaction, discussing issues, sharing information and opinions in heterogeneous groups [3]. As cited in [4], according to the Cambridge English Dictionary, Social Media (SM) translates into "websites and computer programs that allow people to communicate and share information on the Internet." However, the author describes SM as "involving social interaction among individuals online, including websites that support more interpersonal relationships (e.g. Facebook, Twitter), as well as those that supply more
commercial content (online reviews)” [4]. Social networking sites (SNSs) have always become channels for people to express their identities [5]; besides, the SNSs have increased in popularity worldwide, since they help develop multi-functions such as communications, relationship construction, daily and professional activities, selfexpression, relaxation and information seeking [6]. “By far, the most popular SNS is Facebook, with more than 2.2 billion registered users (Facebook, 2018)” [6]. Nowadays, we live in the age of “now”, rather than “later” [7]. It seems like we can be connected to the world instantly, sharing things and meeting new people with just a click. As expected, the goal in life is to improve and keep on track, therefore, based on that, other online networks also appeared rather quickly, such as Instagram and WhatsApp. According to a survey conducted by the Pew Research Centre, 71% of young adults aged 18–24 years old in the USA use Instagram, and the mobile media platform was revealed to be the fastest-growing SNS around the world [5]. Another significant development in SM has been the use of mobile messaging applications, such as the market leader WhatsApp, used by more than one billion people worldwide [2]. The leader of SNS bought, as stated in [7], Instagram (acquired in 2012, for 1 billion USD) and WhatsApp (acquired in 2014, for approximately 19 billion USD). This may be a strategy that Facebook adopted to keep their leading position and take control of the most successful SM platforms that showed an overall growth tendency, in order to ensure that no other SM platform could compete with them. The human being is identified by the need for social interaction and self-disclosure (the exposing of personal information). As stated in [8], “Nowadays, social relationships are increasingly maintained via social media”, which means it is easier for people to find support on SM, keep friendships alive and gain prestige. The positive outcomes that SM provides are clear: social capital, psychological wellbeing, and employee engagement. Furthermore, some authors found that using Facebook can have a positive effect in maintaining and establishing social capital, but also civic participation, life satisfaction and social trust can be stimulated [9]. According to [8], studies show that the more people interacted on SM the more powerful the sensation of closeness. SM undoubtedly provides many advantages to users, but it also holds the other side of the coin. The exaggerated use of social networks can develop mental health problems, stress, and a decrease in performance [9]. However, there are some age sectors which are more prone to develop some of those issues. For instance, University Students due to their intense usage of the Internet, extensive free time and flexible schedules, are sustainable to develop problematic SM use [9]. As Voltaire, a French writer, historian, and philosopher once said, dosage is what determines the difference between a poison and an antidote. Related to this issue, social media can be people’s salvation or their perdition, depending on the amount of time they spend connected. The success of a healthy relationship with our online world depends on our time management skills and the deep desire to control it. The context of our study is to understand the general use of SM by different age segments. 
Although there are several research studies about the impact of SM use among University students or even among high school students, this paper focuses on the difference of behaviour among people of different ages.
This paper also aims to respond to certain questions: Is the impact of SM use on college students the same as on older adults? What are the main differences between them? How do teenagers and adults use SM, and how important is it in their lives? What is the role of SM and what is its purpose?
2 Literature Review Social media represents the most popular platforms [3] that have built the bridge from a passive consumption of the internet to an active one. In this context, SM, as defined in [10], is a “group of internet-based applications that build on the ideological and technological foundations of Web 2.0 and allow the creation and exchange of usergenerated content”. Furthermore, SM is defined as a group of websites used primarily for social interaction- SNSs such as Facebook, instant messaging: e.g. WhatsApp, image sharing applications: e.g. Instagram [11], blogs, content-sharing sites: e.g. YouTube [3], video messaging apps: e.g. Snapchat, work-related channels: e.g. LinkedIn, dating apps: e.g. Tinder [9], and others. Social media is now present in our lives and routines, becoming a leader of communication, interacting with our way of life and even establishing relationships. In SM everything is out in the open; the leaders, Facebook, Instagram and WhatsApp allow people to practice self-disclosure, see how others live, post important moments, share ideas and gather information. At first, as cited in [1], modern adolescents between 12 and 18 years old are called “digital natives” and they have grown up in a digital era. For instance, young people give and receive social support in SM platforms, they built and trust relationships created in this forum. They also invest in their own autonomy, mental health stigma, and services which are inappropriate or inaccessible for them [12]. Social support refers to the social benefits (e.g. emotional, informational, and instrumental help) that people perceive, express, and gain from human interactions [1]. Therefore, it is crucial to understand the importance of SM in order to develop strategies that reinforce the connection between SM and the younger generation. In fact, the young ones have also been found to more easily discuss sensitive issues online, rather than face-to-face. Somehow, cyberspace may become the real world for some users, to a point that the most trustful and closest friends are the virtual ones, instead of the physical ones [12]. Adolescence is a complicated phase, where individuals start building their social identities and feel the necessity of belonging to a group. Studies agree that digital natives may feel lonely if they can no longer rely on the group they used to be a part of, while they are also struggling with social changes [1]; besides, online communication is an attractive environment for shy, anxious, or depressive young people [11]. Unfortunately, this fraction has the highest rates of mental health problems and distress compared to any other age group, and a fairly small percentage look for professional help; this scenario only paved the way for the digital natives to get online mental health support [12]. Secondly, we have the millennials (people born in or after 1982, and to the turn of the century), which also are part of the digital revolution, highly connected and technologically advanced, totally open to embrace new concepts and habits [7]. As
expected, in this segment, a considerable percentage (approximately 88–90%) of millennials use SM and a high percentage (between 85 and 99%) of students use Facebook, some spending approximately 9 h to 12 h each day online, instead of doing other things [13, 14]. Furthermore, [13] suggests that difficulties with emotion regulation predict media use, while scholars have found a relation between emotion regulation problems and the problematic use of SNSs, affirming that the use of Facebook could be negatively reinforced. Millennials have a special way of communicating, expressing their feelings and emotions through technology. Research suggests university students are more likely to have SM problems, compared with other users [9]; in fact, the time spent by students on Facebook and Instagram is rather high [3], and drawing students attention is a challenge, since they direct all their energy and focus to online platforms; therefore, to attract their attention in class using the same digital based tools could be the appropriate solution [7]. The challenge is to understand how SM’s practices affect the millennials, and how this generation can be taught and be prepared for the future. Some scholars support that SM can have a positive influence on their academic performance, as [14] stated the use of Facebook for education is fascinating for students, since SNSs enable self-disclosure and contribute to higher levels of learning and motivation; besides, it has a positive effect on their health and confidence, increasing happiness and probably reducing student depression. However, some authors defend that social network use can depress people [14]; for example, when practicing self-disclosure, they show a happy character, while inside they are experiencing bad emotions such as being devastated or angry [13]. Furthermore, “Facebook users dedicate more time to the platform and less to their studies, resulting in lower grades” [9]. Regarding this issue, it is mentioned that students with a high Facebook engagement are more active in university activities, but the use of this media can decrease student concentration in class [14]. The point is SM can be a powerful tool to disseminate important resources, connect teachers and students, and to straighten real-world relationships, while supporting the learning environment or harming it, depending on the student’s goals. “Students highlighted that besides social media use, time management is a factor which affects students negatively”; SM distracts students, directing their concentration to activities that do not add value to their academic life, such as chatting or watching videos that are not educational, so it is all about making choices [15]. Lastly, for the older adults, which are part of an era of low technology use, SM have also conquered a rather large space. As age-related physical and cognitive limitations create barriers in maintaining social activities with friends, family, and their community, SM can provide a means to overcome such obstacles for older adults [16]. In general, adults use less technology than teenagers or younger adults. According to [16], even though this generation has been the most likely to use SNSs, in other age groups there have been shown higher rates of growth in recent years. 
The research suggests that older adults who felt comfortable using SM were inclined to use Facebook; according to [16], they felt more connected to family than to colleagues – as adults tend to value relationships, this could be a perfect way to bond with friends and family, while the users tend to perceive less stress in real life regarding social interactions. The motivation for using SM differs, depending on the age sector; for instance,
as stated in [17], social isolation and loneliness tend to grow with time, and to counter that, old people bet on an escape through SM. Remarkably enough, the literature suggests strong social networks help manage stress, reduce depression and improve health. Social networks can be empowering, since people get old and their network tends to decrease, and they allow older adults to feel less isolated, better informed and more socially connected [18]. As mentioned above, Facebook is a powerful tool to reinforce social networking, and it is famous in the adult layer. A survey made in [17] with a sample of people aged between 35 to 54 showed that there are three main reasons why adults love Facebook: interpersonal habitual entertainment, passing the time, and self-expression. The adults under the age of 50 were mainly online to stay in touch with friends, while adults over 50 valued being connected to family; although “older adults were less likely to use technology in general than younger adults, individuals who felt more confident in their ability to use computers were more likely to use technology than those who felt less confident” [16].
3 Methodology This paper resulted from a desire to understand how SM affect and influence people’s lives, and to a certain point to understand if they are advantageous or not. Are social media people’s salvation or their perdition? To achieve this aim, we adopted two methodological approaches: a quantitative and a qualitative one [11]. In the beginning, we had some informal conversations with our friends, colleagues, and teacher, in order to organize and filter our ideas and know exactly what we wanted to conclude about SM; then, a literature review was performed in order to provide a theoretical approach to the topic [19]. These topics were searched in scientific databases such as Scopus, Springer Link and Google Scholar using keywords/sentences related to the subject, such as “Social Media”, “Facebook”, “Millennials in SM”; afterwards, the abstracts that seemed to be connected to the paper’s topic were read, some of them were selected and downloaded, and we started to create an article sheet. A survey questionnaire based on the information gathered in the literature review was developed [19]. The group unanimously decided to collect primary data, in order to have real-life elements about how people perceive their use of SM; furthermore, we could underline some conclusions about the study. The questionnaire was performed, and we had 358 responses. The structure shows that from a total of 30 questions, the first 6 addressed demographic and personal information, such as age segment, gender, literacy qualification, nationality and country they currently have lived in for more than a year. The rest of the questions were based around their habits, opinions and practices regarding SM. The questions were organized following a multichoice structure, using rating scales, and short and “yes or no” answers. The questions were pre-tested and validated several times by our mentor and the workgroup before the final questionnaire was developed. It is important to assert that since the questionnaire was performed online, we could cover people from different parts of the world, and naturally, it was shared on SM channels, where, gratefully, it was well-received by the respondents. In fact, one of the authors of this paper is from
Mozambique, and, as we completely agree with [3], SM is expected to serve as a tool for overcoming existing geographic and social barriers. The questionnaire reached the target. We started to send it to our friends and family using online platforms (WhatsApp, Facebook and e-mail), who then started sharing it with other friends, and so the bubble came up and turned viral. It is important to affirm that this research could not have happened without SM versatility and functions which support that; even though the study was performed in Portugal, we still could have realistic data from Mozambique, Brazil, and other countries. Thanks to this digital era, countries, continents, and people are more connected. Therefore, what truly motivated us was the fact that we have found a lot of studies in this area, with different and important points of view, and we would like to contribute by providing an overview about SM presence in our lives from the perspective of people’s beliefs and habits. To achieve this goal, the results of the questionnaire are presented, starting with the respondents’ identity, such as age, occupation and gender. Then, specific points are shown in three main parts, the first related to the respondents’ behaviour in SM use, the second gives an insight into the opinions and practices by each age group, and the last is related to the free space for comments and personal opinions of the respondents.
4 Results and Discussion This study is based on a sample of 358 people, bundled in five main age sectors: 10–17, 18–30, 31–45, 46–65 and +66 years old (the last segment was ignored, bearing in mind that it only held one person), thus we could trace a pattern for the other generations. Unsurprisingly, more than 50% of the responses came from women, and we got more answers from students (51,1%), followed by 22.3% coming from employed people, 15.1% from student workers, 7.8% from independent workers, and the rest from retired and unemployed people. Most of the answers came from Mozambicans (234 out of 358) and the Portuguese (87 out of 358); additionally, 176 of the respondents live in Mozambique and 135 in Portugal, which are the nationalities of the authors of this article. The survey circled around people’s SM, and we also had data from Angola, Brazil, and other countries that were not considered, as the percentages did not show to be significant. The 30 questions were designed to answer and to draw a pattern for people’s SM uses, which are shown in the following topics. 4.1
Human Behaviour
Dependence: 54.7% of the respondents consider themselves as moderated users, and 33% said they were always online. Furthermore, 74,9% stated they stay offline only because they have tasks to do, 15.1% are mostly offline, and 10.1% are always online and cannot miss anything. In addition, 49.7% of the respondents disagreed when they were asked to answer if SM brings benefits to their lives, 21,5% did not agree or disagree and 21,5% agreed. A closer look: 198 people have SM because it is pleasant to see what other people are doing and what is happening around them; 131 people only use it for academic and
professional needs; 17 do not identify themselves with SM; and 12 said they are totally dependent on it. About 72,3% of the respondents have their phones on silence mode in classes or at work, 21.8% have the ringtone on, and 66,5% of them do not interrupt their tasks when the phone rings, while 33,5% stop doing whatever they are doing to check the notifications. Motivation: 281 people use SM to keep updated about what is happening in the world, 54 to not feel lonely, 54 to share happy moments in their lives, 45 just because everyone uses it, and 31 because of their addiction to it. The future: when the respondents were invited to imagine themselves in the next 5 years, 85.5% of them said they would be moderated SM users, 9.5% independent, and 5% completely dependent. Additionally, following the hypothesis: “How do you see the future of SM? For instance, Facebook was the most used SM platform; today, it is less popular with the younger generation. Do you believe this could be a tendency and we could go back to a traditional approach, since SM use could decrease to a stage where people could be fed up and abandon it?”, 108 people completely disagreed with this possibility, 99 did not agree or disagree and 74 disagreed. 4.2
Age Sector Opinions
Table 1 shows the percentages of the respondents’ opinions divided by the four age groups analysed, followed by conclusions. From Table 1 we can see that, for the four age sectors, we have different points of view about SM; below, we would like to remark on some important points. For instance, adolescents use more than four SM channels (33%), followed by the young adults (20%), while adults and older adults commonly use between two and four SM platforms. Question number 2 confirms what the literature shows; even though Facebook is the market leader, only 1.1% of adolescents and 14,7% of young adults use it, it being the favourite for the oldest generations (20.4% of adults, 26.7% of older adults). Unsurprisingly, when we asked the participants if they were afraid to feel excluded if they did not use any SM, 58% of the respondents between 46–65 years old completely disagreed – for them, the fear of missing out is not that strong. The opposite is for the adolescents, who completely agree that they are afraid to be set aside, as they feel a deep need to belong and strive to be accepted by the world. Additionally, question 4 shows that 45% of adolescents believe it is easier to make new friends online, instead of face-to-face. According to question 5, 73% of young adults do not believe in a world without SM, and, surprisingly, neither do 62% of older adults. Despite adolescents being part of the digital era, 79% of them believe in a world without SM, which is contradictory, since they are the largest SM user group. In the last question, 58% of the people (46–65 years old) responded they do not post daily because they prefer to keep their lives private, while the majority percentage of the others are more open-minded and share what they find interesting (55% - adolescents, 66% - young adults, 50% - adults); all the respondents stated that they do not like to share their routines or feelings, which may be because they are worried about their privacy and security.
Table 1. Percentages of answers in different age sectors (columns: 10–17 Adolescents | 18–30 Young Adults | 31–45 Adults | 46–65 Older Adults)

1 - How many social media platforms do you use?
2 to 4: 57% | 74% | 73% | 78%
More than 4: 33% | 20% | 9% | –
One: 10% | 6% | 17% | 22%

2 - On which social media platform do you spend more time?
Facebook: 1% | 15% | 20% | 27%
Instagram: 30% | 27% | 6% | 2%
WhatsApp: 35% | 32% | 52% | 49%

3 - Are you afraid of feeling excluded if you don't use any social media?
Completely disagree: 43% | 45% | 45% | 58%
Disagree: 19% | 20% | 27% | 19%
Neither agree nor disagree (NAD): 24% | 23% | 20% | 19%
Agree: 7% | 7% | 5% | 4%
Completely agree: 7% | 5% | 3% | –

4 - When you need to make new friends, is it easier for you to do it in real life than online?
Completely disagree: 45% | 54% | 69% | 77%
Disagree: 29% | 20% | 14% | 12%
NAD: 12% | 13% | 13% | 12%
Agree: 5% | 5% | 5% | –
Completely agree: 10% | 8% | – | –

5 - Do you believe it's possible to live in a world without social media?
Yes: 79% | 27% | 61% | 39%
No: 21% | 73% | 39% | 62%

6 - Do you post daily or share your day with your friends?
No, I prefer to keep my life private: 45% | 28% | 47% | 58%
More or less; I share what I find interesting: 55% | 66% | 50% | 38%
Yes, I like to share with my followers what's happening; it is a way to feel closer to them: – | 5% | 3% | 4%
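A percentage table of this kind can be derived directly from the raw answers; the short sketch below shows one way to do it, assuming the responses are loaded into a data frame with hypothetical column names (age_group and q1) that are not the variable names actually used by the authors.

import pandas as pd

# Hypothetical raw responses; in the study these came from the online survey.
responses = pd.DataFrame({
    "age_group": ["10-17", "18-30", "18-30", "31-45", "46-65", "10-17"],
    "q1": ["2 to 4", "2 to 4", "More than 4", "2 to 4", "One", "More than 4"],
})

# A row-normalised crosstab gives, per age group, the share of each answer,
# i.e. the kind of percentages reported in Table 1.
percentages = pd.crosstab(responses["age_group"], responses["q1"],
                          normalize="index") * 100
print(percentages.round(1))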
4.3 Comments About Social Media in the Four Age Sectors
Finally, in this section the respondents were offered the possibility to express their thoughts about SM, thus obtaining different and interesting perspectives on the issue from the different respondent age sectors (please see Table 2). SM is a tool that has a reach across all age groups in our sample.

Table 2. Comments about social media in the four age sectors
10–17: "For me, social networks have a big positive as well as negative impact, but nevertheless, I use them to be able to better inform myself and have fun when I have no other occupation"
18–30: "Although I am constantly online (contactable), I don't consider myself addicted to social media. I use them mostly to catch up with the world (as I watch little television and don't read newspapers and feel the need to stay informed) and to exchange messages with colleagues and friends, or for professional or personal reasons"
31–45: "I use it to keep up with news from around the world, to do research, talk to some family members and friends I don't have the opportunity to see regularly, and to consult with parents and guardians of my children in a special group. But I do not post my daily moments constantly for fear of my privacy and security being at risk"
46–65: "Social media is an important tool to establish contact with my family and friends that live far away. It allows us to belong to a work or a family group"
5 Conclusion and Suggestions for Future Research

As defined in [7], times have changed, in large part due to new technologies which have appeared, altering how we communicate and relate to each other. As cited in [20], people previously used to spend their time on work, study and other obligatory activities, but today they are taken by storm by the temptation of SM. With this study, the authors tried to analyse SM usage by age sectors, focusing on the dependence, the reasons/motivations for people to have social media accounts, and how different generations perceive the future of social media. In terms of media choice patterns, the authors can conclude from the survey carried out that, across all the age sectors, there was a transversal predominance of people (260) admitting they use between 2 and 4 SM platforms, with 65 people using more than 4 and 33 using only one. Currently, most of the respondents consider themselves moderate users (54.7%), followed by the ones who recognize they are online all the time (33%). A clear generation gap was also noticed between those under 30 and those over 30 years old. Even though all the respondents say they spend most of their time on WhatsApp, for the age sector 10–17 the time on WhatsApp is very close to the time spent on Instagram, which also happens with the age sector 18–30. However, for those between 31–45 and 46–65, the time spent on WhatsApp is followed by the time spent on Facebook.
Furthermore, the data shows that 49% of the respondents neither agree or disagree that they are addicted to SM, which allows us to conclude that people are not aware about their use of SM, they have not noticed how much time they spend online, and 74.9% said that they are offline only because they have tasks to do – if it were not for that, they would probably be scrolling their touch screens. Undoubtedly, SM has both positive and negative outcomes, and to answer the question “Social Media: People’s salvation or their perdition?” is quite hard; the questionnaire and literature shows SM as a reality now, and it is up to the user to make it either their salvation or their perdition. Social media is important; it came with the aim of making our lives easier, more pleasant and much more well-informed, but it also holds the ugly side of the coin, their weaknesses being addiction, issues regarding health problems, security and wellbeing. The authors of this study strongly believe that with SM having gained so much importance lately, future research should be conducted in order to better understand how advantageous or disadvantageous it can be for each and every one of us. Research in this area has focused mostly on young adults, adults and adolescents, since they are the real digital natives. However, there is a lack of research that investigates the use of Facebook and social networking by older audiences [19], probably because fewer older adults use the Internet than their younger counterparts; therefore, studies concerning the influence of SM for adults over 50 are limited. Furthermore, how SM influences the new generation, how they should take its good parts and be aware about the risks, that is also a field that needs our attention. For the older generations such as adults, older adults and seniors, there should also be studies on how SM can contribute to their professional, social and personal lives, as well as providing solutions to how they can ensure their privacy and security while being online. Finally, we deeply hope this paper can be useful for future work in this field. Acknowledgements. We could not have conducted this research without our survey respondents, to whom we would like to address a special thank-you for their time, suggestions and collaboration, and everything these 358 people have done to contribute to this paper.
References 1. Gentina, E., Chen, R.: Digital natives coping with loneliness: facebook or face-to-face? Inf. Manag. 56, 103–138 (2019) 2. Tyrer, C.: Beyond social chit chat? Analysing the social practice of a mobile messaging service on a higher education teacher development course. Tyer Int. J. Educ. (2019) 3. Singh, A.: Mining of Social Media data of University students. Educ. Inf. Technol. 22, 1515–1526 (2017) 4. Schlosser, A.E.: Self-disclosure versus self-presentation on social media. Curr. Opin. Psychol. 31, 1–6 (2020). John, L., Tamir, D., Slepian, M. (eds.) 5. Wartberg, L., Kriston, L., Thomasius, R.: Internet gaming disorder and problematic social media use in a representative sample of German adolescents: prevalence estimates, comorbid depressive symptoms and related psychosocial aspects. Comput. Hum. Behav. 103, 31–36 (2020)
6. Rosenberg, D., Mano, R., Mesch, G.S.: Absolute monopoly, areas of control or democracy? Examining gender differences in health participation on social media. Comput. Hum. Behav. 102, 166–171 (2020) 7. Au-Yong-Oliveira, M., Gonçalves, R., Martins, J., Branco, F.: The social impact of technology on millennials and consequences for higher education and leadership. Telematics Inform. 35, 954–963 (2018) 8. Scha, J., Kra, N.C.: Mastering the challenge of balancing self-disclosure and privacy in social media. Curr. Opin. Pyschol. 31, 67–71 (2020) 9. Whelan, E., Islam, A.K.M.N., Brooks, S.: Applying the SOBC paradigm to explain how social media overload affects academic performance. Comput. Educ. 143, 103692 (2020) 10. Chen, Z., Yuan, M.: Psychology of word of mouth marketing. Curr. Opin. Psychol. 31, 7–10 (2020) 11. Mccrae, N., Gettings, S., Purssell, E.: Social media and depressive symptoms in childhood and adolescence: a systematic review. Adolesc. Res. Rev. 2, 315–330 (2017) 12. Gibson, K., Trnka, S.: Young people’s priorities for support on social media: “It takes trust to talk about these issues”. Comput. Hum. Behav. 102, 238–247 (2020) 13. Rasmussen, E.E., Punyanunt-carter, N., Lafreniere, J.R., Norman, M.S., Kimball, T.G.: The serially mediated relationship between emerging adults social media use and mental wellbeing. Comput. Hum. Behav. 102, 206–213 (2020) 14. Kaya, T., Bicen, H.: The effects of social media on students’ behaviors; facebook as a case study. Comput. Hum. Behav. 59, 374–379 (2016) 15. Alwagait, E., Shahzad, B., Alim, S.: Impact of social media usage on students academic performance in Saudi Arabia. Comput. Hum. Behav. 51, 1092–1097 (2015) 16. Bell, C., Fausset, C., Farmer, S., Nyugen, J., Harley, L., Fain, W.B.: Examining social media use among older adults, pp. 158–163 (2013) 17. Hämmig, O.: Health risks associated with social isolation in general and in young, middle and old age. PLoS One 14, 0219663 (2019) 18. Hogeboom, D.L., McDermott, R.J, Perrin, K.M., Osman, H., Bell-Ellison, B.A.: Internet use and social networking among middle aged and older (2010) 19. Ribeiro, B., Gonçalves, C., Pereira F., Pereira, G., Santos, J., Gonçalves, R., Au-YongOliveira, M.: Digital bubbles: living in accordance with personalized seclusions and their effect on critical thinking. In: WorldCIST’. AISC, vol. 932, pp. 463–471 (2019) 20. Du, J., Koningsbruggen, G., Kerkhof, P.: Spontaneous approach reaction towards social media cues. Comput. Hum. Behav. 103, 101–108 (2020)
Artificial Intelligence Applied to Digital Marketing

Tiago Ribeiro (1) and José Luís Reis (1,2)

(1) IPAM, Portuguese Institute of Marketing, Porto, Portugal
[email protected]
(2) ISMAI, Maia University Institute, Research Units UNICES/CEDTUR/CETRAD, Maia, Portugal
[email protected]
Abstract. Based on the theory that both manual and cognitive tasks can be replaced by Artificial Intelligence, this study explores, using a qualitative research method, the impact of Artificial Intelligence (AI) on Digital Marketing. An analysis of interviews with 15 experts from different industries related to Marketing and AI shows that AI has an impact on Marketing processes and that this impact will grow in the future. The study reinforces that many of the manual and repetitive tasks of a marketer's life can already be replaced by AI, and that the use of machines working together with humans is the key to better marketing results. The challenges and ethical aspects that lead to slow or non-adoption of AI have been addressed, and one of the major obstacles is that humans are not yet confident in the technology and are not yet ready for this cultural change. Based on these findings, business decision-makers and managers need to prepare their companies and employees for the implementation of AI in Marketing.

Keywords: Artificial Intelligence · Marketing · Digital Marketing · Machine learning · Integration
1 Introduction

Artificial Intelligence is integrated into our lives, although many people are unaware of its presence. This misconception is evident from the fact that 50% of respondents to the PRNewswire (2018) consumer awareness study state that they have never interacted with AI technologies, and 23% are unsure whether they have ever interacted with AI technology. There are many examples of AI operating in the background of most modern technologies (smartphones, computers, smart TVs, etc.), revealing an apparent lack of knowledge about what consumers think AI is and how AI is applied daily [1]. This paper presents the results of an exploratory study with a qualitative methodology, based on 15 interviews with specialists, which provided a better understanding of the impact of AI on digital marketing. The article presents the main aspects related to Artificial Intelligence and Digital Marketing, the methodology used, the analysis and discussion of the research results and, finally, the study conclusions.
2 Artificial Intelligence and Digital Marketing

AI is present in the daily lives of people and businesses; examples include the voice recognition, image recognition and handwriting suggestions available on today's smartphones [2]. Kietzmann, Paschen and Treen (2018) report that there are AI systems that are very useful to marketers for deepening the understanding of consumer decision-making [3], of which the following should be highlighted.

2.1 Artificial Intelligence
According to Russell and Norvig (2016), Artificial Intelligence refers to computerized systems that capture data to perform tasks of intelligent beings in order to maximize their chances of success [4]. Strong AI (Artificial General Intelligence) is a machine with consciousness and mind, with intelligence in more than one specific area; Weak AI (Narrow AI) focuses on specific tasks (autonomous cars derive from Narrow AI) [5]. In addition, some authors hypothesize that computers may become better or smarter than humans, which would call for a new term, Artificial Super Intelligence, but for now this is hypothetical [6]. According to Rosenberg (2018), based on the Constellation study and looking at investment in all sectors of the market, over 100 billion euros per year will be invested in Artificial Intelligence in 2025, while in 2015 only 2 billion was spent. The Marketing industry will be no exception, and there will be increasing investment in AI [7]. From McKinsey & Company's analysis of more than 400 AI use cases in 19 industries and 9 business functions, Chui et al. (2018) found that the greatest impact on the potential value of AI use is in marketing and sales, supply chain management and production. Consumer industries, such as retail and high tech, tend to see more potential in AI applications in marketing and sales because frequent digital interactions between companies and customers generate larger datasets for AI techniques. E-commerce platforms can benefit from AI because of the ease with which these platforms collect customer information, such as click data or time spent on a website page, and can customize promotions, pricing and products for each customer dynamically and in real time. The study's use cases show that using customer data to customize promotions, for example through daily individual offer personalization, can lead to a substantial increase in sales [8].

2.2 Natural Language Processing (NLP)
Natural Language Processing (NLP) enables AI systems to analyze the nuances of human language and extract meaning from, among other sources, blog entries, product reviews, billions of daily tweets and Facebook posts. Swedbank, a Swedish bank, uses a virtual assistant with NLP to answer customer queries on its website's home page, allowing customer service employees to focus more on sales without sacrificing service [3].
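As a minimal illustration of how NLP-style analysis can pull meaning out of consumer text, the Python sketch below scores short posts against a tiny hand-made sentiment lexicon; both the lexicon and the posts are invented for illustration, and production systems such as the virtual assistant mentioned above rely on far richer language models.

# Toy sentiment scoring over consumer posts; lexicon and posts are invented.
POSITIVE = {"love", "great", "fast", "helpful", "recommend"}
NEGATIVE = {"slow", "broken", "refund", "disappointed", "worst"}

def sentiment_score(text: str) -> int:
    # Count positive words minus negative words after a crude tokenisation.
    words = text.lower().replace(",", " ").replace(".", " ").split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

posts = [
    "Love the new app, support was fast and helpful",
    "Checkout is broken again, worst update ever, I want a refund",
]
for post in posts:
    print(sentiment_score(post), post)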
2.3 Image and Voice Recognition
Image recognition helps marketers understand the images and videos that people share on social networks, which "show" consumer behavior. Consumers identify details about the offerings pictured in an image, and marketers benefit from the details of the consumption context. Selfies reveal the brands used, even when not explicitly mentioned in the publication, as well as personal details of the users. When a celebrity shares a photo with an unidentified product, image recognition recognizes both the product and a potential social media influencer [9]. San Diego-based Cloverleaf uses image recognition on its smart shelf display platform: equipped with optical sensors, the display collects customer demographics such as age and gender and analyzes shoppers' faces to gauge their emotional reaction to the product. The closer consumers are, the more personalized the content will be [3, 9]. Speech recognition allows AI to analyze the meaning of spoken words. Sayint, a call center service provider, uses voice recognition to monitor and analyze customer calls; the technology helps Sayint understand customer needs, improve caller performance and increase customer satisfaction [10].
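A minimal sketch of the underlying mechanism follows, using a generic pretrained classifier as a stand-in for the brand- and demographic-recognition models described above; the file name is hypothetical, the ImageNet labels are only a proxy for marketing-relevant categories, and the snippet assumes torchvision 0.13 or later is installed.

import torch
from PIL import Image
from torchvision import models

# Generic pretrained classifier as a stand-in for a marketing-specific
# image-recognition model (e.g. brand or product detection in consumer photos).
weights = models.ResNet50_Weights.DEFAULT
model = models.resnet50(weights=weights)
model.eval()
preprocess = weights.transforms()

image = Image.open("social_post.jpg").convert("RGB")  # hypothetical consumer photo
with torch.no_grad():
    probabilities = model(preprocess(image).unsqueeze(0)).softmax(dim=1)[0]

top5 = torch.topk(probabilities, 5)
for score, idx in zip(top5.values, top5.indices):
    print(f"{weights.meta['categories'][idx]}: {score:.2%}")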
Problem Solving and Reasoning
Marketers implement AI to understand hidden insights into consumer-generated content, narrowly defining the problem they want to solve and how they will approach data analysis. These core processes generate pattern detection in the data, improving the ability to predict future behavior. Marketers may want to segment their market based on the varying psychography of their customer base, possibly to determine who their “best” customers are and why those customers would buy their offers against competitors. The personality traits that are important in people’s lives eventually become part of their language [10]. AI can “reason” with comments and posts on people’s social networks, and can reveal personality trends, values and needs. AI-based profiles derived from consumer analysis may be relevant to future marketing decisions. North Face, using IBM Watson, uses AI to determine which jackets consumers may be interested in, based on available data. The system begins by asking where, when, and what activities the consumer will be wearing the jacket and based on the weather forecast for that location and the wearer’s gender, narrows the search to six options. Based on activity, rearranges alternatives from “high match” to “low match”. This will save the wearer time by avoiding hundreds of jacket options, many of which would not even meet your functional needs. This is a way to increase the quality of the customer experience throughout its decision-making journey [3, 10]. 2.5
2.5 Machine Learning
Machine learning is a subcategory of AI that uses computer programs to learn and improve through experience by processing huge amounts of data. It is the fastest-growing form of AI and the primary AI resource for marketers. By detecting patterns in data, machine learning systems can "reason" and propose the best options
for the stated consumer needs, more efficiently than humans. In addition, the system remembers everything that was previously computed, storing it in a knowledge base, and uses machine learning to learn from its previous experiences and problem solving (Big Data). The more unstructured data a machine learning system processes, the smarter and more insightful the subsequent results for marketers. Just as a bank without a database cannot compete with one that has one, a company without machine learning (an AI subcategory) cannot keep up with another that makes use of it. While experts in the former write thousands of rules to predict what customers want, the latter's algorithms learn billions of rules, an entire set of them for each customer. Machine learning is a new and bold technology, but companies do not adopt it for that reason; they adopt it because, given the benefits the technology offers, they have no real choice [11]. Marketers use machine learning to monitor consumer behavior. They develop algorithms to discover websites visited, emails opened, downloads, clicks, and so on. They can also analyze how the user behaves across channels: which accounts they follow, posts they like, ads they interact with, etc. [12]. Depending on the study and industry, acquiring a new consumer is between 5 and 25 times more expensive than keeping an existing one: no time and resources are wasted looking for a new customer, and the focus is simply on keeping the existing customer satisfied. Machine learning, through predictive models, can help predict Customer Lifetime Value (CLV), and through clustering models it can make targeting more accurate, fast and effective. CLV is the value of all of a customer's interactions with the company over time. By focusing on CLV, brands attract the most valuable customers, encourage continued engagement, and increase audience retention. By analyzing patterns and learning from data about past consumer behavior, machine learning can predict the future value of a customer. Such a system can also make predictions such as consumer retention rates, that is, the percentage of consumers who stay with (or leave) a business over a certain period, or how long a user spends on a landing page. Machine learning gives the marketer information that anticipates possible customer abandonment, so marketers can use strategies to keep those customers interested in the brand [13].
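A hedged sketch of this retention use case follows: a predictive model estimating which customers are likely to churn, trained on synthetic behavioural features created purely for illustration.

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1_000
X = np.column_stack([
    rng.integers(0, 50, n),      # purchases in the last year
    rng.uniform(0, 300, n),      # average order value
    rng.integers(0, 365, n),     # days since last visit
])
# Synthetic target: long inactivity plus few purchases marks a churned customer.
y = ((X[:, 2] > 180) & (X[:, 0] < 10)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier().fit(X_train, y_train)
churn_risk = model.predict_proba(X_test)[:, 1]   # per-customer churn probability
print("AUC:", round(roc_auc_score(y_test, churn_risk), 3))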
3 Methodology
This work is an exploratory and descriptive study on a specific theme. The methodology that supports the research is qualitative and, above all, descriptive. Based on the context of the AI tools applied in marketing, presented in the previous sections, this study analyzes the topic from the perspective of the people who work with AI, although consumers remain central and structuring figures in the research because of their constant relationship with these tools [14]. As this is an exploratory and descriptive study, the intent is to understand the strategies of companies that use AI, the benefits, the challenges presented, the ethical issues, and the impact that these practices are having on companies' income. It is considered relevant to understand which elements are essential for the successful implementation of an AI strategy in Marketing, as this research aims to be a
contribution to companies and a supporting document for the implementation of a successful AI strategy in marketing. The first part of the study provided the theoretical underpinnings based on secondary information from scholarly articles, journals, reports and books. In the second part, primary data collection was performed and analyzed together with the theoretical bases.
3.1 Research Objectives
The research developed made it possible to understand the strategies of companies that use AI, the benefits, the challenges presented, the ethical issues, and the impact that these practices are having on companies' income. It was considered relevant to understand which elements are essential for the successful implementation of an AI strategy in Marketing. This research work is a contribution for companies to support the implementation of a successful AI strategy in Marketing.
3.2 General and Specific Objectives
The purpose of this study is to understand the current situation of Artificial Intelligence in Marketing, analyzing how AI currently impacts Marketing and the impact it will have in the future. The specific objectives of this work are as follows:
– Identify the key benefits of implementing AI in Marketing.
– Understand the key challenges and ethical aspects of integrating AI in Marketing.
– Assess how companies are using AI in Marketing, what AI applications are used for and what problems they solve.
– Check whether Small and Medium Enterprises (SMEs) are able to integrate AI into Marketing.
– Understand the impact AI has on marketing today, and the impact it will have in the future.
3.3 Interview Data Collection
To collect primary data to meet the objectives, interviews were conducted as a qualitative study method. The semi-structured interview script was not rigid, and the answers were open. The questions asked were based on the knowledge obtained during the literature review. The specialists were chosen through contacts via LinkedIn or by contacting companies directly. The respondents' profiles include computer science professionals, data scientists, consultants and marketers. Notes were taken during the conversations with the experts, the essential content was extracted and summarized, and it was then analyzed according to the research objectives. The evaluation and discussion of the results was guided by the defined research questions and the literature review. Table 1 shows the specialists' profiles, with information about their country of origin, professional area and organization, and their acronym.
Table 1. Experts interviewed

Name | Country | Prof. area and organization | Acronym
Mark Floisand | UK | Chief Marketing Officer at Coveo | Exp 1
Stephanie Ogando | Brazil | Marketing Analyst at Alfonsin | Exp 2
Peter Mahoney | USA | Marketing Intelligence Consultant at Plannuh | Exp 3
Paul Rotzer | USA | AI Marketing Author/Consultant at Marketing Artificial Intelligence Institute | Exp 4
Bernardo Nunes | Brazil | Data Scientist/AI Marketing Consultant at Growth Tribe | Exp 5
Katie King | UK | AI Marketing Author/Consultant | Exp 6
Christopher Penn | USA | Data Scientist/Digital Marketer at Trust Insights | Exp 7
Jim Sterne | USA | AI Marketing Author/Researcher at Digital Analytics Association | Exp 8
Patricia Lorenzino | Brazil | Head of Strategic Alliances - IA at IBM | Exp 9
Nuno Teixeira | Portugal | University Professor/AI & BI Consultant at ISCTE-IUL | Exp 10
Alex Mari | Switzerland | Researcher/Consultant at University of Zurich | Exp 11
Sergio Lopez | Bolivia | AI Marketing/Marketer Consultant at AIMA | Exp 12
Kevin Kuhn | Switzerland | AI Marketing Consultant at Jaywalker Digital | Exp 13
Alexander Avanth | Philippines | Director of Innovation at PTC Holdings | Exp 14
Tilak Shrivastava | India | Senior Marketing Manager at Ityx Solutions and ThinkOwl | Exp 15
4 Analysis and Discussion of Results
After collecting the data obtained from the interviews with the specialists, the data were described and analyzed. The analysis was structured according to the research objectives. First, the benefits of integrating AI into Marketing cited by respondents are presented and compared with data gathered from the literature review. Next, the factors that influence the slow integration or non-integration of AI in Marketing are collected and described. The analysis then shows how companies are using AI in their marketing strategies and whether SMEs are able to integrate AI into their Marketing processes, and finally an analysis is made of the impact that AI has on marketing costs and revenues.
4.1 Benefits of AI Integration in Marketing
The main expected benefits are lower costs and higher revenues. AI delivers benefits in acceleration and faster results, in accuracy and better results, and in relief, reducing tasks that are not a good use of people's time (Exp 2, Exp 4, Exp 7). Machines can identify and solve certain problems faster than humans, and can do so better and on a much larger scale. A human can try to read 10 000 social networking posts in five minutes, but certainly will not manage it. The machine can reduce and remove repetitive or unimportant tasks from marketers' lives; for example, a report that would take a marketer about eight hours can be done by a machine in eight minutes. In this way, repetitive task costs can be reduced and marketers can be directed to tasks that are more about creativity, strategy, and decision making (Exp 4, Exp 7, Exp 8). AI's main advantages in Marketing are: sales development through customization, greater process effectiveness and greater efficiency in marketing investment allocation. Marketers no longer need to focus on segmentation, behavioral analysis and consumer journeys; AI will "filter out" huge volumes of data and feed insights that can effectively make a difference to the business (Exp 5, Exp 9, Exp 10, Exp 13). AI's integration into marketing produces benefits for consumers (relevance, convenience, consumer experience) and for enterprises/marketers (predicting consumer behavior, anticipating consumer trends, hyper-personalizing content). At the operational level, AI offers the opportunity, through process automation and optimization, to increase the efficiency and effectiveness of company strategy and the quality of people's work (Exp 11). AI enables the marketing team to deliver a personalized user experience without being overly intrusive. Artificial Intelligence already enables marketers to optimize websites by customizing them for different users, for example by offering them personalized messages and distinctive designs based on their profile and needs. AI will give organizations across all industries the ability to rebuild personal relationships with their customers. Data provides powerful insight into customers' current needs as well as valuable information about their future needs (Exp 6, Exp 14).
4.2 Challenges and Ethical Aspects of Integrating AI in Marketing
With all the benefits that come through AI, questions and problems also arise. In recent years, according to the respondents, marketers have wondered how marketing can deliver value without being too intrusive (externally) and how marketing can reshape and empower people within companies (internally) to work in this logic. An AI strategy can only be effective when there is strong technical (technology, data, processes) and organizational (people, capacity, culture) capability. Failure to achieve this may result in poor performance, even if the company is working with partner companies in some of its AI activities (Exp 10, Exp 11, Exp 12). The first aspect mentioned in the interviews, and one mentioned by virtually all respondents, is trust. Citizens must understand the value of the data they generate (digital footprints) and understand what brands can do with these digital footprints. AI is a relatively new and complex technology, meaning that the general public (and even technical employees who are unfamiliar with AI) may be suspicious of it. Consumers
need to be aware of how companies and governments acquire and use data to determine user behavior, such as purchases, recommendations, and voting decisions. Ethics and digital privacy (the General Data Protection Regulation - GDPR) are a concern of individuals, organizations and governments. People will be increasingly concerned about how their personal information is used by organizations in the public and private sectors. For there to be confidence in the technology, companies will need to proactively address these issues. Transparency can do much to increase consumer confidence in AI. Explaining how Artificial Intelligence algorithms use customer data to make their decisions (when, how, and where the customer provided that data) helps to build confidence (Exp 1, Exp 3, Exp 4, Exp 5, Exp 7, Exp 9, Exp 10, Exp 11, Exp 12, Exp 13, Exp 14, Exp 15). Another of the most mentioned aspects is data quality and what companies do with data. Many companies have no idea where data is generated and what they can do with it; they do not have data that provides a single view of customers and is properly validated and sanctioned by the company. For AI to be successful, it requires large data sets. However, most large companies have a lot of data locked in the various marketing systems they already use. The key is to be able to connect to these systems, use this data and unify it - once data is unified around individual customer profiles, AI can tailor campaigns and marketing experiences specifically for each individual (Exp 3, Exp 7, Exp 10, Exp 12, Exp 13, Exp 14).
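As a small illustration of what "unifying data around individual customer profiles" can look like in practice, the sketch below joins records from hypothetical marketing systems on a shared customer id using pandas; the tables and column names are assumptions, not data from the interviews.

import pandas as pd

crm = pd.DataFrame({"customer_id": [1, 2, 3],
                    "consented_to_marketing": [True, True, False]})
web = pd.DataFrame({"customer_id": [1, 1, 2],
                    "page": ["pricing", "blog", "pricing"],
                    "seconds_on_page": [42, 120, 15]})
email = pd.DataFrame({"customer_id": [1, 2],
                      "emails_opened_30d": [5, 0]})

# Aggregate the web clickstream per customer, then join everything into a
# single profile table with one row per customer.
web_agg = (web.groupby("customer_id", as_index=False)["seconds_on_page"].sum()
              .rename(columns={"seconds_on_page": "web_seconds_30d"}))
profile = (crm.merge(web_agg, on="customer_id", how="left")
              .merge(email, on="customer_id", how="left")
              .fillna(0))
print(profile)   # ready for segmentation or scoring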
4.3 AI Applications in Marketing
The largest use of AI in Marketing is through machine learning. In the past, the brute force of computational power was used and every movement had to be explicitly defined, but with machine learning the algorithms learn by themselves. Machine learning is an important underlying AI technology used to create models that can identify patterns in complex data sets. Since marketing is largely about personalizing content, the best techniques are based on it (Exp 3, Exp 4, Exp 5, Exp 7, Exp 11). Analyzing the respondents' answers, the main uses of AI are predictive models, clustering and recommendation systems. Predictive models are used to predict and anticipate consumer movements and behaviors along the stages of the customer journey, to lower dropout rates, identify factors of customer dissatisfaction, manage the best customers, and prioritize business. Clustering models use unsupervised algorithms to do segmentation, that is, they calculate how similar one client is to another and put them in the same cluster when there are similarities. These models improve the customer attraction process: they automate the process, identify audiences and similar targets, and enable marketing spend to be optimized by segmenting, predicting, and identifying segments more efficiently. They are used to perform more accurate, fast and effective segmentation and targeting. These are the must-have models, meaning that virtually every company should have them today (Exp 5, Exp 7, Exp 10, Exp 15).
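A minimal sketch of the clustering use case is shown below: k-means grouping customers into segments over simple behavioural features. The data are synthetic and the feature choice is an assumption, not a recommendation from the interviews.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
customers = np.column_stack([
    rng.gamma(2.0, 50.0, 500),    # yearly spend
    rng.integers(1, 40, 500),     # number of orders
    rng.integers(0, 365, 500),    # days since last order
])

# Standardize features so no single scale dominates, then cluster.
features = StandardScaler().fit_transform(customers)
segments = KMeans(n_clusters=4, n_init=10, random_state=1).fit_predict(features)

for label in range(4):
    members = customers[segments == label]
    print(f"segment {label}: {len(members)} customers, "
          f"mean spend {members[:, 0].mean():.0f}")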
4.4 Capacity of SMEs to Integrate AI in Marketing
There are two possibilities: companies can choose to develop and run their own AI marketing solutions, or they can use AI-based marketing tools developed by other companies. In the past, personalization was very expensive, but in recent years it has become cheaper thanks to machine learning algorithms. Building a model is cheaper because universities and programmers make these algorithms available as open source, and computing power is more affordable. Formerly, university servers were needed to train algorithm models; now, with the Google, Amazon.com and Microsoft data clouds accessible to any company, servers can be used to train models without spending a lot of money. Even the most advanced computer vision models are inexpensive because the major AI-based companies (Google, IBM, Facebook, Amazon.com and Microsoft) have turned these models into cognitive services. These companies provide AI tools, some even automatic: the user sets the target variable and the available data to be related to it, and the process runs automatically and is made available through the cloud. As this gets cheaper, companies are expected to use it more and more. If a company has the time and knowledgeable human resources, it can build its own AI solutions without much technology spending. The greatest difficulty will always be the time spent and the ability to retain qualified human resources. Companies should create their own solutions if time and qualified human resources are available; if they want faster results and have the money to invest, they should choose to use tools from other companies (Exp 5, Exp 7, Exp 10). In the opinion of Experts 4, 8, 11 and 12, SMEs should always buy rather than build. They should not hire a team of data scientists and engineers: the costs will be very high, and such professionals are hard to find. Instead, it is preferable to use the machine learning tools being incorporated into systems such as Adobe Sensei, Salesforce Einstein or Shopify, and to keep an eye on the tools being created by startups. There are many tools that solve specific problems. In addition to the number of solutions, marketing technology is also growing in sophistication as smart algorithms become essential to these services (Exp 4, Exp 8, Exp 11, Exp 12). SMEs should rethink their marketing strategies and adopt marketing technologies that integrate AI solutions that can deliver high value without significant upfront investment and, most importantly, without requiring a huge amount of individual-level data (Exp 3, Exp 8, Exp 11, Exp 15). Small and midsize businesses will mostly use marketing software that fills a business need such as lead generation, email marketing, search engine optimization or online chat. With AI, SMEs can find smarter tools that use Artificial Intelligence in their solutions. So most SMEs have to look at the technologies they use today and see whether there are smarter ways to do each of these things, ensuring that they are using the smartest tools available to reduce their business costs and increase revenue (Exp 4, Exp 12).
4.5 Impact on Marketing Costs and Revenues After AI Integration
Initially, implementing AI in marketing will have a big impact on the business, until the company figures out what works best and what the best solution is for the problems it has defined. But once that is done, the next steps will be easier and
less expensive, because the company will have its quality data and can more easily develop new solutions (Exp 12). For most marketers, AI does not change the level of marketing spend; it simply improves the performance of marketing efforts. It enables marketers to be more efficient, and it also allows brands to be more selective about the content they produce, helping them prioritize the content that is most valuable to their visitors. Most companies maintain the same volume of marketing expenses but increase the accuracy of their marketing efforts by being more targeted, faster and more effective, thereby delivering better results (Exp 1, Exp 4, Exp 5, Exp 9, Exp 10, Exp 15). With a well-implemented AI-based approach there will be cost savings, optimization and increased ROI. As the Boston Consulting Group and MIT Sloan Management Review report found, companies that customize their communications can increase their revenues by up to 20% and reduce costs by up to 30%, and one of the main technologies used in this process is Artificial Intelligence [13]. Rumelt (2011) defines three fundamental steps of a good strategy: the diagnosis, where the business strategy is evaluated; the guiding policy, where the challenges related to governance, culture and ethics are understood; and the coherent action plan, covering aspects such as resource allocation, implementation, buy/build decisions, processes, talent development/hiring/retention, and change management related to the company's culture [15].
5 Conclusions
From the data obtained both from the consulted studies and from the interviews carried out within the context of this work, it is concluded that AI will have an increasing impact on the future of marketing and that even SMEs can implement AI. Companies that currently conduct marketing activities without AI-based solutions must be prepared for change. Building capability for a successful AI strategy in Marketing can only be effective when there is strong technical (technology, data and processes) and organizational (people, skills and culture) capability. The first step in any AI Marketing strategy is to review the company's business and communication strategy. Once the company's business and communication strategy are clear, the best use cases should be identified to help the company achieve its objectives; that is, which problems does the company want to solve that, with the help of AI, can further the company's strategic goals. In the implementation phase, the company needs to think about how to turn its artificial intelligence strategy into reality. Companies need to understand how AI projects will be delivered, who is responsible for each action, and which actions/projects will need external support. Companies must consider what technology is required to achieve their AI priorities, and they must understand and define whether it is best for their business objectives to have an AI team within the company or to use solutions designed by other companies.
5.1 Limitations of the Study
Since this is an exploratory study, there is a certain degree of description in the analysis of the results. Although a qualitative methodology is not the best for generalizing results, the analysis intended in this study was better served by a qualitative approach. The study could have been carried out with a different methodological approach, but it would not have been possible to understand the reasons behind the results so well. It is important to note that the convenience sampling used in the interview process influences the reliability of the results, because if other interviewees had been chosen, the answers could have been different. In addition, it should be considered that this paper draws on statements from only 15 respondents, which makes it difficult to say with certainty that the results of this research are comprehensive and complete. Also, because of the small sample size, it is not possible to project a perspective that accurately reflects the wider reality. However, as the sample was diversified across various professional areas, it is believed that data quality was assured.
5.2 Future Work
To better understand the impact of AI on business and on marketing, this study should be complemented with testimonies from business managers, allowing a more conclusive picture of AI's impact. On the other hand, as shown throughout this paper, the target client is always present, and in future work it is important to understand the impact on their lives.
References 1. PRNewswire, Despite the Buzz, Consumers Lack Awareness of the Broad Capabilities of AI (2018). https://www.prnewswire.com/news-releases/despite-the-buzz-consumers-lackaware ness-of-the-broad-capabilities-of-ai-300458237.html. Accessed 12 Apr 2019 2. Makridakis, S.: The forthcoming Artificial Intelligence (AI) revolution: its impact on society and firms. Futures 90, 46–60 (2017). https://doi.org/10.1016/j.futures.2017.03.006 3. Kietzmann, J., Paschen, J., Treen, E.: Artificial intelligence in advertising: how marketers can leverage artificial intelligence along the consumer journey. J. Advertising Res. 58(3), 263–267 (2018). https://doi.org/10.2501/JAR-2018-035 4. Russell, S.J., Norvig, P.: Artificial Intelligence: A Modern Approach. Pearson Education Limited, London (2016). https://doi.org/10.1016/j.artint.2011.01.005 5. Siau, K.L., Yang, Y.: Impact of artificial intelligence, robotics, and machine learning on sales and marketing. In: Twelve Annual Midwest Association for Information Systems Conference, pp. 18–19 (2017) 6. Eden, A., Steinhart, E., Pearce, D., Moor, J.: Singularity Hypotheses: An Overview. Springer, Heidelberg (2012). http://dx.doi.org/10.1007/978-3-642-32560-1_1 7. Rosenberg, D.: How marketers can start integrating AI in their work. Harvard Bus. Rev. (2018) 8. Chui, M., Manyika, J., Miremadi, M., Henke, N., Chung, R., Nel, P., Malhotra, S.: Notes from the AI Frontier: Insights from Hundred Uses of Cases. McKinsey & Company (2018)
9. Ramaswamy, S.: How companies are already using AI. Harvard Bus. Rev. 14, 2017 (2017) 10. Ransbotham, S., Gerbert, P., Reeves, M., Kiron, D., Spira, M.: Artificial intelligence in business gets real. MIT Sloan Manag. Rev. 60280 (2018) 11. Domingos, P.: The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World. Penguin Books LDA, London (2015) 12. Beaudin, L., Downey, S., Hartsoe, A., Renaud, C., Voorhees, J.: Breaking the marketing mold with machine learning. MIT Technol. Rev. Insights (2018) 13. Gallo, A.: The value of keeping the right customers. Harvard Bus. Rev. 29 (2014) 14. Severino, A.J.: Metodologia do trabalho científico. Cortez Editora (2007) 15. Rumelt, R.: Good Strategy, Bad Strategy. Profile Books, London (2011)
Fact-Check Spreading Behavior in Twitter: A Qualitative Profile for False-Claim News
Francisco S. Marcondes1(B), José João Almeida1, Dalila Durães1,2, and Paulo Novais1
ALGORITMI Centre—Department of Informatics, University of Minho, Braga, Portugal [email protected], {jj,pjon}@di.uminho.pt 2 CIICESI, ESTG, Polytechnic Institute of Porto, Felgueiras, Portugal [email protected]
Abstract. Fact-check spread is usually performed by a plain tweet with just the link. Since this is not proper human behavior, it may cause an uncanny feeling, hinder the reader's attention and harm the counter-propaganda influence. This paper presents a profile of fact-check link spread in Twitter (suiting TRL-1) and, as an additional outcome, proposes a preliminary behavior design based on it (suiting TRL-2). The underlying hypothesis is that, by simulating human-like behavior, a bot gets more attention and exerts more influence on its followers. Keywords: Chatbot · Social agent · Fake news · Fact check · Social media
1 Introduction
The spread of fake news on Twitter is commonplace [4]. Despite the existing efforts for identifying and debunking fake-news, initiatives for enlightening people about facts are still scarce in social media [8]. This, however, is a cornerstone, since it is probably the best-suited counter-propaganda strategy [15]. In other words, spam-fighting strategies such as detecting and stopping bots suit automatic spreading but do not restrain organic spreading. Therefore, efforts such as those of "guardians", people who fight against misinformation in social media [13], are extremely valuable. However, good-will-based guardians cannot face professional and highly technological propaganda structures [15]; for illustration, an estimated proportion of true-news to fake-news is 1:17 [12]. This paper's objective is to profile the fact-check spreading behavior in Twitter for the purpose of proposing a viable design hypothesis for an automatic fact-check spreading device (called automatic guardian or a-guardian) to be tested in the near future. The key contribution is the raised profile, as it clarifies this type of behavior and provides some background and insights for automation.
State of the Art. Fake-news spreading is based on computational propaganda. Propaganda is fake-news with a political or financial end; computational propaganda is propaganda carried out by bots and bot-nets [17]. The measures adopted to hinder propaganda are called counter-propaganda; carrying them out with computer support is called computational counter-propaganda [15]. The most effective counter-propaganda strategy is enlightening people with the truth. There are two dimensions to be considered: the first is related to the content and the second to the spreading strategy. Content-related counter-propaganda is performed by debunking false-claims with correct information (this is being done by fact-check agencies). This debunking information must reach people, so a spreading strategy must be devised. Nevertheless, there are constraints to be honored: counter-propaganda cannot use deceptive strategies such as bot-nets with several fake accounts, i.e. it must be ethical. This second dimension, however, is not being tackled [8]. An alternative to bot-nets is to take "digital influencers" as a reference. Digital influencers are people who succeed in influencing people through the content published in their communication channels. Such behavior is analogous to the "super-spreaders" found in the core of bot-nets [11]. Actually, knowing a profile to be a bot does not change its influence potential [6]; unconsciously yet consistently, humans interact with computers as social actors [10]. Yet this still requires at least simple relationship-building features [2].
Methodology. This paper follows the Design Science approach adapted for researching programming. In short, the Ω-knowledge is developed following the TRL steps and the Λ-knowledge is driven by intuition.¹ This paper, as a profile, is aiming at TRL-1 (basic principles, or the Ω-knowledge). Different methods were applied according to the research needs: Sect. 2 uses a systematic review, Sects. 2.2, 2.3 and 2.4 are based on a case study strategy, and Sect. 2.5 is the result of a bibliography survey. Finally, Sect. 3 presents a preliminary yet coherent model and design hypothesis for TRL-2 (basic concept, or the Λ-knowledge) based on the gathered information.
¹ Intuition has been controversial because some authors oppose it to reason; however, intuition is the highest skill level, attained when a skill is internalized and no longer requires extensive conscious reasoning. Applied to problem-solving, it is the ability to see a solution beforehand.
2 Profiling the Fight Against Fake-News
2.1 Virtuous-Bots Landscape
A survey evaluating the use of bots in fighting fake-news was presented in [8], covering the time-frame 2015–2017. To include the period up to July 2019, a quick "re-survey" was carried out by repeating that paper's parameters (the query submitted to Scopus was TITLE-ABS-KEY((chat* OR social OR conversation* OR dialogue) W/0 (system OR agent OR bot) AND (fake-news OR fake news OR misinformation))). The same results hold: there are still few research efforts
concerned with fact-check spreading [8, 13]. Another survey, this time directed towards Twitter in August 2019, was carried out to identify profiles that present themselves as bots and aim to fight misinformation. The query search?q=bot fact fake &src=typed query &f=user retrieved 3 results, of which 1 is currently active. @unfakingnews is a cyborg account working mostly by re-tweeting information from credited sources. In turn, the query search?q=bot fact &src=typed query &f=user retrieved 49 results; 19 were classified as "not applicable" due to not being bots or not being active. The remaining were classified as NLP-based bots (4), nonsense bots (9), link-spreading or retweeting bots (5), and fun-facts bots (11). Therefore, 5 out of 49 accounts match the intended behavior of fighting misinformation by spreading factual information, yet only two are active, @skeptics bot and @TXLegeFactBot, both for link spreading. It was not possible to find virtuous bots replying to other users' tweets to warn about a fake-news item. This may be due to the @DroptheIBot effect. Roughly, @DroptheIBot was a bot that, whenever someone wrote something like "illegal immigrant", replied with a politically correct suggestion for an alternative. This profile was reported by many people and, in the end, suspended by Twitter [3]. Somehow it degenerated into a subverted yet virtuous spam engine.
2.2 A Fact-Check Report Profile
For this paper, a sample of 50 fact-check reports and 7360 related tweets (excluding repetitions), within a time-frame ranging from September 2008 to July 2019, was fetched on July 20, 2019. The sample consists of "trending" fact-check reports collected from the Hot 50 page of Snopes' website (www.snopes.com), whose links were queried in Twitter to collect fact-check link spread tweets, cf. [13]. Snopes uses labels to classify its fact-check reports according to their subject and ratings to classify the truth level of an analyzed claim. This helps in understanding how tweets linked to fact-check reports concentrate. The reports classified as NONE, even being significant, were excluded from this paper. Figure 1a shows that the most viewed reports are those debunking false claims. Figure 1c shows that tweets linked with false-claims are the most numerous yet well distributed among the sampled reports. On the other hand, Fig. 1b reveals the most popular subjects to be junk-news, politics and photos; however, as shown in Fig. 1d, the highest tweet concentration was about politics. The same data were also collected one month before and four months after the presented sample, suggesting similar results. Therefore, it is safe to suggest that false-claim debunking on political subjects is a popular topic for Snopes' readers both on the website and on Twitter.
Fig. 1. a) Fact-check reports classified by claim rating; b) fact-check reports classified by subject labels; c) tweets spreading fact-check report links classified by the analyzed claim rating; and d) tweets spreading fact-check report links classified by the report's subject labels. As a remark, a fact-check report may receive several subject labels, therefore the total presented in figure (b) is greater than the size of the sample. The none rating and label are used for uncategorized reports from Snopes.
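For illustration, the kind of tally behind Fig. 1 can be computed as in the sketch below; the miniature data frame and its column names are assumptions, not the actual Snopes/Twitter sample.

import pandas as pd

tweets = pd.DataFrame({
    "report_id": [101, 101, 102, 103, 103, 103],
    "rating":    ["false", "false", "true", "false", "false", "false"],
    "label":     ["politics", "politics", "junk-news", "photos", "photos", "photos"],
})

# Reports per rating (each report counted once) versus tweets per rating/label.
reports_per_rating = tweets.drop_duplicates("report_id")["rating"].value_counts()
tweets_per_rating = tweets["rating"].value_counts()
tweets_per_label = tweets["label"].value_counts()

print(reports_per_rating, tweets_per_rating, tweets_per_label, sep="\n\n")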
2.3 A Fact-Check Spreading Profile
False-claim debunking is the rating with the most widespread interest distribution. For the other ratings, the common behavior is that one or two reports receive massive attention (measured in number of tweets) whereas the rest are neglected. In this sense, false-claim reports are well suited for relationship building and for working on the counter-propaganda effort at once. Therefore, in order to increase effectiveness, this paper focuses on fact-check spread for false-claims. The sample for false-claims is composed of 1987 tweets, almost one third of the total sample, scattered across 1483 profiles, a proportion of 1:1.4, suggesting an absence of bots. These tweets were split into two groups, one for tweets sent on a profile's own time-line (872, 44%) and another for tweets sent in reply to another profile's tweet (1115, 56%). Each group was again split (Jaccard > 0.5) into tweets composed almost entirely of the fact-check link (respectively 420 and 544, 48% and 49%) and tweets using the fact-check to support an argumentation (respectively 452 and 571, 52% and 51%). Note that few fake-news links are tweeted as replies [11]. Refer to Table 1 for a summary. Two main behaviors were identified for direct link spreading: one is to insert a link within a tweet and the other is to share a fact-check from an external
source (such as the agency website). External source sharing is usually accompanied by common words such as "fact-check", "via", etc., whereas directly inserting a link produces a tweet that is just the link supported by Twitter's thumbnail feature. For this sample, external source sharing prevailed for direct tweets (respectively 119 and 301, 14% and 36%) and link insertion for replies (539 and 5, 48% and 1%). Opinion tweets, in turn, can be divided into major groups: warning, informing, explaining, commenting and judging. Warning includes tweets like "This is false, debunked a long while ago. Please see ..." and judging includes moral opinion such as "you people are absolute garbage".
Table 1. Tweets spreading fact-checks for false-claim tweets.
Group | Total | Only Link | Almost title | Opinion | Replied | Not replied
Direct | 872 (44%) | 119 (13.6%) | 301 (34.5%) | 452 (51.8%) | – | –
Reply | 1115 (56%) | 539 (48.3%) | 5 (0.4%) | 571 (51.2%) | – | –
Total | 1987 | 658 (33.1%) | 306 (15.4%) | 1023 (51.5%) | 602 (30.3%) | 1385 (69.7%)
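The link-only versus opinion split above relies on a Jaccard threshold; the sketch below shows one way such a comparison could be computed, assuming the report title and URL as the reference text. The tokenization and the 0.5 threshold are simplifications, not necessarily the authors' exact procedure.

import re

def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def is_almost_link(tweet_text: str, report_title: str, url: str) -> bool:
    # Tweets whose words largely overlap the report title plus its link are
    # treated as "almost only link"; the rest are opinion tweets.
    reference = tokens(report_title) | tokens(url)
    return jaccard(tokens(tweet_text), reference) > 0.5

title = "Did a celebrity share a fake quote about vaccines?"
url = "https://www.snopes.com/fact-check/celebrity-fake-quote"
print(is_almost_link(f"{title} {url}", title, url))                                 # True
print(is_almost_link(f"Stop lying. This was debunked ages ago {url}", title, url))  # False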
There is, however, a quality difference between direct and reply tweets. Usually, direct tweets are directed towards an "abstract entity", so the tone sounds conciliatory. Replying tweets, on the other hand, may sound quite aggressive, as criticism is usually directed at actual people, such as whoever authored or shared the false-claim. For instance, an impersonal "stop lying" is not shocking, but when mentioning or replying to someone it may be considered hostile. Therefore, even with similar structures, text generation for opinion tweets is not straightforward. A caution is not to reproduce TAY's misbehavior (becoming an offender shortly after it started learning from social media [16]). Fake-news is quite a sensitive matter; even a deterministic ALICE-style bot commenting on fake-news, and eventually mentioning or replying to people, may easily become troublesome. For this sample, 602 (30%) tweets received a reply and 1385 (70%) did not; most of the replies did not receive any further reply, either from the original author or from a third person. In addition, non-opinion tweets are responsible for about 50% of the link spreading activity. Therefore, considering the risks shown by TAY and @DroptheIBot, further research on commenting and replying is required and these subjects were postponed.
2.4 A Profile for the Human Behavior
Observing the sampled bots and some fact-check agencies, it can be seen that they tweet fact-checks as plain tweets, whereas humans usually include around four words expressing an emotion. Within Internet text messages, emoji, letter capitalization and the number of dots are usually related to expressing emotions. By simulating human behavior, it is more likely for a bot to exert influence within its followers [6]; therefore an "aseptic" false-claim fact-check spread may sound artificial and fall into the uncanny valley [9]. In this situation it is more likely for a tweet to create rejection than to call attention and invite a
careful reading. Therefore, properly simulating human behavior may be a strategic cornerstone in fighting fake-news. For profiling and enabling proper human-like text generation, the sampled phrases of direct only-link tweets, excluding repetitions, are presented in Fig. 2.
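A small sketch of how such human cues could be quantified is given below; the feature set (words around the link, capitalization, punctuation runs, emoji) follows the description above, while the regular expressions and the example tweets are assumptions.

import re

EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")

def snippet_profile(tweet: str, link: str) -> dict:
    # Remove the link and describe what remains of the tweet.
    text = tweet.replace(link, " ").strip()
    words = text.split()
    alpha_words = re.findall(r"[A-Za-z]+", text)
    return {
        "words_around_link": len(words),
        "has_all_caps_word": any(w.isupper() and len(w) > 2 for w in alpha_words),
        "exclamation_marks": tweet.count("!"),
        "dot_runs": len(re.findall(r"\.{2,}", tweet)),
        "emoji": len(EMOJI.findall(tweet)),
    }

link = "https://www.snopes.com/fact-check/example"
print(snippet_profile(f"DEBUNKED!!!!! {link}", link))
print(snippet_profile(link, link))   # a bot-like plain tweet: zero human cues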
2.5 The Enemy Behavior
Fake-news spreading bots are designed with two major concerns. One is to efficiently spread fake-news and the other is to avoid social-media surveillance. Therefore, the presented behavior for these bots overlaps those concerns into a coherent behavior. For instance, following circadian rhythms helps in avoiding surveillance [4] whereas choosing a centered or inflammatory attitude is related to a bot’s effectiveness [7]. ‘#FakeNews’, ‘#repeal2A’, ‘#stopsharingfakenews’, ‘6-29-19’, ‘Actually no’, ‘Another hoax...’, ‘COPY AND PASTE!’, ‘DEBUNKED!!!!!’, ‘FACT CHECK . . . ’, ‘FAKE QUOTE’, ‘FALSE’, ‘FALSE!!!!!’, ‘False’, ‘False quote’, ‘False...’, ‘False/hoax’, ‘Gone viral yet again:’, ‘HOAX!!!’, ‘Hoax!’, ‘Hooks! Sorry.’, ‘Its Fake . . . ’, ‘Just an fyi’, ‘Just. Stop.’, ‘Kayleena: not true’, ‘Listen Up!’, ‘NOT DEAD!’, ‘NOT TRUE . . . ’, ‘Nah . . . ’, ‘No one got herpes’, ‘No.’, ‘PSA’, ‘y’all:’, ‘Please read...’, ‘Please... Just stop...‘, ‘Saw that coming...’, ‘Stop falling for hoaxes’, ‘Stop lying.’, ‘Terry Crowl’, ‘The Fonz is ok.’, ‘The answer is no’, ‘This is 2019 af’, ‘Viga Hall Please note’, ‘Worth reading and watching:’, ‘. . . ’, ‘× False.’, ‘ ’.
Fig. 2. Text snippets used within the tweet data-set.
The strategy is often based on "super-spreaders" within the core of a bot-net, which post the same fake-news up to thousands of times while targeting, through replies and mentions, users with many followers (this is called amplification) [11]. This is especially common for the early spread (before organic sharing becomes prevalent). The expectation is to trigger a rumor cascade (people retweeting or re-claiming a fake-news) [14] by exploiting cognitive biases such as information overload, confirmation bias, motivated reasoning, social trust, etc. Notably, a human is more likely to spread fake-news than true-news [14]. Since social media profiles become more influential after a popularity threshold [1], another common strategy is to start the deception only after such a threshold is reached. For this strategy, proper cultural and psycho-biological behavior plays a major role [6]. Bots are tailored for targeting these triggers and exerting influence [11].
3 A Feature Design for the A-GUARDIAN
The basic principles considered for this paper were discussed in Sect. 2. In summary, Sect. 2.1 found that, on the one hand, there is little computational counter-propaganda research and there are few applications for social media; moreover, the few instances found are only plain link-spreading bots (probably due to the @DroptheIBot effect). The result of Sect. 2.2 is that false-claim debunking with a political subject is a popular topic for Snopes' readers. For Sect. 2.3, since reports about
false-claims have a better interest distribution than other ratings, this paper focused on this rating. Due to the difficulties of text generation and of replying on sensitive matters such as fake-news, as shown by @DroptheIBot and TAY, opinion and reply text generation require further research and are postponed to another paper. For direct tweet spread, sharing from external sources prevailed. Section 2.4 discussed the need for a bot to earn trust and popularity to exert influence, which requires relationship building and avoiding the uncanny valley; it also presented a set of human tweet samples for discussion. The two major concerns for fake-news bots discussed in Sect. 2.5 are to avoid social media surveillance and to spread fake-news effectively. This is achieved by simulating human behavior, setting text generation towards a target audience, and sending the same fake-news several times for amplification, aiming to trigger rumor cascades (the @DroptheIBot effect is not a concern for fake-news bots).
3.1 Basic Concepts
The surveyed and sampled data presented in Sect. 2 were summarized as the requirement list presented in Table 2. Requirements are the way that programming science distinguishes a research design from a regular design: a regular design requirement is the result of a negotiation, whereas a research design requirement is not. The result is then evaluated according to how far the design is from an ideal requirement (or design description). Therefore, this proposal is not for a full social bot but just for one feature: direct almost-link fact-check sharing for debunking reports about false-claims. Although it requires a logged-in Twitter account to run, this feature is not for a Twitter bot. This is because the data showed that most human direct link spreading is performed through external sharing sources; therefore like, reply and retweet behaviors are out of scope for this feature.
Table 2. Feature and associated requirement list. Feature: direct almost-link false-claims debunking fact-check sharing.
# | Requirement | Line
1 | Given a list of fact-check reports to share (feed-list), the a-guardian shares each report through the agency's sharing feature | 1.4
2 | Given a data-set with a desired tone, the a-guardian generates a text snippet to be inserted with the link in the tweet | 1.5
3 | Given a list of target profiles and a feed-list, the a-guardian selects who to mention within a tweet (it could be none) | 1.6
4 | Given fact-check reports, the a-guardian selects those that compose the feed-list for the next iteration | 1.2
5 | After a share-event, the a-guardian sleeps for a certain time | 1.7
6 | After a sharing iteration, the a-guardian sleeps for a certain time | 1.9
Algorithm 1. The a-guardian prototype algorithm description.
Require: assertTrue[(social media, logged in), (factCheckList, updated)]
Require: [factCheckList ∧ snippetDataset ∧ targetProfiles] ≠ ∅
1: loop
2:   feed ← handler.chooseFeed(factCheckList)
3:   for all report in feed do
4:     media.share(report,
5:       handler.buildSnippet(snippetDataset),
6:       handler.sortMentions(targetProfiles))
7:     thread.sleep(handler.sleepingTimeFor('share'))
8:   end for
9:   thread.sleep(handler.sleepingTimeFor('iteration'))
10: end loop
Ensure: ∀ feed ∈ feedList ⊂ bot's timeline
Ensure: assertTrue(social media, logged out)
The sharing feature for the requirements in Table 2 is presented in Algorithm 1. This, like any bot, is a thread within a continuous loop that is eventually put to sleep. The proposed algorithm, given the raised profile, is quite straightforward; however, these simple measures draw the bot nearer to human behavior compared to average bots, as shown in [4]. The preconditions that snippetDataset and targetProfiles cannot be empty mean that it is not possible for this bot to spread information without generating snippets and mentioning people. In short, this strategy is embedded in the bot's essence, which in turn is based on the raised profile. The proposed feature is composed of four operations, placed into a generic Handler class, to fulfill its responsibility:
– chooseFeed(factCheckList) – This function aims to accomplish requirement #4; however, data must still be collected to understand repetition and spacing. As a proportion, the politics topic should compose at least 25% of a feed-list.
– buildSnippet(snippetDataset) – see Snippet Generation.
– sortMentions(targetProfiles) – This function aims to accomplish requirement #3; however, there are still data to be collected for understanding the mentioning patterns. The target profiles are those with a larger number of followers.
– sleepingTimeFor(evt=['share', 'iteration']) – This function aims to accomplish requirements #5 and #6; however, more data should be gathered to understand the mean time between shares and the sharing rhythm in a circadian sense.
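A minimal Python rendering of Algorithm 1 is sketched below, assuming hypothetical media and handler objects; it illustrates the control flow and placeholder strategies only, and is not the authors' implementation.

import random
import time

def run_a_guardian(media, handler, fact_check_list, snippet_dataset, target_profiles):
    # Preconditions mirror the Require clauses of Algorithm 1.
    assert fact_check_list and snippet_dataset and target_profiles
    while True:                                        # continuous bot loop
        feed = handler.choose_feed(fact_check_list)    # requirement #4
        for report in feed:
            media.share(
                report,
                handler.build_snippet(snippet_dataset),    # requirement #2
                handler.sort_mentions(target_profiles),    # requirement #3
            )
            time.sleep(handler.sleeping_time_for("share"))      # requirement #5
        time.sleep(handler.sleeping_time_for("iteration"))      # requirement #6

class NaiveHandler:
    """Placeholder strategies; the profile data needed to tune them is future work."""
    def choose_feed(self, reports):
        return random.sample(reports, k=min(3, len(reports)))
    def build_snippet(self, dataset):
        return random.choice(dataset)
    def sort_mentions(self, profiles):
        return random.sample(profiles, k=random.randint(0, 1))
    def sleeping_time_for(self, event):
        return random.uniform(60, 300) if event == "share" else random.uniform(3600, 14400)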
Snippet Generation. This discusses the buildSnippet(snippetDataset) function, which accomplishes requirement #2. The aim is to exert influence, which requires relationship building as a target; therefore, a proper human-like snippet generation strategy is a cornerstone. As presented in Fig. 2, the sampled tweets are loose phrases yet loaded with sentiment. Handling short texts is a difficult task due to the small amount of information to work with [5]; this is even worse for snippets, since they are even more reduced texts that may or may not be
within proper lexical-syntactic structures and are often filled with onomatopoeia. Given these constraints, a caution is to avoid overtechnology.² For a design approach, a text generator may follow either a deterministic or a statistical approach. Based on Fig. 2, the statistical approach would favor textual patterns, requiring a supervised learning procedure to keep making sense (and to avoid a TAY-like misbehavior); the deterministic approach would favor diversity by randomizing template structures, yet requiring a CBR procedure for evolving. Also, the snippet generator can either simulate an individual profile or a general profile; the statistical approach suits the first and the deterministic the second. This paper aims at the second, since the sampled snippets were not linked with specific profiles. The grammar presented in Fig. 3 was generated based on patterns extracted from Fig. 2. A suitable grammar-based mechanism for generating these snippets is to raffle the number of words or punctuation marks to be included before and after the link (up to four); then, for each word, it raffles the word's case and, for each punctuation mark, the amount to be used. Some tweet instances: NAH afff, Hoax #StopSharingFakenews, .... !!, Please please !!!! and DEBUNKED. The sentiment interpretation of a snippet is left to the tweet reader.
² Overtechnology is an anti-pattern similar to over-engineering or overuse of patterns, and related to bleeding edge and gold plating. It is the act of designing an artifact to be more "technological" than necessary for its intended use, often due to marketing purposes or technological obfuscation.
1. [{ } | it's | the answer is | again | actually | another] ({false | hoax | not true | no | fake | debunked | lie | fact-check} | nah);
2. [in | this is] ;
3. [{please} | just | y'all | worth | {.} | {!}] ({read | stop | listen up} | copy and paste);
4. and [{.} | {!} | { } | a{f} | #FakeNews | #StopSharingFakenews] for tweet ending.
Fig. 3. Text snippet grammar derived from the snippet data present in Fig. 2.
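A possible rendering of this raffle-based generation is sketched below; the word pool paraphrases the snippets of Fig. 2 and the probabilities are arbitrary assumptions, so it illustrates the mechanism rather than reproducing the exact grammar of Fig. 3.

import random

WORDS = ["false", "hoax", "not true", "debunked", "fake", "please read",
         "just stop", "listen up", "nah", "#FakeNews", "#StopSharingFakenews"]
PUNCTUATION = [".", "!"]

def random_case(word: str) -> str:
    # Raffle the case of each word, as described in the text.
    return random.choice([word.lower(), word.upper(), word.capitalize()])

def generate_snippet(link: str) -> str:
    # Up to four items are drawn for each side of the link.
    before, after = [], []
    for bucket in (before, after):
        for _ in range(random.randint(0, 4)):
            if random.random() < 0.7:
                bucket.append(random_case(random.choice(WORDS)))
            else:
                bucket.append(random.choice(PUNCTUATION) * random.randint(1, 5))
    return " ".join(before + [link] + after).strip()

random.seed(7)
for _ in range(3):
    print(generate_snippet("https://www.snopes.com/fact-check/example"))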
4 Conclusion and Research Hypothesis
Fake-news spreading bots have been using several approaches for deception in social media as part of the propaganda effort. Counter-propaganda cannot use most of these tactics if it is to remain ethical. Despite some shared approaches, because of their different aims, computational propaganda and counter-propaganda must have distinct strategies for achieving their goals. The design proposal presented in Algorithm 1 is based on the profile presented in Sect. 2 and, as a result, it draws nearer to human behavior considering the properties presented in [4]. In short, the expectation is that Algorithm 1, by simulating human behavior, allows people to "recognize" the bot as a "peer", creating a social connection that enhances the counter-propaganda effort.
The hypothesis is that a fact-check spreading bot gets more attention and exerts more influence on its followers by simulating human-like behavior, compared to the plain "only link" tweeting currently widespread among fact-check agencies and true-news spreading bots. Algorithm 1 summarizes this paper's findings. For future work, it is expected to gather the missing information in order to design the lacking operations and provide a complete description of such behavior. In addition, it is expected to implement the proposed design in order to collect data to better understand the dynamics of fact-check spreading in Twitter and eventually improve the bot's effectiveness.
Acknowledgments. This work has been supported by national funds through FCT - Fundação para a Ciência e Tecnologia within the Project Scope: UID/CEC/00319/2019.
References 1. Aiello, L.M., Deplano, M., Schifanella, R., Ruffo, G.: People are strange when you’re a stranger: impact and influence of bots on social networks. In: Sixth International AAAI Conference on Weblogs and Social Media (2012) 2. Bickmore, T.W., Picard, R.W.: Establishing and maintaining long-term humancomputer relationships. ACM Trans. Comput.-Hum. Interact. 12, 293–327 (2005) 3. Brooker, P.: My unexpectedly militant bots: a case for programming-as-socialscience. Sociol. Rev. 67(6), 1228–1248 (2019) 4. Ferrara, E., Varol, O., Davis, C., et al.: The rise of social bots. CACM 59, 7 (2016) 5. Jingling, Z., Huiyun, Z., Baojiang, C.: Sentence similarity based on semantic vector model. In: Proceedings of the 2014 Ninth International Conference on P2P, Parallel, Grid, Cloud and Internet Computing (Washington, DC, USA), 3PGCIC 2014, pp. 499–503. IEEE Computer Society (2014) 6. Lucas, G.M., Boberg, J., Traum, D., Artstein, R., Gratch, J., Gainer, A., Johnson, E., Leuski, A., Nakano, M.: Culture, errors, and rapport-building dialogue in social agents. In: Proceedings of the 18th International Conference on Intelligent Virtual Agents (New York, NY, USA), IVA 2018, pp. 51–58. Association for Computing Machinery (2018) 7. Luceri, L., Deb, A., Badawy, A., Ferrara, E.: Red bots do it better: comparative analysis of social bot partisan behavior. In: Companion Proceedings of the 2019 World Wide Web Conference, pp. 1007–1012. ACM (2019) 8. Marcondes, F.S., Almeida, J.J., Novais, P.: A short survey on chatbot technology: failure in raising the state of the art. In: International Symposium on Distributed Computing and Artificial Intelligence, pp. 28–36. Springer, Heidelberg (2019) 9. Mori, M., MacDorman, K.F., Kageki, N.: The uncanny valley [from the field]. IEEE Robot. Autom. Mag. 19(2), 98–100 (2012) 10. Nass, C., Steuer, J., Tauber, E.R.: Computers are social actors. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 72–78. ACM (1994) 11. Shao, C., Ciampaglia, G.L., Varol, O., Yang, K.-C., Flammini, A., Menczer, F.: The spread of low-credibility content by social bots. Nat. Commun. 9(1), 4787 (2018)
12. Shao, C., Hui, P., Wang, L., Jiang, X., Flammini, A., Menczer, F., Ciampaglia, G.: Anatomy of an online misinformation network. PLoS One 13, e0196087 (2018) 13. Vo, N., Lee, K.: The rise of guardians: fact-checking URL recommendation to combat fake news. In: The 41st International ACM SIGIR Conference on Research (New York, NY, USA), SIGIR 2018, vol. 38, pp. 275–284. ACM (2018) 14. Vosoughi, S., Roy, D., Aral, S.: The spread of true and false news online. Science 359(6380), 1146–1151 (2018) 15. Waller, J.: Strategic Influence: Public Diplomacy, Counterpropaganda, and Political Warfare. Institute of World Politics Press (2009) 16. Wolf, M.J., Miller, K., Grodzinsky, F.S.: Why we should have seen that coming: comments on microsoft’s tay “experiment,” and wider implications. SIGCAS Comput. Soc. 47(3), 54–64 (2017) 17. Woolley, S., Howard, P.: Computational Propaganda: Political Parties, Politicians, and Political Manipulation on Social Media. Oxford Studies in Digital Politics. Oxford University Press (2018)
Artefact of Augmented Reality to Support the Treatment of Specific Phobias
Raul Vilas Boas1(&), Lázaro Lima1,2, Greice Zanini3, and Pedro Rangel Henriques1
1 Centro ALGORITMI, Universidade do Minho, Braga, Portugal [email protected],[email protected], [email protected] 2 Instituto Federal de Brasília, Brasília, DF, Brazil 3 Escola de Psicologia, Universidade do Minho, Braga, Portugal [email protected]
Abstract. Phobia is a type of anxiety disorder defined by a persistent and excessive fear of an object or situation. Currently, exposition treatment is the most practiced method to treat phobias, although it comes with limitations. We can reduce these limitations by combining Augmented Reality techniques with exposition treatment. Its benefits are a decrease in costs, versatility of the process and full control of the procedure by the therapist. As shown in multiple research works, Augmented Reality has obtained interesting results in the treatment of psychological disorders, which serves as a foundation for the development of this project. Recent technological advances in the field have also allowed easier access to Augmented Reality, which can be used even on old smartphones. Our goal is to develop an artefact using WebAR, in conjunction with psychologists who treat phobic patients, to create a program that supports the treatment of phobias with a gradual exposition system for the phobia. Their help will be essential to understand the most important requirements needed for the program. We believe this system has the potential to improve the treatment of phobias, serving as an extension to the current methods by providing comfort and efficiency.
Keywords: Augmented Reality · Specific phobia · WebAR · Gradual exposition
1 Introduction
Phobia is a type of anxiety disorder that causes an individual to experience an irrational fear of a situation, living creature, place or object [9]. The most prevalent mental disorders in the overall population belong to the Specific Phobia category. As stated by the American Psychiatric Association [1], the 12-month community prevalence is estimated to be approximately 7%–9% in the United States and 6% in European countries for specific phobias. This value is lower in Asian, African and Latin American countries, with rates of 2% to 4%. Frequently, females are twice as affected by specific phobias as males, although this may change depending on the type. For
example, females experience more animal, natural environment and situational phobias than males, although blood-injection-injury phobia affects both genders nearly equally [1].

The most common treatment for specific phobias is In Vivo Exposure, which is considered the standard method since it has proven to be efficient in many treatments [4]. This method consists of directly facing the cause of the phobia in a safe environment to overcome the fear associated with it. The approach comes with several limitations, since it is not always possible to control, prolong or repeat the procedure. Depending on the type of phobia, it can also involve travel costs, discomfort for the patient and loss of time during the process [12]. We can reduce these limitations by using virtual tools such as Augmented Reality. There would be no need to transport the patient or the therapist, since the therapist has full control over when the virtual objects appear. It would even allow the process to be simulated multiple times without any increase in cost [9].

Augmented Reality (AR) merges the real world with virtual information, such as virtual objects, texts, images and sounds, in real time. This technology enriches the surrounding environment without replacing it, as opposed to Virtual Reality (VR), which completely replaces the real world with a synthetic environment [2]. In recent years, AR has started to be used in web browsers. This innovation allows AR to run on any device connected to the internet. Therefore, with the increase in the quality and number of smartphones, anyone can experience AR simply by picking up a phone and opening a browser.

In this context, the technological advances in AR motivated the creation of a system capable of helping with the treatment of phobias. The major goal of this project is to enhance the techniques that already exist by reducing the cost, making treatment more accessible and improving its quality. We will take advantage of the new scenarios that AR enables by adding virtual elements to the real world. The virtual elements, in this case, represent the object of the phobia of the patient who seeks treatment. We believe AR serves as a middle ground between the patient having no contact with the feared object and having a real interaction with it. The procedure can be divided into small steps for the patient to overcome, providing a more gradual progression between different levels of exposure.

The structure of this paper is as follows: a definition of specific phobias in Sect. 2, followed by the concept of AR in Sect. 3. In Sect. 4, we describe the artefact and finally, in Sect. 5, we conclude the project.
2 Specific Phobia

Fear is inherent and common in humans. It serves as an adaptive behaviour to ensure our survival in dangerous situations. However, when fear appears in harmless situations, it is no longer adaptive and restricts the person's life [3]. Marks [8] defined phobia as a special kind of fear that is out of proportion to the situation, cannot be explained or reasoned away, is not voluntarily controlled and leads to avoidance of the feared situation. Although sufferers usually recognise that their fear may be exaggerated and unrealistic, since most people do not have that problem,
they are not capable of suppressing the fear. Beck et al. [5] add that what makes a fear into a phobia "is the magnification of the amount of risk in a feared situation and the degree of harm that will come from being in that situation". The American Psychiatric Association [1] identifies three different categories of phobias: Social Phobia, Agoraphobia, and Specific Phobia. However, some of these categories are not suitable for the use of AR as a source of treatment, therefore we will focus on specific phobias. The specific phobia category is further divided into different types:

• Animal (fear of spiders, insects, dogs);
• Natural environment (fear of heights, storms, water);
• Blood-injection-injury (fear of needles, invasive medical procedures);
• Situational (aeroplanes, elevators, enclosed places);
• Other (circumstances that lead to choking or vomiting, loud sounds or costumed characters).
Even so, some of these types will be easier to recreate using AR than others.

The development of a phobia may occur after a traumatic event that resulted in an unpleasant or harmful experience. For example, if a person is involved in a car accident, he may develop vehophobia, the fear of driving. These phobias are usually easier to date since they stem from a specific traumatic event in the person's life. Other phobias may come from childhood fears the person did not outgrow; typical examples are the fear of being alone in the dark or of being lost. One possible reason why the person is not able to outgrow the fear, unlike most children, is that the child was able to avoid the feared situation [5].

Beck et al. [5] defined three characteristics that occur when a person faces the cause of his fear. The first one is suffering from an unpleasant level of anxiety that leads to symptoms such as pounding heart, racing pulse, nausea, dizziness and faintness. The second one is escaping or avoiding the source of the phobia; if he is unable to avoid it, he may overcome the situation or develop chronic anxiety. Finally, the third characteristic is the ability to understand that the fear is excessive but, despite this, still being incapable of overcoming it. They also add that people affected by phobias seek treatment either because they realise they suffer in a situation that does not affect others or because they can no longer endure the effects and the restrictions inflicted by the phobia on their life. Although they are phobic about certain situations, they are completely relaxed in other circumstances that cause distress in others. To an observer, the situation may seem harmless, although for the phobic it may be life-threatening.

Cognitive Behavioural Therapy is an efficient method to treat specific phobias. In this approach, the psychologist uses exposition therapy, which consists of exposing the patient to the situations that increase his anxiety levels. This procedure should be performed multiple times and gradually, in order to reduce the phobic responses through learning processes about what is feared. Among the various ways to put exposure therapy into practice, technology can be used to assist in the treatment by improving quality and safety. Therefore, we can enhance the therapeutic process by simulating the phobic object in a controlled and safe environment [10].
3 Augmented Reality

Azuma [2] defined AR as an interactive experience combining the real world with virtual elements which "supplements reality, rather than completely replacing it". Using AR gives a stronger feeling of presence, since the user is not immersed in a virtual environment: he is still able to see the real world and to use his own hands to interact with the virtual objects. All these factors contribute to a more realistic experience. The ideal scenario for AR would be one in which the virtual elements and the real objects appear to coexist in the environment without the user knowing which one is real. Azuma [2] adds that, to qualify as AR, a system needs to combine the real and the virtual, be interactive in real time and be registered in 3-D.

As shown in Table 1, adapted from Juan [6], AR has benefits over the traditional methods that can be explored. They contribute to a safer environment for the patient, allowing him to feel less anxious and decreasing the chances of giving up on the treatment.

Table 1. Comparison between the traditional methods and AR for the treatment of phobias [6]

Traditional methods | AR treatment
The elements the patient fears are real and cannot be controlled by the therapist | The feared elements are virtual, therefore they cannot hurt the patient
Depending on the type of phobia, it may require travel costs and longer sessions | The therapist has full control over when the virtual objects appear and can start/stop the program at any time
The stimuli produced are not controlled by the therapist | The therapist can simulate the process multiple times without increasing the cost of the procedure
There is no assurance of the patient's safety during the treatment | There is no real threat to the patient since the elements are not real
If the treatment requires a public place, the patient may suffer discomfort during the session | The therapist can choose where the sessions take place
There are different tracking methods with which AR can be used, such as Marker-based, Markerless and Location-based. Marker-based and Markerless tracking both rely on image recognition, in which the system uses the camera to detect patterns. In Marker-based tracking we use a marker, for example a QR code, to recognise the pattern and overlay the virtual elements on the marker. In Markerless tracking the system recognises irregular objects, such as human faces or arms, and places the virtual object at that position. Location-based tracking uses GPS and other position sensors to establish the location and create the virtual object.
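To make the marker-based approach concrete, the sketch below detects a fiducial marker in a camera frame and returns the image position where a virtual model could be anchored. This is only an illustration of the tracking concept using OpenCV's ArUco module, not the WebAR stack planned for the artefact, and it assumes the opencv-contrib-python package with the classic (pre-4.7) aruco API.

```python
# Minimal sketch of marker-based tracking: detect a fiducial marker in a camera
# frame and report where a virtual 3D model would be overlaid.
# Assumes opencv-contrib-python with the classic (pre-4.7) aruco API.
import cv2
from cv2 import aruco


def detect_marker_anchor(frame):
    """Return the centre of the first detected marker, or None if none is visible."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    dictionary = aruco.getPredefinedDictionary(aruco.DICT_4X4_50)
    corners, ids, _rejected = aruco.detectMarkers(gray, dictionary)
    if ids is None:
        return None
    # Each marker is described by its four corners; a virtual element would be
    # rendered at (or around) the marker centre.
    first_marker = corners[0][0]          # shape (4, 2)
    centre = first_marker.mean(axis=0)    # (x, y) in image coordinates
    return float(centre[0]), float(centre[1])


if __name__ == "__main__":
    capture = cv2.VideoCapture(0)  # default webcam
    ok, frame = capture.read()
    if ok:
        print(detect_marker_anchor(frame))
    capture.release()
```

In a WebAR setting the same idea applies, but detection and rendering happen in the browser rather than in a native application.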
4 Artefact Description

In this project, we are following the Design Science Research methodology, which consists of creating an artefact to solve a problem [11]. In this context, the problem is to improve the treatment of specific phobias by using AR. Therefore, our goal is to develop an artefact capable of providing support for the treatment of phobias using this technology. To accomplish this, we will work in conjunction with multiple psychologists who treat phobic patients to define which requirements are needed for the artefact to be viable and intuitive. Considering that the final product is intended for them to use during treatment, their approval is essential for the success of this artefact.

Lima et al. [7] applied a gradual exposition method for the treatment of arachnophobia. Their system had eight different levels of spiders: as the levels progressed, the realism of the spider also increased. They start with a model that does not resemble a spider and then add features such as textures and animations. This approach encompasses different states in which patients experience different levels of fear. Moreover, it allows for a gradual exposition that adds small steps to overcome during the treatment, decreasing the possibility of the patient giving up on it. Therefore, we will implement a similar approach in our system.

The artefact will use WebAR libraries, which allow AR to run in the browser without the need for an extra application. This option allows for greater accessibility and an intuitive approach. Furthermore, both Marker-based and Markerless tracking can be used.

The schema in Fig. 1 was developed to describe the intended result of this project. The schema is divided into two phases: development and operation. The first phase is dedicated to the preparation of the session. The psychologist will have access to the information of the patient from previous sessions. He will also be able to choose which 3D model he wants to load; additionally, there will be the possibility to import other models if necessary. Other options he can choose are the level of realism of the 3D model and the tracking method. After this, the AR artefact is created and is ready to be used in the operation phase.
Fig. 1. Schema proposal
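A minimal sketch of how the development-phase choices just described (3D model, level of realism, tracking method) could be represented is shown below. The class, the field names and the eight-level realism scale are illustrative assumptions inspired by the gradual exposition approach of Lima et al. [7], not the artefact's actual design.

```python
# Illustrative sketch of a session configuration for gradual exposition.
# Class and field names are assumptions, not the artefact's real data model.
from dataclasses import dataclass, field
from enum import Enum


class TrackingMethod(Enum):
    MARKER_BASED = "marker-based"
    MARKERLESS = "markerless"
    LOCATION_BASED = "location-based"


@dataclass
class ExposureSession:
    patient_id: str
    model_name: str                  # e.g. "spider", "cockroach" or an imported model
    realism_level: int = 1           # 1 (abstract shape) .. 8 (textured, animated model)
    tracking: TrackingMethod = TrackingMethod.MARKER_BASED
    notes: list = field(default_factory=list)

    def advance(self) -> None:
        """Move one step up the gradual exposition scale, capped at the top level."""
        self.realism_level = min(self.realism_level + 1, 8)


# Example: a first arachnophobia session prepared with marker-based tracking.
session = ExposureSession(patient_id="P001", model_name="spider")
session.advance()
print(session.realism_level, session.tracking.value)
```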
In the operation phase, the psychologist chooses the equipment he wants to use. The only requirements for the device are a camera and an internet connection: the camera is used to recognise the marker and to provide video, while the internet connection enables access to the artefact. Additionally, the psychologist can add a simple physiological sensor to measure the heart rate of the patient and obtain more information. In Fig. 2, we illustrate a simple prototype of the artefact. In this scenario, the camera recognises the HIRO marker and overlays the 3D object of the spider or the cockroach on it. This example was performed with a laptop in a web browser.
Fig. 2. Examples of Augmented Reality using Marker-based tracking
5 Conclusion

The recent technological advances in WebAR provide a simpler way to use AR and, as multiple studies have demonstrated, its application can prove beneficial in the treatment of phobias. This motivated us to build an artefact capable of providing support for the treatment of specific phobias. The advantages this project brings in relation to others are the accessibility and efficacy of using web-based AR, considering that it is an intuitive approach and is available on every device with access to the internet. Besides, this method provides a versatile way for psychologists to be creative in their sessions to treat phobias using the gradual exposition system.

In the first stage of the project, we will meet with psychologists to better understand the essential requirements needed for the program. In addition, the system needs to be intuitive, so that it is easy for them to use. In the next phase, the project will be evaluated by the same psychologists to determine its viability. Depending on the results, they may use it in their procedures.
References

1. American Psychiatric Association, et al.: Diagnostic and Statistical Manual of Mental Disorders (DSM-5®). American Psychiatric Publishing, Arlington (2013)
2. Azuma, R.T.: A survey of augmented reality. Presence: Teleoper. Virtual Environ. 6(4), 355–385 (1997)
3. Barlow, D.H., Farchione, T.J., Sauer-Zavala, S., Latin, H.M., Ellard, K.K., Bullis, J.R., Bentley, K.H., Boettcher, H.T., Cassiello-Robbins, C.: Unified Protocol for Transdiagnostic Treatment of Emotional Disorders: Therapist Guide. Oxford University Press, Oxford (2017)
4. Barlow, D.H., Raffa, S.D., Cohen, E.M.: Psychosocial treatments for panic disorders, phobias, and generalized anxiety disorder. In: A Guide to Treatments that Work, vol. 2, pp. 301–336 (2002)
5. Beck, A.T., Emery, G., Greenberg, R.L.: Anxiety Disorders and Phobias: A Cognitive Perspective. Basic Books, New York (2005)
6. Juan, M.C., Alcaniz, M., Monserrat, C., Botella, C., Banos, R.M., Guerrero, B.: Using augmented reality to treat phobias. IEEE Comput. Graph. Appl. 25(6), 31–37 (2005). https://doi.org/10.1109/MCG.2005.143
7. Lázaro, V.D.O., Cardoso, A., Nakamoto, P.T., Lamounier Jr., E.A., Lopes, E.J.: Sistema para auxiliar o tratamento de Aracnofobia usando Realidade Aumentada – usabilidade centrada no terapeuta. Anais do Comput. Beach, 268–277 (2013)
8. Marks, I.M.: Fears and Phobias. Academic Press, New York (1969)
9. Medeiros, D., Silva, W., Lamounier Jr., E., Souza Ribeiro, M.W.: Realidade virtual não-imersiva como tecnologia de apoio no desenvolvimento de protótipos para reconstituição de ambientes históricos para auxílio ao ensino. In: V Workshop de Realidade Virtual e Aumentada – WRVA, vol. 21 (2008)
10. Sánchez, L.B., Africano, N.D., García, D.R., Gualdrón, A.S., Gantiva, C.: Realidad virtual como tratamiento para la fobia específica a las arañas: una revisión sistemática. Psychologia: Avances de la Disciplina 13(1), 101–109 (2019)
11. Von Alan, R.H., March, S.T., Park, J., Ram, S.: Design science in information systems research. MIS Q. 28(1), 75–105 (2004)
12. Wauke, A.P.T., Costa, R., de Carvalho, L.A.V.: Vesup: O uso de ambientes virtuais no tratamento de fobias urbanas. In: IX Congresso Brasileiro de Informática em Saúde, Ribeirão Preto, SP, Brasil (2004)
Social Care Services for Older Adults: Paper Registration Versus a Web-Based Platform Registration

Ana Isabel Martins1,2, Hilma Caravau1,3, Ana Filipa Rosa1,3, Ana Filipa Almeida1,3, and Nelson Pacheco Rocha1,3(&)

1 Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, Aveiro, Portugal
{anaisabelmartins,npr}@ua.pt, [email protected], [email protected], [email protected]
2 Department of Electronics, Telecommunications and Informatics, University of Aveiro, Aveiro, Portugal
3 Medical Sciences Department, University of Aveiro, Aveiro, Portugal
Abstract. Technological solutions have been playing an important role in health care provision, and their implementation in the social care area, though somewhat slower, promises good outcomes in terms of the quality of care delivery. The study reported in this paper analysed the shift from typical paper registration to registration performed on a web-based platform (i.e., the Ankira® platform) in a Portuguese social care institution. In order to gather evidence and best practices, 13630 registries were analysed, 6390 in paper format and 7240 from the web-based platform. The results show that the shift from paper to the web-based platform led to a significant decline in error rates in all types of registers, which demonstrates the significant impact of the introduction of information technologies to support social care provision.

Keywords: Older adults · Care services · Social care institutions · Electronic social care record · Management systems
1 Introduction

The implementation of online platforms is increasingly present in public services, private organizations, enterprises and banks, not only as a result of promotion policies encouraged by internet trends, but also because of the recognized benefits associated with the use of these platforms [1]. In the health care sector, technological developments have represented a major improvement in the quality of care provision, namely through the widespread adoption of eHealth applications [2]. In the social care sector, the dissemination of technological solutions to support social care services is still very low and their introduction is difficult, as institutions hold on to traditional, deeply established practices when compared to the health care sector [3]. Furthermore, whenever technological solutions are developed to
support social care institutions, they tend to cover issues mainly related to bureaucracy rather than supporting professionals in enhancing their practice [4].

Although technological development teams are increasingly aware of the target audience's needs, and despite widespread guidelines (e.g., user-centered design [5]), it is important to study how technological solutions affect individuals' lives and work routines, the difficulties faced in the change processes, and the good practices that can be promoted and replicated to facilitate the integration of new services to support individuals and organizations. In line with this, the aim of the study reported in this paper was to gather evidence and good practices by analysing the introduction of a web-based platform [6] (i.e., the Ankira® platform) in a Portuguese social care institution. Specifically, the authors analysed the shift from typical paper registration to web-based registration.

In addition to this introduction, the following sections present the related work, the details of the web-based platform, the study design, and the achieved results. Moreover, a discussion of the results is included and a conclusion is drawn.
2 Related Work

The technological evolution and its positive outcomes in health care provision have led several authors to propose the use of technological solutions as mediators between the different actors of social care provision [7]. Information technologies play an important role in articulating services and caregivers that need to work together to guarantee individuals' quality of life.

Focusing on older adults, it is known that most of them wish to stay at home as long as possible, with independence and autonomy, integrated in the community rather than becoming institutionalized [8, 9]. However, despite the efforts to promote active aging and aging in place [10, 11], institutionalization is sometimes an unavoidable solution. Therefore, gerontological services are required to guarantee the quality of life of institutionalized older adults [12]. Older adults' quality of life is perceived as a multidimensional and complex concept, and thus is highly impacted by several aspects. In institutional social care environments, new and different determinants can arise, and there is a well-known relation between the quality of the services and the patients' quality of life [13].

Information systems (e.g., electronic health records for health care provision) are already a reality, with a number of advantages when compared to paper registers [14]. Electronic health records support information exchange and access across settings, enabling caregivers to better address patients' needs [15]. Error reduction, clinical efficiencies, cost savings, and improved patient outcomes are some of the recognized advantages of electronic health records [14].

In the social area, these developments are not evolving as fast. In fact, mechanisms to register, access and share the required information are non-existent, or inferior to those available for health care provision [7]. The health and social care divide is very evident in technological terms, because despite the technological
advances and the increasing use of technology for health care delivery, the provision of social care does not follow this growth [3]. In 2003, the concept of the electronic social care record was introduced in the United Kingdom [16]. However, this is not yet a reality in Portuguese social care institutions. The amount of registries and paperwork related to care provision by Portuguese social caregivers demands the introduction of information systems to promote person-centred care and to support quality management systems.

In recent years, there has been a restructuring of working methods in social organizations, such as care services for older adults, mainly driven by the implementation of policies, legislation, procedures and recommendations that focus on quality management systems. The enforcement of policies that establish the need for monitoring and measuring quality, which improves patients' satisfaction, requires the registration of the services provided and the completion of numerous documents with patients' personal, medical and social information [17]. Typical registries include clinical information, the identification of the care provided or administrative information [15]. Each patient is required to have a multidisciplinary care plan, which is set according to his/her needs and reviewed whenever necessary [18–20]. To guarantee compliance with the individual plan and to ensure quality of service, caregivers have to register a large range of information (e.g., bath registries or the registration of participation in activities).
3 Web-Based Platform

The selected web-based platform was the Ankira® platform (http://ankira.pt/en), which was developed by Metatheke Software in 2013 [6]. It aims to support social care services for older adults, such as home care services, geriatric centres, or nursing homes, so that caregivers can plan and register daily activities, which include administrative, social, and clinical information of the patients.

The platform is organized in four main areas: 'Candidates', 'Clients', 'Care' and 'Management'. Potential clients are managed in the 'Candidates' area, which automatically sorts them according to each institution's criteria. When candidates are admitted, their personal data is transferred into the 'Clients' area, which comprises three components: 'Daily life activities', 'Health care' and 'Psycho-social care'. Therefore, individual plans are established, based on the evaluation made, and the available services are programmed. In turn, the 'Activities' area allows the planning of activities and the registration of the attendees and their feedback. Moreover, the 'Management' area provides tools for controlling and monitoring services, which include supervising expenses and the care provided to the older adults, analysing deviations between planned and effective activities, planning service orders, and visualising and extracting statistics and indicators by area that can be filtered using the listing tools.

Focusing on the 'Clients' area, the platform allows caregivers to register the care provided (e.g., bathing, clothing or drug administration) and to visualize the available information. To assure the confidentiality of older adults' data, the web-based platform allows accessibility permissions to be managed according to the caregiver's profile.
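As a rough illustration of what such an electronic register implies in practice, the sketch below models a single care entry that is automatically stamped with the logged-in caregiver and the entry time. The class and field names are assumptions made for illustration only and do not reflect Ankira®'s actual data model.

```python
# Illustrative model of an electronic care register: every entry is stamped with
# the logged-in caregiver and the entry time, plus an optional observations field.
# Names are assumptions for illustration only, not Ankira's actual schema.
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class CareRegister:
    client_id: str
    register_type: str               # "hygiene", "elimination", "positioning", ...
    details: dict = field(default_factory=dict)
    observations: Optional[str] = None
    caregiver: str = ""              # filled automatically from the login session
    recorded_at: datetime = field(default_factory=datetime.now)


def record(session_user: str, client_id: str, register_type: str, **details) -> CareRegister:
    """Create a register tied to the authenticated caregiver and the current time."""
    return CareRegister(client_id=client_id, register_type=register_type,
                        details=details, caregiver=session_user)


# Example: an elimination register for the afternoon period.
entry = record("caregiver.ana", "C042", "elimination",
               period="afternoon", elimination_type="urine", diaper_size="M")
print(entry.caregiver, entry.recorded_at.isoformat())
```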
4 Methods

The introduction of technological solutions in the social care area may represent an opportunity to improve the quality of care and services. Thus, the experimental study reported in this paper aimed to analyse the introduction of the web-based platform in a Portuguese social care institution, which promoted a shift from typical paper registers to electronic registers. To achieve the defined goal, a comparative study between the paper registers made before the introduction of the web-based platform and the electronic records of this platform was carried out in a social care institution, in order to assess and compare the registration quality of both approaches. This study was performed between September and October 2019. The research team randomly selected and collected paper and electronic registers related to five distinct areas: hygiene, elimination, positioning, alimentation, and drug administration. Then, the quality of the registered information was analysed and filling errors were collected (e.g., lack of information, information recorded in wrong fields or duplicate information).

The hygiene register includes information about the patient's hygiene plan. The caregiver must register each procedure related to hygiene, including hair washing, nail care, skin hydration, intimate hygiene, among others. In paper format, this includes checking the boxes of accomplished procedures, registering the execution time and signing the name of the person responsible for the care execution. In the web-based platform, the hygiene register includes two separate registries: the bath register and the hygiene care register. The bath register, defined according to the patient's established plan, includes the type of bath and, when necessary, additional information in an observations field. In turn, the hygiene care register refers to hygiene care such as oral or partial hygiene; additional information can also be included in an observations field.

The elimination register compiles the information about the type and frequency of the patient's elimination as well as the material used (e.g., adult diapers). It can be seen both as a health care register, allowing caregivers to understand whether the patient has any abnormal functioning by analysing defecation and urination frequency, and as a stock control of elimination items, like diapers. In paper format, the caregiver registers the change of diapers as well as the size of the diapers being used, and checks the box with information about the type of elimination (e.g., faeces or urine). The register also requires the signature of the person responsible for the procedure execution. In the web-based platform, to perform an elimination register it is necessary to select the period of the day (e.g., morning, afternoon or evening). By default, the elimination items planned for the selected time/period are marked. The caregiver must indicate the type of elimination (i.e., faeces and/or urine) and any changes in faeces and/or urine. If the patient goes to the toilet, it is also possible to register this, indicating date and time in the 'Go to toilet' option.

Concerning the positioning register, the intent is to register the number of times the caregiver positions patients who are unable to do it themselves. In paper format, the caregiver must write the abbreviation of the positioning (e.g., Right Lateral Recumbent is registered as RLR). The caregiver responsible for the procedure execution must also
sign his name on the register. In the web-based platform, to register positioning the caregiver selects the type of positioning or the position in which the patient was placed. It is also possible to register additional information in an observations field.

Considering the alimentation register in paper format, there is a monthly sheet per patient in which the caregiver responsible for the feeding should sign the corresponding box (e.g., breakfast, morning supplement, lunch, snack, dinner or supper). There is also a field to register whether dietary supplements were consumed, as well as observations. In the web-based platform, the caregiver must access the patient card that contains the daily meals that should be delivered according to the defined plan (usual meal and/or supplement) and check the corresponding one. Additional information can also be registered in an observations field.

Finally, the drug administration register is based on the medication plan defined by the nursing team. Care registries may be used by caregivers to monitor and evaluate the support that is provided and to follow the evolution of their patients. In paper format, the registration sheets include checking boxes for the caregiver to sign, with one box per meal (e.g., fasting, breakfast, lunch, snack, dinner or supper). There is also a daily observations field. In the web-based platform, each patient has a medication plan card, with the number of doses planned for the day and the respective medication. After logging in and selecting the respective menu, the caregiver must choose the period of taking the medication (e.g., lunch) and save the data. There is also a field for observations that should be completed whenever necessary.

For the results of the comparative analysis presented in the following section, it should be considered that the number of fields in each one of the five register types can differ between the paper and the electronic format. For example, on paper it is always necessary to have a field for the caregiver's signature and, in some cases, to indicate the time of the procedure execution. The web-based platform overcomes this situation since it automatically records the identity and time of entry with the login action, and thus the registries are associated with this information.
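Since the number of possible fields differs between register types and formats, the comparison in the next section is based on simple rates: filling errors (or missing entries) over all possible fields. A minimal sketch of that calculation is shown below; the counts used in the example are made up for illustration and are not the study's raw data.

```python
# Sketch of the error-rate comparison: errors (or missing entries) are expressed
# as a percentage of all possible fields for each register type and format.
# The sample counts below are illustrative, not the study's raw data.

def error_rate(errors: int, possible_fields: int) -> float:
    """Percentage of fields with filling errors or missing information."""
    return 100.0 * errors / possible_fields


def compare(register_type: str, paper: tuple, web: tuple) -> None:
    """Print the paper vs. web error rates and the reduction in percentage points."""
    paper_rate = error_rate(*paper)
    web_rate = error_rate(*web)
    print(f"{register_type}: paper {paper_rate:.2f}% -> web {web_rate:.2f}% "
          f"(reduction {paper_rate - web_rate:.2f} pp)")


# (errors, possible_fields) pairs -- illustrative values only.
compare("hygiene", paper=(234, 1000), web=(16, 975))
```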
5 Results

A total of 13630 registries were analysed, 6390 in paper format and 7240 from the web-based platform. Of those registers, 3180 were related to hygiene, 2493 to elimination, 2734 to positioning, 3472 to alimentation and 1751 to drug administration. For the presentation of the results, we considered percentages calculated over the registers effectively made and all the possible cases (i.e., the number of possible fields for each type of register).

In the hygiene paper registries, an error rate of 23.41% was verified. In the web-based platform, the error percentage decreased to 1.64%, a difference of 21.77% between paper and electronic registration. Since it was not possible to know what the planned actions were, the only errors that could be verified in the paper registries were missing caregiver signatures. In the web-based platform, errors are related to missing information about the procedure execution time, or to planned actions incorrectly registered.
The elimination register in both formats includes the type of elimination and the diaper size used. In paper format, 31.87% of the registers did not have the elimination type and 33.72% did not have the diaper size. In the elimination registers of the web-based platform, in 1.01% of the cases the caregivers did not register the elimination type and in 12.43% of the cases they did not fill in the diaper size. Regarding the caregivers' signatures, 11.85% were missing in paper format, while in the web-based platform there were no missing data since, as referred previously, the caregiver signature is unnecessary in the web-based platform due to the login feature.

In paper format, the positioning register had 38.73% of errors, while no errors were observed when analysing the registries of the web-based platform. Moreover, the alimentation registers were clearly improved with the introduction of the computerized system: there was a decline of 32.55% in missing data (i.e., planned procedures that were unregistered), passing from 48.68% on paper to 16.13% with the web-based platform. Finally, the error rate of the medication registration dropped about 26%, passing from 33.67% on paper to 7.58% when using the web-based platform. Figure 1 presents the errors and missing rates by type of register resulting from comparing the paper registries to the electronic registries.
Fig. 1. Errors and missing rates by type of register.
6 Discussion

The experimental study reported in this paper explored the evolution from paper-based registries to electronic registries performed on a web-based platform. As a major conclusion, the comparison of the two formats indicates that there was a reduction of the error rate after the introduction of the web-based platform. In some registers, errors were completely suppressed, namely in the positioning registers, which also had the largest error rate reduction due to the use of the web-based platform.

The login feature of the web-based platform is a significant and evident improvement over paper-based registers, eliminating the existence of registers without an associated caregiver. This is also valid for the time of the procedure execution, since the system always records the exact time of the register. In the web-based platform, the existence of the observations field in all registers allows the registration of critical incidents, i.e., occurrences outside the normal routine, which can be registered and consulted at any time. Using electronic registers also eliminates problems related to illegible handwriting. In addition, electronic registers do not suffer the deterioration that was observed in the paper records.

Another improvement related to the introduction of the web-based platform concerns the large amount of paper registries. Considering the number of registries associated with an institutionalized patient on a daily basis, multiplied by the number of patients in the institution and the institutionalization time, which can reach decades, it is possible to imagine the volume of data and the physical space required to store them, apart from the difficulty of retrieving past data that have already been archived.

Despite the evidence of the web-based platform's benefits, some difficulties were identified. For example, the fact that there is only one computer available for the staff may represent a problem in the daily routine. The caregivers usually wait for a calmer period at the end of the shift to register all the procedures that they performed. This practice may result in loss of information, filling errors and unreliable registers. Moreover, it is also clear that improvements could be considered in some components. For example, some registers would benefit from a pop-up message to avoid missing information, or to remind the caregiver to link specific registers. For instance, when drug administration is performed during a meal, it should be linked to the alimentation registries (e.g., aspirin administered at lunch).
7 Conclusion

This study analysed the transition from traditional paper registration to electronic registries supported by a web-based platform. The results show that the error rate in filling the registries declined significantly in all the types of registers analysed: hygiene, elimination, positioning, alimentation and drug administration.

The introduction of information systems to support social care services is a difficult process, as paper registers are deeply rooted in social care organizations. The social sector still presents deficiencies in terms of the introduction of information technologies to support
daily work routines. However, it is clear that the benefits have great value and impact on the quality of the services being provided. For instance, having the registries in electronic format enables the implementation of analytics procedures, which may help high-level managers to structure and organize the services provided to older adults.

Managing care institutions, such as social care institutions, is a big challenge given the high degree of regulatory bureaucracy they have to meet. The activities developed should therefore be subject to registration, monitoring and evaluation, as well as to the identification of the person responsible for their accomplishment. The promise of improved care quality, safety and value for patients should be a primary driver of future investment in information technologies by social care institutions [21].

Acknowledgements. This work was financially supported by National Funds through FCT – Fundação para a Ciência e a Tecnologia, I.P., under the project UI IEETA: UID/CEC/00127/2019.
References

1. European Commission: Online platforms and the digital single market opportunities and challenges for Europe, pp. 1–15 (2016)
2. Srivastava, S., Pant, M., Abraham, A., Agrawal, N.: The technological growth in eHealth services. Comput. Math. Methods Med. 2015, 1–18 (2015)
3. Rigby, M.: Integrating health and social care informatics to enable holistic health care. Stud. Health Technol. Inform. 177, 41–51 (2012)
4. Wastell, D., White, S.: Beyond bureaucracy: emerging trends in social care informatics. Health Inform. J. 20(3), 213–219 (2014)
5. Earthy, J., Jones, B.S., Bevan, N.: ISO standards for user-centered design and the specification of usability. In: Usability in Government Systems, pp. 267–283. Elsevier (2012)
6. Loureiro, N., Fernandes, M., Alvarelhão, J., Ferreira, A., Caravau, H., Martins, A.I., Cerqueira, M., Queirós, A.: A web-based platform for quality management of elderly care: usability evaluation of Ankira®. Proc. Comput. Sci. 64, 666–673 (2015)
7. Maguire, D., Evans, H., Honeyman, M., Omojomolo, D.: Digital Change in Health and Social Care. King's Fund, London (2018)
8. Bedaf, S., Gelderblom, G.J., Syrdal, D.S., Lehmann, H., Michel, H., Hewson, D., Dautenhahn, K., Witte, L.: Which activities threaten independent living of elderly when becoming problematic: inspiration for meaningful service robot functionality. Disabil. Rehabil.: Assistive Technol. 9(6), 445–452 (2014)
9. Wiles, J.L., Leibing, A., Guberman, N., Reeve, J., Allen, R.E.S.: The meaning of 'Aging in Place' to older people. Gerontologist 52(3), 357–366 (2012)
10. World Health Organization: Active Ageing: A Policy Framework. World Health Organization, Geneva (2002)
11. World Health Organization: World Report on Ageing and Health. World Health Organization, Geneva (2015)
12. Vaz, E.: Mais idade e menos cidadania. Análise Psicológica 16(4), 621–633 (1998)
13. Schenk, L., Meyer, R., Behr, A., Kuhlmey, A., Holzhausen, M.: Quality of life in nursing homes: results of a qualitative resident survey. Qual. Life Res. 22(10), 2929–2938 (2013)
14. Kruse, C.S., Mileski, M., Alaytsev, V., Carol, E., Williams, A.: Adoption factors associated with electronic health record among long-term care facilities: a systematic review. BMJ Open 5(1), e006615 (2015)
15. Alexander, G.L., Madsen, R.: A national report of nursing home quality and information technology. J. Nurs. Care Qual. 33(3), 200–207 (2018)
16. SCHD: Defining the Electronic Social Care Record. Information Policy Unit - Social Care, Department of Health, London (2004)
17. Segurança Social: Modelo de Avaliação da Qualidade – estrutura residencial para idosos (2007)
18. Degenholtz, H.B., Resnick, A.L., Bulger, N., Chia, L.: Improving quality of life in nursing homes: the structured resident interview approach. J. Aging Res. 2014, 1–8 (2014)
19. Sousa, M., Arieira, L., Queirós, A., Martins, A.I., Rocha, N.P., Augusto, F., Duarte, F., Neves, T., Damasceno, A.: Social platform. In: Advances in Intelligent Systems and Computing, vol. 746, pp. 1162–1168 (2018)
20. Martins, A.I., Caravau, H., Rosa, A.F., Queirós, A., Rocha, N.P.: Applications to help local authorities to support community-dwelling older adults. In: International Conference on Information Technology & Systems, pp. 720–729. Springer, Cham (2019)
21. Alexander, G.L., Wakefield, D.S.: IT sophistication in nursing homes. J. Am. Med. Dir. Assoc. 10(6), 398–407 (2009)
Enabling Green Building's Comfort Using Information and Communication Technologies: A Systematic Review of the Literature

Ana Isabel Martins1, Ana Carolina Oliveira Lima1, Paulo Bartolomeu1, Lucilene Ferreira Mouzinho1, Joaquim Ferreira1, and Nelson Pacheco Rocha2(&)

1 Department of Electronics, Telecommunications and Informatics, University of Aveiro, 3810-193 Aveiro, Portugal
{anaisabelmartins,ana.carolina.lima,bartolomeu,lucileneferreira,jjcf}@ua.pt
2 Medical Sciences Department, University of Aveiro, 3810-193 Aveiro, Portugal
[email protected]
Abstract. The current environmental concerns increase the importance of sustainability in the construction industry. In this respect, information and communication technologies have been identified as a means to support automation and control systems that promote occupants' comfort while safeguarding energy efficiency. In this study, a systematic review was performed to identify relevant applications supported by information and communication technologies with an impact on the comfort of the occupants of green buildings. The results indicate that the scientific literature presents valuable arguments regarding the importance of information technologies to increase occupants' comfort and to minimize the energy consumption of modern green buildings.

Keywords: Systematic literature review · Green buildings · Occupant's comfort · Information and communication technologies
1 Introduction

Buildings have a profound impact on quality of life, public health, and productivity, while consuming a significant amount of natural resources [1, 2]. To meet their vital needs, both residential and commercial buildings are connected to conventional electricity, gas, and water grids, and are responsible for a significant percentage of the world's energy consumption: in 2015, residential and commercial buildings represented nearly 50% of it [3]. Therefore, they are important contributors to the unfolding environmental crisis [4].

The current environmental concerns have resulted in a rising global awareness of the importance of sustainability in the construction industry [1, 4–6]. This has stimulated significant alterations to the built environment, with reductions in the levels of energy
consumption and natural resource depletion compared to what has been required in traditional building life cycles [7, 8]. These changes offer several environmental, economic, and social benefits, including occupants' satisfaction, lower energy costs, the protection of natural resources and the prevention of climate change by minimizing the emissions of carbon dioxide (CO2) and other pollutants [1, 9].

Considering the research related to green buildings, one of the current trends is the focus on occupants' comfort [10]. In particular, Information and Communication Technologies (ICTs) in green buildings have been identified as a means to support automation and control systems promoting occupants' comfort and, simultaneously, safeguarding energy efficiency [10–14]. Although there are articles systematizing evidence related to green building technologies (e.g., [1, 10, 15]), to the best of the authors' knowledge there are no systematic reviews of the literature related to the use of ICTs to increase the occupants' comfort while minimizing energy consumption. Since systematic evidence is required to inform stakeholders and researchers about state-of-the-art solutions, the systematic review reported in the present article aimed at identifying the most relevant applications supported by ICTs with an impact on the comfort of green buildings' occupants.
2 Methods

Considering the aforementioned research objective, the following research questions were considered:

• RQ1: What are the most relevant application types?
• RQ2: How is the impact on the occupants' comfort measured?
• RQ3: What is the maturity level of the applications being reported?
• RQ4: What are the major barriers to the dissemination of the applications being reported?
Boolean queries were prepared to include all the articles published until 2018 whose titles, abstracts or keywords conform to the following expression: (intelligent OR smart OR wise OR cognitive) AND (building OR domestic OR home OR dwelling OR house) AND (efficient OR green OR eco OR ecological OR environment-friendly OR environmental) AND (comfort OR well-being). The searched resources included two general databases, Web of Science and Scopus, and one specific technological database, IEEE Xplore.

As inclusion criteria, the authors aimed to include all the articles published in scientific journals or in conference proceedings whose main purpose is the explicit use of ICTs in modern green buildings to promote the occupants' comfort while safeguarding energy efficiency. Considering the exclusion criteria, the authors aimed to exclude all the articles not published in English, without abstracts or without access to the full text. Furthermore, the authors also aimed to exclude all articles that: do not report primary research results, such as reviews, surveys, or editorials; report solutions that are not intended to be used inside buildings (e.g., street lighting); do not report results of the application of ICTs, but of other technologies (e.g., innovative construction materials); report studies where the main purpose of the application of ICTs was energy efficiency, although impacts on the comfort of the occupants could be envisaged; and do not report implementations of systems aiming at the occupants' comfort, but rather partial solutions such as algorithms or support studies and tools.
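As an illustration of how this AND-of-ORs expression can be re-applied outside the databases' own search engines (e.g., to double-check exported records), the sketch below evaluates it against a title/abstract/keywords string. Plain substring matching is a simplification of the databases' actual search semantics and is shown here only to make the query structure explicit.

```python
# Sketch of the screening query: an article matches if, for every AND group,
# at least one of its OR terms occurs in the title, abstract or keywords.
# Substring matching is a simplification of the databases' search semantics.

QUERY_GROUPS = [
    ("intelligent", "smart", "wise", "cognitive"),
    ("building", "domestic", "home", "dwelling", "house"),
    ("efficient", "green", "eco", "ecological", "environment-friendly", "environmental"),
    ("comfort", "well-being"),
]


def matches_query(title: str, abstract: str = "", keywords: str = "") -> bool:
    """True if every OR-group has at least one term in the combined text."""
    text = f"{title} {abstract} {keywords}".lower()
    return all(any(term in text for term in group) for group in QUERY_GROUPS)


# Example with a made-up record.
print(matches_query("Smart building control for occupant comfort",
                    abstract="An energy-efficient HVAC strategy ..."))
```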
where the main purpose of the application of ICTs was energy efficiency, although impacts on the comfort of the occupants could be envisaged; and do not report implementations of systems aiming the occupant’s comfort, but rather partial solutions such as algorithms or support studies and tools. After the removal of duplicates and articles without abstracts, the analyses of the remaining articles were performed according to the following two steps: first, the authors assessed all titles and abstracts according to the outlined inclusion and exclusion criteria and those clearly not meeting the inclusion criteria were removed; afterwards, the authors assessed the full text of the included articles according to the outlined inclusion and exclusion criteria and classified the retrieved articles. In these two steps, all the articles were analyzed by at least two authors. Any disagreement was discussed and resolved by consensus.
3 Results

A total of 999 articles were retrieved from the initial search on Web of Science, Scopus and IEEE Xplore. The initial screening yielded 929 articles, after removing duplicates (69 articles) and articles without an abstract (one article). Based on the title and abstract screening, 642 articles were removed for the following reasons: articles reporting overviews or reviews; editorials, prefaces, and announcements of special issues, workshops, or books; articles reporting solutions intended for outdoor spaces; articles out of context for the present study since they do not report the application of ICTs (e.g., architecture, education, certification, building techniques or new materials); or articles that, although reporting systems based on ICTs, did not focus on the comfort of the occupants. Afterwards, the full texts of the remaining 214 articles were analyzed and 175 articles were excluded for the following reasons: the full texts were not in English (five articles); they did not report the implementation of systems, but rather partial solutions such as algorithms (94 articles) or support studies and tools (66 articles); or they reported studies where the main purpose of the application of ICTs was energy efficiency (ten articles).

After the respective analyses, the 39 retrieved articles were divided into the following application domains: air quality (four articles) [16–19]; visual comfort (nine articles) [14, 20–27]; thermal comfort (17 articles) [12, 13, 28–42]; and multiple perspective comfort (nine articles) [11, 43–50]. A significant percentage of these articles were published in conference proceedings and only 12 articles were published in scientific journals [12, 17, 20, 30, 31, 35, 36, 38, 39, 45, 49, 50].

3.1 Air Quality
Three articles [16, 18, 19] propose the implementation of sensing networks composed of different wireless sensors (e.g., sensors to determine temperature, humidity or gaseous pollutants) to guarantee indoor air quality. Article [16] presents a sensor data acquisition module and a toolkit to collect the raw sensor data, extract relevant
parameters and save them in a specific database to feed applications controlling heating, ventilation, and air conditioning (HVAC) systems. In turn, [18] proposes a framework, the Intelligent, Interactive, and Immersive Buildings framework, to take advantage of the interaction of the building's occupants to improve their satisfaction and energy efficiency. Finally, [19] proposes the use of unmanned aerial vehicles (UAVs) to enable the mobility of the sensors. All of these articles refer to the development of prototypes for validation. In particular, article [18] reports the results of a preliminary validation in three public buildings, one of them being a hospital. The results are aligned with the initial expectations.

Furthermore, one article, [17], proposes an intelligent control strategy for ventilation systems. Based on the determination of the building's occupancy, demand-controlled ventilation is used to prevent the energy waste of ventilating an empty building. Moreover, the predictive model was optimized to reduce its dependency on physical sensors, in order to avoid inaccurate measurements. The performance of the proposed system was assessed using simulation techniques. The results indicated that the system has the potential to reduce energy consumption while maintaining the occupants' comfort level [17].

3.2 Visual Comfort
In terms of visual comfort, two articles [14, 22] report the use of home automation technology for lighting control. In both studies, centralized systems use light sensors and corrective feedback to ensure adequate lighting for living spaces (article [14]) or for specific workspaces such as a desk, which is the purpose of article [22]. Four articles [20, 23, 25, 27] report different mechanisms to estimate the occupant's location and activity in a room to feed light control algorithms that optimize user comfort and the energy cost of the lights: in [25, 27] presence sensors (e.g., infrared temperature sensors) are used to detect occupancy inside a room; [20] reports the use of cameras and a distributed processing method to retrieve the information required by the lighting control system; and [23] proposes a smart luminance scenario based on participatory sensing. Finally, three articles [21, 24, 26] present automated lighting control frameworks that dynamically learn occupants' lighting preferences. Article [21] proposes a method to learn the subjects' visual comfort preferences by analyzing their behaviors. In turn, article [24] presents a non-intrusive, multimodal application to continuously analyze ambient information and derive dynamic models of the occupants' visual comfort, safeguarding their preferences under different control scenarios. Additionally, [26] presents a smart system that analyses the occupants' lighting habits, considering different environmental context variables and occupants' needs, in order to automatically learn their preferences and adapt the ambience dynamically.

From the validation point of view, the studies related to visual comfort developed prototypes for different purposes. Some articles [20, 22, 25–27] report studies that developed prototypes to validate proofs of concept (e.g., [25]) or to evaluate the performance of the technologies being used (e.g., [20, 22–27]), including the coverage of the communication network (e.g., [22]) or the efficiency of the implemented
algorithms (e.g., [27]). However, the remaining articles [14, 21, 23, 24] report the development of prototypes that were evaluated by real users. The prototype assessment reported in [14] involved 11 subjects and was focused on lighting comfort preferences. The experimental results indicated that the target brightness distribution within a worker's field of vision could be realized. Moreover, in [21] two end-users were involved in a laboratory experiment. The authors conclude that the system can capture the individual comfort preferences and behaviors of the occupants and has the ability of self-learning and adaptation to new situations when conditions change. However, the current system focuses mostly on a single user, so its application to a group of occupants requires further research. Two subjects were also involved in the experiment reported in [24]. The assessment indicated that more than 10% energy savings are possible while retaining comfort levels above 90%, or more than 35% savings while retaining comfort levels above 75%. Finally, according to [23], a testbed facility was deployed in a room of a university's premises and the authors concluded that participatory sensing and a behaviour-aware incentive policy might promote energy efficiency without impact in terms of visual comfort [23].

3.3 Thermal Comfort
Thermal comfort is addressed by several of the included articles, although different strategies were considered, namely: temperature control based on explicit feedback from the occupants [36, 42]; optimization of HVAC systems using environmental information [33, 34, 38–41]; or optimization of thermal comfort based on knowledge of the occupants [12, 13, 28–32, 35, 37].

The end-to-end framework presented in [36] was designed to enable the gathering of the occupants' feedback and to incorporate this feedback in a control algorithm, which ensures that the zonal temperatures are coordinated to minimize occupants' discomfort and total energy cost. To gather real-time data, a mobile smartphone application and a distributed sensor network were developed, namely to collect the thermal preferences of the occupants and to measure zonal temperatures. This allowed tying individual occupants' feedback to their respective zone of occupancy. Also considering the explicit input of the occupants, [42] describes a system aiming to ensure a pleasant sleep during warm nights by controlling the air-conditioner according to the occupant's preferences stated on a web page. In terms of experimental results, while [42] reports a 20-day experiment indicating that the system increased the satisfaction of occupants, [36] reports two experimental studies in two different settings (i.e., a single-family residential home and a university-based laboratory space) and concludes that the variation in thermal preferences among occupants, and even for the same occupant over time, emphasizes the usefulness of a thermal comfort system based on occupants' behaviors.

Six articles [33, 34, 38–41] report studies aiming to optimize HVAC systems using environmental information: [33] describes a network with a large number of sensors that was designed and deployed to monitor the environmental conditions within a large multi-use building; [34] uses building information modelling and the allocation of people's places; [38] describes a data access component capable of collecting and
aggregating information from a number of heterogeneous sources (e.g., sensors, weather stations or weather forecasts) and a model-based optimization methodology to generate intelligent operational decisions; [39] uses a two-way communication system, an enhanced database management system and a set of machine learning algorithms based on random forest regression techniques to provide optimal energy-efficient predictive control of HVAC systems; [40] describes a network of sensors and actuators to support scheduling mechanisms according to which HVAC working times can be selected, aiming to achieve a trade-off between energy costs and the thermal comfort of occupants; and [41] presents a multi-agent power consumption and comfort management system to optimize the comfort of multi-zone buildings. Considering the assessment results, initial analysis of the data collected in [33] suggests that it is possible to classify occupant comfort using inexpensive sensor nodes. Moreover, the model described by [34] was developed for an office on a university campus that was selected to analyze the performance of the air conditioning system and the level of thermal comfort. Two experiments were conducted with four and nine subjects, and the measurement and simulation results showed that, in extreme scenarios, energy savings could reach 50% with thermal comfort parameters within acceptable ranges [34]. The methodology proposed by [38] was evaluated in a heating experiment in a university office building. User acceptance of the system was good, with the occupants stating that indoor comfort in their offices was better compared to the manual operation of the system on days with similar conditions, mostly due to the preheating in the morning [38]. In turn, the platform presented in [39], the Next 24 h-Energy platform, was tested in a real office building and the results showed significant energy reduction in both heating (48%) and cooling (39%) consumption [39]. Also, the network of sensors and actuators presented in [40] was implemented in a real scenario and the experimental results indicate that the system ensures the occupants’ thermal comfort when compared with traditional approaches. Finally, the experimental results reported by [41] showed that, when using the multi-agent energy management system, the temperature varied between 21 °C and 26 °C, which coincided with the comfort level in terms of temperature, and the relative humidity rate also coincided with the comfort level standard because it remained between 30% and 40%. Nine articles [12, 13, 28–32, 35, 37] propose different solutions to optimize thermal comfort by using knowledge of the occupants. Different techniques are applied to gather this knowledge: [12] describes a control system based on the mobility prediction of the occupants, using contextual information obtained from mobile phones; [13] reports the use of a set of sensors, including a Microsoft Kinect, to gather information to control heating and cooling elements in order to dynamically adjust indoor temperature; [28, 29, 37] describe frameworks of information processing and knowledge discovery for analyzing data collected from wireless sensor networks; the study reported in [30] aimed to estimate the number of occupants and the duration of occupant activities by deploying a large-scale sensor network in an open-plan office testbed environment; [31] uses cyber-physical systems for gathering and acting on relevant information in
laboratories; and [32, 35] present frameworks to account for human factors that affect thermal comfort in buildings, such as activity levels and clothing values. In terms of evaluation, four articles (i.e., [28–31]) do not report assessments involving real users. However, [31] presents a performance evaluation of the deployed infrastructure showing that the cyber-physical systems of a smart laboratory can be used to regulate thermal comfort, and [28, 30] present simulation results. The results of [30] showed that there could be 18.5% energy savings with 87.5% thermal comfort satisfaction using an occupant behavior-based control approach, while [28] indicates a promising capability in determining thermal comfort even in locations where sensor nodes are not available. Furthermore, the use of sensing networks to acquire knowledge of the occupants was tested successfully in different studies: a case study in [13, 32], two case studies in [37] and three case studies in [35]. Finally, the experiment reported by [12] involved 21 participants and allowed the researchers to conclude that their approach can decrease energy consumption by more than 25% and predict at least 70% of the transit cases.
3.4 Multiple Perspective Comfort
Thermal comfort is combined with other comfort dimensions in several studies. One article [46] is focused on thermal and visual comfort. The article presents a context manager for public buildings that automatically controls lighting and temperature and displays power consumption. In terms of evaluation, a model was developed from a scenario of a university classroom and the results indicated that the context manager enabled rapid and automated changes [46]. Four articles [45, 47, 49, 50] report research studies aiming to improve thermal and environmental comfort (i.e., temperature, humidity, and air quality). Article [45] proposes the establishment of an environmental health information management platform to improve the indoor environment. Article [47] presents a system that uses an affordable and easy-to-install consumer weather station to measure the temperature, humidity, and CO2 concentration. Based on these measurements, the system estimates the number of occupants in a room and whether the windows are open or closed. It uses this information, together with knowledge stored in an ontology, to recommend actions that improve the environmental quality. In turn, article [49] describes an indoor environment control approach based on human thermal adaptation. Data on indoor air temperature, relative humidity, CO2 concentration, and outdoor air temperature are continuously collected and recorded at 1-s intervals. Moreover, in article [50] the authors propose a solution to augment HVAC systems with smart monitoring and control of the thermal and environmental comfort of occupants by exploiting temperature, humidity, air quality, and occupancy measurements through wireless sensor networks. It consists of a set of sensors (e.g., temperature, humidity, air quality, or occupancy sensors) and actuators that work together to control HVAC systems. The performance of the solution proposed by [50] was evaluated using extensive simulation studies. These studies indicated the ability of the proposed solution to maintain the thermal comfort of occupants in various zones while implementing
different energy saving profiles. The performance evaluation reported by [47] showed that the system might improve the temperature, humidity, and CO2 parameters. The results reported by [49] were obtained with ten occupants in the hottest month of summer and indicated that the implemented adaptation-based control has an impact on human comfort and energy saving. Finally, based on data from a real building, [45] concluded that a system providing unified means of environmental sensing and control can offer a safe, convenient, healthy, comfortable, environmentally friendly, and energy-conserving environment to its occupants. Three articles [43, 44, 48] also considered illumination in addition to thermal and environmental comfort: [43] presents a multi-agent based intelligent control system for achieving effective energy and comfort management in a building environment; [44] describes an intelligent system architecture, based on neural networks, expert systems and negotiating agent technologies, that was designed to optimize the intelligent building’s performance considering preference information provided by occupants; and [48] describes a system to control temperature, humidity, air pressure and illumination using an Internet of Things (IoT) wireless sensor network and an expert system. In terms of assessment, [43] presents simulation results and [48] presents a performance evaluation. Moreover, [44] reports a field trial. In this field trial, people who worked in a public building were strongly encouraged to fill out surveys about the building’s performance. Users’ reports of positive changes in comfort conditions were confirmed by a survey. Finally, article [11] presents a system, the Human Comfort Ambient Intelligence system, aiming to control different aspects of human comfort: temperature, humidity, air quality, illumination and acoustics. The design of a fuzzy rule-based system for the measurement of human comfort in a living space is also presented. This subsystem provides a living-space comfort index based upon thermal comfort, visual comfort, indoor air comfort and acoustical comfort values. For that purpose, different parameters are collected, such as air temperature, mean radiant temperature, relative humidity, air velocity, clothing, metabolic rate, luminance level, shading level, CO2 concentration and sound level. Due to the modular comfort model, each comfort factor can be calculated within its group even though the sensed values might come from different nodes [11]. In terms of assessment, article [11] presents a simulation based on real data. However, real users were not involved in the experiment.
4 Discussion and Conclusion
Concerning the first research question (i.e., what are the most relevant application types?), we might conclude that the retrieved articles use different devices to sense environmental and occupant parameters with diverse aims: air quality (four articles); visual comfort (nine articles); thermal comfort (17 articles); and multiple perspective comfort (nine articles).
Looking at how to measure the applications’ impact on occupant comfort (i.e., the second research question), different methods were used to assess the proposed solutions: simulations were performed [11, 28, 30, 45, 48], some of them using real data (e.g., [11, 45]); prototypes were developed to demonstrate the feasibility of the concepts [16, 18, 19, 25, 29]; prototypes were implemented and used to evaluate the performance of the technologies being developed [17, 20, 22, 26, 27, 31, 41, 43, 47, 50]; and prototypes were developed to be evaluated by real users [12, 14, 21, 23, 24, 32, 35, 36, 42, 44, 46, 49]. Therefore, this systematic review of the literature shows that the retrieved articles report solutions for green buildings that are still far from being consolidated (i.e., the third research question). Moreover, most of them do not report evidence about the evaluation of the applications in real environments with real users. The lack of robust evaluation trials is evident, and it is one of the major barriers to the dissemination of the solutions being proposed (i.e., the fourth research question). The results of this systematic review indicate that the 39 retrieved articles report and discuss important topics related to the use of ICTs to support human comfort in modern green buildings. After this review, it is possible to conclude that the scientific literature presents valuable arguments regarding the importance of ICTs to increase the occupants’ comfort while minimizing energy consumption. Nevertheless, additional research is required, including trials assessing the applications with a significant number of potential building occupants, to facilitate the dissemination of innovative solutions. It is always possible to point out limitations regarding both the chosen keywords and the databases that were used in the research. Moreover, since most articles were published in conference proceedings and many conferences are not indexed, there will certainly be similar articles that have not been included. Finally, it should also be noted that the grey literature was not considered in this review and that this is a gap of some significance. Acknowledgments. The present study was developed in the scope of the Smart Green Homes Project [POCI-01-0247 FEDER-007678], a co-promotion between Bosch Termotecnologia S.A. and the University of Aveiro. It is financed by Portugal 2020, under the Competitiveness and Internationalization Operational Program, and by the European Regional Development Fund.
References
1. Darko, A., Zhang, C., Chan, A.P.: Drivers for green building: a review of empirical studies. Habitat Int. 60, 34–49 (2017) 2. US Green Building Council: Building Momentum: National Trends and Prospects for High-Performance Green Buildings. US Green Building Council, Washington, DC (2003) 3. Hossain, M.F.: Solar energy integration into advanced building design for meeting energy demand and environment problem. Int. J. Energy Res. 40(9), 1293–1300 (2016) 4. Hossain, M.F.: Green science: independent building technology to mitigate energy, environment, and climate change. Renew. Sustain. Energy Rev. 73, 695–705 (2017) 5. Wang, L., Toppinen, A., Juslin, H.: Use of wood in green building: a study of expert perspectives from the UK. J. Clean. Prod. 65, 350–361 (2014)
6. Gou, Z., Xie, X.: Evolving green building: triple bottom line or regenerative design? J. Clean. Prod. 153, 600–607 (2017) 7. Wong, J.K.W., Zhou, J.: Enhancing environmental sustainability over building life cycles through green BIM: a review. Autom. Constr. 57, 156–165 (2015) 8. Berardi, U.: Sustainable construction: green building design and delivery. Intell. Build. Int. 5 (1), 65–66 (2013) 9. Ahn, Y.H., Pearce, A.R., Wang, Y., Wang, G.: Drivers and barriers of sustainable design and construction: the perception of green building experience. Int. J. Sustain. Build. Technol. Urban Dev. 4(1), 35–45 (2013) 10. Zhao, X., Zuo, J., Wu, G., Huang, C.: A bibliometric review of green building research 2000–2016. Architect. Sci. Rev. 62(1), 74–88 (2019) 11. Rawi, M.I.M., Al-Anbuky, A.: Development of intelligent wireless sensor networks for human comfort index measurement. Proc. Comput. Sci. 5, 232–239 (2011) 12. Lee, S., Chon, Y., Kim, Y., Ha, R., Cha, H.: Occupancy prediction algorithms for thermostat control systems using mobile devices. IEEE Trans. Smart Grid 4(3), 1332–1340 (2013) 13. Gao, P.X., Keshav, S.: SPOT: a smart personalized office thermal control system. In: Proceedings of the Fourth International Conference on Future Energy Systems, pp. 237–246. ACM (2013) 14. Li, X., Chen, G., Zhao, B., Liang, X.: A kind of intelligent lighting control system using the EnOcean network. In: 2014 International Conference on Computer, Information and Telecommunication Systems (CITS), pp. 1–5. IEEE (2014) 15. Soares, N., Bastos, J., Pereira, L.D., Soares, A., Amaral, A.R., Asadi, E., Rodrigues, E., Lamas, F.B., Monteiro, H., Lopes, M.A.R., Gaspar, A.R.: A review on current advances in the energy and environmental performance of buildings towards a more sustainable built environment. Renew. Sustain. Energy Rev. 77, 845–860 (2017) 16. Bhattacharya, S., Sridevi, S., Pitchiah, R.: Indoor air quality monitoring using wireless sensor network. In: 2012 Sixth International Conference on Sensing Technology (ICST), pp. 422–427. IEEE (2012) 17. Wang, Z., Wang, L.: Intelligent control of ventilation system for energy-efficient buildings with CO2 predictive model. IEEE Trans. Smart Grid 4(2), 686–693 (2013) 18. Costa, A.A., Lopes, P.M., Antunes, A., Cabral, I., Grilo, A., Rodrigues, F.M.: 3I buildings: intelligent, interactive and immersive buildings. Proc. Eng. 123, 7–14 (2015) 19. Zhi, S., Wei, Y., Cao, Z., Hou, C.: Intelligent controlling of indoor air quality based on remote monitoring platform by considering building environment. In: 2017 4th International Conference on Systems and Informatics (ICSAI), pp. 627–631. IEEE (2017) 20. Lee, H., Wu, C., Aghajan, H.: Vision-based user-centric light control for smart environments. Pervasive Mob. Comput. 7(2), 223–240 (2011) 21. Cheng, Z., Xia, L., Zhao, Q., Zhao, Y., Wang, F., Song, F.: Integrated control of blind and lights in daily office environment. In: 2013 IEEE International Conference on Automation Science and Engineering (CASE), pp. 587–592. IEEE (2013) 22. Miki, M., Yoshii, T., Ikegami, H., Azuma, Y., Emi, A., Yoshida, K.: A lighting control system to optimize brightness distribution on working places. In: 2013 IEEE 8th International Symposium on Applied Computational Intelligence and Informatics (SACI), pp. 23–28. IEEE (2013) 23. Angelopoulos, C.M., Evangelatos, O., Nikoletseas, S., Raptis, T.P., Rolim, J.D., Veroutis, K.: A user-enabled testbed architecture with mobile crowdsensing support for smart, green buildings. 
In: 2015 IEEE International Conference on Communications (ICC), pp. 573–578. IEEE (2015)
24. Malavazos, C., Papanikolaou, A., Tsatsakis, K., Hatzoplaki, E.: Combined visual comfort and energy efficiency through true personalization of automated lighting control. In: 2015 International Conference on Smart Cities and Green ICT Systems (SMARTGREENS), pp. 1–7. IEEE (2015)
25. Behera, A.R., Devi, J., Mishra, D.S.: A comparative study and implementation of real time home automation system. In: 2015 International Conference on Energy Systems and Applications, pp. 28–33. IEEE (2015)
26. Yin, X., Keoh, S.L.: Personalized ambience: an integration of learning model and intelligent lighting control. In: 2016 IEEE 3rd World Forum on Internet of Things (WF-IoT), pp. 666–671. IEEE (2016)
27. Basnayake, B.A.D.J.C.K., Amarasinghe, Y.W.R., Attalage, R.A., Jayasekara, A.G.B.P.: Occupancy identification based energy efficient Illuminance controller with improved visual comfort in buildings. In: 2017 Moratuwa Engineering Research Conference (MERCon), pp. 304–309. IEEE (2017)
28. Rawi, M.I.M., Al-Anbuky, A.: Passive house sensor networks: human centric thermal comfort concept. In: 2009 International Conference on Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP), pp. 255–260. IEEE (2009)
29. Chang, S., Kim, C., Suh, D.: PIMA: RFID and USN based personalized indoor microclimate adjuster. In: ICCAS 2010, pp. 859–862. IEEE (2010)
30. Dong, B., Lam, K.P.: Building energy and comfort management through occupant behaviour pattern detection based on a large-scale environmental sensor network. J. Build. Perform. Simul. 4(4), 359–369 (2011)
31. Lei, C.U., Man, K.L., Liang, H.N., Lim, E.G., Wan, K.: Building an intelligent laboratory environment via a cyber-physical system. Int. J. Distrib. Sens. Netw. 9(12), 109014 (2013)
32. Jazizadeh, F., Ghahramani, A., Becerik-Gerber, B., Kichkaylo, T., Orosz, M.: Personalized thermal comfort-driven control in HVAC-operated office buildings. Comput. Civ. Eng. 2013, 218–225 (2013)
33. Pitt, L., Green, P.R., Lennox, B.: A sensor network for predicting and maintaining occupant comfort. In: 2013 IEEE Workshop on Environmental Energy and Structural Monitoring Systems, pp. 1–6. IEEE (2013)
34. Pazhoohesh, M., Shahmir, R., Zhang, C.: Investigating thermal comfort and occupants position impacts on building sustainability using CFD and BIM. In: 49th International Conference of the Architectural Science Association, pp. 257–266. The Architectural Science Association and The University of Melbourne (2015)
35. Rana, R., Kusy, B., Wall, J., Hu, W.: Novel activity classification and occupancy estimation methods for intelligent HVAC (heating, ventilation and air conditioning) systems. Energy 93, 245–255 (2015)
36. Gupta, S.K., Atkinson, S., O’Boyle, I., Drogo, J., Kar, K., Mishra, S., Wen, J.T.: BEES: Real-time occupant feedback and environmental learning framework for collaborative thermal management in multi-zone, multi-occupant buildings. Energy Build. 125, 142–152 (2016)
37. Lim, S.J., Puteh, S., Goh, K.C.: Information processing and knowledge discovery framework for sustainable building environment using multiple sensor network. In: 2016 IEEE International Conference on Industrial Engineering and Engineering Management (IEEM), pp. 1448–1452. IEEE (2016)
38. Katsigarakis, K.I., Kontes, G.D., Giannakis, G.I., Rovas, D.V.: Sense-think-act framework for intelligent building energy management. Comput.-Aided Civ. Infrastruct. Eng. 31(1), 50–64 (2016)
39. Manjarres, D., Mera, A., Perea, E., Lejarazu, A., Gil-Lopez, S.: An energy-efficient predictive control for HVAC systems applied to tertiary buildings based on regression techniques. Energy Build. 152, 409–417 (2017)
40. Marche, C., Nitti, M., Pilloni, V.: Energy efficiency in smart building: a comfort aware approach based on Social Internet of Things. In: 2017 Global Internet of Things Summit (GIoTS), pp. 1–6. IEEE (2017) 41. Altayeva, A., Omarov, B., Giyenko, A., Im Cho, Y.: Multi-agent based smart grid system development for building energy and comfort management. In: 2017 IEEE 19th International Conference on High Performance Computing and Communications; IEEE 15th International Conference on Smart City; IEEE 3rd International Conference on Data Science and Systems (HPCC/SmartCity/DSS), pp. 58–63. IEEE (2017) 42. Matsui, K.: Pleasant sleep provision system during warm nights as a function of smart home. In: 2017 International Symposium on Networks, Computers and Communications (ISNCC), pp. 1–6. IEEE (2017) 43. Yang, R., Wang, L.: Multi-agent based energy and comfort management in a building environment considering behaviors of occupants. In: 2012 IEEE Power and Energy Society General Meeting, pp. 1–7. IEEE (2012) 44. Anastasi, G., Corucci, F., Marcelloni, F.: An intelligent system for electrical energy management in buildings. In: 2011 11th International Conference on Intelligent Systems Design and Applications, pp. 702–707. IEEE (2011) 45. Chen, S.Y., Huang, J.T., Wen, S.L., Feng, M.W.: Optimized control of indoor environmental health-example of the fu-an memorial building. Comput.-Aided Des. Appl. 9(5), 733–745 (2012) 46. Kamienski, C., Borelli, F., Biondi, G., Rosa, W., Pinheiro, I., Zyrianoff, I., Sadok, D., Pramudianto, F.: Context-aware energy efficiency management for smart buildings. In: 2015 IEEE 2nd World Forum on Internet of Things (WF-IoT), pp. 699–704. IEEE (2015) 47. Frešer, M., Gradišek, A., Cvetković, B., Luštrek, M.: An intelligent system to improve THC parameters at the workplace. In: Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct, pp. 61–64. ACM (2016) 48. Lan, L., Tan, Y.K.: Advanced building energy monitoring using wireless sensor integrated EnergyPlus platform for personal climate control. In: 2015 IEEE 11th International Conference on Power Electronics and Drive Systems, pp. 567–574. IEEE (2015) 49. Zhang, Y., Mai, J., Zhang, M., Wang, F., Zhai, Y.: Adaptation-based indoor environment control in a hot-humid area. Build. Environ. 117, 238–247 (2017) 50. Farag, W.A.: ClimaCon: an autonomous energy efficient climate control solution for smart buildings. Asian J. Control 19(4), 1375–1391 (2017)
eFish – An Innovative Fresh Fish Evaluation System
Renato Sousa Pinto1, Manuel Au-Yong-Oliveira2(&), Rui Ferreira1, Osvaldo Rocha Pacheco1, and Rui Miranda Rocha3
1 Departamento de Eletrónica, Telecomunicações e Informática, Universidade de Aveiro, Aveiro, Portugal {renatopinto,ruiantonio.ferreira,orp}@ua.pt
2 GOVCOPP, Departamento de Economia, Gestão, Engenharia Industrial e Turismo, Universidade de Aveiro, Aveiro, Portugal [email protected]
3 Departamento de Biologia, Universidade de Aveiro, Aveiro, Portugal [email protected]
Abstract. Portugal is a country with a large coastal extension and a fishing and sea tradition which place it at the top of fresh, salted and deep-frozen fish consumption in the European Union. The incapacity of consumers to evaluate the degree of freshness and quality of fish for consumption may cause health problems, such as food poisoning, hence the preference of consumers for deep-frozen products instead of healthier fish that is sold fresh. In this project we intend to contribute to an improvement in the evaluation by the final consumer of the quality of the fresh fish sold to them, through the creation of a reference table of fish degradation (at this experimental stage for mackerel and sardine) and the development of a mobile application allowing, in the near future, users to determine the degree of freshness of the fish they wish to acquire. The reference table creation is based on the degradation/dehydration of the fish’s eye as well as its coloration (opaqueness), which changes according to the time since the fish was caught.
Keywords: Fresh fish · Mobile application · Entrepreneurship · Innovation · Food quality · Healthiness
1 First Section
1.1 Introduction
Portugal has a large coastal extension of around 2,830 km (including the islands of Madeira, Porto Santo and the Azores). This coast is abundant in fish, which is the main dish in the Portuguese diet. That means that fishing and fresh fish are important elements not only for Portuguese food and the economy but also very important in environmental, social and cultural aspects. In December 2017 there were 16,164 commercial (professional) fishers registered [1, 2]. Portugal is the highest fish consumer in the European Union, with approximately 61.5 kg/person/year; this value is clearly above the world’s average, which is around
22.3 kg/person/year, and is only surpassed by Norway, with 66.6 kg/person/year, and by South Korea, with around 78.5 kg/person/year [3]. However, despite the worldwide tendency towards consuming healthy foods, namely fish, recorded consumption still tends to be of processed fish (e.g. fish fingers) and deep-frozen fish instead of fresh fish. This tendency may be related to a lack of consumer knowledge, namely regarding the state and quality of the fish, as consumers feel safer buying pre-processed products with a defined expiration date (present on the packaging). Generally, with perishable products, it is possible to know if the products are within the expiration date through the inclusion of the date on the food’s packaging. However, not all products are tagged. Examples of this are fresh products, such as meat, fish and seafood, among others, whose deterioration is subject to other factors: not only time but also, for instance, the handling of the fish or exposure to the environment (e.g. excessive heat). The incapacity of consumers to evaluate the degree of freshness and the quality of a fish might cause health issues such as food poisoning, since food that seems healthy may be ingested despite not being in the best condition.
1.2 Motivation
The evaluation of the quality and durability of most foods was standardized many years ago. This information is mandatorily available to consumers in the European Union, including, for example, the tagging of products with a packaging date and an expiration date. A product beyond the expiration date does not mean that it is necessarily tainted; it does, however, mean that it is not at the peak of its quality and freshness. However, there is a special group of products, such as fresh fish, meat and seafood, for which such an expiration date tag cannot be defined and whose state of freshness is not easy to evaluate. The evaluation is not always easy, since there is no obvious solution and it normally requires specific knowledge. The lack of this specialized knowledge can lead consumers to buy decayed or even spoiled products, when the consumer is not able, by their own means, to identify or assess the freshness of the product at hand. In the case of fish, it is known that the freshness depends not only on the date when it was caught, but also on the way it was handled and even on the conditions in which it was kept since it was caught. The parameters that were catalogued (such as time, handling and preservation state) affect the state of the gills, the state of the eyes and the overall state of the fish. Consider, for example, the acquisition of a fish in a major commercial environment: often the gills have already been removed, eliminating one means of assessing freshness available to the consumer. The eyes of the fish are normally not removed, leaving an opportunity for consumers to evaluate the product before purchase. It is known by experts that a fresh fish has transparent eyes with a perfectly spherical shape; it is also known that, over time and under insufficient preservation conditions, this transparency diminishes and the eye becomes opaque. Another factor that can evidence the freshness of the fish is the spherical shape of the eye; as time passes and through
dehydration, the shape of the eye will change. Another fact known by experts is that the rate of degradation and attenuation of such characteristics varies from species to species. In July 2018, 6.9 million Portuguese people owned smartphones, representing a penetration rate of 75.1% for this kind of product, according to [4]. The challenge of this project is, through a smartphone application, to make experts’ knowledge accessible, enabling users/consumers of fresh fish to identify the state of freshness of several types of fish before purchase.
1.3 The Problem to Be Solved with This Project
The big challenge to solve in this project is to create a mobile application that, through a photograph taken of a fish’s eye and using a mobile phone connected to the internet, tells the user whether the fish they have photographed is fresh. We also know that eye size varies from species to species. Thus, as a complementary challenge, in order to answer users’ questions, it will be necessary to maintain a database of the various fish species and a table (with fish evaluations) of their degradation over time.
1.4 On Innovation and Entrepreneurship in Society
Innovation is as old as the human species, though it was Schumpeter [10] who glorified the role of the entrepreneur as a catalyst of change. We have always strived to become better and to evolve into something different, through our inventions, leading to improved quality of life. According also to [11, p. vii] we have “a particular psychological factor – the need for Achievement – responsible for economic growth and decline”. This study is thus the result of a desire to create a novel product to serve a need in society, in this case related to health and to healthy food. In many respects, the study is the consequence of an entrepreneurial desire to capitalize on a business opportunity [12], though, admittedly, at this stage, not really involving a business plan but rather the desire to contribute to society with an intellectual creation. There is a desire to prove that one is able and that one is right. As [10] affirmed, the motivation is not merely linked to profit, but rather linked to the joy that creating something new gives rise to. The entrepreneurial path is long and narrow, and we expect many difficulties to arise. Innovation may be related to marketing (e.g. involving new pricing or promotion methods), it may be organizational (involving, for example, new organizational methods, such as external partnerships), it may involve a new process (a different way to perform tasks), and it may lead to novel products and services (involving a new product design or function not yet available in the marketplace) [13]. The main focus herein is with the latter. “Innovation involves the utilization of new knowledge or a new use or combination of existing knowledge” [13, p. 35]. We recognize that there exists the risk whereby “the
benefits of creative invention are rarely fully appropriated by the inventing firm” [13, p. 35]. At this stage, this is not a major preoccupation. The novelty level of the new product described in this study is incremental rather than radical, not being entirely “new to the world” [14, p. 12]. The product is thus an improvement to existing products and technologies, albeit with the ability to change how people purchase their food (in this case fish) if it becomes mainstream. This paper is organized as follows: the next section is a study of similar technologies; after this section, the methodology of the paper is discussed; the eFish system is then described; Sect. 5 is a description of the system architecture; finally, we have a conclusion section and suggestions for future work.
2 A Study of Similar Technologies
Having defined the target of the project, the development of a mobile application capable of determining the freshness of a fish, it is necessary to study whether there are other applications that already perform this task. Since, as far as the proponents of this paper are aware, there is no application for the same purpose, it is necessary to investigate similar applications and how they work. Thus, it will be possible to evaluate the innovation associated with this project. Research has been carried out on similar projects and prototypes in order to understand what already exists and what this project can bring to this area.
2.1 PERES
PERES [5] is an application that evaluates the freshness of a food. To accomplish this task, it uses a complementary device that has four sensors that allow it to evaluate the following food parameters: temperature, humidity, ammonia and volatile organic compounds. In addition, there is an application for Android and iOS that allows the user to view the results obtained by the device and relevant statistics to check the food conditions. This application + device set is still in the prototype phase and is seeking funding for its development.
2.2 FOODsniffer
FOODsniffer [6] is an application targeted at raw pork, beef and poultry, as well as fresh fish. It uses a complementary device similar to the PERES system that detects the freshness of compatible foods and sends the collected information to an application that allows the user to check the freshness of the food. It is available for Android and iOS operating systems.
2.3 Vivino
Vivino [7] is an application for identifying wines through a photograph of their labels. Using a smartphone camera, the user takes a picture of the wine label, which is sent to a central server that identifies the wine. In addition to identifying the wine, it also allows users to rate the wines, write comments and enter the purchase price of the wine. Thus, when users submit a photo of a label that already exists in the central database, they will have access to all the wine information submitted by other users of the app. In addition to identifying wine labels, Vivino also allows its users to keep track of their searches and maintain a “virtual wine cellar”.
2.4 ClariFruit
ClariFruit [8] monitors and analyzes fruit quality and ripeness. It offers a scientific method for classifying agricultural products and allows data collection to determine the quality and nutritional value of the fruit. It promises to change the way fruit growers make decisions about the conservation status of their fruit. This application uses a sensor, complementary to a mobile phone, that evaluates the ripeness of the analyzed fruit.
2.5 Harvest
Harvest is an application developed for the iOS operating system. It functions as an encyclopedia with information on when fruits should be harvested and consumed. It is an application that only has descriptive text, and there is no interaction with food, either through the camera of the mobile phone, or through any other auxiliary device.
2.6 FoodKeeper
FoodKeeper [9] works very similarly to Harvest. The main difference is that the application is not limited to fruits, but also covers other fresh foods such as meat, fish and dairy products. It is designed for the Android operating system.
2.7 Comparative Table of the Evaluated Software
Table 1 gives an overview of the main characteristics of each existing and competing product in the market. Several of the brands studied have a complementary device to evaluate the quality of the products. In the case of professional use, the use of such a device may make sense, but for an ordinary user this will certainly be a constraint on mass adoption. Our solution aims to be based solely on a smartphone. Nobody wants to carry an additional device in their pocket when they go shopping for fish, either because it is impractical or because it would likely meet resistance from sellers. There are also problems associated with the purchase of the equipment as well as its calibration.
Table 1. Comparative table – six alternatives.
Columns: Auxiliary device | Interaction with products | Product analysis | Central server
PERES: X X X X
FOODSniffer: X X X X
Vivino: X X
Clarifruit: X X X
Harvest: X
FoodKeeper: X
As the existence of a complementary device that identifies the quality of a product is not an objective of this project, we can see that the Vivino application is the one that most closely resembles the proposed objectives. It uses the camera to take a photograph of the object being evaluated and sends it over the internet for analysis, making the analysis result independent of the user’s device intelligence/processing capability. It also has an associated social component, which informs the user at each moment where this article is being sold at a better price. In the case of fish, it would make it possible to know, for example, who sells fish in the best conditions. However, since Vivino only works with wine bottle labels – static objects that do not degrade rapidly over time (hours) – we can consider that none of the tested and studied applications fulfills all of the objectives proposed for this study.
3 Methodology
The app described herein is an incremental innovation which is being created because the alternatives in the market are not satisfactory. The app is an Android application created using PhoneGap (which converts a web application into a native Android application) and based on an algorithm to manipulate images. The app was developed in the scope of a Master’s dissertation. The app is still being developed and has been in the development phase for four months (another six months will be necessary before the next phase of product and algorithm refinement). To be properly tested, in the first phase the app involved over 600 samples (25 fish of each of the two species – a total of 50 fresh fish caught at most twelve hours beforehand); two photos of each fish were taken twice a day for six days – which will give a total of 1,200 photos at the end of the experiment. After six days, fish is not considered fit for consumption, even under good preservation conditions (even when kept on ice). The platform for the photos was made by one of the authors, out of wood.
4 eFish System
eFish is a prototype of a mobile app that, through a photo of a fish’s eye and a mobile phone connected to the Internet, tells the user whether or not the fish they have photographed is fresh. Centrally, the system has a database of the various fish species, where a table of their degradation over time is kept. The operating principle of eFish is the analysis of the image sent by the user to determine the degradation/deformation of the fish’s eye and to place it between two values of the reference table. At this early stage, the system only has mackerel and sardine analysis capability; however, the system is open and prepared for other types of fish, as well as for possible improvements in the algorithms.
4.1 Reference Tables Creation
To create the reference tables, twenty-five specimens of each type of fish were acquired: 25 mackerels and 25 sardines, of similar sizes. No distinction was made between male and female fish in these samples. Each fish was placed in a box labeled with a number referring to the date and time at which the specimen was caught. For six days, every twelve hours, two pictures were taken of each fish, one full-length and one just of the head, as shown in Fig. 1.
Fig. 1. A set of photos of one sardine.
At each photograph collection, and for each fish, a few samples were taken from the fish’s belly, weighed and measured (for example, a larger liver means that the fish may have an altered degradation rate), in order to determine the state of the fish and how fresh it was. Six hundred samples resulted from this analysis. The large number of samples ensures that the results obtained are scientifically supported. All of the values collected, as well as the photographs taken, were stored in a database system. In order to collect the photographs, the system shown in Fig. 2 was developed. The system had two possible heights (ground level and mid-way up) for the fish to be placed at. With the tray in the position shown (ground level), the full-body photographs were taken; to make it possible to take pictures of the head of the fish, the tray could be positioned at an intermediate (mid-way) height.
Fig. 2. Apparatus to capture the photos.
4.2 Fish Eye Degradation/Deformation Analysis System
After gathering the images, it was necessary to analyze the degradation and dehydration of the fish’s eye. In the initial images, the eye has its normal, healthy shape and it is not possible to detect any anomaly (from the point of view of form). As days pass, the images begin to show deformations. At this point, an analysis of the eye area and of the deformation area caused by dehydration was performed. The strain value, calculated as the percentage of the strain area relative to the total eye area, was stored in the database. This is done through a contour analysis computer program. The values vary among the various specimens of each of the fish categories analyzed. In order to obtain the reference table values for each of the analyzed species (sardines and mackerel), mean values of the samples were obtained for each time period.
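The paper does not disclose the contour-analysis code itself; the following Python/OpenCV fragment is only a minimal sketch of how such a contour-based strain ratio could be computed, assuming a pre-cropped image of the eye. The Otsu threshold and the enclosing-circle heuristic are illustrative choices, not the authors’ implementation.

```python
# Illustrative sketch only: thresholds and the circle-fit heuristic are assumptions.
import cv2
import numpy as np

def strain_ratio(eye_crop_path: str) -> float:
    """Estimate deformation as a percentage of the total eye area."""
    gray = cv2.imread(eye_crop_path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        raise FileNotFoundError(eye_crop_path)
    blur = cv2.GaussianBlur(gray, (5, 5), 0)
    # Otsu threshold to separate the eye from the background (illustrative choice)
    _, mask = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        raise ValueError("no eye contour found")
    eye = max(contours, key=cv2.contourArea)      # largest contour = eye outline
    eye_area = cv2.contourArea(eye)
    # A fresh eye is close to circular; the gap between the fitted circle and the
    # actual contour is treated here as the dehydration/deformation ("strain") area.
    (_, _), radius = cv2.minEnclosingCircle(eye)
    circle_area = np.pi * radius ** 2
    strain_area = max(circle_area - eye_area, 0.0)
    return 100.0 * strain_area / circle_area      # percentage of the total eye area

# Example: print(strain_ratio("sardine_042_head.png"))  # hypothetical file name
```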
4.3 Reviewing a User-Submitted Photo
The fish rating system is currently being tested and developed. The mobile application allows the user to choose which type of fish to analyze (sardines or mackerel at the moment) and to collect a photograph of the fish’s head. The data is sent to a centralized server which, at this stage, and in order to perform more thorough tests, saves the uploaded image. Then, the same calculation and analysis algorithm used in the construction of the reference tables is applied to the image, and it is determined between which intervals of the table the result falls. The data is sent back to the user giving, at this stage, values regarding the rate of eye deformation – and the consequent state of degradation of the fish – as well as the minimum and maximum interval since the fish was caught.
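As an illustration of this interval placement, the sketch below shows one way a measured strain value could be mapped to a minimum/maximum interval since the catch. The table values are hypothetical placeholders, not the averages obtained from the study.

```python
# Hypothetical reference table: mean strain (%) per 12-hour period after the catch.
# The real values come from the 600-sample study and are not reproduced here.
SARDINE_TABLE = [(12, 4.0), (24, 8.5), (36, 14.0), (48, 21.0), (60, 29.0), (72, 38.0)]

def hours_interval(strain_pct: float, table=SARDINE_TABLE):
    """Return the (min_hours, max_hours) interval whose strain values bracket the measurement."""
    if strain_pct <= table[0][1]:
        return (0, table[0][0])                       # fresher than the first reference point
    for (h_lo, s_lo), (h_hi, s_hi) in zip(table, table[1:]):
        if s_lo <= strain_pct <= s_hi:
            return (h_lo, h_hi)
    return (table[-1][0], None)                       # beyond the table: older than the last point

# Example: a measured strain of 10% would be reported as between 24 and 36 hours.
print(hours_interval(10.0))
```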
For statistical analysis, the GPS coordinates of where the photo was taken are also automatically collected. This information may contribute to future developments.
5 System Architecture
The eFish system has five core areas:
• Mobile Application;
• A WebApi that communicates with the mobile application, the photography analysis system and the database;
• A database that holds all the reference tables for the several fish species;
• A backoffice that communicates with the database;
• A photography analysis system that communicates with the database and the WebApi.
The relation of the several modules is displayed in Fig. 3.
Fig. 3. System architecture
5.1 Mobile Application
The mobile application is, at this stage, developed only for devices running the Android operating system. Development will be done in PhoneGap. PhoneGap will allow easier maintenance as well as simplified export of the application to other platforms (iOS, Windows Phone, etc.), as it allows HTML + JS + CSS code to become a native mobile application.
5.2 WebApi
The WebApi module’s main function is to make the connection between the mobile application and the “heart” of eFish. From an external point of view, it is concerned with the validation of users in order to keep communications secure, either by using the https protocol or by using tokens for user validation. From an internal point of view, it
is responsible for sending the photograph to the photographic analysis module and storing information related to the photograph sent by the user in the database.
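The paper does not specify how the WebApi is implemented; the fragment below is only a hypothetical sketch (here in Python, using Flask) of the responsibilities just described — token validation, forwarding the photograph to the analysis module, and storing the submission. The endpoint name, the token check and the two stub helpers are invented for illustration.

```python
# Hypothetical sketch of the WebApi responsibilities; not the project's actual code.
from flask import Flask, request, jsonify

app = Flask(__name__)
VALID_TOKENS = {"demo-token"}  # placeholder for the real token-based user validation

def analyze_image(data: bytes, species: str) -> float:
    """Stub standing in for the photographic analysis module (OpenCV-based service)."""
    return 0.0

def store_submission(species: str, strain: float, gps: str) -> None:
    """Stub standing in for writing the submission (Crowd table) to the database."""
    pass

@app.route("/analyze", methods=["POST"])
def analyze():
    # External concern: validate the user before accepting the photograph (HTTPS assumed).
    if request.headers.get("Authorization") not in VALID_TOKENS:
        return jsonify({"error": "invalid token"}), 401
    photo = request.files["photo"]                      # photograph of the fish head
    species = request.form.get("species", "sardine")    # "sardine" or "mackerel"
    # Internal concern: forward the image to the analysis module and store the result.
    strain = analyze_image(photo.read(), species)
    store_submission(species, strain, request.form.get("gps", ""))
    return jsonify({"strain_percent": strain})
```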
5.3 Database
The database structure to support the reference tables and user-submitted photographs is shown in Fig. 4. It is a dynamic structure that allows each fish specimen to have different parameters. Thus, there is a generic set of fields that can be associated with each specimen (campo, especimen_campo and especimen tables). This allows for generic sample data collection for each specimen (fish, sample and triplet tables). For the collection of the information sent by the users, the Crowd table is defined, which allows statistics to be computed by specimen and by analysis location.
Fig. 4. Database structure
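The authoritative schema is the one in Fig. 4; purely as a rough illustration of the generic specimen/field pattern described above (each specimen carrying its own set of parameters), consider the following hypothetical sketch. Table and column names are simplified and do not necessarily match the real database.

```python
# Hypothetical, simplified sketch of the generic specimen/field pattern; the real
# schema (campo, especimen_campo, especimen, fish, sample, triplet, Crowd) is in Fig. 4.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE especimen (id INTEGER PRIMARY KEY, name TEXT);            -- e.g. sardine, mackerel
CREATE TABLE campo     (id INTEGER PRIMARY KEY, name TEXT, unit TEXT); -- generic field definitions
CREATE TABLE especimen_campo (                                         -- which fields apply to which specimen
    especimen_id INTEGER REFERENCES especimen(id),
    campo_id     INTEGER REFERENCES campo(id));
CREATE TABLE sample (                                                  -- one measured value per fish/field/time
    fish_id INTEGER, campo_id INTEGER REFERENCES campo(id),
    hours_since_catch INTEGER, value REAL);
""")
conn.execute("INSERT INTO especimen VALUES (1, 'sardine')")
conn.execute("INSERT INTO campo VALUES (1, 'eye_strain', '%')")
conn.execute("INSERT INTO especimen_campo VALUES (1, 1)")
conn.execute("INSERT INTO sample VALUES (42, 1, 24, 8.5)")
print(conn.execute("SELECT * FROM sample").fetchall())
```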
5.4 The Backoffice
The backoffice allows laboratory technicians to record the analyses of the various fish of each specimen over the analysis period. It also allows the statistical analysis of either the reference intervals relating fishing time to iris degradation or the data collected from application users (Crowd table).
5.5 The Photographic Analysis Module
The photographic analysis module is a service that analyses an image and checks the state of iris degradation of the fish eye. Depending on the result obtained, it compares the values with the reference tables and returns the results. This task is performed based on the OpenCV library.
6 Conclusion and Future Work
The ability to assess the quality and overall state of food has been standardized for many years. For most perishable products, it is possible to know whether they are within their shelf life because this information is included on the package. However, not all products are labeled – such as fresh meat, fish and shellfish, among others – and their deterioration is subject to factors other than time, such as handling or exposure to the environment. In this project, a fish degradation reference table (for mackerel and sardines) has been created and a mobile application, which informs users of the degree of freshness of the food they intend to ingest, is being tested. The images gathered have been satisfactory and we conclude that isolating the eye (the key element in these species – mackerel and sardine) with the software (OpenCV) has worked so far. Future research could include other species of fish (species available in Portugal, such as salmon and hake), as well as shellfish (including octopus). With these other species, different parts of the fish will have to be analyzed besides the eye – such as the gills and the scales.
References 1. Portugal fisherman statistics, Doca Pesca. https://sites.google.com/site/docapescacreative/ sector-da-pesca-em-portugal. Accessed 06 Nov 2019 2. Portugal Fishing Statistics, Pordata. https://www.pordata.pt/en/Search/Portugal%20Fishing %20Statistics. Accessed 06 Nov 2019 3. UE fish consumption, ACOPE. https://acope.pt/noticias/1115-portugal-e-o-pais-da-uniaoeuropeia-com-maior-consumo-de-peixe.html. Accessed 06 Nov 2019 4. Portuguese smartphone users, Marketeer. https://marketeer.sapo.pt/7-milhoes-deportugueses-tem-smartphone/. Accessed 06 Nov 2019 5. PERES Funding Homepage. https://www.indiegogo.com/projects/peres-smart-way-toprotect-you-from-food-poisoning. Accessed 06 Nov 2019 6. FOODSniffer Homepage. http://www.myfoodsniffer.com/. Accessed 06 Nov 2019 7. Vivino Homepage. https://www.vivino.com/. Accessed 06 Nov 2019 8. ClariFruit Homepage. https://www.clarifruit.com/. Accessed 06 Nov 2019 9. FoodKeeper Homepage. https://www.foodsafety.gov/keep-food-safe/foodkeeper-app. Accessed 06 Nov 2019 10. Schumpeter, J.A.: The Theory of Economic Development. Harvard University Press, Cambridge (1934) 11. McClelland, D.C.: The Achieving Society. The Free Press, New York (1961) 12. Duarte, C., Esperança, J.P.: Empreendedorismo e planeamento financeiro. Edições Sílabo, Lisbon (2012)
13. OECD and Eurostat: Oslo Manual – Guidelines for Collecting and Interpreting Innovation Data, 3rd edn. OECD Publishing, Paris (2005) 14. Tidd, J., Bessant, J., Pavitt, K.: Managing Innovation – Integrating Technological, Market and Organizational Change, 3rd edn. Wiley, Chichester, England (2005)
A Virtual Reality Approach to Automatic Blood Sample Generation
Jaime Díaz1, Jeferson Arango-López2, Samuel Sepúlveda1, Danay Ahumada3, Fernando Moreira4(&), and Joaquin Gebauer1
1 Depto. Cs. de la Computación e Informática, Universidad de La Frontera, Temuco, Chile
2 Depto. de Sistemas e Informática, Universidad de Caldas, Manizales, Colombia
3 Depto. Procesos Diagnósticos y Evaluación, Universidad Católica de Temuco, Temuco, Chile
4 REMIT, IJP, Porto and IEETA, Universidade Portucalense, Universidade de Aveiro, Aveiro, Portugal [email protected]
Abstract. Given the nature of the elements that make up blood, microscopes are fundamental to identifying blood cell morphology. These instruments are expensive and limited, and it is difficult to keep blood samples, and their diversity, available in educational environments. We propose a solution based on virtual reality that generates reliable blood samples using the automatic creation of images that simulate specific pathologies. We validated the proposal in three aspects: a method adoption model, the validation of replicated samples and a performance test. The study and subsequent development of the mobile application were able to generate simulations of samples corresponding to a healthy adult and an adult with acute myeloid leukemia, where users can visualize, explore and obtain data on each of the elements that appear in the sample. This iteration not only verifies its technical feasibility but opens the way for future research on improving education and training processes. Keywords: Virtual microscope · Blood samples · Information technology in education · Virtual reality · Human-computer interaction
1 Introduction
There are many areas in modern medicine where the use of microscopes is indispensable for the detection of anomalies or pathologies that affect the human body. Hematology is the field that studies, investigates, diagnoses and deals with everything related to blood [1]. Given the nature of the elements that make up blood, the use of microscopes is fundamental to identifying and interpreting the morphology of the cells present in the circulatory system. This is one of the main competencies that university students in health programs like Biomedicine must acquire [2].
At the Universidad de La Frontera, access to microscopes for the study of blood samples is limited. Supply is restricted and they are generally in high demand by academics and students. It is worth noting that the cost, size and maintenance processes of these instruments make them particularly difficult to acquire. Moreover, blood samples, critical to the analysis process, must be preserved under specific environmental conditions and in specific facilities. A blood sample at room temperature only lasts between 2 and 4 h. If the goal is to preserve it for more than 24 h, it must be maintained at a constant temperature between 4 and 8 °C. For 4 weeks, it must be kept at −20 °C, and for long-term preservation the samples must be kept between −80 and −190 °C [3].
The handling of blood samples also carries risks, as they contain microorganisms that can affect students’ health if not managed with the corresponding safety measures. One example would be blood samples with Hepatitis B, which can only be handled by people vaccinated against this virus [4]. In addition, there are many diseases that cannot be put under a microscope used by students in laboratories due to the danger their handling poses, and in other cases it is quite complicated to find patients with a certain disease.
In this light, this study was intended to design a virtual reality-based initiative to generate reliable blood samples using the automatic creation of images that simulate specific pathologies. The proposal seeks to support the teaching-learning process for students in areas of medicine such as hematology or biomedicine. The document is structured as follows: the second section presents related works and similar solutions; the third describes the software solution proposal and its characteristics; in the fourth section, the proposed software is validated from different perspectives, reporting its results; finally, the fifth section discusses and concludes the proposal.
2 State of the Art
The search for studies similar to the present proposal has been separated into two parts: scientific literature and software applications. Initially, a research question (RQ1) was posed for the implementation of a systematic search in scientific search engines: ScienceDirect, Scopus and PubMed. The first two cover engineering and science criteria, while the last one includes a medical perspective.
2.1 Scientific Literature Search
The question (RQ1), “What virtual reality-based tools are used for education on topics like hematology?”, seeks to identify the tools, applications or other types of software that have been developed as academic initiatives and used to teach undergraduate students. The following were considered inclusion criteria: (i) Population: professionals in the field and students in associated programs; (ii) Intervention: studies with methods, techniques, models or tools that contain proposals on the use of virtual reality technologies in education; (iii) Result: technologies implemented or functional software; (iv) Years: 2009–2020; “Conference Paper” or “Journal Article” studies.
In the final iteration of the search string structure, the keyword “education” was eliminated as it greatly limited the results obtained. Finally, the search strings were composed of keywords: virtual, microscope, hematology, blood, and their respective synonyms. The initial result of the searches yielded a total of 81 documents, which after applying the criteria, ended in 3 being selected (See Table 1).
Table 1. Summary of DDBB search
Heading level   Initial results   I/E criteria   Abstract/Full text   Selected
ScienceDirect          9                4                 4               1
Scopus                46               23                 6               3
PubMed                26               17                 6               3
Table 2 provides information on the technologies and topics addressed in the papers selected.
Table 2. Selected papers
[5] Title: Do We Know Why We Make Errors in Morphological Diagnosis? An Analysis of Approach and Decision-Making in Hematological Morphology. Main technologies: Desktop; Virtual Microscope. Subject: N/A.
[6] Title: Virtual microscopy in pathology education. Main technologies: Desktop; Virtual Microscope. Subject: A literature review for virtual microscopes in education.
[7] Title: Web-Based Virtual Microscopy of Digitized Blood Slides for Malaria Diagnosis: An Effective Tool for Skills Assessment in Different Countries and Environments. Main technologies: High-resolution images in virtual microscopes to assess the performance of medical professionals. Subject: This paper reports a scheme supplying digital slides of malaria-infected blood within an Internet-based virtual microscope environment to users with different access to training and computing facilities.
Although the use of technology in biomedical education provides improvements in students’ performance in the classroom [8], none of the proposals addresses the topic of the generation of random blood samples, and all of them use virtual microscopes that employ a real image taken from a photographic sample [6]. Some results – which were not included because they were beyond the scope of this proposal – address exclusively educational topics without the use of technology (comparison of virtual and traditional microscopes) [9, 10], while others used virtual reality or augmented reality approaches (virtual libraries as a complement to medical treatments
not related to blood) [11, 12]. What stands out is the identified trend of using advanced technologies in fields like medicine, both in education and in professional practice [13, 14].

2.2 Software Applications with Similar Initiatives
The second part of the study expands the search to digital distribution media (the mobile application platforms Google Play and App Store). The search was restricted to these platforms in order to consider similar technologies and performance, and it used simplified search parameters with the terms virtual microscope, hematology and blood samples. Most of the results correspond to digital libraries (in total, over 30 results with similar features). Only the most representative applications on 3 different topics were taken into account: (i) digital encyclopedias, (ii) tests and evaluations and (iii) virtual microscopes. The first, (i), contains images and information on hematology. This is the case of Lichtman's Atlas of Hematology [15], an adaptation of the book of the same name that offers access to a large number of images of different hematological disorders as well as information on the different cells present in the blood. With respect to evaluations (ii), the Hematology Quiz app is worth noting: it has one of the highest numbers of downloads and best ratings, and consists of a series of questions that ask the user to recognize and write the name of a certain cell indicated on an image [16]. On the topic of the virtual microscope (iii), the AnatLab app stands out [17]; it broadly represents the proposal of this study, namely the compilation of different pathologies and their corresponding analysis. However, it only presents high-resolution images of samples captured from microscopes, with no further interaction. As a final conclusion of the evaluation of the state of the art, and responding to RQ1 ("What virtual reality-based tools are used for education on topics like hematology?"): strictly speaking, none of them uses virtual reality techniques; all the relevant results selected and discussed use what they call a "virtual microscope", which is a (real) digital image with a desktop or online viewer. From that perspective, our initiative is pioneering in the generation of blood samples under specific conditions.
3 Software Proposal

An application was developed for mobile devices (initially Android) to meet the need to produce random images of pathologies in blood samples. The application must work on mobile devices, mainly because of their current widespread use and relatively low cost. The non-functional software requirements in this iteration address usability, performance and portability. At the design level, a Model-View-Controller (MVC) architecture was created to structure the development of the technological solution. The C# programming language was used, with Unity as the game engine and development platform. The application employs random blood samples, parameterized and generated automatically based on a previous selection of elements. In order to generate a
replicated sample, a method of analysis called the "blood differential" is used. This measures the number of the different white blood cell types to enable the identification of diseases. It is worth noting that the relative values of various cell types can be altered indirectly because another of the blood cell populations has been altered; changes to the count of one of the populations, or of more than one of them, are usually linked to various pathologies. The parameters within which the software operates are: neutrophils, bacilliforms, lymphocytes, monocytes, eosinophils and basophils. These parameters were added to the software to specify the types of leukocytes that will appear in the generated sample and their percentage in relation to the total amount, a value that must also be specified. The functional characteristics requested by biomedical professionals and technology academics include: (i) reporting the blood differential to the user (the proportion of each white blood cell type in relation to the total) in the generated sample; (ii) providing details for each visible cell (name and size); (iii) use of the generated blood sample; and (iv) camera shift and zoom. The aim is for users to have the necessary information at hand to identify elements individually as well as to detect abnormal proportions.
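To make the parameterization concrete, the following is a minimal illustrative sketch of how a random sample could be generated from a blood differential. It is written in Python for readability; the authors' actual implementation is in C# on Unity, and the cell-type ranges, function names and placement logic below are assumptions rather than details taken from the paper.

```python
import random

# Hypothetical leukocyte percentage ranges (a "blood differential").
# The exact reference ranges used by the application are not given in the paper.
NORMAL_DIFFERENTIAL = {
    "neutrophils":  (50, 70),
    "bacilliforms": (0, 5),
    "lymphocytes":  (20, 40),
    "monocytes":    (2, 10),
    "eosinophils":  (1, 4),
    "basophils":    (0, 1),
}

def generate_sample(differential, total_leukocytes=100, seed=None):
    """Draw a random cell list whose type proportions follow the differential."""
    rng = random.Random(seed)
    # Pick a percentage inside each allowed range, then normalize so they sum to 100%.
    raw = {cell: rng.uniform(lo, hi) for cell, (lo, hi) in differential.items()}
    scale = 100.0 / sum(raw.values())
    percentages = {cell: p * scale for cell, p in raw.items()}
    # Place the corresponding number of cells at random positions on a virtual slide.
    cells = []
    for cell, pct in percentages.items():
        for _ in range(round(total_leukocytes * pct / 100.0)):
            cells.append({"type": cell, "x": rng.random(), "y": rng.random()})
    return percentages, cells

percentages, cells = generate_sample(NORMAL_DIFFERENTIAL, seed=42)
print({cell: round(pct, 1) for cell, pct in percentages.items()})
print(len(cells), "white blood cells placed")
```

In a full implementation, each generated cell would additionally be rendered with its sprite and size so that the reported differential matches what the user sees on screen.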
3.1 Validation
Three validation initiatives were undertaken. First, a survey was applied under the method adoption model [18, 19], whose purpose is to identify the relevance of the proposed constructs based on observations made by health professionals. Then, an experiment was conducted to validate the automatically generated samples, and finally, a hardware performance test was run on two devices with different features.

Participants. The method adoption model was validated by 8 health professionals specializing in hematology, 7 men (87.5%) and 1 woman (12.5%) aged between 35 and 55 years, who used the software at least once. The replicated samples were validated by medical technologists and students. Five medical technologists participated, all men aged between 35 and 55 years, with 10 years of experience in the area. Five students participated, 3 men (60%) and 2 women (40%) aged between 22 and 25 years, in their 4th or 5th year of studies in medical technology. The performance of the application was validated by a computer engineer with 10 years of experience in software development and a student in the final year of computer engineering.

Method Adoption Model. Method adoption models (MAM) arise from the theory of reasoned action (TRA) [18], which explains an individual's behavior based on factors like beliefs, norms and intentions. This model [19] was used to assess users' perceptions when using the application. It helps explain and predict the user's acceptance of technological solutions based on a group of constructs adapted to explain the adoption of methods: Perceived Ease of Use (PEU), Perceived Utility (PU) and Use Intention (UI).
Perceptions and intentions are captured through a questionnaire (adapted from [20]). Each response is evaluated on a 5-point Likert scale, and the score obtained for each construct is the average of its questions. Observations on the application were collected through online questionnaires (Google Forms). Table 3 shows the results of the experiment.

Validation of the samples. This test seeks to determine that the samples generated by the application are correct replicas according to the definition. There were two interventions on two different days for intra-person validation, i.e., the same user responds to the same sample on two different occasions. On the first day 10 people participated: 5 medical technologists working in Chile, and 5 students in their last year of Medical Technology at the Universidad de La Frontera. The test consisted of showing 3 samples generated by the application, indicating that each could correspond to the sample of a "normal" person or to one with "acute myeloid leukemia". The samples could be explored at the user's leisure before giving an answer about the type.

Performance of the Application. The application was developed for mobile devices. The performance tests were done on a first-generation Motorola Moto G (made in 2012, Android 7 Nougat) and a Xiaomi Mi A1 (made in 2017, Android 9 Pie). These devices were selected for their limited hardware compared to current standards, so good performance of the application on them suggests good performance on devices with better hardware features. To measure the performance of the application, the "Profiler" function in Unity was used, which shows CPU, GPU and RAM usage throughout the execution of the application. The acceptable minimum was defined beforehand as 25 frames per second (FPS), so as not to affect the overall experience of the application. To test the performance of the prototype, the application was executed 10 times per mobile device; the upload time to generate the blood sample and the FPS were recorded and then averaged to obtain an approximate reference of the operation of the application on low- and mid-range devices.
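As a small illustration of the scoring procedure described above (not code from the study), the construct statistics reported in Table 3 can be reproduced from the individual 5-point Likert scores as follows; the construct names follow the text, while Table 3 abbreviates them as UP, UI and EUP.

```python
from statistics import mean, stdev

# Individual 5-point Likert scores for the 8 respondents, as reported in Table 3.
responses = {
    "PU":  [4, 5, 4, 5, 4, 5, 4, 5],
    "UI":  [4, 4, 3, 4, 4, 4, 5, 5],
    "PEU": [3, 4, 3, 4, 4, 3, 4, 4],
}

for construct, scores in responses.items():
    # Construct score = average of the responses; spread = sample standard deviation.
    print(f"{construct}: average = {mean(scores):.2f}, std = {stdev(scores):.2f}")
```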
3.2 Results
Method Adoption Model. The highest averages were obtained for Perceived Utility (average: 4.5; standard deviation: 0.53) and Use Intention (average: 4.13; standard deviation: 0.64). All the averages are above the midpoint of the scale (3 points). Perceived Ease of Use had the lowest score (average: 3.63; standard deviation: 0.52). The fact that PEU was the construct with the lowest value (3.63) can be explained by the usability issues detected in the user interface. Among the most notable observations were that "the white blood cells had a greater contrast in relation to how they normally look under a microscope" and that "although it does not make identification of cells impossible, it could make action difficult".
Table 3. MAM experiment results
Construct | e1 | e2 | e3 | e4 | e5 | e6 | e7 | e8 | Average | Standard deviation
UP | 4 | 5 | 4 | 5 | 4 | 5 | 4 | 5 | 4.50 | 0.53
UI | 4 | 4 | 3 | 4 | 4 | 4 | 5 | 5 | 4.13 | 0.64
EUP | 3 | 4 | 3 | 4 | 4 | 3 | 4 | 4 | 3.63 | 0.52
Validation of the sample. Eight respondents (80%) identified the generated samples correctly in all cases; only 2 (20%), both students, answered part of the test incorrectly (see Table 4). Users U1 to U5 are professionals, and users U6 to U10 are undergraduate students.

Table 4. Sample validation results
User | Day 1 | Day 2 | Success rate
U1 | 3/3 | 3/3 | 100%
U2 | 3/3 | 3/3 | 100%
U3 | 3/3 | 3/3 | 100%
U4 | 3/3 | 3/3 | 100%
U5 | 3/3 | 3/3 | 100%
U6 | 3/3 | 3/3 | 100%
U7 | 2/3 | 3/3 | 83%
U8 | 3/3 | 3/3 | 100%
U9 | 3/3 | 2/3 | 83%
U10 | 3/3 | 3/3 | 100%
Performance of the application. In terms of results, the more recently manufactured device shows superior performance (an upload time of 4 s vs. 6 s). Nevertheless, the low-range device can execute the application without any noticeable delays (average FPS of 60 vs. 28).
Fig. 1. Main screen of the software – healthy adult blood sample
Finally, all three assessments (the constructs of the method adoption model, the validation of the generated samples, and the performance of the application) yielded positive results. Figure 1 presents one of the main screens of the software, which displays the generated sample.
4 Discussion and Future Work

The study and subsequent development of the mobile application made it possible to generate simulated blood samples corresponding to a healthy adult and to an adult with acute myeloid leukemia, in which users can visualize, explore and obtain data on each of the elements that appear in the sample. One of the main ideas of this initiative is to achieve greater similarity to blood samples observed using traditional microscopes. One example of this is what the Histology Guide website offers [21], with the major difference that here the sample is random, parameterized and offers informative options for each of the elements on the screen. In the software validation processes and interviews, the results and comments were positive: the images created by the software, which correctly simulated specific pathologies, were considered useful and reliable. The software supports the learning process, mainly because users can navigate a large sample with actions similar to the operation of traditional microscopes.
4.1 Future Work
The present proposal not only seeks to be an educational alternative, but is also presented as a tangible contribution to digital transformation, that is, the process of using digital technologies to create or modify business processes, culture and user experiences to satisfy business or market requirements [22]. Although the proposal is not market-ready, it is possible to keep working and validating toward this goal. There are three complementary lines of enquiry: (i) improve the user interface and experience when using the application; some mechanisms, such as light and shadow control, are not yet ready. (ii) The second iteration aims to incorporate artificial intelligence initiatives: the idea is to isolate, cut and classify the cells obtained from a sample in order to clean up the database and then verify the similarity of a generated sample with a real one. Finally, (iii) incorporate new diseases into the catalog.

Acknowledgements. This project was initiated thanks to the "Multidisciplinary Research Contest for students - Experimentando 2.0", organized by the Macrofacultad de Ingeniería, Universidad de La Frontera, Universidad del Bio-Bio and Universidad de Talca, 2018. The authors would like to thank all the participants of the "vNanoscope" initiative: Pablo Acuña, Matias Hernandez, Joaquin Gebauer, Diego Acuña and Jaime Díaz, all from the Universidad de La Frontera. This work was (partially) financed by the Dirección de Investigación, Universidad de La Frontera.
References 1. Hematología Adulto. http://www.clinicavespucio.cl/especialidad/hematologia-oncologia/. Accessed 15 Oct 2019 2. Peña Amaro, J.: Competencias y habilidades en histología médica: el potencial formativo de la observación microscópica (2007)
3. Wu, D.-W., Li, Y.-M., Wang, F.: How long can we store blood samples: a systematic review and meta-analysis. EBioMedicine 24, 277–285 (2017). https://doi.org/10.1016/j.ebiom. 2017.09.024 4. Handsfield, H.H., Cummings, M.J., Swenson, P.D.: Prevalence of antibody to human immunodeficiency virus and hepatitis B surface antigen in blood samples submitted to a hospital laboratory: implications for handling specimens. JAMA 258, 3395–3397 (1987) 5. Brereton, M., De La Salle, B., Ardern, J., Hyde, K., Burthem, J.: Do we know why we make errors in morphological diagnosis? an analysis of approach and decision-making in haematological morphology. EBioMedicine 2, 1224–1234 (2015). https://doi.org/10.1016/j. ebiom.2015.07.020 6. Dee, F.R.: Virtual microscopy in pathology education. Hum. Pathol. 40, 1112–1121 (2009). https://doi.org/10.1016/j.humpath.2009.04.010 7. Ahmed, L., Seal, L.H., Ainley, C., De la Salle, B., Brereton, M., Hyde, K., Burthem, J., Gilmore, W.S.: Web-based virtual microscopy of digitized blood slides for malaria diagnosis: an effective tool for skills assessment in different countries and environments. J. Med. Internet Res. 18, e213 (2016). https://doi.org/10.2196/jmir.6027 8. Hande, A.H., Lohe, V.K., Chaudhary, M.S., Gawande, M.N., Patil, S.K., Zade, P.R.: Impact of virtual microscopy with conventional microscopy on student learning in dental histology. Dent. Res. J. 14, 111–116 (2017) 9. Carlson, A.M., McPhail, E.D., Rodriguez, V., Schroeder, G., Wolanskyj, A.P.: A prospective, randomized crossover study comparing direct inspection by light microscopy versus projected images for teaching of hematopathology to medical students (2014). http:// dx.doi.org/10.1002/ase.1374 10. Zumberg, M.S., Broudy, V.C., Bengtson, E.M., Gitlin, S.D.: Preclinical medical student hematology/oncology education environment. J. Cancer Educ. 30, 711–718 (2015). https:// doi.org/10.1007/s13187-014-0778-8 11. Sun, G.C., Wang, F., Chen, X.L., Yu, X.G., Ma, X.D., Zhou, D.B., Zhu, R.Y., Xu, B.N.: Impact of virtual and augmented reality based on intraoperative magnetic resonance imaging and functional neuronavigation in glioma surgery involving eloquent areas (2016). http://dx. doi.org/10.1016/j.wneu.2016.07.107 12. Ha, W., Yang, D., Gu, S., Xu, Q.-W., Che, X., Wu, J.S., Li, W.: Anatomical study of suboccipital vertebral arteries and surrounding bony structures using virtual reality technology. Med. Sci. Monit. 20, 802–806 (2014). https://doi.org/10.12659/MSM.890840 13. Russel, A.B.M., Abramson, D., Bethwaite, B., Dinh, M.N., Enticott, C., Firth, S., Garic, S., Harper, I., Lackmann, M., Schek, S., Vail, M.: An abstract virtual instrument system for high throughput automatic microscopy. Procedia Comput. Sci. 1, 545–554 (2010). https://doi.org/ 10.1016/j.procs.2010.04.058 14. Kent, M.N., Olsen, T.G., Feeser, T.A., Tesno, K.C., Moad, J.C., Conroy, M.P., Kendrick, M. J., Stephenson, S.R., Murchland, M.R., Khan, A.U., Peacock, E.A., Brumfiel, A., Bottomley, M.A.: Diagnostic accuracy of virtual pathology vs traditional microscopy in a large dermatopathology study. JAMA Dermatol. 153, 1285–1291 (2017). https://doi.org/10.1001/ jamadermatol.2017.3284 15. Media, U.: Lichtman’s Atlas of Hematology - Apps en Google Play. https://play.google. com/store/apps/details?id=com.usatineMediaLLC.atlasHematology&hl=es_CL. Accessed 15 Oct 2019 16. Puzzles, K.: Hematology quiz App - Apps en Google Play. https://play.google.com/store/ apps/details?id=com.keywordpuzzles.microbiologyapp&hl=es_CL. Accessed 15 Oct 2019 17. 
Eolas Technologies Inc: AnatLab Histology - Aplicaciones en Google Play. https://play. google.com/store?hl=es. Accessed 23 Oct 2019
18. Fishbein, M.A., Ajzen, I.: Belief, Attitude, Intention and Behaviour: An Introduction to Theory and Research. Addison-Wesley, Reading (1975) 19. Moody, D.: Dealing with complexity: a practical method for representing large entity relationship models (2001) 20. Fernandez, O.N.C.: Un procedimiento de medición de tamaño funcional para especificaciones de requisitos (2007). https://dialnet.unirioja.es/servlet/tesis?codigo=18086&orden=0&info=link&info=link 21. Clark Brelje, T., Sorenson, R.L.: MH 033hr Blood Smear. http://www.histologyguide.com/slideview/MH-033hr-blood-smear/07-slide-2.html?x=1704&y=6268&z=44.0&page=1. Accessed 15 Oct 2019 22. Bhavnani, S.P., Parakh, K., Atreja, A., Druz, R.: Roadmap for innovation—ACC health policy statement on healthcare transformation in the era of digital health, big data, and precision health: a report of the American College of Cardiology Task Force on Health Policy Statements and Systems of Care. J. Am. Coll. Cardiol. 70, 2696–2718 (2017)
Building Information Modeling Academic Assessment: International Architecture Workshop UPC-UAM

Miguel Ángel Pérez Sandoval1, Isidro Navarro Delgado2, and Georgina Sandoval1

1 Universidad Autónoma Metropolitana, Ciudad de México, Mexico
{maps,rsg}@azc.uam.mx
2 Universidad Politécnica de Cataluña, Barcelona, Spain
[email protected]
Abstract. Future construction professionals must have strong communication and teamwork skills. In order to create new collaborative academic environments, we intend to carry out the "1st International BIM Workshop for the resolution of Social Projects" during the 2019/2020 period. Students of the Metropolitan Autonomous University/Universidad Autónoma Metropolitana, Azcapotzalco (UAM Azcapotzalco) and of the Barcelona Higher Technical School of Architecture/Escuela Técnica Superior de Arquitectura de Barcelona (ETSAB), UPC, will be tasked with developing an urban equipment project. We would like to explore different methods to improve collaborative education; therefore, this task will be carried out at a distance with the help of the Building Information Modeling (BIM) methodology. The project will be developed in an online collaborative workshop that takes the students through different professional scenarios where they can better simulate the challenges of the real world. In order to have proper evaluations, we will conduct various qualitative and quantitative surveys. This task was already performed by 27 students in the third year of their Architecture Bachelor's degree at UAM Azcapotzalco during the 2019/2020 academic term.

Keywords: BIM learning · Architecture workshop · Diagnostic survey · Virtual collaboration
1 Introduction

The construction sector around the world is using BIM (Building Information Modeling) for the development of construction projects. This sector is looking for trained professionals who know the methodology and can work effectively with it. This growing demand has pushed universities to rapidly adopt this knowledge and implement a wide variety of isolated and loosely structured training courses (Barison and Santos 2010). For instance, some institutions base their practice on a pedagogical system that is not well established and lacks educational standards (Pillay et al. 2018).
It is a fact that the BIM methodology offers benefits for the current needs of the architecture, engineering and construction (AEC) sectors; however, it is far from widely adopted in the professional field (Becerik-Gerber et al. 2011; Succar and Kassem 2015). There is a lot of resistance when it comes to adopting new methodologies, since in many cases such innovations represent a great investment of effort with uncertain benefits (Mehran 2016). In education, integrating BIM into study programs should be a reality in which universities advance at the same pace as these innovations. The limited training of professionals in advanced technology is one of the main handicaps of the construction sector (Ding et al. 2015; Panuwatwanich et al. 2013). In this respect, the university has a fundamental role in preparing the architects of the future. According to previous studies (Ruschel et al. 2013; Abbas et al. 2016; Herrera et al. 2018), many institutions are still educating their students with traditional and outdated methodologies compared to today's market and societal requirements. BIM should be understood as a new collaborative work methodology and not just as a tool (Macdonald 2012). This requires an adjustment of knowledge on the part of teachers and professionals. The inclusion of BIM in university curricula is growing; nevertheless, it is being implemented at a very slow pace (Sabongi and Arch 2009). Without a doubt, this methodology represents an educational challenge for AEC-oriented Bachelor education. Even though the literature reports different experiences with BIM education, it is not yet clear how it should be introduced into the educational curriculum (Woo 2006; Adamu and Thorpe 2015; Barison and Santos 2010; Becerik-Gerber et al. 2011).
2 Methodology

The present article, a BIM education assessment, is one of the various initiatives that comprise the doctoral study "BIM implementation at the UAM Azcapotzalco, Learning Environments Design". These initiatives pursue conclusions about the needs of students for their professional education, as well as the challenges and requirements universities face in incorporating technologies related to architecture. The purpose is to find mechanisms that strengthen the integral education of the students and prepare them for a suitable incorporation into the labor market. On this basis we will hold the 1st International BIM Workshop for the resolution of Social Projects. The plan is to connect students, teachers and architecture experts from ETSAB and UAM Azcapotzalco to articulate urban equipment projects within the context of scarcity and inequality that exists in Latin American cities. In an online working and learning space, we aim to drive new ways of interaction through the use of the BIM methodology and thereby facilitate the creation and distribution of such knowledge despite the physical distance and the professional, social and cultural differences of the participants. It is expected that the results of this research will promote reflection on collaboration between academic communities in promoting BIM education.
In order to carry out the workshop, the students need prior knowledge of at least one BIM software package. For this reason, in October 2019 we conducted a web assessment survey, designed together with ETSAB and UAM Azcapotzalco professors. When the survey was launched, we asked two groups of 15 students from the 7th and 9th semesters (3rd year of their bachelor's degree) of the Architectonic Design Workshops IA and IIA to collaborate with us. Out of the 30 students invited, 27 took the questionnaire. The survey is structured in four sections. The first one, "User Profile", included general information about the participants. The second category, "Skills and Tools", gathered information about the main software used by the students. The third category focused on BIM; this section included an open question with the purpose of learning the students' definitions of BIM. The fourth section, "Opinion", sought to capture the ideas of the participants about the role of this methodology in education and construction. Every survey respondent was asked to answer a series of quantitative questions, as in previous experiences conducted by the authors (Navarro et al. 2012). The objective was to obtain a diagnosis of the technological aptitudes of the UAM Azcapotzalco Architecture Bachelor's students. With the assessment we wanted to recognize the architecture software most used by students, identify which students are more trained and informed about the usage of these tools, and distinguish the most common ways in which students train. The results of this survey make it possible to carry out a first evaluation of the current situation of the students and to elaborate a work plan with strategies and future actions. It is worth mentioning that this survey was set up in a dynamic and updatable format, so that it could be distributed online and lead to new consultations every year.
3 Results

The survey is expected to identify the main educational gaps of the students in order to guide their future training. These were the results:
• With a population of 27 Architecture students from UAM Azcapotzalco, 56% of the participants were men and 44% women. Their ages were between 21 and 30 years old.
• The survey showed that the most used computer programs among students were AutoCAD (100%), followed by SketchUp (63%) and Revit (52%). Some software packages like 3ds Max or ArchiCAD are also widely used (see Fig. 1).
• Out of the 27 students, 22 mentioned that they learned to use the software in college and 12 were self-taught. It is worth noting that some participants took additional online or in-person courses, and others learned at their job. On a rating scale where 5 is high and 1 is low (ISO 9241-11), 44% of the students consider their Revit knowledge to be low (1st degree), while 33% stated that they have a medium level (3rd degree). None of the participants declared a high level (5th degree) (see Fig. 2).
Fig. 1. The graphic shows the response of the students to the question: Where did you learn to use that software? (University: 22; Self-learning: 12; Face-to-face class: 5; On-line class: 4; Professional work: 2)
Fig. 2. The graphic shows the students' response to the question: What level of Revit (BIM software) general knowledge do you have? On a rating scale where 5 is high and 1 is low (level 1: 12 students; level 2: 3; level 3: 9; level 4: 3; level 5: 0).
• Regarding the frequency of Revit use, almost half of the participants (48%) mentioned little use of the software. Only one person declared using the program on a regular basis.
• 74% of participants confirmed that they have some knowledge about BIM, although most of them (55.5%) only have a general idea. The remaining 26% declared that they had never heard of BIM, and barely 14.8% mentioned using BIM in a moderate way.
According to this survey, most of the students already know the BIM methodology or have heard of it. Some of them were even able to define it quite assertively: "It is a working methodology in which professionals from different construction sectors can work simultaneously in creating a digital simulation project" and "Methodology to create digital models and at the same time manage all the information that contains an architectonic project". BIM does not have a standard definition for teaching; while some recognize it as a tool for 3D design, others see it as a management methodology for construction projects (Briones and Ogueta 2017). With a concept so diverse and extensive, we noticed that half of the participants understand it as a group of software packages or platforms for the architecture and construction industry, as the following student states: "It is a platform that integrates different software that provide help in creating architectonic projects". One of the most interesting points of this survey is the students' perception of the implementation of the BIM methodology in their curriculum. In this regard, 81% of the students consider that BIM could be a useful tool for their learning and 85% would like it to be included in their university curriculum. Even though 60% considered that learning BIM could be somewhat complex, 74% assume it will be an improvement in the development of architectonic projects.
4 Discussion

According to the data from the BIM survey taken by the UAM Azcapotzalco students, it was determined that BIM implementation is still very low, considering that only 14% of the participants use BIM in a moderate way. The number of students with complete mastery or a high level of BIM knowledge is practically non-existent. Even though most of them have heard about the methodology or even know the concept, it is evident that very few know it deeply. BIM and Revit have always been related; nonetheless, the difference between both concepts is simple: BIM is a methodology, while Revit is an instrument or tool. In spite of the fact that half of the participants already use certain BIM tools and 74% declare some knowledge of the methodology, the results seem to describe a usage of BIM focused on basic functionalities, based simply on the knowledge they already have of Revit. Moreover, the results show that the students still rely strongly on AutoCAD. Even though many of them have been self-trained in other software, over half of them affirmed that they learned Revit at the university. Even so, they expressed through this survey their interest in including BIM in the university curriculum. Their opinions show that they find this methodology useful and think of it as an improvement for the development of architectonic projects.
5 Conclusions

In a conservative way, educational institutions have started to include some BIM tools in order to satisfy social, technological and emerging productive demands. The need for implementation represents a great educational challenge for university instruction and has caused a wide gap between the industry's expectations and the students' proficiency (Wu and Issa 2013). The main objective of this survey was to evaluate the prior BIM knowledge of the UAM Azcapotzalco Architecture students. The survey showed that, without evident institutional education, over half of the participants already use at least one BIM software package. Even though most of them have never used this methodology, its promotion in recent years is undeniable and has contributed to the interest of many in the subject and therefore in their training. This survey focused on a group of students that will attend the International BIM Workshop for the resolution of Social Projects. Nevertheless, it is important to examine the current state of BIM education across the whole architecture enrollment and obtain a wider diagnosis. Furthermore, the nature of the BIM methodology requires a collaborative practice that urgently needs to be implemented in the professors' practice as well as in the scholars' attitude.

Acknowledgments. This work is supported by the Research Group N-012, called "Learning in Community Habitat", which belongs to the Department of Research and Design Knowledge of the Division of Science and Arts for Design at UAM Azc. In particular, it belongs to the research project under registration entitled "The BIM and its benefits for teaching and construction".
References Abbas, A., Din, Z.U., Farooqui, R.: Integration of BIM in construction management education: an overview of Pakistani engineering universities. Procedia Eng. 145, 151–157 (2016) Adamu, Z.A., Thorpe, T.: How should we teach BIM? a case study from the UK (2015) Barison, M.B., Santos, E.T.: BIM teaching strategies: an overview of the current approaches. In: Proceedings of the ICCCBE 2010 International Conference on Computing in Civil and Building Engineering, June 2010 Becerik-Gerber, B., Gerber, D.J., Ku, K.: The pace of technological innovation in architecture, engineering, and construction education: integrating recent trends into the curricula. J. Inf. Technol. Constr. (ITcon) 16(24), 411–432 (2011) Briones Lazo, C., Ogueta, C.S.: La enseñanza de BIM en Chile, el desafío de un cambio de enfoque centrado en la metodología por sobre la tecnología. [BIM education in Chile, the challenge of a shift of focus centered on methodology over technology.] (2017) Ding, Z., Zuo, J., Wu, J., Wang, J.Y.: Key factors for the BIM adoption by architects: a China study. Eng. Constr. Architectural Manag. 22(6), 732–748 (2015) Herrera, R.F., Vielma, J.C., Muñoz, F.C.: BIM and teamwork of civil engineering students: a case study. Glob. J. Eng. Educ. 20(3), 230–235 (2018) Macdonald, J.A.: A framework for collaborative BIM education across the AEC disciplines. In: 37th Annual Conference of Australasian University Building Educators Association (AUBEA), vol. 4 (6), July 2012
Mehran, D.: Exploring the adoption of BIM in the UAE construction industry for AEC firms. Procedia Eng. 145, 1110–1118 (2016) Navarro, I., Redondo, E., Sánchez, A., Fonseca, D., Martí, N., Simón, D.: Teaching evaluation using augmented reality in architecture: methodological proposal. In: 7th Iberian Conference on Information Systems and Technologies (CISTI 2012), pp. 1–6. IEEE, June, 2012 Panuwatwanich, K., Wong, M.L., Doh, J.H., Stewart, R.A., McCarthy, T.J.: Integrating building information modelling (BIM) into engineering education: an exploratory study of industry perceptions using social network data (2013) Pillay, N., Musonda, I., Makabate, C.: Use of BIM at higher learning institutions: evaluating the level of implementation and development of BIM at built environment schools in South Africa. In: AUBEA, Singapore. pp. 227–240 (2018) Ruschel, R.C., Andrade, M.L.V.X.D., Morais, M.D.: O ensino de BIM no Brasil: onde estamos? Ambiente Construído 13(2), 151–165 (2013) Sabongi, F.J., Arch, M.: The Integration of BIM in the undergraduate curriculum: an analysis of undergraduate courses. In: Proceedings of the 45th ASC Annual Conference, pp. 1–6. The Associated Schools of Construction (2009) Succar, B., Kassem, M.: Macro-BIM adoption: conceptual structures. Autom. Constr. 57, 64–79 (2015) Woo, J.H.: BIM (building information modeling) and pedagogical challenges. In: Proceedings of the 43rd ASC National Annual Conference, pp. 12–14 (2006) Wu, W., Issa, R.R.: BIM education for new career options: an initial investigation. In: BIM Academic Workshop, Washington, DC. US, 11th January 2013
Radar System for the Reconstruction of 3D Objects: A Preliminary Study

Jeneffer Barberán1, Ginna Obregón1, Darwin Moreta1, Manuel Ayala2, Javier Obregón1, Rodrigo Domínguez3, and Jorge Luis Buele2

1 Instituto Superior Tecnológico Tsa'chila, Santo Domingo 230109, Ecuador
{jbarberan,gaobregon,dmoreta}@institutos.gob.ec, [email protected]
2 SISAu Research Group, Universidad Tecnológica Indoamérica, Ambato, Ecuador
{mayala,jorgebuele}@uti.edu.ec
3 Escuela Superior Politécnica de Chimborazo, Riobamba 060302, Ecuador
[email protected]
Abstract. Object recognition and reconstruction is a process that involves a significant economic investment. This manuscript presents the basis for the design of a system that detects objects, extracts their main characteristics and digitally reconstructs them in three dimensions with a reduced economic investment. The popular Kinect device, in its version 2.0, and the MATLAB software have been linked to develop an efficient algorithm. The process followed to obtain this prototype is briefly described, as well as the results of the corresponding experimental tests.

Keywords: Image processing · Kinect sensor · 3D object reconstruction
1 Introduction

The development of technology in recent decades has allowed important advances in all fields of science and engineering [1–3]. As part of this, radar capabilities have increased due to scope and coverage requirements, while the "visibility" of the radar with respect to the object has decreased [4]. In addition, greater measurement accuracy, robustness against disturbance or interference, and the ability to recognize multiple targets are required [5]. These advances have allowed the construction of highly complex systems, from three-dimensional radars capable of locating and tracking hundreds of targets in distance, azimuth and elevation, to powerful radars that work in HF with an approximate range of 2000 km [6]. Similarly, laser radars make it possible to determine the presence and quantity of aerosols and pollutants in the atmosphere. At the present day, radar is a high-impact branch of electronics whose main applications are in the military field, as well as in agriculture, geology and cartography. In meteorology it allows predicting the behavior of events such as heavy storms, tornadoes, hailstorms and rains, and it supports underground geological studies [7].
Despite spectacular achievements in radar element technology (antennas, transmitters, receivers, processors), traditional radars cannot fulfill the highest requirements in most cases, making it necessary to move forward in the design of the radar system and its construction basis [8]. In Ecuador, there are projects such as the new air traffic control radar system that Civil Aviation will carry out. It is based on an integrated system which will provide 95% aerial control over the whole nation, with a device placed on aircraft. In this way, it is now possible to visualize the airspace during the entire flight, from take-off to landing. In order to detect aircraft and any other type of object, it is necessary to implement systems that allow their detection and, depending on the user's needs, the three-dimensional reconstruction of said objects [9]. This can be done using a 3D scanner, although only a few places have such advanced technological capital [10]. Therefore, there are tools that offer good performance at lower cost, such as the Kinect. The work in [11] shows the important role that the Microsoft Kinect, a mass-market product, has played in the development of consumer depth sensors. That article also presents a comparison of the data provided by the first- and second-generation Kinect to explain the achievements made with the technology upgrade. After a meticulous accuracy analysis of the two sensors under different conditions, two sample applications are presented, which allows the reader to choose the most suitable sensor for the application to be built. A similar study is described in [12], which evaluates the performance of three different models of the Microsoft Kinect sensor using the PrimeSense OpenNI driver. Although all Kinect models were able to determine the location of a target with a low standard deviation (<2 mm), it was determined that the model selection does influence the expected results. The process of scanning objects is expensive for developing countries, which is why in this work we seek to provide the user with a system that allows the reconstruction of 3D objects with a reduced economic investment. It demonstrates that, by implementing an appropriate algorithm, efficient experimental results can be obtained, although it should be clarified that this is a preliminary prototype study, which can still be improved. The document is divided into five sections, including the introduction in Sect. 1 and image acquisition in Sect. 2. Section 3 presents the development of the proposal and Sect. 4 describes the results obtained. Finally, the conclusions are presented in Sect. 5.
2 Image Acquisition

2.1 System Resolution
The resolution must be defined to start the development of the radar system; it is the minimum distance between two targets of similar characteristics that the system is able to distinguish. To calculate the sampling frequency (f_s), parameters such as the bandwidth (B) and the maximum frequency (f_max) are considered, as shown in (1). Additionally, in order to obtain the resolution of the system (ΔR), it is necessary to consider the parameters above together with the distance accuracy (d_a) and the speed of light (c), as shown in (2). On the other hand, the maximum distance (d_max) of a signal that the system can perceive
is calculated as shown in (3). It should be clarified that the formula is already halved, since this is a real case rather than an ideal object, where t_f is the final time.

\[ f_s = 2 f_{\max} = B \quad (1) \]

\[ \Delta R = d_a = \frac{c}{B} = \frac{c}{2 f_{\max}} \quad (2) \]

\[ d_{\max} = \frac{c \, t_f}{4} \quad (3) \]
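As a quick numerical illustration of Eqs. (1)-(3) (not code from the paper), the following Python sketch evaluates the sampling frequency, range resolution and maximum distance; the bandwidth and final time used here are assumed values chosen only for demonstration.

```python
C = 3e8  # approximate speed of light in m/s

def sampling_frequency(f_max):
    """Eq. (1): fs = 2 * f_max = B."""
    return 2.0 * f_max

def range_resolution(bandwidth):
    """Eq. (2): delta_R = d_a = c / B = c / (2 * f_max)."""
    return C / bandwidth

def max_distance(t_f):
    """Eq. (3): d_max = c * t_f / 4 (the round trip is already accounted for)."""
    return C * t_f / 4.0

# Assumed example values; the paper does not report the prototype's B or tf.
B = 150e6   # 150 MHz bandwidth
tf = 1e-6   # 1 microsecond final time
print(f"fs      = {sampling_frequency(B / 2.0):.3e} Hz")
print(f"delta_R = {range_resolution(B):.2f} m")
print(f"d_max   = {max_distance(tf):.2f} m")
```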
2.2 Object Detection
Object detection is based on the transmitted signal, which is modulated starting from a cosine function, as presented in (4). This wave is a function of the time vector and the sampling frequency, where the propagation speed is the speed of light; V(t) is the transmitted signal, f the transmitted signal frequency, t the time and φ_0 the offset angle. A time delay and an offset angle are produced when the transmitted wave is reflected: the delayed wave V_d(t) in (5) accounts for the round-trip travel time t_d. The chirp signal is analyzed in order to obtain the distance at which an object is located; it sweeps from a low to a high frequency (and vice versa) over a determined period of time, and the reflected wave is attenuated by an amplitude factor K.
\[ V(t) = V_0 \cos(2\pi f t + \varphi_0) \quad (4) \]

\[ V_d(t) = V_0\, K \cos\!\big(2\pi f (t - 2 t_d) + \varphi_0\big)\,\mu(t - 2 t_d) \quad (5) \]
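To make Eqs. (4) and (5) concrete, the following is a small illustrative simulation of a transmitted wave and its delayed, attenuated echo; all parameter values are assumptions chosen for the example and are not taken from the paper.

```python
import numpy as np

# Assumed example parameters (not taken from the paper).
f, V0, phi0 = 5.0, 1.0, 0.0   # frequency (Hz), amplitude, offset angle
K, td = 0.6, 0.25             # echo attenuation and one-way delay (s)
fs = 1000.0                   # sampling frequency (Hz)

t = np.arange(0.0, 2.0, 1.0 / fs)

# Eq. (4): transmitted wave.
V = V0 * np.cos(2 * np.pi * f * t + phi0)

# Eq. (5): reflected wave, attenuated by K, delayed by the round-trip time 2*td
# and gated by the unit step u(t - 2*td).
step = (t >= 2 * td).astype(float)
Vd = V0 * K * np.cos(2 * np.pi * f * (t - 2 * td) + phi0) * step

# The round-trip delay (and hence the target distance) can be read off from the
# first sample at which the echo becomes non-zero.
first = int(np.argmax(np.abs(Vd) > 0))
print("echo starts at t =", t[first], "s (expected 2*td =", 2 * td, "s)")
```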
2.3 Kinect Analysis
The Kinect is a system created by Microsoft that allows users to interact with the Xbox 360 video game console without physical contact with a controller. Version 2.0, which is used in this research, has the following components: an RGB camera, a multiple-array microphone, a depth sensor, a tilt motor and the PrimeSense chip. These components allow the software to recognize gestures, objects and images as well as voice commands, and to obtain the depth of objects. Since its launch in November 2010, a number of hardware and software add-ons have been coupled to it, allowing its use in various applications and research. Common drivers that work with the Kinect are the MICROSOFT SDK and the OPENNI/NITE SDK; for this research, the second was chosen because of the features shown in Table 1.
Table 1. Comparison between SDK MICROSOFT and SDK OPENNI/NITE.

SDK MICROSOFT:
• Audio support
• Includes hands, feet and collarbone
• Full body tracking
• No need for calibration posture
• Tilt motor support
• Better treatment of non-visible joints
• Supports multiple sensors
• Large amount of information available
• Gesture recognition system
• Version 1.7 allows running the Kinect SDK on virtual machines

SDK OPENNI/NITE:
• Free and commercial use
• Acquire hand data by tracking and gesture recognition
• Full body tracking
• You can calibrate the depth and color of the image
• Joint rotation calculation
• Multiple platform: Windows XP, Vista, 7, Linux and MacOSX
• Built-in support for recording and playback
3 Proposal Development

3.1 Hardware
For this prototype system design, the Kinect device was connected to a computer using the respective add-ons, and object detection was achieved through the OpenNI libraries and the MATLAB software. The general plan is shown in Fig. 1. The Kinect sensor is the principal tool in this system due to its functionality. As a starting point, the transmitted infrared projection hits objects through a point pattern, which is reflected back to the Kinect's CMOS receiver; as a result, the object is detected with its location and depth. Subsequently, through programming in MATLAB, the data is filtered to estimate the central point at which the object is located with respect to the Kinect device (a sketch of this step is given after Fig. 1).
Fig. 1. General diagram of the proposed system.
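The following is a minimal Python sketch of one way this central-point estimation could be done, by thresholding the depth image within the working range reported later in the paper (0.5 m to 1.2 m) and averaging the coordinates of the detected pixels. It is only an illustration; the authors' processing is implemented in MATLAB and its exact filtering steps are not detailed.

```python
import numpy as np

def estimate_center(depth_mm, near=500, far=1200):
    """Estimate the central point of the closest object in a Kinect depth frame.

    depth_mm: 2-D array of depth readings in millimetres (0 where there is no reading).
    near/far: assumed working range in millimetres (0.5 m to 1.2 m, the optimum
              range reported in the paper's conclusions).
    """
    mask = (depth_mm >= near) & (depth_mm <= far)
    if not mask.any():
        return None
    rows, cols = np.nonzero(mask)
    # Centroid in image coordinates plus the mean depth of the selected pixels.
    return cols.mean(), rows.mean(), depth_mm[mask].mean()

# Toy 640x480 frame with a synthetic "object" placed at roughly 900 mm.
frame = np.zeros((480, 640))
frame[200:280, 300:380] = 900
print(estimate_center(frame))
```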
3.2 Software
For the Kinect to function, several libraries are used: OpenNI, PrimeSense NITE and the .mex files that allow linking it with the MATLAB software installed on a PC. The PrimeSense company, currently owned by Apple, develops controllers and software that allow information to be acquired from the device on a PC. The OpenNI library provides a generic, open-source, API-based infrastructure to control and access devices such as the Kinect's camera, sensor and audio.

Libraries. In addition, PrimeSense develops the NITE library, whose code is not open but which gives access to different advanced Kinect functions such as body tracking in real time. For the interaction between the device and the MATLAB software, it is necessary to use the Kinect-Mex .mex files, which allow elementary information to be obtained from the RGB and infrared images captured by the recognition device and processed in the mathematical software. The corresponding programming is carried out in MATLAB, where depth measurements are extracted for the four lateral faces that constitute the dummy cubic area where the object is located. For this reason, a matrix with depth information is stored for each of the four lateral faces, to later represent the data in binary matrices that allow visualizing the reconstruction of the object in 3 dimensions. Figure 2 briefly describes the stages that make up the program. To link Kinect and MATLAB, the folder path where the .MEX files are located is added, through which the Kinect device can be turned on and off. To establish the connection in the SamplesConfig.xml mode of operation, the "mxNiCreateContext" function is used. This mode of operation is saved in MATLAB as a context and allows both RGB and depth images to be obtained at a resolution of 640×480 pixels.
Fig. 2. Brief diagram of the stages of the program developed
Application Design in the MATLAB GUI. The developed graphic user interface is made with components that allow entering parameters such as: horizontal edge length, object height and the distance at which each horizontal and vertical measurement will be taken. Using the “Start” button, the application starts measuring until it covers the
entire reconstruction area and finally presents the object in three dimensions to the end user. The user interface is presented in Fig. 3.
Fig. 3. HMI presented to the end user.
4 Result Analysis

To perform the reconstruction tests, rectangular objects were used to facilitate the measurements and the analysis of the results of this initial work. A previous study determined that the Kinect sensor works best at a distance of 2 m; for this reason, and for the tests performed on the objects, the length of the horizontal edges was set to 2 m, as shown in Fig. 4.
Fig. 4. Object capture 2 m away.
4.1 3D Object Reconstruction
For the object reconstruction graph, it is defined that each data point with value "1" represents a fraction of the object, as a hexahedron or cube, within the data set contained in the binary matrix. Once the matrices that store the depth measurements of the four lateral faces are obtained, four square matrices of 0's are created with dimensions equal to the number of columns of one of the stored matrices. Two three-dimensional reconstruction tests of the same object, formed by several boxes, were performed; as can be seen in the figures below, and through mathematical calculations in the MATLAB software, the relevant matrices are filled out as shown below (a simplified sketch of this step is given after Table 2). As can be seen in Table 2, the horizontal and vertical length, the height and the deltas were used as parameters. The horizontal delta is the spacing at which data are taken longitudinally. The values differ, which means that in the first test 10 measurements are taken (2/0.2 = 10) and in the second 20 measurements. A height of 0.1 m was considered, and for both tests a vertical delta of 0.1 m, so there is only 1 vertical scan.

Table 2. Definition of parameters for tests.
Parameters | Test 1 | Test 2
Length | 2 m | 2 m
Horizontal delta | 0.2 m | 0.1 m
Height | 0.1 m | 0.1 m
Vertical delta | 0.1 m | 0.1 m
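As a rough illustration of how a face's depth readings become the binary matrices discussed below (the authors' code is in MATLAB and is not reproduced in the paper), the following Python sketch converts each measured depth M into round(M / delta) ones, which matches the M, R3 and 1's columns of Table 3 (e.g. 0.99 m -> 4.95 -> 5); the direction in which the ones are packed along the row is an assumption.

```python
import numpy as np

def face_to_binary(depth_row, delta=0.2, edge_length=2.0):
    """Turn one face's depth readings (metres) into binary rows of the 0's matrix.

    Each measured depth M becomes round(M / delta) ones, reproducing the
    M -> R3 -> 1's columns of Table 3 (e.g. 0.99 m -> 4.95 -> 5 ones).
    """
    n_cells = int(round(edge_length / delta))            # e.g. 2.0 / 0.2 = 10
    binary = np.zeros((len(depth_row), n_cells), dtype=int)
    for i, m in enumerate(depth_row):
        ones = int(round(m / delta))
        binary[i, :ones] = 1    # assumption: the ones are packed from the face inward
    return binary

# Face 1 depth measurements of the first test (Table 3), in metres.
face1 = [0, 0, 0, 0.99, 0.989, 1.337, 0, 0, 0, 0]
print(face_to_binary(face1))
```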
Fig. 5. Object reconstruction obtained in the first test.
Table 3 shows the data matrix values stored when performing the first test, where the number of 1's in the 0's matrices has previously been determined by rounding the obtained result. Figure 5 shows the object and its respective three-dimensional reconstruction in an area of 2 m², for measurements with a horizontal displacement of 0.2 m, a vertical displacement of 0.1 m and a height of 0.1 m. However,
it is evident that each cardboard box that is part of object 1 is represented in the reconstruction by 2 cubes, which means there is a match of 2 data points across the width and 1 data point along the length of each box. In addition, there is a height of 0.1 m for the entire object, confirming that the matrices containing depth information have only one data row. For a better understanding, the matrices obtained by performing this procedure are shown in Fig. 6.
Table 3. 4-sided reconstruction matrix in the first test.
Item | Face 1 (M, R3, 1's) | Face 2 (M, R3, 1's) | Face 3 (M, R3, 1's) | Face 4 (M, R3, 1's)
1 | 0, 0, 0 | 0, 0, 0 | 0, 0, 0 | 0, 0, 0
2 | 0, 0, 0 | 0, 0, 0 | 0, 0, 0 | 0, 0, 0
3 | 0, 0, 0 | 0, 0, 0 | 0, 0, 0 | 0, 0, 0
4 | 0.99, 4.95, 5 | 1.186, 5.93, 6 | 0, 0, 0 | 1.309, 6.545, 7
5 | 0.989, 4.945, 5 | 1.197, 5.984, 6 | 1.173, 5.865, 6 | 1.31, 6.55, 7
6 | 1.337, 6.685, 7 | 1.181, 5.905, 6 | 1.171, 5.855, 6 | 1.023, 5.115, 5
7 | 0, 0, 0 | 0.884, 4.42, 4 | 1.325, 6.625, 7 | 1.031, 5.155, 5
8 | 0, 0, 0 | 0, 0, 0 | 0, 0, 0 | 0, 0, 0
9 | 0, 0, 0 | 0, 0, 0 | 0, 0, 0 | 0, 0, 0
10 | 0, 0, 0 | 0, 0, 0 | 0, 0, 0 | 0, 0, 0
Fig. 6. 0’s matrices (4 faces and reconstruction) with 1’s assignment.
Table 4 presents the values corresponding to the second test. Figure 7 shows the object and its respective reconstruction in an area of 2 m², for measurements with 0.1 m of horizontal and vertical displacement and a height of 0.1 m. However, it is evident that each cardboard box that makes up the object (3 boxes) is represented in the reconstruction by 6 cubes; there is a match of 3 data points for the width and 2 data points for the length of each box. Just as in the previous test, there is a height of 0.1 m for the object, confirming that the matrices containing depth measurements have only one data row.
Fig. 7. Object reconstruction obtained in the second test.
Table 4. 4-sided reconstruction matrix in the second test.
Item | Face 1 (M, R3, 1's) | Face 2 (M, R3, 1's) | Face 3 (M, R3, 1's) | Face 4 (M, R3, 1's)
1 | 0, 0, 0 | 0, 0, 0 | 0, 0, 0 | 0, 0, 0
2 | 0, 0, 0 | 0, 0, 0 | 0, 0, 0 | 0, 0, 0
3 | 0, 0, 0 | 0, 0, 0 | 0, 0, 0 | 0, 0, 0
4 | 0.9993, 4.948, 5 | 1.181, 5.97, 6 | 0, 0, 0 | 1.309, 6.501, 7
5 | 0.994, 4.978, 5 | 1.179, 5.96, 6 | 1.182, 5.67, 6 | 1.31, 6.55, 7
6 | 1.313, 6.687, 7 | 1.181, 5.91, 6 | 1.188, 5.671, 6 | 1.301, 5.083, 5
7 | 0, 0, 0 | 0.88, 4.4, 5 | 1.317, 6.78, 7 | 1.012, 5.081, 5
8 | 0, 0, 0 | 0, 0, 0 | 0, 0, 0 | 0, 0, 0
9 | 0, 0, 0 | 0, 0, 0 | 0, 0, 0 | 0, 0, 0
10 | 0, 0, 0 | 0, 0, 0 | 0, 0, 0 | 0, 0, 0
5 Conclusions

The prototype system proposed in this work is developed in four main blocks that represent the processes inherent in handling and using the information provided by the Kinect device. These blocks are: image capture, depth estimation, information storage and object reconstruction. The tilt motor of the Kinect sensor allows the inclination angle of the IR and RGB cameras to be set up to 27° up and down. It lets the device move in 10 different positions with a calibration angle of ±5.4 degrees for each position, with 0 degrees of inclination (neutral position) when the mentioned cameras are aligned parallel to the axis of the ground. For this application, 0 degrees of inclination were used to obtain more accurate measurement data. Experimental tests were carried out in an area of 2 m², and it was determined that the optimum depth of an object is obtained at a maximum distance of 1.2 m and a minimum distance of 0.5 m when the depth camera is parallel to the
ground axis. The algorithm developed in MATLAB allows objects to be reconstructed through binary square matrices, i.e., a matrix is created for each existing row in the depth matrices. Thus, each element of the binary matrix with value 1 represents an object fraction. The experimental tests showed that this proposal is valid for the reconstruction of objects captured by a controller designed for entertainment purposes. Several investigations can therefore start from this work, and the authors propose as future work to expand this proposal in order to obtain a more efficient reconstruction that is closer to the real object.
References 1. Buele, J., Franklin Salazar, L., Altamirano, S., Abigail Aldás, R., Urrutia-Urrutia, P.: Plataforma y aplicación móvil para proporcionar información del transporte público utilizando un dispositivo embebido de bajo costo. RISTI – Rev Iber. Sist. e Tecnol. Inf. E17, 476–489 (2019) 2. Fu, K.K., Wang, Z., Dai, J., Carter, M., Hu, L.: Transient Electronics: Materials and Devices (2016). https://doi.org/10.1021/acs.chemmater.5b04931 3. Fernández-S, Á., Salazar-L, F., Jurado, M., Castellanos, E.X., Moreno-P, R., Buele, J.: Electronic system for the detection of chicken eggs suitable for incubation through image processing (2019). https://doi.org/10.1007/978-3-030-16184-2_21 4. Franceschetti, G., Lanari, R.: Synthetic aperture radar processing (2018). https://doi.org/10. 1201/9780203737484 5. Ramachandra, K.V.: Maneuvering target tracking. In: Kalman Filtering Techniques for Radar Tracking (2018) 6. Shapero, S.: Introduction to Modern EW Systems, 2nd Edition de Martino, A. [Book review]. IEEE Aerosp. Electron. Syst. Mag. 34, 62–63 (2019) 7. McNairn, H., Shang, J.: A review of multitemporal synthetic aperture radar (SAR) for crop monitoring. In: Remote Sensing and Digital Image Processing (2016). https://doi.org/10. 1007/978-3-319-47037-5_15 8. De Maio, A., Eldar, Y.C., Haimovich, A.M.: Compressed Sensing in Radar Signal Processing. Cambridge University Press, Cambridge (2019). https://doi.org/10.1017/ 9781108552653 9. Mcfadyen, A., Mejias, L.: A survey of autonomous vision-based see and avoid for unmanned aircraft systems (2016). https://doi.org/10.1016/j.paerosci.2015.10.002 10. Starodubov, D., McCormick, K., Nolan, P., Volfson, L., Finegan, T.M.: Eye safe single aperture laser radar scanners for 3D acquisition. In: Radar Sensor Technology XX (2016). https://doi.org/10.1117/12.2224298 11. Zennaro, S., Munaro, M., Milani, S., Zanuttigh, P., Bernardi, A., Ghidoni, S., Menegatti, E.: Performance evaluation of the 1st and 2nd generation Kinect for multimedia applications. In: Proceedings - IEEE International Conference on Multimedia and Expo (2015). https://doi. org/10.1109/ICME.2015.7177380 12. DiFilippo, N.M., Jouaneh, M.K.: Characterization of different Microsoft Kinect sensor models. IEEE Sens. J. 18, 4554–4564 (2015)
Multimedia Systems and Applications
Achieving Stronger Compaction for DCT-Based Steganography: A Region-Growing Approach

Mohammed Baziyad1, Tamer Rabie2, and Ibrahim Kamel2

1 Research Institute of Sciences and Engineering (RISE), University of Sharjah, Sharjah, United Arab Emirates
[email protected]
2 Computer Engineering Department, University of Sharjah, Sharjah, United Arab Emirates
{trabie,kamel}@sharjah.ac.ae
Abstract. The strong energy compaction property of the Discrete Cosine Transform (DCT) has inspired researchers to utilize the DCT transform technique in steganography. The DCT transform can effectively represent a signal within a few significant coefficients leaving a large area with insignificant coefficients that can be safely replaced by a substantial amount of secret data. It has been proven that this strong energy compaction property has a relation with the correlation of the signal. The higher the correlation, the stronger the compaction, and thus more data can be hidden. Therefore, several state-of-the-art image steganography techniques tend to segment the cover image into homogeneous segments to exploit the strong compaction property of the DCT. In this paper, a precise segmentation process using the region-growing segmentation method is applied to maximize the homogeneity level between pixels in a segment, and thus, maximizing the hiding capacity while achieving improved stego quality.
1
Introduction
Steganography is the branch of cryptology concerned with concealing the existence of secret communications between two entities. This is done by hiding the secret message in an "innocent" cover medium to obtain the stego medium. The stego medium should be identical to the cover medium so that it does not arouse suspicion that it contains a secret message. Cryptography, on the other hand, is the branch that deals with securing the data itself by scrambling it into an unintelligible form, so that an eavesdropper cannot recognize the secret message being sent. In cryptography, the eavesdropper is still able to recognize the existence of the secret communication between the sender and the receiver. Steganography, however, moves the security of the communication to the next level by hiding the existence
of the secret communication between the two entities, since all that an eavesdropper would intercept is an "innocent" cover medium that does not reveal anything suspicious about the communication. Steganography is not a new art, as it has been traced back to 440 BC using primitive conventional techniques [1]. However, with the increased use of critical multimedia communications over insecure public networks, interest in steganography techniques has risen dramatically due to their security advantages over cryptography [2–5]. Due to this increased demand, several steganography techniques have been published in the literature aiming to improve all or some of the steganography attributes, namely stego quality, capacity, robustness and security [6–9]. Capacity refers to the amount of secret data that can be embedded in the cover medium. The stego quality attribute, on the other hand, is concerned with the quality of the stego medium and the degree of similarity between the stego medium and the cover medium. Stego quality is a critical attribute, since poor stego quality makes it suspicious that the stego medium carries a secret message, which defeats the ultimate goal of steganography. It is widely known in the steganography literature that there is an inverse relationship between stego quality and capacity: many research papers report that increasing the hiding capacity adversely affects the quality of the stego medium [8]. The quality-capacity trade-off is natural, since hiding is done by replacing some data of the cover medium with the secret data; increasing the hiding capacity simply means replacing more data of the cover medium, and thus more degradation is expected to appear in the stego medium. Steganography techniques can be broadly categorized into two general categories: time/spatial domain techniques and frequency-based techniques. In time/spatial domain techniques, the secret data is hidden directly in the natural representation of the cover signal, while in frequency-based techniques a frequency transform is first applied to the cover medium and hiding is then performed in its frequency domain. Generally, secret data is hidden in insignificant portions of the cover medium determined by a significance inspection algorithm guided by the Human Visual System (HVS) model. In the transform domain techniques, the cover medium is first transformed into the frequency domain using one of the popular transform techniques such as the Fourier Transform (FT), the Wavelet Transform, or the Discrete Cosine Transform (DCT). The DCT has gained popularity in steganography and compression techniques due to its strong energy compaction property. It has been proven that the DCT has a powerful ability to represent a signal within a small number of significant coefficients. Research reports also indicate that the strength of the energy compaction is highly related to the homogeneity level of the signal: the higher the homogeneity, the stronger the energy compaction of the signal [10].
Therefore, several state-of-the-art image steganography techniques tend to segment the cover image into homogeneous segments and take the DCT of these homogeneous segments instead of the whole image. Since there is a high correlation level between pixels within a given segment, it is expected that the DCT domain of these segments will contain a large number of insignificant DCT coefficients, which can be replaced safely with the secret data. This has enabled segmentation-based hiding techniques to embed at a high capacity ratio while achieving improved stego quality. Before stating the contribution of this paper, and to make the contribution clear, we first introduce different techniques used for segmenting digital images in Sect. 2. Section 3 emphasizes the contribution of the paper and compares the work with state-of-the-art related work. The proposed technique is illustrated in Sect. 4. Section 5 presents the experimental comparative results. Finally, concluding remarks appear in Sect. 6.
2
Image Segmentation Techniques
Image segmentation is a process that partitions an image into its constituent regions or objects. Pixels within a region that share certain characteristics such as texture, intensity, or color are tagged with a unique label. The objective of image segmentation techniques is to locate boundaries and objects in images. Image segmentation is a widely used technique with a number of practical applications such as medical imaging, face recognition, and traffic control systems. There are two main classical approaches to segmentation: the top-down approach and the bottom-up approach. The top-down approach starts with the whole image and then begins segmenting it into smaller regions. In contrast, the bottom-up approach starts with a pixel and grows it to fill a certain region. The quad-tree segmentation technique is an example of a top-down segmentation approach. The quad-tree takes into consideration the spatial homogeneity distribution of the image intensities and aims to segment an image into non-overlapping homogeneous correlated blocks. The quad-tree process starts by dividing the image into four equal squares. Next, each square is checked for whether it meets the homogeneity criterion. If a block does not yet meet the criterion, it is sub-divided further into four squares. This dividing process is repeated iteratively until every block meets the homogeneity criterion. A simple homogeneity criterion is to inspect the difference between the highest and lowest pixel value in a block: if the difference is lower than a pre-defined threshold Tquad, the block is assumed to be homogeneous. The quad-tree algorithm can also be configured with a minimum block size and a maximum block size, such that it will not produce blocks smaller than the minimum specified block size (e.g. 8 × 8) or larger than the maximum specified block size (e.g. 128 × 128). Blocks larger than the maximum specified block size are split even if they meet the threshold condition. At the end of the quad-tree
process, the input image will be divided into variable-sized blocks based on the correlation between pixels. Figure 1 shows the implementation of the quad-tree segmentation technique.
Fig. 1. The quad-tree segmentation technique applied on the “F15Large” image with a minimum block size of 32 × 32
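As a rough illustration only (not part of the original paper), the quad-tree splitting rule described above can be sketched in Python as follows; the threshold t_quad and the block-size limits are illustrative parameters, and a square image whose side is a power of two is assumed.

import numpy as np

def quadtree_blocks(img, t_quad=20, min_size=8, max_size=128):
    # Recursively split `img` into homogeneous blocks.
    # Returns a list of (row, col, size) tuples.
    blocks = []

    def split(r, c, size):
        block = img[r:r + size, c:c + size]
        homogeneous = (int(block.max()) - int(block.min())) <= t_quad
        # Split if the block is too large, or inhomogeneous and still divisible.
        if size > max_size or (not homogeneous and size > min_size):
            half = size // 2
            for dr in (0, half):
                for dc in (0, half):
                    split(r + dr, c + dc, half)
        else:
            blocks.append((r, c, size))

    split(0, 0, img.shape[0])
    return blocks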
On the other hand, the region-growing segmentation technique does not segment the input image into square blocks. Instead, it has the ability to segment an image into its constituent regions or objects, whatever their shapes are. The region-growing segmentation process starts by selecting a random seed pixel and comparing this pixel with its neighboring pixels. Next, the region is grown from the seed pixel toward correlated pixels. Once there are no more correlated pixels, the growing process stops and a new random pixel that does not belong to any region is selected to start a new growing process. These steps are repeated iteratively until each pixel in the image belongs to a region. Figure 2 shows the output of the region-growing segmentation technique.
Fig. 2. The region-growing segmentation process performed on a 4 × 4 section of an image. The threshold was set to have a value of 4
The correlation criterion is determined by setting a threshold value Tregion that represents the maximum allowed difference between the maximum and minimum value in a region. When adding a new pixel to the region, the difference
between the maximum and minimum value is calculated (including the newly added pixel), and if the difference is less than Tregion, the new tested pixel is added to the region. Otherwise, if the calculated difference value is greater than Tregion, the pixel is not added to the region, since adding it would break the homogeneity criterion of the region. In Fig. 2, the threshold was set to a value of 4, which means that the difference between the maximum value and the minimum value in a region should not exceed 4.
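For illustration only, a minimal Python sketch of this growing rule is given below; it scans seed pixels in raster order rather than choosing them at random, uses 4-connected neighbors, and takes the threshold t_region as a parameter.

import numpy as np
from collections import deque

def region_growing(img, t_region=4):
    # Label every pixel with a region id; a pixel joins a region only if
    # the region's (max - min) range, including the new pixel, stays <= t_region.
    h, w = img.shape
    labels = -np.ones((h, w), dtype=int)
    current = 0
    for sr in range(h):
        for sc in range(w):
            if labels[sr, sc] != -1:
                continue                        # already belongs to a region
            lo = hi = int(img[sr, sc])          # running min / max of the region
            labels[sr, sc] = current
            queue = deque([(sr, sc)])
            while queue:
                r, c = queue.popleft()
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if 0 <= nr < h and 0 <= nc < w and labels[nr, nc] == -1:
                        v = int(img[nr, nc])
                        # Adding this pixel must not break the homogeneity criterion.
                        if max(hi, v) - min(lo, v) <= t_region:
                            labels[nr, nc] = current
                            lo, hi = min(lo, v), max(hi, v)
                            queue.append((nr, nc))
            current += 1
    return labels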
3
Related Work
Hiding in the DCT transform coefficients is a popular approach in the steganography field. Several state-of-the-art steganography techniques have utilized the advantages of the DCT transform, such as its strong energy compaction property [3,11]. In [3], a JPEG image hiding technique based on the dependencies of inter-block DCT coefficients is proposed. The main idea was to preserve the differences among DCT coefficients in neighboring DCT blocks as much as possible. The cost values are adaptively assigned based on the modifications of adjacent inter-blocks in the embedding process. To the best of our knowledge, the first attempt to investigate the relationship between the strong compaction property of the DCT transform and the correlation between pixels is reported in [10]. The proposed technique first segments the cover image into non-overlapping fixed-size blocks, and then takes the DCT of each block. The idea was to increase the homogeneity, since the correlation level within a block will surely be higher than the correlation level of the whole image. Then, the secret data is hidden in a squared area in the high-frequency region of each block. The size of the squared area hosting the secret data is pre-defined and is the same for all blocks. In the Fixed-Block-size Globally Adaptive-Region (FB-GAR) technique [12], instead of hiding in a pre-defined fixed-size squared area in the high-frequency region of each block, the size of the squared area hosting the secret data is adaptively chosen based on the number of insignificant DCT coefficients in the block. Inspecting the insignificant DCT coefficients is done using a JPEG quantization procedure. In each block, a maximum possible square is placed in the high-frequency region of the DCT of the block to cover the insignificant DCT coefficients, which are then replaced by the secret data. The homogeneity within a block is further increased by utilizing the quad-tree segmentation technique in the Quad-Tree Adaptive-Region (QTAR) technique [13]. Instead of segmenting the cover image into fixed blocks, the cover image is segmented into adaptive-size blocks based on the correlation between pixels. Since the homogeneity level within a quad-tree block is higher than that within a fixed-size block, the compaction property of the DCT is expected to be stronger within a quad-tree block. Therefore, the quad-tree technique proposed in [13] showed improved results over the fixed-block techniques proposed in [12] and [10]. In each quad-tree block, the secret
data is hidden in a squared area in the high-frequency region, as in the technique proposed in [12]. It has been noticed that a square-shaped area does not fit the entire area that contains the insignificant DCT coefficients. To benefit optimally from the compaction property of the DCT in the quad-tree approach of [13], the area is instead enclosed by a curve in [14], where the insignificant DCT coefficients are located under the curve in the DCT domain. This curve-fitting procedure allows the embedding technique to capture the whole insignificant DCT area, and thus increases hiding capacity and stego quality. This paper aims to fully exploit the strong compaction property of the DCT by maximizing the homogeneity level between pixels using the region-growing image segmentation technique. The region-growing technique has the ability to segment an image precisely into its constituent regions or objects instead of square-shaped regions as in the quad-tree segmentation technique. A natural image usually has non-squared objects, which makes segmenting with the region-growing technique more appropriate. Therefore, since the homogeneity in a region is expected to increase, the energy compaction property of the DCT is expected to be stronger, and thus improved results in terms of hiding capacity and stego quality are expected to be achieved.
4
The Proposed Hiding Technique
The strong energy compaction property of the DCT transform has enabled DCT-based steganography techniques to embed a large amount of secret data without sacrificing the quality of the stego image. Moreover, it has been proven that the homogeneity between pixels has an effect on the energy compaction property of the DCT [10]. The energy of a highly correlated image is compacted strongly into a few low-frequency DCT components, leaving a large area of insignificant DCT coefficients that can be replaced with secret data. As described earlier in Sect. 1, segmenting the input cover image increases the homogeneity, and thus a stronger compaction property is expected. Hence, higher capacity rates with improved stego quality are expected. Earlier works that exploit the relationship between data correlation and the strong compaction property have adopted block-based segmentation techniques such as fixed-block segmentation and quad-tree segmentation. In contrast, the region-growing segmentation technique described in Sect. 2 is utilized in this paper to perform a precise segmentation of the input cover image into objects rather than blocks. The steps of the embedding process of the proposed technique are illustrated in Fig. 3. First, the cover image is segmented into objects using the region-growing segmentation technique described in Sect. 2. Next, a region is selected and converted from its spatial 2D representation into a 1D vector using the column-major order conversion technique shown in Fig. 4. After that, the 1D-DCT is applied to the 1D vector.
[Figure 3 (flow of the proposed hiding scheme): 1 Region-growing segmentation, 2 Select a region, 3 Convert to 1D vector, 4 Apply 1D-DCT and extract the magnitude, 5 Rescale the secret image to the range [0–0.01], 6 Hide in the high-frequency part of the DCT magnitude, 7 Apply 1D-IDCT, 8 Convert back to 2D, 9 Re-pack all regions to form the stego image.]
Fig. 3. The proposed hiding scheme
Fig. 4. The column-major order conversion technique
It has been proven that the DCT phase preserves more information of the transformed image than the DCT magnitude [10]. The proposed technique respects the criticality of the DCT phase by hiding only in the DCT magnitude of the cover image. Hiding is performed in the high-frequency, insignificant portion of the DCT magnitude. The amount of data to be hidden is controlled using the tuning parameter c, which has a range of [0–1]. The value of c indicates the size of the portion of the DCT vector that hosts the secret data. For example, a c value of 0 means that no data is embedded in the DCT vector, while a value of 1 means that the whole DCT vector is replaced with the secret data. A c value of 0.5 indicates that half of the DCT vector hosts the secret data. Note that the hosting portion is always located in the high-frequency region of the DCT; for example, a c value of 0.3 means that hiding is performed in the last 30% of the coefficients of the DCT magnitude vector, since the high-frequency coefficients are located at the end of the DCT vector. Note also that the secret image is re-scaled to the range [0–0.01] to blend into the natural range of the insignificant DCT coefficient values.
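As a minimal sketch only (not the authors' implementation), the per-region embedding step can be expressed in Python with SciPy's DCT routines; a rectangular region is assumed for simplicity (an arbitrarily shaped region would instead use the ordered list of its pixels), the sign of each coefficient stands in for the "phase" of a real 1D DCT, and the parameter c selects the high-frequency portion as described above.

import numpy as np
from scipy.fft import dct, idct

def embed_in_region(region, secret, c=0.95):
    # `region`: 2D array of cover pixels; `secret`: 1D array already
    # rescaled to [0, 0.01]; `c`: fraction of the DCT vector used for hiding.
    vec = region.flatten(order='F')            # column-major 1D vector (Fig. 4)
    coeffs = dct(vec, norm='ortho')            # 1D-DCT of the region
    signs = np.where(coeffs >= 0, 1.0, -1.0)   # keep the coefficient signs ("phase")
    mags = np.abs(coeffs)                      # hide only in the magnitude

    start = int(round((1.0 - c) * len(mags)))  # last c fraction = high frequencies
    n = min(len(secret), len(mags) - start)
    mags[start:start + n] = secret[:n]         # replace insignificant coefficients

    stego_vec = idct(signs * mags, norm='ortho')
    return stego_vec.reshape(region.shape, order='F')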
After hiding, the Inverse DCT (IDCT) is applied to the DCT vector to regenerate the pixel representation of the vector. After that, the inverse of the column-major order conversion technique shown in Fig. 4 is applied to convert the 1D vector back into the spatial representation of the object. The hiding procedure described up to this point is repeated on all objects segmented by the region-growing segmentation technique. To perform an ideal extraction of the secret data, the sender must send the segmentation information, that is, the location and shape of each segmented object, to the receiver side. The segmentation information can be represented using labels, where each pixel in the image has a label referring to the segment (object) it belongs to. However, sending the segmentation information to the receiver increases the communication cost, which makes the proposed technique less practical. A second implementation is therefore proposed to reduce the communication cost, in which the sender is not required to send any segmentation information. Instead, the receiver applies the region-growing segmentation technique on the stego image. However, the output of segmenting the stego image will not be identical to the segmentation output of the cover image, and thus a degraded quality of the extracted image is expected. To balance communication cost and extraction quality, a third extraction approach is proposed. Before sending, the sender applies the region-growing segmentation technique on the stego image and compares the stego segmentation output with the cover segmentation output. The sender finds the error between the two segmentation outputs and sends only this error information. At the receiver, the segmentation process is applied on the stego image and the segmentation output is then corrected with the error information received from the sender. The sender must also send some parameters, such as the region-growing threshold parameter Tregion and the capacity tuning parameter c, so that the receiver is able to extract the secret image. Extraction is done in the reverse order. Each region is converted to a 1D vector using the column-major order conversion technique. Next, the 1D-DCT is applied on the 1D vector and the magnitude is separated from the phase. Finally, the secret data is extracted using the c parameter, which points to the location of the secret data, and the secret image is re-scaled back to the natural range of [0–255].
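Continuing the illustrative sketch above, and under the same assumptions (matching segmentation and a rectangular region), per-region extraction simply re-applies the 1D-DCT and reads back the last c fraction of the magnitude vector; rescaling to [0–255] is left to the caller.

import numpy as np
from scipy.fft import dct

def extract_from_region(stego_region, secret_len, c=0.95):
    # Assumes the receiver recovered the same region shape as the sender used.
    vec = stego_region.flatten(order='F')
    mags = np.abs(dct(vec, norm='ortho'))
    start = int(round((1.0 - c) * len(mags)))
    return mags[start:start + secret_len]      # still in the [0, 0.01] range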
5
Experimental Results
This section presents the results obtained with the proposed region-growing hiding technique on two different cover images. The technique is compared to other segmentation-based hiding schemes, namely the FB-GAR [12] and QTAR [13] techniques described earlier in Sect. 3. Table 1 shows the comparative results obtained using the proposed, FB-GAR and QTAR techniques. It is clear from Table 1 that the proposed hiding scheme outperformed the FB-GAR and QTAR hiding techniques in both capacity and PSNR values. Another
Table 1. Comparative results of the proposed scheme versus the FB-GAR and QTAR schemes

Cover        Technique     Parameter                  Capacity  PSNR
F15Large     FB-GAR [12]   32 × 32                    18.05     27.64
                           64 × 64                    19.29     27.67
                           128 × 128                  20.83     27.24
             QTAR [13]     32 × 32                    19.29     27.89
                           64 × 64                    19.97     27.73
                           128 × 128                  21.01     27.21
             Proposed      c = 0.90, Tregion = 0.15   19.83     41.60
                           c = 0.90, Tregion = 0.20   19.95     35.37
                           c = 0.90, Tregion = 0.25   20.13     33.00
                           c = 0.95, Tregion = 0.15   20.94     38.84
                           c = 0.95, Tregion = 0.20   21.06     33.50
                           c = 0.95, Tregion = 0.25   21.25     31.02
                           c = 0.98, Tregion = 0.15   21.60     38.80
                           c = 0.98, Tregion = 0.20   21.72     31.29
                           c = 0.98, Tregion = 0.25   21.92     27.30
TigerPounce  FB-GAR [12]   32 × 32                    17.17     28.71
                           64 × 64                    19.04     27.64
                           128 × 128                  19.63     27.23
             QTAR [13]     32 × 32                    18.8      28.85
                           64 × 64                    19.37     28.38
                           128 × 128                  19.63     27.23
             Proposed      c = 0.90, Tregion = 0.15   17.07     52.88
                           c = 0.90, Tregion = 0.20   17.12     50.46
                           c = 0.90, Tregion = 0.25   17.25     34.98
                           c = 0.95, Tregion = 0.15   18.02     52.88
                           c = 0.95, Tregion = 0.20   18.07     32.54
                           c = 0.95, Tregion = 0.25   20.21     32.03
                           c = 0.98, Tregion = 0.15   18.59     52.88
                           c = 0.98, Tregion = 0.20   18.65     26.76
                           c = 0.98, Tregion = 0.25   18.78     26.72
noticeable fact is that the capacity increases as the Tregion parameter increases. The reason is that a larger Tregion reduces the number of regions, and thus fewer significant coefficients remain, leaving more coefficients free to host the secret data.
The outstanding performance of the proposed technique can be attributed to the region-growing segmentation technique. This segmentation technique has the ability to increase the homogeneity of each region, and thus a larger number of insignificant DCT coefficients can be found. Therefore, increased capacity and PSNR values can be achieved. Segmentation-based steganography techniques usually suffer from blockiness artifacts, since hiding in different blocks creates perceptual differences between the blocks. However, the region-based segmentation technique does not produce blocks; it produces regions. Thus, the abrupt change between regions will not be noticeable, since the noise in this case is distributed along edges, which are high-frequency areas of the image, and it is widely known that the human perception system can tolerate noise present in the high-frequency areas of an image [7].
6
Conclusion
In this paper, a new steganography technique based on the region-growing segmentation technique is proposed. The idea was to utilize the region-growing segmentation technique to increase the homogeneity of each region, and thus achieve stronger DCT energy compaction. Due to the strong energy compaction property of the DCT transform, a larger number of insignificant DCT coefficients was found. Therefore, the proposed technique was able to achieve increased capacity and PSNR values over competitive segmentation-based techniques. Moreover, the proposed technique was able to remove blockiness artifacts, since the noise is distributed along edges, which are high-frequency areas of the image.
References 1. Datta, B., Roy, S., Roy, S., Bandyopadhyay, S.K.: Multi-bit robust image steganography based on modular arithmetic. Multimedia Tools Appl. 78(2), 1511–1546 (2019) 2. Zhang, Y., Qin, C., Zhang, W., Liu, F., Luo, X.: On the fault-tolerant performance for a class of robust image steganography. Signal Process. 146, 99–111 (2018) 3. Liao, X., Yin, J., Guo, S., Li, X., Sangaiah, A.K.: Medical jpeg image steganography based on preserving inter-block dependencies. Comput. Electr. Eng. 67, 320–329 (2018) 4. Hussain, M., Wahab, A.W.A., Idris, Y.I.B., Ho, A.T., Jung, K.-H.: Image steganography in spatial domain: a survey. Signal Process. Image Commun. 65, 46–66 (2018) 5. Liao, X., Guo, S., Yin, J., Wang, H., Li, X., Sangaiah, A.K.: New cubic reference table based image steganography. Multimedia Tools Appl. 77(8), 10033–10050 (2018) 6. Rabie, T., Baziyad, M.: The pixogram: addressing high payload demands for video steganography. Access (2019) 7. Baziyad, M., Rabie, T., Kamel, I.: Extending steganography payload capacity using the l* a* b* color space. In: 2018 International Conference on Innovations in Information Technology (IIT), pp. 1–6. IEEE (2018)
8. Rabie, T., Kamel, I., Baziyad, M.: Maximizing embedding capacity and stego quality: curve-fitting in the transform domain. Multimedia Tools and Appl. 77(7), 8295–8326 (2018) 9. Rabie, T., Baziyad, M.: Visual fidelity without sacrificing capacity: an adaptive laplacian pyramid approach to information hiding. J. Electr. Imaging 26(6), 063001 (2017) 10. Rabie, T., Kamel, I.: On the embedding limits of the discrete cosine transform. Multimedia Tools Appl. 75(10), 5939–5957 (2016) 11. Saidi, M., Hermassi, H., Rhouma, R., Belghith, S.: A new adaptive image steganography scheme based on DCT and chaotic map. Multimedia Tools Appl. 76(11), 13493–13510 (2017) 12. Rabie, T., Kamel, I.: High-capacity steganography: a global-adaptive-region discrete cosine transform approach. Multimedia Tools Appl. 76(5), 6473–6493 (2017) 13. Rabie, T., Kamel, I.: Toward optimal embedding capacity for transform domain steganography: a quad-tree adaptive-region approach. Multimedia Tools Appl. 76(6), 8627–8650 (2017) 14. Rabie, T., Kamel, I., Baziyad, M.: Maximizing embedding capacity and stego quality: curve-fitting in the transform domain. Multimedia Tools Appl. (2017). https:// doi.org/10.1007/s11042-017-4727-5
Teaching Computer Programming as Well-Defined Domain for Beginners with Protoboard Carlos Hurtado1(&), Guillermo Licea2, Mario García-Valdez1, Angeles Quezada1, and Manuel Castañón-Puga2 1
2
Tecnológico Nacional de México/Instituto Tecnológico de Tijuana, 22430 Tijuana, BC, Mexico {carlos.hurtado,mario, angeles.quezada}@tectijuana.edu.mx Universidad Autónoma de Baja California, Calzada Universidad 14418, 22390 Tijuana, BC, Mexico {glicea,puga}@uabc.edu.mx
Abstract. Protoboard is a Learning Management System (LMS) developed to support the teaching-learning process of programming subjects. Protoboard uses a recommender based on fuzzy logic rules according to difficulty levels; the study materials are learning objects and programming exercises, and the system sends feedback to students about syntax and instructions. In this article, we describe the Protoboard system as used with beginner Mexican students in a Java programming course; in this system, rules and restrictions are used to define the programming domain so that students learn good programming practices as they begin with Java.
Keywords: E-learning · Learning management systems · Computer programming
1 Introduction
Online learning (also known as e-learning) adapts to any discipline thanks to its versatility, universal access, and low cost compared to traditional education. E-learning has revolutionized the education system due to the advantages it adds compared to traditional learning. Different studies show that e-learning is a useful tool to teach complex knowledge and problem-solving ability, which is very important in engineering education: for example, research in online engineering education [1] established that online technologies are related to improving engineering education quality. Currently, with the increase of computing devices such as smartphones, tablets, and computers, there is a greater need for better engineers with programming and related skills [2]. To obtain better results with computer science students, university professors must teach programming languages and tools to improve the teaching-learning process
quality. Computer programming is a competence that computer science students must learn, since it is a fundamental part of the curriculum. However, teaching this subject is more difficult than teaching subjects such as physics, calculus or chemistry [3, 4]. Programming skills have been difficult to learn for students at beginner levels [5–7]. The difficulties and problems experienced in these subjects make the use of different techniques and methods necessary and vital [6, 7]. Some authors [8, 9] have classified domains according to their characteristics, and even though there is no consensus, one can speak of well- and ill-defined areas and levels of difficulty such as problem-solving domains, unverifiable analytic domains and design domains [10]; it should be noted, though, that the border between well- and ill-defined domains is often diffuse [11]. For the well-defined types of domains, two fundamental methods have been successfully developed [12], classified by some authors as cognitive models [13]: rule-based models [14, 15] and constraint-based models [15, 16]. However, for ill-defined domains, despite the interest represented by their study, no general methods have been found [17]. Due to its characteristics, the programming domain is ill defined [18]: even for simple programs there are multiple correct solution paths and numerous possible errors that students can make [19]. In recent years Java has been used in many universities. This programming language is one of the most popular for teaching object-oriented programming, according to one of the approaches proposed by several authors: "First imperative, then object-oriented" [20–23] or "Object-oriented from the beginning" [24–28]. Learning management systems (LMS) are software applications used to manage, document, track, report and deliver online education courses. Many LMSs are web-based, allowing access to learning and administration materials through the Internet. Today, almost any educational institution in the world has some distance-learning portal where learning materials and exercises to assess student knowledge are available. LMSs allow a wide range of activities for all users, but the main disadvantage of e-learning systems is the loss of direct communication between student and teacher. Also, the structure and learning materials are the same for all students ("one size fits all"). However, the learning materials should be adaptive according to students' characteristics [29–31]. The system must maintain a correct representation of information and be able to reason about the domain knowledge taught in order to create feedback. The techniques and structures used for this purpose form a conceptual entity known as a domain model or expert model. This article is organized as follows. Section 2 describes the operation of the developed educational platform and the components that it integrates to make recommendations. Section 3 shows the programming course theory and practice interfaces. Section 4 presents results obtained after students' interaction with the platform, that is, studying the learning materials and programming the exercises. Section 5 presents a discussion based on the research carried out and observations of the results obtained. Section 6 shows the conclusions obtained after analyzing the results, and, finally, Sect. 7 presents the future work planned for the platform.
2 Protoboard Features
This section presents an educational adaptive hypermedia system [32] in which students must complete learning activities previously specified by the instructor. These learning activities can be determined using Mamdani fuzzy inference rules that recommend fuzzy values as their consequents; the fuzzy inference rules are based on the difficulty levels of the learning objects (beginner, medium and advanced), and through these rules the system can make recommendations according to the students' knowledge. The application has a recommendation algorithm that considers the following cases. When a new student registers, the filtering algorithm needs to know the student's previous preferences to be precise, using the values of a classification matrix. When a new user is added to the system or does not have a certain number of classifications, there is not enough data to provide precise recommendations, so the teacher offers a value based on students of the same level. When a new learning activity is added, it carries standard metadata [33] indicating, among other things, target audience, difficulty levels, formats, authors, and versions; this information can be used together with teachers' suggestions to make content-based recommendations when new learning activities are added to the system.
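As a rough, simplified sketch of this kind of fuzzy recommendation (not Protoboard's actual rule base), the following Python fragment fuzzifies a normalized student score into beginner/medium/advanced memberships and defuzzifies a recommended difficulty with a weighted average, a simplified stand-in for full Mamdani inference; the membership breakpoints and output levels are illustrative assumptions.

def tri(x, a, b, c):
    # Triangular membership function with vertices a <= b <= c.
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def recommend_difficulty(score):
    # Map a normalized student score in [0, 1] to a recommended
    # learning-object difficulty in [0, 1] (0 = beginner, 1 = advanced).
    beginner = tri(score, -0.5, 0.0, 0.5)
    medium   = tri(score,  0.0, 0.5, 1.0)
    advanced = tri(score,  0.5, 1.0, 1.5)

    # Each rule weight clips an assumed representative output difficulty;
    # the weighted average acts as a simple defuzzification.
    weights = [beginner, medium, advanced]
    outputs = [0.2, 0.5, 0.9]
    total = sum(weights)
    return sum(w * o for w, o in zip(weights, outputs)) / total if total else 0.5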
3 Using the Protoboard Application
The Protoboard programming course contains all the topics and subtopics of the subject taught at the university; however, in this article we focus on unit number five, Control structures. The unit is organized as follows: the sub-themes with a book icon are learning materials, and those with a coffee-cup icon are programming examples. Using restrictions, the system guides the student to study the learning material first and then complete two programs, one completing given code and the other coding from scratch (Fig. 1).
Fig. 1. Unit five topics.
Figure 2 shows the learning material screen, where students see academic content and some programming examples before going on to practice; the learning material displayed on this screen can be text, images, and videos.
Fig. 2. Learning material screen.
Figure 3 shows the screen of programming examples. Here the student must follow the indicated instructions; it is crucial to read the instructions carefully and do what the system suggests.
Fig. 3. Exercises to complete a program and make a program from scratch
There are several restrictions, so the student first learns good programming practices such as proper variable names, constants, statement declaration, and the use of parentheses and curly brackets. If the requirements are not met, the program gives feedback on those errors. The lower part of the instructions shows the expected output; on the left there is a programming example where the student has to complete the code, and on the right the student has to write the program from scratch.
Fig. 4. Instructions and syntax errors screens.
Figure 4 shows the programming error interfaces: the screen on the left shows the feedback given about possible errors when the student did not follow the instructions as expected, and the screen on the right shows the feedback when the student makes syntax errors.
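For illustration only (not Protoboard's actual validator), a few such restrictions can be checked with simple rules over the submitted Java source; the regular expressions below are rough, assumed heuristics for camelCase variable names, UPPER_CASE constants and balanced curly brackets.

import re

def check_constraints(java_src):
    # Return feedback messages for a few illustrative constraints.
    feedback = []

    if java_src.count('{') != java_src.count('}'):
        feedback.append("Unbalanced curly brackets.")

    # Very rough declaration matcher: "<type> <name> = ..." or "<type> <name>;"
    decl = re.compile(r'\b(?:int|double|float|char|boolean|String)\s+(\w+)\s*[=;]')
    for name in decl.findall(java_src):
        if name.isupper():
            continue                      # handled by the constant rule below
        if not re.fullmatch(r'[a-z][a-zA-Z0-9]*', name):
            feedback.append(f"Variable '{name}' should be written in camelCase.")

    for name in re.findall(r'\bfinal\s+\w+\s+(\w+)', java_src):
        if not re.fullmatch(r'[A-Z][A-Z0-9_]*', name):
            feedback.append(f"Constant '{name}' should be written in UPPER_CASE.")

    return feedback or ["All constraint checks passed."]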
Fig. 5. Completed program window.
If the user completes the programming exercise successfully, a message indicating that all tests passed is displayed (Fig. 5), and the user can continue with other exercises.
4 Results
For testing, one hundred and twelve engineering students from two Mexican universities, the Autonomous University of Baja California (UABC) and the Tijuana Institute of Technology (ITT), were selected. The tests were conducted on UABC senior students of a mobile application development course and ITT sophomore students of a programming course. Both courses include the basics of the Java programming language in their syllabus. The control structures from unit five were used for testing. Students were asked to study the learning material and to write the programming exercises exactly as they were taught in class, concerning layout and the declaration of symbols and variables; because this course is for beginners, the main objective was learning things like syntax, semantics and good programming practices such as indentation, naming variables and using curly brackets, among others. Six subtopics of the unit were evaluated: in selection structures, the if statement, the if-else statement and the switch instruction, and in iteration structures, while, do-while and for. After studying the learning materials of a subtopic, students were asked to perform two exercises related to that topic: in the first they had to complete a program, and in the other they had to write the program from scratch, for a total of twelve programs, six completing code and six written from scratch. After the programs were done, the number of attempts made by all students was analyzed, obtaining the following results.
Table 1 shows the total attempts, average, median and standard deviation (STDEV) for each exercise. It can be observed that, on average, students needed one to three attempts to complete the programs correctly, with the if-from-scratch and complete-if-else programs requiring the most attempts.

Table 1. Total students' attempts

Exercises            Attempts  Average  Median  STDEV
If (complete)        186       1.660    1       1.159
If                   285       2.5      2       2.2
If-else (complete)   316       2.821    2       3.238
If-else              195       1.74     1       1.15
Switch (complete)    199       1.776    1       1.563
Switch               260       2.321    2       1.983
While (complete)     153       1.366    1       0.657
While                185       1.65     1       1.19
Do-while (complete)  176       1.571    1       0.965
Do-while             184       1.642    1       1.145
For (complete)       254       2.267    2       2.143
For                  198       1.767    1       1.530
Table 2 shows the attempts of the forty computer science students who took the Java programming course at the Autonomous University of Baja California. It can be observed that these students needed one to four attempts to solve the programs; the if-from-scratch and complete-if-else exercises required the most attempts, and in these examples the standard deviation was also higher than in the other exercises.

Table 2. Students' attempts at the Autonomous University of Baja California

Exercises            Attempts  Average  Median  STDEV
If (complete)        56        1.4      1       0.871
If                   110       2.8      2       2.8
If-else (complete)   161       4.025    2       4.768
If-else              74        1.85     1       1.21
Switch (complete)    59        1.475    1       0.905
Switch               86        2.15     1       1.902
While (complete)     54        1.35     1       0.699
While                72        1.8      1       1.49
Do-while (complete)  56        1.4      1       0.708
Do-while             78        1.95     1       1.431
For (complete)       102       2.55     1.5     2.791
For                  76        1.9      1       1.905
Table 3 shows the attempts of the seventy-two information and communications technology engineering students who took the mobile application development course in Java at the Tijuana Institute of Technology. On average, they needed one to two attempts to solve the problems, with the if-from-scratch and switch-from-scratch examples requiring the most attempts; here the standard deviation is considerably lower than for the students of the Java programming course.

Table 3. Students' attempts at the Tijuana Institute of Technology

Exercises            Attempts  Average  Median  STDEV
If (complete)        130       1.805    1       1.274
If                   175       2.4      2       1.7
If-else (complete)   155       2.152    1.5     1.624
If-else              121       1.68     1       1.12
Switch (complete)    140       1.944    1       1.814
Switch               174       2.417    2       2.033
While (complete)     99        1.375    1       0.637
While                113       1.57     1       0.99
Do-while (complete)  120       1.666    1       1.074
Do-while             106       1.472    1       0.918
For (complete)       152       2.111    2       1.683
For                  122       1.694    1       1.285
5 Discussion
In this article, the Protoboard LMS is used for teaching Java programming as a well-defined domain for beginners. First, we observed that students using the tool required, on average, two attempts to solve the exercises on the platform, so Protoboard helps students solve programming exercises in few attempts. We also observed that Protoboard helped students follow good programming practices during the course, fulfilling the main objective of the platform, which is to define the programming domain well following the rule-based and constraint-based modeling approaches proposed by the authors of [14–16]; in this way, students can complete exercises only if they follow good programming practices. We observed that, on average, the students of the Autonomous University of Baja California required the same number of attempts as the students of the Tijuana Institute of Technology, so we can conclude that the tool can be used in different higher-education institutions. The limitations of this study include the sample size of students who used the system, which could be increased; another aspect to consider is that the system was used for only one unit, so it should be used during a whole course; and, finally, students from other majors not related to computer science and from other universities should evaluate the system so that their results can be compared.
6 Conclusions
After analyzing the results and the number of attempts made, we concluded that some students had problems at the beginning because they did not follow the instructions as expected. It was observed that once students realized they had to follow the instructions exactly as stated, and as they developed more programs, they became familiar with the system and made fewer errors. We noticed that the wording of some exercises needs to be improved, as students said that some statements were hard to understand. Finally, we found that after using the system, programming other exercises and taking exams, the students followed the good practices recommended by the feedback given by Protoboard, improving their way of programming compared to undergraduates from previous semesters who continued to use bad programming practices. On the side of programming professors, the system is very useful because they can create validations and, since students write the programs on the platform, teachers obtain statistics of their interaction, such as attempts and the time needed to finish the exercises; in this way they can use rules to recommend learning material, add more restrictions to the exercises and customize the course accordingly.
7 Future Work
One of the main tasks ahead is developing an interface for teachers, so they can create their own learning materials and programming exercises with their respective validations, restrictions, and rules to follow. We aim to improve the learning materials to make them more interactive. Concerning the programming examples, we should make some changes in the statements, because students had difficulties with this aspect. Additionally, creating more learning objects for these topics is proposed, in case some exercises are hard to complete and need more straightforward examples of the same subject. Another important goal is to build a complete course with programming exercises. Currently, exercises are available only for Java programming language courses, so we expect to develop mobile application development courses for Android, since it uses Java; we also plan to create courses for other programming languages such as C#, C++, Python and Swift, among others.
References 1. Bourne, J., Harris, D., Mayadas, F.: Online engineering education: Learning anywhere, anytime. J. Eng. Educ (2005). https://doi.org/10.1002/j.2168-9830.2005.tb00834.x 2. Licea, G., Juárez-Ramírez, R., Gaxiola, C., Aguilar, L., Martínez, L.G.: Teaching objectoriented programming with AEIOU. Comput. Appl. Eng. Educ. (2014). https://doi.org/10. 1002/cae.20556 3. Malik, M.A.: Technical opinion: on the perils of programming. Commun. ACM 43, 95–97 (2000). https://doi.org/10.1145/355112.355130
4. Gries, D.: Where is programming methodology these days? ACM SIGCSE Bull. 34, 5 (2002). https://doi.org/10.1145/820127.820129 5. Askar, P., Davenport, D.: An investigation of factors related to self-efficacy for Java programming among engineering students. Turk. Online J. Educ. Technol (2009) 6. Cetin, I., Ozden, M.Y.: Development of computer programming attitude scale for university students. Comput. Appl. Eng. Educ. 23, 667–672 (2015). https://doi.org/10.1002/cae.21639 7. Pillay, N., Jugoo, V.R.: An investigation into student characteristics affecting novice programming performance. ACM SIGCSE Bull (2005). https://doi.org/10.1145/1113847. 1113888 8. Lynch, C.F, Ashley, K.D, Aleven, V, Pinkwart, N.: Defining “III-Defined Domains”. A literature survey. In: Proceedings of the 8th International Conference Intelligence Tutoring System Workshop Intelligence Tutoring System III Defined Domains (2006) 9. Woolf, B.P.: Building Intelligent Interactive Tutors Student-centered strategies for revolutionizing e-learning. Morgan Kaufmann, Burlington (2010). https://doi.org/10.1007/ BF02680460 10. Fedeli, P.G.R.L.: Intelligent tutoring systems: a short history and new challenges. In: Intelligent Tutoring Systems: An Overview, p. 13 11. Waalkens, M., Aleven, V., Taatgen, N.: Does supporting multiple student strategies lead to greater learning and motivation? Investigating a source of complexity in the architecture of intelligent tutoring systems. Comput. Educ (2013). https://doi.org/10.1016/j.compedu.2012. 07.016 12. Mitrovic, A.: Modeling domains and students with constraint-based modeling. Stud. Comput. Intell (2010). https://doi.org/10.1007/978-3-642-14363-2_4 13. Corbett, A., Kauffman, L., Maclaren, B., Wagner, A., Jones, E.: A cognitive tutor for genetics problem solving: learning gains and student modeling. J. Educ. Comput. Res (2010). https://doi.org/10.2190/ec.42.2.e 14. Conati, C., Merten, C.: Eye-tracking for user modeling in exploratory learning environments: An empirical evaluation. Knowl. Based Syst (2007). https://doi.org/10.1016/j.knosys. 2007.04.010 15. Mitrovic, A., Martin, B., Suraweera, P., Zakharov, K., Milik, N., Holland, J., McGuigan, N.: ASPIRE : an authoring system and deployment environment for constraint-based tutors. Int. J. Artif. Intell. Educ (2009) 16. Mitrovic, A., Martin, B.: Evaluating the effects of open student models on self-assessment. Int. J. Artif. Intell. Educ (2007). https://doi.org/10.1007/3-540-47952-x 17. Mitrovic, A., Weerasinghe, A.: Revisiting Ill-Definedness and the Consequences for ITSs. In: Frontiers in Artificial Intelligence and Applications (2009). https://doi.org/10.3233/9781-60750-028-5-375 18. De, U., Informáticas, C., Antonio, S.: Los sistemas tutores inteligentes y su impacto en la enseñanza de la programación 19. Lea, N.T., Pinkwartb, N.: Considering Ill-Definedness of problem tasks under the aspect of solution space 20. Hu, C.: Rethinking of Teaching Objects-First. Educ. Inf. Technol (2004). https://doi.org/10. 1023/b:eait.0000042040.90232.88 21. Burton, P.J., Bruhn, R.E.: Teaching programming in the OOP era. ACM SIGCSE Bull. (2003). https://doi.org/10.1145/782941.782993 22. Jacquot, J.P.: Which use for Java in introductory courses? In: Proceedings Inaug. Conf. Princ. Pract. Program, Proceedings Second Work. Intermed. Represent. Eng. Virtual Mach (2002) 23. Tuttle, S.M.: iYO Quiero Java!: teaching Java as a second programming language. J. Comput. Sci. Coll. 17, 34–45 (2001). http://dl.acm.org/citation.cfm?id=775339.775348
24. Cooper, S., Dann, W., Pausch, R.: Teaching objects-first in introductory computer science. ACM SIGCSE Bull (2003). https://doi.org/10.1145/792548.611966 25. Blumenstein, M.: Strategies for improving a Java-based, first year programming course, In: Proceedings of the - International Conference Computer Education ICCE (2002). https://doi. org/10.1109/cie.2002.1186162 26. Duke, R., Salzman, E., Burmeister, J.: Teaching programming to beginners-choosing the language is just the first step. In: Proceedings of the ACE (2000). https://doi.org/10.1145/ 359369.359381 27. Hadjerrouit, S.: Java as first programming language: a critical evaluation. Learning (1998). https://doi.org/10.1145/292422.292440 28. Clark, D., MacNish, C., Royle, G.F.: Java as a teaching language: opportunities, pitfalls and solutions, In: ACSE 1998 Proceedings of the 3rd Australasian Conference Computer Science Education (1998). http://doi.acm.org/10.1145/289393.289418 29. Graf, S.: Fostering adaptivity in E-learning platforms: a meta-model supporting adaptive courses, pp. 440–443 (2005) 30. Jackson, C.J.: Learning styles and its measurement: an applied neuropsychological model of learning for business and education (2002) 31. Paunovic, V., Jovanovic, S.: Towards advanced data retrieval from learning objects repositories. In: 4th International Conference E‐Learning, Belgrade, pp. 26–27 (2013) 32. García-Valdez, M., Parra, B.: A hybrid recommender system architecture for learning objects. Stud. Comput. Intell (2009). https://doi.org/10.1007/978-3-642-04514-1_11 33. Memletics. Learning styles inventory (2008)
Computer Networks, Mobility and Pervasive Systems
Machine Learning and Data Networks: Perspectives, Feasibility, and Opportunities Raúl Lozada-Yánez1, Fernando Molina-Granja2(&), Pablo Lozada-Yánez1, and Jonny Guaiña-Yungan1 1
2
Facultad de Informática y Electrónica, Escuela Superior Politécnica de Chimborazo, Panamericana Sur km 1 1/2, Riobamba, Ecuador Facultad de Ingeniería, Universidad Nacional de Chimborazo, Avda. Antonio José de Sucre, Km 1.5 Vía a Guano, Riobamba, Ecuador [email protected]
Abstract. Currently, Machine Learning has become a research trend around the world, and its application is being studied in most fields of human work where it is possible to take advantage of its potential. Current computer networks and distributed computing systems are key infrastructures that have allowed the development of efficient computing resources for Machine Learning. The benefits of Machine Learning mean that the data network itself can also use this promising technology. The aim of this study is to provide a comprehensive research guide on machine-learning-assisted networking, to help motivate researchers to develop new innovative algorithms, standards, and frameworks. This article focuses on the application of Machine Learning for networks, a methodology that can stimulate the development of new network applications. The article presents the basic workflow for the application of Machine Learning technology in the field of networks. It then provides a selective inspection of recent representative advances, with explanations of their benefits and design principles. These advances are grouped by network design objective, and detailed information on how they perform at each step of the Machine Learning for networks workflow is presented. Finally, the new opportunities presented by the application of Machine Learning in network design and the collaborative construction of this new interdisciplinary field are pointed out.
Keywords: Machine learning · Traffic prediction · Traffic classification · Learning model · Congestion control
1 Introduction
Network research is a field that is attracting the attention of researchers around the world, largely due to the prosperous development and growth of the Internet. Because of this development, researchers and network administrators must solve problems differently for various types of networks (e.g., wired and wireless networks) and for various types of services and applications (e.g., end-user services, network security and live content streaming (Sun et al. 2016)). Owing to the diversity and complexity of data networks and the characteristics of the services and applications
that are used in them, network algorithms are often designed to address different network scenarios, specific network infrastructure characteristics, and specific requests from applications and users. Developing efficient algorithms and systems to solve specific complex problems for different network scenarios is a complex task. The treatment of complex problems is one of the tasks that best suits Machine Learning techniques; in some cases these techniques perform very similarly to, or even better than, a human being, as in Artificial Intelligence (AI), data mining or facial recognition. Generally, the problems that arise in the field of data networks are complex and require efficient solutions. This means that the incorporation of Machine Learning techniques in the resolution of problems in this field, as well as in the improvement of management processes and automatic network administration, is presented as a promising solution that will also improve the performance of a network and enable new applications. Studies carried out in this regard report that the trend is the use of traditional machine learning algorithms (usually in charge of classification and prediction). However, the development of new data processing technologies, such as the Graphics Processing Unit (GPU) developed by NVIDIA and the TPU (Tensor Processing Unit), an integrated circuit developed by Google specifically for machine learning and reported to be up to 30 times more powerful than a CPU for such workloads, new frameworks for distributed data processing (e.g., Hadoop and Spark), the design of new paradigms and network infrastructures such as Software-Defined Networking (SDN), and the development of new libraries for Machine Learning (e.g., Scikit-learn, Theano, Keras, TensorFlow, Scrapy, etc.) provides a great opportunity to fully use the power of Machine Learning in the field of data networks. For the specific field of data networks, Machine Learning is suitable for the following reasons. As is known, classification and prediction play important roles in the operation of data networks (e.g., for intrusion detection and for predicting network performance (Sun et al. 2016)). Machine Learning is also expected to help in decision making, which will improve both the scheduling and administration of the network (Mao et al. 2017) and the adaptation of certain parameters (Winstein and Balakrishnan 2013; Dong et al. n.d.) depending on the state of the network's operating environment. Many problems in the field of data networks arise when an interaction between complex systems or with different technologies is required. If one considers system behavior patterns such as load changes for a Content Delivery Network (CDN) (Jiang et al. n.d.) or throughput characteristics, for example, the design and implementation of precise models to represent these complex behaviors of the network system is complicated. Machine Learning can provide an approximate model of these systems with very acceptable accuracy. In ("State-of-the-Art Deep Learning: Evolving Machine Intelligence Toward Tomorrow's Intelligent Network Traffic Control Systems - IEEE Journals & Magazine", n.d.) the results of a survey of previous efforts that apply deep learning technology are presented
in areas related to data networks. The results obtained in this study were disseminated as part of the virtual course "Technological frameworks for virtual laboratories in universities", which, like this research, was developed as part of the FIE-ESPOCH research project entitled "Models, Methodologies and Technological frameworks for the implementation and use of Virtual Laboratories".
2 Workflow for Machine Learning for Data Networks
The baseline workflow for the application of Machine Learning in the field of data networks (see Fig. 1) is presented as a methodology that includes interrelated stages: problem formulation, data collection, data analysis, model building, model validation, and implementation and inference.
Fig. 1. “Typical” machine learning workflow for data networks.
The proposed workflow is very similar to the traditional machine learning workflow; this is because the problems that arise in the field of data networks require solutions in which machine learning approaches can play a role. An explanation of each step of the Machine Learning workflow in the field of networks, with representative cases, is presented below.
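As a minimal, hedged sketch of how these stages fit together (not a reference implementation from the surveyed literature), the following Python skeleton takes the feature-extraction, training and evaluation functions as parameters; the record format with a 'label' field and the 0.9 deployment threshold are illustrative assumptions.

def ml_for_networking_pipeline(raw_records, extract_features, build_model, evaluate):
    # 1. Problem formulation happens before any code: classification,
    #    clustering or decision-making, which dictates data and model choice.

    # 2. Data collection: offline historical data plus online feedback.
    dataset = [r for r in raw_records if r is not None]      # basic cleaning

    # 3. Data analysis / feature engineering.
    X = [extract_features(r) for r in dataset]
    y = [r['label'] for r in dataset]                        # assumed record field

    # 4. Model construction (training and hyper-parameter tuning).
    model = build_model(X, y)

    # 5. Offline model validation before deployment.
    score = evaluate(model, X, y)

    # 6. Implementation and inference: deploy only if the model is good enough.
    return model if score >= 0.9 else None                   # assumed threshold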
Problem Formulation
Since the machine learning training process often takes a long time and represents a high cost, it is important to correctly perform the first step to achieve Machine Learning for data networks, this is an adequate abstraction and formulation of the problem. An objective problem can be classified in one of the categories of Machine Learning, whether it is a problem of classification, grouping or decision-making. This classification of the problem will help to correctly decide both the amount and type of data that should be collected and the Machine Learning model that should be used. The incorrect abstraction of the problem to be solved can provide an inadequate
278
R. Lozada-Yánez et al.
learning model, a fact that will make the results obtained unsatisfactorily. For example, to broadcast with an optimal quality of experience (QoE) in a live broadcast, it is better to address this problem as a real-time exploration-exploitation process rather than a prediction problem (Jiang et al. sf) to match the application’s characteristics. 2.2
Data Collection
Its objective is to collect a large amount of data representative of the data network without bias. Network data (for example, traffic traces, delays, throughput, session logs, performance metrics, etc.) are recorded from different network layers according to the needs of the analyzed application. For example, the traffic classification problem often requires data sets that contain packet-level traces labeled with the corresponding application classes (Zhang et al. 2015). In the context of Machine Learning for data networks, data is often collected in two phases: offline and online. In the offline phase, the collection of high-quality historical data is important for data analysis and training of Machine Learning models. In the online phase, real-time network status and performance information are often used as inputs or feedback signals for the learning model. Newly collected data can also be stored to update the historical data set for model adaptation. 2.3
2.3 Analysis of Data
Each network problem has its own characteristics and is affected by many factors, but only a few of them (that is, the relevant features) have the greatest effect on the performance metric of the target network. For example, the round-trip time (RTT) and the inter-arrival time between ACK messages can be considered the critical features for choosing an appropriate (or best) size for the TCP congestion window (Winstein and Balakrishnan 2013). In the Machine Learning paradigm, finding the right features for decision making is the key to fully exploiting the potential of the collected data. This step attempts to extract the effective features of a problem detected in the data network by analyzing historical data samples, a process called feature engineering by the Machine Learning community. Before extracting these features, it is important to preprocess and clean the raw data through processes such as normalization, discretization, and completion of missing values. Extracting features from the cleaned data often requires specific knowledge of the problem domain of the target data network (Jiang et al. n.d.). For this reason, Deep Learning can in some cases be a good option to help automate feature extraction (Mao et al. 2017; "State-of-the-Art Deep Learning: Evolving Machine Intelligence Toward Tomorrow's Intelligent Network Traffic Control Systems", n.d.).
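A minimal preprocessing sketch of the cleaning steps named above (missing-value completion, normalization, discretization), written with Scikit-learn, which the paper already lists among the available libraries; the feature matrix X_raw is a placeholder assumed for the example.

from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, KBinsDiscretizer

# Cleaning raw flow features: fill missing values, normalize, then discretize.
preprocess = Pipeline(steps=[
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("discretize", KBinsDiscretizer(n_bins=8, encode="ordinal", strategy="quantile")),
])
# X_raw would be a (samples x features) array of raw measurements such as RTT
# and ACK inter-arrival times; it is an assumption of this sketch.
# X_clean = preprocess.fit_transform(X_raw)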
2.4 Model Construction
The construction of the model involves the selection of the model, as well as its training and tuning. An appropriate machine learning model or algorithm should be selected according to the size of the collected data set, the typical characteristics of the data network scenario, the problem category, etc. For example, accurate performance prediction can improve the bit-rate adaptation of video streaming over the Internet, and a hidden Markov model can be selected for prediction because of the dynamic patterns of network throughput (Sun et al. 2016). After this, the historical data collected from the scenario is used to train the model with hyper-parameter tuning, which requires a long period of time to execute in the offline phase. The parameter tuning process still lacks sufficient theoretical guidance, so it implies a long search to find acceptable parameters.
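The sketch below illustrates model selection with hyper-parameter tuning on historical data, using a generic regressor and grid search rather than the hidden Markov model of (Sun et al. 2016); X_hist and y_hist are assumed placeholders.

from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.ensemble import RandomForestRegressor

# Hyper-parameter adjustment on historical (offline) data.
search = GridSearchCV(
    estimator=RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [50, 200], "max_depth": [None, 10, 30]},
    cv=TimeSeriesSplit(n_splits=5),      # respect the temporal ordering of network data
    scoring="neg_mean_squared_error",
)
# search.fit(X_hist, y_hist)
# best_model = search.best_estimator_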
2.5 Model Validation
Offline validation is an indispensable step in the workflow of Machine Learning for data networks; such validation allows assessing whether the learning algorithm works well enough. During this step, cross-validation is generally used to test the overall accuracy of the model and to show whether the developed model underfits or overfits. This provides a good guide on how to optimize the model; for example, one solution is to increase the volume of data and reduce the complexity of the model when it overfits. Analyzing incorrectly predicted samples helps to find the reasons for the errors and to determine whether the model and features are correct or whether the data is representative enough for the problem (Jiang et al. n.d.; Zhang et al. 2015). The procedures in the previous steps may have to be rerun depending on the sources of error.
2.6 Implementation and Inference
When implementing the learning model in an operational data network environment, some practical problems should be considered. Since there are often limitations in computing capabilities or energy resources, as well as response-time requirements, the trade-off between accuracy and overhead is important for the performance of the practical network system (Jiang et al. n.d.). Additionally, it is worth mentioning that Machine Learning often works on a best-effort basis and does not offer any guarantee on its performance, which means that system designers must consider fault tolerance. Finally, practical applications often require the learning system to obtain real-time information and to produce the inference and the corresponding output online.
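A short sketch of offline validation with cross-validation, followed by a rough measurement of single-sample inference time, which relates to the accuracy-versus-overhead trade-off discussed above; the data is synthetic and only illustrative.

import time
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestRegressor

# Offline validation (cross-validation) on placeholder data.
X = np.random.rand(1000, 10)
y = np.random.rand(1000)
model = RandomForestRegressor(n_estimators=100, random_state=0)
scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
print("cross-validated MSE:", -scores.mean())

# Rough check of the inference overhead that matters at deployment time.
model.fit(X, y)
t0 = time.perf_counter()
model.predict(X[:1])
print("single-sample inference time (s):", time.perf_counter() - t0)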
3 Review of Recent Advances Studies on Machine Learning are carried out worldwide, and the new trends that have emerged from this paradigm (e.g., Deep Learning) have led to several considerable advances that are applicable to different fields of data networks. To illustrate the relationship between these advances and the workflow of Machine Learning for data networks, Table 1 divides the works from the specialized literature into several application scenarios and shows how they address each step of the workflow.
Table 1. Relationship between the reviewed works and the Machine Learning workflow for data networks (problem formulation; data collection, offline and online; data analysis; model construction; implementation and inference).

Resource management, work scheduling (Mao et al. 2016). Problem formulation: decision making (reinforcement learning). Data collection: offline, workloads with pattern variation are used for training; online, real-time resource demand. Data analysis: the action space is too large (it may generate conflicts between actions). Model construction: offline training to update the network policy. Implementation and inference: jobs are scheduled directly on arrival with the trained model.

Traffic prediction, network traffic volume prediction (Chen et al. 2016). Problem formulation: prediction with a hidden Markov model (supervised learning). Data collection: offline, real and synthetic traffic traces with flow statistics; online, observe the flow statistics. Data analysis: the flow count and the traffic volume have a significant correlation. Model construction: training of the hidden Markov model with Bayesian rules and a recurrent neural network with long short-term memory units. Implementation and inference: take flow statistics as input and output the traffic volume.

Cognitive information, route measurement (Cunha et al. 2016). Problem formulation: prediction with the RuleFit technique (supervised learning). Data collection: offline, it combines data from platforms with a few powerful vantage points in a homogeneous implementation; online, user queries are taken as input (round by round). Model construction: construction of the rule-fit model to assign confidence to the prediction of each route. Implementation and inference: optimize the measurement budget in each round to obtain the best query coverage.

Traffic classification, traffic classifier (Zhang et al. 2015). Problem formulation: grouping and classification (supervised and unsupervised learning). Data collection: offline, labeled and unlabeled traffic traces with statistical characteristics extracted from historical traffic flows. Data analysis: the zero-day application exists and can degrade classification accuracy. Model construction: find the zero-day application class and train the classifier. Implementation and inference: inference with the trained model to output the classification results.

Configuration extrapolation, extrapolation of cloud configurations (Alipourfard and Yu n.d.). Problem formulation: parameter search with Bayesian optimization (supervised learning). Data collection: the performance under the current configuration is taken as the model input. Data analysis: heterogeneous applications and a large configuration space. Model construction and implementation: perform tests with different configurations and decide on the next test according to the Bayesian optimization model.

Network adaptation, routing strategies (Mao et al. 2017). Problem formulation: decision making with a Deep Belief Architecture (DBA) (supervised learning). Data collection: offline, traffic pattern labels based on routes calculated by OSPF; online, traffic patterns on each router. Data analysis: it is difficult to characterize I/O patterns that reflect the dynamic nature of large-scale heterogeneous networks. Model construction: layer-wise training for initialization and back-propagation to adjust the DBA structure. Implementation and inference: collect the traffic patterns on each router periodically to obtain the next routing nodes from the DBA.

Network adaptation, overall improvement of the quality of experience (QoE) (Jiang et al. n.d.). Problem formulation: decision making using a UCB algorithm variant. Data collection: offline, session quality information on a large time scale; online, session quality information on a small time scale. Data analysis: sessions sharing the same characteristics have related quality. Model construction: the back-end determines the session groups using CFA (Jiang et al. n.d.) on a large time scale. Implementation and inference: the front-end explores and exploits based on the group, in real time.

Network adaptation, TCP traffic congestion control (Winstein and Balakrishnan 2013). Problem formulation: decision making through a tabular method (reinforcement learning). Data collection: collect data and experiences from a network simulator; network status variables are calculated from ACK messages. Data analysis: select the most influential metrics as status variables. Model construction: the generated algorithm interacts with the simulator to learn the best actions according to the states. Implementation and inference: implement the algorithm directly in the corresponding network environment.

Network adaptation, TCP traffic congestion control (Dong et al. n.d.). Problem formulation: decision making. Data collection: the utility function is calculated according to the SACKs received. Data analysis: the assumptions made about TCP traffic are often violated, so measuring performance directly turns out to be a better signal. Model construction and implementation: perform tests with different sending rates and find the best rate according to the feedback (utility) function.

Performance prediction, throughput prediction (Sun et al. 2016). Problem formulation: prediction with a Hidden Markov Model (HMM) (supervised learning). Data collection: offline, throughput measurements of HTTP sessions from the online video platform iQIYI; online, the data generated by the user is taken as input. Data analysis: related patterns are generated from sessions with similar characteristics. Model construction: from a set of critical characteristics found, learn an HMM for each group of similar sessions. Implementation and inference: a newly analyzed session is assigned to the cluster with the most similar sessions and its HMM is used to predict performance.

Performance prediction, optimization of the quality of experience (QoE) in video transmissions (Jiang et al. n.d.). Problem formulation: grouping with an algorithm designed by the researchers (unsupervised learning). Data collection: data sets of quality measures (bitrate, player, CDN, etc.) collected from public Content Delivery Networks (CDNs). Data analysis: the quality of the sessions is determined by critical characteristics. Model construction: learning of the critical characteristics (on a minute time scale) and of the quality statistics (tens of seconds). Implementation and inference: measure the quality of the critical characteristics to respond to queries in real time.

Configuration extrapolation, extrapolation of configurations in the cloud (Alipourfard and Yu n.d.). Problem formulation: Bayesian optimization is used to search for key parameters (supervised learning). Data collection: the model input is the performance under the current configuration. Data analysis: heterogeneous applications and a large configuration space. Model construction and implementation: perform tests with different configurations and decide the orientation of the next test according to the Bayesian optimization model.
Without Machine Learning techniques, the typical solutions to these types of problems are time-series analysis (Sun et al. 2016; Chen et al. 2016), statistical methods (Sun et al. 2016; Jiang et al. n.d.; Zhang et al. 2015) and rule-based heuristic algorithms (Mao et al. 2017; Winstein and Balakrishnan 2013; Dong et al. n.d.; Jiang et al. n.d.; Poupart et al. 2016), which are often more interpretable and easier to implement. However, Machine Learning based methods have a greater capacity to provide a detailed strategy and can achieve greater prediction accuracy by extracting hidden information from historical data. In order to expand the understanding of the subject of study, the feasibility problem is also addressed and discussed in this section.
3.1 Understanding the Information
It is logical to think that, since data is the main resource for Machine Learning for data networks, proper understanding and efficient processing of the collected data are critical to capture the characteristics of interest and to monitor the data network. The logical and physical complexity of current data networks, the limitations of the tools oriented to the measurement of these complex infrastructures, and the variations presented by each architecture make it still not easy to obtain certain types of data (e.g., route traces or a complete traffic matrix) with an acceptable cost and granularity. Factors such as the probability that the network is in a certain state and/or the reliability of the network can be evaluated with the prediction capabilities of Machine Learning. A key point for this type of implementation is the operational cost that it would entail. For example, it would be helpful to have a model for predicting Internet routes, in order to control the operating states of the network and improve its performance. However, it would be impossible to conduct a survey that collects input information for every Internet route. In addition, even if this information were obtained, the prediction would be difficult because, due to the nature of the dynamic network protocols used on the Internet, routes change all the time. In (Cunha et al. 2016) an attempt is made to predict these "invisible paths" by assigning confidence through the use of the RuleFit supervised machine learning technique.
Machine Learning (like all learning) is based on the collection and analysis of historical data; Machine Learning in the field of data networks is not a different case and, as in every specific area, it requires new understanding and data cognition schemes. For the correct functioning of the network, Machine Learning for data networks needs to maintain a global and updated state of the network and provide real-time responses to the users who require it; to meet this expectation, it is necessary to collect data in the core of the network (the central, higher-hierarchy network through which all traffic in the data network transits). In an effort to make the data network perform self-diagnostics and, based on them, make decisions for itself with the help of Machine Learning, a network architecture with a new approach, called "the Knowledge Plane", is presented in (Clark et al. 2003); in this work the construction of a different network based on Artificial Intelligence is proposed, able to assemble itself based on high-level instructions, automatically discover when something is wrong, automatically fix a detected problem, and reassemble itself when network performance and operating conditions change.
3.2 Traffic Prediction and Classification
The prediction and classification of network traffic were two of the first applications of machine learning in the field of data networks; the efforts made for the execution of these tasks assisted by ML are described below.
Traffic Prediction. Accurate prediction of the traffic volume has been considered a critical factor for the performance of a data network; for that reason it has always been an important research problem in this field, since anticipated knowledge of the network traffic behavior is beneficial for congestion control, resource allocation, network routing and even for QoS in critical applications such as audio and video transmissions or VoIP. The approaches most studied in this regard are time-series analysis and network "tomography", which differ in whether or not the prediction is made from direct observations. It should be noted that it is extremely expensive to perform direct measurement of the traffic volume of a data network, even more so in a high-speed, large-scale environment, which is the common operating mode of current convergent networks. The studies carried out have focused on reducing the cost of measurement through the use of indirect metrics rather than on just testing several Machine Learning algorithms. One way to mitigate this problem is the formation of interdisciplinary research groups that develop more sophisticated algorithms by exploiting the specific knowledge of the data networking area and by analyzing undiscovered data patterns. The work carried out in (Chen et al. 2016) tries to predict the traffic volume by establishing the dependence between the data flow contents and the flow volume. A second way to address this problem through Machine Learning is the use of end-to-end Deep Learning (DL); an example of this approach is presented in (Poupart et al. 2016), a work in which the input data are taken from information that is easily obtained (e.g., bits of the headers of the first few packets of the data stream) and from which the learning model automatically extracts features.
Traffic Classification. The problem of traffic classification is an essential component that can improve the management and security systems of a data network. Correct classification of the traffic that circulates in a network matches the applications and protocols in use with the corresponding traffic flows. Traditionally this problem has been solved with two methods: the approach based on communication ports and the payload-based approach. Both approaches present problems in practice: the communication-ports approach is ineffective because ports are reused or not fixed, while the payload approach has privacy problems due to the deep inspection that must be performed on the data packets, and this method may even fail if the traffic is encrypted. Due to the above, the use of a machine-learning-based approach that analyzes statistical characteristics has been studied in depth in recent years, especially in the field of data network security. Despite the foregoing, it is difficult to think that Machine Learning (given the immaturity that it still presents in its development) will become a 100% effective and omnipotent solution to this problem. For example, unlike the "traditional" way in which Machine Learning is applied to identify whether an image is of a car or not, trying to classify the traffic of a network with Machine Learning will generate a great cost in terms of processing capacity if the classification is incorrect, putting the security of the network at risk. For now, the applications made to address this problem range from classification solutions with data from known scenarios to attempts to work in real scenarios with unknown traffic, as presented in (Zhang et al. 2015). A promising direction being studied to address Machine Learning problems in the field of data networks is the evolution from supervised learning to semi-supervised learning and then to unsupervised learning.
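As a minimal supervised baseline for classification from statistical flow features, the sketch below trains a random forest on synthetic data; it deliberately ignores the zero-day (unknown-class) problem that (Zhang et al. 2015) addresses by combining unsupervised and supervised learning.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# Toy data standing in for labeled flow statistics (packet counts, sizes, timing).
rng = np.random.default_rng(0)
X = rng.random((600, 8))
y = rng.integers(0, 3, size=600)          # three known application classes

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))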
3.3 Resource Management and Network Adaptation
Network adaptation and efficient resource management are key factors in improving the performance of any networking infrastructure. Typical problems to be addressed in this area are routing (Mao et al. 2017) and TCP congestion control (Winstein and Balakrishnan 2013; Dong et al. n.d.); all of these problems can be addressed as decision-making problems (Mao et al. 2016). Despite this, it is a great challenge to solve them with rule-based heuristic algorithms due to the complexity of the various system environments, the noise present in the input data, and the difficulty of optimizing tail performance (Mao et al. 2016). Deep Learning (DL) is presented as a promising technique for the solution of this type of network problem thanks to its ability to characterize, automatically and without any human intervention, the relationships between the inputs and outputs of a data network system. In that sense, the works presented in (Mao et al. 2017; Kato et al. 2017) describe the design of a traffic control system based on Deep Learning techniques. Reconsidering the routing architectures in the backbone of the network, it takes the traffic pattern on each router as an input and generates the next nodes of the route as the output of the model, using a generative graphical Artificial Intelligence model called a Deep Belief Network (DBN). These efforts have demonstrated the potential of Deep Learning techniques for network programming and automatic routing management.
Making use of the representation capabilities of deep neural networks, Deep Reinforcement Learning is able to offer excellent results for many Artificial Intelligence problems. In (Mao et al. 2016) one of the first works is presented in which a Deep Reinforcement Learning algorithm is applied to schedule the resources of a cluster; the demonstrated performance is comparable to that of modern heuristic algorithms but with lower costs. Problems related to the Quality of Experience (QoE) can also be addressed with this approach; unlike previous work, in (Jiang et al. n.d.) this problem is treated as an exploration-exploitation problem instead of a prediction problem. As a result, Pytheas shows better performance compared to prediction-based systems by reducing prediction bias and response delay.
Several attempts have been made to optimize TCP congestion control using the Reinforcement Learning approach, given the difficulty of designing a congestion control algorithm that adapts to all the states of the network. In order to make the algorithm adaptive, in (Winstein and Balakrishnan 2013) the assumptions about the target network and the traffic model are taken as prior knowledge to automatically generate the specific algorithm, which demonstrates a very good performance gain in many circumstances. In the offline phase, the method tries to learn a mapping between the state of the network and the corresponding parameters of the congestion window (cwnd) by interacting with a simulator; in the online phase, for each ACK received, the model searches its mapping table and is able to change its cwnd behavior according to the state of the network.
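The following toy sketch conveys the flavor of learning a state-to-cwnd mapping offline against a simulator, in the spirit of (Winstein and Balakrishnan 2013); it uses a simple bandit-style value update and an invented reward function, so it is an illustration of the idea rather than a reproduction of Remy.

import random
from collections import defaultdict

# Toy tabular mapping from a discretized network state to a cwnd action.
actions = [-10, 0, 10]                      # change to cwnd, in packets
q = defaultdict(float)                      # q[(state, action)]
alpha, epsilon = 0.1, 0.2

def simulate(state, cwnd):
    # Assumed stand-in reward: prefer a moderate cwnd; not a real network model.
    return -abs(cwnd - 80) / 80.0

cwnd, state = 40, ("low_rtt", "low_loss")
for step in range(1000):
    a = random.choice(actions) if random.random() < epsilon else \
        max(actions, key=lambda x: q[(state, x)])
    cwnd = max(10, cwnd + a)
    reward = simulate(state, cwnd)
    q[(state, a)] += alpha * (reward - q[(state, a)])   # bandit-style update
print("preferred action in this state:", max(actions, key=lambda x: q[(state, x)]))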
3.4 Network Performance Prediction and Configuration
Performance prediction and extrapolation can guide decision making. Example applications of this type of solution are the location selection for a Content Delivery Network (CDN), the selection of wireless channels, and the prediction of the Quality of Experience in streaming video. Machine Learning techniques are an excellent option for predicting the states of the network system in order to support proper decision making. In a first typical prediction scenario, sufficient historical data can be obtained, but the construction of a complex prediction model that is updated in real time is not a common task, since an approach is required that exploits domain-specific knowledge, which simplifies the problem. In (Sun et al. 2016) an attempt is made to improve the selection of the video bit rate through accurate prediction. This work indicates that sessions with similar key characteristics tend to have more closely related throughput behavior. In the offline phase, the model learns to group similar sessions and then trains a different hidden Markov model (HMM) to predict the performance of each group based on current session information. This way of working reinforces the correlation between similar sessions in the training process, which makes this approach surpass those that employ a single model. In another prediction scenario there is little historical data, which makes it impossible to collect representative data by performing performance tests, since the cost of tests in real network systems is high. To address this problem, in (Alipourfard and Yu n.d.) the Bayesian optimization algorithm is used to minimize the pre-execution rounds, with some directionality, to collect data from the execution time of representative workloads for different configurations.
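A simplified stand-in for the grouping idea of (Sun et al. 2016): sessions are clustered by their features and one throughput model is kept per group; to keep the sketch short, the per-group hidden Markov model is replaced here by a per-group mean, and all data is synthetic.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
session_features = rng.random((300, 4))        # encoded ISP, region, device, time of day (assumed)
session_throughput = rng.random(300) * 10.0    # Mbps, synthetic

# Group similar sessions, then keep one simple predictor per group.
groups = KMeans(n_clusters=5, n_init=10, random_state=0).fit(session_features)
per_group_mean = {g: session_throughput[groups.labels_ == g].mean() for g in range(5)}

new_session = rng.random((1, 4))
g = groups.predict(new_session)[0]
print("predicted throughput (Mbps):", per_group_mean[g])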
4 Viability of Implementation Machine Learning techniques face an important challenge: their viability. It is well known that network applications are sensitive to delay, and it is not a simple task to design and implement a system that works with large volumes of data to be processed, even less so in real time. A common solution to mitigate this problem is to carry out the training of the model over a long period of time with global information and to gradually update it over a short period of time with local information (Jiang et al. n.d.); in this way a trade-off is achieved between the lack of information and the information-processing overhead. When considering the online phase, what is commonly sought is to obtain the results table, or a graph of the inference with a trained model, for real-time decision making.
Considering the work analyzed above, it can be said that Machine Learning techniques, in their current state of maturity, are not suitable solutions for all the problems that arise in a data network. The network problems that have been solved with Machine Learning techniques are related to prediction, classification and decision making; it is a complicated task to apply these techniques to other types of problems that occur in networking environments. Among other reasons that hinder the application of Machine Learning techniques, we can mention the lack of representative labeled data, the great dynamics of this type of system, the volume of data generated in them, and the cost generated by learning errors.
5 Opportunities for Machine Learning for Data Networks The efforts made for the use of Machine Learning in the field of data networks have been reviewed, the main focus of most of the works being to address different challenges and solve problems in security, performance, and networking. The success of a solution supported by Machine Learning depends largely on the availability of representative data combined with models and algorithms that are properly developed and implemented. In that sense, it is anticipated that future networks will have to support exponential growth in traffic volume and in the number of interconnected devices, in addition to unprecedented online access to information. The studies reviewed essentially focus on solving network problems that can be addressed by applying the concepts of classification and prediction. However, the advances made in the area of Machine Learning mean that this paradigm appears as a promising solution to new problematic situations in the field of networking. Here are some opportunities for Machine Learning for data networks:
5.1 Open Data Set
The task of collecting sufficient representative data on network profiles and/or performance metrics is a critical factor for the application of Machine Learning techniques in the field of networks. However, obtaining this data remains a laborious and expensive task, making it difficult for researchers in the area to possess the necessary amount of real representative data, even with the existence of open data repositories on the Internet. This situation demonstrates the need to join efforts in the construction of open data sets, available to the community that conducts research on Machine Learning in the field of data networks (as ImageNet is for computer vision). With open data sets available to the researchers' community, the performance tests that can be performed will provide a standard platform that allows the models, algorithms and/or architectures developed to be compared, debugged and evolved towards more advanced and effective solutions, thus reducing the repetition of non-representative experiments and having a positive effect on the development of Machine Learning techniques in the field of data networks.
As mentioned in (Winstein and Balakrishnan 2013), the execution of Machine Learning experiments in a simulator is more effective and less expensive than experiments implemented in a real environment for Reinforcement Learning scenarios. In short, due to the nature of the operating environment of data networks, the limited availability of sufficient representative data and the high cost of conducting tests on large-scale systems, it is necessary to use simulators that are sufficiently powerful and that offer sufficient fidelity, scalability, adaptability and speed of operation.
5.2 Automated Protocols and Network Design
Through analysis and a deep understanding of data networks, researchers in this area have discovered that the network in its current configuration has many limitations. The components of a network are usually added based on the understanding of a human being at a given moment instead of following an engineering model. There are a considerable number of problems whose solution would improve the performance and efficiency of the data network by redesigning the protocols and the network architecture. Even today, designing a protocol or network architecture automatically remains a fairly complex task. Despite this, the Machine Learning research community has made efforts in this direction, obtaining promising results such as allowing agents to communicate with other agents to perform a task cooperatively. Other efforts that have used Generative Adversarial Networks (GANs) have shown that a machine learning model has the ability to generate elements that exist in the real world and to create strategies that people would not find, or that would require a lot of time to be inferred by a human being; an example of this is AlphaGo (Chen 2016), the AI software developed by Google DeepMind to play the board game Go. Despite these advances, the possibility of designing a protocol automatically is still far away. The maturity of Machine Learning gives it great potential and opens the possibility of designing new network components without human participation; this could refresh and guide in new directions the understanding that human beings have of network systems and lead to new frameworks that are not conceivable today.
5.3 Understanding Network Systems
The behavior of a data network is quite complex, due in part to the principle of network operation (with its end-to-end connection paradigm), which causes several protocols to perform simple actions separately in the system but, together, to produce complex behavior within the network. Seen this way, even in a mature research domain such as TCP congestion control, determining the factors that directly affect the performance of one or another network metric, in order to minimize them during the design of the algorithm, is a difficult task. In spite of the above, Machine Learning techniques make it possible to analyze the results obtained after applying the learning algorithms in order to find useful information that improves the understanding of the behavior of the network and the way high-performance algorithms are designed. In that sense, and in search of a more detailed explanation, DeepRM (Mao et al. 2016) can be mentioned, a resource management framework based on Machine Learning; its authors discovered that DeepRM "decides" to reserve room for the small jobs that are about to arrive, a fact that eventually reduces the waiting time. Other good examples are Remy and CFA and their follow-up investigations, which provide important information about the key factors that influence TCP congestion control and the optimization of video Quality of Experience (QoE), respectively.
5.4 Machine Learning Theory and Techniques for Networks
Given that the barriers in terms of data storage and computing capacity that frustrate the application of Machine Learning in data networks become smaller every day, one question may be asked: what factors prevent a successful application of these techniques in the field of networks? It should be mentioned that the lack of a theoretical model is a major obstacle faced by Machine Learning in the field of data networks. This concern was expressed by David Mayer in (IETF - Internet Engineering Task Force, n.d.), a video talk about Machine Learning and networks. It is logical to think that, without the existence of a unified theory, each network must be learned separately, which could hamper the process of adopting Machine Learning for data networks. The Machine Learning techniques used in the field of data networks were designed with other applications in mind; this could be addressed through a line of research on the development of ML algorithms for networks (Mestres et al. 2017). Another key issue is the lack of experience; in this sense, it is enough to mention that Machine Learning and data networks are two very different fields, so the number of people who are experts in both domains is currently very small. This fact points to the need for more studies and collaborations that involve the Machine Learning communities and data network experts.
6 Conclusions In recent years, Machine Learning has been applied with great success to solve several problems in the field of data networks. This article presents an overview of some of the efforts made in the application of Machine Learning techniques to support network management and operation, with a focus on traffic engineering, performance optimization, and network security. Representative works have been reviewed, and the feasibility of solutions based on Machine Learning to face several challenges related to the operation and administration of current and future data networks has been explored and analyzed. A basic workflow has also been presented to provide researchers with a practical guide for exploring new machine learning paradigms for application in future network research.
It is not yet easy for network researchers to apply Machine Learning techniques in the field of data networks, due to the lack of experience in real environments and the lack of a clear methodology that provides the necessary guidelines for the proper execution of this task. Due to the heterogeneity of network systems, the adoption of Machine Learning techniques in the field of data networks is essential to obtain new advances in this interdisciplinary approach. Many issues are still open; an attempt has been made to provide guidance on the lines of research (opportunities for Machine Learning for data networks) that require a greater effort by interdisciplinary groups of researchers, both from the point of view of Machine Learning and from that of data networks. The researchers express their gratitude to Washington Luna, PhD, Director of the Research Project "Models, methodologies and technological frameworks for the implementation and use of virtual laboratories" of the FIE-ESPOCH, for his contribution to the realization of this study.
References
Alipourfard, O., Yu, M.: CherryPick: adaptively unearthing the best cloud configurations for big data analytics, p. 15 (2017)
Chen, J.X.: The evolution of computing: AlphaGo. Comput. Sci. Eng. 18(4), 4–7 (2016). https://doi.org/10.1109/MCSE.2016.74
Chen, Z., Wen, J., Geng, Y.: Predicting future traffic using Hidden Markov Models. In: 2016 IEEE 24th International Conference on Network Protocols (ICNP), pp. 1–6 (2016). https://doi.org/10.1109/ICNP.2016.7785328
Clark, D.D., Partridge, C., Ramming, J.C., Wroclawski, J.T.: A knowledge plane for the internet. In: Proceedings of the 2003 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, pp. 3–10 (2003). https://doi.org/10.1145/863955.863957
Cunha, Í., Marchetta, P., Calder, M., Chiu, Y.-C., Schlinker, B., Machado, B.V.A., Katz-Bassett, E.: Sibyl: a practical internet route Oracle. In: Proceedings of the 13th Usenix Conference on Networked Systems Design and Implementation, pp. 325–344 (2016). http://dl.acm.org/citation.cfm?id=2930611.2930633
Dong, M., Li, Q., Zarchy, D., Godfrey, P.B., Schapira, M.: PCC: re-architecting congestion control for consistent high performance, p. 15 (2015)
IETF - Internet Engineering Task Force: IETF97-NMLRG-20161117-1330 (2016). https://www.youtube.com/watch?v=XORRw6Sqi9Y
Jiang, J., Sekar, V., Milner, H., Shepherd, D., Stoica, I., Zhang, H.: CFA: a practical prediction system for video QoE optimization, p. 15 (2016)
Jiang, J., Sun, S., Sekar, V., Zhang, H.: Pytheas: enabling data-driven quality of experience optimization using group-based exploration-exploitation, p. 15 (2017)
Kato, N., Fadlullah, Z.M., Mao, B., Tang, F., Akashi, O., Inoue, T., Mizutani, K.: The deep learning vision for heterogeneous network traffic control: proposal, challenges, and future perspective. IEEE Wirel. Commun. 24(3), 146–153 (2017). https://doi.org/10.1109/MWC.2016.1600317WC
Mao, B., Fadlullah, Z.M., Tang, F., Kato, N., Akashi, O., Inoue, T., Mizutani, K.: Routing or computing? The paradigm shift towards intelligent computer network packet transmission based on deep learning. IEEE Trans. Comput. 66(11), 1946–1960 (2017). https://doi.org/10.1109/TC.2017.2709742
Mao, B., Fadlullah, Z.M., Tang, F., Kato, N., Akashi, O., Inoue, T., Mizutani, K.: State-of-the-art deep learning: evolving machine intelligence toward tomorrow's intelligent network traffic control systems. IEEE Commun. Surv. Tutorials 19(4), 2432–2455 (2017)
Mao, H., Alizadeh, M., Menache, I., Kandula, S.: Resource management with deep reinforcement learning, pp. 50–56 (2016). https://doi.org/10.1145/3005745.3005750
Mestres, A., Rodriguez-Natal, A., Carner, J., Barlet-Ros, P., Alarcón, E., Solé, M., Cabellos, A.: Knowledge-defined networking. SIGCOMM Comput. Commun. Rev. 47(3), 2–10 (2017). https://doi.org/10.1145/3138808.3138810
Poupart, P., Chen, Z., Jaini, P., Fung, F., Susanto, H., Geng, Y., Jin, H.: Online flow size prediction for improved network routing. In: 2016 IEEE 24th International Conference on Network Protocols (ICNP), pp. 1–6 (2016). https://doi.org/10.1109/ICNP.2016.7785324
Sun, Y., Yin, X., Jiang, J., Sekar, V., Lin, F., Wang, N., Sinopoli, B.: CS2P: improving video bitrate selection and adaptation with data-driven throughput prediction. In: Proceedings of the 2016 ACM SIGCOMM Conference, pp. 272–285. ACM (2016)
Winstein, K., Balakrishnan, H.: TCP ex machina: computer-generated congestion control. In: Proceedings of the ACM SIGCOMM 2013 Conference on SIGCOMM, pp. 123–134 (2013). https://doi.org/10.1145/2486001.2486020
Zhang, J., Chen, X., Xiang, Y., Zhou, W., Wu, J.: Robust network traffic classification. IEEE/ACM Trans. Netw. 23(4), 1257–1270 (2015). https://doi.org/10.1109/TNET.2014.2320577
Underground Channel Model for Visible Light Wireless Communication Based on Neural Networks
Simona Riurean1(&), Olimpiu Stoicuta1, Monica Leba1, Andreea Ionica1, and Álvaro Rocha2
1 University of Petrosani, 332006 Petrosani, Romania
{simonariurean,olimpiustoicuta,monicaleba,andreeaionica}@upet.ro
2 University of Coimbra, Coimbra, Portugal
[email protected]
Abstract. An accurate channel model is both essential and challenging when designing a reliable wireless Visible Light Communication (VLC) system. Modeling the channel in a restricted and harsh environment such as an underground mine is even more difficult. The objective of our work is to investigate a suitable design of a VLC link for an underground environment, while providing reliable connectivity between the optical transmitter and the optical receiver. In this paper, we present an experimental study that aims to determine a general model of an underground mine optical channel. Considering that the underground optical channel has a nonlinear dynamic behavior described by a nonlinear autoregressive exogenous (NARX) mathematical model, the system's order is established with a feedforward neural network (NN) with a parallel architecture. In this work, in order to define a proper Underground Mine Optical Channel Model (UMOCM) for reliable VLC, we present an innovative approach based on a suitable mathematical model and neural networks (NNs).
Keywords: Underground Mine Optical Channel Model · Impulse response · LED · Signal detection · Intrinsic and apparent properties of light
1 Introduction The current wired communication systems (twisted pair, coaxial and optical fiber) in underground mines have a high cost and a complex structure, susceptible to damage. The well-known wireless communication systems and applications based on radio frequency (RF) (Wi-Fi, ultra-wideband, Bluetooth, BLE, ZigBee, Z-Wave, iBeacon or Eddystone) used on the surface cannot be directly applied underground because of, on the one hand, the high attenuation of radio waves and, on the other hand, the strict work security rules in underground mine spaces [1]. The toxic gases and substances resulting from mining and production, as well as the presence of dust, result in different hazards encountered by workers underground; therefore, reliable communication, monitoring and tracking systems are required to guarantee safety and also maximize productivity in underground mining [2].
The most important actions to be monitored underground are related to the detection, in real/useful time, of hazardous gases and/or smoke, the proper functioning of mining machinery and, most importantly, the monitoring of miners and their location underground before/after a possible disaster. Reliable wireless communications dedicated to monitoring and positioning systems that assure safety and maximize productivity in underground mining can be realized based on Optical Wireless Communication (OWC) technologies and applications (VLC, Infrared - IR, Light Fidelity - LiFi, Optical Camera Communication - OCC) [3, 4]. Hybrid systems consisting of both RF and OWC can also be considered. According to previous research, OWC technologies and applications have proved to outperform RF technologies applied in underground environments, since optical communication supports high rates, has an unlicensed spectrum, is low-cost and does not involve electromagnetic interference [5]. VLC systems can use the existing off-the-shelf light-emitting diodes (LEDs), fixed on the ceiling of the underground mining spaces and/or mounted on the miner's helmet, in order to establish any of the possible VLC links: miner to miner (M2M), miner to infrastructure (M2I) or infrastructure to miner (I2M) [6]. Different UMOCMs have already been presented in some works [7–10], but none of them with the original approach based on a mathematical model and NNs used by us.
2 VLC Channel Model In order to determine a general model of the VLC optical channel, all the key characteristics of the following elements have to be accurately considered:
– the optical transmitter (oTx) assembly (both the electrical and the optical setup) and its position in the VLC topology (Line of Sight - LoS, Non-LoS, etc.);
– the optical receiver (oRx) assembly (both the electrical and the optical setup) and its position in the VLC topology;
– the underground setup with all the surrounding elements (ceiling, sides and floor of the gallery) and the obstacles underground, with their shape, position, materials and colors, in order to determine their reflective characteristics;
– the wireless communication optical channel itself (filled with tiny particles of dust, rock and/or coal);
– additive noises (other light sources, thermal noise, shot noise, etc.).
The wireless communication optical channel is the propagation environment through which the optical signal passes from oTx to oRx. Due to multiple reflections from various objects, the light emitted by the LED travels along different paths of varying length until it reaches the active area of the photodetector (PD). The multiple interactions between photons and the environment, as well as a long distance between oTx and oRx, cause multipath fading at a specific location, and the strength of the optical power (signal) decreases as the distance between the oTx and the oRx increases. When a longer transmission distance is considered, the difficulty of optical wireless communication at high data rates without the use of optical setups (lenses in front of the oTx and optical concentrators and filters in front of the oRx) becomes obvious. The optical wireless channel is linear, memoryless and time invariant (except in situations when light beam obstruction and shadowing occur), with an impulse response of finite duration [11].
3 Channel Modeling in an Underground Mine Environment
3.1 Considerations on the Light's Properties in the Underground Mine Environment
Modeling the channel in underground spaces, with the aim of determining proper wireless communication in visible light, is a challenging task, since there are many particularities of both the environment and the light's intrinsic and apparent properties. Both the Intrinsic Optical Properties (IOPs) and the Apparent Optical Properties (AOPs) of light underground have to be considered for an accurate channel model. The IOPs of light depend exclusively on the medium. The medium underground is open air filled with particles of rock and/or coal dust (a polluted medium). The AOPs depend both on the medium and on the environment studied. The environment refers to the ceiling, sides and floor, as well as the surrounding elements (objects/obstacles) with their particularities: geometry, type of material (rough/smooth, specular/diffuse) and color, within the space where the VLC setup is considered. The IOPs are conservative properties, and hence the magnitude of the absorption coefficient varies linearly with the concentration of the absorbing material. Theoretically, the absorption coefficient can be expressed as the sum of the absorption coefficients of each component in the polluted air [12]. The propagation of the optical signal in a coal mine environment undergoes many physical phenomena, such as absorption/refraction, diffraction and scattering, due to the AOPs of light in a medium polluted with suspended particles. Hence, modeling the channel with high accuracy in this environment is an important and complicated task. Since the Underground Mine Environment (UME) has specific surface characteristics (non-reflective/rough objects inside, as well as sides, ceiling and floor with prevalent black and dark grey colors) that poorly reflect light, the diffuse and specular reflections are neglected, as is the time dispersion of the optical signals. Depending on the dimensions and shape of the suspended particles, as well as on the light's wavelength, variable scattering/absorption of light is also expected. Due to the short distance between oTx and oRx, multipath dispersion is not taken into consideration; therefore, the only channel link considered is a Line of Sight (LoS) one, modelled as a linear attenuation.
3.2 VLC-UMOCM Identification Using Neural Networks
In order to identify the communication channel, we assume that the channel has a nonlinear dynamic behavior described by a nonlinear autoregressive exogenous (NARX) model [13], according to:

$$y(t) = F\big(y(t-1), \ldots, y(t-n_b),\, u(t-1), \ldots, u(t-n_a)\big) + e(t) \quad (1)$$

where F is a nonlinear function with respect to its variables, $n_a$ is the order of the input variable (u), $n_b$ is the order of the output variable (y), and e is white noise [13]. Under these circumstances, the channel identification relies on identifying the system's order (finding the values of $n_a$ and $n_b$) and selecting F using a set of input-output data acquired during the experimental work. In order to approximate the function F, we use a feedforward NN with a parallel architecture. A Multi-Layer Perceptron (MLP) with two layers (a Hidden Layer - HL consisting of $N_2$ neurons and a single-neuron Output Layer - OL) is used. In order to mathematically model the optical communication channel, several activation functions are tested. The activation functions tested for the HL [14] are:
– the symmetric sigmoid function (tansig);
– the sigmoid function (logsig);
– the Elliot symmetric sigmoid function (elliotsig);
– the radial basis function (radbas);
– the triangular basis function (tribas).
On the other hand, the activation functions tested for the OL are:
– the symmetric sigmoid function (tansig);
– the Elliot symmetric sigmoid function (elliotsig);
– the linear function (purelin);
– the positive linear function (poslin);
– the saturating positive linear function (satlin).
In this case, the communication channel model can be represented as [13]:

$$\hat{y}(t) = g\big(\hat{\varphi}(t), \hat{\theta}\big) = f_2\!\left(\sum_{k=1}^{N_2} w2_k\, g_k + b2\right) \quad (2)$$

where $\hat{\varphi}(t) = [\hat{y}(t-1)\ \ldots\ \hat{y}(t-n_b)\ \ u(t-1)\ \ldots\ u(t-n_a)]^T$ is the regression vector formed by the past values of the input (u) and the past values of the estimated output ($\hat{y}$), $\hat{\theta}$ are the estimated parameters of the model, $N_1 = n_a + n_b$ is the length of the vector $\hat{\varphi}$, $g_k$ are the basis functions, $w2_k$ are the synaptic weights of the connections between the HL and the output, and $b2$ is the activation threshold of the neuron in the OL. The basis functions $g_k$ are:

$$g_k = f_1\!\left(\sum_{i=1}^{N_1} w1_{ki}\, \hat{\varphi}_i + b1_k\right), \quad k = \overline{1, N_2} \quad (3)$$

where $b1_k$ is the activation threshold (bias) of neuron k in the HL and $w1_{ki}$ are the synaptic weights of the connections between the inputs and the neurons in the HL. On the other hand, relation (3) can be written as follows:

$$g_k = f_1\!\left(\sum_{i=1}^{n_a} w1^a_{ki}\, u^a_i + \sum_{i=1}^{n_b} w1^b_{ki}\, \hat{\varphi}^b_i + b1_k\right), \quad k = \overline{1, N_2} \quad (4)$$

where $u^a = [u(t-1)\ \ldots\ u(t-n_a)]^T$ and $\hat{\varphi}^b = [\hat{y}(t-1)\ \ldots\ \hat{y}(t-n_b)]^T$. Relation (4) in array form is:

$$G = F_1\big(W1^a\, u^a + W1^b\, \hat{\varphi}^b + B1\big) \quad (5)$$

where $G = [g_1\ g_2\ \ldots\ g_{N_2}]^T$, $F_1 = [\underbrace{f_1\ f_1\ \ldots\ f_1}_{N_2}]^T$, $B1 = [b1_1\ b1_2\ \ldots\ b1_{N_2}]^T$ and $W1^a$, $W1^b$ are the following arrays:

$$W1^a = \begin{bmatrix} w^a_{11} & w^a_{12} & \ldots & w^a_{1 n_a} \\ w^a_{21} & w^a_{22} & \ldots & w^a_{2 n_a} \\ \vdots & \vdots & & \vdots \\ w^a_{N_2 1} & w^a_{N_2 2} & \ldots & w^a_{N_2 n_a} \end{bmatrix}, \qquad W1^b = \begin{bmatrix} w^b_{11} & w^b_{12} & \ldots & w^b_{1 n_b} \\ w^b_{21} & w^b_{22} & \ldots & w^b_{2 n_b} \\ \vdots & \vdots & & \vdots \\ w^b_{N_2 1} & w^b_{N_2 2} & \ldots & w^b_{N_2 n_b} \end{bmatrix}.$$

According to the above, the channel model can be expressed as:

$$\hat{y}(t) = g\big(\hat{\varphi}, \hat{\theta}\big) = f_2\!\left(\sum_{k=1}^{N_2} w2_k\, f_1\!\left(\sum_{i=1}^{n_a} w1^a_{ki}\, u^a_i + \sum_{i=1}^{n_b} w1^b_{ki}\, \hat{\varphi}^b_i + b1_k\right) + b2\right) \quad (6)$$

Expression (6) in array form is:

$$\hat{y}(t) = g\big(\hat{\varphi}, \hat{\theta}\big) = f_2\big(W2\, G + b2\big) = f_2\big(W2\, F_1(W1^a\, u^a + W1^b\, \hat{\varphi}^b + B1) + b2\big) \quad (7)$$

where $W2 = [w2_1\ w2_2\ \ldots\ w2_{N_2}]$. Under these conditions, the model's parameters are:

$$\theta = \begin{bmatrix} W1^a & W1^b & W2 & B1 & b2 \end{bmatrix} \quad (8)$$
where $W1^a$ is an array of dimension $N_2 \times n_a$, $W1^b$ is an array of dimension $N_2 \times n_b$, $W2$ is a vector of dimension $1 \times N_2$, $B1$ is a column vector of dimension $N_2 \times 1$, and $b2$ is a real constant. The parameters of the NARX model are determined so that the following performance index is minimum:

$$J(\theta) = \min\left[\frac{1}{N}\sum_{t=1}^{N}\big(y(t) - \hat{y}(t)\big)^2\right] \quad (9)$$
where N is the number of available data. Based on the performance index and the Levenberg-Marquardt numerical optimization method, the parameters of the mathematical model are determined according to the following iterative relationship [13]:

$$\hat{\theta}_{i+1} = \hat{\theta}_i - \eta_i\, R_i^{-1}\, \hat{\nabla} J_i \quad (10)$$
where $\hat{\theta}_i$ is the vector of the estimated parameters after iteration i, $\hat{\nabla} J_i$ is an estimate of the gradient of the function J, $R_i$ is a matrix that changes the search direction and $\eta_i$ is the search step. In the case of the Levenberg-Marquardt method, the Hessian matrix that changes the search direction is approximated based on the following relation [13]:

$$R_i \simeq J_{a_i}^T J_{a_i} + \mu I \quad (11)$$
where $\mu$ is a positive coefficient (also called the combination coefficient), $I$ is the identity matrix and $J_{a_i}$ is the Jacobian matrix $J_{a_i} = \frac{\partial}{\partial \hat{\theta}}\, g\big(\hat{\varphi}, \hat{\theta}_i\big)$.
In order to determine this matrix, the off-line (batch) method is used, based on the entire input-output data set of the transmission channel.
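The sketch below is not the authors' implementation (the parameters in Sect. 4 are obtained with MATLAB's nntraintool and the trainlm algorithm); it only illustrates, in Python and on placeholder data, how the parallel NARX-MLP of relations (1)-(8) can be simulated and fitted with a Levenberg-Marquardt-type least-squares routine.

import numpy as np
from scipy.optimize import least_squares

na, nb, N2 = 3, 3, 5
logsig = lambda x: 1.0 / (1.0 + np.exp(-x))   # HL activation
tansig = np.tanh                               # OL activation

def unpack(theta):
    i = 0
    W1a = theta[i:i + N2 * na].reshape(N2, na); i += N2 * na
    W1b = theta[i:i + N2 * nb].reshape(N2, nb); i += N2 * nb
    W2 = theta[i:i + N2]; i += N2
    B1 = theta[i:i + N2]; i += N2
    return W1a, W1b, W2, B1, theta[i]

def simulate(theta, u):
    # Parallel NARX simulation: past *estimated* outputs are fed back, as in relation (7).
    W1a, W1b, W2, B1, b2 = unpack(theta)
    y_hat = np.zeros_like(u)
    for t in range(max(na, nb), len(u)):
        ua = u[t - na:t][::-1]          # u(t-1) ... u(t-na)
        yb = y_hat[t - nb:t][::-1]      # y_hat(t-1) ... y_hat(t-nb)
        G = logsig(W1a @ ua + W1b @ yb + B1)
        y_hat[t] = tansig(W2 @ G + b2)
    return y_hat

# u, y stand for the measured channel input/output; random placeholders here.
rng = np.random.default_rng(0)
u = rng.random(500)
y = rng.random(500)
theta0 = 0.1 * rng.standard_normal(N2 * (na + nb) + 2 * N2 + 1)
fit = least_squares(lambda th: simulate(th, u) - y, theta0, method="lm")
print("final cost:", fit.cost)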
4 The Experimental Setup and Results With the aim of identifying the proper mathematical model of the underground mine optical communication channel, the prototype inside the gallery model presented in Fig. 1 has been developed and used. The entire oTx mobile module (PCB, LED type VLHW4100 and optics) has been placed inside a model of a main gallery that has the same shape as the original underground mine galleries studied, at a scale of 1:6.25.
Fig. 1. The VLC setup with entire oTx and PD of oRx inside the gallery model (left) and the oRx (right up) with the oscilloscope’s display (right down)
The PDs with optics of the oRx are placed inside the gallery and the PCB is placed outside, in order to acquire the amplified received data. On the oTx side, data are acquired from the LED terminals, and the received data are acquired at the output of the PCB with a suitable Trans-Impedance Amplifier (TIA). Three different types of PDs have been tested (PD1 - BPW34, PD2 - VTB8440BH, IR filtered, and PD3 - a solar panel). With the experiment conducted, we aim to acquire data suitable for developing a consistent model that can be used to overcome the challenges of the underground mine communication channel. Using the UPO2104CS oscilloscope, the acquisition of the input-output data was done within a time interval T = [0, tf]. The number of input-output data points for a sampling period of Te = 1 x 10^-7 s is N = 27996. The stimulus applied to the input of the system to be mathematically modeled is a random signal, whose main statistical characteristics are presented in Fig. 2. Following the input stimulus, the response of the communication channel is presented in Fig. 3. The statistical indicators presented in Figs. 2 and 3 were obtained with the EViews10 program [15]. In order to determine the parameters, the Matlab program was used, namely the Neural Network Training toolbox (nntraintool) [14]. 70% of the acquired input-output data were used to determine the parameters, the remaining 30% being used for validation (15%) and for testing (15%). In order to determine the parameters, the Levenberg-Marquardt optimization method (trainlm) is used, with a performance index based on the normalized mean square of the error between the actual and the estimated response of the transmission channel output (mse). Following several attempts aimed at obtaining a total determination coefficient (R) as close as possible to 1, the order of the system was chosen as na = nb = 3, for an MLP NN with 2 layers (the HL with N2 = 5 neurons and the OL with only 1 neuron).
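One simple way to realize the 70%/15%/15% division described above is sketched below in Python (the MATLAB toolbox draws its subsets with its own division function, so this contiguous split is only an assumption for illustration); u_meas and y_meas are placeholders for the oscilloscope records.

import numpy as np

N, Te = 27996, 1e-7
t = np.arange(N) * Te
u_meas = np.random.rand(N)   # placeholder for the acquired input signal
y_meas = np.random.rand(N)   # placeholder for the acquired channel response

n_train = int(0.70 * N)
n_val = int(0.15 * N)
train = slice(0, n_train)
val = slice(n_train, n_train + n_val)
test = slice(n_train + n_val, N)
print(len(t[train]), len(t[val]), len(t[test]))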
Fig. 2. The main characteristics of (u)
Fig. 3. The main characteristics of (y)
Tests were performed for various activation functions, both for the HL and for the OL. Following the comparative analysis of the total determination coefficient (R) from the tests, we obtain that the activation function for the HL is of the sigmoid type (logsig), while the activation function for the OL is of the symmetric sigmoid type (tansig). The results obtained with the symmetric sigmoid function (tansig) as activation for the OL are presented in Table 1.

Table 1. Results with the symmetric sigmoid function (tansig) as activation for the OL.

HL activation | R (Training) | R (Validation) | R (Test) | R (All) | Epoch | MSE
tansig        | 0.97387      | 0.97237        | 0.97237  | 0.97342 | 27    | 0.0044665
logsig        | 0.97318      | 0.97526        | 0.97277  | 0.97344 | 94    | 0.0041716
elliotsig     | 0.97362      | 0.97388        | 0.97198  | 0.97342 | 9     | 0.0042386
radbas        | 0.97317      | 0.97405        | 0.97339  | 0.97334 | 30    | 0.0042976
tribas        | 0.97222      | 0.9731         | 0.97458  | 0.97271 | 13    | 0.0044578
The coefficients of determination calculated after training, validating and testing the NN (which uses the logsig and tansig activation functions) are shown in Fig. 4.
Fig. 4. The regression lines and the determination coefficients related to NN
The parameter determination algorithm stops after six consecutive increases of the validation error, and the best performance is obtained in epoch 94, which has the lowest validation error, approximately 0.004, as can be seen in Fig. 5.
Fig. 5. The evolution of the validation error according to the epochs
The parameters of the NN that models the transmission channel are defined by the following real values:

\[
W_1^a = \begin{bmatrix}
0.605573132085195 & 1.043076571416076 & 1.108755955225077 \\
0.628508549839143 & 2.014505496788215 & 3.323570077934050 \\
2.144591611009496 & 2.132153709371059 & 2.055727621996015 \\
2.131783375310913 & 1.883740694499078 & 1.829471468920521 \\
0.540145126407498 & 0.763819901542837 & 0.549603092399247
\end{bmatrix}
\]

\[
W_1^b = \begin{bmatrix}
1.574459777384680 & 0.446587548466814 & 1.952992283280367 \\
2.046922266632343 & 0.524196973244415 & 0.520143977801533 \\
0.762296937054357 & 0.898599041277833 & 0.903709949628582 \\
0.788445526240942 & 3.242166253912392 & 2.075865428672114 \\
1.513928774902438 & 0.221324884053300 & 0.584217648507261
\end{bmatrix}
\]

\[
B_1 = \begin{bmatrix}
2.076942875723168 \\ 8.487857178619208 \\ 0.879271142148328 \\ 2.021358863982187 \\ 3.628867731019660
\end{bmatrix},\qquad
W_2 = \begin{bmatrix}
1.251866063919914 \\ 5.372248805558441 \\ 1.637567928695456 \\ 0.232687841350326 \\ 3.543973020585332
\end{bmatrix}^{T}
\]

\[
b_2 = 2.1669713960000671626
\]

The response of the NN to the stimulus signal shown in Fig. 2, as well as the error between the experimental response (shown in Fig. 3) and the response obtained by simulating the NN in Matlab-Simulink, are presented in Fig. 6.
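Outside Simulink, the identified model can be simulated with a few lines of code. The sketch below assumes a NARX arrangement in which W1a weights the three delayed outputs and W1b the three delayed inputs, and that the signals are normalized for the tansig output layer; both points, the helper name simulate_channel, and the random placeholder weights are assumptions, not statements from the paper.

```python
import numpy as np

def logsig(x):
    return 1.0 / (1.0 + np.exp(-x))

def simulate_channel(u, W1a, W1b, b1, W2, b2):
    """Free-run simulation of the 2-layer NN (logsig hidden layer, tansig output layer)."""
    y = np.zeros(len(u))
    for k in range(3, len(u)):
        x_y = y[k - 3:k][::-1]        # y(k-1), y(k-2), y(k-3)
        x_u = u[k - 3:k][::-1]        # u(k-1), u(k-2), u(k-3)
        h = logsig(W1a @ x_y + W1b @ x_u + b1)
        y[k] = np.tanh(W2 @ h + b2)   # tansig output layer
    return y

# Placeholder shapes only; plug in the W1a, W1b, B1, W2, b2 values listed above.
rng = np.random.default_rng(0)
W1a, W1b = rng.normal(size=(5, 3)), rng.normal(size=(5, 3))
b1, W2, b2 = rng.normal(size=5), rng.normal(size=5), 0.0
u = rng.uniform(-1, 1, size=1000)     # stands in for the normalized stimulus of Fig. 2
y_hat = simulate_channel(u, W1a, W1b, b1, W2, b2)
```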
Fig. 6. The evolution in time of the output (red: the output acquired experimentally; blue: the simulated output)
According to the graphs presented above, it is observed that the error varies between ±0.2 V; its histogram is shown in Fig. 7.
Fig. 7. Histogram of errors
According to Fig. 7, in most of the situations (learning, validation, testing), the error lies between 0.007 and 0.01.
5 Conclusions
In this article, the VLC communication channel for an underground mine is mathematically modeled through a 2-layer MLP NN. The first (hidden) layer has 5 neurons, and the OL has only 1 neuron. A comparative analysis regarding the total determination coefficient (R) led to the conclusion that the activation functions recommended to be used within the MLP NN are the sigmoid activation function (logsig) for the HL and the symmetric sigmoid activation function (tansig) for the OL.
Following the tests, it was observed that the performance of the mathematical model of the communication channel, based on an MLP NN, is influenced by the choice of the activation function in the OL. It has been observed that the symmetric-type activation functions in the OL (purelin, tansig and elliotsig) cause the MSE error to be approximately 0.004, regardless of the activation functions used in the HL. The mathematical model of the communication channel based on an MLP NN has been experimentally validated, the modeling errors being between ±0.2 V. Our work aims to test the current prototype in underground optical channel conditions in order to obtain consistent underground VLC wireless links of type M2M, M2I and/or I2M for reliable monitoring and tracking systems that improve both workers' security and the company's productivity in underground mining. This is an important basis for future work that aims to improve the key characteristics of the developed prototype, taking into consideration both the AOPs and IOPs of light close to the working environment, where polluted air (filled with tiny particles of dust, rock and/or coal) occurs.
References
1. Ren, P., Qian, J.: A power-efficient clustering protocol for coal mine face monitoring with wireless sensor networks under channel fading conditions. Sensors 16(6), 1–21 (2016)
2. Wang, J., Al-Kinani, A., Sun, J., Zhang, W., Wang, C.: A path loss channel model for visible light communications in underground mines. In: IEEE/CIC International Conference on Communications in China, ICCC, Qingdao, pp. 1–5 (2017)
3. Riurean, S., Olar, M., Ionică, A., Pellegrini, L.: Visible light communication and augmented reality for underground positioning system. In: 9th International Symposium, SESAM (2019)
4. Marcu, A.E., Dobre, R.A., Vlădescu, M.: Flicker free visible light communication using low frame rate camera. In: International Symposium on Fundamentals of Electrical Engineering (ISFEE), Bucharest, Romania, pp. 1–4 (2018)
5. Leba, M., Riurean, S., Ionica, A.: Li-Fi - the path to a new way of communication. In: Conferência Ibérica de Sistemas e Tecnologias de Informação, CISTI 2017 12ª. IEEE Xplore Digital Library (2017)
6. Riurean, S., Leba, M., Ionica, A.: Underground positioning and monitoring system based on visible light wireless communication technology. Romanian Patent Number: RO133189-A0. https://osim.ro/wp-content/uploads/Publicatii-OSIM/BOPI-Inventii/2019/bopi_inv_03_2019.pdf. Accessed 11 Nov 2016
7. Al-Kinani, A., Wang, C.X., Haas, H., Yang, Y.: A geometry-based multiple bounce model for visible light communication channels. In: Proceedings IEEE, IWCMC 2016, Cyprus, pp. 31–37 (2016)
8. Wang, J., Al-Kinani, A., Zhang, W., Wang, C.: A new VLC channel model for underground mining environments. In: 13th International Wireless Communications and Mobile Computing Conference, IWCMC, Valencia, pp. 2134–2139 (2017)
9. Yesilkaya, A., Karatalay, O., Ogrenci, A.S., Panayirci, E.: Channel estimation for visible light communications using neural networks. In: 2016 International Joint Conference on Neural Networks, IJCNN, Vancouver, BC, pp. 320–325 (2016)
10. Wu, G., Zhang, J.: Demonstration of a visible light communication system for underground mining applications. In: International Conference on Information Engineering and Communications Technology (2016)
11. Dimitrov, S., Haas, H.: Principles of LED Light Communications: Towards Networked Li-Fi. Cambridge University Press, Cambridge (2015)
12. Riurean, S., Leba, M., Ionica, A., Stoicuta, O., Buioca, C.: Visible light wireless data communication in industrial environments. In: IOP Conference Series: Materials Science and Engineering, vol. 572 (2019)
13. Dumitrache, I., Constantin, N., Dragoicea, M.: Neural Networks System Identification and Control. Matrix ROM Press, Bucuresti (1999)
14. Hudson, B.M., Hagan, M.T., Demuth, H.B.: Deep Learning Toolbox User's Guide. MathWorks, Natick (2019)
15. Andrei, T., Bourbonnais, R.: Econometrics. Economica Press, Bucuresti (2008)
mpCUBIC: A CUBIC-like Congestion Control Algorithm for Multipath TCP
Toshihiko Kato, Shiho Haruyama, Ryo Yamamoto, and Satoshi Ohzahata
University of Electro-Communications, Chofu, Tokyo 182-8585, Japan
{kato,hshiho0824,ryo_yamamoto,ohzahata}@net.lab.uec.ac.jp
Abstract. In Multipath TCP, the congestion control is realized by individual subflows (conventional TCP connections). However, it is required to avoid increasing the congestion window too fast as a result of subflows increasing their own congestion windows independently. So, a coupled congestion window increase scheme, called Linked Increase Adaptation, is adopted as the standard congestion control algorithm for the subflows comprising an MPTCP connection. But this algorithm supposes that TCP connections use AIMD-based congestion control, and if high speed algorithms such as CUBIC TCP are used, the throughput of MPTCP connections might be decreased. This paper proposes a new high speed MPTCP congestion control scheme, mpCUBIC, based on CUBIC TCP.
Keywords: MPTCP · CUBIC TCP · Congestion control · Linked Increase Adaptation
1 Introduction
Recent mobile terminals are equipped with multiple interfaces. For example, most smart phones have interfaces for 4G Long Term Evolution (LTE) and Wireless LAN (WLAN). However, the conventional Transmission Control Protocol (TCP) establishes a connection between a single IP address at either end, and so it cannot handle multiple interfaces at the same time. In order to utilize the multiple interface configuration, Multipath TCP (MPTCP) [1], which is an extension of TCP, has been introduced in several operating systems, such as Linux, Apple OS/iOS [2] and Android [3]. TCP applications are provided with multiple byte streams through different interfaces by use of MPTCP as if they were working over conventional TCP. MPTCP is defined in three Request for Comments (RFC) documents standardized by the Internet Engineering Task Force. RFC 6182 [4] outlines architecture guidelines. RFC 6824 [5] presents the details of extensions to support multipath operation, including the maintenance of an MPTCP connection and subflows (TCP connections associated with an MPTCP connection), and the data transfer over an MPTCP connection. RFC 6356 [6] presents a congestion control algorithm that couples the congestion control algorithms running on different subflows.
One significant point about MPTCP congestion control is that, even in MPTCP, individual subflows perform their own control. RFC 6356 requires that an MPTCP data stream does not occupy too large a share of resources compared with other (single) TCP data streams sharing a congested link. For this purpose, it defines an algorithm called Linked Increase Adaptation (LIA), which couples and suppresses the congestion window sizes of individual subflows. Besides, more aggressive algorithms, such as the Opportunistic Linked-Increases Algorithm (OLIA) [7] and Balanced Linked Adaptation (BALIA) [8], have been proposed. However, all of those algorithms are based on the Additive Increase and Multiplicative Decrease (AIMD) scheme like TCP Reno [9]. That is, the increase of the congestion window at receiving a new ACK segment is in the order of 1/(congestion window size). On the other hand, current operating systems use high speed congestion control algorithms, such as CUBIC TCP [10] and Compound TCP [11]. These algorithms increase the congestion window more aggressively than TCP Reno. So, it is possible that the throughput of LIA and other MPTCP congestion control algorithms is suppressed when they coexist with them. We presented the results of a performance evaluation in our previous papers [12, 13]. We evaluated the performance when an MPTCP connection with LIA, OLIA or BALIA and a single path TCP with TCP Reno/CUBIC TCP share a bottleneck link. The results showed that the throughput of the MPTCP connection is significantly lower than that of CUBIC TCP, and in most cases lower than even TCP Reno. Based on these results, we propose in this paper a new MPTCP congestion control algorithm which is comparable with CUBIC TCP used by single path TCPs. We call this algorithm mpCUBIC (multipath CUBIC), which requires a similar amount of resources as a single path CUBIC TCP, at the MPTCP connection level. This paper proposes the details of mpCUBIC, describes how to implement it over the Linux operating system, and shows the results of a performance evaluation. The rest of this paper is organized as follows. Section 2 explains the overview of MPTCP congestion control algorithms. Section 3 describes the proposal of mpCUBIC. Section 4 shows the implementation of mpCUBIC by modifying the CUBIC source program in the Linux operating system. Section 5 shows the results of the performance evaluation in an experimental network. In the end, Sect. 6 concludes this paper.
2 Overview of MPTCP Congestion Control
The MPTCP module is located on top of TCP. MPTCP is designed so that the conventional applications do not need to care about the existence of MPTCP. MPTCP establishes an MPTCP connection associated with two or more regular TCP connections called subflows. The management and data transfer over an MPTCP connection is done by newly introduced TCP options for the MPTCP operations. An MPTCP implementation will take one input data stream from an application and split it into one or more subflows, with sufficient control information to allow it to be reassembled and delivered to the receiver side application reliably and in order. An MPTCP connection maintains a data sequence number independent of the subflow level sequence numbers. The data sequence number and data ACK number used
in the MPTCP level are contained in a Data Sequence Signal (DSS) option independently from the TCP sequence number and ACK number in a TCP header. Although the data sequencing, the receipt acknowledgement, and the flow control are performed in the MPTCP level, the congestion control, specifically the congestion window management, is performed only in the subflow level. That is, an MPTCP connection does not have its own congestion window size. Under this condition, if subflows perform their congestion control independently, the throughput of an MPTCP connection will be larger than that of single TCP connections sharing a bottleneck link. RFC 6356 decides that such a method is unfair to conventional TCP. RFC 6356 introduces the following three requirements for the congestion control of an MPTCP connection.
• Goal 1 (Improve throughput): An MPTCP flow should perform at least as well as a single TCP flow would on the best of the paths available to it.
• Goal 2 (Do no harm): All MPTCP subflows on one link should not take more capacity than a single TCP flow would get on this link.
• Goal 3 (Balance congestion): An MPTCP connection should use individual subflows dependent on the congestion on the path.
In order to satisfy these three goals, RFC 6356 proposes an algorithm that couples the additive increase function of the subflows, and uses unmodified decreasing behavior in case of a packet loss. This algorithm is called LIA and is summarized in the following way. Let $cwnd_i$ and $cwnd_{total}$ be the congestion window size on subflow $i$ and the sum of the congestion window sizes of all subflows in an MPTCP connection, respectively. Here, they are maintained in packets. Let $rtt_i$ be the Round-Trip Time (RTT) on subflow $i$. For each ACK received on subflow $i$, $cwnd_i$ is increased by

\[
\min\left(\frac{\alpha}{cwnd_{total}},\ \frac{1}{cwnd_i}\right). \quad (1)
\]

The first argument of the min function is designed to satisfy the Goal 2 requirement. Here, $\alpha$ is defined by

\[
\alpha = cwnd_{total}\cdot\frac{\max_k \left(cwnd_k / rtt_k^2\right)}{\left(\sum_k cwnd_k / rtt_k\right)^2}. \quad (2)
\]

By substituting (2) into (1), we obtain the following equation.

\[
\min\left(\frac{\max_k \left(cwnd_k / rtt_k^2\right)}{\left(\sum_k cwnd_k / rtt_k\right)^2},\ \frac{1}{cwnd_i}\right) \quad (3)
\]
As we mentioned in our previous paper, Eq. (2) can be derived from the assumption that the increases and decreases of the congestion window sizes are balanced and that the virtual single TCP connections corresponding to the subflows have the same throughput as the MPTCP connection [12]. It should be noted that we assume the AIMD scheme in Eqs. (1) and (2). More specifically, we assume that the increase is 1/(congestion window size) for each ACK segment and the decrease parameter is 1/2, which is the specification of TCP Reno. That is, LIA supposes that MPTCP subflows and coexisting single path TCP flows follow TCP Reno. In the case that a high speed congestion control is adopted, the increase per ACK segment becomes larger and the decrease parameter becomes smaller.
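For illustration, the per-ACK LIA increase of Eq. (3) translates directly into code. The sketch below uses arbitrary window and RTT values and is not part of any MPTCP implementation; the function name is invented.

```python
def lia_increase(cwnds, rtts, i):
    """Per-ACK congestion window increase (in packets) for subflow i under LIA, Eq. (3)."""
    best = max(c / (r ** 2) for c, r in zip(cwnds, rtts))       # max_k cwnd_k / rtt_k^2
    total_rate = sum(c / r for c, r in zip(cwnds, rtts))        # sum_k cwnd_k / rtt_k
    return min(best / (total_rate ** 2), 1.0 / cwnds[i])

# Two subflows: cwnd in packets, RTT in seconds (illustrative values only)
print(lia_increase(cwnds=[20, 40], rtts=[0.05, 0.10], i=0))
```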
3 Proposal of mpCUBIC
3.1 Overview of CUBIC TCP
CUBIC TCP determines the value of the congestion window size by the following cubic function of the time since the latest congestion detection (last packet loss).

\[
W_{cubic}(t) = C\,(t - K)^3 + W_{max} \quad (4)
\]
Here, C is the CUBIC parameter, t is the elapsed time from the last congestion window size reduction, and W_max is the congestion window size where the latest loss event occurred. K is the time period that the function (4) takes to increase W_cubic to W_max, and is given by the following equation.

\[
K = \sqrt[3]{\frac{W_{max}\,\beta}{C}} \quad (5)
\]
where β is the window decrease constant at a fast retransmit event. As a result, the congestion window changes as shown in Fig. 1 [14]. In this figure, part 1 corresponds to a slow start phase. Part 2 uses only the concave profile of the cubic function, from a packet loss until the window size becomes W_max. Part 3 uses both the concave and convex profiles, where the convex profile contributes a rapid window increase after the plateau around W_max. Figure 2 shows a balanced situation in CUBIC TCP. The horizontal axis in this figure is time and the vertical axis is the congestion window size. When the window reaches W_max, it is reduced due to a packet loss. Then it increases according to the concave profile of the cubic function until it reaches W_max again, when the window drops again. The duration between window drops is K. Since the window goes to W_max(1 − β) at its reduction, the following equation is obtained by substituting t = 0 in (4).

\[
W_{max}(1 - \beta) = C(-K)^3 + W_{max} \;\Rightarrow\; W_{max}\,\beta = C K^3
\]

This derives (5).
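A short sketch of Eqs. (4) and (5) follows. The constants C = 0.4 and β = 0.2 are commonly cited CUBIC values and are assumptions here, not parameters stated in the text.

```python
def cubic_cwnd(t, w_max, C=0.4, beta=0.2):
    """CUBIC window (Eq. 4): W(t) = C (t - K)^3 + W_max, with K from Eq. (5)."""
    K = (w_max * beta / C) ** (1.0 / 3.0)
    return C * (t - K) ** 3 + w_max

# Window growth after a loss at W_max = 100 packets (illustrative values)
for t in [0.0, 1.0, 2.0, 3.0]:
    print(t, round(cubic_cwnd(t, w_max=100), 1))
```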
3.2 Design of mpCUBIC
In this paper, we show the design of mpCUBIC in the situation where two subflows are used in an MPTCP connection. When two subflows, both of which use CUBIC TCP, go through one bottleneck link and there are some packet losses due to the congestion, a balanced behavior of the congestion windows of the two subflows is given by Fig. 3. The red line and the black line show the time variations of the two CUBIC TCP flows. The features of the behavior are summarized as follows.
Fig. 1. Cwnd behavior in CUBIC TCP [14].
Fig. 3. Balanced cwnd behavior for two CUBIC TCP flows.
Fig. 2. Balanced cwnd behavior in CUBIC TCP.
Fig. 4. Congestion window for two mpCUBIC subflows.
• Both of the congestion windows experience packet losses at the same value, W_max.
• This behavior is repeated with the same cycle, K.
• The cyclic behaviors of the two subflows are shifted by K/2, that is, half of the cycle of the original CUBIC TCP.
As a result, the total congestion window size for an MPTCP connection is given by the blue line in the figure. That is, the total window size also follows a cubic function and its cycle is half that of one subflow. Therefore, we can deduce the following two points in order to make the total congestion window size of an MPTCP connection with two subflows comparable with a single path CUBIC TCP connection.
• The period between two packet losses will be 2K instead of K.
• The maximum window size just before a packet loss needs to be set to W_max in the total congestion window size for the MPTCP connection.
So, we conclude that the congestion window size of one subflow in mpCUBIC is given by the following equation.

\[
W(t) = \frac{1}{D}\left( C'\left(\frac{t}{2} - K\right)^3 + W_{max} \right) \quad (6)
\]

Here, C' and D are values defined in mpCUBIC, K and W_max are the values used in CUBIC TCP, and t is the time period from the latest packet loss. We focus on one cycle of congestion window size increase for one mpCUBIC subflow together with the other subflow, and obtain the graph shown in Fig. 4. The total window size for one MPTCP connection (W_total(t)) is given by the following equations.

\[
W_{total}(t) = \frac{1}{D}\left( C'\left(\frac{t}{2} - K\right)^3 + C'\left(\frac{t+K}{2} - K\right)^3 + 2W_{max} \right) \quad \text{for } 0 < t < K \quad (7)
\]

\[
W_{total}(t) = \frac{1}{D}\left( C'\left(\frac{t}{2} - K\right)^3 + C'\left(\frac{t-K}{2} - K\right)^3 + 2W_{max} \right) \quad \text{for } K < t < 2K \quad (8)
\]
For these two equations, we apply the following requirements.
• At t = 0, W_total is equal to (1 − β) times the value just before the packet loss. That is,

\[
W_{total}(-0)(1 - \beta) = W_{total}(+0). \quad (9)
\]
• At t = 2K, W_total is equal to the maximum value of the conventional CUBIC TCP, W_max. That is,

\[
W_{total}(2K - 0) = W_{max}. \quad (10)
\]
From (9), we obtain

\[
\left( C'\left(-\frac{K}{2}\right)^3 + 2W_{max} \right)(1 - \beta) = C'(-K)^3 + C'\left(-\frac{K}{2}\right)^3 + 2W_{max}.
\]

By substituting (5), C' is obtained by the following equation.

\[
C' = \frac{2C}{1 + \frac{1}{8}\beta} \quad (11)
\]

From (10), we obtain

\[
\frac{1}{D}\left( C'\left(-\frac{K}{2}\right)^3 + 2W_{max} \right) = W_{max}.
\]
By substituting (5), D is obtained by the following equation.

\[
D = 2 - \frac{\beta}{4\left(1 + \frac{1}{8}\beta\right)} \quad (12)
\]
Equations (6), (11) and (12) produce the following equation for the congestion window size of an mpCUBIC subflow.

\[
W(t) = \frac{1}{2 - \frac{\beta}{4\left(1 + \frac{1}{8}\beta\right)}}\left( \frac{2C}{1 + \frac{1}{8}\beta}\left(\frac{t}{2} - K\right)^3 + W_{max} \right) \quad (13)
\]
Here, as described above, C is the CUBIC parameter, t is the elapsed time from the last congestion window size reduction, Wmax is the congestion window size where the latest loss event occurred, and K is a time period specified by Eq. (5).
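The design above can be checked numerically with a short sketch that evaluates Eq. (13) through C' (Eq. 11) and D (Eq. 12). It is an illustration of the derivation only, not the kernel implementation described in Sect. 4, and C = 0.4 is an assumed CUBIC constant.

```python
def mpcubic_cwnd(t, w_max, C=0.4, beta=0.2):
    """Per-subflow mpCUBIC window (Eq. 13) for the two-subflow case."""
    K = (w_max * beta / C) ** (1.0 / 3.0)        # Eq. (5), from standard CUBIC
    C_prime = 2.0 * C / (1.0 + beta / 8.0)       # Eq. (11)
    D = 2.0 - beta / (4.0 * (1.0 + beta / 8.0))  # Eq. (12)
    return (C_prime * (t / 2.0 - K) ** 3 + w_max) / D

# One subflow peaks every 2K; with the second subflow shifted by K, the total just
# before a loss equals W_max by construction (requirement (10)).
w_max = 100.0
K = (w_max * 0.2 / 0.4) ** (1.0 / 3.0)
print(mpcubic_cwnd(2 * K, w_max) + mpcubic_cwnd(K, w_max))   # prints 100.0
```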
4 Implementation
4.1 CUBIC TCP Software in Linux Operating System
In the Linux operating system, the TCP congestion control algorithms are implemented in the form of kernel modules. TCP Reno (NewReno, specifically) is implemented in the file tcp_cong.c, which is included in the kernel software itself. The other algorithms are implemented in a tcp_[name of algorithm].c file, for example, tcp_cubic.c for CUBIC TCP. These files are compiled independently of the kernel itself and prepared as kernel modules. The congestion control algorithm used in the system is selected with the sysctl command by setting the net.ipv4.tcp_congestion_control parameter. A congestion control program module has the following common structure [15].
• Each congestion control program defines the predefined functions, such as how to handle the first ACK segment in a TCP connection (init()), how to process a new ACK segment (pkts_acked()), and congestion avoidance processing (cong_avoid()).
• Each program registers the pointers of the predefined functions described above in the specific data structure defined as struct tcp_congestion_ops. In this structure, the name of the congestion control algorithm, such as cubic, is specified. This name is used in the sysctl command.
• Each program calls the tcp_register_congestion_control function when it is registered as a kernel module.
In the tcp_cubic.c source file, the function bictcp_cong_avoid() is registered in the tcp_congestion_ops data structure as the congestion avoidance function; it calls bictcp_update(), which calculates the congestion window size according to Eq. (4).
4.2 Implementation of mpCUBIC
In order to implement mpCUBIC, we modified the tcp_cubic.c file as follows.
• In the tcp_congestion_ops data structure, we defined a new name, "mp_cubic", for our new congestion control algorithm.
• In the function bictcp_cong_avoid(), the calculation of (t − K)^3 is replaced by (t/2 − K)^3, C is replaced by 2C, and the resulting congestion window size is multiplied by 1/D. Here, we used 0.2 as the value of β.
5 Performance Evaluation
5.1 Network Configuration
Figure 5 shows the network configuration used in our experiment. An MPTCP data sender and a TCP (single path TCP) sender are connected to a 100 Mbps Ethernet hub. The MPTCP sender has two Ethernet interfaces. A data receiver for MPTCP and TCP is connected to the hub through a bridge, which limits the data transmission rate to 25 Mbps. All nodes run the Linux operating system, distribution Ubuntu 16.04. Both the MPTCP sender and receiver use the MPTCP software version 0.94 stable, the newest version we could obtain. The IP addresses assigned to the network interfaces are shown in Fig. 5. It should be noted that the LAN used in the experiment has subnet address 192.168.0.0/24, while the second Ethernet interface in the MPTCP sender uses another subnet, 192.168.1.0/24. On the MPTCP sender side, the routing table needs to be specified for the individual interfaces by using the ip command. On the receiver side, a route entry to subnet 192.168.1.0/24 needs to be specified explicitly. In the experiment, one MPTCP connection between the MPTCP sender and the receiver and one TCP connection between the TCP sender and the receiver are evaluated. In order to emulate a wide area Internet communication, a 100 ms delay is inserted at the receiver. The bridge limits the data transmission rate from the senders to the receiver to 25 Mbps; this is where actual congestion occurs. As for the congestion control algorithm, the MPTCP sender uses mpCUBIC, CUBIC, or LIA, and the TCP sender uses CUBIC TCP. The single path TCP uses CUBIC TCP in all cases.
Fig. 5. Network configuration for performance evaluation.
In the actual measurement runs, iperf [16] is used for data transfer, with a communication duration of 120 s in one measurement run. During the measurement, we obtained communication logs with Wireshark [17], and obtained TCP internal parameter values, such as the congestion window size, with tcpprobe [18].
5.2 Evaluation Results
Table 1 gives the results of the measured throughput. In the case of mpCUBIC, the MPTCP connection and the single path TCP (SPTCP) gave almost the same throughput in a 120 s iperf data transfer. On the other hand, the case of CUBIC, that is, when the two subflows and the one SPTCP flow all use CUBIC, provided higher throughput for MPTCP than for SPTCP. In the case that MPTCP uses LIA, the throughput of MPTCP is very small compared with CUBIC SPTCP. Figures 6, 7 and 8 show the time variation of the sequence number and congestion window size for the cases that MPTCP uses mpCUBIC, CUBIC, and LIA, respectively. The time variation of the sequence number corresponds to the time variation of the number of transferred bytes in the two subflows and the one SPTCP flow. The congestion window size is measured at each ACK reception by use of tcpprobe. In mpCUBIC, the time variations of the sequence number in the two subflows are quite similar and the value is around half of the sequence number in SPTCP. The time variations of the congestion window size in the two subflows are similar to each other, and are around half of that of CUBIC SPTCP. These behaviors are exactly what we expected.

Table 1. Throughput results (Mbps).

Scheme   mpCUBIC  CUBIC  LIA
MPTCP    11.7     14.1   3.78
SPTCP    12.0     9.61   20.0
(a) sequence number vs. time
(b) cwnd vs. time
Fig. 6. Time variation of sequence number and congestion window size in mpCUBIC.
In the case that two MPTCP subflows follow CUBIC TCP, the increases of sequence number in two subflows and one SPTCP are comparable. As a result, the throughput of MPTCP, which is the sum of two subflows’ transferred bytes, is larger
than that of one SPTCP. Specifically, the sequence number in SPTCP is larger than those in the two subflows. This is because the congestion window size in SPTCP during the first 30 s is larger than in the MPTCP subflows. The reason is that, in SPTCP, the congestion window size becomes large in the slow start phase just after the connection establishment. Between 30 s and 120 s, however, the congestion window sizes in the two subflows and the one SPTCP are comparable. This is the result of the two subflows performing congestion control independently, and this control gives too much of an advantage to MPTCP.
(a) sequence number vs. time
(b) cwnd vs. time
Fig. 7. Time variation of sequence number and congestion window size when MPTCP uses CUBIC TCP.
(a) sequence number vs. time
(b) cwnd vs. time
Fig. 8. Time variation of sequence number and congestion window size when MPTCP uses LIA.
In the case that MPTCP uses LIA, the increase of the sequence number and congestion window size is very small for MPTCP, and the CUBIC-based SPTCP eagerly occupies the bandwidth of the bottleneck link.
6 Conclusions
This paper proposed a new congestion control algorithm for MPTCP, mpCUBIC, which provides similar performance to CUBIC TCP used by single path TCP connections. The conventional congestion control algorithms for MPTCP, LIA, OLIA, and BALIA, are based on the AIMD scheme, and so they are weaker than high speed
TCP such as CUBIC TCP. Therefore, we modified the CUBIC TCP algorithm so as to fit the multipath environment. For the case that two subflows are used, the cycle of packet losses is doubled and the maximum window for an MPTCP connection is equal to that for a single path TCP. We implemented mpCUBIC over the Linux operating system and evaluated the performance over an in-house testbed network. The performance evaluation results showed that mpCUBIC provides a similar throughput to one CUBIC TCP connection when they share the same bottleneck link. On the other hand, a CUBIC-based MPTCP connection requires more resources than a single path CUBIC connection, and an LIA MPTCP connection provides poor throughput compared with a single path CUBIC connection.
References
1. Paasch, C., Bonaventure, O.: Multipath TCP. Commun. ACM 57(4), 51–57 (2014)
2. AppleInsider Staff: Apple found to be using advanced Multipath TCP networking in iOS 7. https://appleinsider.com/articles/13/09/20/apple-found-to-be-using-advanced-multipath-tcp-networking-in-ios-7. Accessed 5 Sep 2019
3. icteam: MultiPath TCP - Linux Kernel implementation, Users :: Android. https://multipath-tcp.org/pmwiki.php/Users/Android. Accessed 5 Sep 2019
4. Ford, A., Raiciu, C., Handley, M., Barre, S., Iyengar, J.: Architectural Guidelines for Multipath TCP Development. IETF RFC 6182 (2011)
5. Ford, A., Raiciu, C., Handley, M., Bonaventure, O.: TCP Extensions for Multipath Operation with Multiple Addresses. IETF RFC 6824 (2013)
6. Raiciu, C., Handley, M., Wischik, D.: Coupled Congestion Control for Multipath Transport Protocols. IETF RFC 6356 (2011)
7. Khalili, R., Gast, N., Popovic, M., Boudec, J.: MPTCP is not Pareto-optimal: performance issues and a possible solution. IEEE/ACM Trans. Netw. 21(5), 1651–1665 (2013)
8. Peng, Q., Valid, A., Hwang, J., Low, S.: Multipath TCP: analysis, design and implementation. IEEE/ACM Trans. Netw. 24(1), 596–609 (2016)
9. Floyd, S., Henderson, T., Gurtov, A.: The NewReno Modification to TCP's Fast Recovery Algorithm. IETF RFC 3782 (2004)
10. Ha, S., Rhee, I., Xu, L.: CUBIC: a new TCP-friendly high-speed TCP variant. ACM SIGOPS Oper. Syst. Rev. 42(5), 64–74 (2008)
11. Tan, K., Song, J., Zhang, Q., Sridharan, M.: A compound TCP approach for high-speed and long distance networks. In: IEEE INFOCOM 2006, pp. 1–12. IEEE, Barcelona (2006)
12. Kato, T., Diwakar, A., Yamamoto, R., Ohzahata, S., Suzuki, N.: Performance evaluation of MultiPath TCP congestion control. In: 18th International Conference on Networks, ICN 2019, pp. 19–24. IARIA, Valencia (2019)
13. Kato, T., Diwakar, A., Yamamoto, R., Ohzahata, S., Suzuki, N.: Experimental analysis of MPTCP congestion control algorithms: LIA, OLIA and BALIA. In: 8th International Conference on Theory and Practice in Modern Computing (TPMC 2019), pp. 135–142. IADIS, Porto (2019)
14. Afanasyev, A., et al.: Host-to-host congestion control for TCP. IEEE Commun. Surv. Tutor. 12(3), 304–342 (2010)
15. Arianfar, S.: TCP's congestion control implementation in Linux kernel. https://wiki.aalto.fi/download/attachments/69901948/TCP-CongestionControlFinal.pdf. Accessed 5 Nov 2019
16. iperf. http://iperf.sourceforge.net/. Accessed 5 Nov 2019
17. Wireshark. https://www.wireshark.org/. Accessed 5 Nov 2019
18. Linux Foundation: tcpprobe. http://www.linuxfoundation.org/collaborate/workgroups/networking/tcpprobe. Accessed 5 Nov 2019
Context-Aware Mobile Applications in Fog Infrastructure: A Literature Review
Celestino Barros1, Vítor Rocio2, André Sousa3, and Hugo Paredes4
1 Universidade de Cabo Verde, Praia, Cabo Verde [email protected]
2 Universidade Aberta and INESC TEC, Lisbon, Portugal [email protected]
3 Critical TechWorks, Porto, Portugal [email protected]
4 INESC TEC and Universidade de Trás-os-Montes e Alto Douro, Porto, Portugal [email protected]
Abstract. Today's cloud computing techniques are becoming unsustainable for real time applications, as very low latency is required with billions of connected devices. New paradigms are arising; the one that offers an integrated solution for extending cloud resources to the edge of the network and addresses current cloud issues is Fog Computing. Performing Fog Computing brings a set of challenges, such as: provisioning edge nodes to perform task volumes offloaded from the Cloud; placing task volumes on edge nodes; resource management on edge nodes; the need for a new programming model; programming, resource management, data consistency and service discovery challenges; privacy and security; and improving the quality of service (QoS) and user experience (QoE). This paper aims at introducing the Fog Computing concept and it presents a literature review on the way it is applied: context-sensitive applications and context-sensitive mobile service platforms. The result of the study is presented as the current research challenges for context aware mobile applications in Fog Computing infrastructure.
Keywords: Context · Context aware · Mobile applications · Mobile service platforms · Fog computing
1 Introduction
Nowadays, in some cases/applications, smart mobile devices are an alternative to personal computers, enabling a full set of features in a single device, including running applications, communication, entertainment and games. Moreover, a key feature of smart mobile devices is the ability to identify and share different types of context information at user, device, application and network level. The intrinsic problems of these devices, such as reduced processing capacity, resource scarcity, reduced battery autonomy and low connectivity, among others, force analysts and developers to create
services that extend the capabilities of applications running on these devices through the use of cloud-hosted services [1]. However, despite its advantages, in some situations the use of the Cloud architecture is not beneficial, since it is centralized and consequently the processing is performed in concentrated data centres, in order to optimize energy and communication costs, among others. In environments where devices are close together, it is not efficient to send data to distant processing centres and wait for commands from a remote centre for individual actuator devices. In this scenario, delays and overloads make solutions that require low latency and real-time availability, among others, impractical. Different techniques that minimize cloud execution through local processing in peripheral elements have been proposed, allowing limitations such as low latency, mobility and location to be addressed. One of these techniques is the use of the Fog architecture, an extension of the Cloud [2] that aims at reducing the response time and energy consumption, among other limitations. Access to applications on Fog nodes varies based on several contexts: execution time, device and application context, including device battery level, network signal strength and QoS requirements. This paper aims at presenting the concept of Fog Computing and a literature review on context aware applications for the Fog Computing infrastructure, including a discussion of the importance of using the Fog architecture in running context-sensitive mobile applications. Users' context is very dynamic in mobile computing. When using applications in this environment, the behaviour of an application must be customized to the user's current situation. To promote effective use of context, several authors provide definitions and categorizations of context and context-sensitive computing [3]. The concept of context recognition and important aspects of context-aware mobile computing and context-sensitive mobile service platforms are discussed. This study presents a selected set of state-of-the-art context-sensitive mobile service platforms. The discussion includes the analysis of some mobile computing research challenges in Fog Computing, among which edge node provisioning to perform off-cloud task volumes, placing task volumes on edge nodes, resource management on edge nodes, quality of service (QoS) and user experience (QoE) stand out. Also, this research reveals the need for a new programming model, with challenges concerning resource management, data consistency, service discovery, privacy and security. The paper is organized as follows: Sect. 2 introduces the background of Fog Computing. Section 3 presents the literature review, which includes the concepts of context and context aware services and applications; fog computing and context aware services and applications; and platforms for context aware services and applications. The discussion is presented in Sect. 4, along with the research challenges of mobile services and applications development in the Fog Computing paradigm. The last part of the paper contains some final remarks in Sect. 5.
2 Background
Mobile computing offers users various utilities, enables portability and supports applications of multiple interests. Mobile devices also face a number of internal limitations, such as shortage of resources, reduced battery autonomy, and low connectivity and processing capacity.
In recent years, research has aimed at solving some of these limitations. As a result, sensor-intensive applications and massive processing requirements are increasingly used in implementations of such applications. Unlike a local server or personal computer, Cloud computing can be defined as the use of a set of remote servers hosted on the Internet to store, manage and process data. The concept of cloud computing and data offloading has been used to address some inherent limitations of mobile computing by allowing the use of resources other than the mobile devices themselves to host mobile application execution [1]. This infrastructure that allows hosting, storing, processing and running applications off the mobile device is called a mobile cloud. By exploiting mobile cloud computing and storage capabilities, intensive applications can run on low-resource mobile devices. Fog Computing is a new paradigm that addresses cloud limitations by providing services at the network edges [5]. Vaquero and Rodero-Merino [6] highlight some of its features, including geographic distribution, predominance of wireless access, heterogeneity and a distributed environment. Fog Computing has grown as an integrated solution to extend Cloud resources to the edge of the network and to address the drawbacks of the classic centralized model [2]. Fog Computing allows running applications in a layered manner between devices and the cloud. Elements such as smart gateways, routers and dedicated Fog devices provide computing and storage to extend Cloud services at the network edge.
2.1 Context Aware
User expectations and the anticipation of a system's reaction are highly dependent on the situation and the environment. This knowledge allows intelligent behaviour, and predicting an action may bring advantages. Context provides this knowledge. Thus, some important aspects of the context are: location; neighborhood; nearby resources. According to Bonomi et al. [2], context can be categorized in terms of change in the execution environment, where the environment can be considered as: the computing environment (available processors, devices accessible for user input and visualization, network capacity, connectivity and computing costs); the user environment (location, gathering of people nearby and social situation); and the physical environment (lighting and noise level). Context includes the whole situation relevant to an application and its users. With regard to how to use context, the solution is to use context-aware applications and systems. A system is context aware if it uses context to provide relevant information and services to the user, where relevance depends on the user's task [3]. For example, when a context-sensitive system can detect that a user never answers phone calls while driving, it automatically proposes transferring the incoming calls to the user's voicemail. Three basic features must be implemented by any context-aware application [3], as illustrated by the sketch after the list:
• presentation of information and services: refers to functions that present contextual information to the user or use context to propose appropriate selections of actions to the user;
• automatic service execution: describes the functions that trigger a system command or reconfiguration on behalf of the user according to context changes;
• storage (and recovery) of context information: applications tag the captured data with relevant context information.
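To ground the three features, the toy sketch below revisits the driving/voicemail example; all class and attribute names are invented for illustration and are not part of the cited frameworks.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Context:
    location: str
    activity: str                               # e.g. "driving", "meeting", "idle"
    timestamp: datetime = field(default_factory=datetime.now)

class ContextAwarePhone:
    def __init__(self):
        self.history = []                        # feature 3: storage of context information

    def on_incoming_call(self, ctx: Context, caller: str) -> str:
        self.history.append(ctx)                 # store the captured context
        if ctx.activity == "driving":            # feature 2: automatic service execution
            return f"call from {caller} sent to voicemail"
        # feature 1: presentation of information and services
        return f"ringing; caller info shown with context {ctx.location}"

phone = ContextAwarePhone()
print(phone.on_incoming_call(Context("car", "driving"), "Alice"))
```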
3 Literature Review
Bazire and Brézillon [7] examine 150 context definitions in different research areas and conclude that the creation of a single definition is an arduous and probably impossible effort, since it varies with the scientific field and depends mainly on the field of application. Moreover, the authors conclude that the context is a set of constraints that influence the behaviour in relation to a particular task. Abowd et al. [3] proposed the most commonly used definition of context in all fields: "Context is any information that can be used to characterize the situation of an entity. An entity is a person, place, or object that is considered relevant to the interaction between a user and an application, including the user and applications themselves". Zimmermann et al. [8], starting from the definition provided by Abowd et al. [3], deconstructed context into five elements: (1) Individuality (entity properties and attributes); (2) Activity (all tasks the entity is involved in); (3) Location (area or place where the entity is or intends to go); (4) Time (relative to the intended action); and (5) Relations (information that the entity may establish with other entities). According to Henricksen [9], despite its generality, the definition proposed by Abowd et al. [3] has some gaps, especially at the boundary between the concepts of context, contextual model and contextual information. The author characterizes contextual information as a mechanism that enables users to perform actions in an automated and flexible manner. In addition to that, the context is considered in relation to the actions between the user and the applications. Thus, he proposes interpretations for the following concepts: context, contextual model and contextual information. Context is considered at the action level, and the context of an action is thus the set of circumstances that surround it and are of potential relevance to conclude it. At the contextual model level, the author defines it as a concrete subset of the context, which is realistically capable of being obtained from sensors, applications and users, and which can be exploited to perform an action. Henricksen [9] also defines the contextual attribute as a dataset obtained from the sensors, which belongs to the contextual model. Thus, the contextual attribute provides a sample, from a given point in time, of a subset of the context. Contextual characterization helps to understand, use and interpret different types of contexts in an efficient manner. In Fog-based research works, user and application level context have been discussed for resource and service provisioning. User context such as user characteristics, service usage history and service relinquish probability can be used for allocating resources for that user in the future [20]. For instance, users' service feedback, such as the Net Promoter Score (NPS), and user requirements [21] can also be used for service and resource provisioning purposes [22]. In other works, users' density [23], mobility [24] and network status [25] have also been considered for service provisioning. Application context can be considered as the operational requirements of different applications. Operational requirements include task processing requirements (CPU speed, storage, memory) [26–28] and networking requirements [29, 30], and they can affect resource and service provisioning.
In other works, the current task load of different applications [31, 32] has also been considered as application context.
Moreover, contextual information in Fog computing can be discussed in terms of the execution environment, nodal characteristics and application architecture, and, together with the other contexts, it can play a vital role in provisioning resources and services.
3.1 Fog Computing and Context Aware Services and Applications
In the field of mobile computing, context refers to the computing environment, user environment and physical environment, representative of the interaction between a user and an application, including the user and the applications themselves [4]. Fog Computing, on the other hand, is a virtualized platform that enables processing, communication and storage services between devices typically located at the network edge and the cloud [2]. Fog computing technology provides computation, management services, networking and storage between the end-users and the cloud data centers [17]. It supports protocols for resources to perform computing, mobility, communication, distributed analysis of data and integration with the cloud in order to provide lower latency, and it consists of a set of nodes [11]. According to Marin-Tordera et al. [18], a Fog node is the device where Fog Computing is deployed. Currently, due to the multiplication of IoT devices, large amounts of raw data are produced. The Fog architecture enables these large amounts of data to be processed at Fog nodes that are installed at the edge of the network, near the place where they are created. This provides faster responses, more security and a better user experience [19]. Mobile devices communicating with the cloud suffer from high network latency, high power consumption, congestion and delays [4]. Fog-based mobile computing addresses some of these limitations by sending tasks to Fog nodes at the edge of the network for processing and returning the result, which reduces the transmission delay and the power consumption of the mobile device. Requests that cannot be scheduled due to Fog node restrictions are sent to run in the Cloud [2].
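As a toy illustration of the offloading behaviour just described (tasks go to a nearby Fog node when it can accept them, otherwise fall back to the Cloud), consider the sketch below. Node names, capacities, latencies and the scheduling rule are all invented for illustration and are not drawn from the cited works.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    free_cpu: float       # available compute capacity (arbitrary units)
    latency_ms: float     # round-trip latency to the mobile device

def schedule(task_cpu: float, max_latency_ms: float, fog_nodes, cloud):
    """Prefer the lowest-latency Fog node that fits the task; otherwise fall back to the Cloud."""
    candidates = [n for n in fog_nodes
                  if n.free_cpu >= task_cpu and n.latency_ms <= max_latency_ms]
    return min(candidates, key=lambda n: n.latency_ms) if candidates else cloud

fog = [Node("fog-gw-1", free_cpu=1.0, latency_ms=5),
       Node("fog-gw-2", free_cpu=4.0, latency_ms=12)]
cloud = Node("cloud-dc", free_cpu=1e6, latency_ms=120)
print(schedule(task_cpu=2.0, max_latency_ms=20, fog_nodes=fog, cloud=cloud).name)  # fog-gw-2
```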
3.2 Platforms for Context Aware Services and Applications
Today, the vast majority of mobile phones can provide applications with a set of context data, including time and other sensor data such as GPS, accelerometer, light sensor, microphone, thermometer, clock and compass. This data can be grouped into contexts and made available through platforms for context aware services and applications. The simplest example of such a platform is a context widget, capable of acquiring a certain type of context information and making it available to applications in general, regardless of how it is actually detected. Other examples of platforms for enabling context-sensitive mobile services in the Cloud follow the one presented by La and Kim [10]. Its structure allows: capturing context; determining which context-specific adaptation is required; adjusting candidate services to the context; and performing the tailored service. A platform that enables network edge integration into the computing ecosystem, through the development of a provisioning mechanism that facilitates communication between edge nodes and the Cloud, is proposed by Varghese et al. [11]. Its architecture
enables edge node provisioning for cloud-offloaded task volumes and incorporates a dynamic, low overhead scaling mechanism to add or remove resources in order to efficiently handle task volumes on the edge node. A conceptualization of the generalization of context awareness and the treatment of heterogeneity in mobile computing environments is presented by Schmohl and Baumgarten [12]. The authors describe a general architecture that solves the heterogeneity problems present in the context awareness and interoperability domains. A context-gathering framework that introduces sensor data formats, a set of interfaces and messaging protocols that allow context gathering from sensor resources is provided by Devaraju, Hoh and Hartley [13].
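A context widget, as described above, hides how a particular context type is sensed and simply publishes updates to interested applications. The minimal sketch below uses invented names and in-process callbacks instead of a real distribution mechanism; it is not the design of any of the cited platforms.

```python
class ContextWidget:
    """Acquires one type of context and notifies subscribers, hiding the actual sensor."""
    def __init__(self, context_type: str):
        self.context_type = context_type
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def update(self, value):
        # called by whatever sensing mechanism backs this widget (GPS, Wi-Fi, BLE, ...)
        for callback in self.subscribers:
            callback(self.context_type, value)

location_widget = ContextWidget("location")
location_widget.subscribe(lambda ctype, value: print(f"app received {ctype} = {value}"))
location_widget.update("room-2.13")
```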
4 Research Challenges of Mobile Services and Applications Development in the Fog Computing Paradigm
Fog Computing is emerging as an integrated solution for extending Cloud resources to the edge of the network and addressing the drawbacks of the classic centralized model. A platform to facilitate its full use is not yet available [14]. Such a platform will have to meet some requirements, such as real-time request processing and the task volume deployment requirements of different applications on different edge nodes. According to Wang et al. [14], for application deployment at the network edge, the following must be taken into account:
324
C. Barros et al.
Need for a New Programming Model Fog is a new computing paradigm that needs a new programming model. Developing intuitive and effective tools can help developers to orchestrate dynamic, hierarchical and heterogeneous resources to build portable applications. According to Hao et al. [16], If we take task scheduling and migration processes as an example, some research challenges may arise: How can we provide a simple abstraction for developers to pinpoint tasks that can be migrated? What options and preferences should be left to the users? How to allow developers to specify migration rules on disparate and varied devices? How to enable developers to reuse features that are likely to be common, such as distributed caching, task volume balancing, system monitoring and more. Quality of Service (QoS) and Quality of Experience (QoE) The quality delivered by the edge nodes can be captured by QoS and quality delivered to the user by QoE [14]. One of the principles to be taken into account and that influence both the QoS and QoE is not overloading the edge with volumes of computational tasks. A correlation between network-level QoS and users’ perceived QoS is presented by Shaikh et al. [15]. It provides a comprehensive analysis of user behaviour change at different levels of service performance through objective and subjective assessments. The challenge is to ensure that nodes achieve high availability and productivity and are reliable in delivering intended task volumes, accommodating additional task volumes from a data centre or edge devices. Regardless of whether an edge node is exploited, an edge device or data centre user expects a minimum level of service [11]. Having a management platform will be desirable, but it raises issues related to monitoring, scheduling and rescheduling at infrastructure, platform and application levels in terms of network edge nodes. Programming Challenges Fog presents some dazzling programming challenges that lead us to question how tasks move between different physical devices, that is, between client devices, Fog nodes and Cloud servers. Task scheduling and rescheduling will be very complex [16]. Some research questions are raised: Is it acceptable for us at Fog with heterogeneous hardware to switch power to reduced latency? Should a process running on a Fog node be interrupted when the user switches to another node? How to scale tasks taking into account existing latency, power consumption, mobility and task volume? Where should the scheduling be performed? What are the benefits of scheduling grouped tasks? Some other concerns require a deep analysis. For example, security and privacy issues are complex in Fog. Context aware application tasks should be scheduled on the most trusted nodes. In addition, algorithms used for process scheduling on traditional computers and servers may not be ideal in the Fog paradigm because different Fog nodes may have different hardware capabilities and some tasks may be more important than others. The challenge arises as to which other scheduling algorithms can be used to optimize factors important for highly mobile computing with low latency requirements. Resource Management on Fog Nodes Due to its basic service, task volumes transferred to an edge node are of secondary priority because the primary service cannot be compromised [11]. This makes resource
Context-Aware Mobile Applications in Fog Infrastructure
325
management on edge nodes challenging because the resources allocated for task volume will need to be dynamically scaled as there are limited hardware resources available on the edge node when compared to the Cloud environment. In addition, resource allocation to host multiple tenants also needs to be considered. Data Management Fog Computing ensures faster processing by making local storage possible. How to implement this storage and data management presents some new challenges. Hao et al. [16] rise some questions: What efficient algorithms can be used to exchange data between devices? How can a prefetch be implemented to enable low latency? What namespace scheme should be used? How can confidential and encrypted data be cached privately and effectively? How to decrease power consumption and mobile network usage by devices, as they have limitations imposed by limited battery technology and data limits imposed by mobile operators. Service Discovery Cloud enables automatic service discovery. As Fog Computing nodes are spaceheterogeneous, users can reach a new space and take advantage of the many services provided by Fog nodes in this new space. Because this feature depends on the service developers’ implementation, installing service discovery protocols in Fog Computing can be quite challenging. Also, in Fog Computing the provision of services is usually done dynamically. That is, new virtual machines are locally orchestrated when a specific service is required. This raises some research questions: When should services be started and stopped? What is the best way to balance the volume of tasks? What virtualization to use? Virtual machines or containers. Is it possible to predict which services are needed and provision the needs of users? What are the methodologies for efficiently provisioning services to many users? What is the best way to divide the volume of service tasks that aggregate information from devices and close clients. What about your energy-efficient customer-side relationships? Privacy and Security In Fog Computing security and privacy issues are often overlooked for functionality and interoperability due to the heterogeneous nature [16]. Encryption and strict privacy policies make it difficult to exchange data between different devices. It is important to make Fog applications preserve user privacy, provide strict security guarantees and meet the needs of all parties involved. There are many privacy and security related challenges among which stand out: ways to do authentication and design authorizations; identification of safe and effective protocols for use on Fog nodes without deteriorating performance or increasing power consumption; design a safe and accurate location verification scheme that works in volatile environments while being suitable for devices with limited resources. Data Consistency In Cloud Computing, unlike Fog Computing, data consistency is achieved by coordinating Cloud servers in data centres. When writing data to a Fog Computing environment, it is necessary not only to coordinate Cloud servers, but also to invalidate cached data on Fog Computing nodes and customer devices if strong consistency is required. This may slow down the performance of the recordings and consequently
undermine the benefits of using Fog nodes as write cache servers. On the other hand, Fog Computing also offers opportunities to achieve data consistency more efficiently than the Cloud, especially when data is sent to only one Fog node over a given period of time, mainly because that node is at the edge of the network. However, exploring these opportunities to achieve data consistency in Fog is still a challenge and requires substantial research effort.
5 Final Remarks More and more devices are added to the Internet every day. Devices such as routers, base stations and switches lie between end users’ devices and Cloud services, giving rise to limitations related to latency, storage and processing power. In response, a new computing paradigm has emerged that has attracted a lot of attention from industry and academia. This work reviewed the concept of Fog Computing and related it to context-sensitive applications and context-sensitive mobile service platforms. The discussion of the literature review examined some research challenges for mobile applications in Fog Computing. The current state of the art reveals that, despite the potential of Fog Computing, the realization of the concept still requires several open problems and research challenges to be addressed.
References 1. Fernando, N., Loke, S.W., Rahayu, W.: Mobile cloud computing: a survey. Future Gener. Comput. Syst. 29(1), 84–106 (2013) 2. Bonomi, F., Milito, R., Natarajan, P., Zhu, J.: Fog computing: a platform for Internet of things and analytics. In: Bessis, N., Dobre, C. (eds.) Big Data and Internet of Things: A Roadmap for Smart Environments, vol. 546, pp. 169–186. Springer, Cham (2014) 3. Abowd, G.D., Dey, A.K., Brown, P.J., Davies, N., Smith, M., Steggles, P.: Towards a better understanding of context and context-awareness. In: Gellersen, H.-W. (ed.) Handheld and Ubiquitous Computing, vol. 1707, pp. 304–307. Springer, Heidelberg (1999) 4. Musumba, G.W., Nyongesa, H.O.: Context awareness in mobile computing: a review. Int. J. Mach. Learn. Appl. 2(1), 5 (2013) 5. Stojmenovic, I.: Fog computing: a cloud to the ground support for smart things and machineto-machine networks. In: Telecommunication Networks and Applications Conference (ATNAC), pp. 117–122 (2014) 6. Vaquero, L.M., Rodero-Merino, L.: Finding your way in the fog: towards a comprehensive definition of fog computing. ACM SIGCOMM Comput. Commun. Rev. 44, 27–32 (2014) 7. Bazire, M., Brézillon, P.: Understanding context before using it. In: Dey, A., Kokinov, B., Leake, D., Turner, R. (eds.) Modeling and Using Context: 5th International and Interdisciplinary Conference Context, pp. 29–40 (2005) 8. Zimmermann, A., Lorenz, A., Oppermann, R.: An operational definition of context. In: Kokinov, B., Richardson, D.C., Roth-Berghofer, T.R., Vieu, L. (eds.) Context 2007. LNCS, vol. 4635, pp. 558–571. Springer, Heidelberg (2007) 9. Henricksen, K.: A framework for context-aware pervasive computing application. Ph.D. University of Queensland. http://henricksen.id.au/publications/phd-thesis.pdf. acedido em 11 Sep 2019
10. La, H.J., Kim, S.D.: A conceptual framework for provisioning context-aware mobile cloud services. In: IEEE 3rd International Conference on Cloud Computing (CLOUD), pp. 466– 473. IEEE (2010) 11. Singh, S.P., Nayyar, A., Kumar, R., Sharma, A.: Fog computing: from architecture to edge computing and big data processing. J. Supercomput. 75(4), 2070–2105 (2019) 12. Schmohl, R., Baumgarten, U.: A generalized context-aware architecture in heterogeneous mobile computing environments. In: The Fourth International Conference on Wireless and Mobile Communications, ICWMC 2008, pp. 118–124. IEEE (2008) 13. Devaraju, A., Hoh, S., Hartley, M.: A context gathering framework for context-aware mobile solutions. In: Proceedings of the 4th International Conference on Mobile Technology, Applications, and Systems and the 1st International Symposium on Computer Human Interaction in Mobile Technology, pp. 39–46. ACM (2007) 14. Wang, N., Varghese, B., Matthaiou, M., Nikolopoulos, D.S.: ENORM: a framework for edge node resource management. IEEE Trans. Serv. Comput. X,1–14 (2017). https://doi.org/ 10.1109/TSC.2017.2753775 15. Shaikh, J., Fiedler, M., Collange, D.: Quality of experience from user and network perspectives. Ann. Telecommun.-annales des télécommunications 65(1–2), 47–57 (2010) 16. Hao, Z., Novak, E., Yi, S., Li, Q.: Challenges and software architecture for fog computing. IEEE Internet Comput. 21(2), 44–53 (2017) 17. Gohar, M., Ahmed, S.H., Khan, M., Guizani, N., Ahmed, A., Rahman, A.U.: A big data analytics architecture for the Internet of small things. IEEE Commun. Mag. 56(2), 128–133 (2018) 18. Marn-Tordera, E., Masip-Bruin, X., Garca-Almiana, J., Jukan, A., Ren, G.-J., Zhu, J.: Do we all really know what a fog node is? Current trends towards an open definition. Comput. Commun. 109, 117–130 (2017) 19. Anawar, M.R., Wang, S., Azam Zia, M., Jadoon, A.K., Akram, U., Raza, S.: Fog computing: an overview of big IoT data analytics. Wirel. Commun. Mob. Comput. 1–22 (2018). https:// doi.org/10.1155/2018/7157192 20. Aazam, M., St-Hilaire, M., Lung, C.H., Lambadaris, I.: Pre-fog: IoT trace based probabilistic resource estimation at fog. In: 13th IEEE Annual Consumer Communications Networking Conference (CCNC), pp. 12–17 (2016) 21. Datta, S.K., Bonnet, C., Haerri, J.: Fog computing architecture to enable consumer centric Internet of things services. In: International Symposium on Consumer Electronics (ISCE), pp. 1–2 (2015) 22. Aazam, M., St-Hilaire, M., Lung, C.H., Lambadaris, I.: MeFoRE: QoE based resource estimation at fog to enhance Qos in IoT. In: 23rd International Conference on Telecommunications (ICT), pp. 1–5 (2016) 23. Yan, S., Peng, M., Wang, W.: User access mode selection in fog computing based radio access networks. In: IEEE International Conference on Communications (ICC), pp. 1–6 (2016) 24. Hou, X., Li, Y., Chen, M., Wu, D., Jin, D., Chen, S.: Vehicular fog computing: a viewpoint of vehicles as the infrastructures. IEEE Trans. Veh. Technol. 65(6), 3860–3873 (2016) 25. Zhu, J., Chan, D.S., Prabhu, M.S., Natarajan, P., Hu, H., Bonomi, F.: Improving web sites performance using edge servers in fog computing architecture. In: IEEE 7th International Symposium on Service Oriented System Engineering (SOSE), pp. 320–323 (2013) 26. Aazam, M., Huh, E.N.: Fog computing and smart gateway based communication for cloud of things. In: International Conference on Future Internet of Things and Cloud (FiCloud), pp. 464–470. IEEE (2014)
27. Truong, N.B., Lee, G.M., Ghamri-Doudane, Y.: Software defined networking-based vehicular adhoc network with fog computing. In: IFIP/IEEE International Symposium on Integrated Network Management (IM), pp. 1202–1207 (2015) 28. Gazis, V., Leonardi, A., Mathioudakis, K., Sasloglou, K., Kikiras, P., Sudhaakar, R.: Components of fog computing in an industrial internet of things context. In: 12th Annual IEEE International Conference on Sensing, Communication, and Networking Workshops (SECON Workshops), pp. 1–6 (2015) 29. Cirani, S., Ferrari, G., Iotti, N., Picone, M.: The IoT hub: a fog node for seamless management of heterogeneous connected smart objects. In: 12th Annual IEEE International Conference on Sensing, Communication, and Networking-Workshops (SECON Workshops), pp. 1–6. IEEE (2015) 30. Souza, C., Ahn, G.J., Taguinod, M.: Policy-driven security management for fog computing: preliminary framework and a case study. In: IEEE 15th International Conference on Information Reuse and Integration (IRI), pp. 16–23 (2014) 31. Cardellini, V., Grassi, V., Presti, F.L., Nardelli, M.: On QoS-aware scheduling of data stream applications over fog computing infrastructures. In: IEEE Symposium on Computers and Communication (ISCC), pp. 271–276 (2015) 32. Shi, H., Chen, N., Deters, R.: Combining mobile and fog computing: using coAP to link mobile device clouds with fog computing. In: IEEE International Conference on Data Science and Data Intensive Systems, pp. 564–571 (2015)
Analyzing IoT-Based Botnet Malware Activity with Distributed Low Interaction Honeypots Sergio Vidal-González1(&), Isaías García-Rodríguez1, Héctor Aláiz-Moretón1, Carmen Benavides-Cuéllar1, José Alberto Benítez-Andrades1, María Teresa García-Ordás1, and Paulo Novais2 1
Department of Electrical and Systems Engineering and Automation, Universidad de León, León, Spain [email protected], {igarr,hector.moreton,carmen.benavides,jbena, m.garcia.ordas}@unileon.es 2 Algoritmi Centre/Department of Informatics, University of Minho, Braga, Portugal [email protected]
Abstract. The increasing number of Internet of Things devices, and their limited built-in security, has led to a scenario where many of the most powerful and dangerous botnets nowadays are comprised of these types of compromised devices, making them the source of some of the most significant distributed denial of service attacks in history. This work proposes a solution for monitoring and studying IoT-based botnet malware activity by using a distributed system of low interaction honeypots implementing Telnet and SSH remote access services, which are used to manage the majority of IoT devices in the home environment, like routers, cameras, printers and other appliances. The solution captures and displays real-time data coming from different honeypots at different locations worldwide, allowing the logging and study of the different connections and attack methodologies, and obtaining samples of the distributed malware. All the information gathered is stored for later analysis and categorization, resulting in a low-cost and relatively simple threat information and forecasting system regarding IoT botnets.
Keywords: Honeypot · Botnet · Malware · SSH · Telnet
1 Introduction The number of Internet-connected devices is continuously growing, especially in recent years. The number of online devices in 2017 was about 20 billion and it is expected to reach 75 billion by 2025 [1]. The technology in these devices allows them to interact with other machines and people, but also directly with the physical world [2]. The problem with this situation lies in the fact that the security measures implemented by the majority of these devices are far from good, and this issue is still far from being fixed.
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020. Á. Rocha et al. (Eds.): WorldCIST 2020, AISC 1160, pp. 329–338, 2020. https://doi.org/10.1007/978-3-030-45691-7_30
The great variety of devices and technologies, as well as the pressure
on manufacturers in order to put their products on the market, are some of the reasons for this lack of security features. This situation has led to the development of botnets specifically oriented towards IoT devices and their use for a number of increasingly diverse malicious activities. The public release of the source code for the Mirai botnet [3] led to a dramatic increase in the number of bots trying to infect this kind of device. The exploitation of the growing range of vulnerabilities found in these devices has led to their use in increasingly complex and harmful attacks like SYN, ACK or UDP flooding, or DNS water torture [4]. From 2016, these networks of bots have been responsible for the biggest attacks on companies like GitHub, Amazon or Twitter [5]. Internet of Things devices have become not only the target of malicious attacks, but also the source of many of them. Many of the attacks aimed at compromising IoT devices use, as the entry point, the device’s remote access service, and in particular Telnet and secure shell (SSH) [6]. Remote access using Telnet has experienced a rebirth thanks to the IoT, as it is a very lightweight technology that can be supported in very simple, low-resource chips. Many of the vulnerabilities of both Telnet and SSH come from bad or missing security configuration, like using weak passwords or keeping the vendor defaults [6]. In recent years, a number of studies have used honeypots in order to study this security concern regarding IoT home devices. Honeypots are devices designed to be contacted, and sometimes even compromised, by attackers with the aim of studying the characteristics, technologies and methodologies used by those attackers in a controlled environment [7]. Honeypots can be classified according to their interaction level as low, medium or high-interaction [8]. Low-interaction honeypots simulate vulnerable services by using specialized pieces of software; they attract attackers but do not let them completely compromise the system where they are being executed. High-interaction honeypots are real systems that are exposed to the attacker, trying to get as much information about the malware activities as possible while trying not to compromise other systems in the network and implementing recovery measures in case the honeypot becomes totally compromised or useless [9]. High-interaction honeypots are harder to set up and maintain, but can collect a great amount of information about attacks. On the other hand, low-interaction honeypots are easier to deploy and usually do not compromise other devices, but they obtain less information about the attack. The objective of this work is to study how bots contact IoT home devices through their remote access services and what activities and malware they try to introduce in these systems. Another objective is to get a record of the malware bot activity worldwide during the period when the honeypots are active, with the aim of obtaining and analyzing information about which botnets are more active and what type of malware they use. The solution to be developed must satisfy the following requirements:
• The honeypots must not be centered on a specific vulnerability.
• Both Telnet and SSH remote access services will be used.
• The honeypots must focus, as far as possible, on malware for IoT devices.
• The solution must be worldwide distributed and scalable.
• Quasi real-time graphical information and data of the attack sources must be displayed.
None of the existing IoT honeypot systems studied fits the design requirements presented completely. Also, the authors wanted to explore open source technologies that could be used to build such a system from scratch, which increases control over the functionalities and improves the flexibility of the approach. The rest of the paper is organized as follows. Section 2 describes the architecture and functionality of the proposed solution, as well as the implementation details. Section 3 presents the results obtained, highlighting the main outcomes, the captured malware activity metrics and the differences found between the Telnet and the SSH honeypots. Section 4 gives a brief discussion and presents some future work.
2 Materials and Methods Six low interaction honeypots were deployed at six different locations around the world using the DigitalOcean1 cloud service. The honeypots captured all the interactions and communications maintained with different contacting bots using the Telnet and SSH services. The honeypots were active for a 52-day period, from May 23rd to July 14th, 2019, capturing and storing the different communications and downloading the malware into a sandbox environment for later analysis. The malware activity was monitored in quasi real time using a graphical dashboard fed with the incoming data. By analyzing the data obtained, a categorization of the contact methodologies and strategies, as well as of the malware used by the bots, was carried out whenever possible.
2.1 Architecture of the Proposed Solution
The domain of security in IoT ecosystems is dynamic and evolves rapidly, so the solution must adapt to this dynamic nature, being flexible and scalable. The functionalities of the system must go beyond the implementation of the interaction with the bots, providing functionalities for the gathering, storage and analysis of the information in an integrated platform. The main components of the developed system are: the honeypots, a log manager, a sandbox for the downloaded malware, and a graphical dashboard to display information. The honeypot is built from scratch using the Twisted event-driven networking library [10]. This allows the exact functionality that is needed to be programmed, implementing both the Telnet and SSH services, and permits the incorporation of new functionality if needed. The main modules that comprise the system are: • IP source scan and reputation module. This module takes the IP address of the contacting malware in order to try to find its origin and reputation. The origin is roughly characterized as either “TOR” (using tools like CollecTor2 and Onionoo3),
1 https://www.digitalocean.com/
2 https://metrics.torproject.org/collector.html
3 https://metrics.torproject.org/onionoo.html
“proxy” (using lists from sources like Pastebin4) or “personal device” (used for those origins that do not fit the other categories). The reputation of the IP address is calculated by querying the IPIntel service API5. • Command interpreter. This module is in charge of the communication of the honeypot with the malware. In a low-interaction honeypot the responses to the malware do not come from a real system; it is the honeypot that must build and send these responses. The module works by using a matching algorithm that uses the malware commands to find the corresponding most suitable response. Many of these commands are invocations of the BusyBox tool, a lightweight executable that implements many of the Unix/Linux utilities in IoT devices. Many of the initial commands issued by the malware are “pre-attack checks” (a series of tests performed on the system to fingerprint it and see if it fits the needs of the attacker). The honeypot must fulfill these checks convincingly so that the malware continues the communication. • Reporter. This module is responsible for downloading the malware software file and testing if it has been previously reported as such. Whenever the command from the malware contains a URL, the reporter downloads and stores the file, creating the corresponding hashes for later testing. The software file is sent to a sandbox component and also stored locally with a name that includes the current timestamp and the source IP. The reporter tests if the downloaded file has been reported in the VirusTotal6 service, and the source IP address is also checked in urlhaus7 to test if it is an already known source. • Log manager. All interactions and the information generated by these modules are stored in the log manager module. Each entry is categorized with a key indicating its type, from one of four categories: “new client” (when the malware contacts the honeypot), “client disconnected” (when the malware finishes the connection), “authentication attempt” (introducing a login and a password), and “command” (introducing a command once access has been gained with the login credentials).
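For illustration only, the sketch below shows how a command interpreter of this kind could be organized with Twisted's LineReceiver: each received command is matched against a table of canned responses. The port, prompts and responses are placeholders, and this is not the code of the system described in this paper.

```python
# Illustrative sketch of a minimal low-interaction, Telnet-style fake shell
# built with Twisted. The canned responses, prompts and listening port are
# placeholders, not the responses used by the system described in the text.
from twisted.internet import reactor
from twisted.internet.protocol import Factory
from twisted.protocols.basic import LineReceiver

CANNED = {
    b"enable": b"",
    b"sh": b"",
    b"/bin/busybox": b"BusyBox v1.22.1 (2019-01-01) multi-call binary.",
    b"cat /proc/cpuinfo": b"processor\t: 0\nmodel name\t: ARMv7 Processor rev 2 (v7l)",
}

class FakeShell(LineReceiver):
    delimiter = b"\n"

    def connectionMade(self):
        self.logged_in = False
        self.transport.write(b"login: ")

    def lineReceived(self, line):
        line = line.strip()
        if not self.logged_in:
            # Accept any credentials; a real honeypot would first log them
            # as an "authentication attempt" event.
            self.logged_in = True
            self.transport.write(b"# ")
            return
        # A real honeypot would log the raw command and forward any URL it
        # contains to the reporter module before answering.
        reply = CANNED.get(line, b"")
        if reply:
            self.sendLine(reply)
        self.transport.write(b"# ")

class FakeShellFactory(Factory):
    protocol = FakeShell

if __name__ == "__main__":
    reactor.listenTCP(2323, FakeShellFactory())  # placeholder port
    reactor.run()
```

A production honeypot would add per-connection logging, BusyBox emulation details and the pre-attack checks mentioned above, which are omitted here for brevity.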
2.2 Implementation
The system was built in virtual machines instantiated in a DigitalOcean private cloud. These machines, with the honeypots inside, were Linux Ubuntu 16.04 machines located at six different places: London, Bangalore, Singapore, San Francisco, New York and Spain (this last one was not part of the cloud platform). Each honeypot included both a Telnet and an SSH service, with a number of accounts and passwords that are usually employed as default credentials in IoT devices (IP cameras, printers, routers, etc.).
4 https://pastebin.com/JZrsbc9E
5 https://getipintel.net/
6 https://www.virustotal.com/
7 https://urlhaus.abuse.ch/
In addition to the Twisted framework, introduced previously, the Graylog8 log management platform was used due to its flexibility in centralizing and unifying logs from different, heterogeneous sources. Graylog uses Elasticsearch, MongoDB, and Scala for capturing, processing and visualizing data in real time in a graphical dashboard. The malware sandbox was implemented with the Viper9 malware analysis framework, which is able to organize, categorize and classify the malware downloaded by the honeypots. The communication between the different modules and the Graylog platform was implemented using the Graylog Extended Log Format (GELF), a technology consisting of a protocol and a data format. At the transport layer, TCP was chosen to carry GELF messages. The communication with the Viper platform was performed using the Viper REST API.
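As an illustration of this transport, a GELF message is a JSON document with a small set of mandatory fields, sent over TCP as a null-terminated frame. The snippet below is only a sketch: the Graylog host name, port and custom fields are assumptions, not the values of the actual deployment.

```python
# Sketch of sending a GELF 1.1 message to a Graylog TCP input.
# The Graylog address, port and custom fields are illustrative assumptions.
import json
import socket
import time

def send_gelf(graylog_host, message, port=12201, **extra):
    record = {
        "version": "1.1",
        "host": socket.gethostname(),
        "short_message": message,
        "timestamp": time.time(),
        "level": 6,  # informational
    }
    # Custom fields must be prefixed with an underscore in GELF.
    record.update({"_" + key: value for key, value in extra.items()})
    frame = json.dumps(record).encode("utf-8") + b"\x00"  # GELF TCP frames are null-terminated
    with socket.create_connection((graylog_host, port)) as conn:
        conn.sendall(frame)

# Example (placeholder host): report a captured authentication attempt.
# send_gelf("graylog.example.org", "authentication attempt",
#           event="authentication attempt", username="root", password="admin")
```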
3 Results The study and analysis of the results comprises two main components: the study of the interactions and the study of the malware files. Both Telnet and SSH results are presented.
3.1 Study of the Interactions
Interactions Regarding Telnet. A total of 652881 interactions were captured using the Telnet honeypots. Figure 1 shows the distribution of the IP sources of the interactions using the Telnet service. The top geographical origins were the Netherlands (33.69%), USA (10.92%), China (8.18%) and Iran (7.37%).
Fig. 1. Geographical distribution of malware source IP for Telnet honeypots.
Figure 2 shows the histogram of the total traffic in all the honeypots on a daily basis.
8 https://www.graylog.org/
9 https://viper-framework.readthedocs.io/
Fig. 2. Histogram of aggregated daily events captured in the Telnet honeypots.
The most usual captured event is the insertion of commands, with 33.5% of the total events, while login attempts constitute 8.53%. 74.41% of the introduced credentials were incorrect while 25.59% were successful. This indicates that bots use lists of well-known users and passwords, though they appear to be used randomly. Regarding the commands used by the bots, roughly 25% are typical preamble commands (e.g. enable, sh, shell, system). The rest of the commands have a great variability, with more than 192420 different ones. The great majority of source IPs correspond to personal devices that may have been compromised by botnets (though it must be taken into account that this includes sources that have not been categorized as “TOR” or “proxy”), with a total of 192420 interactions. The category “proxy” has only 184 interactions, and the “TOR” origin has only one. The most usual interactions in the Telnet honeypots are: • Fingerprinting, usually achieved by detecting the presence and version of the BusyBox software suite and later issuing different calls to it. • Malware download, which can consist of a stub downloader that will, in turn, download the real infection (phase 2, much bigger) malware, or of this infection malware directly. The malware download is accomplished by one of these different approaches: • Using the tftp or wget commands provided by BusyBox, if present. • In case neither of these utilities is present in the system, issuing a number of “echo” commands in order to download a hexadecimal representation of the code of the stub, or phase 2, malware. • Using more complex scripts that include a selective malware download depending on the detected underlying architecture. The number of these increasingly complex scripts rose during the capture period, an interesting trend that needs further attention and study. The usual command sequence in the Telnet interactions consists of downloading the malware, setting 777 permissions on the corresponding file, executing it, and exiting. Interactions Regarding SSH. A total of 1315716 interactions were captured using the SSH honeypots; the distribution of the source IP addresses is shown in Fig. 3. The majority of the SSH traffic originated in India (55.53%), USA (11.55%) and China (9.04%), with a remarkable 4.06% coming from Germany. The histogram of the SSH interactions, shown in Fig. 4, presents marked peaks of nearly or more than 80 K interactions per day. The cause of these peaks lies in brute force attacks, where the attacker tries a huge number of login credential combinations.
Fig. 3. Geographical distribution of malware source IP for SSH honeypots.
Fig. 4. Histogram of aggregated daily events captured in the SSH honeypots.
In this case, the most prevalent event is the login attempt, with 38% of the total events (where 94.77% were failed attempts and 5.23% were successful logins), while the insertion of commands constitutes only 0.17%. This contrasts with the Telnet service, where the login attempt was the least prevalent event while the insertion of commands was the most prevalent one. In the case of the SSH service, the range of different users and passwords for different device types is huge and hence the brute force attacks generate a great number of interactions in the log. Among the most used commands issued by the SSH-based bots, three main types can be distinguished: • Commands using the string “/gisdfoewrsfdf”, which could be used by the attacker as a boundary string for parsing purposes. • Shell scripts used to fingerprint and download specific malware adapted to the underlying platform. • Commands for downloading malware related to well-known DDoS attacks but not specifically oriented towards the IoT ecosystem. The sources of the attacking clients, contrary to the Telnet case, show a greater share of TOR and proxy origins, but these still sum to only 0.8% of the total. The majority of sources are personal devices (or unknown ones). Regarding the interactions in SSH, it is worth noting that, once the attacker has gained access to the honeypot, most of the time no further commands are issued; the connection is kept open but the malware seems to stay idle. Sometimes, as stated previously, the attacker downloads DDoS malware not specifically oriented towards IoT devices. A small number of interactions, but also the most interesting ones, consist of the download of a shell script that tries to fingerprint
and categorize the device and also to perform a test to investigate whether the honeypot IP is blacklisted or not. It is not clear why this test is not performed prior to entering the honeypot, as this would save the attacker time.
3.2 Study of the Malware Samples
A total of 438 samples of different malware software were obtained during the active capture period. Two main categories can be studied: executable files and scripts. In the case of the Telnet honeypots, 83.3% of the downloaded malware corresponded to the Hajime botnet [11], with a hash of a04ac6d98ad989312783d4fe3456c53730b212c79a426fb215708b6c6daa3de3. This is a lightweight stub downloader appearing with the name “.i”; it is distributed statically linked and stripped to hinder debugging. The majority of the remaining downloaded malware were scripts with commands for adapting the phase 2 malware to be downloaded to the underlying architecture. Only 21 out of the 438 total malware samples were downloaded in the SSH honeypots, and only one of these 21 was specifically related to IoT: a script named “i.sh” belonging to the Hajime botnet. The rest of the malware samples are DDoS malware like “Linux2.6”, “DDoSClient” or “iptraf”. An interesting characteristic of the SSH malware download is that it does not usually try to find the most suitable malware file according to the underlying architecture, but usually downloads a generic file suitable for i386, 32-bit platforms. It is also worth remarking on the presence of debug symbols in the malware samples distributed in the SSH honeypots. The most usual malware families in these honeypots are Linux.Znaich and Linux.dofloo. Download Scripts. A number of download scripts were captured by the honeypots. In the case of the Telnet honeypots there is a great variability in their names, though some of them appear many times, like “bins.sh”. Table 1 shows the scripts found and their corresponding malware family.
Table 1. Script samples obtained.
Botnet family    Nr of samples   Name of the script
Hajime           1               i.sh
Tsunami          2               bins.sh, zbot.sh
Mirai            17              8UsA.sh (3), Pemex.sh, jaknet.sh, bins.sh (8), cayosinbins.sh, messiahbins.sh, sh, Zehir.sh
Bashlite/Gafyt   19              bins.sh (17), paranoid.sh, njs.sh
All these scripts contain many malware binaries for different architectures (among them arm4, arm5, arm6, arm7, m68k, mips, mipsel, sparc, mpsl, ppc, i586, i686, x86, sh4) that, on certain occasions, masquerade as known, trusted services in order to hide (for example ftp, apache2, bash, cron, ntpd, openssh, sshd, tftp, wget). The SSH honeypots also detected this kind of script, though only once; the script in this case presented a higher complexity than those found in the Telnet honeypots.
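Deduplicating and grouping samples of this kind is typically done by hashing the downloaded files. The short sketch below shows one possible way to do it; it is not the pipeline used in this work, and the samples directory is a placeholder.

```python
# Sketch: hash downloaded samples and count how many times each file was seen.
# The samples directory is a placeholder; this is not the authors' pipeline.
import hashlib
from collections import Counter
from pathlib import Path

def sha256_of(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def summarize(samples_dir: str = "./samples") -> None:
    counts = Counter(sha256_of(p) for p in Path(samples_dir).iterdir() if p.is_file())
    for sample_hash, seen in counts.most_common():
        print(f"{seen:4d}  {sample_hash}")

if __name__ == "__main__":
    summarize()
```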
4 Discussion and Conclusions Honeypots are an effective way of studying different malware technologies, which are playing an increasingly essential role in the cybersecurity domain [12]. In the case of this research, the focus is the threat from malware bots regarding IoT devices and their remote login services. In this application scenario, honeypots have gained attention as a means of monitoring and studying the complex and evolving threat landscape of IoT malware and IoT botnets. There are a number of previous honeypot-based solutions that have focused on the IoT ecosystem; see Table 2 for a summary of some of them. All these approaches have some advantages and drawbacks. Using high interaction honeypots implies a greater cost and the need to set up a high number of IoT devices. It is worth the effort if the focus is on studying a particular set of devices, but it is not suitable for distributed solutions. Some of the approaches do not implement both the Telnet and SSH login services, and others are not scalable or distributed. Some solutions focus only on a given vulnerability, or a number of vulnerabilities, in order to get a deep understanding of the threat. The system proposed in this paper is able to detect bot malware activity in honeypots using Telnet and SSH remote login services. The information obtained may be used as a low-cost and relatively simple threat information and forecasting system regarding IoT botnets. Some issues must still be addressed, though. The lack of interaction of the malware over SSH once it has gained access to the system is an issue that must be studied, as it could indicate some kind of problem with the honeypot, or that some activity is being missed. Also, the set of variables that are obtained
Table 2. Some related work.
SIPHON [13]: High interaction honeypots and real devices. High management costs.
IoTCandyJar [14]: Captures malware requests and then scans the IPv4 address space in order to find real devices able to respond to these same requests.
IoTPot [15]: High interaction backend that executes the commands issued by the attacker in a virtual controlled environment; only Telnet.
ThingPot [16]: Honeypot platform that focuses on other IoT-specific protocols like XMPP.
HIoTPOT [17]: A honeypot with similar features to an intrusion detection system (IDS), able to redirect the attacker commands to a real or to a simulated environment.
“Before Toasters Rise Up” [18]: Combines a number of low interaction honeypots like Cowrie(a) or Dionaea(b) with high interaction ones with controlled vulnerabilities.
(a) https://github.com/cowrie/cowrie (b) https://github.com/DinoTools/dionaea
must be studied in order to find a possible correlation with a measure of the worldwide malware activity. In this sense, integrating and studying information from other similar sources could be a potential line of future research.
References 1. Statista: Internet of Things (IoT) connected devices installed base worldwide from 2015– 2025 (2019). https://www.statista.com/statistics/471264/iot-number-of-connected-devicesworldwide/ 2. Wu, G., Talwar, S., Johnsson, K., Himayat, N., Johnson, K.D.: M2M: from mobile to embedded internet. IEEE Commun. Mag. 49(April), 36–43 (2011) 3. Margolis, J., Oh, T.T., Jadhav, S., Kim, Y.H., Kim, J.N.: An In-depth analysis of the mirai botnet. In: Proceedings - 2017 International Conference on Software Security and Assurance, ICSSA 2017 (2018) 4. Antonakakis, M., et al.: Understanding the Mirai Botnet. In: 26th USENIX Security Symposium (2017) 5. Kolias, C., Kambourakis, G., Stavrou, A., Voas, J.: DDoS in the IoT: Mirai and other botnets. Computer (Long. Beach. Calif)., vol. 50, no. 7, pp. 80–84 (2017) 6. Kishore, A.: Turning Internet of Things (IoT) into Internet of Vulnerabilities (IoV) : IoT Botnets, arXiv.org (2017). https://arxiv.org/abs/1702.03681v1 7. Joshi, R.C., Sardana, A.: Honeypots: A New Paradigm to Information Security, 1st edn. CRC Press, Boca Raton (2011) 8. Provos, N., Holtz, T.: Virtual Honeypots: From Botnet Tracking to Intrusion Detection. Addison Wesley Professional, Boston (2007) 9. Mohammed, M., Rehman, H.: Honeypots and Routers. Collecting Internet Attacks. CRC Press, Boca Raton (2016) 10. Williams, M., et al.: Expert Twisted: Event-Driven and Asynchronous Programming with Python. Apress, New York (2019) 11. Edwards, S., Profetis, I.: Hajime: Analysis of a decentralized internet worm for IoT devices. (2016). https://security.rapiditynetworks.com/publications/2016-10-16/hajime.pdf 12. Sochor, T., Zuzcak, M.: Study of internet threats and attack methods using honeypots and honeynets. Commun. Comput. Inf. Sci. 43, 118–127 (2014) 13. Guarnizo, J., et al.: SIPHON: towards scalable high-interaction physical honeypots. In: CPSS 2017 - Proceedings of the 3rd ACM Workshop on Cyber-Physical System Security, co-located with ASIA CCS 2017 (2017) 14. Luo, T., Xu, Z., Jin, X., Jia, Y., Ouyang, X.: IoTCandyJar: Towards an IntelligentInteraction Honeypot for IoT Devices. Blackhat (2017) 15. Pa, Y., Suzuki, S., Yoshioka, K., Matsumoto, T., Kasama, T., Rossow, C.: IoTPOT: a novel honeypot for revealing current IoT threats. J. Inf. Process. 24(3), 522–533 (2016) 16. Wang, M., Santillan, J., Kuipers, F.: ThingPot: an interactive Internet-of-Things honeypot (2018). https://arxiv.org/abs/1807.04114 17. Gandhi, U.D., Kumar, P.M., Varatharajan, R., Manogaran, G., Sundarasekar, R., Kadu, S.: HIoTPOT: surveillance on IoT devices against recent threats. Wirel. Pers. Commun., pp. 1– 16 (2018) 18. Vervier, P.A., Shen, Y.: Before toasters rise up: a view into the emerging IoT threat landscape. Proceedings of the Research in Attacks, Intrusions, and Defenses 2018, 556–576 (2018)
Evolution of HTTPS Usage by Portuguese Municipalities Hélder Gomes1(&), André Zúquete2, Gonçalo Paiva Dias3, Fábio Marques1, and Catarina Silva4 IEETA, ESTGA, University of Aveiro, 3750-127 Águeda, Portugal {helder.gomes,fabio}@ua.pt 2 IEETA, DETI, University of Aveiro, 3810-193 Aveiro, Portugal [email protected] GOVCOPP, ESTGA, University of Aveiro, 3750-127 Águeda, Portugal [email protected] 4 University of Aveiro, 3810-193 Aveiro, Portugal [email protected] 1
3
Abstract. This paper presents a study on the evolution of the use of HTTPS by the official websites of all (308) Portuguese municipalities. One year ago, we found a bad situation regarding HTTPS usage: only a small percentage of websites adopted HTTPS correctly. The results were communicated to the relevant entities so actions could be taken. After one year, we performed a new assessment to check for evolution. This paper presents the results of this second assessment. We found a significantly better situation, although still with plenty of room for improvement: 31 municipal websites were classified as Good (20 more), while 42 fewer were classified as Bad (100 in total). We concluded that two determinants that were identified as contributing to explain the results of the first study - municipal taxes and total population - do not contribute to explain the improvements observed in this assessment. We believe that we contributed to those improvements by raising awareness of the high number of municipalities not using or badly using HTTPS.
Keywords: E-government · Local government · HTTPS adoption
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020. Á. Rocha et al. (Eds.): WorldCIST 2020, AISC 1160, pp. 339–348, 2020. https://doi.org/10.1007/978-3-030-45691-7_31
1 Introduction HTTPS is the current de facto protocol used to provide secure services through the Web, and its usage is recommended by standard bodies, such as W3C - World Wide Web Consortium1 [1] and IAB - Internet Architecture Board2 [2], and major browsers [3, 4]. In conformity, the European Commission websites must mandatorily use HTTPS connections [5] and, similarly, in the USA it is required that all publicly accessible Federal websites and web services only provide services through a secure
1 https://www.w3.org
2 https://www.iab.org/
connection [6]. In Portugal, the technical requirements for the security architecture of network and information systems also require the use of secure connections [7]. In the most common configurations, HTTPS provides three security services: data confidentiality, data integrity and server authentication. It assures users that they are accessing the web site they intend to and guarantees both that the data is exchanged without snooping by third parties and that it is not altered on its way to the destination. However, if HTTPS is not properly implemented, an attacker may launch man-in-the-middle attacks between the web server and the client browser and circumvent all three of those security properties. All Portuguese municipalities have an official website, which is the main reference point for the provisioning of municipal services to the citizens and for the promotion of the municipality. Among the provided services, there are informational services, with information regarding the municipal government bodies and municipal regulations and services, and transactional services, which citizens may use electronically. While the use of HTTPS is not questioned on municipal websites that provide more sensitive services to citizens, such as transactional services, where confidentiality, integrity and authentication are all obviously relevant, it must also be used on all municipal websites, because integrity and server authentication are always relevant to guarantee that the correct server is being accessed and the exchanged data was not altered. This is particularly relevant for the links pointing to websites where sensitive operations are served, to guarantee that citizens are directed to the correct server. Previous studies addressed the adoption of HTTPS on websites [8–10] and the quality of the implementation [8, 10, 11]. However, only one of these studies targets the adoption of HTTPS in the specific domain of local e-government. It presents the case for Sweden, addressing the evolution between May 30 and August 20, 2016 [10]. One year ago, in November 2018, we assessed the usage of HTTPS by all 308 Portuguese municipalities, finding a bad scenario [12]. The findings were communicated to the relevant entities so they could take steps to improve the status quo. After one year, in November 2019, we decided to make a new assessment to verify if measures had been taken to improve the situation. This paper is organized as follows: we start by making a short presentation of the previous assessment; next we present the methodology; then we present the results of the new assessment; then we compare the two assessments in order to characterize the evolution and to search for determinants that may explain it; and finally, we discuss the results and conclude the paper.
2 Previous Work Our first assessment on the usage of HTTPS by the official websites of all 308 Portuguese municipalities occurred in November, 2018 [12]. When requesting the entry page of each website we assessed a set of parameters that allowed us to characterize the usage of HTTPS by their servers. Those parameters are briefly presented in the next section. Detailed figures concerning them were presented in [12] and will be used here to assess the evolution on the exploitation of HTTPS by the websites.
The overall classification obtained was far from ideal, as illustrated in Fig. 1. Only 11 municipalities (4%) achieved the classification of Good, while 142 (46%) were classified as Bad.
Fig. 1. Overall results of the 2018 assessment of HTTPS usage by Portuguese municipalities.
Possible socio-economic determinants for the classifications were investigated using multiple T-tests and logistic regression analysis. We found that the group of ‘good or reasonable’ municipalities is statistically different from the group of ‘minimum or bad’ municipalities when ‘municipal taxes’ and ‘total population’ are used as indicators. However, we also found that none of those indicators is a good predictor, which suggests the existence of other relevant explanatory factors not identified in the study. The results of this first assessment were communicated to the Portuguese association of municipalities (Associação Nacional de Municípios Portugueses, ANMP) and to the Portuguese cybersecurity center (Centro Nacional de Cibersegurança, CNCS). After that communication, we were invited to present the results of our study in a session where some municipalities were present. In addition, months after our communication, the .PT and the CNCS made available webcheck.pt3, an online platform that allows users to verify, in real time, the level of compliance of an Internet and email domain with the latest standards for a secure communication between systems. Among other verifications, the platform checks if a website is using the most recent security recommendations, which includes checking the correct use of HTTPS. This evaluation, however, is mainly complementary to ours.
3 Methods Our first objective was to collect data for a set of parameters that characterize the usage of HTTPS by the municipalities’ websites. In order to compare this data with the data from our 2018 assessment, we used the same set of parameters: • Existence of an HTTPS server, given the municipality DNS name. This means that for a given host name in a URL, there must be an open TCP port 443 and a TLS connection can be established with the server that owns that port. • Correct authentication of the HTTPS server. Involves validating a server’s signature with its certified public key and validating the corresponding certificate.
3 https://webcheck.pt
• For the same host name in a URL, the contents presented for a given path by an HTTP server must match, functionally and visually, the contents presented for the same path by an HTTPS server.
• All resources in pages provided through HTTPS must be obtained using HTTPS only.
• For sites providing a correct HTTPS service, it must be preferred over HTTP. This implies that HTTP requests must be redirected to HTTPS locations.
• Several redirection methods exist; of these, HTTP redirection 301 (Moved Permanently) is preferable, as browsers memorize it for future requests.
• The inclusion of HSTS (HTTP Strict Transport Security) headers in HTTPS server responses forces browsers to use HTTPS when requesting further pages from that server.
In our previous assessment, we used a set of tools and UNIX shell scripts to gather all the relevant data. However, while the gathering of data was efficient, we used a very fine-grained error detection approach which was somewhat irrelevant for the high-level assessment we intended to perform. Therefore, we decided to simplify the assessment process and integrate it in a framework that we developed using Python to gather and store all the relevant data (a simplified sketch of such checks is shown below, after Table 1). The data about the above listed set of parameters, collected for each of the 308 municipalities, allowed us to classify each municipal website. For that, we used the same criteria used in our previous assessment, presented in Table 1.
Table 1. Criteria to classify the municipal websites.
Criterion                                     Good     Reasonable   Minimum   Bad
HTTPS service (TCP port 443)                  Yes      Yes          Yes
Valid certificate and certification chain     Yes      Yes          Yes
HSTS and redirection to HTTPS (Code 301)      Yes
HTTPS page includes HTTP resources            No       No
No HTTP page or equal HTTPS and HTTP pages    Yes(a)   Yes(b)       Yes(b)
Redirection to HTTP                           No       No           No
(a) HTTPS page only. (b) Either only HTTPS page or equal HTTPS and HTTP pages.
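As an illustration of the kind of checks listed above, the sketch below probes a single host for an HTTPS service, certificate validity, HTTP-to-HTTPS redirection (and whether it uses code 301) and the HSTS header. It is a simplified reconstruction under stated assumptions (the requests library, a 10-second timeout, a placeholder host name), not the framework actually used; content comparison and mixed-content detection are omitted.

```python
# Simplified sketch of per-site checks: HTTPS service, certificate validity,
# HTTP-to-HTTPS redirection (and its status code) and the HSTS header.
# The timeouts and the use of the requests library are assumptions.
import socket
import ssl
import requests

def assess(host, timeout=10):
    report = {}
    # 1. TLS service on TCP port 443 and certificate/chain validation
    try:
        context = ssl.create_default_context()
        with socket.create_connection((host, 443), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host):
                report["https"], report["valid_cert"] = True, True
    except ssl.SSLError:
        report["https"], report["valid_cert"] = True, False   # port answers, handshake/cert fails
    except OSError:
        report["https"], report["valid_cert"] = False, False  # no HTTPS service at all
    # 2. HTTP request: is it redirected to HTTPS, and with which status code?
    try:
        response = requests.get(f"http://{host}/", timeout=timeout, allow_redirects=True)
        first_hop = response.history[0] if response.history else None
        report["redirects_to_https"] = response.url.startswith("https://")
        report["redirect_301"] = first_hop is not None and first_hop.status_code == 301
    except requests.RequestException:
        report["redirects_to_https"] = report["redirect_301"] = False
    # 3. HSTS header on the HTTPS entry page (certificate errors ignored here on purpose)
    if report["https"]:
        try:
            response = requests.get(f"https://{host}/", timeout=timeout, verify=False)
            report["hsts"] = "Strict-Transport-Security" in response.headers
        except requests.RequestException:
            report["hsts"] = False
    return report

# Example (placeholder host): print(assess("www.example.pt"))
```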
In our first assessment, as previously mentioned, we concluded that statistically significant mean differences existed between the group of ‘good or reasonable’ municipalities and the group of ‘minimum or bad’ municipalities for the variables ‘municipal taxes’ (logarithmized) and ‘total population’ (logarithmized). By performing a binary logistic regression, we also concluded that these two variables explained 3.9% of the variance in the classification of the municipalities, and the regression correctly classified only 64.0% of the cases. Even so, because a statistically significant effect was demonstrated, it is relevant to investigate if these two variables are also associated with changes in the classification of the municipalities between the first and the second measurements.
To investigate this matter, we ran T-tests for two dichotomous dependent variables: ‘results improved’ and ‘results got worse’. The indicators ‘municipal taxes’ (logarithmized) and ‘total population’ (logarithmized) were used as independent variables. As for the first measurement, data for these indicators was obtained from the National Statistics Institute and is relative to 2017.
4 Results of Current Assessment We found that 271 municipalities (88%) have a server on TCP port 443 that is able to initiate a TLS session and, possibly, provide the main webpage through HTTPS. The remaining 37 municipalities (12%) do not (see Fig. 2).
Fig. 2. Number of municipalities that provide a HTTPS web server.
Regarding the certificates, we found that 217 HTTPS servers (70%) presented a correct certificate while 54 (18%) presented a wrong certificate (see Fig. 3).
Fig. 3. Number of municipalities presenting correct and incorrect certificates
Among the reasons why the certificates are not correct, we found expired certificates, valid certificates but issued to other entities, self-signed certificates and certificates presented without their certification chain. Notice that in this last group there are certificates issued by proper root Certification Authorities that browsers may consider correct if their certification chains are cached.
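To distinguish these cases programmatically, the certificate presented by a server can be retrieved and inspected. The fragment below is an illustrative example that assumes the third-party cryptography package and a placeholder host name; it is not the validation logic of the framework described in the Methods section.

```python
# Illustrative only: retrieve a server certificate without validating it and
# inspect its expiry date, subject and issuer. Assumes the "cryptography"
# package; the host name in the example is a placeholder.
import socket
import ssl
from cryptography import x509

def peek_certificate(host, port=443, timeout=10):
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE          # retrieval only, no validation
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            der = tls.getpeercert(binary_form=True)
    cert = x509.load_der_x509_certificate(der)   # leaf certificate only, not the chain
    return {
        "subject": cert.subject.rfc4514_string(),
        "issuer": cert.issuer.rfc4514_string(),
        "not_after": cert.not_valid_after,       # expired if earlier than "now"
    }

# Example (placeholder host): print(peek_certificate("www.example.pt"))
```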
Fig. 4. Number of websites providing correct content and different content in HTTPS
Regarding the differences of contents in entry pages provided through HTTP and HTTPS, we found that 222 websites (72%) provide the same content (i.e., the correct content), while 49 (16%) have entry pages with different contents (see Fig. 4). Notice that we classify HTTPS only websites (not having an HTTP version) as providing the correct HTTPS content. On the contrary, we classify websites that only present an HTTP page as providing incorrect HTTPS content. Also, all HTTPS websites were verified regardless of the correctness of the certificates they presented. Comparing the number of websites with correct HTTPS content with the number of websites with correct certificates, we can see that there are websites with correct HTTPS content that present incorrect certificates. Although it is not evident from the numbers, there are websites presenting valid certificates that do not have an HTTPS service. Those websites have valid certificates to allow the redirection of HTTPS requests to HTTP. There is one exception, though, where the presented HTTPS page is in fact different because it shows a message informing that we do not have permission to access the requested page. Regarding the protocol used to obtain the resources in the entry page of the municipal HTTPS websites that present the correct content, we found that 169 (55%) do not fetch resources using HTTP, while 53 (17%) do so (see Fig. 5).
Fig. 5. Use of HTTP resources in entry web pages of HTTPS websites
As it is advised that websites only provide their content using HTTPS, a redirection to HTTPS must occur when some page is requested using HTTP. Regarding these redirections, we found that 161 websites (52%) with correct HTTPS content redirect HTTP requests to the HTTPS version, while 61 (20%) do not (see Fig. 6).
Fig. 6. Number of websites that redirect entry page HTTP request to HTTPS.
The numbers also show that 61 websites (20%) have both HTTP and HTTPS versions. This means that to access those websites using HTTPS a user must explicitly write the URL starting with “https://”. We also found that, from the 161 websites that redirect HTTP requests to HTTPS, only one does not provide the correct content in the HTTPS page: the already mentioned case of a page with a forbidden message.
Furthermore, we found that 140 websites (45%) use the 301 HTTP code, while 21 (7%) use other methodologies (see Fig. 7). Regarding redirections, it is important to note that we found 16 websites that redirect HTTPS requests to HTTP, some using the HTTP redirection code 301. Half of these 16 websites present an invalid certificate, which prevents the redirection from occurring due to the invalid certificate message presented by browsers. The other half presents a valid certificate and the redirection is effectively performed, which indicates that not using HTTPS on the website could be a municipal policy.
Fig. 7. Number of websites using code 301 to redirect HTTP requests to HTTPS.
We also found three municipalities that redirect HTTPS requests to HTTP, followed by a second redirection back to HTTPS. Although we arguably should have, we did not penalize this situation, in order to maintain the criteria used in the previous assessment. Regarding the usage of the HSTS header by municipalities with a correct HTTPS service, we found that only 37 (12%) use this header (see Fig. 8).
Fig. 8. Usage of HSTS in websites with correct HTTPS content.
However, we must mention that 9 out of the 37 websites using HSTS do not include the includeSubDomains directive. This is not a good practice, since it leaves the first request for subdomain pages in those websites unprotected. When applying the Overall Quality indicator to the municipal websites, we found that 31 (10%) are classified as Good, 137 (44%) are classified as Reasonable, 40 (13%) are classified as Minimum and 100 (32%) are classified as Bad (see Fig. 9).
Fig. 9. Classification of the Portuguese municipalities regarding the usage of HTTPS on their websites.
Table 2. Variation on the figures for the assessed parameters.
Parameter                               2018   2019   Evolution
Usage of HTTPS                          259    271    12
Correct certificate                     177    217    40
HTTPS with correct content              168    216    48
No HTTP resources in HTTPS pages        115    169    54
Redirections from HTTP to HTTPS         103    157    54
HTTP to HTTPS redir. with code 301      76     136    60
Usage of HSTS header                    14     36     22
5 Comparison with Previous Assessment Despite the overall results of the 2019 assessment (see Fig. 9) being far from an ideal situation, significant improvements occurred for all the parameters assessed, as can be seen in Table 2. In terms of the overall classification, the variations are illustrated in Fig. 10. There was a significant increase in the number of municipalities classified as Good (20 more) and Reasonable (35 more), and a corresponding decrease in the number of municipalities classified as Bad (42 fewer) and as Minimum (13 fewer).
Fig. 10. Improvements in the overall classification.
Table 3 and Table 4 present the results of the T-tests (see Sect. 3.3). Equal variance was not assumed, based on previous Levene tests. As can be observed, the tests were not significant (p > 0.05). Thus, although municipal taxes and total population were relevant to explain the results of the first measurement (albeit to a very small extent), they are not relevant to explain the classification changes that occurred between the first and second measurements. Therefore, it cannot be said that the ‘size’ of the municipality has any relation to obtaining better or worse results from the first to the second measurement. Other relevant explanatory factors must exist.
Table 3. Results of the T-tests for getting better results.
Dependent variable: Results improved
Independent variables     Sig.***   Mean dif.   Std. err. dif.
Municipal taxes (log)     0.495     −0.0545     0.0797
Total population (log)    0.757     −0.0197     0.0636
*** Two extremities
Table 4. Results of the T-tests for getting worse results.
Dependent variable: Results got worse
Independent variables     Sig.***   Mean dif.   Std. err. dif.
Municipal taxes (log)     0.223     −0.2536     0.1946
Total population (log)    0.222     −0.2160     0.1653
*** Two extremities
6 Discussion of Findings There were significant improvements from the first to the second assessment. Although it cannot be directly proved with the methods used, these improvements might be the result of increased awareness of the problem that may have resulted from the release of the first study. Indeed, the authors’ warnings to ANMP and CNCS, and the subsequent creation of webcheck.pt, may have triggered the improvements. This is also consistent with the fact that changes in the classifications are not explained by the ‘size’ of the municipalities, as we have demonstrated. However, it is not possible to eliminate the hypothesis that the improvements are the result of other phenomena not controlled by the authors. The demonstration of a cause-effect relationship between the disclosure of the first study and the improvement observed in the meantime could only be achieved by direct observation or by assessing the perceptions of relevant actors, for example through interviews or questionnaires. This was not done at this stage of the study.
7 Conclusions and Future Work This study consisted of an assessment of HTTPS usage by all Portuguese municipalities, performed in 2019, after a first assessment made in 2018. We concluded that, despite a significant improvement in the usage of HTTPS on the Portuguese municipalities’ websites, there is still much room for improvement. The number of municipalities that achieved the classification of Good is 31 and, despite an increase of 20, this only corresponds to 10% of the total number of municipalities. Also, despite 42 fewer municipalities being classified as Bad, there are still 100 (32%) with that classification. We also conclude that municipal taxes and total population do not contribute to explain the observed improvements, although they were considered relevant to explain the results of the first assessment. We argue that these improvements could be the result of the increased awareness resulting from the release of the first study.
Acknowledgments. This work was partially funded by National Funds through the FCT Foundation for Science and Technology, in the context of the project UID/CEC/00127/2019.
References 1. Nottingham, M.: Securing the Web: W3C TAG Finding 22 January 2015, W3C Technical Architecture Group (TAG) (2015) 2. Morgan, C.: IAB Statement on Internet Confidentiality, Internet Architecture Board (2014). https://www.iab.org/2014/11/14/iab-statement-on-internet-confidentiality. Accessed 27 Nov 2018 3. Vyas, T., Dolanjski, P.: Communicating the Dangers of Non-Secure HTTP, Mozilla Security Blog (2017). https://blog.mozilla.org/security/2017/01/20/communicating-the-dangers-ofnon-secure-http. Accessed 27 Nov 2018 4. Schechter, E.: A secure web is here to stay, Google Security Blog (2018). https://security. googleblog.com/2018/02/a-secure-web-is-here-to-stay.html. Accessed 27 Nov 2018 5. European Commission, Europa Web Guide (2019). https://wikis.ec.europa.eu/display/ WEBGUIDE/2019.08.30+|+Notes+regarding+EDPS+inspection. Accessed 12 Nov 2019 6. Scott, T.: Policy to Require Secure Connections across Federal Websites and Web Services. In Washighton DC: Executive Office of the President, Office of Management and Budget (2015) 7. CNS - Centro Nacional de Cibersegurança, Arquitetura de segurança das redes e sistemas de informação: Requisitos técnicos. (2019). https://www.cncs.gov.pt/content/files/SAMA2020_ RASRSI_CNCS.pdf 8. Vumo, A.P., Spillner, J., Kopsell, S.: Analysis of Mozambican websites: how do they protect their users?. In: 2017 Information Security for South Africa (ISSA) (2017) 9. Wullink, M., Moura, G.C.M., Hesselman, C.: Automating Domain Name Ecosystem Measurements and Applications. In: 2018 Network Traffic Measurement and Analysis Conference (TMA), Tma, pp. 1–8 (2018) 10. Andersdotter, A., Jensen-Urstad, A.: Evaluating websites and their adherence to data protection principles: tools and experiences. In: IFIP International Summer School on Privacy and Identity Management, Springer, pp. 39–51 (2016) 11. Buchanan, W.J., Woodward, A., Helme, S.: Cryptography across industry sectors. J. Cyber Secur. Technol. 1(3–4), 145–162 (2017) 12. Gomes, H., Zúquete, A., Dias, G. P., Marques, F.: Usage of HTTPS by Municipal Web Sites in Portugal. In: New Knowledge in Information Systems and Technologies. WorldCIST 2019 2019. Advances in Intelligent Systems and Computing, vol. 931, Springer, Cham, pp. 155–164 (2019)
Augmented Reality to Enhance Visitors' Experience at Archaeological Sites

Samuli Laato1(B) and Antti Laato2

1 University of Turku, Turku, Finland
[email protected]
2 Åbo Akademi University, Turku, Finland
[email protected]
Abstract. After archaeological excavations are completed, many of the sites are prepared for visitors by including things such as (1) scientific interpretations of what the uncovered structures might represent, (2) reconstructions of ancient structures and (3) historical items or artifacts found during excavations. In addition to these, technology including augmented reality (AR) can be used to provide additional information on site. We study how currently popular global location-based AR games supplement the visitors' experience at three archaeological sites in the Levant: Tel Hazor, Tel Megiddo and Tel Gezer, by inspecting virtual points of interest (PoIs) in the game Ingress Prime. In the three locations, the virtual PoIs are linked to real world locations; however, they cover only a fraction of the visible archaeological structures. A bias was seen in the PoI names and descriptions towards certain archaeological interpretations. We propose that location-based AR games should utilize as rigorous information about archaeological sites as possible, in order to provide players the possibility to learn real history in an accurate way.
Keywords: Location-based games · Archaeology · AR · Ingress · Levant

1 Introduction
Annually millions travel to see archaeological sites of cultural, historical or religious significance. These sites are typically outdoors and are prepared for visitors after archaeological excavations are completed [8]. Pottery and other smaller artifacts found on the excavation site or nearby may also be put on display, as well as models or reconstructions of predicted historical structures. To supplement the artifacts visible on site, signs, guidebooks, audio-guides, games [18,22] or augmented reality (AR) applications may be created to support the visitors' understanding of the place. This additional material typically includes scientific interpretations of who built the structures that are visible on site and when, and for what purpose.
One particularly interesting new solution in this context is the global location-based games (LBGs), also called AR-games or pervasive games, which augment a virtual world on top of the real world. These games can transform physical PoIs into virtual PoIs and harness the real world as a playground. Many LBGs use virtual PoIs as part of their gameplay, but currently only three games, Ingress, Pokémon GO and Harry Potter: Wizards Unite, link them to real world locations [20,21,35]. Most archaeological sites that are open for visitors contain these PoIs. A common challenge in the archaeological sites of Hazor, Megiddo and Gezer is that there exist competing views among scholars with regards to interpretations of the excavated structures' dating and original purpose. One of the main questions with regards to the three observed locations has been whether or not the great fortification systems with gates mentioned in The First Book of Kings 9:15 can really be dated to the reign of Solomon (i.e. to the time of the united monarchy) or whether they should be dated a little bit later to the Omride dynasty in the kingdom of Israel. At the heart of this debate is the so-called Low chronology for Iron Age I as suggested by Israel Finkelstein in his two articles [14,15], which challenged the united monarchy proposed by William Dever [10] and others. Since then, the discussion has mainly taken the form of a debate between Finkelstein and Mazar (see, e.g., [16,17,24–26]). Methodological assumptions behind Finkelstein's low chronology have also received criticism [19]. This discussion highlights how scholars have dated stratigraphic layers differently at archaeological sites and consequently interpreted the origin and purpose of discovered structures in varying ways. Previous studies have shown that games based on virtual PoIs linked to real world objects provide users the opportunity to learn about their environment [21,29]; however, it has not yet been demonstrated how well these PoIs match real world objects and how large a quantity of important objects they cover. This study aims to answer the following research questions:
1. How well do currently popular global LBGs cover PoIs at archaeological sites in the Levant?
2. What kind of information is given about the PoIs and which scientific interpretations are supported?
By providing answers to these questions this study aims to supplement the findings of previous studies [21,29] and consequently increase understanding of the current educational values that global LBGs provide for visitors to archaeological sites, national parks and outdoor museums.
2 Research Design

2.1 Selecting the Cases
For answering the research questions, three archaeological sites in the Levant were chosen for observation based on the Bible, First Book of Kings Chapter
9 verse 15: "Salomon built Hazor, Megiddo and Gezer". All three places, Tel Hazor, Tel Megiddo and Tel Gezer, have been excavated by archaeologists and are currently open for visitors [27,33]. These sites have ruins of ancient structures which have been discovered in multiple strata, such as those dated to the Late Bronze Age and Iron Age [34]. Tel Hazor and Tel Megiddo have been declared World Heritage Sites, meaning their conservation has been recognised internationally to be of great importance. The PoI database used in Ingress was chosen for analysis as the database is global [35], the virtual PoIs match real world locations [21] and the PoIs are visible to all in the Ingress Intel Map [28]. Besides Ingress, the same PoIs are largely used also in other games such as Pokémon GO and Harry Potter: Wizards Unite [21]. Furthermore, games based on this database have been found to increase players' place attachment [29], providing preliminary evidence of LBGs' potential for enhancing visitors' experience at cultural sites.

2.2 Research Process and Analysis
The three archaeological sites were looked up in the Ingress Intel Map in October 2019. All found PoIs, their titles and locations were recorded, and based on these characteristics, they were mapped to corresponding real world objects. If a PoI title was in a language other than English, such as Arabic or Hebrew, it was translated to English. As a comparison and tool for analysis, information on the sites was obtained from the Israel Nature and Parks Authority [27] website as well as major publications on the archaeological findings and their scholarly interpretations. The virtual PoIs found in Ingress were analysed by looking at (1) what kind of a PoI it is (ruin, sign, model), (2) which time period or stratum it is from, and (3) which archaeological interpretation it represents. The virtual PoIs were then roughly compared to the actual visible structures.
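The recording and coding step described above can be summarised with a small data-structure sketch. This is purely illustrative and not the authors' tooling; the PoI names are examples taken from this paper, while the period and interpretation labels stand in for the manual coding.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class RecordedPoI:
    site: str
    title: str            # translated to English where needed
    kind: str             # ruin, sign, sculpture, ...
    period: str           # time period or stratum, if identifiable
    interpretation: str   # scholarly reading reflected by name/description

pois = [
    RecordedPoI("Tel Hazor", "10 Century BC Salomonic Gate", "ruin", "Iron Age", "Yadin/Ben-Tor"),
    RecordedPoI("Tel Hazor", "The Water System - Tel Hazor", "ruin", "Iron Age", "none stated"),
    RecordedPoI("Tel Megiddo", "Tel Megiddo World Heritage Site", "sign", "modern", "none stated"),
]

# Tally PoIs per site and kind, mirroring the comparison against visible structures.
print(Counter((p.site, p.kind) for p in pois))
```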
3 Results

3.1 Tel Hazor
Major archaeological excavations led by Yigal Yadin took place at Tel Hazor in the 1950s, revealing Bronze and Iron Age structures and evidence of Canaanite and later Israelite settlement [2,3,37,39]. The site has been of interest to biblical scholars, archaeologists and historians [3] and has been studied together with several other similar ancient ruins in the region [36,39]. The largest individual remaining structure in Tel Hazor is an underground water system, which was discovered by Yadin's later 1968–1969 expeditions and has been dated to the Iron Age [38]. Similar water systems have been found in several cities on top of mountains from the same time period [38]. Another major structure is a "Salomonic city gate", the dating of which has been debated by scholars to be either
Fig. 1. Satellite map view provided by Google Maps and current Niantic PoIs in Tel Hazor
from the time of Salomon (10th century BCE) or the Omride dynasty (9th century BCE) [36]. Other structures, mostly interpreted as housing, also remain on site [12], including a typical 8th century BC Israelite four-room house [13,32]. Figure 1 shows the locations and names of all virtual PoIs (4) of Tel Hazor currently in Ingress. Two of the PoIs, 10 Century BC Salomonic Gate and The Water System - Tel Hazor, point to ancient historical artifacts. Yaco 'Bob' The Watchman shows a modern art piece depicting an ancient Israelite guard, and the final PoI, Tel Hazor-National Park, is a reference to the entire site. It is evident that these PoIs only lightly touch the historical depths of this location, as multiple structures such as the Israelite four-room house are not included as virtual PoIs and the information on the existing PoIs is limited. For example, with regards to the 10 Century BC Salomonic Gate, only the interpretation of Yadin and Ben-Tor is shown.

3.2 Tel Megiddo
Tel Megiddo is a world heritage site located on a mountain in the middle of the Jezreel plains and has been featured a few times in pop culture due to its being referenced in an eschatological context in The Revelation of John when talking about Armageddon and the apocalypse [7]. Tel Megiddo features a 35 m deep water system [23] from the Iron Age period, similar to those found in Hazor and Gezer [38], as well as the ruins of a great temple dated to the early Bronze Age (3000 BCE) [1]. Tel Megiddo has arguably the most detailed data in all of the Levant for the period from the Late Bronze Age (3000 BCE) to the Iron Age (750 BCE) and thus has unparalleled historical value [34]. In Fig. 2 all found virtual PoIs (8) in Tel Megiddo are depicted. Seven of them are named in English and the final one is in Hebrew, representing a city gate. Three of the PoIs are signs: Tel Megiddo, Tel Megiddo World Heritage Site and Tel Megiddo National Park. Then there are three sculptures: Battle Ready Chariot Sculpture, Chariot Sculpture and Salomon's Stabled Horse. Unlike in Hazor, the
Fig. 2. Satellite map view provided by Google Maps and current Niantic PoIs in Tel Megiddo.
PoIs in Megiddo do not offer direct references to ancient structures except for the city gate. For example, neither the "Salomonic Gate" nor the water system is a PoI, and neither is the ruin of the Bronze Age (3000 BCE) Canaanite temple [1]. Thus, we conclude that the virtual PoIs in Megiddo provide hardly any connection to the historical depths of the location.

3.3 Tel Gezer
Ancient Gezer was an important strategic area due to its geographical location guarding the Via Maris, the Valley of Aijalon and the trunk road leading to Jerusalem [11]. Excavations began at the site in 1902, led by Robert Alexander Stewart Macalister, and lasted seven years [9]. More excavations have since taken place, such as Alan Rowe's six-week campaign in 1934 and The Hebrew Union College Excavations in 1964–1966 [9]. Structures from multiple strata dating to the Late Bronze Age and Iron Age have been discovered at the location [11,30,39], including a Salomonic four-entryway city gate similar to those also found in Tel Hazor and Tel Megiddo [11]. However, the Gezer gate is a bit different in that it is based on a square plan instead of a rectangular one [36]. A Canaanite water tunnel has also been found in the ruins, along with a Masseba stone structure and many other smaller structures. In Fig. 3 we see the Ingress PoIs that are currently located at Tel Gezer. These PoIs are in Hebrew and are roughly translated, going clockwise, as (1) Sheikh Aljazarli's Tomb, (2) Area of Worship: Masseba Site, (3) Salomon Gate, (4) Canaanite Gate, (5) Water System, (6) Map of the vicinity of Tel Gezer and (7) Gezer Calendar. Compared to the other two observed locations, Tel Gezer has the largest quantity of virtual PoIs representing ancient structures.
Yet, for example, the debate regarding the chronology of the visible structures is not reflected. As with the virtual PoIs in Tel Hazor, Finkelstein's Iron Age low chronology [14,15] is dismissed.
Fig. 3. Satellite map view provided by Google Maps and current Niantic PoIs in Tel Gezer
4 Discussion

4.1 Key Findings
We summarize our findings with three points:
– Ingress PoIs represent only a fraction of the visible archaeological structures in all three observed sites.
– Scholars have proposed varying interpretations regarding the observed structures and their chronological origins. This debate is not visible in the observed PoIs.
– PoIs represent structures from different strata, but are all displayed on the same level. Thus, the visitor does not get support from the game in understanding the chronology of their observations.
Regardless, games based on the Ingress PoIs can bring relevance to these sites by helping players find the site, guiding players through the site (though currently sub-optimally due to the limited number and accuracy of existing virtual PoIs), offering players short snippets of information regarding the real world locations the virtual PoIs represent, and providing players a fun game and motivation to travel to these sites.

4.2 How Can AR Support Archaeological Sites in the Future?
To utilize AR more optimally in archaeological locations, more cooperation between technology developers and scholars is needed. Currently, archaeologists are not harnessing AR to its fullest potential and the observed game Ingress is not using the scholarly information of archaeologists adequately. Based on the results of this study, we propose three areas of improvement.
Increasing the Quality and Quantity of Virtual PoIs. As Ingress PoI submission and review are crowdsourced, there is variance in the quality of PoIs depending on the area [20,21]. Virtual PoIs in location-based AR games should cover the key real life PoIs on the site to support learning of real history. Ingress currently allows PoIs to have a short description and photos in addition to their name and location, which can be used to contain relevant information. We propose that for world heritage sites, such as Tel Megiddo, location-based AR game developers should cooperate with local authorities and scholars to create virtual PoIs that better serve the location.
Informing Visitors of Differing Scholarly Interpretations. As scholars sometimes disagree on interpretations of archaeological evidence, it is important to accurately present the evidence for all cases to visitors. Table 1 shows the interpretations of the Hazor findings from strata X and IX according to two different schools: Yadin and Ben-Tor, and Finkelstein. When reconstructing structures from these layers, only one reconstruction can be presented at the correct location using traditional means. AR solves this issue as the reconstructions are digital and can be switched at will (a minimal sketch of this idea follows Table 1). For example, a broken ancient wall, which depending on the theory was either an arch or just a wall, can be displayed as both to the user using AR.

Table 1. Comparison of chronological and historical explanations of ruins and artifacts discovered in Strata X and IX in Tel Hazor

Yadin and Ben-Tor
Stratum | Dating                        | Historical setting
X       | 10th Century BCE              | Salomon
IX      | Late 10th, Early 9th          | Israelite

Finkelstein
Stratum | Dating                        | Historical setting
X       | Early 9th Century BCE         | Israel: Omrides
IX      | First half of 9th Century BCE | Israel: Omrides
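As a rough illustration of the interpretation switching described above, the following sketch maps a stratum and a scholarly school to the reconstruction an AR client could render. It is an assumption about how such a lookup might be organised, not an existing application; the model file names are hypothetical, while the dating and setting labels follow Table 1.

```python
# Hypothetical lookup: (stratum, school) -> reconstruction metadata for the AR view.
RECONSTRUCTIONS = {
    ("X", "Yadin/Ben-Tor"):  {"dating": "10th Century BCE", "setting": "Salomon", "model": "hazor_x_salomonic.glb"},
    ("X", "Finkelstein"):    {"dating": "Early 9th Century BCE", "setting": "Israel: Omrides", "model": "hazor_x_omride.glb"},
    ("IX", "Yadin/Ben-Tor"): {"dating": "Late 10th, Early 9th", "setting": "Israelite", "model": "hazor_ix_israelite.glb"},
    ("IX", "Finkelstein"):   {"dating": "First half of 9th Century BCE", "setting": "Israel: Omrides", "model": "hazor_ix_omride.glb"},
}

def select_reconstruction(stratum: str, school: str) -> dict:
    """Return the digital reconstruction to display for a stratum under the chosen interpretation."""
    return RECONSTRUCTIONS[(stratum, school)]

# A visitor toggles between schools for stratum X without leaving the spot.
for school in ("Yadin/Ben-Tor", "Finkelstein"):
    choice = select_reconstruction("X", school)
    print(school, "->", choice["dating"], "/", choice["setting"])
```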
Differentiating Strata and Visualizing Lost Information. Several excavations such as those that have taken place in Tel Hazor [3–6,8] have revealed structures from multiple time periods across many strata. Furthermore, when archaeologists wish to dig deeper to reveal older structures, they are sometimes forced to remove strata on top. As a result of this process, many excavation sites are left with structures from different strata to display. AR gives the possibility of visiting the same physical place multiple times, each time with a different era lens through which the site can be looked at [31]. An example of this is visualized in Fig. 4 where an observer can see layers of destroyed history through the AR lens.
Fig. 4. Visualizing how destroyed layers, for example, late and early iron age, can be reconstructed and displayed in AR
4.3 Negative Effects of Using AR at Archaeological Sites
Gamification of archaeological sites through AR can also have negative consequences. First, having PoIs at world cultural heritage sites might attract unwanted attention. Players with no regard for the site could in the worst cases cause damage to the place if they focus too heavily on the game and dismiss real life guidance on how to behave. Second, as these sites can be the destination of pilgrims and serious contemplation, roaming LBG players could disturb the atmosphere. On the flip side, this atmosphere might also teach the players to appreciate cultural heritage more. Finally, over-gamifying places might steal visitors' attention too much away from the actual real life sights.

4.4 Limitations
This study is limited by its scope in both the observed AR-solutions and observed locations. We only looked at one AR solution, the location-based game Ingress, and only three major archaeological sites, all from the same geographical area. This study could also be expanded to other AR solutions besides games, and to archaeological sites in other parts of the world. Furthermore, empirical evidence on people playing Ingress while visiting these locations could increase the understanding of how well the existing technologies serve these sites.
5 Conclusions
The positive side of Ingress and other similar apps is that, as they are global, users are not required to download a new museum app for each site they visit, but can instead use the same app everywhere. Despite the observed PoIs of Ingress being linked to real world locations, they present only a fraction of the historically rich structures at Tel Hazor, Tel Megiddo and Tel Gezer. The possibilities
of LBGs and AR for supplementing the visitors' experience at archaeological sites are greater than what the existing solutions offer. Collaboration between AR content designers and archaeological scholars should be increased to enable visitors to accurately learn real history via these technologies.
References 1. Adams, M.J., Finkelstein, I., Ussishkin, D.: The great temple of early bronze one megiddo. Am. J. Archaeol. 118(2), 285–305 (2014) 2. Bechar, S.: Tel hazor: a key site of the intermediate bronze age. Near East. Archaeol. 76(2), 73–75 (2013) 3. Ben-Ami, D.: The iron age I at tel hazor in light of the renewed excavations. Isr. Explor. J. 148–170 (2001) 4. Ben-Tor, A.: Hazor and the chronology of northern israel: a reply to Israel Finkelstein. Bull. Am. Sch. Orient. Res. 317(1), 9–15 (2000) 5. Ben-Tor, A.: Hazor in the tenth century BCE. Near East. Archaeol. 76(2), 105–109 (2013) 6. Ben-Tor, A.: The renewed hazor excavations. Near East. Archaeol. 76(2), 66–67 (2013) 7. Cline, E.H.: The Battles of Armageddon: Megiddo and the Jezreel Valley from the Bronze Age to the Nuclear Age. University of Michigan Press, Ann Arbor (2002) 8. Cohen, O.: Conservation and restoration at hazor. Near East. Archaeol. 76(2), 118–122 (2013) 9. Dever, W.G.: Excavations at gezer. Biblical Archaeologist 30(2), 47–62 (1967) 10. Dever, W.G.: Monumental architecture in ancient Israel in the period of the united monarchy. Studies in the Period of David and Solomon and other Essays, pp. 269– 306 (1982) 11. Dever, W.G.: Solomonic and Assyrian period ‘palaces’ at Gezer. Isr. Explor. J. 35(4), 217–230 (1985) 12. Faust, A.: Socioeconomic stratification in an Israelite city: Hazor vi as a test case. Levant 31(1), 179–190 (1999) 13. Faust, A., Bunimovitz, S.: The four room house: embodying iron age Israelite society. Near East. Archaeol. 66(1–2), 22–31 (2003) 14. Finkelstein, I.: The date of the settlement of the philistines in canaan. Tel Aviv 22(2), 213–239 (1995) 15. Finkelstein, I.: The archaeology of the united monarchy: an alternative view. Levant 28(1), 177–187 (1996) 16. Finkelstein, I.: Hazor and the north in the iron age: a low chronology perspective. Bull. Am. Sch. Orient. Res. 314(1), 55–70 (1999) 17. Finkelstein, I., Piasetzky, E.: The iron age chronology debate: is the gap narrowing? Near East. Archaeol. 74(1), 50–54 (2011) 18. Keil, J., Pujol, L., Roussou, M., Engelke, T., Schmitt, M., Bockholt, U., Eleftheratou, S.: A digital look at physical museum exhibits: designing personalized stories with handheld augmented reality in museums. In: 2013 Digital Heritage International Congress (DigitalHeritage), vol. 2, pp. 685–688. IEEE (2013) 19. Kletter, R.: Chronology and united monarchy. A methodological review. Zeitschrift des Deutschen Pal¨ astina-Vereins, (1953-), 120(1), 13–54 (2004). (43 pages) 20. Laato, S., Hyrynsalmi, S.M., Paloheimo, M.: Online multiplayer games for crowdsourcing the development of digital assets. In: International Conference on Software Business, pp. 387–401. Springer (2019)
21. Laato, S., Pietarinen, T., Rauti, S., Laine, T.H.: Analysis of the quality of points of interest in the most popular location-based games. In: Proceedings of the 20th International Conference on Computer Systems and Technologies, pp. 153–160. ACM (2019) 22. Laine, T.H., Sedano, C.I., Sutinen, E., Joy, M.: Viable and portable architecture for pervasive learning spaces. In: Proceedings of the 9th International Conference on Mobile and Ubiquitous Multimedia, p. 1. ACM (2010) 23. Lamon, R.S.: The Megiddo Water System, vol. 32. University of Chicago Press, Chicago (1935) 24. Mazar, A.: Iron age chronology. a reply to I. Finkelstein. Levant 29(1), 157–167 (1997) 25. Mazar, A.: The spade and the text: The interaction between archaeology and Israelite history relating to the tenth-ninth centuries BCE. In: PROCEEDINGSBRITISH ACADEMY, vol. 1, pp. 143–172. Oxford University Press (2007) 26. Mazar, A.: Archaeology and the biblical narrative: the case of the United Monarchy. BZAW 405; Berlin: de Gruyter (2010) 27. Nature, I., Authority, P.: National parks and nature reserves. Used between 10th of October and 5th of November (2019). https://www.parks.org.il/en/about/ 28. Niantic: Ingress intel map. Used between 10th of October and 22th of October (2019). https://intel.ingress.com/intel 29. Oleksy, T., Wnuk, A.: Catch them all and increase your place attachment! the role of location-based augmented reality games in changing people-place relations. Comput. Hum. Behav. 76, 3–8 (2017) 30. Ortiz, S., Wolff, S.: Guarding the border to Jerusalem: The iron age city of Gezer. Near East. Archaeol. 75(1), 4–19 (2012) 31. Petrelli, D.: Making virtual reconstructions part of the visit: an exploratory study. Digit. Appl. Archaeol. Cult. Herit. 15, e00123 (2019). http://www.sciencedirect.com/science/article/pii/S2212054819300219 32. Shiloh, Y.: The four-room house its situation and function in the Israelite city. Israel Explor. J. pp. 180–190 (1970) 33. Stern, E., Aviram, J.: The New Encyclopedia of Archaeological Excavations in the Holy Land, vol. 5. Eisenbrauns, Winona Lake (1993) 34. Toffolo, M.B., Arie, E., Martin, M.A., Boaretto, E., Finkelstein, I.: Absolute chronology of megiddo, israel, in the late bronze and iron ages: high-resolution radiocarbon dating. Radiocarbon 56(1), 221–244 (2014) 35. Tregel, T., Raymann, L., G¨ obel, S., Steinmetz, R.: Geodata classification for automatic content creation in location-based games. In: Joint International Conference on Serious Games, pp. 212–223. Springer (2017) 36. Ussishkin, D.: Was the “Solomonic” city gate at megiddo built by king solomon? Bull. Am. Sch. Orient. Res. 239(1), 1–18 (1980) 37. Ussishkin, D.: Notes on the middle bronze age fortifications of hazor. Tel Aviv 19(2), 274–281 (1992) 38. Weinberger, R., Sneh, A., Shalev, E.: Hydrogeological insights in antiquity as indicated by canaanite and Israelite water systems. J. Archaeological Sci. 35(11), 3035– 3042 (2008) 39. Yadin, Y.: Solomon’s city wall and gate at Gezer. Israel Explor. J. 8(2), 80–86 (1958)
Design and Performance Analysis for Intelligent F-PMIPv6 Mobility Support for Smart Manufacturing Byung Jun Park and Jongpil Jeong(B) Department of Smart Factory Convergence, Sunkyunkwan University, Suwon, Gyeonggi-do 16419, Republic of Korea {bjunpark,jpjeong}@skku.edu
Abstract. In this paper, we propose a new mobility management network called i-FP, which will be used for smart factories. i-FP was created to address existing local mobility management issues in legacy frameworks. To allow MNs (Mobile Nodes) to move from one domain to another, i-FP uses the three network entities of LFA (Local Factory Anchor), FAG (Factory Access Gateway), and MN as an extension concept of PMIPv6. In i-FP, the three network entities can reduce the handover latency of the MN. In addition, i-FP uses an IP header swapping mechanism to avoid traffic overhead and improve network throughput. To evaluate the performance of i-FP, we compare it with HMIPv6 and PMIPv6, which are legacy protocols for local mobility management. Throughout the entire analysis, i-FP shows superior performance to other network methods used in smart factories.

Keywords: Smart manufacturing · i-FP · Local factory anchor · Factory access gateway · F-PMIPv6

1 Introduction
As the demand for new wireless networks for smart factories explosively increases and new technologies are developed, various hierarchical mobility frameworks are emerging. In the wireless network framework, user mobility is generally divided into inter-domain movement and local (intra-domain) movement. These two types of mobility correspond to two protocols, the global mobility protocol [11] and the local mobility protocol [7–9]. While the global mobility protocol maintains the reachability of the user moving across a wide area beyond the domain, the local mobility protocol supports handover within a restricted domain. When a user within a network connects to another network, the traffic is forwarded back to the original domain in order to manage the network that is accessed from outside for the first time, using the global mobility protocol [1,2]. Then, the
local mobility protocol delivers the traffic within the domain and guarantees that it is successfully transmitted to the user. Supported by both global mobility and local mobility, users can be flexibly provided with high-performance mobility and can enjoy communication. Global mobility protocols such as MIP, HIP [3], HMIPv6 (Hierarchical Mobile IPv6) [4], and F-PMIPv6 (Fast Proxy Mobile IPv6) [5,16] support the mobility management of users. This study focuses on this interplay of global mobility and local mobility. The existing PMIPv6 (Proxy Mobile IPv6) [6] or HMIPv6 protocol requires that traffic reach the top-level gateway in the network in order to determine the destination address. This is an inefficient way to operate, since the same applies to communication between neighboring nodes. In order to improve on the problems contained in the existing protocols, this study suggests a new mobile network protocol and names it i-FP, which means 'intelligent Fast PMIPv6'. When connecting to the web for smart factories [17–21], i-FP improves the performance of existing local mobility protocols. At the same time, it uses IP header swapping technology to avoid network traffic overhead. A performance evaluation is conducted that compares the newly suggested i-FP with other frameworks related to the local mobility protocol. On the basis of the results, it will be shown that i-FP is the most effective scheme for local domains. The paper is organized as follows: Sect. 2 discusses the associated studies, and Sect. 3 discusses the architecture and operating procedures of the proposed technique. Section 4 evaluates the performance of the proposed technique, and Sect. 5 concludes based on the results of the performance assessment.
2 Related Work
FPMIPv6 [5] applied FMIPv6 fast handover scheme in PMIPv6 environment. It detects mobile signal before MN (Mobile Node) moves and detects pMAG (previous Mobile Access Gateway) to prepare handover of the MN by using HI (Handover Initiate) and HAck (Handover Acknowledgment) message for information transmission of MN. At this time, a tunnel is formed between pMAG and nMAG, and data from LMA (Local Mobile Anchor) is buffered from pMAG to nMAG while communication with MN is disconnected. When the MN connects to the nMAG and the communication is connected, the nMAG transmits packet data buffered by the nMAG to the MN, thereby preventing packet data loss caused by disconnecting the connection at the time of handover, thereby maintaining communication. The p-AN receiving the report message informs pMAG of the movement of MN by transmitting HI message including MN-ID and n-AN ID. The pMAG transmits the HI message containing information of the MN and the LMA to the nMAG. After the pMAG receives the HAck message, it forms a bidirectional tunnel between pMAG and nMAG. From the point of time when the bidirectional tunnel
is established, pMAG transmits the packet that the LMA sent to the MN to the nMAG, and the nMAG puts the packet received the pMAG into the buffer. When the MN is connected to the nMAG after the L2 handover, the nMAG transmits the stored packet to the MN and transmits a PBU (Proxy Binding Update) message to the LMA for binding the MN. Therefore, when MN moves to nMAG and Predictive mode fails and nMAG is connected, nMAG sends HI message to pMAG, and pMAG that receives HI message sends HAck message to nMAG in response. As in the Predictive mode, a bidirectional tunnel is formed between pMAG and nMAG, and nMAG buffers and stores packets of MN. In addition, nMAG sends a PBU message to the LMA for binding of the MN. The LMA sends the PBA message to nMAG to complete the binding process. The contents are shown in Fig. 1.
Fig. 1. Handover operation of the PMIPv6.
PMIPv6 [2] is a protocol that supports the mobility of IP-based signaling mobile nodes without involvement or utilization of mobile nodes for mobility support in the network. All mobility signaling and routing state settings are made by the mobility entity in the network. The main functional entities in this scheme are LMA and MAG. The LMA is responsible for the reachability state of the mobile node and the topology anchor point of the mobile node HNP (Network Prefix). The MAG is an access link to which the mobile node is connected and performs mobility management on behalf of the mobile node. The role of the MAG is to detect that the mobile node is entering and leaving the access network and initiating the binding registration to the mobile node’s LMA. HMIPv6 [4] is a method proposed by the IETF as one of the methods to reduce the handover delay that occurs when a mobile node moves in MIPv6. The access network is hierarchically structured in HMIPv6 to solve this problem. HMIPv6 can reduce signaling costs due to user mobility and scalability in the
growing network in managing local mobility, and has separated global mobility management from local mobility management. Related mobility schemes include MIP [10], DMA [11], and PMIPv6 [6,13–15]. The hierarchical structure is based on a mobility agent; rather than all routers being involved in mobility signaling, a mobility agent acting as a topology anchor point and an access router acting as an external agent are introduced.
3 Intelligent Hierarchical Mobility Support for Smart Manufacturing

3.1 Network Architecture
In order to achieve better performance than HMIPv6 and PMIPv6, i-FP uses the three major entities of PMIPv6 as they are, that is, the LFA (Local Factory Anchor), the FAG (Factory Access Gateway), and the MN, and gives them additional functions to improve performance. Figure 2 shows the system structure of i-FP.
Fig. 2. Concept of i-FP used in smart factory communication
The LFA is the gateway router of a local domain. The LFA plays the same role as a proxy HA (Home Agent) for the MN. When the MN moves in the local domain, the LFA receives the traffic on behalf of the MN and transmits the traffic to the link where the MN is located. In order to realize this, i-FP uses two kinds of addresses - RCoA (Regional Care of Address) and LCoA (on-Link Care of Address) - which HMIPv6 also uses as a way to manage the MN. RCoA is the address
which is gained from the NN when the MN enters the local domain for the first time. RCoA plays the role of an identification card which can prove the identity of the MN in the local domain. The MN uses the RCoA as a location indication to update the HA [12] or communicating peers [3]. When the MN moves within the local domain, the RCoA remains fixed.
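The following sketch illustrates, under stated assumptions, how an RCoA-to-LCoA binding at the LFA could support the IP header swapping idea mentioned in the introduction: the destination address is rewritten rather than the packet being encapsulated. It is not the authors' implementation, and the addresses are made-up examples.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    src: str
    dst: str
    payload: bytes

class LFA:
    def __init__(self):
        self.bindings: dict[str, str] = {}   # RCoA -> current LCoA

    def update_binding(self, rcoa: str, lcoa: str) -> None:
        # Refreshed whenever the MN attaches to a new FAG inside the domain.
        self.bindings[rcoa] = lcoa

    def forward(self, pkt: Packet) -> Packet:
        # Header swapping: rewrite the destination from the fixed RCoA to the
        # MN's current on-link address; no extra tunnel header is added.
        lcoa = self.bindings.get(pkt.dst)
        return Packet(pkt.src, lcoa, pkt.payload) if lcoa else pkt

lfa = LFA()
lfa.update_binding("2001:db8:1::100", "2001:db8:a::55")
print(lfa.forward(Packet("2001:db8:ff::9", "2001:db8:1::100", b"data")).dst)
```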
3.2 Operation Procedure
Adding the cloud concept to the smart factory requires the network architecture configuration shown in Fig. 2. For the architecture that makes up cloud-based fog computing, the network that connects to the cloud must be configured, and there must be an application that responds to the configured network. The cloud server is configured using OpenStack, and fog computing is positioned at the edge of the server to configure the cloud-based fog computing. IoT data will also be stored in real time on the configured cloud storage. When the necessary gateways and servers are configured around the real-time storage, they act as controllers by configuring nodes at the application end. Finally, the IoT sensing data is collected through the gateway and stored and analyzed by the server. In the application layer that uses the stored and analyzed data, a server is configured for each node, and real-time processing applications become possible. In the following section, a performance evaluation is performed by mathematical modeling of the two existing methods, HMIPv6 and PMIPv6, and the newly proposed i-FP method. Under the same conditions, we analyzed how much each technique can reduce the cost generated by the network. Each cost is defined by the hop distance in terms of message size and bandwidth. According to this definition, the cost of router processing is not considered; for the analytical model, Table 1 defines the parameters of the mobile protocols used for the performance analysis.
4 Performance Analysis

4.1 Number of Routing Hops
The first performance analysis concerns the number of traffic routing hops, a measure of the degree of transmission delay. One important objective of i-FP is to provide an optimal routing path for traffic within the domain. This section shows the number of routing hops of intra-domain traffic in the three protocols, and at the same time the transmission delay of the intra-domain traffic is compared. The number of routing hops for inter-domain traffic is the same for all three protocols. When a packet is sent from the CN, the LFA, as the domain's GR (Global Router), receives the packet first. At that time, the GR sends the packet to the AR to which the MN is connected.

H^{HMIPv6}_{Inter} = H_{CN-GR} + H_{GR-AR} + H_{AR-MN}    (1)
Table 1. Parameter values for performance analysis

Parameter        | Value | Parameter        | Value
H CN TO GR       | 2     | H AR1 TO AR2     | 1
H GR TO AR       | 1     | i-FP BU          | 96
H AR TO MN       | 1     | i-FP BA          | 96
H MN1 TO AR1     | 1     | i-FP RouterSol   | 44
H AR1 TO GR1     | 1     | i-FP RouterAdv   | 68
H GR1 TO AR2     | 1     | i-FP BEU         | 142
H AR2 TO MN2     | 1     | HMIPv6 RBU       | 80
HMIPv6 RBA       | 60    | PMIPv6 PBA       | 88
HMIPv6 RouterSol | 44    | PMIPv6 RouterSol | 44
HMIPv6 RouterAdv | 68    | PMIPv6 RouterAdv | 68
PMIPv6 PBU       | 88    | T FAG TO LFA     | 100
D L2             | 100   | W FAG TO LFA     | 300
A                | 10    | T MN TO LFA      | 200
T MN TO MAP      | 100   | L1 PHeader       | 100
W MN TO MAP      | 300   | H MAP TO MN      | 2
H LFA TO FAG     | 1     | U                | 10000
R                | 1000  |                  |
H^{PMIPv6}_{Inter} = H_{CN-GR} + H_{GR-AR} + H_{AR-MN}    (2)

H^{i-FP}_{Inter} = H_{CN-GR} + H_{GR-AR} + H_{AR-MN}    (3)
The AR sends the packet to the MN via the wireless link. Therefore, the number of routing hops can be expressed as in Eqs. (1)–(3). H_{X-Y} means the number of routing hops between node X and node Y. In HMIPv6 and PMIPv6, packets are required to be transmitted via the GR. The GR encapsulates the packet and transmits it to the current location of the MN. Therefore, intra-domain traffic in HMIPv6 and PMIPv6 causes triangular routing problems. However, in i-FP, the number of routing hops in the intra domain is different from the existing methods. If an MN in i-FP tries to send a packet to another MN, the packet arrives at AR1 and AR1 sends the traffic to AR2, where MN2 is located. Finally, AR2 transmits the packet to MN2. Since the packet is transmitted on the shortest path, the number of routing hops of i-FP is smaller than that of HMIPv6 and PMIPv6. Equations (1), (2), and (3) represent the number of inter-domain routing hops of HMIPv6, PMIPv6, and i-FP, and (4), (5), and (6) represent the number of routing hops within a domain.

H^{HMIPv6}_{Intra} = H_{MN1-AR1} + H_{AR1-GR1} + H_{GR1-AR2} + H_{AR2-MN2}    (4)

H^{PMIPv6}_{Intra} = H_{MN1-AR1} + H_{AR1-GR1} + H_{GR1-AR2} + H_{AR2-MN2}    (5)

H^{i-FP}_{Intra} = H_{MN1-AR1} + H_{AR1-AR2} + H_{AR2-MN2}    (6)
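A small numerical sketch of Eqs. (1)–(6), evaluated with the per-link hop values from Table 1, is given below. This is an illustration we added, not the authors' code.

```python
# Hop counts of Eqs. (1)-(6), using the per-link values from Table 1
# (H_CN-GR = 2, every other single link = 1).
HOPS = {"CN-GR": 2, "GR-AR": 1, "AR-MN": 1,
        "MN1-AR1": 1, "AR1-GR1": 1, "GR1-AR2": 1, "AR2-MN2": 1, "AR1-AR2": 1}

def inter_domain_hops() -> int:
    # Eqs. (1)-(3): identical for HMIPv6, PMIPv6 and i-FP
    return HOPS["CN-GR"] + HOPS["GR-AR"] + HOPS["AR-MN"]

def intra_domain_hops(protocol: str) -> int:
    if protocol in ("HMIPv6", "PMIPv6"):
        # Eqs. (4)-(5): the packet is relayed through the GR (triangular routing)
        return HOPS["MN1-AR1"] + HOPS["AR1-GR1"] + HOPS["GR1-AR2"] + HOPS["AR2-MN2"]
    # Eq. (6): i-FP forwards directly between access routers
    return HOPS["MN1-AR1"] + HOPS["AR1-AR2"] + HOPS["AR2-MN2"]

for proto in ("HMIPv6", "PMIPv6", "i-FP"):
    print(proto, "inter:", inter_domain_hops(), "intra:", intra_domain_hops(proto))
```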
4.2 Signaling Cost
The signaling cost is the amount of packets used when the MN updates its location. It includes the RS (Router Solicitation), BU (Binding Update), and BA (Binding Acknowledgment) messages. The protocol signaling cost is also caused by the MN changing networks. This section discusses the protocol signaling cost C_S, which is an additional cost arising from the handover procedure. Four parameters are used; p is the probability that one handover occurs during the unit time t.

C_S = \sum_{n=1}^{\infty} n \cdot p^n \cdot (1 - p) \cdot m \cdot \frac{s}{t}    (7)

Given the handover probability p, the probability that the MN stays in the domain is 1 - p. s is the total size of the protocol packets used in the handover procedure, m is the number of mobile nodes in the domain, and t is the unit time. The signaling costs of HMIPv6, PMIPv6, and i-FP can then be expressed as (8), (9), and (10):

C_S^{HMIPv6} = \sum_{n=1}^{\infty} n \cdot p^n \cdot (1 - p) \cdot m \cdot \frac{RBU + RBA + RS + RA}{t}    (8)

C_S^{PMIPv6} = \sum_{n=1}^{\infty} n \cdot p^n \cdot (1 - p) \cdot m \cdot \frac{RBU + RBA + RS + RA}{t}    (9)

C_S^{i-FP} = \sum_{n=1}^{\infty} n \cdot p^n \cdot (1 - p) \cdot m \cdot \frac{RS + RA}{t}    (10)
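The sketch below evaluates a truncated form of Eqs. (8)–(10) with the binding-message and router solicitation/advertisement sizes from Table 1 (the PMIPv6 PBU/PBA entries are used as its binding-message sizes). The handover probability p, the number of mobile nodes m and the unit time t are illustrative values we chose, not values from the paper.

```python
# Illustrative evaluation of the signaling-cost model; not the authors' code.
SIZES = {
    "HMIPv6": {"BU": 80, "BA": 60, "RS": 44, "RA": 68},   # RBU, RBA, RouterSol, RouterAdv
    "PMIPv6": {"BU": 88, "BA": 88, "RS": 44, "RA": 68},   # PBU, PBA, RouterSol, RouterAdv
    "i-FP":   {"RS": 44, "RA": 68},                       # only RS/RA, per Eq. (10)
}

def signaling_cost(protocol: str, p: float, m: int, t: float, n_max: int = 500) -> float:
    """Truncated sum_{n=1..n_max} n * p^n * (1 - p) * m * s / t."""
    s = sum(SIZES[protocol].values())   # total signaling bytes per handover
    return sum(n * p**n * (1 - p) * m * s / t for n in range(1, n_max + 1))

for proto in ("HMIPv6", "PMIPv6", "i-FP"):
    print(proto, round(signaling_cost(proto, p=0.3, m=50, t=1.0), 1))
```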
4.3 Numerical Results
A performance evaluation was carried out to verify the performance differences between HMIPv6, PMIPv6 and i-FP under various conditions, and numerical results were obtained for routing hops, traffic signaling cost, handover delay, and traffic overhead. We analyze the numerical results of each evaluation metric in the order mentioned. Figure 3 shows the average number of routing hops for the three protocols. i-FP has the minimum number of routing hops, and the average number of routing hops of PMIPv6 is smaller than that of HMIPv6. δ represents the ratio of the intra-domain traffic F_intra to the sum of the inter-domain traffic F_inter and the intra-domain traffic F_intra, which means δ = F_intra / (F_inter + F_intra). In the figure, we can see that the average number of routing hops of i-FP is lower than those of the other two protocols.
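One plausible way to read the average number of routing hops as a function of δ is a traffic-weighted mix of the intra- and inter-domain hop counts from Eqs. (1)–(6); the weighting below is our assumption about how such a curve could be produced, not a formula stated in the paper.

```python
# Weighted-average hop count as a function of the intra-domain traffic ratio delta.
def average_hops(delta: float, h_intra: int, h_inter: int = 4) -> float:
    """delta * H_intra + (1 - delta) * H_inter, with H_inter = 2 + 1 + 1 from Table 1."""
    return delta * h_intra + (1.0 - delta) * h_inter

for delta in (0.0, 0.25, 0.5, 0.75, 1.0):
    # HMIPv6/PMIPv6: H_intra = 4 (Eqs. 4-5); i-FP: H_intra = 3 (Eq. 6)
    print(f"delta={delta:.2f}  HMIPv6/PMIPv6={average_hops(delta, 4):.2f}  i-FP={average_hops(delta, 3):.2f}")
```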
Fig. 3. The average number of routing hops
5 Conclusion
This paper proposed the new i-FP technique based on the PMIPv6 network, which overcomes the limitations of the existing techniques. This research analyzed and evaluated the total costs of HMIPv6, PMIPv6, and i-FP, and showed that i-FP is the scheme with the best cost efficiency, because its costs of packet data transmission, handover and signaling, and its traffic overhead, are the lowest among the compared methods in local domains. In addition, this study confirmed that with i-FP the data loss is small and the delay time is almost negligible, and that i-FP is a mobile network protocol framework which also supports handover between domains. i-FP improves on the problems of the existing techniques, and its cost is comparatively low as well, so that its level of satisfaction could be high. This is the basis for judging that i-FP is the most appropriate solution for the local mobility network environment. Additional studies will be carried out to analyze its performance in comparison with other existing techniques which have not yet been considered.
Acknowledgment. This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2017R1A6A3A11035613). This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2020-2018-0-01417) supervised by the IITP (Institute for Information & communications Technology Promotion). This work was supported by the Industrial Cluster Program funded by the Ministry of Trade, Industry and Energy (MOTIE, Korea) and the Korea Industrial Complex Corporation [Project Number: SKN19ED].
References 1. Deering, S., Hinden, R.: Internet Protocol, Version 6, NTWG RFC 2460 (1998) 2. Narten, T., Nordmark, E., Simpson, W., Soliman, H.: Neighbor Discovery for IP version 6, NTWG RFC 4861 (2007) 3. Moskowitz, R., Nikander, P., Jokela, P., Henderson, T.: Host identity protocol, IETF RFC 5201 (2008) 4. Soliman, H., Castelluccia, C., ElMalki, K., Bellier, L.: Hierarchical mobile IPv6 (HMIPv6) mobility management. IETF RFC 5380 (2008) 5. Yokota, H., Chowdhury, K., Koodli, R.: Fast Handovers for Proxy Mobile IPv6, IETF RFC 5949 (2010) 6. Gundavelli, S., Leung, K., Devarapalli, V., Chowdhury, K., Patil, B.: Proxy mobile IPv6, IETF RFC 5213 (2008) 7. Lim, T., Yeo, C., Lee, F., Le, Q.: TMSP: terminal mobility support protocol. IEEE Trans. Mobile Comput. 8(6), 849–863 (2009) 8. Valko, A.: Cellular IP: a new approach to internet host mobility. ACM SIGCOMM Comput. Commun. Rev., vol. 29 (1), pp. 50–65 (1999) 9. Ramjee, R., Varadhan, K., Salgarelli, L., Thuel, S., Wang, S., La Porta, T.: HAWAII: a domain-based approach for supporting mobility in wide-area wireless networks. IEEE/ACM Trans. 10(3), 396–410 (2002) 10. Das, S., Misra, A., Agrawal, P.: TeleMIP: telecommunications enhanced mobile IP architecture for fast intradomain mobility. IEEE/ACM Trans. IEEE Personal Commun. FAG. 7(4), 50–58 (2000) 11. Saha, D., Mukherjee, A., Misra, I., Chakraborty, M.: Mobility support in IP: a survey of related protocols. IEEE Pers. Commun. IEEE Netw. 18(6), 34–40 (2004) 12. Johnson, D., Perkins, C., Arkko, J.: Mobility support in IPv6. In: Proceedings of the IETF RFC 3775 (2004) 13. Kempf, J.: Problem statement for network-based localized mobility management (NETLMM). IETF RFC 4830 (2007) 14. Kempf, J.: Goals for network-based localized mobility management (NETLMM). IEFT RFC 4831 (2007) 15. Vogt, C., Kempf, J.: Security threats tonetwork-based localized mobility management (NETLMM). IETF RFC 4832 (2007) 16. Oh, D.K., Min, S.-W.: A fast handover scheme of multicast traffics in PMIPv6. ICS Inform. Mag. 36(3), 208–213 (2011) 17. Lee, J.H., Jeong, J.P.: A novel multicasting-based mobility management scheme in industrial mobile networks towards smart manufacturing. IEEE IEMCON 2019, 833–838 (2019) 18. Kim, J.H., Jeong, J.P.: Design and performance analysis of an industrial iot-based mobility management for smart manufacturing. IEEE IEMCON 2019, 471–476 (2019) 19. La, S.H., Jeong, J.P.: On intelligent hierarchical F-PMIPv6 based mobility support for industrial mobile networks. In: Proceedings of the 16th International Conference on Mobile Systems and Pervasive Computing (MobiSPC), Vol. 155, pp. 169-176, August 2019 20. Park, D.G., Jeong, J.P.: A novel SDN-based cross handoff scheme in industrial mobile networks. In: Proceedings of the 6th International Symposium on Emerging Inter-networks, Communication and Mobility (EICM), pp. 642-647, August 2019 21. Kim, J. A., Park, D.G., Jeong, J.P.: Design and performance evaluation of costeffective function-distributed mobility management scheme for software-defined smart factory networking. J. Ambient Intell. Humanized Comput. 1-17 (2019)
Development of Trustworthy Self-adaptive Framework for Wireless Sensor Networks Sami J. Habib(&) and Paulvanna N. Marimuthu Computer Engineering Department, Kuwait University, P.O. Box 5969 Safat, 13060 Kuwait City, Kuwait [email protected]
Abstract. Wireless sensor networks (WSN) deployed for outdoor monitoring face many problems due to harsh external environment. Transmission loss is one such problem under weather extremities, which may induce erroneous decisions or a complete data loss. We have developed a self-adaptive trustworthy framework, which utilizes self-awareness of the environment and trustworthiness of the sensor to select alternate transmission channel with suitable transmission powers to manage the current environmental conditions. Each sensor channel is partitioned with varying transmission powers to boost the received signal strength and thereby ensuring an improved data delivery. In this paper, we have selected temperature and wind velocity as the environmental parameters to monitor, as their combined extremities produce unfavorable conditions for wireless transmission at 2.4 GHz. The framework has ensured the trust of the communicating sensor by checking its retransmission history and battery performance, as the selected environment parameters have direct influence on battery lifespan and quality of data delivery. Our framework has devised the possible impacts due to the combined effects of the selected weather extremities into four categories as no-loss, sub minimal loss, minimal loss and medium loss to partition the channel accordingly with 0%, 4%, 6% and 10% increased transmission powers. Our experiments on sensors tested under two sets of environmental data show an average of 5% improvement in data delivery after redesigning the data transmission channel; however, with 2% increase in battery consumption due to the gusty environments. Keywords: Self-adaptive Self-aware Battery management Transmission loss Trust management Uncertain environment Wireless sensor networks
1 Introduction
Wireless sensors deployed for outdoor monitoring face many environmental challenges while transmitting data wirelessly to the sink node. The changes in weather conditions due to rain, snow, fog, dust storm, vegetation, temperature, humidity, and so on produce a complex impact on wireless transmission, thereby producing transmission errors and data losses. Sometimes the impact may worsen, leading to a complete data loss or a transmission link failure. The partial or complete data loss may be due to severe weather conditions, such as a dust storm, a dense foggy environment or heavy snow, and also the presence of obstacles, such as a tree or mountain in the transmission path. With gusty
environments, the number of retransmissions gets increased; moreover, the sensors' battery lifetime also gets decreased with increasing temperatures and retransmissions, thus degrading the sensor functioning and its reliability [1]. One way to guarantee the reliability of a data transmission is to study how resilient the sensors are to the weather extremes. A very high temperature with strong wind, or a high temperature with humid weather, is common in the Middle Eastern region during the summer season, which forms a threat to radio frequency signal transmission at 2.4 GHz. Furthermore, the sensor's battery performance also suffers at high temperatures, as the chemical activity inside the battery increases with increasing temperatures, which increases the rated current and decreases the battery lifespan. The released energy in a battery is based on the conversion of chemical energy into electrical energy; thus, a micro change in chemical compounds caused by high temperatures causes an accountable change in battery characteristics [1], thereby influencing the reliability of sensors. Recently, self-awareness has come to play a major role in the maintenance management of engineering infrastructures, where the awareness is implanted into the system by deploying appropriate sensors to monitor the status of various parts of the system; moreover, the notion of operating (external) environment extremities improves the functioning in ways not considered during the design phase. Thus, self-awareness of sensor functionalities under varying environmental conditions is necessary for designing a self-adaptive system, which redesigns the data transmissions for improved data delivery. By adding sensor reliability to the self-adaptive system, the framework may guarantee reliable transmitted data, as trustworthiness plays a vital role when there is some degree of uncertainty in the sensor functioning [2]. Thus, the awareness of functionalities together with the history of system behaviors helps to build a trustworthy self-adaptive system, which observes its environment and redesigns the underlying process if the system is found to be trustworthy. In this paper, we have developed a trustworthy, self-adaptive framework for sensor communications, where we have utilized self-awareness to sense the environment and trustworthiness to check the sensor reliability before redesigning data transmissions under uncertain environmental conditions. We have utilized our expertise in redesigning [3–5] and in trust calculations [6–8] to develop the proposed self-adaptive framework, where the efficiency of the trusted transmissions is improved. We have selected temperature and wind speed as the environmental parameters to be aware of, as their combined extremities produce gusty environments which are unfavorable for wireless transmission at 2.4 GHz. We have formulated the trustworthiness model utilizing the sensor's retransmission history and its battery performance to ensure the reliability of the communicating sensor, as these parameters are greatly affected by the increase in temperature and wind speed, and their effects are reflected in the quality of data delivery. We have selected the weather reports for the months of February and June [24] as testing scenarios, as these two data sets reflect the normal and extreme weather conditions, respectively, in the Middle East region.
Each sensor channel is partitioned into four with varying transmission powers [9] to boost the signal strength by 0%, 4%, 6% and 10% to manage the no-loss, sub-minimal loss, minimal loss and medium loss situations respectively. Our experimental results on analyzing the battery power consumption and the improved data delivery efficiency under extreme conditions have validated our proposed framework, showing reliable transmissions with a 5% increase in signal quality.
2 Related Work
The vision of autonomy in procedurally designed engineering systems paves the path to self-aware systems, so as to avoid human errors. Kephart and Chess [10] perceived autonomic computing systems as self-governing systems, capable of centering on administrator-defined goals/objectives. Self-awareness has found applications in many fields, from service provisioning to architecture design in engineering and computing software systems. A self-aware computing system should be capable of incorporating awareness at various levels of the computing system, such as the processor architecture, operating system, compiler, programming library, and application [11]. Marinescu et al. [12] introduced a bio-inspired self-organizing model for distributed systems, where they studied the quantitative and qualitative behavior of systems. Emmanouilidis and Pistofidis [13] developed a self-aware system for condition monitoring and fault detection in a chemical pump plant. In a self-aware machinery application, Write et al. [14] developed a MEMS-based accelerometer for monitoring machine tool vibrations. In health monitoring, Sterritt et al. [15] developed a self-aware system to locate faulty architecture in a pulse monitoring system, and Maitland and Arthur [16] presented a self-monitoring system for cardiac rehabilitation. In addition to self-awareness, self-adaptive systems are developed to improve the outcome of a system by redesigning the system functionalities according to the operating environment. Silva et al. [17] proposed a self-adaptive, energy-aware sensing scheme for WSNs (e-LiteSense), where the data gathering process is self-adjusted according to the context of the WSN. Das et al. [18] presented a self-detection and healing (SDH) mechanism to overcome the misbehaving of nodes due to shrinkage in coverage under changing environmental parameters. Here, the authors utilized the self-detection mechanism to be conscious of environmental parameters and the self-healing mechanism to improve throughput by utilizing various packet success rates with changing modulation and coding schemes under the detected environmental conditions. To our knowledge, none of the above papers dealt with the sensors' trustworthiness before implementing the adaptive mechanism. We have studied research works dealing with dust storms and their effect on transmission losses. Rain and dust are the main sources of signal attenuation in the atmosphere and are known to cause a transmission loss in electromagnetic waves by scattering. In particular, dust particles cause attenuation and depolarization of the electromagnetic waves propagating in the sandy desert environment, according to [19]. Many of the research works focused on the study of path losses at radio frequencies (RF) around 40 GHz [20]. We found few research works dealing with the study of RF transmission losses in outdoor environments relevant to the WSN transmission frequency of 2.4 GHz. Rama Rao et al. [21] analyzed the received signal strength at 868, 916 and 2400 MHz in forest, and Mango and Guava vegetation environments, whereas Mujlid [22] carried out an empirical study on transmission losses due to dust storms at RF frequencies at 2.4 GHz. The awareness of the environment is an excellent parameter for sensors deployed for outdoor monitoring, as the environment dynamics greatly influence the data transmission. Wireless sensors operate in unattended environments in most applications; therefore, the sensors are vulnerable to attacks. Ahmed et al.
[23] proposed a trust and energy aware routing protocol (TERP) for the detection and isolation of misbehaving and faulty sensors. Moreover, the protocol was comprised of a routing function
accounting trust, residual-energy, and hop counts of neighbor sensors in making routing decisions. In paper [1], the authors studied how the temperature variations affect the battery life time and the data delivery efficiency in WSN. In this work, we have developed a trustworthy self-aware and self-adaptive framework for wireless transmissions at 2.4 GHz to improve the data delivery efficiency under the harsh environment with high temperatures and varying wind speeds.
3 Analytical Modeling of Self-adaptiveness Within WSN
The wireless sensors deployed outdoors need to be aware of the environment, as the atmospheric variations cause attenuation of the transmitted signal. A normal wireless sensor network monitoring system with an input X(t) and an output Y(t) is converted into a self-aware system by adding a set of environmental parameters E = {e1, e2, e3, ..., eL} to be aware of the environmental uncertainties E(t) causing a data error or loss. We have considered a set of unfavorable environmental conditions and their negative impacts on the output signal, and we have defined the system output under such an uncertain environment as Y(t) = X(t) + A(t), where the term A(t) is the signal loss due to the attenuation and is defined in the closed interval (-1, 0). The two extremes reflect a dust storm causing complete data loss and an ambient temperature favoring normal transmission. Thus, by adding the environmental variations, the self-aware system may figure out the reason for transmission errors. In view of adding trust of the sensor, we have added a trust factor Tr(t) to the self-aware system to authenticate the reliability of the sensor, as the sensor battery performance and data quality get deteriorated by high temperatures. The term Tr(t) is defined in the closed interval (0, 1), and the sensor is defined as trusted if the estimated trust falls within the (0.5, 1) interval. Further, we have embedded a redesign scheme by reconfiguring the sensor channel with s transmission channels possessing varying powers to improve the signal strength. The output of the self-adaptive system is defined as Y(t) = X(t) + C(t), where C(t) is added to show the changes in signal power; the term C(t) takes on two values, {+ve, zero}, to reflect the improvement over the deteriorated output of the normal system. We defined the normal system, the system with self-aware functionalities, and the system with trust, self-aware, and self-redesign functionalities mathematically under an unfavorable environment, as in Eq. (1). The term f(.) is added in Eq. (1) to show the notion of the mentioned parameters.

Y(t) = \begin{cases} X(t) + A(t) & \text{Normal system} \\ X(t) + A(t) + f(E(t) + Tr(t)) & \text{Trusted self-aware system} \\ X(t) + f(E(t) + Tr(t)) + C(t) & \text{Trusted self-adaptive system} \end{cases}    (1)

3.1 Modeling Sensor Trustworthiness
We have estimated the sensor trust from its own parameters: battery performance and retransmission history under high temperature and high wind speed. Sensors are equipped with alkaline batteries with a nominal voltage of 3 V. The available battery capacity depends upon the rate at which it is discharged, and the sensor functions correctly until the residual voltage drops to 1.8 V. Unfavorable weather might cause the battery to discharge at a rate higher than the rated current capacity. Moreover, an increased number of retransmissions on a channel indicates its decreased reliability. Thus, we have formulated trust as a linear first-order equation, as in Eqs. (2) and (3). The first part of Eq. (2) represents the ratio of the rated current (x) at normal temperature (Te_0), say 20 °C, to the increased rated current at high temperatures. The terms δ_1 and δ_2 are added to x to indicate the increase in rated current; the term δ_1 is estimated by dividing the difference in temperature between the actual Te_act and the normal Te_nor by δ = 10. The dividing factor δ is selected to show that every 10 °C rise in temperature significantly affects the discharge curve of the battery. The term δ_2 reflects the increase in current due to retransmissions, which may happen under high wind velocity bringing dust into the air. The term (∂i/∂t)|_{Te} denotes the variation in current during the transmission interval. The second part of Eq. (2) captures the number of retransmissions during a fixed length (l) of time and its impact in lowering the reliability of the sensor.
Tr(s_i) = \beta \left( \frac{x \, (\partial i / \partial t)\big|_{Te_0}}{(x + \delta_1 + \delta_2) \, (\partial i / \partial t)\big|_{Te_0}} \right) + (1 - \beta) \left( \frac{1}{1 + \frac{1}{l} \sum_{i=1}^{l} Re(t - i)} \right) \qquad (2)

\delta_1 = \frac{Te_{act} - Te_{nor}}{\delta} \quad \text{and} \quad \delta_2 = |Re| \cdot \frac{1}{x} \qquad (3)
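For illustration, a minimal sketch of the trust estimate in Eqs. (2) and (3), under the reconstructed reading above; the numeric values in the example call (weight beta, rated current, discharge rate, retransmission window) are assumptions of the sketch, not values from the paper.

```python
# Hypothetical implementation of the trust estimate in Eqs. (2)-(3); all numeric
# values in the example call are illustrative assumptions, not the authors' settings.

def sensor_trust(beta, x, di_dt_te0, te_act, te_nor, delta, retransmissions):
    """beta: weight between battery and retransmission terms; x: rated current at Te0;
    di_dt_te0: current variation over the transmission interval at Te0;
    te_act/te_nor: actual/normal temperature (deg C); delta: 10 deg C step;
    retransmissions: list Re(t-i) over the last l transmission slots."""
    l = len(retransmissions)
    delta1 = (te_act - te_nor) / delta                     # Eq. (3), temperature term
    delta2 = abs(sum(retransmissions)) / x                 # Eq. (3), retransmission term
    battery_term = (x * di_dt_te0) / ((x + delta1 + delta2) * di_dt_te0)
    retrans_term = 1.0 / (1.0 + sum(retransmissions) / float(l))
    return beta * battery_term + (1.0 - beta) * retrans_term

# Hot, gusty interval: 45 deg C against a 20 deg C norm, 3 retransmissions in 5 slots.
tr = sensor_trust(beta=0.6, x=1.0, di_dt_te0=0.05,
                  te_act=45, te_nor=20, delta=10, retransmissions=[1, 1, 1, 0, 0])
print("trusted" if tr >= 0.5 else "untrusted", round(tr, 3))   # untrusted 0.342
```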
We have listed the possible states of uncertainty under normal and high temperatures in Table 1, and we evaluated the outcome of the evidence space to generate the trust space in the interval (0, 1). The sensor is reliable if the estimated trust falls in (0.5 ≤ Tr ≤ 1) and unreliable in the interval (0 ≤ Tr < 0.5). The sensor trust is uncertain if the battery performance is low and the number of retransmissions is high, whereas trusted performance includes normal functioning of the battery with zero (or negligible) retransmissions. The uncertain conditions in Table 1 are further analyzed to generate the trust space.
Table 1. Sensor reliability under weather extremities.

Temperature | Wind velocity | Battery performance | Retransmission history | Trustworthiness
Normal | Low | Normal | Zero | Trusted
Normal | Low | Normal | Positive | Uncertain
Normal | Low | Low | Positive | Untrusted
Normal | Low | Low | Zero | Untrusted
Normal | High | Normal | Zero | Trusted
Normal | High | Normal | Positive | Uncertain
Normal | High | Low | Zero | Uncertain
Normal | High | Low | Positive | Untrusted
High | Low | Normal | Zero | Trusted
High | Low | Normal | Positive | Uncertain
High | High | Low | Zero | Uncertain
High | High | Low | Positive | Untrusted
High | Low | Low | Zero | Uncertain
High | Low | Low | Positive | Untrusted
High | High | Normal | Zero | Trusted
High | High | Normal | Positive | Untrusted
4 Proposed Method

The proposed trustworthy self-adaptive framework is illustrated in Fig. 1 and comprises three procedures: self-aware, trust-aware and self-redesign. The framework checks the log of parameters, which facilitates awareness of environmental uncertainty, and switches to checking the trust of the sensor under any abnormal condition. The high temperatures increasing the rated current of the sensor battery and the wind speed causing a hazy atmosphere are selected as the trust factors in estimating the trust values. If the sensor is found to be trustworthy, then a suitable redesign operation is selected from the list to improve the throughput; otherwise, the sensor data is rejected by reporting a transmission error. By selecting a channel with increased power, transmission errors are reduced and the received signal strength is increased under unfavorable environmental conditions.
Fig. 1. Proposed framework.
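To make the flow in Fig. 1 concrete, a compact hedged sketch of the three procedures; the loss model, thresholds and function names here are illustrative assumptions (only the channel boost levels follow Sect. 5), not the authors' implementation.

```python
# Illustrative control flow for the proposed framework (Fig. 1): awareness of the
# environment, trust check, then channel redesign. The loss model and thresholds
# are assumptions for the sketch, not the authors' exact rules.

CHANNELS = {"CH1": 0.00, "CH2": 0.04, "CH3": 0.06, "CH4": 0.10}  # power boost levels

def expected_loss(temperature, wind_speed):
    # Placeholder loss model: hotter and windier weather -> larger expected loss.
    loss = 0.0
    if temperature > 40:
        loss += 0.04
    if wind_speed > 14:          # gusty wind raising dust into the air
        loss += 0.05
    return min(loss, 0.10)       # boosting is bounded at 10% (Sect. 5)

def self_adaptive_step(sensor, temperature, wind_speed, trust_value):
    loss = expected_loss(temperature, wind_speed)
    if loss == 0.0:
        return "CH1"                                  # normal transmission
    if trust_value < 0.5:
        raise RuntimeError(f"{sensor}: untrusted, data rejected (transmission error)")
    # Redesign: pick the smallest boost that covers the expected loss.
    for channel, boost in sorted(CHANNELS.items(), key=lambda kv: kv[1]):
        if boost >= loss:
            return channel
    return "CH4"

print(self_adaptive_step("s1", temperature=46, wind_speed=16, trust_value=0.7))  # -> CH4
```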
5 Results and Discussion

We have coded the trustworthy self-adaptive framework on the Java platform, and we have considered typical weather reports from the Kuwait meteorological department [24] as a testing scenario to validate the proposed framework. Weather reports were selected for the month of June, as the summer season is prone to high temperatures and gusty winds in the Middle Eastern region, and for the month of February to reflect normal weather conditions, as shown in Figs. 2(a) and 2(b). In our experimental setup, we have considered a WSN deployed for outdoor monitoring, which estimates the expected transmission loss caused by atmospheric uncertainties from its awareness of the deployed environment and redesigns its transmission channel based on that expected loss. We carried out a set of experiments under normal weather and under gusty weather to analyze the sensor data delivery efficiency.
Fig. 2. Weather data a) normal and extreme temperatures, b) wind speeds and its deviations from normal, according to [24].
We considered a sensor within the WSN whose transmission channel was partitioned into four channels, namely CH1, CH2, CH3 and CH4, with normal (0% boosting), 4% boosting, 6% boosting and 10% boosting of transmission power, respectively. The sensor is assumed to start with 75% of its battery power so as to check the stability near the threshold voltage (1.8 V), and it was presumed to consume 2% of power for a normal transmission; moreover, each retransmission due to uncertain weather added 1% to the battery consumption. We took the temperatures ranging from 17 to 25 °C and the wind speeds ranging from 4 to 10 mph in the February weather report as the normal temperatures and normal wind speeds. The wind speed for the target month (June) and its deviation from the normal maximum by 38% are demonstrated in Fig. 2(b).
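As a worked example of the battery bookkeeping assumed in this setup (75% initial charge, 2% per transmission, 1% per retransmission, 1.8 V cut-off on a 3 V cell), a small sketch follows; the linear mapping from remaining charge to residual voltage is a simplifying assumption of the sketch, not something stated in the paper.

```python
# Illustrative battery bookkeeping for the experimental setup; assumes residual
# voltage scales linearly with remaining charge, which the paper does not state.
NOMINAL_V, CUTOFF_V = 3.0, 1.8
charge = 0.75                      # 75% at startup

def transmit(charge, retransmissions=0):
    return charge - 0.02 - 0.01 * retransmissions

rounds = 0
while charge * NOMINAL_V > CUTOFF_V:
    charge = transmit(charge, retransmissions=2)   # gusty weather: 2 retries per round
    rounds += 1
print(rounds, "rounds until the 1.8 V threshold")   # 4 rounds with these assumptions
```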
We carried out a set of experiments on the weather data to determine the sensor trustworthiness and the suitable channel selection for efficient sensor data transmission. The estimated trust values, the residual battery level, and the wind speed for the month of June are shown in Fig. 3, where the wind speed is presented as the percentage deviation from the normal wind speed, and the trust level and residual battery power are presented on a 0 to 100 scale. The battery lifespan is found to decrease with high temperatures and high winds, as the gusty wind generates a hazy atmosphere that increases the number of retransmissions and the battery power consumption. The trust estimation accounts for the increased rated current consumption due to high temperatures and checks for incorrect functioning of the battery near the threshold voltage (1.8 V) [1]; moreover, the increased power consumption from retransmissions due to gusty weather is accounted for within the trust calculations by varying the number of retransmissions from 1 to 5. The trust level is projected on a 0 to 100 scale, wherein the trusted space is bounded by 50 to 100 and the untrusted space lies from 0 to 49.99. The estimated trust values fell below 50 when the battery voltage dropped from 1.9 V toward 1.8 V due to the combined temperature and high wind effects. The trust value is found to be higher when the wind speed is normal, around its mean of 10 mph, and the residual battery level is higher (70%). Here, the quality of data delivery during high temperature is compensated by the increased rated current. The behavior of the self-adaptive framework with the notion of the environmental parameters, the sensor's trust, and the battery lifespan is demonstrated in Fig. 4. It is observed from Fig. 4 that the redesign of the sensor transmission is not executed if trust falls under 50%; this is due to the battery voltage level reaching the threshold (1.8 V) and the increased number of retransmissions under gusty weather. However, with trusted behavior, the self-adaptive system boosts the received signal strength by 4%, 6% and 10%, generating data delivery with an average 5% increase in signal strength. Since the observed transmission efficiency is modeled with minimal losses, the transmission signal boosting is bounded at a maximum of 10%.
Fig. 3. Sensor’s trust estimation under harsh environments.
Fig. 4. Behavior of self-adaptive framework with trust and transmission efficiency.
6 Conclusion

We have developed a self-adaptive trustworthy framework, which is aware of environmental uncertainties and of the sensor's trust, to select an alternate transmission channel with increased power and thereby improve the transmission efficiency. Temperature and wind velocity have been selected as the environmental parameters to be aware of, as their combined extremities produce unfavorable conditions for wireless transmission at 2.4 GHz. The framework ensures the trust of the communicating sensors by checking their retransmission history and battery performance. The sensor channel was partitioned with 0%, 4%, 6% and 10% increased transmission powers to transmit under no-loss, sub-minimal loss, minimal loss and medium loss conditions, respectively. Our experiments on sensors tested under high temperatures with high wind speeds showed an average of 5% improvement in data delivery after redesigning the data transmission channel. We are continuing our research with more atmospheric conditions and with improving the data transmission system for minimal power consumption.

Acknowledgement. This work was supported by Kuwait University under research grant no. QE02/17.
References

1. Guo, W., Healy, M., Zhou, M.: Experimental study of the thermal impacts on wireless sensor batteries. In: Proceedings of the IEEE International Conference on Networking, Sensing, and Control, 10-12 April, Paris-Evry, France (2013)
2. Khalid, O., Khan, S.U., Madani, S.A., Hayat, K., Khan, M.I., Min-Allah, N., Kolodziej, J., Wang, L., Zeadally, S., Chen, D.: Comparative study of trust and reputation systems for wireless sensor networks. Secur. Commun. Netw. 6, 669-688 (2013)
3. Habib, S.J., Marimuthu, P.N.: Self-organization in ambient networks through molecular assembly. J. Ambient Intell. Humaniz. Comput. 2, 165 (2011)
4. Habib, S., Marimuthu, P.N.: Green synthesis of hospital enterprise network. Int. J. Med. Eng. Inf. 6(1), 26-42 (2014)
5. Habib, S.J., Marimuthu, P.N., Naser, Z.: Carbon-aware enterprise network through redesign. Comput. J. 58(2), 234-245 (2015)
6. Habib, S.J., Marimuthu, P.N.: Reputation analysis of sensors' trust within tabu search. In: Proceedings of the World Conference on Information Systems and Technologies, 11-13 April, Madeira, Portugal (2017)
7. Boudriga, N., Marimuthu, P.N., Habib, S.J.: Measurement and security trust in WSNs: a proximity deviation based approach. Ann. Telecommun. 74(5-6), 257-272 (2019)
8. Habib, S.J., Marimuthu, P.N.: Analysis of data trust through an intelligent-transparent-trust triangulation model. Expert Syst. 36, e12287 (2019)
9. Habib, S.J., Marimuthu, P.N., Renold, P., Balaji, G.A.: Development of self-aware and self-redesign framework for wireless sensor networks. In: Proceedings of the World Conference on Information Systems and Technologies, 16-19 April, Galicia, Spain (2019)
10. Kephart, J.O., Chess, D.M.: The vision of autonomic computing. Computer 36(1), 41-50 (2003)
11. Hubert, U.S., Pham, H., Paluska, J.M., Waterman, J., Terman, C., Ward, S.: A case for goal-oriented programming semantics. In: Proceedings of the System Support for Ubiquitous Computing Workshop, 12-15 October, Seattle, Washington, USA (2003)
12. Marinescu, D.C., Morrison, J.P., Yu, C., Norvik, C., Siegel, H.J.: A self-organization model for complex computing and communication systems. In: Proceedings of the Second IEEE International Conference on Self-Adaptive and Self-Organizing Systems, 20-24 October, Venice, Italy, pp. 149-158 (2008)
13. Emmanouilidis, C., Pistofidis, P.: Machinery self-awareness with wireless sensor networks: a means to sustainable operation. In: Proceedings of the 2nd Workshop on Maintenance for Sustainable Manufacturing, 12 May, Verona, Italy, pp. 43-50 (2010)
14. Wright, P., Dornfeld, D., Ota, N.: Condition monitoring in end-milling using wireless sensor networks (WSNs). Trans. NAMRI/SME 36, 177-183 (2008)
15. Sterritt, R., Gunning, D., Meban, A., Henning, P.: Exploring autonomic options in a unified fault management architecture through reflex reactions via pulse monitoring. In: Proceedings of the 11th Annual IEEE International Conference Workshop on Engineering of Computer Based Systems, 24-27 May, Brno, Czech Republic, pp. 449-455 (2004)
16. Maitland, J., Arthur, M.C.: Self-monitoring, self-awareness, and self-determination in cardiac rehabilitation. In: Proceedings of the Conference on Human Factors in Computing Systems, 04-09 April, Boston, MA, USA (2009)
17. Silva, J.M.C., Bispo, K.A., Carvalho, P., Lima, S.R.: Flexible WSN data gathering through energy-aware adaptive sensing. In: Proceedings of the Conference on Smart Communications in Network Technologies, El Oued, Algeria, pp. 317-322 (2018)
18. Das, S., Kar, P., Jana, D.K.: SDH: self detection and healing mechanism for dumb nodes in wireless sensor network. In: Proceedings of the IEEE Region 10 Conference, Singapore, pp. 2792-2795 (2016)
19. Abuhdima, E.M., Saleh, I.M.: Effect of sand and dust storms on microwave propagation signals in southern Libya. In: Proceedings of the 15th IEEE Mediterranean Electrotechnical Conference, 25-28 April, Valletta, Malta, pp. 695-698 (2010)
20. Srivastava, S.K., Vishwakarma, B.R.: Study of the loss of microwave signal in sand and dust storms. IETE J. Res. 50(2), 133-139 (2014)
21. Rama Rao, T., Balachander, D., Nanda Kiran, A., Oscar, S.: RF propagation measurements in forest and plantation environments for wireless sensor networks. In: Proceedings of the International Conference on Recent Trends in Information Technology, 19-21 April, Chennai, India (2012)
22. Mujlid, H., Kostanic, I.: Propagation path loss measurements for wireless sensor networks in sand and dust storms. Front. Sens. 4, 33-40 (2016)
23. Ahmed, A., Bakar, K.A., Channa, M.I., Haseeb, K., Khan, A.W.: TERP: a trust and energy aware routing protocol for wireless sensor network. IEEE Sens. J. 15(12), 6962-6972 (2015)
24. Accuweather.com. https://www.accuweather.com/en/kw. Accessed 25 Oct 2019
Intelligent and Decision Support Systems
Traffic Flow Prediction Using Public Transport and Weather Data: A Medium Sized City Case Study
Carlos Silva and Fernando Martins
Information Systems Department, School of Engineering, University of Minho, Guimarães, Portugal
[email protected], [email protected]
Abstract. Reliable traffic flow forecasting is extremely useful in various management areas of a city. Its impact can extend to average road users, public transportation systems, corporations, government organizations and local administration companies. The value of this data can be harnessed to support decisions that improve traffic flow and parking availability and better manage resources, leading to monetary savings, reduced environmental pollution and an improved quality of life for all city users. The purpose of this research was to develop a decision support system that can predict traffic in the city of Braga, Portugal, using data collected by a fleet of buses from the local public transport company Transportes Urbanos de Braga and relating it to weather data and events in order to have a more accurate vision of future traffic, leading to better decisions.

Keywords: Data Mining · Machine Learning · Smart city · Smart mobility · Traffic prediction

1 Introduction
In recent years there has been a growing demand for automated intelligence in order to maximize the efficiency and productivity of every layer of society. With the rise of the IoT (Internet of Things) there is now more data than any team of human analysts could ever go through to extract useful information in a reasonable amount of time, much less when it comes to real-time data (Rathore et al. 2016). In the area of transportation, the need for "smartness" has been met with a growing number of solutions, especially algorithm-based ones, but despite this there is still a long way to go. The classic route optimization problem is not as relevant today as it was a few years ago; with the current level of information and the flexibility that comes with it, the new goal of this era is to predict traffic.
The precise prediction of traffic congestion is a critical problem in modern cities, and improvements are needed in order to better manage public transportation. As noted by Bezuglov and Comert (2016), prediction models can be affected by driver behavior, events and the weather. Incorporating the latter with our data can significantly improve the accuracy of the trained model. Traffic is an erratic phenomenon that demonstrates different characteristics depending on its environment, demanding a high level of automated intelligence to make sense of it and provide more accurate predictions. As stated by Fusco et al. (2016), these predictions are essential for ITS (Intelligent Transportation Systems). In this article, we present a Machine Learning (ML) approach to traffic prediction in the city of Braga through regression-type models. The data used as the basis for this project are geographical points collected by buses from Transportes Urbanos de Braga (TUB), merged with information about events, holidays and weather.
2 Case Study
All companies have different scopes: some exist in business contexts related to industry or commerce, others are public entities, and there is still room for those that do not have any profit objective (Martins et al. 2018). Regardless of this distinction, they all have internal structures that represent their mission, vision and strategy and serve as a foundation for all of their objectives (Martins et al. 2018). Transportes Urbanos de Braga (TUB) is a company located in Braga, Portugal, that operates in the urban passenger transport sector and is certified under the Portuguese standard that regulates Research, Development and Innovation, NP4457:2007 (IPQ 2007). This certification makes it more open to a proactive attitude in the search for new solutions and ways to support decision making, either by itself or by its main stakeholder, the Municipality of Braga. The company is heir to a long tradition, but its ambition leads it to build the future every present day through its dynamic image and continuity, seeking an Integrated Urban Mobility that encompasses pedestrian paths, cycle paths and integration between the various modes of transport. This way of thinking, considering the attention given to the urban environment, multimodal accessibility and the use of ICT for public management, will lead to sustainable urban development and a better urban landscape (Caragliu et al. 2011). TUB has sought to position Braga as a city capable of responding to the latest challenges in terms of transport and mobility, so it sees innovation, research and development as critical factors for its activity, in line with its certification.
2.1 Braga, a Perfect City Lab
Braga is one of the youngest cities in Europe in terms of population, which makes it a dynamic and energetic city. Over the past 30 years the District's population has grown by over 25%. The District of Braga has development and quality-of-life indicators far above the national average, surpassed only by the regions of Greater Porto and Greater Lisbon. From statistics and ratios, we can easily see that, added to its strategic geographical position and development, this District is one of the most attractive regions for investment. Braga is one of the oldest Portuguese cities and one of the oldest Christian cities in the world; founded in Roman times as Bracara Augusta, it has over 2000 years of history as a city. Located in the north of Portugal, more specifically in the Cávado Valley, Braga has about 174 thousand inhabitants and is the center of the Greater Metropolitan Area of Minho, with about 800 thousand inhabitants. At the economic level, the city sees economic dynamism, investment capture and internationalization as key vectors in its strategic measures for growth, in order to boost dynamism, innovation, knowledge and creativity towards a new local and regional economic cycle. Knowledge, creativity and innovation are today essential in the economic, social and cultural development of Braga, for its authenticity and identity, but also for its unparalleled ability to renew and reinvent itself in the face of new realities. Braga offers technological conditions that reinforce this identity: a top academic environment, internationally recognized in the technological field; the International Nanotechnology Laboratory, a world-class laboratory with international projection operating in nanoscience and nanotechnology; and several municipal companies, such as TUB, with a strong innovative character. Braga has a pronounced technological profile, which identifies and differentiates it, allowing it to increasingly assert itself on the national and international scene. This whole ecosystem, created to promote development based on knowledge transfer, technology and eco-sustainability, has boosted Braga's vision as a "cluster" of technological industries, making it an increasingly attractive pole for investment and enhancing Braga's growing success in the search for new solutions that increase the comfort of all its inhabitants.

2.2 Mobility as a Core Element of a City
The need to offer mobility and comfort solutions in the region, satisfying and surprising the expectations of the involved partners (TUB 2018) is what justifies its constant search to work on new means and mechanisms to serve its customers, recognizing the importance and value of information and making it available to anyone. TUB is well aware that the use of new technologies in business models and infrastructures has been influenced, in large part, by the internet and globalization. The next trend in innovation should be in the human ability to connect to
the machines and the information resulting from this interaction (Zhuhadar et al. 2017), and this is a reality associated with the needs inherent to the management of an urban public transport company. This is the company's main justification for working on this proof of concept: identifying the capabilities offered by the solution described in this paper in order to extract all the data regarding the daily operation of buses, aggregate it, make it available, and optimize the operations of the organization.
3 Related Work
There have been multiple projects trying to tackle the problem of predicting traffic flow. As early as 1997, Dougherty and Cobbett (1997) used Machine Learning (ML) to predict traffic flow, speed and occupancy. Cong et al. (2016) used LS-SVM (least squares support vector machine), a type of SVM that can approximate non-linear systems with better accuracy, to predict traffic flow. They concluded that traffic flow is strictly related to the flow felt in the previous minutes; therefore, previous traffic flow data can be used to predict future traffic. A group of investigators (Fusco et al. 2016) tried to predict speed in a short-term setting, using Big Data generated by floating cars. Their data was collected via GPS, causing problems on paths not yet traveled. The authors note that the traffic flow prediction problem is usually approached with Neural Networks and Bayesian Networks, which try to establish existing correlations. Their data consisted of 100,000 vehicles reporting every 2 min, organized into 5-min intervals. In the 2017 article "Deep learning for short-term traffic flow prediction", Polson and Sokolov (2017) tried to model the effects of construction zones, events and accidents on traffic flow. They concluded that with Deep Learning it is possible to predict traffic flow in the long term and to do so using external data that affects traffic. Liu et al. (2009) present a model based on a Neural Network to predict travel time in urban areas. After reviewing multiple models such as KNN, they concluded that there was room to improve traffic prediction, especially by using multiple variables instead of relying only on positional data. Goves et al. (2016) studied the use of Neural Networks to predict traffic on a few roads in the UK up to 15 min into the future. They anticipate that traffic prediction could bring benefits in the form of proactive measures to mitigate congestion. This work differentiates itself from previous research by offering a new perspective on urban traffic prediction, viewing it from the buses' perspective for a whole city, and by allowing the decision maker to see predictions not only in the short term but in the long term as well, so that this information can be used in a multitude of applications, such as creating and optimizing routes. Most work in this area is done in small controlled locations using only vehicle data as a source of information, which won't do for an urban transportation
company. With this work it is possible to expand our predictions to a whole medium-sized city, predictions that, by using the surrounding environment as a data source, are expected to be more accurate.
4 Modeling and Development
For the purposes of this study, the data collected by the buses was merged with complementary data about holidays, events and, most notably, weather. To initiate the Data Mining (DM) effort in a structured way, the CRISP-DM methodology was adopted; its guidelines structure the project so that each step is manageable, including backtracking if necessary. Before initiating the development of the proposed model, it was necessary to analyze the options at our disposal in the form of tools and concepts. To train a model powerful enough to accurately predict traffic in the city of Braga and, at the same time, light enough to run with limited resources, it was found that the best path to the designated goals was a regression-type approach with a sizeable sample of the original data. After researching related work and regression models in general, it was decided that the models most likely to produce the best results for the problem in question were Multiple Regression, K-Nearest Neighbors (KNN), Neural Network (multilayer perceptron), Random Forest and Support Vector Machine (SVM). Multiple regression is an algorithm based on linear regression that estimates the relationship between input and output variables (Bakar et al. 2006), but instead of creating a linear equation that best fits the relationship between the input variables and the prediction variable, it creates a plane to fit that relationship. Although it is a very simple model, there is the possibility that a simple solution would suffice. The KNN model averages the predictions of the closest samples in the input space and returns that average as the prediction. This being a traffic problem, the model can be suited to the task, since its behavior matches the geographical-point nature of the provided data; however, as stated by Maillo et al. (2017), due to time and memory costs it is not advisable to use this model in a real-life scenario. A Neural Network approach is a good option in any type of machine learning project. Its nature of simulating real neurons and their links creates an intelligence without the need to specify anything about the data besides what is an input and what is the prediction variable. It can recognize patterns, learning from the provided data to help it predict the future (Azoff 1994). This type of model has been widely used in Artificial Intelligence (AI) projects in recent years, especially in tasks related to computer vision and data classification. The Random Forest model works as an ensemble in the sense that its output is based on the outputs of a group of other models; it works as a group of decision-tree predictors (Segal 2004). Decision trees use a tree format with successive conditions, following the path best suited to the input until it reaches the desired output, in this case
a prediction. The prediction that most trees "vote for" is considered the final output. Considering that traffic predictions can be grouped into small areas on a map, an SVM model that groups variables in order to make its predictions can be a very good fit for the task at hand. It was not possible to use SVM to its full potential because, as noted by Wang and Hu (2005), SVMs have a high computational cost, meaning in this case that it was not possible to train this model with the same amount of data as the others. Using R and the rminer package, the five chosen models were trained. The intention was to use the same amount of data (10,000,000 rows) for each, but due to computational restrictions the SVM and Random Forest models had to be trained with less data (100,000 rows). Each model was tested against the same dataset and compared using as metrics the explained variance score, mean absolute error, mean squared error, median absolute error, R squared and time to respond. The model that presented the best performance over all the metrics would afterwards be put to the test in the real world, comparing its predictions with the real traffic felt in multiple parts of the city of Braga.
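The models were trained in R with the rminer package; purely as an illustration of the comparison protocol (same split, same metrics, wall-clock timing), here is a hedged Python/scikit-learn sketch in which the data file name and feature columns are placeholders, not the project's actual schema or code.

```python
# Hedged sketch of the model comparison; the authors used R/rminer, and the column
# names below are placeholders for the bus GPS + weather + events features.
import time
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

df = pd.read_csv("braga_traffic_sample.csv")          # placeholder file name
X = df[["lat", "lon", "hour", "weekday", "temperature", "rain", "holiday", "event"]]
y = df["speed_kmh"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "Multiple Regression": LinearRegression(),
    "KNN": KNeighborsRegressor(),
    "Neural Network": MLPRegressor(max_iter=500),
    "Random Forest": RandomForestRegressor(n_estimators=100),
    "SVM": SVR(),
}
for name, model in models.items():
    start = time.time()
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    print(name, mean_absolute_error(y_te, pred), mean_squared_error(y_te, pred),
          r2_score(y_te, pred), round(time.time() - start, 1), "s")
```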
5 Evaluation
To obtain a fair comparison, the same test set was used for all the trained models. With this it is possible to make a direct comparison between the results of the various metrics used. The results of the tests performed can be consulted in Table 1.

Table 1. Results of the trained models according to the selected metrics.

Model | Explained variance | Mean absolute error | Mean squared error | Median absolute error | R2 | Response time (seconds)
Multiple Regression | 4906615 | 11.61479 | 202.436 | 10.22792 | 0.002418068 | 1
KNN | 213393 | 9.230217 | 143.238 | 7.192696 | 0.4990334 | 557
Neural Network | 265846355 | 10.65137 | 176.1743 | 9.084123 | 0.1310139 | 2
Random Forest | 680408128 | 8.667872 | 122.5863 | 7.1116 | 0.3353173 | 3
SVM | 321856950 | 10.67735 | 177.1282 | 9.111392 | 0.1586169 | 21
From these results it was concluded that the model best suited for the designated task is the Random Forest model, since it has the best scores all around and a very low response time, which gives it a solid advantage over the similarly scoring KNN, which takes almost 10 min to complete a task. For a decision maker to use the system, a response time of more than a few seconds is unreasonable. A scatter-plot of the final model and a regression error characteristic curve can be consulted in Figs. 1 and 2, respectively.
Fig. 1. Scatter-plot of the Random Forest model.
Fig. 2. REC curve of the Random Forest model.
To guarantee that there was no over-fitting, the model was trained and tested again using a training set ordered in time and tested against a surely unknown future. The results were as follows:

– Explained Variance Score: 1110568;
– Mean Absolute Error: 8.899467;
– Mean Squared Error: 128.2273;
– Median Absolute Error: 7.353696;
– R2: 0.281174.
It was therefore concluded that the model is not over-fitted and will perform as expected in a real-life scenario. Finally, to assess the real-life capabilities of the trained model, its predictions were compared with the real traffic felt over the course of multiple hours, on various days of the week, in multiple points of the city, as exemplified in Fig. 3.
Fig. 3. Comparison between prediction (top image) and reality (bottom image)
6 Results
The models used in this project were trained on a consumer-grade desktop computer, making the developed models not as good as they could potentially have been if a more powerful machine had been used. Despite this limitation, the final selected model can be used in a real-life context with approximately 75% accuracy within a reasonable 10 km/h margin (up to 80% for 20 km/h). From the tests performed on real-life traffic it can be concluded that the predictions of the final model mirror the true registered values closely enough to make decisions based on the given information. Since this type of data depends heavily on unpredictable human behavior, an accuracy score lower than 90% was already expected. Traffic data is usually very noisy, so obtaining such a level of accuracy on a prediction task like this could be described as impressive. It was also discovered, using the Neural Network model, that for the city of Braga specifically it is possible to establish a direct correlation between the
weather and the quantity of traffic. This model created a relationship between the variables that worked as follows: for temperature, the higher the value (the hotter it got), the more congestion was felt all over the city; the rain variable followed the same pattern, with higher values (more intense rain) corresponding to more traffic in the city. The information given by this system can be used by Transportes Urbanos de Braga to take better and more informed decisions, supported by concrete data, allowing them to predict and take a proactive attitude towards traffic congestion, optimize existing routes and create new ones according to the predicted data.
7 Conclusion
Traffic flow prediction is an essential element for a city and the companies that operate in it. It allows for monetary savings, reduced pollution and efficient management of resources. The system presented, and the model at its core, demonstrated great potential to improve the operations of Transportes Urbanos de Braga. With its long temporal reach and the good accuracy demonstrated within a more than reasonable margin of error, the model should hold up for a long time, even on the verge of traffic automation. With the IoT revolution, we hope that in the future this same concept can be revisited using a larger quantity of data and more diverse sources of information from IoT devices spread throughout the city.

Acknowledgments. This work has been supported by LabSecIoT powered by DigitalSign and Transportes Urbanos de Braga.
References

Azoff, E.M.: Neural Network Time Series Forecasting of Financial Markets. Wiley, Hoboken (1994)
Bakar, Z.A., et al.: A comparative study for outlier detection techniques in data mining. In: 2006 IEEE Conference on Cybernetics and Intelligent Systems, pp. 1-6. IEEE (2006)
Bezuglov, A., Comert, G.: Short-term freeway traffic parameter prediction: application of grey system theory models. Expert Syst. Appl. 62, 284-292 (2016)
Caragliu, A., Del Bo, C., Nijkamp, P.: Smart cities in Europe. J. Urban Technol. 18(2), 65-82 (2011)
Cong, Y., Wang, J., Li, X.: Traffic flow forecasting by a least squares support vector machine with a fruit fly optimization algorithm. Procedia Eng. 137, 59-68 (2016)
Dougherty, M.S., Cobbett, M.R.: Short-term inter-urban traffic forecasts using neural networks. Int. J. Forecast. 13(1), 21-31 (1997)
Fusco, G., Colombaroni, C., Isaenko, N.: Short-term speed predictions exploiting big data on large urban road networks. Transp. Res. Part C: Emerg. Technol. 73, 183-201 (2016)
Goves, C., et al.: Short term traffic prediction on the UK motorway network using neural networks. Transp. Res. Procedia 13, 184-195 (2016)
IPQ: NP4457:2007 - Gestão da Investigação, Desenvolvimento e Inovação (IDI), Requisitos de um projecto de IDI. Instituto Português da Qualidade, pp. 1-31 (2007)
Liu, H., et al.: A neural network model for travel time prediction. In: Proceedings of the 2009 IEEE International Conference on Intelligent Computing and Intelligent Systems, ICIS 2009, vol. 1, pp. 752-756 (2009)
Maillo, J., et al.: kNN-IS: an iterative Spark-based design of the k-Nearest Neighbors classifier for big data. Knowl.-Based Syst. 117, 3-15 (2017)
Martins, F., Ribeiro, P., Duarte, F.: Improving project management practice through the development of a business case: a local administration case study. In: World Conference on Information Systems and Technologies, pp. 433-448. Springer (2018)
Polson, N.G., Sokolov, V.O.: Deep learning for short-term traffic flow prediction. Transp. Res. Part C: Emerg. Technol. 79, 1-17 (2017)
Rathore, M.M., Ahmad, A., Paul, A.: IoT-based smart city development using big data analytical approach. In: 2016 IEEE International Conference on Automatica (ICA-ACCA), pp. 1-8. IEEE (2016)
Segal, M.R.: Machine learning benchmarks and random forest regression (2004)
TUB: Relatório e Contas 2017. Transportes Urbanos de Braga (2018)
Wang, H., Hu, D.: Comparison of SVM and LS-SVM for regression. In: 2005 International Conference on Neural Networks and Brain, vol. 1, pp. 279-283. IEEE (2005)
Zhuhadar, L., et al.: The next wave of innovation - review of smart cities intelligent operation systems. Comput. Hum. Behav. 66, 273-281 (2017)
Filtering Users Accounts for Enhancing the Results of Social Media Mining Tasks
May Shalaby and Ahmed Rafea
The American University in Cairo, Cairo, Egypt
{mayshalaby,rafea}@aucegypt.edu
Abstract. Filtering out illegitimate Twitter accounts for online social media mining tasks reduces the noise and thus improves the quality of the outcomes of those tasks. Developing a supervised machine learning classifier requires a large annotated dataset. While building the annotation guidelines, the rules were found suitable for developing an unsupervised rule-based classifying program. However, despite its high accuracy, the rule-based program was not time efficient. So, we decided to use the unsupervised rule-based program to create a massive annotated dataset with which to build a supervised machine learning classifier, which was found to be fast and matched the unsupervised classifier performance with an F-score of 92%. The impact of removing those illegitimate accounts on an influential users identification program developed by the authors was also investigated. There were slight improvements in the precision results, but they were not statistically significant, which indicated that the influential user program did not erroneously identify spam accounts as influential.

Keywords: Twitter · Social media · Classification
1 Introduction

Twitter has become a popular source of data among researchers. Identifying the relevant portions of these data is critical for researchers attempting to analyze certain aspects of behavior on the platform. More important still is filtering out the noise, because the rise in the use of Twitter has also been accompanied by a growth in unwanted activities. Removing this noise helps to obtain better results in data mining tasks such as influential blogger identification, topic extraction, sentiment analysis, and others. The objective of the research presented in this paper is to build an efficient classifier to identify illegitimate accounts and hence improve the quality of social media mining tasks. In order to achieve this objective, two research questions were posed: first, to investigate an efficient classifier to identify and filter out illegitimate accounts; second, to measure the impact of removing those illegitimate accounts on a social media task such as influential users identification. The rest of this paper is organized as follows. In the following section we review some of the related work, followed by our proposed approach in Sect. 3, experimental results and discussion in Sect. 4, and finally the conclusion in Sect. 5.
2 Related Work Many works, such as Davis et al. (2016), Liu et al. (2016), Ala’M et al. (2017), Aslan et al. (2018) and Inuwa-Dutse et al. (2018) focus on applying and comparing popular machine learning techniques for Twitter spam detection, making use of the features available via the Twitter API. A major subcategory of spam are social bots; accounts controlled by software, algorithmically generating content and establishing interactions. Some perform useful functions, however, there is a growing record of malicious applications; Varol et al. (2017) present a framework for bot detection, leveraging more than one thousand features. Chavoshi et al. (2016a, b) developed a method which calculates cross user activity correlations to detect bot accounts, and Chavoshi et al. (2017) performed temporal pattern mining on bot activities and identified a set of indicators that separate bots from humans. Duh et al. (2018) also focused on the temporal properties of Twitter users’ activities, separating accounts into populations of bots and humans. Inuwa-Dutse et al. (2018) investigate lexical features. The magnitude of the problem was underscored by a Twitter bot detection challenge organized by DARPA (Subrahmanian et al. 2016) (Varol et al. 2017). Kudugunta and Ferrara (2018) exploited both the tweet’s textual features and metadata as inputs to a contextual LSTM neural network. Madisetty and Desarkar (2018) proposed an ensemble approach, combining deep learning and a feature-based model. Jain et al. (2019) proposed a hybrid deep learning architecture, Sequential Stacked CNN-LSTM model, for spam classification.
3 Proposed Approach Different kinds of accounts exhibit different behaviors, making them viable to classification via machine learning (ML) approaches. However, building an ML classifier requires annotated data for training which is a very tedious task to carry out manually. In order to answer the first research question, we found that state of the art in classifying illegitimate users were: RFC in Aslan et al. (2018) and Deep Learning in Jain et al. (2019). It was also common that many features were collected from different sources which necessitated usage of deep learning. In our work we focus on features from metadata that accompanies the tweet. Consequently, we decided to investigate building a classifier using supervised and unsupervised approaches and assess runtime performance. If the performance of the unsupervised approach was satisfactory, we will use it and avoid the tedious work of annotating data for a supervised classifier. Otherwise we have no option but to annotate data for training a supervised classifier. In order to answer the second research question, we decided to filter the accounts identified to be influential by the program described in Shalaby and Rafea (2018) and measure the precision before and after removing the illegitimate accounts. The following subsections describe the data collection, the unsupervised approach to classify the account as illegitimate/legitimate, the supervised approach to do the same task, and the methodology we followed to measure the impact of removing illegitimate accounts on the influential user identification module.
3.1 Data Collection
Datasets were collected using the Twitter Search API. Each tweet retrieved is accompanied by its metadata, including its author's profile metadata. A search for Arabic tweets about the Egyptian football league retrieved 80331 tweets posted by 41869 users. This is referred to as Dataset 1 and is used in the unsupervised and supervised classification experiments. To verify the performance of the supervised classification and measure the impact of account filtering on a social media task, we needed a dataset other than the one used to train the supervised classification model. So, another search for Arabic tweets about the English football league retrieved 99329 tweets posted by 39709 users. This is referred to as Dataset 2.
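The paper does not state which client was used to call the Search API; as a hedged illustration only, here is a sketch of paging through the standard v1.1 search endpoint for Arabic tweets, where the bearer token, query string and page count are placeholders.

```python
# Hypothetical collection loop against the Twitter v1.1 Search API; the bearer token,
# query and number of pages are placeholders, not the authors' actual setup.
import requests

BEARER = "YOUR_BEARER_TOKEN"                       # placeholder credential
URL = "https://api.twitter.com/1.1/search/tweets.json"

def search_arabic(query, pages=5):
    tweets, max_id = [], None
    for _ in range(pages):
        params = {"q": query, "lang": "ar", "count": 100, "tweet_mode": "extended"}
        if max_id:
            params["max_id"] = max_id - 1          # page backwards through results
        resp = requests.get(URL, params=params,
                            headers={"Authorization": f"Bearer {BEARER}"})
        batch = resp.json().get("statuses", [])
        if not batch:
            break
        tweets.extend(batch)
        max_id = min(t["id"] for t in batch)
    return tweets                                   # each tweet carries user metadata

tweets = search_arabic("الدوري المصري")             # placeholder Egyptian-league query
print(len(tweets), "tweets,", len({t["user"]["id"] for t in tweets}), "users")
```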
3.2 Unsupervised Classification
The following are the different kinds of illegitimate accounts that can be identified by a set of rules highlighting unwarranted online behavior:

1. Click-bait accounts, where most of the posts contain hyperlinks.
2. Rated accounts, where the posts are of deviant content.
3. Very highly active accounts, where the tweet post rate is abnormally high.
4. Exclusively retweeting accounts, where most of the posts are retweets.
5. Accounts to ignore, and eliminate from further analysis, since they have been deactivated or switched to private by the user, or suspended by Twitter.
6. Premature accounts, which have few followers or statuses, or were recently created.
7. Social capitalist accounts, whose main goal is to acquire followers and retweets.

A rule-based program is to be built so that, for each user, it retrieves the most recent 200 posts of their timeline via the Twitter REST API and analyzes each of those tweets, classifying the different kinds of accounts listed above according to the following proposed conditions, respectively (a code sketch follows the list):

1. If over 90% of the account's tweets contain some URL, then the account is a click-bait account and is tagged with a value of '1'.
2. A lookup list of rated hashtags extracted from tweets was compiled. If more than three tweets contain any of these hashtags, then the account is tagged with a value of '2'.
3. An average activity rate is calculated by dividing the account's statuses count by the account age in days. Also calculated is the number of tweets posted in the last 24 h. If the smaller of the two numbers is greater than 30, then the account is very highly active and is tagged with a value of '3'.
4. If over 90% of the account's tweets are found to be retweets, the account is tagged with a value of '4'.
5. If the Twitter API is denied access, then we ignore that account in further analysis and tag it with a value of '5'.
6. If the account has less than 20 followers or 20 statuses, or the account is less than 6 months old, then the account is premature and is tagged with a value of '6'.
7. A lookup list of hashtags commonly used among social capitalists, extracted from tweets, was compiled. If 40% of the tweets contain any of these hashtags, the account is tagged with a value of '7'. Also, if more than 50% of the tweets contain more than 4 hashtags per post, the account is tagged with a value of '7'.
8. If the account is not detected as one of the 7 types of accounts mentioned, it gets tagged with a value of '0'.
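A hedged sketch of the rule-based classifier described by conditions 1 to 8; timeline retrieval is abstracted into a pre-fetched list, the lookup lists are left empty as placeholders, the six-month cut-off is approximated as 183 days, and the field names of the user and tweet records are assumptions of the sketch.

```python
# Hedged sketch of the rule-based (unsupervised) account classifier of Sect. 3.2;
# user["created_at"] is assumed to be a timezone-aware datetime and
# user["tweets_last_24h"] is assumed to be precomputed.
from datetime import datetime, timezone

RATED_TAGS, CAPITALIST_TAGS = set(), set()        # compiled lookup lists (placeholders)

def classify(user, timeline):
    """Return the account label 0-7 for one user given up to 200 recent tweets."""
    if timeline is None:
        return 5                                   # deactivated / private / suspended
    n = len(timeline)
    age_days = max((datetime.now(timezone.utc) - user["created_at"]).days, 1)
    if sum(1 for t in timeline if t["urls"]) > 0.9 * n:
        return 1                                   # click-bait
    if sum(1 for t in timeline if RATED_TAGS & set(t["hashtags"])) > 3:
        return 2                                   # rated content
    if min(user["statuses_count"] / age_days, user["tweets_last_24h"]) > 30:
        return 3                                   # very highly active
    if sum(1 for t in timeline if t["is_retweet"]) > 0.9 * n:
        return 4                                   # exclusively retweeting
    if user["followers_count"] < 20 or user["statuses_count"] < 20 or age_days < 183:
        return 6                                   # premature (~6 months)
    if sum(1 for t in timeline if CAPITALIST_TAGS & set(t["hashtags"])) > 0.4 * n \
            or sum(1 for t in timeline if len(t["hashtags"]) > 4) > 0.5 * n:
        return 7                                   # social capitalist
    return 0                                       # valid user
```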
3.3 Supervised Machine Learning (ML) Classification
Supervised ML classification requires training with annotated data. The accounts are to be classified in a binary fashion: the illegitimate accounts as '1' and the others as '0'. Once we have an annotated dataset, we train a Random Forest Classifier (RFC), a Support Vector Machine (SVM) and a K-Nearest Neighbor (KNN) classifier and test their performance. Derived via simple calculations from an account's profile metadata, the proposed classifying features for machine learning are:

• Account Age: the age, in days, of an account since its creation date.
• Activity Rate: the average number of posts sent by the account per day.
• TFF Ratio: Twitter follower-to-friend ratio.
• Followers count: the number of followers this account currently has.
• Friends count: the number of users this account is following.
• Listed count: the number of public lists this account is a member of.
• Favorites count: the number of tweets the user has liked in the account's lifetime.
• Statuses count: the number of tweets, including retweets, issued by the user.
• Location: a binary feature, whether the user has set their location.
• Description Length: the length of the user-defined string describing their account.

3.4 Impact of Account Filtering on a Social Media Task
To measure the impact of removing those illegitimate accounts, we test it on a social media task; Identifying Influential Twitter Users. The task is carried out following the method applied in Shalaby and Rafea (2018), using an unsupervised approach where the users are ranked according to their influence. A set of features were selected based on the influential users’ behaviors they reflect. The users are then ranked according to each of these features and cut off after the top 50. Each user is then ranked according to how many of the feature lists they appeared in, a.k.a. their appearance frequency.
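A minimal sketch of the appearance-frequency ranking summarized above (following the description of Shalaby and Rafea 2018 given here); the three influence features and the toy user list are placeholders for illustration.

```python
# Minimal sketch of the appearance-frequency ranking: rank users by each influence
# feature, keep the top-k per feature, then count appearances across the lists.
from collections import Counter

def rank_by_appearance(users, features, top_k=50):
    """users: list of dicts holding one numeric value per influence feature."""
    counts = Counter()
    for feature in features:
        top = sorted(users, key=lambda u: u[feature], reverse=True)[:top_k]
        counts.update(u["id"] for u in top)
    # Users are ranked by how many per-feature top-k lists they appear in.
    return [uid for uid, _ in counts.most_common()]

users = [{"id": i, "retweets": i % 7, "mentions": (i * 3) % 11, "followers": i}
         for i in range(200)]
print(rank_by_appearance(users, ["retweets", "mentions", "followers"])[:5])
```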
4 Experiments and Discussion

4.1 Experiment 1: Unsupervised Account Classification
The objective of this experiment is to build a rule-based program to classify the different kinds of illegitimate accounts. Method: A program is built so that for each user in the dataset, it is to retrieve the most recent 200 posts of their timeline, and analyze each of those tweets, classifying the
accounts according to the rules listed in Sect. 3.2. The program was run on the 41869 accounts of Dataset 1, and a random selection of accounts is then manually investigated to measure its performance. Results: After classifying the accounts, a random selection of each of the classified accounts types were extracted for evaluation. Each of the account profiles and their posts are examined in order to confirm or debunk the classification. The number of accounts selected for inspection is roughly equivalent to the percentage of the accounts in each type, with a minimum of 10 accounts. So, accounts with labels ‘2’ and ‘7’ with 0% will not be inspected. Accounts with label ‘5’ are labelled such since the classifier was denied access to them, so they will be excluded from further analysis and inspection. Table 1 shows the number of accounts and percentage classified in each account type, also, the manually inspected sample size and precision. With an average precision of 0.99, calculated from the values in Table 1, the unsupervised classifier seems to have followed the rules and sufficiently classified the accounts according to the conditions set.
Table 1. Unsupervised rule-based classification and evaluation

Account type | Label | Number of accounts | Percentage | Sample size | Precision
Click-bait | 1 | 2081 | 5% | 10 | 1
Rated | 2 | 10 | 0% | 0 | –
Highly active | 3 | 2321 | 5.5% | 10 | 1
Exclusively retweeting | 4 | 1790 | 4.3% | 10 | 1
To ignore | 5 | 1892 | 4.5% | 0 | –
Premature | 6 | 9250 | 22.1% | 30 | 1
Social capitalist | 7 | 4 | 0% | 0 | –
Valid user | 0 | 24521 | 58.6% | 60 | 0.95
Discussion: As per the manual evaluation, the classifier functions as intended. However, the program took a considerable amount of time to classify the accounts. The average time to classify an account was 2.724 s; some accounts required as little as 0.684 s, while a few required up to 11.459 s. That time was in major part due to the retrieval of each user's Twitter timeline, in addition to the Twitter API rate limitations. The unsupervised classifier was therefore found impractical for online applications due to its long runtime. However, its high performance makes it a very convenient alternative to manual data annotation, being much faster and cheaper than manually labelling a massive dataset such as Dataset 1.
4.2 Experiment 2: Supervised Account Classification
The objective of this experiment is to train a supervised ML classifier to classify the illegitimate accounts, using the accounts’ profile features. Data Annotation: Rather than relying on manual labeling for the training and testing datasets, we decided to take advantage of the rule-based program of experiment 4.1. Excluding the 1892 accounts to “ignore”, brings down the annotated Dataset 1 to 39977 accounts. Then we apply a binary annotation; ‘0’ for valid accounts, and ‘1’ for the illegitimate accounts. This results in 24521 accounts (61.3%) annotated with a ‘0’, and the remaining 15456 accounts (38.5%) annotated with a ‘1’. Method: The annotated Dataset 1 is randomly split into a training dataset of 29983 accounts (75%) and a testing dataset of 9994 accounts (25%). We train an RFC with 100 trees, an SVM with RBF kernel, and a KNN, and test their performance. Results: Table 2 shows the performance measures for each of the classifiers. Discussion: The performance measures in Table 2 show that the unsupervised classifier provided a sufficient annotated dataset for training and testing, with the RFC outperforming the other classifiers using features that require no additional data collection or further processing as is the case with what we found in the literature. The accounts are classified, using RFC, in an online fashion, with 0.90 precision, 0.96 recall, 0.93 F-score and 0.92 accuracy.
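A hedged scikit-learn sketch of the training and evaluation protocol in this experiment (100-tree RFC, RBF SVM, KNN, 75/25 split); building the feature matrix X and the 0/1 label vector y from the annotated Dataset 1 is assumed to have been done elsewhere, and the feature names are those of Sect. 3.3. Table 2 follows.

```python
# Hedged sketch of the Experiment 2 protocol; X and y are assumed to be built from
# the ten profile features and the rule-based annotations of Dataset 1.
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_recall_fscore_support, accuracy_score

FEATURES = ["account_age", "activity_rate", "tff_ratio", "followers_count",
            "friends_count", "listed_count", "favorites_count", "statuses_count",
            "has_location", "description_length"]       # Sect. 3.3 feature vector

def run_experiment2(X, y):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
    models = {"RFC": RandomForestClassifier(n_estimators=100),
              "SVM": SVC(kernel="rbf"),
              "KNN": KNeighborsClassifier()}
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        pred = model.predict(X_te)
        p, r, f, _ = precision_recall_fscore_support(y_te, pred, average="binary")
        print(name, round(p, 3), round(r, 3), round(f, 3),
              round(accuracy_score(y_te, pred), 3))
```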
Table 2. ML classifiers' performance

Classifier | Precision | Recall | F-score | Accuracy
RFC | 0.904 | 0.965 | 0.933 | 0.917
SVM | 0.8 | 0.93 | 0.86 | 0.822
KNN | 0.814 | 0.877 | 0.846 | 0.804

4.3 Experiment 3: Supervised Classification Validation
The objective of this experiment is to validate the performance of the RFC model. Method: Dataset 1 was used to train the classifiers, so Dataset 2 is used in this experiment. It is split into 5 similarly sized sub-datasets; Datasets 2.1, 2.2, 2.3, 2.4, 2.5, and the performance of the trained RFC model of experiment 2 is tested with each. Since the unsupervised classifier was used to annotate the training dataset of the RFC model, we assume its classification as ground truth. The T-test is then carried out to measure statistical significance. With a null hypothesis (H0) that there is no statistical significance between the performance of the RFC and the outcome of the unsupervised classifier. And an alternative hypothesis (Ha) stating that the performance of the RFC is significantly less than that of the unsupervised classifier.
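One possible way to run the paired comparison described in the method, shown here with SciPy; the reference scores for the unsupervised classifier are placeholders, and this is not necessarily the exact procedure the authors followed.

```python
# One possible way to run the paired comparison over the five sub-datasets; the
# reference scores below are placeholders, not reported values.
from scipy import stats

rfc_f1 = [0.916, 0.919, 0.919, 0.916, 0.917]          # RFC F-scores (Table 3)
reference_f1 = [0.92, 0.92, 0.93, 0.92, 0.92]         # placeholder reference scores
t_stat, p_value = stats.ttest_rel(rfc_f1, reference_f1)
print(round(t_stat, 4), round(p_value, 4))            # compare p_value against alpha = 0.05
```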
Results: Table 3 shows the performance measures for running the RFC on each of the Datasets 2.1, 2.2, 2.3, 2.4 and 2.5. The T-test was then carried out using the F-score values in Table 3, compared against the unsupervised classifier values. The t-score was calculated to be 1.36E-08. With 4 degrees of freedom and an alpha level of 0.05, the critical t-value from the t-table is 2.132. The calculated t-value is much less than this critical value, thus supporting the null hypothesis (H0) that there is no statistically significant difference between the performance of the RFC and the unsupervised classifier outcome. Discussion: Despite the high accuracy levels of the unsupervised method, it is a very slow process to carry out, whereas the RFC is very fast, and as the experiment shows, the performance difference between the two methods is statistically insignificant. The RFC, using a different set of features, was able to match the performance of the unsupervised method.
Table 3. The performance measures for each of the datasets

Datasets | Precision | Recall | F-score
Dataset 2.1 | 0.897 | 0.936 | 0.916
Dataset 2.2 | 0.902 | 0.937 | 0.919
Dataset 2.3 | 0.902 | 0.938 | 0.919
Dataset 2.4 | 0.898 | 0.935 | 0.916
Dataset 2.5 | 0.901 | 0.932 | 0.917

4.4 Experiment 4: Impact of Account Filtering on a Social Media Task
The objective of this experiment is to measure the impact of removing those illegitimate accounts on a social media task; Identifying Influential Twitter Users. The task is carried out following the method applied in Shalaby and Rafea (2018). Method: The trained RFC, of experiment 2, was used to predict labels for the 39709 Dataset 2 user accounts. The 27549 accounts (69.93%) predicted to be legitimate, labelled with a ‘0’, will be coined as ‘Filtered Dataset 2’. Our experiment consists of running the Ranking Users program (Shalaby and Rafea 2018) and having the outcome analyzed. The Ranking Users program is run one time on all 39709 user accounts of Dataset 2, to produce a list of the top 50 ranked accounts, which we will call ‘list 1’. The 50 accounts in ‘list 1’ are classified via the RFC, filtering out the illegitimate accounts resulting in a list, which we will call ‘list 1.1’. The Ranking Users program is run another time only on the 27549 Dataset 2 accounts predicted by the RFC, as legitimate: ‘Filtered Dataset 2’, producing another list of the top 50 ranked accounts, which we will call ‘list 2’. Results: Table 4 shows the Influential Users’ analysis at 10, 20, 30, 40 and 50 of each of the lists 1, 1.1 and 2, showing the number of influential users (IU count) and precision of each.
Table 4. Precision of the ranking users program on the different datasets

Analysis at:     List 1: without filter    List 1.1: filter post-ranking    List 2: filter pre-ranking
                 IU count   Precision      IU count   Precision             IU count   Precision
10               6          0.6            6/10       0.6                   6          0.6
20               13         0.65           13/19      0.68                  12         0.6
30               18         0.6            18/28      0.64                  17         0.57
40               22         0.55           22/36      0.61                  20         0.5
50               25         0.5            25/42      0.59                  23         0.46
Avg. precision              0.58                      0.62                             0.55
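For clarity, the construction of the three lists described in the Method above can be sketched as follows; rank_users and rfc are hypothetical stand-ins for the Ranking Users program and the trained RFC, not the authors' code:

def build_lists(accounts, rank_users, rfc, top_k=50):
    # List 1: rank all accounts and keep the top 50, with no filtering.
    list_1 = rank_users(accounts)[:top_k]
    # List 1.1: filter the already-ranked top 50, keeping accounts predicted legitimate (label 0).
    list_1_1 = [a for a in list_1 if rfc.predict(a) == 0]
    # List 2: filter the whole dataset first, then rank and keep the top 50.
    filtered = [a for a in accounts if rfc.predict(a) == 0]
    list_2 = rank_users(filtered)[:top_k]
    return list_1, list_1_1, list_2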
Discussion: As can be seen in Table 4, filtering the users after ranking them resulted in the highest precision; when the RFC was applied on list 1 to filter out the spam users, producing list 1.1, there was an improvement of approximately 7%.
4.5 Experiment 5: Impact Verification
The objective of this experiment is to verify the impact of using the RFC model to filter out illegitimate accounts on the precision of the Ranking Users program (Shalaby and Rafea 2018), and whether the impact is statistically significant. Method: From Dataset 2, five sub-datasets were generated by randomly selecting 70% of the users, five times with repetition: Datasets 2A, 2B, 2C, 2D and 2E. Our experiment consists of running the Ranking Users program (Shalaby and Rafea 2018) on each of the five datasets and producing two lists for each dataset: a list of the top 50 accounts, unfiltered, which we will call 'list 1', and the list obtained by classifying the accounts in 'list 1' via the RFC and filtering out the illegitimate accounts, which we will call 'list 1.1'. Each of the outcomes is evaluated and the Influential Users precision is calculated. Finally, a T-test is used to decide whether the RFC filter resulted in any statistically significant change in the outcome, with a null hypothesis (H0) that there is no difference between the Influential Users detection precision before and after filtering with the trained RFC model, and an alternative hypothesis (Ha) that filtering the spam accounts via the RFC model improved the precision of the Influential Users detection. Results: The analysis at 20 consistently produced the highest precision values in the five runs on the five datasets. Table 5 shows the precision at 20 with and without the RFC filter. The T-test was carried out on the precision values shown in Table 5. The t-score was calculated to be 0.103. With 4 degrees of freedom and an alpha level of 0.05, the critical value from the t-table is 2.132. The calculated t-score is much less than the critical value, thus supporting the null hypothesis (H0) that there is no statistically significant difference between the Influential Users detection precision before and after filtering with the trained RFC model.
Table 5. The precision values at 20 users with and without RFC filter

Dataset              List 1: without filter    List 1.1: filter post-ranking
Dataset 2A           0.7                       0.68
Dataset 2B           0.75                      0.75
Dataset 2C           0.75                      0.74
Dataset 2D           0.8                       0.8
Dataset 2E           0.8                       0.79
Average precision:   0.741                     0.74
Discussion: After carrying out the T-test, the results did not turn out to be statistically significant. In fact, the filtering seems to have little to no effect on the precision of the list of influential users produced by the ranking algorithm. This contradicts our expectation that filtering out the illegitimate users would enhance the precision of influential users detection.
5 Conclusion
We want to filter out illegitimate Twitter accounts for an ongoing online analysis. Developing a supervised machine learning program to classify these illegitimate accounts needs a large annotated dataset, which requires a lot of effort. We decided to start by building guidelines for annotating the accounts, identifying a set of rules to mark an account as legitimate or illegitimate. Those rules were found suitable for developing an unsupervised rule-based classifier that goes through each account's timeline tweets. However, analyzing the tweets' contents to classify illegitimate tweets takes a lot of time. At the same time, the accuracy of this unsupervised classifier was very high. We therefore decided to use it to annotate a training dataset and build a supervised classifier that would match the performance of the unsupervised one while being more efficient in terms of speed. The supervised classifier features are calculated from the account profile metadata that accompanies each tweet retrieved by the Twitter API. Three supervised classification approaches were tried, RFC, SVM and KNN, with the RFC outperforming the other classifiers. Despite higher performance scores for RFC being reported in the literature, the features we use require no additional data collection or further processing. We were able to classify the accounts using the RFC and filter the users in an online fashion, relying on the basic user profile features, with high scores of 0.90 precision, 0.96 recall, 0.93 F-score and accuracy of almost 0.92, statistically matching the performance level of the unsupervised method using a different set of features. We then went on to investigate the impact of removing those illegitimate accounts on a social media task: identifying influential Twitter users. Our expectation was that filtering out the illegitimate users would enhance the precision by removing some of the false positive accounts identified as influential; this expectation was due to the heuristic ranking method we used and the possibility of identifying some spam accounts as influential, which has happened in some cases. However, we found the difference in
precision of the user ranking algorithm when applying the filter to be statistically insignificant. At the same time, we can say that any influential account identified by our algorithm is, with high confidence, not spam, as per our experiment. Despite failing to find a significant impact of the spam filter on identifying influential Twitter users, further investigation on other data mining tasks can be carried out.
Acknowledgments. The authors would like to thank ITIDA and AUC for sponsoring the project entitled "Sentiment Analysis Tool for Arabic".
References
Ala'M, A.Z., Alqatawna, J., Faris, H.: Spam profile detection in social networks based on public features. In: 2017 8th International Conference on Information and Communication Systems (ICICS), pp. 130–135. IEEE (2017)
Aslan, Ç.B., Sağlam, R.B., Li, S.: Automatic detection of cyber security related accounts on online social networks: Twitter as an example. In: Proceedings of the 9th International Conference on Social Media and Society, pp. 236–240. ACM (2018)
Chavoshi, N., Hamooni, H., Mueen, A.: Identifying correlated bots in Twitter. In: Spiro, E., Ahn, Y.-Y. (eds.) SocInfo 2016. LNCS, vol. 10047, pp. 14–21. Springer, Cham (2016a)
Chavoshi, N., Hamooni, H., Mueen, A.: DeBot: Twitter bot detection via warped correlation. In: ICDM, pp. 817–822 (2016b)
Chavoshi, N., Hamooni, H., Mueen, A.: Temporal patterns in bot activities. In: Proceedings of the 26th International Conference on World Wide Web Companion, pp. 1601–1606. International World Wide Web Conferences Steering Committee (2017)
Davis, C.A., Varol, O., Ferrara, E., Flammini, A., Menczer, F.: BotOrNot: a system to evaluate social bots. In: Proceedings of the 25th International Conference Companion on World Wide Web, pp. 273–274. International World Wide Web Conferences Steering Committee (2016)
Duh, A., Slak Rupnik, M., Korošak, D.: Collective behavior of social bots is encoded in their temporal Twitter activity. Big Data 6(2), 113–123 (2018)
Inuwa-Dutse, I., Bello, B.S., Korkontzelos, I.: Lexical analysis of automated accounts on Twitter. arXiv preprint arXiv:1812.07947 (2018)
Jain, G., Sharma, M., Agarwal, B.: Spam detection on social media using semantic convolutional neural network. Int. J. Knowl. Discov. Bioinform. 8(1), 12–26 (2018)
Kudugunta, S., Ferrara, E.: Deep neural networks for bot detection. Inf. Sci. 467, 312–322 (2018)
Liu, S., Wang, Y., Chen, C., Xiang, Y.: An ensemble learning approach for addressing the class imbalance problem in Twitter spam detection. In: Liu, J.K.K., Steinfeld, R. (eds.) ACISP 2016. LNCS, vol. 9722, pp. 215–228. Springer, Cham (2016)
Madisetty, S., Desarkar, M.S.: A neural network-based ensemble approach for spam detection in Twitter. IEEE Trans. Comput. Soc. Syst. 5(4), 973–984 (2018)
Shalaby, M., Rafea, A.: Identifying the topic-specific influential users in Twitter. Int. J. Comput. Appl. 179(18), 34–39 (2018)
Subrahmanian, V.S., Azaria, A., Durst, S., Kagan, V., Galstyan, A., Lerman, K., Zhu, L., Ferrara, E., Flammini, A., Menczer, F.: The DARPA Twitter bot challenge. Computer 49(6), 38–46 (2016)
Varol, O., Ferrara, E., Davis, C.A., Menczer, F., Flammini, A.: Online human-bot interactions: detection, estimation, and characterization. In: Eleventh International AAAI Conference on Web and Social Media (2017)
From Reinforcement Learning Towards Artificial General Intelligence

Filipe Marinho Rocha1,3,4(B), Vítor Santos Costa1,4, and Luís Paulo Reis2,3

1 FCUP – Faculdade de Ciências da Universidade do Porto, Porto, Portugal
[email protected]
2 FEUP – Faculdade de Engenharia da Universidade do Porto, Porto, Portugal
3 LIACC – Laboratório de Inteligência Artificial e Ciência de Computadores, Universidade do Porto, Porto, Portugal
4 CRACS – Centre of Advanced Computing Systems, INESCTEC, Porto, Portugal
Abstract. The present work surveys research that successfully integrates a number of complementary fields in Artificial Intelligence. Starting from integrations in Reinforcement Learning, namely Deep Reinforcement Learning and Relational Reinforcement Learning, we then present Neural-Symbolic Learning and Reasoning, since it is applied to Deep Reinforcement Learning. Finally, we present integrations in Deep Reinforcement Learning, such as Relational Deep Reinforcement Learning. We propose that this road is breaking through barriers in Reinforcement Learning and bringing us closer to Artificial General Intelligence, and we share views about the current challenges to get us further towards this goal.

Keywords: Reinforcement Learning · Deep Learning · Neural-symbolic integration · Inductive Logic Programming · Artificial General Intelligence

1 Introduction
The integration of complementary fields of Artificial Intelligence (AI) [1] has been fruitful, as for example the combination of Deep Learning (DL) [2] with Reinforcement Learning (RL) [3,4], which gave rise to Deep Reinforcement Learning (DRL) [5]. DRL is nowadays the state-of-the-art in the RL domain. However, DRL is reaching several limitations [6]. Some of these limitations are: sample inefficiency, that is, the massive amount of data or interactions with the environment required for learning; weak generalization, that is, weak capability of acquiring knowledge that can be successfully transferred to new environments and to different tasks; inability to learn complex tasks; and the lack of explainability and interpretability of DRL models and their outputs. Pedro Domingos argues in [7] that, to overcome the current limitations of Machine Learning (ML) [8], the integration of different paradigms of AI is required. He believes that if we attain what he calls the Master Algorithm, a
universal learner capable of deriving all knowledge from data, we will surpass human-level learning capabilities and this algorithm will have general applicability. In this case, he states, we will have 80% of the solution to Artificial General Intelligence (AGI). His own development, the Markov Logic Network (MLN) [9], is a prime example of unifying different paradigms: Logical and Statistical AI with Markov Logic. New approaches are required to overcome DRL limitations. Some new research is trying to do this by combining Relational reasoning with DRL [10–12]. Using Relational reasoning within DL architectures can facilitate learning about entities, their relations, and rules for composing them. The Inductive Logic Programming (ILP) [13], or Relational Learning, approach, an instance of the Symbolic AI paradigm, together with the First-Order Logic (FOL) [14] representations it uses, may solve many of the current problems of DRL and reach the kinds of solutions that many envision. For example, ILP is considered very sample efficient. Also, FOL is highly expressive and its statements are easily readable by humans; therefore, it has the potential of improving on the explainability and interpretability issues of DRL. Moving from a propositional to a Relational representation can facilitate generalization over goals, states, and actions in an RL setting, exploiting knowledge learned during an initial learning stage and hence allowing for better generalization and transfer learning. We propose that these improvements will lead us closer to achieving AGI.
2 Integrations in Reinforcement Learning
2.1 Deep Reinforcement Learning
The field of Reinforcement Learning (RL) [3,4] has achieved remarkable results in the motor control and games domains. In Fig. 1, we can observe the RL setting, where an agent observes the current state of the environment, chooses and takes an action, and receives a reward accordingly. Then, the agent observes the following state of the environment and the process is repeated. In this setting, the objective of the agent is to maximize the cumulative reward. Usually, the highest reward is received when reaching the goal or when completing the target task in some environment. One of the original RL algorithms is Q-learning [15], in its tabular, Q-table form. It learns the Q-values, or Action-values, for all possible combinations of states and actions and places this information in a lookup table. On the other hand, Deep Learning (DL) [2] has also achieved impressive results, especially in pattern recognition and perception tasks. The integration of DL with RL, called Deep Reinforcement Learning (DRL), of which the Deep Q-Network (DQN) [5] is the seminal example, reached better results than ever before. The difference between traditional Q-learning and this algorithm is that, instead of a lookup table, the latter uses a Deep Network to approximate the optimal Q-value function.
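As a point of reference, a minimal sketch of the tabular Q-learning update that DQN replaces with a Deep Network (a generic illustration, not tied to any particular environment):

from collections import defaultdict

def q_learning_update(Q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.99):
    # One tabular Q-learning step:
    # Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

Q = defaultdict(float)  # the lookup table of Q-values described above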
Fig. 1. Agent learns while interacting with the environment in RL
In creating the DQN, the authors took advantage of advances in DL to develop a novel artificial agent that could learn successful policies directly from high-dimensional sensory inputs, using end-to-end RL. They demonstrated that the DQN agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of classic Atari 2600 games, using the same algorithm, network architecture and hyper-parameters. In the same work, it was argued that the theory of RL provides a normative account, deeply rooted in psychological and neuro-scientific perspectives on animal behaviour, of how agents may optimize their control of an environment. While RL agents had achieved some successes in a variety of domains, their applicability was previously limited to domains in which useful features could be handcrafted, or to domains with fully observed, low-dimensional state spaces. But to use RL successfully in situations approaching real-world complexity, agents must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of RL and hierarchical sensory processing systems. Some barriers were broken through with DRL, but limitations still remained. Some of the limitations of DRL, such as sample inefficiency, are explored in [6], where several research papers on this subject are reviewed. One successful and famous approach that broke some of the existing limitations was the product of combining DRL with Monte Carlo Tree Search (MCTS): AlphaGo [16], a system that excelled at Go, considered a very hard and complex game, beating top human champions. It was followed by the more powerful AlphaZero [17], which achieved even better performance at Go without receiving human input, and can also generalize to excel at Chess and Shogi. Other RL variants that try to overcome some of the limitations of DRL algorithms are Hierarchical RL [18] and Multitask RL [19,20]. Hierarchical RL algorithms decompose the target task or goal into sub-tasks or sub-goals that, combined in sequence, make achieving the final goal or accomplishing the target task easier and less complex.
Instead of combining the primitive actions of the RL agent, the higher-level actions corresponding to executing a sub-task or achieving a sub-goal are combined, hence largely reducing the combinatorial space to be explored. The problem with these approaches is obtaining the sub-goals and sub-tasks automatically from the original goals and tasks. Multitask RL is an approach that involves training the RL algorithms on multiple tasks. Its objective is to reduce data requirements by allowing knowledge transfer between tasks, tackling the sample inefficiency of DRL, and hopefully attaining better generalization abilities. A specific algorithm that improves on sample efficiency in DRL is Hindsight Experience Replay (HER) [21]. It is state-of-the-art for very sparse reward environments when no human input is provided. Its strength is that it learns from mistakes too, improving the efficiency of learning the target task. Dyna [22] is also a possible solution to sample inefficiency when it comes to real data. This is a model-based [4] RL algorithm. One of its variants, Dyna-Q, uses DQN to model a Q-value function. Its idea is to implement, besides the typical RL model, another model that learns the dynamics of the environment, the so-called world model. Given the current state and action as input, it returns the predicted next state and reward. This model is then used for simulation, while the RL model is learning a Q-value function from real experiences. In between real experiences, the RL model is also updated with simulated experiences. So it uses a lot of simulated data to learn, reducing the requirements for real data. This is considered a form of Planning, albeit an implicit one. Often, in order to facilitate learning, since DRL by itself is unable to solve more complex tasks in a timely manner, human input is provided, using examples where an algorithm learns by imitating the human actions [23] or where the high-level steps required to execute a task are given to an algorithm by a human [24]. This is called Imitation Learning [25]. Human intervention is a limited resource, and for some tasks the human is not aware of the optimal solution, or even of any solution at all. Another problem is that the Deep networks used in DRL are black-boxes. Explainable AI/ML is a current research trend [26,27]. A lot of ML models are black-boxes, providing no insight into the model itself nor explainability about its predictions or other outputs. Without explainability and interpretability of models, one cannot perform verification of the models, and so verifiability is also a current issue to be solved, especially because there are human safety concerns when deploying these black-box models in the real world, as in autonomous driving, for example. In [28], the Behaviour Suite for RL is introduced. It consists of a collection of carefully-designed experiments that investigate core capabilities of RL. It facilitates reproducible and accessible research on the core issues in RL, and ultimately the design of superior learning algorithms. Instead of focusing on a one-dimensional performance evaluation of RL, this testing suite acknowledges several dimensions of learning performance. One of the core capabilities for an RL
agent that was overlooked in the past is generalization, which is now considered to be limited in state-of-the-art DRL algorithms (Fig. 2).
Fig. 2. Radar plot for the performance of several DRL algorithms regarding the 7 core capabilities for RL agents described in [28]
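The Dyna idea described above can be sketched as a loop in which a learned world model generates simulated transitions in between real interactions; every object here is a hypothetical interface, not one of the cited implementations:

def dyna_loop(agent, world_model, env, planning_steps=10, episodes=100):
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            action = agent.act(state)
            next_state, reward, done = env.step(action)
            agent.learn(state, action, reward, next_state)       # direct RL update from real experience
            world_model.fit(state, action, reward, next_state)   # learn the environment dynamics
            for _ in range(planning_steps):                      # planning: updates from simulated experience
                s, a = world_model.sample_seen()
                r, s_next = world_model.predict(s, a)
                agent.learn(s, a, r, s_next)
            state = next_state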
2.2 Relational Reinforcement Learning
An established research field that tries to empower RL agents with high-level Relational reasoning is Relational Reinforcement Learning (RRL) [29–31]. This field studies representation and generalization in RL and advocates higher-order representations instead of the commonly used propositional representations. The core idea behind RRL is to combine RL with Relational reasoning by representing states, actions and policies using a Relational language, like FOL [14]. Moving from a propositional to a relational representation facilitates learning the relations between agent and objects and generalizing over goals, states, and actions. Additionally, a Relational language also facilitates the use of background knowledge. Background knowledge can be provided by giving the system facts and rules, described in the Relational language, relevant to the learning problem. Traditional Artificial Neural Networks (ANNs) are only capable of learning propositional or attribute-value representations. Instead, FOL allows for more abstract, higher-order representations. The FOIL (First-Order Inductive Learner) [32] algorithm is a method for learning these relations, a prime example of ILP or Relational Learning.
In RL, learning the relations between entities, like the agent or agents and objects, can be extremely important for understanding the environment dynamics. Along this line of work, in [33], Relational Actions (R-actions) and Relational States (R-states) were defined, which substitute for the primitive actions and state values that typically feed an RL algorithm. The problem with this approach is defining this Relational Space (R-space), composed of higher-level actions and state descriptions that describe relations in the environment, automatically from the primitive actions and state values.
3 Neural-Symbolic Approaches in Artificial Intelligence
The combination of Symbolic AI with Neural Networks seems promising to many researchers and is the object of study of an established field of research, called Neural-Symbolic Learning and Reasoning, as shown by its yearly conferences and workshops since 2005 [34]. There is new research that aims to combine ANNs with Logic reasoning in relevant ways. An example of this combination is [35], where a Differentiable Inductive Logic Programming (DILP) framework is proposed. They argue that ANNs are powerful function approximators, capable of creating models to solve a wide variety of problems, but as their size and expressivity increase, they end up with an over-fitting problem. Although mitigated by regularization methods, the common solution is to use large amounts of training data that sufficiently approximate the data distribution of the problem domain. In contrast, Logic Programming (LP) [36] based methods, such as ILP, offer an extremely data-efficient process by which models can be trained to reason on symbolic domains; however, these methods are not robust to noise in, or mislabeling of, inputs, and cannot be applied in domains where the data is ambiguous, such as operating on raw pixels. Their solution, DILP, provides models trained by backpropagation that can be hybridized by connecting them with ANNs over ambiguous data, in order to be applied to domains which ILP cannot address, while providing data efficiency and generalization beyond what ANNs, on their own, can achieve. Another example of the referred combination is [37]. They propose the Neural Logic Machine (NLM), a neural-symbolic architecture for both Inductive learning and Logic reasoning. NLMs exploit the power of both ANNs, as function approximators, and LP, as a symbolic processor for objects with properties, relations, logic connectives, and quantifiers. After being trained on small-scale tasks, NLMs can use learned rules to generalize to large-scale tasks. Yet another relevant example is [38]. They propose an ANN module called the Neural Arithmetic Logic Unit (NALU). ANNs can learn to represent and
manipulate numerical information, but they seldom generalize well outside the range of numerical values encountered during training. Their experiments show that NALU-enhanced Neural Networks can learn to perform arithmetic and logic operations. In contrast to conventional ANN architectures, this method obtains substantially better generalization outside the range of numerical values encountered during training. They argue in this paper that the ability to represent and manipulate numerical quantities is apparent in the behavior of many species, from insects to mammals to humans, suggesting that basic quantitative reasoning is a general component of intelligence, referring to [39,40] to support this view. They also mention that while neural networks can successfully represent and manipulate numerical quantities, the behavior that they learn does not generally exhibit systematic generalization, referring to [41,42]. They state that this failure pattern indicates that the learned behavior is better characterized by memorization than by systematic abstraction.
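For context, the core NALU construction, as we read [38] (a sketch under that reading, not quoted from the present survey), pairs an additive unit with a multiplicative path computed in log-space:

a = W x, \quad W = \tanh(\hat{W}) \odot \sigma(\hat{M}), \quad m = \exp\big( W \log(|x| + \epsilon) \big), \quad y = g \odot a + (1 - g) \odot m, \quad g = \sigma(G x)

The learned gate g interpolates between the additive and multiplicative paths, which is what allows the module to extrapolate arithmetic outside the training range.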
4 Integrations in Deep Reinforcement Learning
4.1 Relational Deep Reinforcement Learning
There is new research that goes further down the path of successive integrations and aims to combine both ANNs/DL and Relational reasoning directly with RL, or in a way applicable to the RL domain. In [11], a simple Neural Network module for Relational reasoning is proposed. Relational reasoning is a central component of generally intelligent behavior, but has proven difficult for ANNs to perform. In this paper they describe how to use Relation Networks (RNs), as a simple plug-and-play module, to solve problems that fundamentally hinge on Relational reasoning. They showed that powerful ANNs do not have a general capacity to solve relational questions, but can gain this capacity when augmented with RNs, and how a DL architecture equipped with an RN module can implicitly discover and learn to reason about entities and their relations. In [43], it is argued that AI has undergone a renaissance recently, making major progress in key domains such as vision, language, control, and decision-making. This has been due, in part, to cheap data and cheap compute resources, which have fit the natural strengths of DL. However, many defining characteristics of human intelligence, which developed under much different pressures, remain out of reach for current approaches. In particular, generalizing beyond one's experiences, a hallmark of human intelligence from infancy, remains a formidable challenge for modern AI. They also argue that combinatorial generalization must be a top priority for AI to achieve human-like abilities, and that structured representations and computations are key to realizing this objective. Just as biology uses nature and nurture cooperatively, they reject the false choice between "hand-engineering" and "end-to-end" learning, and instead advocate for an approach which benefits from their complementary strengths.
They explore how using relational inductive biases within DL architectures can facilitate learning about entities, relations, and rules for composing them. They present a new building block for the AI toolkit with a strong relational inductive bias, the graph network, which generalizes and extends various approaches for Neural Networks that operate on graphs, and provides a straightforward interface for manipulating structured knowledge and producing structured behaviors. They discuss how graph networks can support relational reasoning and combinatorial generalization, laying the foundation for more sophisticated, interpretable, and flexible patterns of reasoning. In [44], it is argued that reasoning about objects, relations, and physics is central to human intelligence, and a key goal of AI. They introduce the Interaction Network (IN), which can reason about how objects in complex systems interact, supporting dynamical predictions as well as inferences about the abstract properties of the system. The IN takes graphs as input, performs object- and relation-centric reasoning in a way that is analogous to a simulation, and is implemented using DL. They evaluated its ability to reason about several challenging real-world physical systems and showed that it can generalize automatically to systems with different numbers and configurations of objects and relations. In [12], an approach for DRL is introduced that improves upon the efficiency, generalization capacity, and interpretability of conventional approaches through structured perception and Relational reasoning. It uses self-attention to iteratively reason about the relations between entities in a scene and to guide a model-free Policy [4]. By considering architectural inductive biases, their work tries to open new directions for overcoming important, but stubborn, challenges in DRL. In [45], an approach is introduced for augmenting model-free [4] DRL agents with a mechanism for relational reasoning over structured representations, which improves performance, learning efficiency, generalization, and interpretability.
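To make the notion of relational composition concrete: in our reading of [11], the RN module discussed at the start of this subsection can be summarized as

\mathrm{RN}(O) = f_{\phi}\Big( \sum_{i,j} g_{\theta}(o_i, o_j) \Big),

where O = \{o_1, \ldots, o_n\} is a set of object representations, g_{\theta} scores the relation between each pair of objects, and f_{\phi} aggregates the scored relations into the final answer; both functions are small neural networks trained end-to-end.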
4.2 Neural-Symbolic Approaches in Deep Reinforcement Learning
A previous work [10] defined the limitations of DRL, inherited from the DL techniques used, and tried to overcome them by implementing a Neural-Symbolic approach. The limitations mentioned were the requirement of very large datasets to work effectively and the slowness of learning. They also referred to the lack of ability to reason on an abstract level, which makes it difficult to implement high-level cognitive functions such as transfer learning, analogical reasoning, and hypothesis-based reasoning. Finally, they mentioned that DRL is largely opaque to humans, rendering it unsuitable for domains in which verifiability is important. Their solution involved the combination of high-level Relational reasoning with Deep Learning in the RL domain and was applied to several variants of a simple video game. They showed that the resulting system, just a prototype,
learns effectively and, by acquiring a set of symbolic rules that are easily comprehensible to humans, outperforms a conventional, fully neural DRL system. Finally, the only approach that combines DRL specifically with ILP is Neural Logic Reinforcement Learning (NLRL) [46]. In this work, it is argued that most DRL algorithms suffer from a generalization problem, which makes a DRL algorithm's performance largely affected even by minor modifications of the training environment. Also, the use of deep Neural Networks makes the learned policies hard to interpret. With NLRL, they propose a novel algorithm that represents the policies in RL using FOL. NLRL is based on Policy Gradient [47] algorithms, integrated with DILP, and has demonstrated significant advantages in terms of interpretability and generalization.
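For reference, the policy-gradient estimator on which such methods build (the classical result of [47]) can be written, roughly, as

\nabla_{\theta} J(\theta) = \mathbb{E}_{\pi_{\theta}}\big[ \nabla_{\theta} \log \pi_{\theta}(a \mid s)\, Q^{\pi_{\theta}}(s, a) \big],

where \pi_{\theta} is the parameterized policy and Q^{\pi_{\theta}} its action-value function; as described above, NLRL keeps this gradient-based training loop but lets DILP provide a differentiable, FOL-based parameterization of \pi_{\theta}.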
5 Towards Artificial General Intelligence
Towards the ultimate goal of building Artificial General Intelligence (AGI), it is argued we need to build machines that learn and think for themselves [48], or like people [49]. In [48], it is considered that, for building human-like intelligence, model-based [4] reasoning is essential. Autonomy is also considered important. They advocate agents that can both build and exploit their own internal models, with minimal human hand-engineering. They believe an approach centered on autonomous learning has the greatest chance of success as we scale toward real-world complexity, tackling domains for which ready-made formal models are not available. They survey several important examples of the progress that has been made toward building autonomous agents with human-like abilities, and highlight some of the outstanding challenges. In [49], it is considered that the recent progress in AI has renewed interest in building systems that learn and think like people. Many advances have come from using deep Neural Networks trained end-to-end on tasks such as object recognition, video games, and board games, achieving performance that equals or even beats that of humans in some respects. Despite their biological inspiration and performance achievements, these systems differ from human intelligence in crucial ways. Specifically, they argue that these machines should build causal models of the world that support explanation and understanding, rather than merely solving pattern recognition problems, and harness compositionality and learning-to-learn to rapidly acquire and generalize knowledge to new tasks and situations. They suggest concrete challenges and promising routes toward these goals that can combine the strengths of recent Neural Network advances with more structured cognitive models.
6 Conclusions
The combination of DL with RL, DRL, became the state-of-the-art in RL algorithms, but limits were reached in terms of generalization and knowledge transfer; in other words, a model with good performance in some environment or at some task may perform very poorly in another context. Another barrier is the massive number of interactions, and corresponding data, required to learn how to do even simple tasks, while complex tasks are still out of reach for most RL algorithms. Finally, since DL models are black-boxes, the lack of interpretability and explainability is evident. One approach to achieve learning of complex tasks with RL is to learn the relations present in the environment dynamics, the so-called Relational RL. This allows for much more efficient task learning and provides reasoning abilities. The relations between objects and agents are effectively learned by an RL algorithm, instead of memorizing all possible scenarios where some relations are maintained. These relations are in many cases much more important than the absolute positions of the entities in an environment, for example. Not only were the DRL limitations identified, but also the limitations of ANNs themselves. Since DRL uses the state-of-the-art in ANNs, DL, this has huge implications for DRL abilities as well. The combination of ANNs with Symbolic AI, in Neural-Symbolic approaches, overcomes some of the limitations of ANNs, such as poor generalization. Relational DRL is a newer development compared to Relational RL, since it takes advantage of the recent developments of DL. The purpose of this field, though, is exactly the same as that of Relational RL: to develop methods and architectures that allow learning of the relevant relations in the dynamics of an RL environment for the learning and execution of some target task. Since it takes advantage of DL, this field reached new heights compared to its predecessor. Neural-Symbolic approaches in DRL combine Relational DRL with Neural-Symbolic approaches. This field is the culmination of the integration of all the previous fields, and there is still little work to be surveyed. It differs from Relational DRL because it uses FOL representations, which are fully symbolic. Relational DRL learns relations using current ANNs, and so it learns propositional representations and misses the higher-level representations and abstractions provided by FOL statements and their learning using ILP. FOL and ILP can provide the ability to reason on an abstract level, without which it will be hard to implement high-level cognitive functions such as transfer learning and deductive and inductive reasoning. These last approaches surveyed are the ones in RL closest to leading us to AGI, and if they further improve on overcoming the current DRL limitations, we argue that we will be closer to reaching AGI.
Acknowledgements. We thank the institutions CRACS/INESCTEC and LIACC/UP for the support and contribution of their members, which was very valuable for the research behind this paper and its presentation.
This work is financed by National Funds through the Portuguese funding agency, FCT - Fundação para a Ciência e a Tecnologia, within projects UID/EEA/50014 and UID/EEA/00027.
References 1. Russell, S.J., Norvig, P.: Artificial Intelligence - A Modern Approach. Pearson Education, London (2010). Third International Edition 2. LeCun, Y., Bengio, Y., Hinton, G.E.: Deep learning. Nature 521, 436–444 (2015) 3. Sutton, R.S., Barto, A.G.: Reinforcement learning: an introduction. IEEE Trans. Neural Netw. 16, 285–286 (1988) 4. Kaelbling, L.P., Littman, M.L., Moore, A.W.: Reinforcement learning: a survey. ArXiv, cs.AI/9605103 (1996) 5. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves, A., Riedmiller, M.A., Fidjeland, A., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., Hassabis, D.: Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015) 6. Irpan, A.: Deep reinforcement learning doesn’t work yet (2018). https://www. alexirpan.com/2018/02/14/rl-hard.html 7. Domingos, P.: The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World. Basic Books, New York (2015) 8. Bishop, C.M.: Pattern Recognition and Machine Learning. Information Science and Statistics, 5th edn. Springer, Heidelberg (2007) 9. Domingos, P.M., Lowd, D.: Unifying logical and statistical AI with markov logic. Commun. ACM 62(7), 74–83 (2019) 10. Garnelo, M., Arulkumaran, K., Shanahan, M.: Towards deep symbolic reinforcement learning. ArXiv, abs/1609.05518 (2016) 11. Santoro, A., Raposo, D., Barrett, D.G.T., Malinowski, M., Pascanu, R., Battaglia, P.W., Lillicrap, T.P.: A simple neural network module for relational reasoning. In: NIPS (2017) 12. Zambaldi, V.F., Raposo, D., Santoro, A., Bapst, V., Li, Y., Babuschkin, I., Tuyls, K., Reichert, D.P., Lillicrap, T.P., Lockhart, E., Shanahan, M., Langston, V., Pascanu, R., Botvinick, M.M., Vinyals, O., Battaglia, P.W.: Relational deep reinforcement learning. ArXiv, abs/1806.01830 (2018) 13. Paes, A., Zaverucha, G., Costa, V.S.: On the use of stochastic local search techniques to revise first-order logic theories from examples. Mach. Learn. 106(2), 197–241 (2017) 14. Fitting, M.: First-Order Logic and Automated Theorem Proving. Graduate Texts in Computer Science, 2nd edn. Springer, Heidelberg (1996) 15. Christopher JCH Watkins and Peter Dayan: Q-learning. Mach. Learn. 8(3–4), 279– 292 (1992) 16. Silver, D., Huang, A., Maddison, C.J., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T.P., Leach, M., Kavukcuoglu, K., Graepel, T., Hassabis, D.: Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016)
17. Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., Lillicrap, T.P., Simonyan, K., Hassabis, D.: A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 362, 1140–1144 (2018) 18. Ghazanfari, B., Afghah, F., Taylor, M.E.: Autonomous extraction of a hierarchical structure of tasks in reinforcement learning, a sequential associate rule mining approach. ArXiv, abs/1811.08275 (2018) 19. Ghazanfari, B., Taylor, M.E.: Autonomous extracting a hierarchical structure of tasks in reinforcement learning and multi-task reinforcement learning. ArXiv, abs/1709.04579 (2017) 20. El Bsat, S., Bou-Ammar, H., Taylor, M.E.: Scalable multitask policy gradient reinforcement learning. In: AAAI (2017) 21. Andrychowicz, M., Wolski, F., Ray, A., Schneider, J., Fong, R., Welinder, P., McGrew, B., Tobin, J., Abbeel, P., Zaremba, W.: Hindsight experience replay. In: Advances in Neural Information Processing Systems, pp. 5048–5058 (2017) 22. Sutton, R.S.: Dyna, an integrated architecture for learning, planning, and reacting. SIGART Bull. 2, 160–163 (1990) 23. Finn, C., Yu, T., Zhang, T., Abbeel, P., Levine, S.: One-shot visual imitation learning via meta-learning. ArXiv, abs/1709.04905 (2017) 24. Andreas, J., Klein, D., Levine, S.: Modular multitask reinforcement learning with policy sketches. ArXiv, abs/1611.01796 (2016) 25. Hussein, A., Gaber, M.M., Elyan, E., Jayne, C.: Imitation learning: a survey of learning methods. ACM Comput. Surv. 50, 21:1–21:35 (2017) 26. Gunning, D.: Explainable artificial intelligence (XAI). Defense Advanced Research Projects Agency (DARPA), nd Web, 2 (2017) 27. Ribeiro, M.T., Singh, S., Guestrin, C.: Why should i trust you?: explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1135–1144. ACM (2016) 28. Osband, I., Doron, Y., Hessel, M., Aslanides, J., Sezener, E., Saraiva, A., McKinney, K., Lattimore, T., Szepezv´ ari, C., Singh, S., Van Roy, B., Sutton, R.S., Silver, D., van Hasselt, H.: Behaviour suite for reinforcement learning. ArXiv, abs/1908.03568 (2019) 29. Dˇzeroski, S., De Raedt, L., Driessens, K.: Relational reinforcement learning. Machine learning 43(1–2), 7–52 (2001) 30. Tadepalli, P., Givan, R., Driessens, K.: Relational reinforcement learning: an overview. In: Proceedings of the ICML-2004 Workshop on Relational Reinforcement Learning, pp. 1–9 (2004) 31. Van Otterlo, M.: Relational representations in reinforcement learning: review and open problems. In: Proceedings of the ICML, vol. 2 (2002) 32. Quinlan, J.R.: Learning logical definitions from relations. Mach. Learn. 5, 239–266 (1990) 33. Morales, E.F.: Scaling up reinforcement learning with a relational representation. In: Proceedings of the Workshop on Adaptability in Multi-agent Systems, pp. 15– 26 (2003) 34. Neural-symbolic integration. http://www.neural-symbolic.org 35. Evans, R., Grefenstette, E.: Learning explanatory rules from noisy data. J. Artif. Intell. Res. 61, 1–64 (2017) 36. Lloyd, J.W.: Foundations of logic programming. In: Symbolic Computation (1984) 37. Dong, H., Mao, J., Lin, T., Wang, C., Li, L., Zhou, D.: Neural logic machines. ArXiv, abs/1904.11694 (2019)
38. Trask, A., Hill, F., Reed, S.E., Rae, J.W., Dyer, C., Blunsom, P.: Neural arithmetic logic units. In: NeurIPS (2018) 39. Faris, W.G.: The number sense: how the mind creates mathematics by stanislas dehaene. Complexity 4(1), 46–48 (1998) 40. Gallistel, C.R.: Finding numbers in the brain. Philos. Trans. Roy. Soc. London Ser. B Biol. Sci. 373(1740) (2017) 41. Fodor, J.A., Pylyshyn, Z.W.: Connectionism and cognitive architecture: a critical analysis. Cognition 28, 3–71 (1988) 42. Marcus, G.F.: Integrating connectionism and cognitive science, The algebraic mind (2001) 43. Battaglia, P.W., Hamrick, J.B., Bapst, V., Sanchez-Gonzalez, A., Zambaldi, V.F., Malinowski, M., Tacchetti, A., Raposo, D., Santoro, A., Faulkner, R., G¨ ul¸cehre, C ¸ ., Francis Song, H., Ballard, A.J., Gilmer, J., Dahl, G.E., Vaswani, A., Allen, K.R., Nash, C., Langston, V., Dyer, C., Heess, N.M.O., Wierstra, D., Kohli, P., Botvinick, M.M., Vinyals, O., Li, Y., Pascanu, R.: Relational inductive biases, deep learning, and graph networks. ArXiv, abs/1806.01261 (2018) 44. Battaglia, P.W., Pascanu, R., Lai, M., Rezende, D.J., Kavukcuoglu, K.: Interaction networks for learning about objects, relations and physics. In: NIPS (2016) 45. Zambaldi, V.F., Raposo, D.C., Santoro, A., Bapst, V., Li, Y., Babuschkin, I., Tuyls, K., Reichert, D.P., Lillicrap, T.P., Lockhart, E., Shanahan, M., Langston, V., Pascanu, R., Botvinick, M.M., Vinyals, O., Battaglia, P.W.: Deep reinforcement learning with relational inductive biases. In: ICLR (2019) 46. Jiang, Z., Luo, S.: Neural logic reinforcement learning. ArXiv, abs/1904.10729 (2019) 47. Sutton, R.S., McAllester, D.A., Singh, S.P., Mansour, Y.: Policy gradient methods for reinforcement learning with function approximation. In: NIPS (1999) 48. Botvinick, M.M., Barrett, D.G.T., Battaglia, P.W., de Freitas, N., Kumaran, D., Leibo, J.Z., Lillicrap, T., Modayil, J., Mohamed, S., Rabinowitz, N.C., Rezende, D.J., Santoro, A., Schaul, T., Summerfield, C., Wayne, G., Weber, T., Wierstra, D., Legg, S., Hassabis, D.: Building machines that learn and think for themselves: commentary on lake et al., behavioral and brain sciences, 2017. Behavioral Brain Sci. 40, e255 (2017) 49. Lake, B.M., Ullman, T.D., Tenenbaum, J.B., Gershman, S.J.: Building machines that learn and think like people. Behav. Brain Sci. 40, e253 (2017)
Overcoming Reinforcement Learning Limits with Inductive Logic Programming

Filipe Marinho Rocha1,3,4(B), Vítor Santos Costa1,4, and Luís Paulo Reis2,3

1 FCUP – Faculdade de Ciências da Universidade do Porto, Porto, Portugal
[email protected]
2 FEUP – Faculdade de Engenharia da Universidade do Porto, Porto, Portugal
3 LIACC – Laboratório de Inteligência Artificial e Ciência de Computadores, Universidade do Porto, Porto, Portugal
4 CRACS – Centre of Advanced Computing Systems, INESCTEC, Porto, Portugal
Abstract. This work presents some approaches to overcome current Reinforcement Learning limits. We implement a simple virtual environment and some state-of-the-art Reinforcement Learning algorithms for testing and for producing a baseline for comparison. Then we implement a Relational Reinforcement Learning algorithm that shows superior performance to the baseline but requires introducing human knowledge. We also propose that Model-based Reinforcement Learning can help us overcome some of the barriers. For better world models, we explore Inductive Logic Programming methods, such as the First-Order Inductive Learner, and develop an improved version of it that is more adequate for Reinforcement Learning environments. Finally, we develop a novel Neural Network architecture, the Inductive Logic Neural Network, to fill the gaps of the previous implementations, which shows great promise.

Keywords: Relational Reinforcement Learning · Deep Reinforcement Learning · Inductive Logic Programming · Artificial neural networks
1 Introduction
Since Deep Reinforcement Learning (DRL) [1], the state-of-the-art in the Reinforcement Learning (RL) domain, is reaching several known limitations [2], new approaches are required to overcome them. Some of these limitations are: sample inefficiency, that is, the massive amount of data or interactions with the environment required for learning; weak generalization, that is, weak capability of acquiring knowledge that can be successfully transferred to new environments and to different tasks; inability to learn complex tasks; and the lack of explainability and interpretability of DRL models and their outputs. Recent research is trying to address this by combining Relational reasoning with DRL [3–5]. Using Relational reasoning within Deep Learning (DL) [6] architectures can facilitate learning about entities, relations, and rules for composing
them. This lays the foundation for more sophisticated, interpretable, and flexible patterns of reasoning. The Inductive Logic Programming (ILP) [7], or Relational Learning, approach, an instance of the Symbolic AI paradigm, and the First-Order Logic (FOL) [8] representations it uses, may solve many of the current problems of DRL and reach the kinds of solutions that many envision. The requirement of very large datasets to learn can be tackled by ILP, which is very data efficient. From a small set of examples, it can extract rules that can be applied to never-seen examples, with input values from a much larger range, or very different from those presented in the training data. The obtained logical rules and facts can then be combined for logical inference. Moving from a propositional to a relational representation facilitates generalization over goals, states, and actions, exploiting knowledge learned during an earlier learning phase, hence allowing for better generalization and transfer learning. On the other hand, FOL representations also facilitate the use of background knowledge. Background knowledge can be provided by a human, by giving the system FOL facts and predicates relevant to the learning problem. FOL is highly expressive and its statements are easily readable by humans. One of the promising approaches for overcoming the current limits of DRL is model-based RL [9]. Most of the current research in RL uses model-free approaches. A world model in model-based RL can be very useful, since learning the environment dynamics can help in the process of learning several tasks in the same environment, instead of starting from scratch every time a new task is learned. This knowledge can also be useful in new environments with some common dynamics, therefore improving generalization and sample efficiency. An important feature of model-based approaches is making general knowledge of the environment available for decision-making and planning by the RL agent. But this knowledge can also be made available to humans, if it is interpretable, as is the case with FOL statements, since they provide symbolic rules that are easily comprehensible by humans. At the end, we present a novel architecture that integrates Neural Networks with ILP, the Inductive Logic Neural Network (ILNN). It performs ILP, or Relational Learning, and this capability is embedded in the Neural Network itself. The learned FOL predicates can then be used for deductive reasoning and for diverse tasks, effectively performing transfer learning.
2 Implementations
2.1 Simple OpenAI Gym Environment Implementation
A simple OpenAI Gym [10] environment was implemented, in order to experiment with several approaches. It consists of a 2D grid environment where the agent, a robot, has the goal to put a ball in a box. The agent can pick-up and
put-down the box or the ball, but can carry only one object at a time. The action and state spaces are discrete and the environment is deterministic. There are four grid sizes available: 3 × 3, 5 × 5, 8 × 8 and 20 × 20. One can also configure whether the initial state is fixed across episodes or varies from episode to episode; in both cases, initial states are selected randomly. States are described by a vector [x1, x2, x3, x4, x5, x6, x7], where (x1, x2), (x3, x4) and (x5, x6) are the XY coordinates of the agent, ball and box, respectively, and x7 is the holding variable: 0 - holds nothing, 1 - holds the ball, 2 - holds the box. There are 10 possible actions (A) an agent can take: ['move-up', 'move-down', 'move-left', 'move-right', 'move-up-right', 'move-up-left', 'move-down-right', 'move-down-left', 'pick-up', 'put-down']. The reward is 100 if the agent attains the goal and −1 otherwise, to motivate the RL agent to learn the shortest sequence of steps towards the goal (Fig. 1).
Fig. 1. Screenshot of the RL agent in action in the simple OpenAI Gym environment
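A minimal sketch of how such an environment could be laid out with the classic Gym API; the class below is an illustration of the description above, not the authors' implementation, and details such as the exact transition rules are our assumptions:

import numpy as np
import gym
from gym import spaces

class BallInBoxEnv(gym.Env):
    # Illustrative skeleton of the grid world described above (not the authors' code).
    # State: [agent_x, agent_y, ball_x, ball_y, box_x, box_y, holding].
    MOVES = {0: (0, 1), 1: (0, -1), 2: (-1, 0), 3: (1, 0),    # up, down, left, right
             4: (1, 1), 5: (-1, 1), 6: (1, -1), 7: (-1, -1)}  # the four diagonal moves
    PICK_UP, PUT_DOWN = 8, 9

    def __init__(self, size=5):
        self.size = size
        self.action_space = spaces.Discrete(10)
        self.observation_space = spaces.MultiDiscrete([size] * 6 + [3])
        self.state = None

    def reset(self):
        self.state = list(np.random.randint(0, self.size, size=6)) + [0]
        return np.array(self.state)

    def step(self, action):
        ax, ay, bx, by, cx, cy, hold = self.state
        done = False
        if action in self.MOVES:
            dx, dy = self.MOVES[action]
            ax = int(np.clip(ax + dx, 0, self.size - 1))
            ay = int(np.clip(ay + dy, 0, self.size - 1))
            if hold == 1: bx, by = ax, ay              # a carried ball moves with the agent
            if hold == 2: cx, cy = ax, ay              # a carried box moves with the agent
        elif action == self.PICK_UP and hold == 0:
            if (ax, ay) == (bx, by): hold = 1
            elif (ax, ay) == (cx, cy): hold = 2
        elif action == self.PUT_DOWN and hold != 0:
            done = hold == 1 and (ax, ay) == (cx, cy)  # the ball is put down in the box's cell
            hold = 0
        self.state = [ax, ay, bx, by, cx, cy, hold]
        return np.array(self.state), (100 if done else -1), done, {}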
2.2 Several RL Algorithms Implementation and Testing
Several RL algorithms were implemented and tested in our simple environment: Deep Q-Network (DQN) [1]; a Relational RL algorithm [11], also called rQ-learning, where R-actions and R-states are defined to train a DQN agent on; Hindsight Experience Replay (HER) [12]; and Dyna-Q [13]. The learning performance of some of the algorithms in a 5 × 5 grid, with variable and random initial states for each learning episode, can be visualized in Figs. 2, 3 and 4. The 20 × 20 version of our simple environment is a very sparse reward environment and hard to solve. The reward is only given when the ball is put in the
box. So the right combination of high-level steps has to be performed every time (pick up the ball, carry it to the cell where the box is, and put it down in the box) to finally get a different signal. The only algorithm capable of learning the task in a timely manner in this 20 × 20 grid was rQ-learning, using DQN as the base algorithm. To implement this algorithm, the R-actions defined were: ['go-to-ball', 'go-to-box', 'pick-up-ball', 'pick-up-box', 'put-down-ball', 'put-down-box']. These R-actions could be defined using FOL. For example, goTo, pickUp and putDown could be FOL predicates, and the variable Object could then be ball or box, endowing the system with more abstraction and generalization capabilities. The R-state description defined was: ['Agent to Ball distance in steps', 'Agent to Box distance in steps', 'Agent holding Ball', 'Agent holding Box']. Instead of the primitive actions and states, these higher-level relational descriptions were given as input to the DQN model. The performance was very good, but had the problem of requiring a human to define the R-space of actions and states.
Fig. 2. Learning performance of DQN only vs Dyna-Q with DQN in a 5 × 5 grid
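To make the relational abstraction concrete, the mapping from the primitive 7-dimensional state to the R-state description listed above could look like this (an illustration of the idea, not the authors' code; the step distance is assumed to be the Chebyshev distance because diagonal moves are available):

def r_state(state):
    # Map the primitive state [ax, ay, bx, by, cx, cy, holding] to the R-state
    # description: [steps to ball, steps to box, holding ball, holding box].
    ax, ay, bx, by, cx, cy, holding = state
    dist_ball = max(abs(ax - bx), abs(ay - by))   # assumed Chebyshev step distance
    dist_box = max(abs(ax - cx), abs(ay - cy))
    return [dist_ball, dist_box, int(holding == 1), int(holding == 2)]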
2.3 FOIL Implementation with an Improvement
The learning of the environment dynamics by the model, in the Dyna-Q approach, took too long, especially the learning to predict the rarest state transitions, such as the transition to the Goal state. So we tried to learn the transition to the Goal state using the FOIL algorithm [14]. It was much more data efficient than the Deep Network used in Dyna-Q, since it learned to predict the transition to the Goal state with 100% accuracy using far fewer examples than the DL model.
Fig. 3. Learning performance of HER in a 5 × 5 grid
The goal predicate could then be learned:

goal(X1, X2, X3, X4, X5, X6, X7, A) :- equal(X1, X5), equal(X2, X6), equal(X7, 1), equal(A, 9).
Since the goal predicate was successfully learned in the form of a FOL [8] rule with perfect generalization capabilities, we then tried to use FOIL to learn action predicates. For each possible action, the definition of the state transition, from the current state X [x1, x2, x3, x4, x5, x6, x7] to the next state Y [y1, y2, y3, y4, y5, y6, y7], in the form of a FOL predicate, was achieved. For example, for the action Move Up, the rules learned were:

up(X1, X2, X3, X4, X5, X6, X7, Y1, Y2, Y3, Y4, Y5, Y6, Y7) :- equal(X1-1, Y1), equal(X2, Y3), greaterThan(X1, 0).
up(X1, X2, X3, X4, X5, X6, X7, Y1, Y2, Y3, Y4, Y5, Y6, Y7) :- equal(X1, Y1), equal(X2, Y3), equal(X1, 0).
In order to learn FOL atoms of the type equal(X1 + 1, Y1), we made an improvement to the classical FOIL algorithm: every variable could be combined with a numerical term by addition. This term could take any positive or negative value among the possible discrete offsets in the grid, in the interval [−4, 4] for the 5 × 5 grid case. The obtained FOL statements are also correct and have perfect generalization capabilities, but they are not enough to describe the next state and reward completely, so FOIL could not be used to build a complete model of the world.
Fig. 4. Learning performance of rQ-learning in a 20 × 20 grid
2.4 Novel ANN Architecture Development: ILNN
Hence, a new neural network architecture was designed in order to perform ILP, capable of learning a complete model of the world in our simple environment. It was named the Inductive Logic Neural Network (ILNN), shown in Fig. 5. The proposed architecture is composed of a Relational layer, itself composed of an arbitrary number of Relational and Linear Nodes. Linear Nodes (+) have the identity activation, f(x) = x. Each Relational Node has a step-like activation with a True/False (1/0) output:
Greater than (>): f(x) = 1 if x > 0, and 0 otherwise (i.e., x <= 0);
Lower than (<): f(x) = 1 if x < 0, and 0 otherwise;
Equal (=): f(x) = 1 if x = 0, and 0 otherwise.
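A minimal numpy sketch of these node activations, under our reading of the description above:

import numpy as np

def linear_node(x):            # identity activation of a Linear Node (+)
    return x

def greater_than_node(x):      # 1 if the pre-activation is > 0, else 0
    return np.where(x > 0, 1.0, 0.0)

def lower_than_node(x):        # 1 if the pre-activation is < 0, else 0
    return np.where(x < 0, 1.0, 0.0)

def equal_node(x):             # 1 if the pre-activation is exactly 0, else 0
    return np.where(x == 0, 1.0, 0.0)

# Example: a node with weight w = 1 and bias b = -4 applied to input x1 tests
# "x1 > 4", "x1 < 4" or "x1 = 4", depending on its activation (cf. Fig. 6).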
Fig. 5. The ILNN architecture: an Input Layer feeding a Relational Layer of relational (>, <, =) and linear (+) nodes, followed by a Product Layer and an Output Layer with outputs y1, . . . , yn
The example network of Fig. 6 contains relational nodes testing x1 = 4 and x1 < 4 (weight w = 1, bias b = −4) and output expressions x1 + 2, 2x2 + 2 and 3x1 + x2 + 2.
Fig. 6. Compressed representation of an example Inductive Logic Neural Network
The solution achieved for this simple challenge (fitting a branch function, as described in the Conclusions) was perfect, and the compressed version of the architecture obtained can be visualized in Fig. 6. This is a compressed version, since the nodes with zero-valued weights in their connections were removed, as they were not contributing to the final output of the network.
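To make the setting concrete, a branch function of this kind could look as follows; only the branch expressions x1 + 2, 2x2 + 2 and 3x1 + x2 + 2 and the threshold 4 appear in Fig. 6, while the exact branch conditions used to generate the training data are not given here, so the conditions below are hypothetical:

def branch_function(x1, x2):
    # Hypothetical branch function of the kind described in the text;
    # only the branch expressions and the threshold 4 are taken from Fig. 6.
    if x1 < 4:
        return x1 + 2
    elif x1 == 4:
        return 2 * x2 + 2
    else:
        return 3 * x1 + x2 + 2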
3 Conclusions
We designed a simple RL environment, the Simple OpenAI Gym environment, to test and compare state-of-the-art DRL algorithms with promising approaches for solving current DRL limitations. The approaches we have identified are: Relational RL, Model-based RL and the application of ILP methods to both Relational RL and Model-based RL. The learning of the world model in our simple environment, when using the Dyna-Q algorithm, took too long, especially learning to predict the rarest state transitions, such as the transition to the Goal state. So, we applied an ILP method to substitute the Deep Network used for learning the world model. The ILP method used was FOIL, with an added improvement we developed. We started by applying it to learn the Goal predicate, that is, the FOL predicate that describes the state transition to the Goal state in the RL environment. This predicate was perfectly learned by the improved FOIL. We then applied it to learning all the action predicates too, but the obtained logical rules were incomplete, as they did not describe all dimensions of the output: the next state and reward prediction. To bridge this gap, we developed a novel ANN architecture: ILNN. Before trying to apply it to learn the world model of our simple environment, we applied it to a simpler problem: we designed a simple branch function with a one-dimensional output to generate data for training an ILNN model. The obtained model was perfect, since it described symbolically, in an identical form, the target function to be learned. The ILNN shows great promise for learning world models in RL and also for performing Relational RL in a broader sense, for both model-based and model-free approaches. These will be our next steps.
Acknowledgements. We thank the institutions CRACS/INESC TEC and LIACC/UP for the support and contribution of their members, which was very valuable for the research behind this paper and its presentation. This work is financed by National Funds through the Portuguese funding agency, FCT - Fundação para a Ciência e a Tecnologia, within projects UID/EEA/50014 and UID/EEA/00027.
References
1. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves, A., Riedmiller, M.A., Fidjeland, A., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., Hassabis, D.: Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015)
2. Irpan, A.: Deep reinforcement learning doesn't work yet (2018). https://www.alexirpan.com/2018/02/14/rl-hard.html
3. Garnelo, M., Arulkumaran, K., Shanahan, M.: Towards deep symbolic reinforcement learning. ArXiv, abs/1609.05518 (2016)
4. Santoro, A., Raposo, D., Barrett, D.G.T., Malinowski, M., Pascanu, R., Battaglia, P.W., Lillicrap, T.P.: A simple neural network module for relational reasoning. In: NIPS (2017)
5. Zambaldi, V.F., Raposo, D., Santoro, A., Bapst, V., Li, Y., Babuschkin, I., Tuyls, K., Reichert, D.P., Lillicrap, T.P., Lockhart, E., Shanahan, M., Langston, V., Pascanu, R., Botvinick, M.M., Vinyals, O., Battaglia, P.W.: Relational deep reinforcement learning. ArXiv, abs/1806.01830 (2018)
6. LeCun, Y., Bengio, Y., Hinton, G.E.: Deep learning. Nature 521, 436–444 (2015)
7. Paes, A., Zaverucha, G., Costa, V.S.: On the use of stochastic local search techniques to revise first-order logic theories from examples. Mach. Learn. 106(2), 197–241 (2017)
8. Fitting, M.: First-Order Logic and Automated Theorem Proving. Graduate Texts in Computer Science, 2nd edn. Springer, New York (1996)
9. Kaelbling, L.P., Littman, M.L., Moore, A.W.: Reinforcement learning: a survey. ArXiv, cs.AI/9605103 (1996)
10. Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., Zaremba, W.: OpenAI Gym. ArXiv, abs/1606.01540 (2016)
11. Morales, E.F.: Scaling up reinforcement learning with a relational representation. In: Proceedings of the Workshop on Adaptability in Multi-agent Systems, pp. 15–26 (2003)
12. Andrychowicz, M., Wolski, F., Ray, A., Schneider, J., Fong, R., Welinder, P., McGrew, B., Tobin, J., Abbeel, O.P., Zaremba, W.: Hindsight experience replay. In: Advances in Neural Information Processing Systems, pp. 5048–5058 (2017)
13. Sutton, R.S.: Dyna, an integrated architecture for learning, planning, and reacting. SIGART Bulletin 2, 160–163 (1990)
14. Quinlan, J.R.: Learning logical definitions from relations. Mach. Learn. 5, 239–266 (1990)
15. Mitchell, M.: An Introduction to Genetic Algorithms. MIT Press, Cambridge (1996)
A Comparison of LSTM and XGBoost for Predicting Firemen Interventions
Selene Cerna1(B), Christophe Guyeux2, Héber H. Arcolezi2, Raphaël Couturier2, and Guillaume Royer3 1
São Paulo State University (UNESP), Ilha Solteira, SP, Brazil [email protected] 2 Femto-ST Institute, UMR 6174 CNRS, Univ. Bourgogne Franche-Comté, Besançon, France {christophe.guyeux,heber.hwang arcolezi,raphael.couturier}@univ-fcomte.fr 3 SDIS 25, Besançon, France [email protected]
Abstract. In several areas of the world, such as France, fire brigades are facing a constant increase in the number of their commitments; some of the main reasons are related to the growth and aging of the population and others to global warming. This increase occurs while human and material resources remain essentially constant, due to the financial crisis and the disengagement of the states. Therefore, forecasting the number of future interventions will have a great impact on optimizing the number and the type of on-call firefighters, making it possible to avoid having too few firefighters available during peak load or an oversized guard during off-peak periods. These predictions are viable, given that firefighters' workload is conditioned by human activity in general, itself correlated to meteorological data, calendars, etc. This article aims to show that machine learning tools are mature enough at present to allow useful predictions, even considering rare events such as natural disasters. The tools chosen are XGBoost and LSTM, two of the best currently available approaches, in which the basic experts are decision trees and neurons, respectively. Thereby, it seemed appropriate to compare them to determine if they can forecast the firefighters' response load and, if so, whether the results obtained are comparable. The entire process is detailed, from data collection to the predictions. The results obtained prove that such a quality prediction is entirely feasible and could still be improved by other techniques such as hyperparameter optimization.
Keywords: Long Short-Term Memory · Extreme Gradient Boosting · Firemen interventions · Machine learning · Forecasting
This work was supported by the EIPHI Graduate School (contract "ANR-17-EURE-0002"), by the Region of Bourgogne Franche-Comté CADRAN Project, by the Interreg RESponSE project, and by the SDIS25 firemen brigade. We also thank the supercomputer facilities of the Mésocentre de calcul de Franche-Comté.
1 Introduction
Most of the problems faced on a day-to-day basis by fire brigades are related to the increase in the number of interventions over time and to the management of an insufficient budget. This results in personnel and equipment shortages and affects the response time to incidents. Therefore, taking advantage of the data gathered through the years to build models that can predict the occurrence of an intervention in the future would help in establishing better strategies to nurse the community and reduce the response time. Consequently, more lives would be saved with less effort. Reviewing the literature, research on the specific problem of forecasting the number, type or location of interventions for fire departments is still scarce [5]. For this reason, the present work compares two machine learning (ML) methods: Extreme Gradient Boosting (XGBoost), which is based on decision trees and highly optimizes processing time and model complexity, and Long Short-Term Memory (LSTM), a notable variant of the Recurrent Neural Network (RNN) introduced by [12], which has shown remarkable performance in sequential data applications while overcoming the vanishing gradient problem present in RNNs [10,11]. The primary objective is to provide a data-driven decision-making approach for fire departments to forecast the number of interventions in the next hour. As references, we considered research with LSTM on short-term traffic speed and flow prediction [8,13] and a survey analyzing eight LSTM variants on three tasks: speech recognition, handwriting recognition, and polyphonic music modeling [11]. For XGBoost, one can find research predicting traffic flow using ensemble decision trees for regression [4] and with a hybrid deep learning framework [15]. The following sections of this paper are structured as follows: in Sect. 2.1 the way the data were acquired and encoded is presented; in Sect. 2.2 a short description of the LSTM and XGBoost methods is provided; in Sect. 3 prediction results are described and a discussion highlighting them is made; and in Sect. 4 concluding thoughts and forthcoming works are given.
2 Materials and Methods
2.1 Data Acquisition and Encoding
Two sources were considered. The main one contains information about all the interventions recorded from 2006 to 2017 by the fire and rescue department SDIS25, in the region of Doubs, France. The second contains external variables such as weather, traffic, holidays, etc. First, the date and the time of each intervention were extracted, in order to recognize time patterns in the occurrence of incidents. For example, it was noticed that more interventions occur during the day. Besides, meteorological data were considered, which contribute significantly to the
forecast of the number of incidents (e.g., road accidents are related to road surface condition). Holidays and academic vacations were also taken into account, given that young people tend to go out during these periods. Thus, a dictionary was organized as described in [5], section III, subsection A, with the following differences:
– The height of the six most important rivers of the Doubs department was considered. The average, the standard deviation and the number of readings belonging to each 1-h block were used [2].
– From the Skyfield library [3], the distance between the Earth and the Moon was taken, to examine its influence on natural disasters.
– Festivities such as Ramadan, the Eurockéennes, the Percée du Vin Jaune and the FIMU were included as indicators with value 1 for the eve, the duration and the day after, and 0 for normal days.
– After analyzing the data, it was discovered that leap years have an impact on the day-of-the-year variable. For instance, July 14th (the National Day of France) is not the same day number when February has 29 days. For this reason, February 29th of 2008, 2012 and 2016 was removed.
The data were transformed into our learning format employing two methods from the Scikit-learn library [14]. The "StandardScaler" method, which re-scales the distribution of values to zero mean and unit variance, was applied to numerical variables such as year, hour, wind speed and direction, humidity, nebulosity, dew point, precipitation, bursts, temperature, visibility, chickenpox, influenza and acute diarrhea statistics, river heights and Moon distance. The "OneHotEncoder" method was employed to convert categorical variables into indicators, such as Bison Futé's values, day, day of the week, day of the year, month, holidays, barometric trend and festivities. The original target values (the number of interventions) were kept, because the distribution of the intervention counts is better represented by discrete values. The organization of each sample consisted of joining the extracted features with the number of interventions of the previous 169 h (1 week plus 1 h). Eventually, the data set is considered as sequential data and converted to supervised learning, i.e., the target is the number of interventions in the next hour (t + 1) of a present sample (t).
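A minimal sketch of this encoding step is given below, assuming an hourly pandas DataFrame; the column names used here are illustrative placeholders, not the authors' actual feature names.

    import numpy as np
    from sklearn.compose import ColumnTransformer
    from sklearn.preprocessing import StandardScaler, OneHotEncoder

    numerical = ['year', 'hour', 'wind_speed', 'humidity', 'temperature',
                 'river_height', 'moon_distance']
    categorical = ['day', 'day_of_week', 'month', 'holiday',
                   'barometric_trend', 'festivity']

    encoder = ColumnTransformer([
        ('num', StandardScaler(), numerical),                           # zero mean, unit variance
        ('cat', OneHotEncoder(handle_unknown='ignore'), categorical),   # indicator variables
    ])
    # features = encoder.fit_transform(df)   # df: hourly DataFrame with the columns above

    def make_supervised(features, counts, lag=169):
        # Join each hourly feature vector with the intervention counts of the
        # previous 169 h; the target is the count of the next hour (t + 1).
        X, y = [], []
        for t in range(lag, len(counts) - 1):
            X.append(np.concatenate([features[t], counts[t - lag:t]]))
            y.append(counts[t + 1])
        return np.array(X), np.array(y)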
2.2 Machine Learning Techniques for Predicting Firefighters Interventions
Long Short-Term Memory. Its memory cell consists of one principal layer and three gate controllers: input, forget and output. The principal layer analyzes the present entry xt and the preceding short-term state ht−1. The input gate regulates the flow of new memories. The forget gate controls which memories will be eliminated from the previous long-term state ct−1, and together with the new memories the new long-term state ct is obtained. The output gate establishes which memories will be considered as the new output of the LSTM cell for a
specific time step, i.e., the y(t), which at some point during the operation is equal to the new short-term state h(t) [9]. The process is mathematically expressed as:
it = σ(Wxi^T · xt + Whi^T · ht−1 + bi)  (1)
ft = σ(Wxf^T · xt + Whf^T · ht−1 + bf)  (2)
ot = σ(Wxo^T · xt + Who^T · ht−1 + bo)  (3)
gt = tanh(Wxg^T · xt + Whg^T · ht−1 + bg)  (4)
ct = ft ⊗ ct−1 + it ⊗ gt  (5)
yt = ht = ot ⊗ tanh(ct)  (6)
where Wxi, Wxf, Wxo and Wxg are the weight matrices for their connection to the input vector xt; Whi, Whf, Who and Whg are the weight matrices for their connection to the previous short-term state ht−1; and bf, bg, bi and bo are the bias terms of each layer. For more details about LSTM, see [9,10,12]. Our LSTM model was developed with the Keras library [7]. It was built with one LSTM layer of 6000 neurons, one dense layer with one neuron as output and a last layer with the LeakyReLU activation function, considering 0.1 as the negative slope coefficient. The time step was one per input. For the training phase, the Stochastic Gradient Descent optimizer was used with a learning rate of 0.01, momentum and decay values of 0.0001, Poisson as loss function, a batch size of 64 and 200 epochs with an "EarlyStopping" of 10 epochs to monitor the decrease of the loss function on the validation set.
Extreme Gradient Boosting. XGBoost uses a new regularization approach over conventional Gradient Boosting Machines (GBMs) to significantly decrease the complexity. In order to measure the performance of a model on a given data set, XGBoost defines an objective function combining a training loss term L(θ) and a regularization term Ω(θ), where the latter penalizes the complexity of the model and prevents overfitting, and θ refers to the parameters that will be discovered during the training (Eq. 7). The resulting model ŷi^(t) at training round t is the combination of k trees, i.e., an additive strategy is applied during the training: one new tree ft(xi) that optimizes the system is added at a time to the model ŷi^(t−1) generated in the previous round, where xi is the input (Eq. 8). To determine the complexity of the tree Ω(f), [6] proposed the approach of Eq. 9, where the first term γT evaluates the number of leaves T, taking γ as a constant, and the second term computes the L2 norm of the leaf scores wj. In Eq. 10 and Eq. 11, gi and hi are, respectively, the first and second
order partial derivatives after taking the Taylor expansion of the chosen loss function, Ij = {i | q(xi) = j} is the group of indices of data points attributed to the j-th leaf and q(x) is the structure of the tree. Finally, in the objective function, the argument of the minimum and the minimum of the quadratic function for the single variable wj are taken, considering q(x) as fixed and λ as a very small constant value; the outcomes are Eq. 12 and Eq. 13, where the latter assesses the quality of a tree structure, i.e., a smaller score is better [6].
obj(θ) = L(θ) + Ω(θ)  (7)
ŷi^(t) = Σ_{k=1..t} fk(xi) = ŷi^(t−1) + ft(xi)  (8)
Ω(f) = γT + (1/2) λ Σ_{j=1..T} wj²  (9)
Gj = Σ_{i∈Ij} gi  (10)
Hj = Σ_{i∈Ij} hi  (11)
wj* = − Gj / (Hj + λ)  (12)
obj* = − (1/2) Σ_{j=1..T} Gj² / (Hj + λ) + γT  (13)
Our XGBoost model was improved using a GridSearchCV procedure from the Scikit-learn library [14]. The best model used in this research has a max depth of 3 and a learning rate of 0.1, the learning task is Count and the learning objective is Poisson, which is suited to count-data problems; the remaining parameters were kept at their default values.
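For concreteness, a minimal sketch of how the two models described above could be instantiated is given below. The variable names, the parameter grid, the train/validation arrays and the assumption of a TensorFlow/Keras version whose SGD optimizer still accepts a decay argument are all illustrative choices; only the hyperparameter values explicitly reported in the text come from the paper.

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import LSTM, Dense, LeakyReLU
    from tensorflow.keras.optimizers import SGD
    from tensorflow.keras.callbacks import EarlyStopping
    from sklearn.model_selection import GridSearchCV
    from xgboost import XGBRegressor

    # X_train_seq has shape (samples, 1, n_features): one time step per input.
    n_features = X_train_seq.shape[2]

    # LSTM model: one LSTM layer of 6000 units, one output neuron, LeakyReLU(0.1).
    lstm = Sequential([
        LSTM(6000, input_shape=(1, n_features)),
        Dense(1),
        LeakyReLU(alpha=0.1),
    ])
    lstm.compile(optimizer=SGD(learning_rate=0.01, momentum=0.0001, decay=0.0001),
                 loss='poisson')
    early = EarlyStopping(monitor='val_loss', patience=10)
    lstm.fit(X_train_seq, y_train, validation_data=(X_val_seq, y_val),
             epochs=200, batch_size=64, callbacks=[early])

    # XGBoost model tuned with GridSearchCV; 'count:poisson' is the Poisson
    # objective for count data.
    grid = GridSearchCV(XGBRegressor(objective='count:poisson'),
                        param_grid={'max_depth': [3, 5, 7],
                                    'learning_rate': [0.05, 0.1, 0.3]},
                        cv=3)
    grid.fit(X_train, y_train)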
3 Prediction Results and Discussion
3.1 Prediction Results
The metrics defined to evaluate the results are the Root Mean Square Error (RMSE) and the Mean Absolute Error (MAE). Because the events to be predicted are countable, the accuracy score was also considered with a margin of error of zero (ACC0E), which represents the number of exact predictions reached, and with margins of error less than or equal to one (ACC1E) and to two (ACC2E), which provide feasible results for real applications. In order to discover unusual years through the analysis of the prediction metrics, during each iteration each year is predicted (it is considered as the testing set) and the remaining years are used as training
and validation sets (e.g., to predict 2006, 2007–2017 were used as learning sets; to predict 2007, 2006 and 2008–2017 were used as learning sets, etc.). Naturally, this is not a real case, but it provides information about how well each year can be predicted and why some years present atypical results. The data set was not cleaned of possible outliers, such as natural disasters (e.g., storms, fires, floods) and strikes that were found in our search analysis. Considering that in real-world applications the system must perform well in such conditions, it is worth keeping these occurrences and evaluating the performance of the proposed methods. Thus, Table 1 presents a data analysis of the interventions per year, with the metrics: total number of interventions (Total Interv.), the average (Average), the standard deviation (Std. Dev.) and the maximum number of interventions (Max. Interv.). Table 2 presents the forecast results of both the LSTM and XGBoost models for all years (2006–2017). Figure 1 represents the total number of interventions per year. Figures 2, 3 and 4 illustrate the responses of the LSTM and XGBoost models on 100 samples when trying to predict the unusual numbers of interventions that occurred in 2010, 2011 and 2016 as a result of natural disasters in the Franche-Comté region. Moreover, forecasting results on 100 samples for 2017 are presented in Fig. 5; this is the year that only considers past years in the training process and presents an uncommon behavior due to ambulance strikes and climate conditions observed in the Doubs region during that year. Lastly, taking into account that the LSTM NN and the XGBoost models predict real values (e.g., 5.67 interventions), results were rounded to the closest integer (e.g., 6 interventions) to be coherent with real-world applications.

Table 1. Data analysis of the interventions during 2006–2017

Year  Total interv.  Average  Std. dev.  Max. interv.
2006  17,375         1.98     2.04       30
2007  19,368         2.21     2.06       13
2008  18,037         2.05     1.95       16
2009  28,719         3.27     3.34       84
2010  29,656         3.38     3.05       93
2011  33,715         3.84     3.66       48
2012  29,070         3.31     2.50       26
2013  29,830         3.40     2.48       30
2014  30,689         3.50     2.55       22
2015  33,586         3.83     2.68       21
2016  34,434         3.93     3.13       85
2017  37,674         4.30     2.94       22
Fig. 1. Total number of interventions per year.
Table 2. Prediction results on data 2006–2017

       LSTM                                     XGBoost
Year   RMSE  MAE   ACC0E   ACC1E   ACC2E        RMSE  MAE   ACC0E   ACC1E   ACC2E
2006   1.60  1.13  28.28%  73.04%  90.99%       1.61  1.16  25.55%  73.27%  90.86%
2007   1.63  1.19  27.27%  70.91%  89.06%       1.66  1.20  26.19%  71.48%  88.83%
2008   1.59  1.16  26.83%  71.68%  90.28%       1.64  1.22  24.45%  69.94%  89.55%
2009   2.28  1.49  22.72%  62.28%  83.00%       2.39  1.58  21.59%  59.04%  80.36%
2010   2.32  1.49  23.17%  61.96%  81.92%       2.22  1.51  22.65%  60.82%  81.50%
2011   2.49  1.68  21.05%  57.54%  78.92%       2.55  1.69  21.07%  58.16%  78.93%
2012   2.06  1.53  21.30%  58.26%  81.11%       2.08  1.55  21.16%  58.03%  80.02%
2013   2.05  1.53  21.15%  58.81%  80.58%       2.06  1.54  20.91%  58.68%  80.22%
2014   2.04  1.52  21.26%  59.10%  81.17%       2.06  1.52  21.47%  59.37%  81.00%
2015   2.09  1.58  21.14%  56.41%  79.49%       2.09  1.56  21.51%  57.70%  79.48%
2016   2.64  1.71  18.94%  53.91%  77.51%       2.58  1.67  19.16%  55.49%  78.42%
2017   2.26  1.69  19.90%  54.63%  76.80%       2.27  1.68  20.38%  55.59%  76.94%
Fig. 2. Predictions for 2010.
Fig. 3. Predictions for 2011.
Fig. 4. Predictions for 2016.
Fig. 5. Predictions for 2017.
3.2 Discussion
The purpose of this research was to develop and evaluate two ML methods for forecasting the number of future firefighter interventions using data from 2006 to 2017, divided into training, validation and testing sets. As presented in Table 2, one can see that with reasonable effort on features and a relatively basic use of the XGBoost and LSTM techniques, quite good prediction results were obtained. Furthermore, it was noted that the results of both methods were very similar, one for which the basic expert is a neuron (LSTM) and the other a decision tree (XGBoost). Moreover, as one can see in Figs. 2, 3 and 4, the XGBoost technique is a little more robust to outlier data than the LSTM, as the former recognized the peak occurrences during natural disasters better. Considering that these occurrences are highly likely to happen in the future and fire brigades seek to nurse their community better, real systems must be prepared to face input data with uncommon values. Notwithstanding, the LSTM model presented better metric values and accuracies for almost all of the years, which means that in normal conditions, and even with higher error values during peak occurrences, its metrics outperform those of XGBoost. Moreover, the use of deeper layers and more time steps could improve results by better generalizing the data.
Additionally, as presented in Table 1 and in Fig. 1, an increment in the number of interventions throughout the years is clearly highlighted, probably due to population aging and growth. However, one can notice an abnormal increment from 2008 to the years 2009 and 2011, in which natural disasters took place; in contrast to those years, 2012–2015 follow a regular pattern of increment. This characteristic is also noted when analyzing the average, which increases from approximately 2 interventions per hour in 2008 to almost 4 in 2011; the standard deviation has higher values for the years 2009–2011 and 2016, where the data were more sparse due to peak values during natural disasters, which is also well reflected by the maximum number of interventions.
Also, in Table 2 one can observe a high increment in the RMSE and MAE metrics and a decrement in the ACC0E, ACC1E and ACC2E metrics starting from 2009. For the years 2009–2011 and 2016, poor metric results are obtained, probably caused by the outlier data. However, for 2012–2015 this increment also follows a normal pattern compared to the increment of the total number of interventions. Finally, 2017, which is the most realistic prediction, presents a lower ACC2E accuracy, probably because of the typical increment over the years and because of some factors that could not be detected by the models as outlier data, i.e., the increment was not just for a few hours like peaks (e.g., the Max. Interv. is just 22), but for many samples. In our search analysis, we found that there was an ambulance strike that lasted 29 days between September and October, resulting in more incident attendance for the fire brigade. Also, online sources were found relating an increment of 60% in the number of interventions for the Doubs department to a heatwave that occurred in June [1]. Therefore, for normal years, i.e., without outlier data, the proposed models could achieve a good prediction; e.g., for 2006–2008 both models could predict with a high level of accuracy, accomplishing approximately 73% and 90% for the ACC1E and ACC2E metrics, respectively. And, for years in which social or natural circumstances directly affected the prediction results, the scores are still acceptable for practical purposes; i.e., we do recognize that an intelligent system with accuracy between 50% and 70% could not be used as the first decision-making approach for fire brigades. However, we believe that results can be improved even further by adding significant features and developing new models.
4 Conclusion
The development of intelligent systems to predict the number of interventions at a given time in the future could help fire brigades around the world to efficiently prepare themselves for future incidents. This paper presented two well-known machine learning methods, LSTM and XGBoost, to predict the number of interventions for the next hour that firefighters would face in the region of Doubs, France. To validate the performance of both methods, a data set containing intervention information registered over 12 years (2006–2017) was provided by the departmental fire and rescue service SDIS25, located in Doubs, France. The analysis of the results demonstrated a high increment in the number of interventions over the years, with this value more than doubling in 12 years. This could represent more and more work in the coming years if the pattern holds. In other words, a change in the management of the budget must be considered to prevent personnel and equipment shortages, to continue improving response times to incidents and to better attend to victims' needs. Furthermore, results demonstrated that forecasting firemen interventions with good accuracy is possible and feasible for practical purposes. Considering that both models were only basically tuned, better results can be achieved by concentrating more effort on the tuning procedure and on arranging features.
For future work, we will continue testing different machine learning methods, combining the LSTM NN with other NN models (e.g., Convolutional NN) and testing a larger number of time steps, evaluating and adding new variables to our data set (e.g., social events) and trying out feature selection methods (e.g., F-test and Principal Component Analysis). Additionally, we are working on techniques capable of predicting the type and the place of interventions in order to build a complete predictive system for firefighters.
References
1. France Bleu. https://www.francebleu.fr/infos/societe/60-d-interventions-en-plus-pour-les-pompiers-du-doubs-pendant-la-canicule-1498142401
2. Ministère de l'écologie, du développement durable et de l'énergie. http://www.hydro.eaufrance.fr/
3. Skyfield. https://github.com/skyfielders/python-skyfield
4. Alajali, W., Zhou, W., Wen, S.: Traffic flow prediction for road intersection safety. In: 2018 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computing, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI). IEEE, October 2018. https://doi.org/10.1109/smartworld.2018.00151
5. Cerna, S., Guyeux, C., Arcolezi, H.H., Lotufo, A.D.P., Couturier, R., Royer, G.: Long short-term memory for predicting firemen interventions. In: 6th International Conference on Control, Decision and Information Technologies (CoDIT 2019), Paris, France, April 2019. https://doi.org/10.1109/codit.2019.8820671
6. Chen, T., Guestrin, C.: XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2016, pp. 785–794. ACM, New York (2016). https://doi.org/10.1145/2939672.2939785
7. Chollet, F., et al.: Keras (2015). https://keras.io
8. Du, S., Li, T., Gong, X., Yang, Y., Horng, S.J.: Traffic flow forecasting based on hybrid deep learning framework. In: 2017 12th International Conference on Intelligent Systems and Knowledge Engineering (ISKE). IEEE, November 2017. https://doi.org/10.1109/iske.2017.8258813
9. Géron, A.: Hands-On Machine Learning with Scikit-Learn and TensorFlow, vol. 1, 1st edn. O'Reilly Media, Sebastopol (2017)
10. Graves, A.: Supervised Sequence Labelling with Recurrent Neural Networks. Springer, Berlin/Heidelberg (2012). https://doi.org/10.1007/978-3-642-24797-2
11. Greff, K., Srivastava, R.K., Koutnik, J., Steunebrink, B.R., Schmidhuber, J.: LSTM: a search space odyssey. IEEE Trans. Neural Netw. Learn. Syst. 28(10), 2222–2232 (2017). https://doi.org/10.1109/tnnls.2016.2582924
12. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997). https://doi.org/10.1162/neco.1997.9.8.1735
13. Liu, Y., Wang, Y., Yang, X., Zhang, L.: Short-term travel time prediction by deep learning: a comparison of different LSTM-DNN models. In: 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC). IEEE, October 2017. https://doi.org/10.1109/itsc.2017.8317886
14. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
15. Shi, X., Li, Q., Qi, Y., Huang, T., Li, J.: An accident prediction approach based on XGBoost. In: 2017 12th International Conference on Intelligent Systems and Knowledge Engineering (ISKE). IEEE, November 2017. https://doi.org/10.1109/iske.2017.8258806
Exact Algorithms for Scheduling Programs with Shared Tasks
Imed Kacem, Giorgio Lucarelli, and Théo Nazé(B)
Laboratoire LCOMS, Université de Lorraine, Metz, France {imed.kacem,giorgio.lucarelli,theo.naze}@univ-lorraine.fr
Abstract. We study a scheduling problem where the jobs we have to perform are composed of one or more tasks. If two jobs sharing a non-empty subset of tasks are scheduled on the same machine, then these shared tasks have to be performed only once. This kind of problem is known in the literature under the names of VM-PACKING or PAGINATION. Our objective is to schedule a set of these objects on two parallel identical machines, with the aim of minimizing the makespan. This problem is NP-complete as an extension of the PARTITION problem. In this paper we present two exact algorithms with worst-case time-complexity guarantees, by exploring different branching techniques. Our first algorithm focuses on the relation between jobs sharing one or more symbols in common, whereas the other algorithm branches on the shared symbols.
Keywords: Scheduling · Exact algorithms · Makespan minimization
1 Introduction
In this paper we consider the problem of scheduling a set P of programs on two parallel identical machines M1 and M2. Given a set of tasks T, each program Pi ∈ P is composed of a subset of tasks Ti ⊆ T. Each task Tj ∈ T is characterized by a processing time. Note that any pair of programs may share several tasks, while a task may be shared by multiple programs. If two programs sharing a non-empty subset of tasks are assigned to the same machine, then these shared tasks have to be executed only once. In order to consider a program Pi ∈ P as successfully performed, there should be a machine which processes all tasks in Ti. Note that a shared task may be processed by both machines. There is no constraint on the execution order of the tasks of any program. Let S ⊆ T be the subset of tasks that are shared by at least two programs and c = |S|. The goal is to find a schedule of minimum makespan, i.e., the completion time of the machine which completes last is minimized. Using the three-field notation [4], this problem can be denoted as 2|shared|Cmax, with the second field (shared) denoting the constraint associated with the shared tasks execution. Throughout this contribution, we will sometimes refer to this problem as PAGINATION [5].
An instance of the above problem can be represented by a simple graph G = (V, E) whose vertices correspond to the programs (i.e., V = P) and there is an edge between two vertices if and only if the corresponding programs share at least one task. Figure 1 gives an example of an instance composed of four programs and 7 tasks of unit processing time, three of which are shared, as well as its graph representation and the corresponding optimal schedule. Each ellipse corresponds to a program: T1 = {T1, T5, T6, T7}, T2 = {T2, T5}, T3 = {T3, T6} and T4 = {T4, T7}.
(Figure 1: panel (a) shows the instance, panel (b) its graph representation, and panel (c) the optimal schedule, with M1 executing T1, T2, T5, T6, T7 and M2 executing T3, T4, T6, T7.)
Fig. 1. An example of an instance, its graph representation and the corresponding optimal schedule.
It is easy to see that the above problem is NP-complete, since it generalizes the well-known PARTITION problem [3]. In this work, we are interested in the design of exact algorithms, i.e., algorithms that create an optimal schedule, with worst-case complexity guarantees. This complexity may depend on the size of the instance and/or the value of a parameter of the instance. The motivation for studying the above problem comes from pagination aspects in virtual machine environments (see for example [8]). Two different objectives have been studied in the context of the pagination problem: (i) the maximization of the executed jobs or throughput [8], and (ii) the minimization of the number of used machines [5,8]. We propose to study the makespan minimization problem, which corresponds to the minimization of the storage capacity in the pagination problem.
1.1 Motivation
Putting aside the theoretical richness of this problem, PAGINATION has numerous practical applications in fields manipulating interconnected objects. This is for example the case when hosting virtual machines (VMs) on physical servers [8]. We consider a set of VMs to host on servers. Each VM is composed of multiple memory pages. As some memory pages can be repeated in different VMs, it is possible to minimize the memory space used by multiple VMs on one server by storing the mutualized pages only once. Two main objectives have
been studied in this context: given a fixed number of physical servers, maximize the number of hosted VMs [8]; and given a fixed number of VMs to be hosted, minimize the number of used physical servers [5,8]. This problem also has applications in the field of high performance computing. Indeed, the set of elements given in input can be seen as actual programs to compute, and the subsets present in multiple elements can be seen as instructions that are easy to parallelize. By modeling PAGINATION as a scheduling problem with multiple machines, scheduling programs sharing a subset of identical instructions on the same machine or not corresponds to the choice of parallelizing these instructions or not. PAGINATION is also an interesting scheduling problem in its own right, the objective being to schedule a set of programs sharing tasks on two parallel homogeneous machines, in order to minimize the makespan. The sharing constraint has not been studied in this context prior to this work.
1.2 Notation
Let P be the set of n programs and T be the set of k tasks, each task Tj ∈ T being characterized by a processing time pj. Let S ⊆ T be the set of tasks shared by at least two programs, with |S| = c. Let C = {C1, C2, . . . , CT} be the set of the connected components of the representation graph G; T denotes the number of these components. For each Ct ∈ C (t = 1, 2, ..., T), let Vt be the set of vertices (programs) in Ct, Et be the set of edges in Ct and nt = |Vt|. We call a partial schedule a feasible partition of a subset of programs into two subsets. Intuitively, each of these subsets corresponds to a potential schedule on a machine, but the specific machine for each of them is not specified yet. The feasibility of this partition is defined by the need of each program to have all of its tasks in the same subset.
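To make the graph representation concrete, the sketch below builds G from a task-set description of the programs and extracts its connected components; the dictionary-based encoding is an assumption chosen only for this example.

    from itertools import combinations

    def representation_graph(programs):
        # programs: {program_id: set_of_task_ids}; an edge links two programs
        # that share at least one task.
        edges = {p: set() for p in programs}
        for p, q in combinations(programs, 2):
            if programs[p] & programs[q]:
                edges[p].add(q)
                edges[q].add(p)
        return edges

    def connected_components(edges):
        seen, components = set(), []
        for start in edges:
            if start in seen:
                continue
            stack, comp = [start], set()
            while stack:
                v = stack.pop()
                if v not in comp:
                    comp.add(v)
                    stack.extend(edges[v] - comp)
            seen |= comp
            components.append(comp)
        return components

    # Instance of Fig. 1: P1..P4 and their task sets (a single connected component).
    progs = {'P1': {'T1', 'T5', 'T6', 'T7'}, 'P2': {'T2', 'T5'},
             'P3': {'T3', 'T6'}, 'P4': {'T4', 'T7'}}
    print(connected_components(representation_graph(progs)))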
2 Trivial Algorithms for this Problem
A first algorithm can be given by deciding on which machine each program will be scheduled. As there are two machines, the number of possible assignments is 2^n. For one assignment, the corresponding makespan can be computed by removing the processing times of tasks that appear multiple times on one machine. This operation takes at most kn steps. The computing time of this algorithm is O(nk·2^n). Another way to look at this problem is through its shared tasks. Once all the shared tasks present in one instance of our problem have been attached to a given machine, our problem becomes the problem of scheduling incompatible jobs on two homogeneous machines [6,7]. Using the three-field notation, this problem is denoted 2|inc|Cmax, with the second field (inc) denoting the incompatibilities between the jobs. The best known algorithm for solving the knapsack problem runs in O*(2^{N/2}) time, with N being the number of elements in the input. We can adapt this algorithm to solve 2|inc|Cmax without increasing its complexity. Then, in our problem a shared task can either be present on the
438
I. Kacem et al.
first machine, the second, or on both of the machines. We can enumerate in 3c steps the different ways to assign the c shared tasks of the input (S denotes the set of these shared tasks) into the two available machines. For each of those assignments, solving 2|inc|Cmax will give us the corresponding optimal schedule n for the remaining tasks in T \ S. This algorithm runs in O∗ (3c 2 2 ) steps.
3
The 2|inc|Cmax Scheduling Problem
Let N = {(a1 , b2 ), (a2 , b2 ), . . . , (an , bn )} be a set of n couples. We call the first element of one couple the a-element, and the second the b-element. The a-element and the b-element of one couple both represent a processing time. Given two parallel identical machines, our goal is to schedule every couple present in the set N such that if a a-element ai is scheduled on one machine, then the b-element bi has to be scheduled on the other machine, and the aim is to minimize the makespan. Let J = {1, 2, . . . , n}. Let S1 ⊆ J and S2 = J \ S1 . Assume the a-elements of the couples (ai , bi ) with i ∈ S1 are scheduled on the first machine. This means that b-elements of the couples (ai , bi ) with i ∈ S1 , and the a-elements of the couples (aj , bj ) with j ∈ S2 are scheduled on the second machine. And thus, the b-elements of the couples (aj , bj ) with j ∈ S2 are scheduled on the first machine. Thus a subset of a-elements scheduled on one machine represents a complete schedule. Every possible schedule can be built from one a-element subset (which we call initial subset). Consequently, a number of 2n subsets can be distinguished.
Algorithm 1: 2-TABLE adaptation algorithm Data: Tables T1 and T2 Result: A pair (T1 [i], T2 [j]) such that T1 [i] + T2 [j] is the closest to Cmax Initialization: i = 0, j = 0, best = (0, 0); n n while i < 2 2 and j < 2 2 do if T1 [i] + T2 [j] = Cmax then return (T1 [i], T2 [j]); end if T1 [i] + T2 [j] < Cmax then if T1 [i] + T2 [j] > best then best = (T1 [i], T2 [j]); end i = i + 1; end if T1 [i] + T2 [j] > Cmax then j = j + 1; end end return best;
Exact Algorithms for Scheduling Programs with Shared Tasks
439
Let Cmax be the lower nbound on the optimal makespan for one input data of this problem: Cmax = ( i=1 ai + bi )/2. n
Theorem 1. 2|inc|Cmax can be solved in time O(n2 2 ). Proof. We partition the set N into two subsets: X = {(a1 , b1 ), . . . , (a n2 , b n2 )} and Y = {(a n2 +1 , b n2 +1 ), . . . , (an , bn )} (cf. chapter 9 of [2]). For each of those two sets, we compute the set of all possible schedules, such that the corresponding initial subset for each schedule is computed on the first machine. The total n number of computed schedule is at most 2 2 +1 per set. Let IX and IY be the sets of makespan length of those computed schedules on the first machine for X and Y respectively. Then, the optimal solution for our scheduling problem is the sum sX + sY such that sX ∈ IX and xY ∈ IY , and with sX + sY being the closest possible to Cmax . The set IX is sorted increasingly and the set IY is sorted decreasingly regardn n ing the makespan lengths. This step is done in O(2 2 log 2 2 ) steps. Now, given those two sorted set, Algorithm 1 finds a couple sX and sY such that sX + sY is the closest possible to Cmax . The correctness of this algorithm can be demonstrated this way: Let T1 and T2 be the two sorted tables respectively in increasing and decreasing order. The elements xi are part of T1 , and the elements yj are part of T2 . Let (xi∗ , yj∗ ) be the optimal couple, that should be returned by our algorithm. Let xi be an element of T1 with i < i∗, and yj be an element of T2 with j < j∗. Assume Algorithm 1 returns a wrong result. This means that the couple (xi∗ , yj∗ ) is not evaluated/detected. This can be the case when: 1. The algorithm is now evaluating a couple (xi , yj ) with j > j∗. As j > j∗ > j, this means that xi + yj > Cmax , and the algorithm tries to find a smaller value. As the algorithm is now evaluating (xi , yj ), this also means that xi + yj∗ > Cmax . But, as i < i∗, it can be deduced that xi + yj∗ < Cmax , which is in a contradiction with the previous statement. 2. The algorithm is now evaluating a couple (xi , yj ) with i > i∗. As i > i∗ > i, this means that xi +yj < Cmax , and the algorithm tries to find a greater value. As the algorithm is now evaluating (xi , yj ), this also mean that xi∗ + yj < Cmax . But, as j < j∗, we deduce that xi∗ + yj > Cmax , which leads again to a contradiction. n
The number of steps needed by this algorithm is 2 2 +1 as it goes through n both of the tables of 2 2 elements. The total running time, including sorting is n n n O(2 2 log 2 2 ) = O(n2 2 ).
4
Algorithm Branching on the Links Between Programs
In this section we propose an algorithm for which the complexity is based on the number of programs n and it is also parameterized by the number of the connected components T of the representation graph G = (V, E). Our algorithm
440
I. Kacem et al. T
has a complexity in O∗ (2n− 2 ) and asymptotically improves upon the complexity of the standard branching scheme. Instead of branching on the programs, this algorithm branches on the links between the programs, that is on the edges of the corresponding representation graph G. The algorithm considers each connected component Ct of G separately and it creates all possible partial schedules for the programs in Ct (t = 1, 2, ..., T ). Then, it combines the partial schedules created for all different connected components by using the algorithm solving 2|inc|Cmax presented in Sect. 3. In order to create a partial schedule for a connected component Ct , we use a branching scheme which at each step selects an edge e = (Pi , Pi ) such that exactly one of Pi and Pi is not yet considered and branches on it (except for the initial branching where an arbitrary edge is selected). Then, the two branches correspond to the decision of scheduling Pi and Pi on the same machine or not. Lemma 1. The branching rule described above produces a branching tree of height nt − 1. This tree enumerates all partial schedules for a connected component Ct . Proof. Note first that the branching rule cannot be applied after the (nt − 1)-th iteration since there is no program which is not considered. The total possible number of schedules for a connected component Ct of size nt is 2nt . However, the total possible number of partial schedules for Ct can be nt reduced to 22 = 2nt −1 , since the exact machine of each subset of programs is not specified in a partial schedule. Thus, each partial schedule corresponds to two complementary schedules obtained by swapping the subsets of programs in the machines. Moreover, the number of partial schedules created by our branching algorithm for Ct is 2nt −1 , since the height of the branching tree is nt − 1. Hence, in order to show that our tree enumerates all partial schedules for Ct , it is sufficient to show that all partial schedules created by our branching algorithm are distinct, which is the case since all partial schedules necessarily have a common predecessor in the branching tree at which the decision taken is different. T
Theorem 2. 2|shared|Cmax can be solved in time O∗ (2n− 2 ). Proof. Our branching algorithm can enumerate every partial schedule for one connected component Ct in 2(nt −1) steps (t = 1, 2, ..., T ). The total number of of partial schedules for all connected component is equal to Tcombinations (nt −1) 2 . Given a set of partial schedules for each connected component, we t=1 can use the 2|inc|Cmax algorithm in order to produce the best possible schedule, the partial schedules being structurally identical to the couples described in the demonstration. Thus, the algorithm solving 2|inc|Cmax problem is executed T (nt −1) times, with each execution processing T elements. Then, and since t=1 2 T n = t=1 nt , the complexity ofT our algorithm can be expressed as follows: T T ∗ (nt −1) ∗ n− 2 2 =O 2 . O 2 t=1 2
Exact Algorithms for Scheduling Programs with Shared Tasks
5
Algorithm Branching on the Shared Tasks
A first branching algorithm can solve 2|shared|Cmax by creating sub-instances of our initial problem by branching on the shared tasks being part of the input data, and deciding if they will be executed on the first, the second, or on both machines. Careful analysis allows us to identify the worst case instances for this and thus upper-bound its complexity, yielding a complexity of algorithm c n O 7 2 2 2 , where c is the number of shared tasks present in our input data. In this section, instead of assigning a shared task to the first, the second, or to both machines, our algorithm assigns a shared task either to one machine (without specifying which one) or to both of them. The key observation is that once we have this information for a degree-two shared task, we then can determine in polynomial time how its neighboring programs will be scheduled in the final schedule. Our 2|inc|Cmax algorithm is then used to optimally schedule clusters of programs of undetermined position. This approach allows us to solve n+c 3 c ∗ 2 ( 2 ) , with c being the number of shared tasks of 2|shared|Cmax in O 2 degree two in our input data, and c being the number of shared tasks of degree greater than two in our input data.
T1 T2 T3 T4 T5 P1
P2 (a)
T1 T2 T3 T4 T5
T4 T5 T1 T2 T3 T4
P3 (b)
(c)
Fig. 2. Example of an instance in hypergraph representation, and two corresponding possible schedules.
As an illustration, let us consider the example depicted in Fig. 2. If the task T2 has to be executed on one machine, then the tasks T1 , T2 , T3 and T4 will be present on the same machine. If T4 also has to be performed on one machine, then the task T5 will be present on the same machine as the other tasks. Now if T4 has to be executed on both machines, then the task T5 has to be present on the other machine with T4 , otherwise the program P3 would be split. Lemma 2. Given an input instance consisting of a single connected component (in terms of the associated graph) with n programs and c shared tasks, such that d(ci ) = 2, ∀i, and c = n − 1 (d(ci ) is the degree of task ci , i.e., the number of programs which contain it), any assignment letting us know which shared task has to be performed on one or two machines admits a unique feasible solution that can be constructed in polynomial time. Now, we introduce the notion of degree-two clusters. Let ci be a shared task of degree two in our input data. Then, the non-shared tasks belonging to the
442
I. Kacem et al.
two programs linked by this shared task, and this shared task, are part of the same degree-two cluster. If a program does not contain any shared tasks of degree two, then the non-shared tasks of this program belong to their own degree-two cluster. Let hi be the degree-two cluster containing the program Pi . Let H be the number of different degree-two clusters hi , i ∈ {1, . . . , n} (Fig. 3).
T1
T2 T4 T3
T2 T1
T6 T3
T5 T4
(a)
T8 T7
T11 T10 T9
(b)
Fig. 3. The first connected component admits a number of degree-two clusters equal to the number of program contained in it, as no degree-two shared tasks belongs to it. On the other hand, two degree-two clusters are present in the second connected component: h1 = {T1 , T2 , T3 } and h6 = {T5 , T6 , T7 , T8 , T9 , T10 , T11 }
If we know how every shared task of degree two will be scheduled in the final schedule, then we also know how the programs part of a degree-two cluster will be scheduled regarding one another. Let c = |{ci | d(ci ) = 2, i ∈ {1, . . . , c}}| and c = |{ci | d(ci ) > 2, i ∈ {1, . . . , c}}|. Enumerating every possible assignment of the degree-two shared tasks is done in 2c steps. Each assignment can be seen as a H-sized set of partial schedules. These partial schedules are then linked together with shared tasks of degree greater than two, which in turn can also be scheduled on one or two machines. Let cj be a shared task of degree greater than two. If cj has to be scheduled on one machine, then it allows us to know how the neighbors of cj will be scheduled regarding each other. The d(cj ) partial schedules that are the degree-two clusters hi , i ∈ {1, . . . , d(cj )} now form one unique partial schedule. We say that those degree-two clusters have been aggregated. If cj has to be scheduled on two machines there is no unique solution. The algorithm solving 2|inc|Cmax will be used in order to optimally schedule the neighbors. Our algorithm is then composed of the following steps:
1. Enumerate the 2c assignments of degree-two shared tasks. 2. Construct the corresponding partial schedules for each degree-two cluster using Lemma 2. 3. Enumerate the 2c assignments of shared tasks of degree greater than two. c +c possible combinations of assignments, aggregate the degree 4. For the 2 two-clusters linked by shared tasks of degree greater than two that must be scheduled on one machine, and apply the 2|inc|Cmax algorithm on partial schedules that cannot be aggregated (the partial schedules linked by shared tasks of degree greater than two that has to be performed on both machines, and elements part of different connected components).
Exact Algorithms for Scheduling Programs with Shared Tasks
443
Theorem 3. Let Cd be a connected component admitting n programs and c shared tasks, such that d(ci ) = 2, i ∈ {1, . . . , c}, and c = n. That is to say, the graph representation of Cd forms a cycle. Given a machine assignment, if the number of shared tasks of Cd having to be executed on both machines is odd, then this assignment produces an unfeasible schedule. Lemma 3. Let Cf be a connected component admitting n programs, c shared tasks of degree two, and Hf degree two clusters. It can be established that either Cf only admits shared tasks of degree two, and in that case Hf = 1, either Cf admits at least one shared task of degree greater than two, and in that case Hf = n − c . Remark 1. A connected component Cc containing only shared tasks of degree two can admit a number of degree two cluster Hc = n − c if and only if Cc admits at least one cycle. Hc = 1, and as n − c = 1, the graph representation of Cc cannot form a tree. Lemma 4. Let Cg be a connected component admitting a shared task cj with cj of degree greater than two. If cj has to be performed on one unique machine, then either this will produce an unfeasible schedule, or at least three degree two clusters will be aggregated during the fourth step of our algorithm. c +n Theorem 4. This algorithm solves 2|shared|Cmax in O∗ 2 2 ( 32 )c steps. Proof. The number of elements the 2|inc|Cmax algorithm is applied to, depends on how many shared tasks cj , j ∈ {1, . . . , c } have to be performed on both machines. Assume that for a given instance, every shared tasks cj , j ∈ {1, . . . , c } have to be performed on both machines. Then, at the Step 4 of our algorithm, no degree-two clusters are aggregated. In order to generate an optimal schedule, our 2|inc|Cmax algorithm will be run on H elements. In the 2c different ways to enumerate the assignments of shared tasks of degree greater than two, there is c 0 = 1 way this can be the case. Note here that depending on the input data, some of the c tasks of degree greater than two will not be aggregated, as seen in Lemma 4. In those cases, either the shared tasks of degree greater that two will not be considered at Step 4 of the algorithm, either trying to aggregate this shared task will lead to an unfeasible schedules. Those two cases lead to better time-complexity for our algorithm, and thus by considering c shared tasks we upper-bound this complexity. Now assume that on the c shared tasks, one has to be performed on one machine, and the rest on both machines. At Step 4 of our algorithm the neighboring degree-two clusters of one shared task cj will be aggregated. Lemma 4 demonstrates that this number of degree-two clusters is at least three. Thus the 2|inc|Cmax algorithm will be applied to at most (H − 2) elements. There is c1 = c configuration where this is the case. The same observation can be applied when i shared task of degree greater than two have to be performed on one machine, with i ∈ {0, . . . , c }. The number of steps needed to compute
444
I. Kacem et al.
c H−2i every 2|inc|Cmax instance is then i=0 ci 2 2 . This algorithm can then solve H−2i c c 2 2|shared|Cmax in 2c . i=0 i 2 Let Cc be a connected component admitting Hc degree-two clusters, nc programs, and cc shared tasks of degree two. Lemma 3 and Remark 1 demonstrate that if Hc = nc − cc , then Cc only contains shared tasks of degree two, and contains at least one cycle. Theorem 3 demonstrates that in this configuration, an odd number of shared tasks of degree two part of a cycle, having to be executed on both machine lead to an unfeasible schedule. Thus, when enumer ating the 2c shared tasks assignments, half will lead to unfeasible schedules. As c c n−c −2i c n−c −2i+1 2 2 2c > 2c ( i=0 ci 2 )/2, considering that H = n − c i=0 i 2 upper-bound the complexity of our algorithm. c c H−2i c c n−c −2i n−c c c i 2 2 Then, 2c = 2c = 2c 2 2 i=0 i 2 i=0 i 2 i=0 i 2 = 2
n+c 2
6
( 32 )c .
Conclusions and Perspectives
In this paper we have investigated the NP-complete problem 2|shared|Cmax , and presented different exact algorithms with worst-case time-complexity guarantees, using branching techniques. The principles of these algorithms are various (branching on the programs, branching on the links between the programs, branching on the shared tasks). We are currently working on the analysis of the practical experiment results of the algorithms presented in this paper, as well as improving these algorithms. We are also studying the field of parameterized complexity [1] with the purpose of establishing a fixed-parameter algorithm and obtaining a kernel for PAGINATION.
References 1. Cygan, M., Fomin, F., Kowalik, L., Lokshtanov, D., Marx, D., Pilipczuk, M., Pilipczuk, M., Saurabh, S.: Parameterized Algorithms. Springer, Heidelberg (2015) 2. Fomin, F.V., Kratsch, D.: Exact Exponential Algorithms. Springer, Heidelberg (2010) 3. Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman, San Fransico (1979) 4. Graham, R., Lawler, E., Lenstra, J.K., Kan, A.R.: Optimization and approximation in deterministic sequencing and scheduling: a survey (1979) 5. Grange, A., Kacem, I., Martin, S.: Algorithms for the bin packing problem with overlapping items. Comput. Ind. Eng. 115, 331–341 (2018) 6. Kellerer, H., Pferschy, U., Pisinger, D.: Knapsack Problems. Springer, Heidelberg (2004) 7. Martello, S., Toth, P.: Knapsack Problems: Algorithm and Computer Implementations. Wiley, Hoboken (1990) 8. Sindelar, M., Sitaraman, R.K., Shenoy, P.J.: Sharing-aware algorithms for virtual machine colocation (2011)
Automating Complaints Processing in the Food and Economic Sector: A Classification Approach
Gustavo Magalhães1, Brígida Mónica Faria1,2(&), Luís Paulo Reis2,3, Henrique Lopes Cardoso2,3, Cristina Caldeira4, and Ana Oliveira4 1
2
3
4
Escola Superior de Saúde – Instituto Politécnico do Porto (ESS-P.Porto), Rua Dr. António Bernardino de Almeida 400, 4200-072 Porto, Portugal {10170714,btf}@ess.ipp.pt Laboratório de Inteligência Artificial e Ciência de Computadores (LIACC), Porto, Portugal Faculdade de Engenharia da Universidade do Porto, Rua Dr. Roberto Frias, S/N, 4200-465 Porto, Portugal {lpreis,hlc}@fe.up.pt Autoridade de Segurança Alimentar e Económica (ASAE), Rua Rodrigo da Fonseca 73, 1269-274 Lisbon, Portugal {accaldeira,amoliveira}@asae.pt
Abstract. Text categorization is a supervised learning task which aims to assign labels to documents based on the predicted outcome suggested by a classifier trained on a set of labelled documents. The association of text classification to facilitate labelling reports/complaints in the economic and health related fields can have a tremendous impact on the speed at which these are processed, and therefore, lowering the required time to act upon these complaints and reports. In this work, we aim to classify complaints into the main 4 economic activities given by the Portuguese Economic and Food Safety Authority. We evaluate the classification performance of 9 algorithms (Complement Naïve Bayes, Bernoulli Naïve Bayes, Multinomial Naïve Bayes, K-Nearest Neighbors, Decision Tree, Random Forest, Support Vector Machine, AdaBoost and Logistic Regression) at different layers of text preprocessing. Results reveal high levels of accuracy, roughly around 85%. It was also observed that the linear classifiers (support vector machine and logistic regression) allowed us to obtain higher f1-measure values than the other classifiers in addition to the high accuracy values revealed. It was possible to conclude that the use of these algorithms is more adequate for the data selected, and that applying text classification methods can facilitate and help the complaints and reports processing which, in turn, leads to a swifter action by authorities in charge. Thus, relying on text classification of reports and complaints can have a positive influence in either economic crime prevention or in public health, in this case, by means of food-related inspections.
Keywords: Text classification · Complaints classification · Feature selection · Machine learning · Text classifiers
1 Introduction
The Economic and Food Safety Authority (ASAE) is Portugal's national authority with administrative autonomy whose aim is to prevent and oversee compliance with the current legislation regulating economic activities in the food and non-food sectors, as well as reporting possible food chain-related risks. ASAE acts in a proactive or reactive way. It acts over possible detected irregularities in foodstuffs sold to consumers, emergency-related situations such as food intoxications or food crises, and information retrieved from information exchange systems [1]. In fact, it is based on a notification system which encompasses alert notifications, information notifications, border rejection notifications, original notifications and follow-up notifications and rejected and withdrawn notifications [2, 3]. The Economic and Food Safety Authority also acts on complaints and reports that are submitted directly to ASAE through their website, mail or fax, telephone contact or even presential complaints or reports. Every year, from 2014 to the time of writing, this authority receives around twenty thousand complaints. These complaints are stored in a database and are later analyzed by ASAE's inspectors. For the inspectors to act on this information, the complaints need to be, first and foremost, readable and the inspectors should be able to extract important information such as the possible infraction and the economic entity that is mentioned on them. This is not always the case as some complaints have no valid information or do not mention the economic operator involved. In addition, some of these complaints are not of ASAE's jurisdiction and should be directed towards other entities, roughly one third of the complaints submitted to ASAE. This constant review of data is time consuming and has a heavy toll on the speed at which these complaints are followed up on, besides being, at the current time, a completely manual process. The possibility of helping inspectors process this information with machine learning tools is of great value to ASAE as it will allow its inspectors to act swiftly on any information received. This work is based on the problem created by the huge amount of data that ASAE receives and is incorporated as a module of the IA.SAE project. It involves categorizing several text-based documents present on ASAE's database and labeling them according to their infraction. Throughout this paper, several classification algorithms were tested: Naive Bayes variants, Support Vector Machines, Logistic Regression, ensemble methods such as Random Forest and AdaBoost, k-Nearest Neighbors and Decision Trees. The process of categorizing documents by their respective categories is of utmost importance for a faster reports and complaints process flow as well as indirectly improving public health by facilitating data handling. The faster a complaint or report is addressed, the faster a possible health problem is detected, and thus swifter follow-up actions are done by ASAE's inspectors and competent organizations.
2 Background and Related Work
2.1 Knowledge Discovery in Databases
In 1989, the estimated number of databases was five million, albeit most of them were small compared to today's sizes [4]. Huge quantities of data are generated each day and the rate at which they grow does not drop but rises ever more [5]. The automated and intelligent analysis, crossing and retrieval of information is a growing trend that allows companies and individuals the opportunity to generate value out of it. As Frawley et al. stated in 1992, "there is a growing realization and expectation that data (…) will be a valuable resource to be used for a competitive advantage" [4]. This remains true today and is, indeed, ever more prevalent. Knowledge Discovery in Databases (KDD) is a set of chained processes that allow for the discovery of useful and meaningful knowledge from data. It embraces several domains ranging from machine learning and database technology to statistics and mathematics, and is the overarching process behind extracting knowledge and making sense of data [6, 7]. The KDD model can be summarized in five steps (selection, preprocessing, transformation, data mining, and evaluation and interpretation), each of which can have multiple branched sub-steps, from raw data selection to knowledge extraction [8–11].
2.2 Related Work
Text classification and, subsequently, text categorization is a trending topic, as much of the information found in databases is unstructured and the ability to extract valuable information from these databases is incredibly powerful. In fact, there have been several instances of text categorization done on the complaints field such as hotel customer complaints analysis using structural topic models [12] and customer complaints analysis for market-oriented product development [13], automated detection of text containing misrepresentations such as fake reviews [14], text classification based on character-level convolutional network [15], review of user feedback from app reviews with an econometric approach [16] as well as opinion mining [17], vehicle consumer complaints which were derived from incidents [18], tools development for mobile network complaint analysis [19] and even classification of agricultural Arabic text complaints, which also involved a novel term weighting scheme proposed by the authors [20]. As described, the applications of text mining to the complaints field can lead to endless possibilities in different fields and even in improving customer protection and health safety related practices, as is our case. In the feature selection field, the authors of [21] conduct a review of feature selection methods providing an overview of how different feature selection methods can be applied and related work that has been done in each case. In [22], the author conducts an extensive study on different feature selection methods for the text classification domain. The authors of [23] also conducted an evaluation of different feature selection methods but on two different datasets, a Portuguese and an English dataset. In [24], the authors propose a novel method to obtain a semantic representation of short texts. The authors begin by using a clustering technique called fast clustering which is based on density peaks searching, thus obtaining and discovering embedding spaces between the words. This procedure is
then followed by a convolutional neural network with one convolutional layer which produces optimized batches of word clusterings. In the study of text classifiers and their performance, there are also numerous works related to new classifiers, fine-tuned existing classifiers and comparisons between already established ones. For instance, the authors of [25] propose a classifier called "fastText" to perform text categorization. The authors mention that it is often comparable to deep learning classifiers while being several times faster for training and evaluation, and they evaluate these claims on tag prediction and sentiment analysis tasks. In [26], the authors compare a set of classifiers – Naïve Bayes, random forest, decision tree, support vector machines and logistic regression – regarding their respective classification accuracy based on sample size and the number of n-grams used. A study conducted by [27] proposed the use of low-level features to classify web pages according to genres. The authors observed that the use of character n-grams as opposed to word n-grams provided better results for this task. The authors of [28] compared several classification experiments on different stages of the text classification process. The authors computed three versions of the most frequent unigrams in a dataset: the first version contained stop words, while the second and third versions applied incremental stop word removal. The documents were represented by a bag-of-words method with term frequency weighting and four supervised algorithms were used for the classification task: Bayes Networks, SimpleLogistic, Random Forest and Sequential minimal optimization.
3 Methodology
In order to extract knowledge from a database there needs to be proper selection of data that best represents our problem. The dataset was supplied by ASAE and it is extracted from the respective storage database. The dataset ranges from the years 2014 to 2018 with a total of 104116 entries and 49 attributes. This data results from the merge of two separate but complementary files with corresponding IDs for each row. This unique ID is attributed to each complaint/report when stored in the database. The "e-mail content" field contains text that is submitted via e-mail to ASAE describing complaints or reports and its analysis allows for the completion of other attributes present on the database. In fact, it is based on this content that the complaint is filtered to evaluate if it is of ASAE's jurisdiction according to its infraction and whether to follow on it or if it should be redirected to other entities. Since the infractions attribute is subdivided into an extremely large number of descriptors, a simpler representation of this attribute was created. It was decided that the different codes for each infraction would be grouped under 4 representative categories according to ASAE's information. Category I represents complaints that are related to food safety, categories II and III regard economic offenses (FisEc - purchasing power parity and FisEc - business corporation) and category IV relates to other complaints that are neither of the food nor economic fields. This, in turn, effectively turned the multi-label process into a multi-class task.
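As a rough illustration of this grouping step, the sketch below shows how infraction codes could be collapsed into the four overarching categories; the file name, column names, example codes and the tie-breaking rule are hypothetical, since ASAE's actual coding scheme is not reproduced here. Python with pandas is assumed.

```python
import pandas as pd

# Illustrative placeholder mapping, not ASAE's actual code list.
CODE_TO_CATEGORY = {
    "A01": 1,  # food safety related codes -> category I
    "E07": 2,  # economic offense codes -> category II
    "E12": 3,  # economic offense codes -> category III
}

def to_category(codes):
    """Map the list of infraction codes of one complaint to a single category (1-4).

    Unknown or missing codes fall back to category 4 ("other"), which turns the
    original multi-label attribute into a multi-class one.
    """
    mapped = {CODE_TO_CATEGORY.get(c, 4) for c in codes}
    # Hypothetical tie-break: keep the most specific (lowest-numbered) category.
    return min(mapped)

df = pd.read_csv("complaints.csv")  # hypothetical export of the two merged files
df["codes"] = df["infraction_codes"].fillna("").str.split(";")
df["category"] = df["codes"].apply(to_category)
```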
3.1 Preprocessing and Word Normalization
In this work, the classification task begins by evaluating the classifiers on their capability to predict the related infraction on three different datasets with varying degrees of preprocessing. The first degree is a non-preprocessed e-mail content field, the second is a mere stop word removal applied to the contents of the first dataset and lastly, the third dataset contains the content which has been treated for stop word removal, punctuation, unreadable characters and non-alphabetic characters. All analyses were performed using a 10-fold stratified cross-validation process to better handle bias and variance. The process described below relates to the different steps taken for each dataset, Ds1, Ds2 and Ds3 as we labeled them. In the first dataset (Ds1), text was converted to tokens which are mostly words that are separated by white-spaces, punctuation or line breaks and thus represent strings extracted from the text. This process is called tokenization and mostly follows simple heuristic rules. Along with tokenization, punctuation was also removed from the samples. Punctuation often creates problems with text classification as it increases noise on the data and thus can have a negative impact on the classification task. The tokenization applied would sometimes remove accents or word-related punctuation and thus change a word's context or even render it completely different from its intended meaning. In addition, the next step in the text processing is stop-word removal. There are some available Portuguese stop-word lists, even though their number pales in comparison with more commonly analyzed languages in this field. The retrieved tokens were compared to a list of Portuguese stop words and removed if present. Stemming is the process in which a word is reduced to its most basic form – the stem. It aims to remove suffixes, prefixes and the words' plural forms. Thus, if words share the same root it is also expected that they have the same meaning [29]. The Porter stemmer is a widely used stemming algorithm proposed by [30] with proven results, but it does not work every time and its impact on the classification task is highly variable. This process is much more destructive than lemmatizing and the loss of information is higher. It is also much harder to understand the lexical context in which words occur. Lemmatization is a process that resembles stemming but instead of producing a stem it reduces verbs to their infinitive form and nouns to their singular form. In some cases, lemmatized or stemmed words can end up reduced to the same word [31]. An overview of the preprocessing methodology applied in this work can be seen in Fig. 1.
Fig. 1. Diagram representing the preprocessing steps performed on the dataset: join reports/complaints by unique ID; remove HTML and separate infraction categories by codes; tokenize e-mail content (Ds1); remove stop words (Ds2); remove numbers and any non-alphabetical characters left (Ds3).
In a second instance of this work we report the results from applying different well-known word normalization methodologies, stemming and lemmatization. Following
the same procedures as the previous topic, we evaluate the average weighted accuracy for each classifier and the respective training times associated with each procedure. The results obtained are then used to perform the next analysis – feature extraction and feature selection.
3.2 Feature Extraction, Feature Selection and Evaluation
There are different feature selection methods, as described in the state-of-the-art chapter, and each has its own strengths. As a first analysis, we establish comparisons between the simpler term frequency method (TF) and the term frequency-inverse document frequency (TF-IDF). The Chi-square method of feature selection and ANOVA feature selection are also taken into consideration for the task of feature selection as these can help reduce dimensionality and find terms which are better representations of the labels, while maintaining performance. The algorithms used during these tests were not optimized for the learning tasks as they were used with their default parameters. The selection of algorithms to use in the final analysis was derived from the results obtained and their performance was evaluated on all four performance metrics mentioned: accuracy, precision, recall and f1-score.
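A possible realization of this feature extraction and selection step, assuming scikit-learn (not stated in the paper); the function name and parameter defaults are illustrative. In the actual experiments the selector would be fitted inside each cross-validation fold to avoid information leakage.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.feature_selection import SelectPercentile, chi2, f_classif

# texts: list of preprocessed complaint strings; labels: categories 1-4
def build_features(texts, labels, weighting="tfidf", selector="chi2", percentile=10):
    """Vectorize unigrams and keep only the top `percentile` of features."""
    if weighting == "tf":
        vectorizer = CountVectorizer(ngram_range=(1, 1))      # simple term frequency
    else:
        vectorizer = TfidfVectorizer(ngram_range=(1, 1))      # TF-IDF weighting
    X = vectorizer.fit_transform(texts)

    score_func = chi2 if selector == "chi2" else f_classif    # f_classif = ANOVA F-test
    reducer = SelectPercentile(score_func=score_func, percentile=percentile)
    X_reduced = reducer.fit_transform(X, labels)
    return X_reduced, vectorizer, reducer
```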
4 Results and Discussion
In this section, we first conduct an exploratory analysis of the data and how it is distributed. We also tackle the performance of the algorithms when using different preprocessing methods such as stemming and lemmatization, in addition to the impact of stop words, feature scaling and feature reduction. The final sub-section presents results regarding the best performing algorithms.
4.1 Exploratory Analysis
The retrieved dataset contained 104116 entries, with most of the text entries having Hyper Text Markup Language or unreadable characters in their midst. Both add unneeded dimensionality and no valuable information and, as such, were removed from the text content. In addition, some of the entries did not contain any valuable information and were also removed along with any text labeled as "undetermined" since this class is not actually labeling anything, unlike the others. Out of the total entries, 24017 complaints had empty strings. In total, after the clean-up process, the dataset totaled 42670 entries. This difference is mainly due to the high amount of missing values present in the infractions attribute; entries with missing values were removed because they would not be used in the classification task. In addition, this attribute contained a large number of labels describing the infraction that has taken place or is targeted by the complaint. These infractions are code-labeled, and more than one can be attributed to the same complaint, which turns it into a multi-label problem. In order to apply the classifiers, a multi-class approach was necessary and thus each category was grouped under an overarching category numbered 1 through 4. The majority of the samples regard infraction categories 1 and 2 with 20954 and 19176 samples, respectively. Category 4 is grossly under-represented with 103
samples, but category 3 also has a low population count at 2437 samples. The average document length is 1575.04 characters, with a standard deviation of 1723.3. It was observed that 75% of the observations were up to 1659 characters long and that the mean was not far off the observed majority of sample lengths, at 1575 characters, as was the case with the standard deviation.
4.2 Preliminary Results
As previously mentioned, 9 different classifiers were tested: MNB, CNB, BNB, SVM (linear), K-NN, DT, RF, AB and LR. The results are measured by mean weighted test set accuracy. The "e-mail content" label where all the textual information is stored was split into three different labels according to the processing methodology applied: Ds1: original text obtained after HTML removal; Ds2: original text but with stop words removed; Ds3: fully processed text. Ds2 contains the original text like Ds1 but without stop words, while Ds3 has no stop words as well as no numbers or any odd entity (only words). The same coding applies to both datasets as the textual content is shared, with the only variation being the number of entries in each dataset and the target labels for each document. The following results were obtained with 10-fold stratified cross-validation and each value is the average weighted accuracy obtained at each split. We also used term frequency with consideration for unigrams only. From Table 1 it was observed that the majority of the classifiers performed better with the Ds3 dataset. The exception to this was DT, with a 76.3% average weighted accuracy value in both Ds2 and Ds3. Even though accuracy increases across the board, the increase is not as pronounced in some cases such as the SVM, DT, LR and AdaBoost.

Table 1. Average weighted accuracy with 10-fold stratified cross-validation with no preprocessing (Ds1), dataset with only stop words removed (Ds2) and fully processed dataset (Ds3).

Classifier              Ds1    Ds2    Ds3
Multinomial NB          0.715  0.723  0.735
Bernoulli NB            0.680  0.686  0.696
Complement NB           0.715  0.723  0.732
Linear support vector   0.808  0.808  0.809
Logistic regression     0.814  0.814  0.820
K-Neighbors             0.635  0.646  0.656
Decision tree           0.757  0.763  0.763
Random forest           0.761  0.781  0.789
AdaBoost                0.713  0.720  0.721
In the following experiments, we use the Ds3 dataset as our dataset of choice as it allows for faster training times than the non-preprocessed dataset while achieving slightly better results in both datasets and labels tested.
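A sketch of the evaluation protocol used in these preliminary experiments, assuming scikit-learn; plain per-fold accuracy is used here as a stand-in for the average weighted accuracy reported above, and all classifiers keep their default parameters, as in the paper.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import MultinomialNB, BernoulliNB, ComplementNB
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier

classifiers = {
    "MNB": MultinomialNB(), "BNB": BernoulliNB(), "CNB": ComplementNB(),
    "SVM": LinearSVC(), "LR": LogisticRegression(max_iter=1000),
    "K-NN": KNeighborsClassifier(), "DT": DecisionTreeClassifier(),
    "RF": RandomForestClassifier(), "AB": AdaBoostClassifier(),
}

def evaluate(texts, labels):
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    for name, clf in classifiers.items():
        # term frequency, unigrams only, as in the preliminary experiments
        pipeline = make_pipeline(CountVectorizer(ngram_range=(1, 1)), clf)
        scores = cross_val_score(pipeline, texts, labels, cv=cv, scoring="accuracy")
        print(f"{name}: {np.mean(scores):.3f} (+/- {np.std(scores):.3f})")
```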
4.3 Word Normalization
In this sub-section we present the results regarding the use of stemming and lemmatization, two well-known word normalization methods. We establish comparisons between the previous Ds3 dataset and the following: Ds3-S: text was stemmed over the previous Ds3 dataset; Ds3-L: text was lemmatized over the previous Ds3 dataset. The same 10-fold stratified cross-validation was employed with the feature selection method being term frequency for unigrams. Table 2 shows the results without word normalization (Ds3), with stemming (Ds3-S) and with lemmatization (Ds3-L).

Table 2. Average weighted accuracy with 10-fold stratified cross-validation with a non-normalized dataset (Ds3), stemmed dataset (Ds3-S) and lemmatized dataset (Ds3-L).

Classifier              Ds3    Ds3-S  Ds3-L
Multinomial NB          0.735  0.733  0.737
Bernoulli NB            0.696  0.696  0.700
Complement NB           0.732  0.728  0.733
Linear support vector   0.809  0.801  0.804
Logistic regression     0.820  0.824  0.823
K-Neighbors             0.656  0.689  0.678
Decision tree           0.763  0.762  0.771
Random forest           0.789  0.780  0.784
AdaBoost                0.721  0.731  0.733
A slight overall increase was observed when using lemmatization over the non-normalized dataset, except for the SVM, RF and K-NN. The use of stemmed samples only increases accuracy for the K-NN, AB and LR classifiers, with K-NN and LR being the two classifiers which achieved better performance with stemming when compared to the other two datasets. The SVM and RF achieved higher values in accuracy with the use of the Ds3 (non-normalized) dataset, reaching 80.9% and 78.9%, respectively. Stemming the features still proves to be the most efficient way to handle the data timewise. In some cases, such as the Naïve Bayes variants, K-NN, SVM and LR, lemmatization proves to be less efficient than using the dataset which has not been normalized. It was noted that the tree-based methods such as the decision tree and random forest were observed to achieve better results (training times) with the lemmatized dataset, in contrast to the other classifiers except AB.
4.4 Feature Selection
In this section, we address different feature selection methods and how they impact each classifier’s performance using the accuracy metric. We also evaluate different feature ranges from 10% up to 100% in increments of 10. The results regard the CNB, LR, SVM and RF classifiers due to their higher accuracy values obtained before. TF with normalization, TF-IDF, ANOVA and Chi-square feature selection methods on each increment were tested.
Fig. 2. Accuracy values for CNB (left) and SVM (right) using ANOVA, Chi2, TF and TF-IDF (x-axis: percentage of features; y-axis: accuracy).
From Fig. 2, TF was observed to negatively impact accuracy for the CNB classifier. Both TF-IDF and Chi-square show consistent and similar results across the feature ranges, despite ANOVA achieving mostly the same performance but for the middle of the feature range (40–70% of features). In addition, all of the feature selection methods achieve lower results when the full number of features is taken into consideration. In similarity to what was found for the other classifiers, SVM performs better in both datasets with fewer features. The best results were observed at the 10% feature mark with either ANOVA or Chi-square feature selection. TF increases the SVM accuracy with more features but stabilizes around the same value after the 40–50% feature range. A decrease in performance was also found regarding Chi-square and ANOVA with more features added. In contrast, TF-IDF increases the SVM's predictive capability despite being ranked lower in terms of performance than the other feature selection methods. It is also worth noting that the values are extremely close between all feature selection methods. The results in Fig. 3 denote the performance of the LR classifier. TF-IDF consistently achieves higher values across most feature ranges, with a lower performance on the lowest feature range. It is also noticeable that the addition of further features does not result in significantly higher performance. In addition, it was observed that the ANOVA, Chi-square and TF-IDF methods fare better than TF, which is always below the others across all the feature sizes considered.
Fig. 3. Accuracy values for LR (left) and RF (right) using ANOVA, Chi2, TF and TF-IDF (x-axis: percentage of features; y-axis: accuracy).
The RF classifier performed worse with the addition of more features to the analysis. Chi-square was observed to obtain the best result among the feature selection methods with 10% of the feature size, but it had a lower accuracy than the other methods at 100% of the features. The same pattern of decreasing accuracy values was observed among all methods, with an odd increase in accuracy with ANOVA at 50% of the feature range.
4.5 Final Evaluation
In this section we evaluate the best performing classifiers so far: LR, SVM and RF. We report the accuracy, precision, recall and f1-measure at the macro level and extrapolate the results regarding their performance across all labels. The different metrics results are shown in Table 3.

Table 3. Evaluation metrics for LR, SVM and RF: accuracy, precision, recall and f1-measure.

Metric\Classifier   LR     SVM    RF
Accuracy            0.851  0.854  0.821
Precision           0.728  0.684  0.620
Recall              0.551  0.570  0.456
F1-measure          0.582  0.598  0.463
The accuracy value obtained for the LR classifier is quite high, at 85.1%. The precision values (72.8%) are higher than those obtained for the recall metric (55.1%). The harmonic mean of these two metrics (f1-measure) was observed to be much lower than the values obtained in the accuracy metric, at 58.2%. Of those documents actually belonging to the class, the algorithm was, for the most part, able to correctly label the given document even though this did not happen for almost half of the predictions. For the SVM, a high accuracy value was observed along with a precision value of 68.4% and a recall value of 57.0%. In the metrics concerning LR, a higher precision value was observed but a lower recall was found. This indicates that the SVM has, overall, a lower ratio of correctly predicted documents over the total number of retrieved documents of a given label. Despite this, values for the SVM are still high, along with a higher harmonic mean of recall and precision at 59.8%. RF was found to be the worst performing classifier of the three. It achieved lower accuracy at 82.1%, lower precision than the previous classifiers at 62.0%, as well as lower recall and f1-measure, both below 50%. The high accuracy value comes mostly from predicting the most prevalent categories correctly.
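For reference, the macro-level metrics reported in Table 3 can be computed as sketched below, assuming scikit-learn; y_true and y_pred stand for the true and predicted category labels.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and f1 over the 4 categories."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0)
    return {"accuracy": accuracy_score(y_true, y_pred),
            "precision": precision, "recall": recall, "f1": f1}
```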
5 Conclusions and Future Work
In this work, a text classification task involving several algorithms and different methods was studied. Different preprocessing methods, feature selection and reduction approaches, and classifiers were tested on a Portuguese dataset with the goal of predicting an encoded infractions attribute. This approach was also aimed at comparing the different procedures used as well as observing if the obtained results can help improve food safety and economic law enforcement by expediting the complaints and reports flow at the Economic and Food Safety Authority. It was possible to conclude that the use of a dataset which has been processed for stop words, punctuation, odd characters and numbers shows slightly better results and that only removing stop words already has a positive impact on accuracy. In addition, the use
of stemming carried a negative impact on accuracy for MNB, CNB, SVM, DT and RF, although not by a significant margin. In contrast, it proved to be the most impactful in lessening the time required to train each classifier. Lemmatization was shown to obtain better results while carrying an increase in training times over stemming, which was higher the more samples were involved in the process. The use of weighting feature selection methods that take into consideration term frequency over documents and the respective labels generally performs better than binary or simple count methods. The inclusion of more features did not improve the models' performance, and selecting a smaller percentage of the features that best explained the respective labels yielded better results. All the findings presented in this work reveal that the possibility of applying knowledge discovery and machine learning workflows to tasks that require extensive manual intervention can help facilitate and improve these tasks in the future by either serving as a support or even assisting in decision making.
Acknowledgements. This work is supported by project IA.SAE, funded by Fundação para a Ciência e a Tecnologia (FCT) through program INCoDe.2030. This research was partially supported by LIACC - Artificial Intelligence and Computer Science Laboratory of the University of Porto (FCT/UID/CEC/00027/2020).
References 1. European Parliament and Council, Regulation (EC) No 178/2002 of 28 January 2002. Off. J. Eur. Commun. 31, 1–24 (2002) 2. European Commission, RASFF 2017 Annual Report (2017) 3. EC and EP, Directive 2001/95/EC of the European Parliament and of the Council of 3 December 2001 on general product safety. Off. J. Eur. Commun. (7), 14 (2002) 4. Frawley, W.J., Piatetsky-Shapiro, G., Matheus, C.J.: Knowledge discovery in databases: an overview. AI Mag. 13(3), 57 (1992) 5. Han, J., Cai, Y., Cerconet, N.: Knowledge discovery in databases: an attribute-oriented. In: Proceedings 18th VLDB Conference, Vancouver, Br. Columbia, Canada (1992) 6. Fayyad, U., Piatetsky-Shapiro, G., Smyth, P.: Knowledge discovery and data mining: towards a unifying framework. In: International Conference Knowledge Discovery Data Mining (1996) 7. Matheus, C.J., Piatetsky Shapiro, G., Chan, P.K.: Systems for knowledge discovery in databases. IEEE Trans. Knowl. Data Eng. 5(6), 903–916 (1993) 8. Ristoski, P., Paulheim, H.: Semantic Web in data mining and knowledge discovery: a comprehensive survey. J. Web Seman. 36, 1–22 (2016) 9. Soibelman, L., Kim, H.: Data preparation process for construction knowledge generation through knowledge discovery in databases. J. Comput. Civ. Eng. 16(1), 39–48 (2002) 10. Abidi, S.R.S.R., et al.: Cyber security for your organisation starts here. Haemophilia 11(4), 487–497 (2018) 11. Fayyad, U., Piatetsky-Shapiro, G., Smyth, P.: The KDD process for extracting useful knowledge from volumes of data. Commun. ACM 39, 27–34 (1996) 12. Hu, N., Zhang, T., Gao, B., Bose, I.: What do hotel customers complain about? text analysis using structural topic model. Tour. Manag 72, 417–426 (2019)
13. Joung, J., Jung, K., Ko, S., Kim, K.: Customer complaints analysis using text mining and outcome-driven innovation method market-oriented product development. Sustainability 11(1), 40 (2018) 14. Pisarevskaya, D., Galitsky, B., Ozerov, A., Taylor, J.: An anatomy of a lie: discourse patterns in customer complaints deception dataset. In: The Web Conference 2019 Companion of the World Wide Web Conference, WWW 2019 (2019) 15. Tong, X., Wu, B., Wang, B., Lv, J.: A complaint text classification model based on character-level convolutional network. In: Proceedings of the IEEE International Conference on Software Engineering and Service Sciences, ICSESS (2019) 16. Tong, G., Guo, B., Yi, O., Zhiwen, Y.: Mining and analyzing user feedback from app reviews: an econometric approach. In: Proceedings - 2018 IEEE, SmartWorld/UIC/ATC/ ScalCom/CBDCo (2018) 17. Genc-Nayebi, N., Abran, A.: A systematic literature review: opinion mining studies from mobile app store user reviews. J. Syst. Softw. 125, 201–219 (2017) 18. Das, S., Mudgal, A., Dutta, A., Geedipally, S.: Vehicle consumer complaint reports involving severe incidents: mining large contingency tables. T. Res. Rec. 2672(32), 72–82 (2018) 19. Kalyoncu, F., Zeydan, E., Yigit, I.O., Yildirim, A.: A customer complaint analysis tool for mobile network operators. In: Proceedings of the 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (2018) 20. Guru, D.S., Ali, M., Suhil, M.: A novel term weighting scheme and an approach for classification of agricultural arabic text complaints. In: 2nd IEEE International Workshop on Arabic and Derived Script Analysis and Recognition, ASAR 2018 (2018) 21. Jović, A., Brkić, K., Bogunović, N.: A review of feature selection methods with applications. In: 2015 38th MIPRO 2015 - Proceedings (2015) 22. Forman, G.: An extensive empirical study of feature selection metrics for text classification. J. Mach. Learn. Res. 3, 1289–1305 (2003) 23. Gonçalves, T., Quaresma, P.: Evaluating preprocessing techniques in a text classification problem. Unisinos (2005) 24. Wang, P., Xu, B., Xu, J., Tian, G., Liu, C.L., Hao, H.: Semantic expansion using word embedding clustering and convolutional neural network for improving short text classification. Neurocomputing 174, 806–814 (2016) 25. Joulin, A., Grave, E., Bojanowski, P., Mikolov, T.: Bag of tricks for efficient text classification. In: 15th EACL 2017 - Proceedings of Conference (2017) 26. Pranckevičius, T., Marcinkevičius, V.: Comparison of naive bayes, random forest, decision tree, support vector machines, and logistic regression classifiers for text reviews classification. Balt. J. Mod. Comput. 5(2), 221 (2017) 27. Kanaris, I., Stamatatos, E.: Learning to recognize webpage genres. Inf. Pro. Manag. 45(5), 499–512 (2009) 28. HaCohen-Kerner, Y., Dilmon, R., Hone, M., Ben-Basan, M.A.: Automatic classification of complaint letters according to service provider categories. Inf. Process. Manag. 56(6), 102102 (2019) 29. Hotho, A., Nürnberger, A., Paaß, G.: A brief survey of text mining. LDV Forum Gld. J. Comput. Linguist. Lang. Technol. 20(1), 19–62 (2005) 30. Porter, M.F.: An algorithm for suffix stripping. Program 14(3), 130–137 (1980) 31. Plisson, J., Lavrac, N., Mladenić, D.D.: A rule based approach to word lemmatization. In: Proceedings 7th International Multiconference Information Society (2004)
IoT Services Applied at the Smart Cities Level
George Suciu1,2(&), Ijaz Hussain2, Andreea Badicu2, Lucian Necula2, and Teodora Ușurelu2
1 University Politehnica of Bucharest, Bucharest, Romania
2 Beia Consult International, Peroni 16, Bucharest, Romania {george,ijaz,andreea.badicu,lucian.necula,teodora.usurelu}@beia.ro
Abstract. The Internet of Things (IoT) can consolidate transparently and seamlessly diverse or heterogeneous end systems by granting open access to chosen subsets of data for the development of a large number of digital services. Forming a general architecture for the IoT is indeed a very challenging task, mainly because of the types of devices, link layer technologies, and services that may be involved in such a system. IoT envisions connecting billions of sensors to the Internet in order to utilize them for practical and productive resource management in Smart Cities. Today, infrastructure platforms and software applications are administered as services using cloud technologies. An immense pressure towards organized city management has triggered many Smart City ambitions, by both government and private sector businesses, to invest in information and communication technologies and to find sustainable solutions to the increasing issues. This paper is focused on the Smart Services Platform developed through the CitiSim project. Its main objective is to help in providing services for making city life better for the citizens. Also, it was designed to give users a powerful monitoring and control infrastructure and to enable them to make critical management decisions based on the information provided by the platform.
Keywords: IoT · Smart services platform · Critical management decision
1 Introduction
Over the last decades, urban populations have faced a rapid rise that has led to many changes in the global infrastructure. In our modern days, cities have expanded along with the evolution of technology, and they have developed the need to integrate information and communication technologies to evaluate what is happening. With these technologies and the integrated sensors, the management of the cities can monitor the systems in real time and make better decisions regarding security, the water supply, traffic, lighting systems, and parking lots, among others. The concept of smart cities promotes the increase in the quality and the performance of urban services, using communication and information technologies based on the IoT, among others, bringing a significant impact on the solutions for modern cities. However, due to the variety of possibilities when choosing the devices and services involved, creating an IoT architecture can become a somewhat convoluted
task. This way, the future of the smart cities will be improved, from safety, lighting, security, traffic, and parking to the citizens' engagement and the management of the cities [1]. One of the essential aspects in smart cities is mobility - smart traffic light systems, intelligent parking, smart grid, and multi-modal transportation, helping to reduce congestion, diminishing the time spent looking for a parking spot, monitoring the traffic flow, and offering citizens the possibility of checking real-time information about public transportation. Also, buildings that monitor their usage of energy are more economical, since they report the collected and analyzed data. The rest of the paper is organized as follows: Sect. 2 analyzes related work, Sect. 3 presents the CitiSim platform, while Sect. 4 details the IoT-based smart services, and Sect. 5 draws the conclusions and outlines future work.
2 Related Work
The main component in smart cities that brings everything together is the IoT and the devices it comes with: sensors and beacons, which are essential in this context. All the information in a smart city is collected from deployed sensors and is then sent to a complex system with the purpose of creating real-time management. Because too much data can sometimes be overwhelming, beacons come to the help of smart cities, primarily since they also represent a lower-cost solution. In this sense, cities can deploy a private network of beacons to collect and transmit the information to an integrated hub [2]. Some IoT solutions already implemented for Smart Cities are the platforms described below. AirCasting is an open-source platform used for recording, mapping, and sharing environmental and health data through the citizens' smartphones. Each new AirCasting session comes with the chance of capturing real-world measurements and defining and documenting the received data. Through the mobile app of AirCasting, the users can record, map, and share the following [3]: sound levels, humidity, temperature, carbon monoxide (CO), nitrogen dioxide (NO2), fine particle matter (PM2.5) concentrations, breathing rate, heart rate and heart rate variability, and activity level measurements. Another solution for smart cities is Bigbelly - an easy-to-access platform, secure and able to hide in plain sight, that delivers smart recycling and waste solutions. Besides giving an update to the city services, it can be used for hosting supplementary technologies. This web-based, cloud-connected solution transmits insights about the waste operations of the citizens. Bigbelly comes with a variety of services all combined under a compact model that captures information from the high-traffic areas, besides the less hectic ones. This integrated platform comes as a customizable turnkey solution with an increased capacity that can be deployed from the users' platform [4]. An important field of Smart City platforms is represented by monitoring environmental factors, such as air quality, humidity, sun exposure etc. [5]. For this purpose, one of the most advanced trackers used for the environment is TZOA, a solution that uses internal sensors to measure parameters like temperature, air quality, humidity, ambient light, atmospheric pressure and UV exposure from the sun, all included in one compact wearable device. It can be connected to the user's
smartphone so that it can help in the decision-making process, since it displays a map of the environmental data in real time. The air quality sensor presents concentrations, counts the number of individual particles, and identifies allergens and different pollution indexes. The information is streamed onto the smartphone application, then sent to the cloud with the purpose of creating a map that is available to all the users. Moreover, the TZOA app sends actionable recommendations like choosing less polluted paths, opening the windows to refresh the air or making sure the users get the right amount of sunshine during seasons [6]. Another solution implemented in Smart Cities is Smart City Map, which is an online multifunctional data hub accessed from a mobile application. This solution was built as a customizable multilevel map, working with big data in real time. Also, it provides localized information with the help of "CityDataPub" - a digital infrastructure, and it can also send incident reports directly. The features it comes with are: free Wi-Fi, city tourism, accessibility (helps the user search and discover points of interest), local recycle (displays the nearest recycling centers and exchange centers using geolocation), and incident report [7]. Traffic optimization is essential for Smart City platforms. In this regard, Current designed and developed an IoT platform called CityIQ [8] which provides opportunities for traffic optimization, bicycle flow optimization, safety and convenience optimization for citizens and parking improvement. This open platform is based on IoT devices mounted on lamp-post nodes for the provision of data. Using the GUI, the data can be visualized in near real time and is available via APIs to a client ecosystem. In the concept of Smart Cities, important roles are played by Big Data, for processing data collected through IoT devices, and Cloud Computing, for reduced costs and reliability of services [9]. Huawei developed an open ecosystem called OceanConnect [10], which provides over 170 APIs for the development of Smart City services in the area of smart homes, smart utilities metering, smart parking and safe city [11]. The platform provides support for a wide range of protocols, such as Z-Wave, ZigBee and Wi-Fi, TCP, UDP, MQTT, CoAP and LWM2M. Hitachi Vatara by Hitachi [12] is another Smart City solution enabled by IoT, Big Data, analytics and video intelligence, which addresses two main areas: Smart Spaces [13], and Public Safety and Security [14]. Smart Spaces integrates data collected from IoT sensors with data provided by video cameras, social media and other sources to support public safety and security, traffic and mobility, airports and ports, campuses and retailers. The Public Safety and Security solution uses data from video cameras, CAD systems, gunshot detectors, facial readers and license plate readers to provide automated alerts for intrusions, object detection and facial recognition. A Smart City environment entails the use of very diverse devices. Impact is an IoT platform developed by Nokia which connects various devices and enables the management of a wide array of IP-based protocols and non-IP based Low Power WAN protocols, including NB-IoT and LoRa [15, 16].
The platform’s main features include: auto detection and identification of subscriber devices, automation of remote provisioning, remote update and repair of device configurations, large-scale or bulk management actions, management of device faults and multi-domain/multi-protocol device management.
3 CitiSim – Platform for Smart Cities Solutions
CitiSim is a smart services platform developed within the project with the same name. The main objective of this platform is to provide a reliable monitoring and control infrastructure that can take decisions in critical situations, based on the information provided by the platform components and the gathered data. The structure of the architecture consists of four layers, as follows:
• IoT Layer - All data from the sensors are collected here. The sensors are registered in the message broker through an interface as publishers, and each sensor calls the message broker interface periodically with its value and some metadata about the reading (e.g., expiring time, timestamp, quality of the reading etc.; see the sketch after this list);
• Core Layer - It represents the platform itself. The main components of this layer are:
  • Message Broker: used for raw data, events and sensor information;
  • Filter Service: this service is defined in order to allow users to subscribe to a specific topic;
  • Property Service: a service devoted to storing static/semi-static properties of devices/services in an instance of the platform; the position of a smoke sensor, the last revision of an extinguisher, or the manufacturer of specific actuators are examples of information stored/accessed through the property service;
  • Persistent Service: this service is subscribed to all topics in the message broker and stores the data in the data store;
  • Semantic KB: this knowledge database will store basically three types of information: the vocabulary and relations of concepts in a current city, the rules about how a city works regarding traffic and pedestrians, and a service description of the instances running in this instance of CitiSim;
  • Scheduling Service: this experimental service will orchestrate complex behaviors according to the services deployed and a new desire expressed by a user or service. For example, the opening of a door to go into a building can be done by several methods (facial recognition, RFID tag, PIN code etc.) according to the ICT infrastructure deployed;
  • Semantic Service: this service will manage information at the semantic level and it will integrate other domains with the Smart Cities domain;
  • Manager Tool: used for monitoring the platform and the state of the services and devices;
  • Adapters: these modules interconnect CitiSim with other domains (e.g., MQTT devices, Kafka based platforms, Sofia2 services etc.).
• Urban modeling layer: This layer will store structural information about the city (urban furniture, street layout, 3D building models, supply models);
• Smart service layer: This layer takes the information collected in the core layer and, by using the urban model layer, provides a service related to stakeholders of a smart city. The following smart services are defined and implemented here: Pollution, Energy and Infrastructure monitoring service, People and Traffic monitoring, Emergency service, Citizen Reporting, Semantic Search Service and Visual Wiki.
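A minimal sketch of how a sensor in the IoT layer might publish a reading together with its metadata. CitiSim's own broker interface is not specified here, so the example assumes a plain MQTT broker (one of the adapter targets listed above) and the paho-mqtt 1.x client API; the broker address, topic and field names are placeholders.

```python
import json
import time
import paho.mqtt.client as mqtt

BROKER_HOST = "broker.citisim.example"   # placeholder address
TOPIC = "citisim/sensors/smoke/floor2"   # placeholder topic

client = mqtt.Client(client_id="smoke-sensor-42")
client.connect(BROKER_HOST, 1883)

while True:
    reading = {
        "value": 0.03,                   # sensor value (e.g., smoke density)
        "timestamp": int(time.time()),   # when the reading was taken
        "expiring_time": 60,             # seconds after which the reading is stale
        "quality": "good",               # quality of the reading
    }
    client.publish(TOPIC, json.dumps(reading))
    time.sleep(30)                       # periodic publication, as described above
```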
4 IoT-Based Smart Services
Two IoT-based smart services developed using the CitiSim framework are the Smart Energy Business Intelligence and the Environmental Motion Assistant. Both solutions are based on IoT networks that collect data in real time using urban and extra-urban sensors. The Smart Energy Business Intelligence service supports decisions to invest in green energy production based on consumption patterns identified by collecting data from the building using energy and environmental sensors [17]. Figure 1 depicts the architecture of the Smart Energy Business Intelligence platform, which is based on three main layers: the IoT services layer, the back-end and the front-end.
Fig. 1. Smart energy business intelligence platform architecture
A wide range of sensors from providers such as Verbund, Siemens, Fibaro and Libelium are being used to collect real-time data from different cost centres. Using the CitiSim Subscriber Service that retrieves data from the CitiSim Data Broker, the data is sent to the back-end, which follows a REST architecture based on Python Flask REST services for data processing, and then to the front-end where it can be visualized by the user. The Environmental Motion Assistant (E.M.A.) was developed as a service that should help any potential user to have a clear understanding of the surrounding environment, while being specifically designed for closed mobile contexts, like for example the interior of a vehicle. The service and its associated device are not limited to interior conditions, the user having the possibility to use it outdoors, in various situations like for example when riding a bike or taking a walk. By using E.M.A. as a Smart Mobility Service, environmental and motion parameters (CO2, alcohol concentration, temperature, humidity, air quality, dust, speed, accelerations, rotations etc.) are continuously measured and stored for any user context, insightful visualizations being provided through a mobile and a web application. As depicted in Fig. 2, there are several components integrated into E.M.A.: (a) a sensor pack, which is a prototyped device containing the necessary sensors, (b) a database that stores the data generated by the devices, (c) a data collector machine
where the REST API is deployed, (d) an API that is used to collect the data coming from various sources, (e) the CitiSim Data Broker used for data integration and (f) a graphical user interface for user visualizations and interactions with the platform.
Fig. 2. Environmental motion assistant platform architecture
The REST service can be considered the integration core of the platform, as it is used for all important data operations: 1) getting data from the devices, 2) serving data to web application, 3) serving data to mobile application.
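A minimal sketch of such a Flask-based REST core, covering the three data operations listed above; the endpoint paths, payload fields and in-memory storage are illustrative stand-ins, as E.M.A.'s actual API is not documented here.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)
measurements = []   # stand-in for the real database component

@app.route("/api/measurements", methods=["POST"])
def ingest_measurement():
    """1) getting data from the devices (the sensor pack posts its readings here)."""
    payload = request.get_json()
    measurements.append(payload)
    return jsonify({"stored": True}), 201

@app.route("/api/measurements", methods=["GET"])
def serve_measurements():
    """2) and 3) serving data to the web and mobile applications."""
    device = request.args.get("device")
    data = [m for m in measurements if device is None or m.get("device") == device]
    return jsonify(data)

if __name__ == "__main__":
    app.run(port=5000)
```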
5 Conclusions
CitiSim is the ideal solution when it comes to smart city platforms because it supports the development and simulation of intelligent services within a common framework. In order to demonstrate the usefulness of the platform and the services it offers, more applications will be implemented in the fields of smart energy, smart mobility, and smart emergency. The Smart City concept is on the rise and has a huge impact on the economy of the modern world. The benefits of CitiSim's exploitation go to society, represented by the citizens, and to the economy, in terms of market value. The major innovation of CitiSim is that it aims to provide the first Smart City-specific platform to monitor a large city infrastructure in real time and in 2D/3D, and to enable users to develop or test added-value and customized services in an agile, simple way. As future work we envision performing experiments on live data from cities.
Acknowledgment. This work has been supported in part by UEFISCDI Romania and MCI through projects CitiSim, PARFAIT and VLC/IR-RF, funded in part by European Union's Horizon 2020 research and innovation program under grant agreement No. 826452 (Arrowhead Tools), No. 787002 (SAFECARE), No. 777996 (SealedGRID) and No. 872698 (HUBCAP).
References 1. Zanella, A., Bui, N., Castellani, A., Vangelista, L., Zorzi, M.: Internet of things for smart cities. IEEE Internet Things J. 1(1), 22–32 (2014) 2. Maddox, T.: Smart cities: 6 essential technologies (2018) 3. Aircasting. http://aircasting.org/about 4. Bigbelly. https://bigbelly.com/platform/ 5. Suciu, G., Marcu, I., Balaceanu, C., Dobrea, M., Botezat, E.: Efficient IoT system for precision agriculture. In: 15th International Conference on Engineering of Modern Electric Systems, pp 173–176 (2019). https://doi.org/10.1109/emes.2019.8795102 6. TZOA. https://tzoa.com 7. Rotună C., Cîrnu C. E, Smada D., Gheorghiță A.: Smart city applications built on big data technologies and secure IoT (2017) 8. CityIQ. https://developer.currentbyge.com/cityiq/ 9. Agarwal, N., Agarwal, G.: Role of Cloud Computing in Development of Smart City, pp. 2349–784 (2017) 10. OceanConnect. https://developer.huawei.com/ict/en/site-oceanconnect 11. OceanConnect. https://developer.huawei.com/ict/en/doc/en_iot_oceanconnect_solutions/ index.html/en-us_topic_0038742819 12. Hitachi Vatara. https://www.hitachivantara.com/en-us/solutions/iot-insights/smart-cities. html 13. Hitachi Vatara. https://www.hitachivantara.com/en-us/solutions/iot-insights/smart-cities/ smart-spaces.html 14. Hitachi Vatara. https://www.hitachivantara.com/en-us/solutions/iot-insights/smart-cities/ public-safety-security.html 15. Impact. https://www.nokia.com/networks/solutions/impact-iot-platform/#overview 16. Impact. https://onestore.nokia.com/asset/205580 17. Suciu, G., Dițu, M. C., Rogojanu, I., Ușurelu, T.: Energy performance analysis using EnergyPlus for an office building. In: CIGRE Regional South-East European Conference – RSEEC 2018 (4th edition), RSEEC proceedings (2018)
Statistical Evaluation of Artificial Intelligence-Based Intrusion Detection System
Samir Puuska(B), Tero Kokkonen, Petri Mutka, Janne Alatalo, Eppu Heilimo, and Antti Mäkelä
Institute of Information Technology, JAMK University of Applied Sciences, Jyväskylä, Finland {samir.puuska,tero.kokkonen,petri.mutka,janne.alatalo,eppu.heilimo,antti.makela}@jamk.fi
Abstract. Training neural networks with captured real-world network data may fail to ascertain whether or not the network architecture is capable of learning the types of correlations expected to be present in real data. In this paper we outline a statistical model aimed at assessing the learning capability of a neural network-based intrusion detection system. We explore the possibility of using data from statistical simulations to ascertain that the network is capable of learning so-called precursor patterns. These patterns seek to assess if the network can learn likely statistical properties, and detect when a given input does not have those properties and is anomalous. We train a neural network using synthetic data and create several test datasets where the key statistical properties are altered. Based on our findings, the network is capable of detecting the anomalous data with high probability.
Keywords: Statistical analysis · Intrusion detection · Anomaly detection · Network traffic modeling · Autoregressive neural networks
1 Introduction
Neural networks are being increasingly used as a part of Intrusion Detection Systems, in various configurations. These networks are often trained in ways that include both legitimate and malicious recorded network traffic. Traditionally, a training set is used to train the network, while another set of samples is used to assess the suitability of the proposed architecture. However, further assessment of the network architecture depends on knowing what statistical properties the network can learn, and how it will react if these properties change. In this paper, we present a way to estimate if a network has the capability of learning certain desired features. Our analysis approach is to ascertain that the network can learn precursor patterns, i.e. patterns that are necessary but
Statistical Evaluation of AI -Based IDS
465
not sufficient conditions for learning more complex patterns of the same type. The goal is to supplement traditional sample-based learning with synthetic data variants that have predictable and desirable statistical properties. This synthetic data can then be used both to increase the dataset and to address known biases that often arise when collecting real-world data traffic. Certain real-life phenomena, such as network traffic, can be considered to have known intrinsic properties due to their artificial nature. In communication protocols, for example, certain hard limits must be observed for achieving any successful communication. Although protocols are sometimes abused for malicious purposes, there are still limits as to how extensive the effect can realistically be. On other occasions, there are limits on how much any given feature can be expected to correlate with anomalies. However, a combination of these weakly-correlated features may, if they form a specific pattern, signal for an anomaly. In artificial systems, it is sometimes possible to distinguish correlation from causation, and therefore make more intelligent predictions by considering only the direction that is actually feasible. Based on their basis of analysis, there are two classes of Intrusion Detection Systems (IDS): anomaly-based detection (anomaly detection) and signaturebased detection (misuse detection). Anomaly-based detection functions without earlier gathered signatures and are effective even for zero-day attacks and encrypted network traffic. There are various machine learning techniques implemented for classifying anomalies from network traffic but still, some flaws exist; a high amount of false alarms and low throughput [2,5,6]. In our earlier studies, we implemented two anomaly-detection based IDSs that utilized deep learning. Our first model was based on wavelet transforms and Adversarial Autoencoders [8]. That model was improved with a WaveNet [7] based solution [4]. In this paper, we perform a statistical experiment for determining the performance of a WaveNet based IDS system.
2
Method
We begin by outlining a statistical model which complies with our research goals. As stated, the idea is to construct a statistical distribution which contains so-called precursor or proto-elements of the actual phenomenon. The aim here is to ascertain that the network is capable of learning simpler versions of the relationships expected to be present in the real data. Network protocols have a certain degree of predictability. As previously stated, we can state certain hard limits for the features we have selected. Our model is designed to work with the Transport Layer Security (TLS) protocols, as encrypted HTTP traffic is a common communication channel for malware. We can identify various types of noise that usually occurs in the networks. The model should be resistant to this type of noise, as we know it arises due to the nature of data networks and is likely not associated with the type of anomalies we are interested in. Based on this reasoning we have constructed a model that incorporates three distributions modeling i) packet size, ii) packet direction, and iii) packet timings.
466
S. Puuska et al.
One packet is modeled using these three features. The packet structure is illustrated in Fig. 1. A connection consists of 250 packets (vectors), where timings are expressed using time differences to the next packet.
Fig. 1. Visual description of a single connection. Each packet consists of a vector that contains three elements: packet direction, packet size, and time difference to the next packet.
2.1
Packet Size and Direction
Based on the findings by Castro et al. [1], we model the packet size using the Beta distribution (α = 0.0888, β = 0.0967). We enforce two strict cut off points: the minimum (15) and maximum (1500). This reflects the packet size constraints that networking protocols impose on packet size. Packet direction is determined using the packet size. This models the realworld phenomenon where the requests are usually smaller than the responses. In the model packet direction, there is a binary value determined by packet size L; packets smaller than 30 are outgoing and larger than 1200 are incoming. If the size is 30 < L < 1200, the direction is decided randomly. 2.2
Packet Timing
Various separate processes affect packet timing: the nature of the protocol or data transfer type determines how fast packets are expected to be sent or received. For example, fetching an HTML page via HTTP creates a burst of packets going back and forth; however, malware that polls a Command and Control server at late intervals (for example hourly) may send just one packet and get one in response. However, a considerable amount of variance is expected when systems are under a high load or there is a network issue. Therefore, not all anomalies in the timing patterns are malicious in nature. Since we do not need to model the traffic explicitly, we use a packet train model [3] inspired composite Gaussian distribution model for creating packet timings. Originally, the packet train model was designed for categorizing reallife network traffic, not for generating synthetic network data. For the relevant parts, the packet train model is characterized by the following parameters; mean inter-train arrival time, mean inter-car arrival time, mean
Statistical Evaluation of AI -Based IDS
467
train-size. We capture the similar behavior by combining two normal probability density functions in range x ∈ [a, b] as: f (x) ≡ f (x; μ1 , μ2 , σ1 , σ2 , w1 , w2 , a, b) ⎧ ⎪ ⎨0 2 1) 1 = √R2π w − (x−μ + 2 exp 2 σ 2σ 1 1 ⎪ ⎩ 0
w2 σ22
if x < a 2 2) exp − (x−μ if a ≤ x ≤ b , 2σ22 if x > b (1)
where R is normalization constant that is calculated from normalization condition for total probability. In (1), μ1 and μ2 are mean values for sub-distributions (μ1 < μ2 ), and σ1 and σ2 are relevant variances. Sub-distributions have relative weights w1 and w2 . We chose to use a semi-analytical probabilistic model since it is easier to parameterize and understand than more generic Markov models. Our model captures the most relevant properties of the train packet model; roughly mean inter-train arrival time ∝ μ2 , mean inter-car arrival time ∝ μ1 , and mean train size ∝ w1 /w2 . Corresponding cumulative distribution function can be expressed with complementary error functions and solved numerically for generating random number samples with desired statistical properties. 2.3
Scoring
Since our neural network is trained by minimizing the mean of minibatches discretized logistic mixture negative log-likelihoods [9], we can detect the anomalous connections by observing the mean negative log-likelihood of the feature vectors in a single sample. Moreover, we introduce the different types of anomalies in varying quantities to the dataset to evaluate the neural network’s sensitivity and behavior. 2.4
Tests
We trained the neural network using data formed by previously described clean distributions. The size of the training set was 160000 samples. We reserved an additional 40000 unseen samples for the evaluation. The test procedure consists of generating samples where the parameters are drawn from a different distribution than the training data. These ”anomalous” samples are mixed with the evaluation data to form ten sets where the percentage is increased from 10% to 100%. Each of these datasets is evaluated using the neural network and the changes in the mean anomaly score are observed in Table 1, which describes the three types of alterations made to the samples. The alterations were chosen because they represent different correlations; namely, the directionality is determined between two features inside one packet, whereas the change in timing distribution is spread out between packets and does not correlate with other features inside a particular vector. This approach is expected to test the network’s capability to detect both kinds of correlations.
468
S. Puuska et al. Table 1. Descriptions of the alterations.
Test
Description
Direction
This test swaps the directionality decision criteria. Small packages are now incoming and large outgoing. The area where the directionality is randomly determined stays the same
Time
This test replaces the bimodal distribution on packet timing with unimodal Gaussian distribution μ = 50, σ = 80. The cut-off points remain the same
Combined The test combines both alterations to the dataset
3
Results
The results indicate that the network learned to detect anomalous data in all three datasets. The results are illustrated in Fig. 2. As the figure indicates, the anomaly score keeps increasing with the percentage of “anomalous” data. Packet direction seems to have an almost linear increase in the anomaly score, whereas changes in time distribution result in a sudden jump, after which the score keeps increasing relatively modestly. The combined data exhibits both the starting jump and the linear increase. This is a desired outcome, as it indicates that the anomaly score reflects the change in data in a stable fashion.
Fig. 2. Plot of test results from each anomaly type. The horizontal axis indicates the percent of samples that were altered. As expected, the mean anomaly score on the vertical axis increases with respect to the amount of altered samples in the test data.
Statistical Evaluation of AI -Based IDS
469
In summary, the network was able to learn the properties outlined in the previous sections. The results indicate that the network can detect correlation inside the vector, as well as between vectors. This outcome supports the notion that a neural network structured in this fashion learns useful relationships between the features.
4
Discussion
When constructing a machine learning solution for anomaly detection, the available data may not be suitably representative. This situation may arise, for example, when collecting or sampling the dataset in a statistically representative way is impossible for practical reasons. It is not feasible to expect a statistically representative sample of all possible network flows, even when dealing with one application. Moreover, the data in networks may exhibit correlations known to be unrelated to the type of the anomaly under examination. The statistical properties of network data may fluctuate due to multiple factors. By using synthetic data which contains correlations that are known to be relevant, it is possible to verify whether or not the proposed network structure is capable of detecting them in general. Moreover, the test may show how the classifier reacts to the increase in variance. In an ideal case a classifier should be relatively tolerant to small fluctuations; however, be able to reliably identify the anomalous samples. Future work includes refining the statistical procedures, as well as increasing the complexity of correlations in test data. Further research will be conducted on how the relationship between increasing variance and data are drawn from different distributions affects the anomaly score, and how this information may be used to refine the structure of the neural network classifier. Acknowledgment. This research is funded by: - Using Artificial Intelligence for Anomaly Based Network Intrusion Detection System -project of the Scientific Advisory Board for Defence (MATINE) - Cyber Security Network of Competence Centres for Europe (CyberSec4Europe) -project of the Horizon 2020 SU-ICT-03-2018 program.
References 1. Castro, E.R.S., Alencar, M.S., Fonseca, I.E.: Probability density functions of the packet length for computer networks with bimodal traffic. Int. J. Comput. Netw. Commun. 5(3), 17–31 (2013). https://doi.org/10.5121/ijcnc.2013.5302 2. Chiba, Z., Abghour, N., Moussaid, K., Omri, A.E., Rida, M.: A clever approach to develop an efficient deep neural network based IDS for cloud environments using a self-adaptive genetic algorithm. In: 2019 International Conference on Advanced Communication Technologies and Networking (CommNet), pp. 1–9 (2019). https:// doi.org/10.1109/COMMNET.2019.8742390 3. Jain, R., Routhier, S.: Packet trains-measurements and a new model for computer network traffic. IEEE J. Sel. Areas Commun. 4(6), 986–995 (1986). https://doi.org/ 10.1109/JSAC.1986.1146410
470
S. Puuska et al.
4. Kokkonen, T., Puuska, S., Alatalo, J., Heilimo, E., M¨ akel¨ a, A.: Network anomaly detection based on wavenet. In: Galinina, O., Andreev, S., Balandin, S., Koucheryavy, Y. (eds.) Internet of Things, Smart Spaces, and Next Generation Networks and Systems, pp. 424–433. Springer International Publishing, Cham (2019) 5. Masduki, B.W., Ramli, K., Saputra, F.A., Sugiarto, D.: Study on implementation of machine learning methods combination for improving attacks detection accuracy on intrusion detection system (IDS). In: 2015 International Conference on Quality in Research (QiR), pp. 56–64 (2015). https://doi.org/10.1109/QiR.2015.7374895 6. Narsingyani, D., Kale, O.: Optimizing false positive in anomaly based intrusion detection using genetic algorithm. In: 2015 IEEE 3rd International Conference on MOOCs, Innovation and Technology in Education (MITE), pp. 72–77 (2015). https://doi.org/10.1109/MITE.2015.7375291 7. van den Oord, A., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A.W., Kavukcuoglu, K.: Wavenet: A generative model for raw audio (2016). https://arxiv.org/pdf/1609.03499.pdf 8. Puuska, S., Kokkonen, T., Alatalo, J., Heilimo, E.: Anomaly-based network intrusion detection using wavelets and adversarial autoencoders. In: Lanet, J.L., Toma, C. (eds.) Innovative Security Solutions for Information Technology and Communications, pp. 234–246. Springer International Publishing, Cham (2019) 9. Salimans, T., Karpathy, A., Chen, X., Kingma, D.P.: Pixelcnn++: Improving the pixelcnn with discretized logistic mixture likelihood and other modifications. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, 24–26 April 2017 (2017). https://openreview.net/references/pdf?id=rJuJ1cP l
Analyzing Peer-to-Peer Lending Secondary Market: What Determines the Successful Trade of a Loan Note? Ajay Byanjankar(B) , J´ ozsef Mezei, and Xiaolu Wang ˚ Abo Akademi University, Turku, Finland [email protected]
Abstract. Predicting loan default in peer-to-peer (P2P) lending has been a widely researched topic in recent years. While one can identify a large number of contributions predicting loan default on primary market of P2P platforms, there is a lack of research regarding the assessment of analytical methods on secondary market transactions. Reselling investments offers a valuable alternative to investors in P2P market to increase their profit and to diversify. In this article, we apply machine learning algorithms to build classification models that can predict the success of secondary market offers. Using data from a leading European P2P platform, we found that random forests offer the best classification performance. The empirical analysis revealed that in particular two variables have significant impact on success in the secondary market: (i) discount rate and (ii) the number of days the loan had been in debt when it was put on the secondary market.
Keywords: Machine learning lending · Secondary market
1
· Binary classification · Peer-to-Peer
Introduction
Peer-to-Peer (P2P) lending is a micro finance service operating online to connect borrowers and lenders for loan transactions [1]. P2P platforms allow for easy and quick loan processing for borrowers due to automated handling and less cost because of the absence of traditional financial intermediaries. Following from the premise, it provide higher return to lenders compared to similar traditional investments. In recent years, the service has been gaining popularity and growth as a result of quick and easy access to loan for borrowers [2]. However, there are risks associated to lending, the primary being the lack of collateral. Additional risk of investment loss arises from lack of analytical skills of investors and information asymmetry from online services. Investors fund borrowers in P2P lending platforms based on the information provided in the loan application. There is an automated service provided by P2P platforms that assists lenders to select the best options. This automated c The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020 ´ Rocha et al. (Eds.): WorldCIST 2020, AISC 1160, pp. 471–481, 2020. A. https://doi.org/10.1007/978-3-030-45691-7_44
472
A. Byanjankar et al.
service is used by most of the non-professional investors. Due to the use of automated service, there is very little opportunity for investors to screen the loan applications from the primary market for investments. P2P lending also has a secondary market, where investors can sell their loan holdings and this service is less automated compared to primary market allowing investors to analyse and select investments. Loan holders can split their investment to create several loan notes and put them in the secondary market for sell with either discount rates or premium depending on the present status of the loan [3]. Primary market being mostly controlled by automated service provides less information to study lenders’ behavior and hence secondary market could be a good place to analyse lenders’ investment behavior. Secondary market is more risky than primary market as loan holders put their holdings to sell when they see problem in loan recovery. Finally, most P2P platforms adopt an agent model that relies primarily on retaining and attracting a stable investor base to generate fee revenue. We believe our study could not only help them improve the selection of borrower characteristics and hence the investor-borrower matching, but also assist in the provision of liquidity of the secondary market. In this study, we analyze the investment behavior of lenders in P2P lending secondary market in understanding a loan note being successfully traded in the market. We study the relation between a loan note being successfully traded and the loan features to establish an understanding on lenders’ selection behavior of loan notes. Finally, with the help of classification models we verify our understanding of lenders’ selection behavior of loan notes from the secondary market.
2
Literature Review
In this section we will briefly summarize related literature. A search in academic databases shows that contributions considering secondary markets of peer-topeer lending are scarce. For this reason, we mainly focus on the general use of machine learning in P2P lending. 2.1
Machine Learning-Based Approaches in P2P Literature
Recent years have seen a continuously increasing number of academic contributions utilizing various machine learning algorithms in the context of P2P lending. This interest can largely be explained by the fact that researchers can have easy and direct access to loan data, which is typically not available from traditional financial institutions. This resulted in numerous publications focusing on different markets from around the globe, such as Bondora in Europe [4], Lending Club in the United States [2] and HouBank in China [5]). The most important issue modeled is the probability of loan defaults. There is a wide variety of machine learning algorithms that were utilized to construct optimal models. The most frequently used machine learning models used in P2P studies include logistic regression, random forests, gradient boosting machine and neural networks; these are the methods tested in the empirical part of this article.
P2P Secondary Market
473
As the typical baseline classification method in classification tasks in finance, logistic regression, is used in [6]. By analysing data from Lending Club, the authors identify using binary logistic regression that credit grade, debt-to-income ratio, FICO score and revolving line utilization are the most important factors that determine loan default. Another widely used machine learning model appearing in P2P literature is random forests. In, [7] a random forest based model is proposed to predict loan default and is found to outperform FICO credit scores. Gradient boosting machine (GBM) is based on the idea of combining several weak classifiers in an ensemble model. [8] combines extreme gradient boosting with misclassification cost and at the same time proposes to evaluate models based on annualized rate of return. Lastly, neural networks, probably the most widely used machine learning model across all domains, has been tested several times in P2P lending [4]. It is important to mention here, that additionally to traditional numerical data, some recent contributions make use of unstructured data, typically texts. In [9], topic modelling is utilized to extract features from loan descriptions with the most predictive power. Using data from the Eloan Chinese P2P platform, and by combining the extracted features with hard information in some traditional machine learning algorithms, the authors show that classification performance can increase by up to 5%. 2.2
P2P Secondary Markets
All the above mentioned articles concern the use of data from the primary market of P2P platforms. However, the option to resell (parts of) purchased loans to interested investors in a secondary market is present in various platforms; in the platform Bondora considered in the empirical analysis, secondary market became available already in 2013. Still, one can identify a very small number of contributions analysing data from secondary markets, although platforms, such as Bondora, typically make it also available. This lack of academic interest is especially staggering when we consider that according to an extensive report presented in [10] on the globally largest P2P market, already in 2017, more than 50% of the Chinese P2P platforms provided the possibility to perform secondary market transactions. Among the very few related contributions, one can highlight the study presented in [3] in which the authors study mispricing in the secondary market of Bondora analyzing more than 51000 loans. Utilizing least absolute shrinkage and selection operator (LASSO), the mispricing is explained as the consequence of mistaken perceived loan values. In another study [11], the author, analysing data from the LendingClub platform’s secondary market, collected data in a threemonth period, and found that the liquidity of the secondary market is very low, with less than 0.5% of listings resulting in a trade. The author concluded that it would be impossible to find the fair value for the pool of identical loan notes. In light of the importance of secondary markets in P2P lending and the small number of related contributions, in the following we address this gap in the literature in an empirical study on the secondary market of the Bondora platform.
474
3
A. Byanjankar et al.
Data and Exploratory Analysis
The data for the study was collected from a leading European P2P platform, Bondora. The data includes loan notes traded in the secondary market of the platform until July 2019. Bondora had launched the secondary market in March of 2013, four years after Bondora was established. The secondary market offers the opportunity to investors to resell outstanding loans from the primary market. Loan holders can split the outstanding principal on their loan holdings to create several loan notes and put them in the secondary market for sell with either discount rates or premium depending on the present status of the loan. In the analysis section a negative sign(-) is used to just differentiate discount rate from premium keeping the magnitude of the value intact. A loan listing is allowed to be placed in the secondary market for a maximum period of 30 days and if it is traded within the time limit it is given a status of ’Successful’ else it is removed automatically and given the status ‘Failed’. The data includes the information related to transactions of the loan notes, such as start and end date of the transaction, amount, discount or premium rate, number of days that the loan has been in debt, and the result indicating whether the loan note was sold or not. Each loan note is connected to its original loan id and based on this some relevant demographic and financial information on the loan note are extracted from the main loan database for additional features. After performing data preprocessing on the raw data, there are around 7.3 million records of loan notes. This large number (in contrast to the less than 100,000 loans) is the result of loan holders being able to split their holdings into multiple loan notes. The large number of loan notes provide investors with plenty of options to select for investments and also makes it more computationally challenging to apply sophisticated machine learning models requiring a large amount of computation. 3.1
Exploratory Analysis
In the following, we discuss the basic characteristics of the data. First, we can observe that the majority of the loan notes, 61% failed (not sold or cancelled before expiry) and 39% were successfully sold in the secondary market. The higher number of failed loan notes in the secondary market is expected as loan holders typically sell out loans on which they have difficulties in recovering the payments. Figure 1 illustrates the time in relative to the Loan Duration that has passed from the loan issue date to the time the loan was listed in the secondary market. From Fig. 1 we can see that majority of loans appear in secondary market after they have crossed 10% to 30% of their Loan Duration as usually borrowers tend to fail in their payments after few initial payments. In addition, investors in the secondary market are interested in purchasing loan notes which are in the early stage, as they behave more like new loans and hence have possibility of making higher profits through interest collections.The loan notes in the category ‘>100’ signifies that the loan notes have crossed their initial Loan Duration to a large extent.
P2P Secondary Market
475
1000000 750000
Result Failed
500000
Successful 250000 0 0 0−10 10−30 30−50 50−100 > 100 Days Passed Relative to Loan Duration(in %)
Fig. 1. Time since loan issue
In Fig. 2, the distribution of the number of days the loans were in debt at the time of listing and the discount rates are depicted. Majority of the loan notes have low debt days, between 0 and 10, and high probability of being sold out. It illustrates that loan holders in general have less patience and low risk tolerance and try to immediately resell the loans as soon as payments are late by a few days. Loan notes with more than 10 days in debt show very low probability to be sold on the secondary market, with the exception of the cases when higher discount rates are offered.
2.5e+06
2.0e+06
2.0e+06
1.5e+06
1.5e+06 1.0e+06 1.0e+06 5.0e+05 5.0e+05 0.0e+00
0.0e+00
0 −1
0
10
0 −5
−
0 10
50
0 10
−
0 25
0 25
−
0 50
0
0
0 50
−2
0 −2
Days in Debt
to
−1
0
−1
to
0 0
to
5 5
to
10 10
to
20
>
20
Discount Rate (in %) Result
Failed
Successful
Fig. 2. Debtdays and DiscountRate distribution
The effect of DiscountRate on the success can be seen in the right in Fig. 2. The bins in x-axis are arbitrarily created to better represent the distribution and the lower bound is excluded while the upper bound is included in the bins. Loan notes that were listed with DiscountRate between 10 and 0 constitute most of the notes, where loan notes with 0 DiscountRate is around 38% of the total loan notes and we can observe high average likelihood of success for these notes. However, loan notes with very high discount rates had low success rate as they might be perceived risky by investors. Similarly, we can observe the trend that the higher
476
A. Byanjankar et al.
the offered premium is, the less likely it is that the transaction will succeed. Figure 2 shows that the features DebtDays and DiscountRate tend to have high impact on a loan note being successful in the secondary market. Therefore, we further investigate these two features and also analyse their interaction. Based on the presented figures, high success rate can be observed together with low discount rates and number of days in debt. For this reason, first we look at the loan notes that correspond to these criteria. Figure 3 depicts the comparison of the effect of the two features on the success rate when they take on the value 0. The left part of the figure shows the case when each feature is considered individually without any interaction included. According to this, 0 DiscountRate or 0 DebtDays does not have significant impact on result: the proportion of failed and successful loan notes are almost equal in both the cases.
100%
100%
75%
75%
50%
50%
25%
25%
0%
0% DebtDays = 0
Discount = 0
Result
Failed
Both_0
Remaining Data
Successful
Fig. 3. Relation of debt days and discount with result
When we take the interaction into account, we can see a completely different result in the right part of Fig. 3. The loan notes that have both DiscountRate and DebtDays 0 are likely to be successful in the secondary market almost all the time with very few failed loan notes. Importantly, this group of loans account for approx. 23% of all loan notes. For the remaining data we see the opposite behavior: most loan notes not having 0 DebtDays and DiscountRate have failed in the secondary market. Further analysing the relation between DiscoutRate and DebtDays for the remaining data, a more refined picture on the relation between DiscoutRate and DebtDays for loan notes is presented in Fig. 4. The heat map in Fig. 4 shows the success rate of loan notes for different combination of DiscountRate and DebtDays where both DiscountRate and DebtDays are not 0. From Fig. 4, we can conclude that for the majority of combinations the success rate is close to 0. This shows a clear relation between DiscountRate, DebtDays and the success rate. As an exceptional group, loan notes with low DebtDays and higher DiscountRate have high success rate. The success rate decreases for high premium loans and is near to zero for loan notes listed at premium with high number of DebtDays. Similarly, success rate decreases as number of days in debt increases. As a summary, we can conclude that the investors
P2P Secondary Market
477
choose loan notes that have combination of lower DebtDays and higher DiscountRate and very clearly neglect loan notes with premium. The two features are likely to be very predictive of a loan note being successful in the secondary market.
(500,Inf] success_rate
DebtDaysAtStart
(250,500]
0.8 (100,250]
0.6 0.4
(50,100]
0.2 (10,50] [0,10] (−Inf,−20](−20,−10] (−10,0] (0,5] (5,10] DiscountRate
(10,20]
(20, Inf]
Fig. 4. Success rate at different levels
4
Classification Models and Results
According to the presented descriptive analysis, DebtDays and DiscountRate are very informative regarding the success of a loan note in the secondary market. Investors are very likely to make their investment selection decisions mostly based on these two features. To further validate our assumption, we trained several machine learning models to classify successful and failed loan notes and evaluate the importance of the features. Loan notes having both DebtDays and DiscountRate as 0 are almost always likely to be successful with a success rate of around 97% as seen in Fig. 3. With a reasonable amount of loan notes (around 23%) showing such behavior, it is safe to derive a simple rule that such loan notes will be successful in the secondary market. For this reason, we reduce our focus to the remaining data, where it is further split into train and test set for training and evaluating classification models to identify successful and failed loans.We treat being successful as the positive class. Multiple machine learning models, such as Logistic Regression with Lasso penalty (LR), Random Forest (RF), Gradient Boosting Model (GBM) and Neural Network (NN) were applied to train the classification models. The models were optimized through hyper-parameter search and applying feature selection with the feature importance from the model. AUC score (Area under the Receiver Operating Curve) was used as the primary evaluation metric. The results of the final models on the test data are presented in Table 2. The F1 score is reported as the maximum possible score and Accuracy is reported at the threshold for the maximum F1 score. Among the models, RF model performs the best with very high Accuracy, AUC and F1 score and low Logloss. The final set of features for the best performing model Random Forest can be seen in Table 1.
478
A. Byanjankar et al. Table 1. Features set for random forest Features
Description
DiscountRate
Discount or Premium offered
DebtDays
Number of days the loan was in debt
start day
Day of month the loan was listed in secondary market
start hour
Hour the loan was listed in secondary market
start month
Month the loan was listed in secondary market
start weekday
Weekday the loan was listed in secondary market
days passed
Days passed relative to original loan duration
Interest
Interest rate on loan
Probability of Default Probability of default within a year Rating
Rating assigned to the loan Table 2. Classification results with all features Models Accuracy AUC
Logloss F1
LR
0.745
0.800
0.446
RF
0.925
0.969 0.189
0.840
GBM
0.886
0.931
0.268
0.756
NN
0.894
0.940
0.252
0.774
0.571
The feature importance from all the models are illustrated in Fig. 5, that shows the top 8 features with the features’ name presented in their abbreviation form. All the models identify DebtDays(DDAS) and DiscountRate(DsCR) as the top two features, which matched our assumption from the earlier section. In addition, for the best performing model Random Forest, the importance of the two features is very high compared to rest of the features. Hence, the results show that the investors highly rely on the two features when deciding to select the loan notes from the secondary market. To further validate the importance and effect of the two features, we trained the classification model with only two of the features DebtDays and DiscountRate. The models with only these two features still achieved good results as seen in Table 3. The results indicate the strong predictive power of the two features. The partial dependence plots for the features DebtDays and DiscountRate in Fig. 6 show the mean effect of the features along with the variance. For DebtDays, mean success rate rapidly decreases as number of DebtDays goes above 0 and remains almost constant after about 100 days. This shows that most of the investments are made on loan notes with low DebtDays; for higher DebtDays there seems to be no investment pattern as shown by the constant and low success rate. For the DiscountRate, the mean success rate is higher at higher DiscountRate but decreases with the DiscountRate and is almost the same from −25 to 0. For the loan notes with premium the success rates decrease rapidly
P2P Secondary Market GBM
479
LR
DscR DDAS Intr strt_d strt_m dys strt_h Rtng
DDAS DscR s_.10 VTbp s_.6 UOL. VT.U Rt.F
NN
RF
DDAS DscR Intr dys NCC.T s_.4 st_.1 NCC.F
DscR DDAS strt_d strt_h strt_m Intr dys strt_w 0.00
0.25
0.50
0.75
1.00
0.00
0.25
0.50
0.75
1.00
Scaled Importance
Fig. 5. Feature importance Table 3. Classification results with two features Models Accuracy AUC
Logloss F1
LR
0.463
0.788
0.797
0.60
RF
0.831
0.866 0.362
0.645
GBM
0.827
0.859
0.370
0.638
NN
0.833
0.848
0.386
0.631
with higher premium. The high variance seen for the cases with higher DiscountRate hints that higher discount rates may alone not be the deciding factor, but is the combination of both DiscountRate and DebtDays.
5
Conclusion
An increasingly established way for P2P platforms to provide increased means of liquidity to investors is a secondary market. While academic research is widely available to understanding P2P platform processes, evaluation of platforms and models to estimate loan default to assist investors, research contributions focusing on secondary markets are scarce. In this article, we consider the problem of predicting the success of a posting in a P2P secondary market. Several widely used machine learning models are tested with data used from the leading European P2P platform Bondora. Based on evaluating the performance of models on more than 7 million postings, we found that most of the algorithms, except for logistic regression, perform very similarly, with AU C value as high as 0.925 is achievable. Furthermore, we found that success of a posting is largely determined by two specific variables: (a) the number of days since the loan has been in debt at the beginning of the posting and (b) the discount rate. As it is shown in the article, using only these two variables, one can construct models with high performance. In this work, we have done one of the first steps in trying to understand secondary markets of P2P platforms by utilizing machine learning algorithms. In the future, the study can be extended by incorporating other variables on
A. Byanjankar et al.
mean_response
480
0.75 0.50 0.25 0.00
mean_response
0
150
300
450
600
750
900 1050 1200 1350 1500 1650 1800 1950 DebtDaysAtStart
0.75 0.50 0.25 0.00 −25
−20
−15
−10
−5 0 DiscountRate
5
10
15
20
Fig. 6. Partial dependence plots
different loan characteristics that may impact the success of a transaction. Furthermore, in lack of identifier for investors in our dataset, it is not possible to assess any kind of network effect potentially present in the platform; this could be analysed with appropriate data available. Finally, our results reflect the data from a European P2P platform, we cannot claim that similar models would definitely show same performance when used to data from different areas, such as China, the largest P2P market; this would require further investigation.
References 1. Bachmann, A., Becker, A., Buerckner, D., Hilker, M., Kock, F., Lehmann, M., Tiburtius, P., Funk, B.: Online peer-to-peer lending-a literature review. J. Internet Bank. Commer. 16(2), 1 (2011) 2. Kumar, V., Natarajan, S., Keerthana, S., Chinmayi, KM., Lakshmi, N.: Credit risk analysis in peer-to-peer lending system. In: 2016 IEEE International Conference on Knowledge Engineering and Applications (ICKEA), pp. 193-196 (2016) 3. Caglayan, M., Pham, T., Talavera, O., Xiong, X.: Asset mispricing in loan secondary market. Technical Report Discussion Papers 19-07. Department of Economics, University of Birmingham (2019) 4. Byanjankar, A., Heikkil¨ a, M., Mezei, J.: Predicting credit risk in peer-to-peer lending: a neural network approach. In: 2015 IEEE Symposium Series on Computational Intelligence, pp. 719-725 (2015) 5. Guo, W.: Credit scoring in peer-to-peer lending with macro variables and machine learning as feature selection methods. In: 2019 Americas Conference on Information Systems(2019) 6. Emekter, R., Tu, Y., Jirasakuldech, B., Lu, M.: Evaluating credit risk and loan performance in online Peer-to-Peer (P2P) lending. Appl. Econ. 47(1), 54–70 (2015) 7. Malekipirbazari, M., Aksakalli, V.: Risk assessment in social lending via random forests. Expert Syst. Appl. 42(10), 4621–4631 (2015) 8. Xia, Y., Liu, C., Liu, N.: Cost-sensitive boosted tree for loan evaluation in peerto-peer lending. Electron. Commer. Res. Appl. 24, 30–49 (2017)
P2P Secondary Market
481
9. Jiang, C., Wang, Z., Wang, R., Ding, Y.: Loan default prediction by combining soft information extracted from descriptive text in online peer-to-peer lending. Ann. Oper. Res. 266(12), 511–529 (2018) 10. Yin, H.: P2P lending industry in China. Int. J. Ind. Bus. Manage. 1(4), 0001–0013 (2017) 11. Harvey, S.: Lending Club’s Note Trading Platform Facade: An Examination of Peer-to-Peer (P2P) Lending Secondary Market Inefficiency. University of Dayton Honors Thesis (2018)
Experience Analysis Through an Event Based Model Using Mereotopological Relations: From Video to Hypergraph Giles Beaudon1(&), Eddie Soulier1, and Anne Gayet2 1
Tech-CICO Team, University of Technology of Troyes, Troyes, France {gilles.beaudon,eddie.soulier}@utt.fr 2 AI & Data, Paris, France [email protected]
Abstract. Improving the customer experience is now strategic for insurance business. Current practices focus on subjective customer experience. In this paper, we claim that experience could be defined as a situation being processed. Thus, we propose an artifact for observation of experiences through an event based model. From video corpus, this model calculates mereotopological relations to identify “drops of experiences” as a hypergraph. Designed as a tool for marketing teams, this artifact aims to help them identify relevant customer experiences. Keywords: Experience Data science Mereotopology Hypergraph
Event Computer vision
1 Introduction: Beyond Abstract Experiences This paper follows our previous works [1] relying on the case of a mutual healthinsurer, the third one in France. The French health-insurance market is completely transforming. The number of mutual health-insurers dropped from 1158 in 2006 to 421 in 2017 [2]. Many factors are involved in this process. Regulatory constraints upset market’s rules. More and more aggressive competitors enter in this market. Lastly, the need for customer personalization grows up. Thus, improving customer experience is becoming both a necessity and a strategy for mutual health-insurers. 1.1
Problem Statement
Customer experience concept comes from the “experience economy” [3]. More specifically [4] defines customer experience management as “the process of strategically managing customer’s entire experience with a product or a company.” A strategy to improve the customer experience relies on customer experience softwares. They are strongly dependent on the experience concept selected. There are two types of customer experience conceptions. The first one considers the customer experience as a mental representation in which tools [5] present the experience as an abstract phenomenon (with persona, blueprints, routes…). The second sees © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020 Á. Rocha et al. (Eds.): WorldCIST 2020, AISC 1160, pp. 482–492, 2020. https://doi.org/10.1007/978-3-030-45691-7_45
Experience Analysis Through an Event Based Model
483
the customer experience as a process that occurs, as a production. We will call it “Experience n°2” and it is part of pragmatic theory [6]. This approach has not provided a formal and above all a computational model. We propose to use Whitehead’s model developed in Process and Reality in 1929 [7]. He considered experience as a phenomenon of connections and inclusions of heterogeneous events. In parallel, we use mereotopology. Meretotopology is a theory of connected (boundaries) parthood (wholes, parts, parts of parts). Thus, it is possible to simulate experiences through a mereotopology calculation based on an event model. The constraint imposed by Whitehead’s theory is that every experience is unique and should not be understood as an abstract phenomenon, including mental representation [8]. Adopting “Experience n°2” model implies to discover and analyze the in-situ experience more than create abstract one. It means to observe occurring events in real situations. Thus, we propose to capture experience from videos. We will use this massive data as traces of events. Mereotopological techniques coupled with hypergraphs will reveal relations between events and simulate experiences. Consequently, the problem raised by this article is to design a tool for marketing teams to analyze customer experiences based on an event model emerging from grounded situations. In the first section, we will summarize current customer experience issues. In the second section, we will position our work in relation to customer experience management. Third section will describe how to define the event concept, which is crucial to observe organic experiences. Organic experience is the technical term to characterize experiences from the “Experience n°2”. Our fourth section will discuss our Experience Observer – a software to analyze customer experiences using videos corpus, processed by object tracking technique and a mereotopological calculation to form a complex graph – and its first results. Finally, this article will conclude with contributions and work limits. 1.2
Related Works
The related works on Customer Experience are extremely extended. We choose to focus this section on our problem statement. Our work pays attention to improve customer experience understanding with occurring events in real situations. To the best of our knowledge, this problem is not covered by current research. In the field of improving the customer experience through deep learning models, one seeks to understand the customers’ satisfaction, predict their next actions or recommend a product to them. Customer satisfaction analysis uses either NLP techniques from text data [9, 10] or facial recognition from video surveillance [11]. Recommendation systems are based on image-based learning models [12]. Prediction systems use neural networks with demographic data, billing information and usage [13]. It is never a question of observing the situation as it is happening. Next, we choose videos corpus to capture the experience. We process them with deep learning techniques (object tracking) in the field of computer vision. As result, we obtain entities, which are the events’ regions. There is many deep learning techniques in the field of computer vision. They are dependent on the objective of the calculation. We studied object detection [14], spatio-temporal video segmentation [15] and
484
G. Beaudon et al.
interaction detection [16]. We chose object tracking techniques [17, 18] because of entities detected are spatio-temporal regions. Finally, we defined experience as an in-situ connections and inclusions phenomenon; this leads us to the mereotopology. Mereotopology is a theory of connected parthood. [19] developed a formal mereotopological ontology. In the field of video, [20] works on space and movement. Combining these works, we could calculate mereotopological relations between spatio-temporal regions in videos. The connectivity of mereotopological components enables to understand the experience. Hypergraph techniques development [21] offers a calculation of the connectivity of connected regions. This approach is compatible with graph theory that is widely use to analyze customer experience: in recommendation systems [22], knowledge of customer opinions [23] or transport network experience [24].
2 Current Customer Experience Issues We conducted both a literature review and a study between November 2017 and February 2018 through fifteen exploratory interviews within the mutual health-insurer. We noted that no tool exists for the upstream activities of service design (such as observation). Many works focus on the design of the service and its usages. They seek to segment customers according to their personal and consumption data [13]. In each segment, they define a typical persona. Persona is “a semi-fictional representation of your ideal customer based on market research and real data about your existing customers” [25]. Next, they use that persona to describe its ideal journey throughout company’s touchpoints. [5] said “Customer Journey is the enterprise prescribed vision of the customer trajectory within a chronologically given touchpoint organization”. Finally, they attempt to measure customer engagement through their opinions and attitudes regarding the firm. In regards to the “Experience n°2”, these practices present three major problems. (1) Reality is restricted to human perception. Experiences are abstract constructs depending on the qualities that human gave to them (like persona or feeling). Our conviction is that experiences are unique interactions. They are not preconceived ideas. (2) Practices are established after the experiences occurred. They collect customer behavior, satisfaction and opinion after the interactions and without their contexts. Precisely, it concerns tools that reconstruct customer journeys. Beyond that, we want to observe the process of experience constitution through its connections. (3) Finally, in these approaches objects do not have relations. The customer journeys are then designed as a sequence of unitary interactions. While, in our understanding, interactions (i.e. events) are extended. Each interaction has a “here” and a “there”. If we want to grasp an interaction, we need to observe it relatively to all other interactions. These findings led us to ground our software within alternative theories of experience [7].
Experience Analysis Through an Event Based Model
485
3 Experience as Connected Events We chose to address existing issues of customer experience management with the organic experience theory [7]. In the organic experience paradigm, events are atomic by nature and are processes of development. A. N. Whithead give them the name of “actual occasion” [7] and W. James the term “drop of experience” [6]. We strictly use the term event to refer the both terms. All the “inert” objects of classical approaches are then redesigned to be the result of a process. It is the process of an event constitution. An event constitution correspond to the connection of the event with multiple heterogeneous other events. Thus, events are regions. These regions are connected and included in each other. They are spatio-temporal regions. Analyze experiences within “Experience n°2”, needs to observe the processes of events’ constitutions (connections and inclusions). It offers problem-solving opportunities in customer experience management. Firstly, we do not restrict experiences to human perception but to what happens in interactions. Secondly, we do not use experience as a given material but as an in-situ process. Thirdly, we consider all the connections between heterogeneous entities as constitutive of an event. Our objective is then to propose an IT artifact that complies both the constraints of organic experience (“Experience n°2”) and Design Science Methodology [26]. Designed as a tool for marketing teams, this artifact aims to help them identify relevant customer experiences. To achieve this result, we will rely on the observation of experiences emerging from events as defined above. The calculation of the emergence of events leaning on the connections of regions (i.e. events) in real situations. We name this artifact the Experience Observer.
4 The Experience Observer 4.1
Input Data and Computer Vision Techniques
From a video corpus, we could observe occurring events in real situations because it captures in-situ interactions. In this paper, we chose videos of spontaneous interactions in an urban context. We made that videos corpus during an observation protocol of a public square, in Paris, in August 2019. We acquire the spatio-temporal regions by the use of the object tracking techniques of [27]. We used Google’s object tracking techniques because of the very large number (about 30 000) of detected entities [27]. In addition, as an API tool, it is possible to use it without having to train the algorithm. The entities obtained are space-time regions. In the video, they are delimited, in each frame, by bounding boxes. All the entities’ bounding boxes represent the space of the region. On the other hand, an entity has a set of detection moment that represents the regions’ durations. This leads to spatio-temporal regions, shown in red in Fig. 1. 4.2
Mereotopology Calculation
As defined is “Experience n°2”, an event is a spatio-temporal region. [19], following [28] and [29] created a formal ontology to calculate spatio-temporal connections
486
G. Beaudon et al.
Fig. 1. Regions obtain by the means of object tracking techniques
relying on Whitehead’s theory of the organic experience (“Experience n°2”). This formal ontology is a mereotopology theory. Mereotopology is “a unified framework based on a single mereotopological primitive of connected parthood” [19]. To calculate connections between regions, one of the most recognized mereotopological ontologies is RCC8 (Region Connection Calculus) [30]. “RCC8 is a set of eight binary relations representing mereotopological relationships between (ordered) pairs of individuals” [30]. Thus, because our events are spatio-temporal regions, we can use RCC8 to calculate the connections between events (Fig. 2).
Fig. 2. The RCC8 relations from [30].
We propose to calculate mereotopological relation between events, which are regions, to reveal connected drops of experience. These are, in Whitehead vocabulary, Nexus. 4.3
The Graph of Connected Drops of Experiences (Graph of Nexus)
The result of the mereotopological calculation on spatio-temporal regions (i.e. events) is a graph of connections. Following the philosophy of organic experience, this graph is both, the source of events constitutions and constituted of events. At our level, we claim that we can detect customer experience based on this graph. It contains all the connected drops experiences (events), which, when they are grouped, form a Nexus. A nexus is a term used by Whitehead [7]. He defines a Nexus as an agglomeration of events. Thus, we can say that this graph contains all the pieces of the customer experience (nexus) necessary to analyze the customer experience. We conducted a test on a 104-second video from our corpus. The object-tracking algorithm returns 371
Experience Analysis Through an Event Based Model
487
entities and the following Table 1 summarizes the percentage of each type of entity detected. Table 1. Percentage by type of entity detected. Entity type Number Percentage Person 315 84,91% Bus 2 0,54% Building 11 2,96% Car 14 3,77% Dress 4 1,08% Jeans 8 2,16% Bicycle wheel 4 1,08% Bicycle 2 0,54% Jacket 3 0,81% Shorts 4 1,08% Footwear 2 0,54% Skateboard 1 0,27% Skirt 1 0,27%
We note that the algorithm detects several times an entity that is really only present once in the scene. This is due to our protocol. We use 2D videos. If one entity overlaps another, the algorithm loses track of the first one. For example, there are not 11 buildings in the video (let alone 315 people). When an entity passes in front of the building, it “disappears” and then “reappears” after its passage. Then, the algorithm detects the same building twice. We choose to keep the raw data because these redundancies correspond to “events”: entities passing over each other. It reveals events’ extensions. To calculate the mereotopological relations between entities we had to restrict the RCC8 ontology [30]. We done that because of our data. They are not sufficient to calculate the following relations: TPP, TPPi, NTPP and NTPPi. Once more, this is due to the 2D-videos format. It needs a depth dimension to define whether an entity is a proper part (or not) of another entity. We made the decision to group those types of relation in one. We named P for “x is Part of y”. P groups the relations: TPP, TPPi, NTPP and NTPPi. Thus, we obtain 14 208 connections distributed as follows: 52.60% of connections are ‘disconnected’ type (DC), 0.83% are ‘externally connected’ type (EC), 39.02% are ‘part of’ type (P) and 7.55% are ‘partially overlap’ type (PO). We use the graph tool Gephi [31] to visualize the graph of connections in Fruchterman Reingold [32] with non-directed connections. We obtain cliques of entities according to their connections in each moments. A clique must be understood as “pieces of instantaneous scenes” (i.e. connected regions). The result was not readable so we chose to use graph-clustering techniques. We chose the community detection technique to discover groups of connected events. We applied the Louvain modularity technique [33] without modulate the number of clusters on connections
488
G. Beaudon et al.
between bounding boxes. By results, we obtain agglomeration of events (or pieces of customer experiences), which are made up of connected events for a more or less significant duration (Fig. 3).
Fig. 3. Connected events visualization after the use of graph-clustering techniques
Based on the 9 246 bouding boxes of the 326 entities detected, we obtain 1 979 clusters. 81% of bouding boxes are clustered versus 19% are unclassified. The average density of a cluster is about 3.77 entities. That means that our clusters could be pieces of customer experiences, the Nexus of “Experience n°2” (i.e. connected regions).These clusters are sets of tuple[region, connection, region]. Here, the term region has to be taken as an event. As a result, all clusters form a “hypergraph” [21], and each cluster is a hyperedges. According to [21] a hypergraph is a hypernework. “Hypernetworks are a natural multidimensional generalisation of networks, representing n-ary relations by simplices with n vertices”. There are several techniques for manipulating hypergraphs, including the theory of simplicial complex structures as shown by [21]. Thus, we use it to calculate the connectivity between Nexus (i.e. between connected events, our clusters) and try to reconstruct experiences trajectories. These trajectories are experiences in the classical sense; marketing teams manipulate those. Finally, the analysts who will use the Experience Observer will be able to explore, observe and analyze these trajectories and their constitution in order to determine if they are significant interactions or not of the observed situation. 4.4
4.4 The Example of Two Persons Playing with a Ball
In our video, the algorithm identifies a large number of persons. After the mereotopological calculation and the graph clustering, we detect multiple clusters (or Nexus) connected over multiple moments. They are composed of two persons, who remain present in the video, connected through a ball. These are persons passing a ball to each other. The players represent events (spatio-temporal regions), which connect to each other (Nexus) to form pieces of experience. The connectivity of these pieces of experience represents “the game of balls” (figure a in Fig. 4).
Fig. 4. Visualization of connectivity between drops of experiences
At one point, the players’ connectivity is “broken” by two passing women (themselves forming a connected region). Thus, in our hypergraph there is a rupture that can be detected (figure b in Fig. 4). Finally, a detected person detaches from a connected region (a group of people sitting down) to join the two players. The connected region of the original players is then “extended” by a new connectivity (figure c in Fig. 4). From the connectivity analysis of the pieces of experience (i.e. connected regions or Nexus), we are able to observe emerging experiences. This demonstrates the relevance of analyzing the customer experience with hypergraph techniques, the hypergraph resulting from the calculation of the mereotopological connections between events.
5 Conclusion

5.1 Limits and Future Developments
The current study presents both limitations and perspectives. First, we use Google’s object tracking algorithm, but we face constraints due to its performance in terms of the entities detected and the data characterizing them. Thus, we will test other computer vision techniques in order to determine the most efficient one. Secondly, we work with 2D videos in a fixed shot. This brings many constraints on the mereotopological calculation. We should work with 3D videos and, ideally, four-dimensional ‘regions’ to improve the relevance of the calculation. To do this, we need to set up a video collection protocol on several axes with additional data coming from connected objects. Finally, the calculation of the mereotopological connections takes as input all the entities, as if they belonged to the same spatio-temporal interval. We therefore plan to
create a grid pattern of the video (in two dimensions at first) to calculate the connections between entities that are co-located within a zone of the grid.
5.2 Contributions
This research, applied to mutual health insurance, emphasized the limits of current customer experience models, which are based only on cognitive and non-situated views. We showed that it is possible to complement marketing analyst practices with alternative theories of experience. We used the theory of “organic experience” developed by Whitehead for these purposes. Our last contribution in this article was to propose a complementary artifact for customer experience analysis. On the one hand, this artifact complies with the ontological constraints imposed by the theory of organic experience, and on the other hand, it is a complementary tool for the practices of experts in the field of customer experience management. As can be noticed, the organic theory of experience, used in conjunction with mereotopology and a massive video corpus of spontaneous interactions, offers great potential for innovation in the field of customer experience management.
References 1. Beaudon, G., Soulier, E.: Customer experience analytics in insurance: trajectory, service interaction and contextual data. In: Information Technology and Systems, vol. 918, pp. 187– 198 (2019) 2. Perrin, G.: Mutuelles: La concentration du secteur se poursuit, Argus de l’assurance (2017) 3. Pine II, B.J., Gilmore, J.H.: Welcome to the experience economy. Harv. Bus. Rev. 76, 97– 105 (1998) 4. Schmitt, B.H.: Customer Experience Management. A Revolutionary Approach to Connecting with Your Customers. Wiley, New Jersey (2003) 5. Moschetti-Jacob, F., Création d’un artefact modulaire d’aide à la conception de parcours client cross-canal visant à développer les capacités des managers des entreprises du secteur du commerce, Thesis, Paris-Dauphine (2016) 6. James, W.: Philosophie de l’Expérience, Trad. by E. Brun et M. Paris. Original Title: A Pluralistic Universe. Paris: Ernest Flammarion Éditeur, 368 p. Collection: Bibliothèque de philosophie scientifique dirigée par Gustave Le Bon (1910) 7. Whitehead, A.N.: Process and Reality. An Essay in Cosmology (Gifford Lectures, University of Edinburgh). Cambridge University Press, New York (1929). Macmillan, New York 8. Plutchik R., Kellerman H.: Emotion: Theory, Research, and Experience: Biological Foundations of Emotions (1986) 9. Ramaswamy, S., DeClerck, N.: Customer perception analysis using deep learning and NLP. In: Procedia Computer Science, vol. 140, pp. 170–178 (2018) 10. Suresh, S., Guru Rajan, T.S., Gopinath, V.: VoC-DL: revisiting voice of customer using deep learning. In: Innovative Applications of Artificial Intelligence (IAAI-18) (2018) 11. Sugianto, N., Tjondronegoro, D., Tydd, B.: Deep residual learning for analyzing customer satisfaction using video surveillance. In: 2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS) (2018)
12. Guan, C., Qin, S., Long, Y.: Apparel-based deep learning system design for apparel style recommendation. Int. J. Cloth. Sci. Technol. 31(3), 376–389 (2019) 13. Khan, Y., Shafiq, S., Naeem, A., Ahmed, S., Safwan, N., Hussain, S.: Customers churn prediction Using Artificial Neural Networks (ANN) in Telecom Industry. IJACSA 10(9), 132–142 (2019) 14. Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You Only Look Once : Unified, Real-Time Object Detection. CoRR, abs/1506.02640 (2015) 15. Tokmakov, P., Schmid, C., Alahari, K.: Learning to segment moving objects. Int. J. Comput. Vis. 127(3), 282–301 (2019) 16. Wang, H., Pirk, S., Yumer, E., Kim, V., Sener, O., Sridhar, S., Guibas, L.: Learning a generative model for multi-step human-object interactions from videos. Comput. Graph. Forum 38, 367–378 (2019) 17. Arsic, D., Hofmann, M., Schuller, B., Schuller, B., Rigoll, G.: Multi-camera person tracking and left luggage detection applying homographic transformation, In: Proceedings 10th IEEE International Workshop on Performance Evaluation of Tracking and Surveillance, PETS 2007, in Association with ICCV 2007, pp. 55–62 (2007) 18. Weinzaepfel, P.: Motion in Action: Optical Flow Estimation and Action Localization in Videos. Computer Vision and Pattern Recognition. Université Grenoble Alpes (2016) 19. Varzi, A.: On the boundary between mereology and topology. In: Proceedings of the 16th International Wittgenstein Symposium, pp. 423–442, Vienna (1994) 20. Suchan, J.: Declarative Reasoning about Space and Motion with Video. KI - Künstliche Intelligenz 31(4), 321–330 (2017) 21. Johnson, J.: Hypernetworks of complex systems. In: Social-Informatics and Telecommunications Engineering. Lecture Notes of the Institute for Computer Sciences, vol. 4, pp. 364– 375 (2009) 22. Munoz-Arcentales, A., Montoya, A., Chalen, M., Velásquez, W.: Improve customer experience based on recommendation and detection of a pattern change in eating habits. In: IEEE 8th Annual Computing and Communication Workshop and Conference (CCWC), Las Vegas, NV, pp. 221–225 (2018) 23. Saru, B., Ketki, M.: A new approach towards co-extracting opinion-targets and opinion words from online reviews. In: 2017 3rd International Conference on Computational Intelligence & Communication Technology (CICT), Ghaziabad, pp. 1–4 (2017) 24. Yang, C., Zhang, C., Chen, X., Ye, J., Han, J.: Did you enjoy the ride? Understanding passenger experience via heterogeneous network embedding. In: IEEE 34th International Conference on Data Engineering (ICDE), Paris, pp. 1392–1403 (2018) 25. Kusinitz, S.: The Definition of a Buyer Persona [in Under 100 Words], Hubspot (2014) 26. Hevner, A.R., March, S.T., Park, J., Ram, S.: Design science in information systems research. MIS Q. 28(1), 75–105 (2004) 27. Google Video Intelligence. https://cloud.google.com/video-intelligence/docs/?_ga=2. 171356386.-37744516.1574100239. Accessed 20 Dec 2019 28. Clarke, B.L.: A calculus of individuals based on “connection”. Notre Dame J. Formal Logic 22(3), 204–218 (1981) 29. Smith, B.: Ontology and the logistic analysis of reality. In: Proceedings of the International Workshop on Formal Ontology in Conceptual Analysis and Knowledge Representation, pp. 51–68 (1993) 30. Grüninger, M., Aameri, B: A New Perspective on the Mereotopology of RCC8, 13 p (2017)
31. Gephi. https://gephi.org/. Accessed 20 Dec 2019 32. Fruchterman, T.M.J., Reingold, E.M.: Graph drawing by force-directed placement. Softw. Pract. Exp. 21(11), 1129–1164 (1991) 33. Blondel, V.D., Guillaume, J.-L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. J. Stat. Mech: Theory Exp. 10, P10008 (2008). (12p)
Level Identification in Coupled Tanks Using Extreme Learning Machine
Alanio Ferreira de Lima1, Gabriel F. Machado2, Darielson A. Souza2(&), Francisco H. V. da Silva3, Josias G. Batista2,4, José N. N. Júnior2, and Deivid M. de Freitas2
1 Federal University of Ceará-UFC, Campus Sobral, Sobral, CE, Brazil [email protected]
2 Federal University of Ceará-UFC, Campus Pici, Fortaleza, CE, Brazil {gabrielfreitas,juniornogueira}@alu.ufc.br, {darielson,josiasgb}@dee.ufc.br, [email protected]
3 National Industrial Learning Service - SENAI, Fortaleza, CE, Brazil [email protected]
4 Federal Institute of Education, Science and Technology of Ceará-IFCE, Fortaleza, CE, Brazil
Abstract. This paper presents a study on the use of intelligent algorithms for the identification of nonlinear plant systems. The method applied is an Artificial Neural Network (ANN) called Extreme Learning Machine (ELM), chosen for this work because of its simplicity and high computational power. The nonlinear plant used is a bench of two coupled tanks. Several ELM ANN architectures have been tested. The architectures are all compared to each other using the adjusted R² metric, which faithfully evaluates the model fit while taking into account the number of neurons used in each ELM ANN architecture.
Keywords: Systems identification · Coupled tanks · Artificial Neural Network · Extreme Learning Machine
1 Introduction

The emerging use of identification and control methods is growing as industrial problems arise. Sometimes, when a complex control strategy based on process models is required, good identification results are crucial, even when noise and the lack of data are obstacles in practical applications. According to [1], systems are often treated as linear during identification and control even if they are nonlinear, using their operating points for linearization. This methodology is simple and common in industry, but for processes with complex dynamics its results need to be improved. Therefore, many works investigate computational intelligence applied to identification and control techniques, such as fuzzy systems, neural systems and evolutionary computation [2]. New technologies associated with
algorithm techniques drastically affect the production results of an industry. A set of possibilities in the subject of identification and intelligent control has been studied and developed as computational science, technological complexity and communication technologies have become ever more efficient. The work of [3] presents a proposal to identify a robotic manipulator using computational intelligence techniques; the novelty is a hybrid system combining an ANN with PSO. In an industry, efficiency in production lines leads to a competitive operation with other enterprises. Recently, in [4], an intelligent fuzzy control algorithm was studied to solve problems in the identification and path control of traditional handling robots. Comparisons between the classical path control method and the fuzzy algorithm show that the latter, combining fuzzy control and path deviation corrections, is more practical. In [5], a recent application in a solar plant shows that a proposed method using an adaptive intelligent tracker system can reach more efficient energy collection even in cloudy weather when compared to a biaxial solar tracker. This result implies great applicability in industrial installations. Other papers present intelligent control and identification techniques for industrial processes, such as [6], in which a thermal process is controlled by auto-tuning fuzzy logic based on soft computing. This method provided good performance and robustness in the temperature control of an electric oven. In [7], a neural controller replaces conventional PID controllers and decouplers to control the level of a single tank, a SISO process, and of interacting tanks, a MIMO process, minimizing the settling time and reaching better performances. In this context, the use of intelligent computational techniques to explore systems identification applications is quite relevant. In this paper we address the level identification of coupled tanks using Artificial Neural Networks (ANN) with Extreme Learning Machine (ELM) training. Several hidden-layer architectures will be tested and their performance will be evaluated by the adjusted R². The organization of the article is as follows. In Sect. 2, the description of the experimental bench is presented. Section 3 presents the methodology employed. In Sect. 4, results and simulations are discussed and in Sect. 5, the conclusions of this study are presented.
2 System Description

The bench set of coupled tanks was developed to perform several tests in level, pressure, flow and temperature control. The bench is provided with a programmable logic controller (PLC) to implement the PI and PID controllers. In addition, it has a drive panel equipped with a human-machine interface (HMI) for monitoring and adjustment of the control loops. Figure 1 shows the coupled tanks system that will be used for identification. The equipment present on the bench is: two tanks with measuring scales of 15 L; a 0.5 HP centrifugal pump responsible for the displacement of water from the lower tank to the upper tank; a frequency converter for driving the pump's induction motor (MIT); a 0 to 10 V analog ultrasonic sensor in the upper tank for level control; ON/OFF switches on the top
Fig. 1. Coupled tanks system bench
tank for level ON/OFF control; an ON/OFF solenoid valve for the water to flow from the upper to the lower tank. The input is a voltage signal and the output is the liquid level. The main feature of a coupled tanks system is that one tank can significantly influence the other, and vice versa. The nonlinear dynamics is one of the challenges in the area of identification and control for this type of system.
3 Methodology

The main objective is to identify the coupled tanks dynamic model considering an open-loop step input. Once the output data was collected, the research followed the concomitant and sequential steps described visually in Fig. 2.
Fig. 2. Steps applied in the methodology
3.1 Data Acquisition
The data is collected with a step of 10 L as the input and the reaction curve in liters as the output. Figure 3 shows the open-loop system response for a 50 s experiment.
Fig. 3. Level reaction curve.
The open-loop system behaves very similarly to the water inflow and reaches the reference in approximately 25 s.
3.2 ELM ANN
For the identification algorithm, GNU Octave® version 5.1.0 was used. Octave is free software licensed under the GNU General Public License (GPL) and uses a high-level language. The neural network chosen is the Extreme Learning Machine (ELM). ELM is a neural network with advanced training features that can be used for classification, regression, filtering and other applications. According to [8], the ELM network weights are usually updated in a single step, similarly to a linear model, and this is indeed one of its advantages over other ANNs. Figure 4 shows an example of the architecture of such a neural network. By default, ELM has only one hidden layer.
Fig. 4. Architecture of ELM ANN.
The weights w are randomly initialized as in a standard neural network, and the initial weight range [−1, 1] was used. The next step is to determine the matrix H of the hidden layer, as can be seen in Eq. (1):

$$
H^{i} = \begin{bmatrix} x^{i}_{1} & \ldots & x^{i}_{m} & 1 \end{bmatrix}
\begin{bmatrix} w_{11} & \cdots & w_{1d} \\ \vdots & \ddots & \vdots \\ b_{1} & \cdots & b_{d} \end{bmatrix}
\;\rightarrow\;
H = \begin{bmatrix} f(H^{1}) \\ \vdots \\ f(H^{N}) \end{bmatrix}
\qquad (1)
$$

The ELM training is very illustrative; it is shown in Algorithm 1.
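Algorithm 1 is a figure in the original paper and is not reproduced in this text. As an illustration only, the following Python sketch implements the ELM training just described (random input weights, sigmoid hidden layer, output weights by least squares, as in Eq. (1) and [8]); it is not the authors' Octave code, and the function and variable names are assumptions.

```python
# Minimal ELM sketch (illustrative, not the authors' Octave implementation).
import numpy as np

def elm_train(X, y, n_hidden=15, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # input weights in [-1, 1]
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                 # hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))                    # hidden-layer matrix H, Eq. (1)
    beta = np.linalg.pinv(H) @ y                              # output weights by least squares
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

# Hypothetical usage: X holds input samples (e.g. past voltage/level values),
# y the measured level.
# W, b, beta = elm_train(X, y, n_hidden=15)
# y_hat = elm_predict(X, W, b, beta)
```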
4 Results

With the implementation of several ELM network architectures, an evaluation of the best model is made according to the adjusted R². To avoid difficulties in the interpretation of R², the adjusted R², given by Eq. (2), is used:
$$
R^{2}_{a} = 1 - \left(1 - R^{2}_{p}\right)\frac{n-1}{n-(p+1)}
\qquad (2)
$$
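The small helper below illustrates Eq. (2); here n is the number of samples and p the number of model parameters (interpreting p as the count of fitted parameters is our assumption), and the numbers in the example are made up.

```python
# Illustrative helper for Eq. (2).
def adjusted_r2(r2, n, p):
    return 1.0 - (1.0 - r2) * (n - 1) / (n - (p + 1))

# e.g. a classic R² of 0.960 over 500 samples with a 15-parameter model:
print(round(adjusted_r2(0.960, n=500, p=15), 4))
```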
The value of the adjusted R² is strongly dependent on the model parameters. For the model to be good, a balance must be struck between the classic R² and the number of parameters. After 10 rounds of executions of each architecture, the average and best adjusted R² were observed, as shown in Table 1.

Table 1. All identification results evaluated by the adjusted R²

Architecture        Average (adjusted R²)   Max (adjusted R²)
ELM (5 neurons)     90.21                   93.12
ELM (10 neurons)    91.03                   93.56
ELM (15 neurons)    92.55                   95.54
ELM (20 neurons)    92.45                   95.50
ELM (25 neurons)    92.41                   95.55
ELM (30 neurons)    92.39                   95.48
According to the results in Table 1, the ELM with 15 neurons obtained the best average adjusted R², 92.55; the best single result among the 10 executions was obtained by the architecture with 25 neurons, 95.55. Figure 5 presents the best identification result with 15 neurons, with a score of 95.54 (adjusted R²). The choice of the ANN with 15 neurons as the best one was due to the better performance of its mean value.
Fig. 5. Identification result
5 Conclusion

This work presented the coupled tanks level system as an example of an industrial application of ANNs in the identification of a process model. Some related works in this niche were highlighted, as well as the practical aspects inherent to this research. The liquid level data was acquired and then identifications with several neural network architectures trained with ELM were made. The adjusted R² was used as the metric because, besides the model fit, it also takes into account the number of parameters used in the model. After a thorough evaluation with the number of neurons ranging from 5 to 30, it was concluded that the best choice is the network with 15 neurons. Acknowledgment. The authors thank the IFCE (Federal Institute of Ceará-Fortaleza) and SENAI for providing the experimental bench for the article.
References 1. da Silva, P.R.A., de Souza, A.V., Henriques, L.F., Coelho, P.H.G.: Controle De Nível Em Tanques Acoplados Usando Sistemas Inteligentes. I Simpósio Brasileiro de Inteligência Computacional (2007)
2. Haykin, S.: Neural Networks: Principios and Pratice (2001) 3. Souza, D.A., Reis, L.L.N., Batista, J.G., Costa, J.R., Antonio Jr., B.S., Araújo, J.P.B., Braga, A.P.S.: Nonlinear identification of a robotic arm using machine learning techniques. In: Rocha, Á., Adeli, H., Reis, L., Costanzo, S. (eds.) New Knowledge in Information Systems and Technologies. WorldCIST 2019. Advances in Intelligent Systems and Computing, vol. 931. Springer, Cham (2019) 4. Yu, Z.: Research on intelligent fuzzy control algorithm for moving path of handling robot. In: International Conference on Robots & Intelligent System (ICRIS) (2019) 5. Mekhilef, S., Saymbetov, A., Nurgaliyev, M., Meiirkhanov, A., Dosymbetova, G., Kopzhan, Z.: An automated intelligent solar tracking control system with adaptive algorithm for different weather conditions. In: IEEE International Conference on Automatic Control and Intelligent Systems (I2CACIS) (2019) 6. Mastacan, L., Dosoftei, C.-C.: Temperature intelligent control based on soft computing technology. In: International Conference and Exposition on Electrical and Power Engineering (EPE) (2016) 7. Shamily, S., Praveena, Bhuvaneswari, N.S.: Intelligent control and adaptive control for interacting system. In: IEEE Technological Innovation in ICT for Agriculture and Rural Development (TIAR) (2015) 8. Huang, G.-B., Zhu, Q.-Y., Siew, C.-K.: Extreme learning machine: theory and applications. Neurocomputing 70, 489–501 (2006)
Decision Intelligence in Street Lighting Management
Diogo Nunes1, Daniel Teixeira1, Davide Carneiro1,2(B), Cristóvão Sousa1,3, and Paulo Novais2
1 CIICESI/ESTG, Polytechnic Institute of Porto, Felgueiras, Portugal {8140365,8140360,dcarneiro,cds}@estg.ipp.pt
2 INESC TEC, Felgueiras, Portugal [email protected]
3 Algoritmi Center/Department of Informatics, University of Minho, Braga, Portugal
Abstract. The European Union has been making efforts to increase energy efficiency within its member states, in line with most of the industrialized countries. In these efforts, the energy consumed by public lighting networks is a key target as it represents approximately 50% of the electricity consumption of European cities. In this paper we propose an approach for the autonomous management of public lighting networks in which each luminary is managed individually and that takes into account both their individual characteristics as well as ambient data. The approach is compared against a traditional management scheme, leading to a reduction in energy consumption of 28%. Keywords: Smart cities
· Intelligent system · Energy management

1 Introduction
Cities currently face several challenges such as environmental stressors, overpopulation, and other related issues (e.g. traffic problems, air pollution). In the search for technological solutions for these problems, the so-called Smart Cities emerged [2]. At the most basic level, a Smart City implies the use of Information and Communication Technologies (ICT) to connect its different services and/or resources, allowing data to be collected, analyzed and acted upon in real time [3]. However, the simple interconnection of these elements is not enough to make a city “Smart”. Thus, a Smart City also implies the use of the collected data with the goal to improve the citizens’ quality of life [4], the efficiency of the city management [8], the management of resources, and the sustainability of the city and its growth [6]. Hence, and as argued by Mumford [10], the challenges posed by smart city projects are, typically, socio-technical in nature. One particularly important element of life in a city is public lighting: it contributes to citizens’ comfort, safety and perceived security [9]. At the same
time, it has also a significant economic impact in the municipalities’ expenditures with energy. In Portugal, the latest data show that public lighting amounts to 3% of the country’s expenses with energy [12]. Current technological developments allow the development of solutions to minimize energy consumption. In this paper we propose such a solution, validated in a network of more than 300 luminaries, also considering guidelines and legislation concerning the ideal levels of light in public spaces. To this end, we explore two main aspects. First, each luminary is managed individually. Indeed, in previous work we have shown how different luminaries of the same model behave differently in terms of energy consumption or running temperature [7]. This can be explored to optimize each luminary individually, taking into consideration its characteristics. Second, we use data about ambient luminosity in order to dim intensity in real-time for each individual luminary. The proposed system was developed and validated in a test setting, and then extrapolated to 4 months of data collected from a public lighting network with 305 luminaries. When compared to the scheme that is currently used by the municipality to manage this lighting network, the use of the proposed system would lead to a decrease in energy consumption of 28%.
2 Energy Consumption Efficiency in the Context of Street Lighting
Energy issues remain on the agenda, in particular in what concerns consumption efficiency. In fact, we are witnessing a transitional period, wherein existing renewable and clean energy sources are not enough, or are not sufficiently mature, to answer current global demands. Meanwhile, new solutions focused on redesigning energy consumption patterns emerge in order to overcome the sustainability imbalance. That is what Smart Cities stand for: they pursue new resource management through digital sustainability approaches. Within this context, cities’ street lighting networks have been shifting towards a new technology paradigm that allows them to benefit from significant energy savings. However, despite the economic return, energy consumption efficiency is not guaranteed. Energy efficiency in the context of street lighting is a broader concept and socio-technical in nature. Measuring energy consumption efficiency in a decision support perspective implies understanding energy efficiency at its basis and in the socio-economic perspective.
2.1 Understanding Energy Efficiency Panorama
The European Union has been making efforts to increase energy efficiency within its member states. The 2030 climate and energy framework, revised in 2018 for the period from 2021 to 2030, aims at a 32.5% reduction in energy consumption, with 32% of the energy coming from renewable sources, while keeping the 40% target for the reduction of greenhouse gas emissions. Portugal has also defined its own
energy strategy in accordance with the European Union commitment and developed the PNAEE (National Action Plan for Energy Efficiency), containing measures and guidelines to be followed in Portugal. In the energy efficiency panorama, “lighting represents approximately 50% of the electricity consumption of European cities” [13], giving street lighting a crucial role in cities’ energy efficiency road map, whose guidelines were expressed in a set of documents/standards for road lighting (EN 13201:2015) [1].
2.2 Energy Efficiency Semantic Outlook
The generic definition of energy efficiency is related to the use of the lowest possible energy to produce the same amount of services or useful output [11]. The idea is to minimize energy consumption without harming service quality and user comfort. Following this, street lighting management optimisation is the core engine through which it is possible to achieve energy consumption efficiency. Hence, the better energy efficiency is understood, the better the optimisation approach might be. In this context, energy efficiency must be approached as a holistic concept [1] including several factors or parameters that might be classified as internal and external factors, respectively. Internal factors are related to current (I), voltage (V), electric power (W), color and temperature. Additionally, external factors include ambient data [5] and the location of the street lighting installation [1], as well as life quality parameters [4]. Figure 1 depicts a conceptual ontology for street lighting energy consumption, representing a common understanding of the domain. It is grounded on the rationale that street LED luminaries have a consumption behaviour that may be classified as typical, atypical or ideal. This classification is obtained according to the luminary energy consumption pattern, defined according to several factors: i) savings, given by the lamp efficacy ratio; ii) internal factors such as dimming, temperature and power; iii) external factors (moonlight, weather, traffic, dust); and iv) the light quality factor, which depends on the location of the street light installation and the underlying illuminance pattern. The location is classified in EN 13201 according to the street utility and user needs, which in turn have an associated illuminance pattern. This conceptualisation effort followed an iterative and incremental process, supported by literature reviews, with sporadic interactions with domain experts. The conceptual ontology is to be further developed towards its formalisation. Meanwhile, it contributes to: i) data interpretation; and ii) classification of the consumption pattern within a holistic perspective. The formulation of useful decisions, typically composed of a chain of actions, must be aligned with the intended outcomes. Relations between actions and outcomes are better understood when there is a shared vision of the domain. In this sense, the conceptualisation eases the classification of the “system status”, promoting the utility of decisions.
Fig. 1. Conceptual Ontology for street Lighting consumption
3 Case Study

3.1 Data Set Characterisation
As mentioned earlier, providing smart decisions on how to minimize energy consumption requires data contextualized according to environmental factors (e.g., weather data, moonlight, traffic), normative factors and the street lighting installation blueprint. The main source of data used in this work is a public lighting network in a production setting in a Portuguese municipality, with 305 luminaries. According to the EN 13201 normative, the set of luminaries was installed in a P2 zone, characterised as a pedestrian zone defined by horizontal illuminance patterns. The luminaries are distributed unilaterally in a 4 m wide street plus 1 m of sidewalk. For this type of location, the average illuminance should be 15 lx, with a minimum of 3 lx. In this scenario it is assumed that the luminaries are at 4 m height, with 14.5 m distance in between. However, the data collected from this network was devoid of any attributes other than those related to internal factors. Thus, there was the need to enrich the available dataset with other relevant attributes, taking into account the aforementioned domain model (Fig. 1). Thus, a second source of data was considered: a group of two luminaries in a laboratory setting with additional sensors. This setting, besides the freedom
concerning the management of the luminaries, allowed us to deploy additional sensors to enrich the data collection process. Both data sources and the data collected are described in Sects. 3.2 and 3.3. In both settings the AQRUILED ARQUICITY R1 luminary is used. This luminary allows different data to be collected from its functioning in real time.
3.2 Production Setting
The dataset collected from the production setting contains 3,855,818 instances of data. Each instance describes 5 min of operation of a specific luminary and includes, among others, data about instant voltage, luminary temperature, instant power, accumulated energy (Wh), uptime and dimming. These data were collected over a period of four months, between September 5th, 2017 and January 3rd, 2018. The data from the luminaries was merged with weather data, also collected at 5-min intervals from a local weather station. These data include air temperature (°C), dew temperature (°C), humidity (%), wind speed (m/s), wind direction (degrees), wind gust (m/s), pressure (mbar), solar irradiance (W/m²) and rain (mm/h). This allows us to study the influence of external factors such as temperature on energy efficiency. However, one of the limitations of this dataset is that it is very homogeneous. This derives from the fact that this is a production setting and must be managed according to the municipality’s policies. For instance, 90% of the data was collected from luminaries set at a dimming between 80% and 90%. This prevents us from exploring the whole search space. We hypothesize that the ideal value of dimming is not static, i.e., it varies according to the conditions of the environment, and that it may be outside of the 80%–90% interval. For that reason, data was also collected in the laboratory setting, as described in the following section.
3.3 Laboratory Setting
Given the limitations of the data collected in the production setting described in the previous section, data was also collected in a laboratory setting. In this setting, data was collected at 5-min intervals from two luminaries between July 31st and October 4th, 2018. These data include, as in the production setting, data from the luminaries and weather data. However, additional sensors were also used to collect data regarding luminosity at two different points (Fig. 2): one at the pedestrian level and the other above the luminaries. These data were then combined, which allows us to study the influence of each of two key variables on the level of luminosity experienced by the pedestrian: dimming and ambient luminosity. Moreover, the luminaries were programmed to change dimming every 20 min, continuously cycling between 50% and 100% in steps of 10%.
Fig. 2. Placement of the luminosity sensors in the laboratory setting.
This provides a much more complete dataset about the luminaries’ behavior in different schemes, widening the available search space. Figure 3 shows how the luminosity measured under the luminary varies according to the dimming, without ruling out the effects of ambient light.
Fig. 3. Distribution of luminosity measured at pedestrian level in different dimmings, without ruling out the effect of ambient light.
4 Implementation and Results
As described in the Introduction section, the main goal of this work is to devise an approach for an individualized management of each luminary in a network, that takes into consideration the individual characteristics of each luminary as well as the intensity of ambient light in each region, in order to minimize energy consumption while maintaining a comfortable and safe level of lighting. To this end, the minimum level of luminosity was defined as 20 lx, in accordance with the European Norm 13201, which defines the standards for road lighting. To achieve the proposed goal, the following methodology was implemented. A Machine Learning model was trained with the dataset collected at the laboratory setting, whose aim is to predict the luminosity at pedestrian level, given the luminary characteristics (including dimming) and the ambient light. This model thus infers the influence of each variable (among which dimming and ambient
light) on the level of luminosity experienced by a pedestrian. It can then be used to estimate, for each luminary, the best configuration in order to minimize energy consumption while maintaining appropriate lighting levels. Specifically, a Random Forest model was trained. This is one of the most popular ensemble learning algorithms due to its relative simplicity and its resilience to overfitting. In this algorithm, multiple Decision Trees are used as weak learners. These trees are purposely made simple during training, namely by limiting their depth (branching is stopped early) or by limiting the amount of data or the feature vector used in each tree. Each tree is thus trained on a different set of data, a process known as Bagging. In this work, the resulting ensemble is composed of 50 trees, in which each tree was trained with 60% of the input variables, selected randomly. The output of the model is a numeric value that represents the predicted level of luminosity at pedestrian level for a given scenario. The model exhibits an RMSE = 3.68 (r² = 0.53). Figure 4 shows the observed value of luminosity at the pedestrian level for a randomly selected luminary/night, against the luminosity predicted by the model for the same data. The correlation between observed and predicted values is 0.89. The variations in luminosity throughout the night are due to the ongoing changes in dimming in the operation of the luminaries of the laboratory setting, as described in Sect. 3.3, and to eventual natural changes in ambient light.
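The following is a hedged sketch of how such a model could be trained with scikit-learn; it is not the code used in the paper. The file name and column names are assumptions, and scikit-learn's `max_features` option (a per-split fraction) is used here only to approximate the "60% of input variables per tree" setting described above.

```python
# Illustrative sketch: a 50-tree Random Forest predicting pedestrian-level
# luminosity from luminary state and ambient light.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

df = pd.read_csv("laboratory_setting.csv")          # assumed file with the lab data
features = ["dimming", "ambient_lux", "power", "luminary_temp", "air_temp", "humidity"]
X, y = df[features], df["pedestrian_lux"]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = RandomForestRegressor(
    n_estimators=50,       # 50 trees, as in the paper
    max_features=0.6,      # each split considers ~60% of the input variables
    random_state=0,
)
model.fit(X_tr, y_tr)
rmse = mean_squared_error(y_te, model.predict(X_te)) ** 0.5
print(f"RMSE = {rmse:.2f}")
```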
Fig. 4. Observed luminosity at pedestrian level for a randomly selected night, and luminosity predicted by the model (ρ = 0.89).
Figure 5 shows how the level of luminosity varies during one night selected randomly, and the prediction of the model for different levels of dimming. This Figure shows that, for this specific night, the optimum dimming would be around 60% as it provides the minimum desired level of luminosity according to the EN 13201 standard while minimizing energy consumption.
Fig. 5. Observed ambient luminosity (bottom line) and predicted luminosity at pedestrian level with different dimmings.
4.1 Autonomous Luminary Management
Once the model was trained, the next step was to devise a method to implement the autonomous management of the public lighting network, optimizing energy consumption while maintaining light quality. The current scheme used by the municipality to manage the production setting is rather rigid: on weekdays the luminaries are set to 80% of dimming and on weekends they work at 100%. This is hardly optimized for energy consumption. However, given that this is a production setting, we are not allowed to use our approach on site without prior validation. To overcome this drawback, the following approach was implemented, with the goal of estimating the energy savings of an optimized management scheme in the production setting. First, we must note that the luminaries in the production setting do not have ambient luminosity sensors, which prevents us from directly applying the model. In that sense, for each day of data in the production dataset, we randomly selected one day of ambient luminosity data from the test dataset and combined them. The goal is, in the absence of luminosity data, to simulate it in a realistic manner. With this, we are able to predict the luminosity at the pedestrian level. The next step was to devise a method for selecting the optimum dimming for each luminary. To this end, it is first necessary to predict power consumption from dimming. Since the power consumption of a luminary grows with dimming, a quadratic fit was calculated to model the relationship between the two variables (Fig. 6). The resulting quadratic function can thus be used to predict the power consumption associated with a given dimming.
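A quadratic fit of this kind can be obtained, for instance, with a degree-2 polynomial regression; the snippet below is illustrative only, and the data arrays are placeholders rather than the values behind Fig. 6.

```python
# Sketch of the quadratic dimming -> power fit.
import numpy as np

dimming = np.array([50, 60, 70, 80, 90, 100])            # % dimming
power = np.array([18.0, 22.5, 27.4, 33.1, 39.6, 46.7])   # W, hypothetical readings

coeffs = np.polyfit(dimming, power, deg=2)                # quadratic coefficients
predict_power = np.poly1d(coeffs)                         # callable model
print(predict_power(75.0))                                # estimated power at 75% dimming
```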
Fig. 6. Relationship between dimming and power consumption (RMSE = 1.95).
The following approach is implemented to select the optimum dimming for a specific luminary, at any given time. A binary search scheme is used for the interval 50%–100% dimming. Thus, it starts at a dimming of 75%. For this dimming, and given the ambient light, both the power consumption and luminosity level at pedestrian level are estimated: the former using the quadratic regression, the latter using the trained model. If the predicted luminosity is below the 20 lx threshold established in the EN 13201 standard, the search continues to the right, i.e., at a dimming of 87.5%. Otherwise, it continues to the left, at a dimming of 62.5%. At each step, the estimations of power consumption and luminosity are updated. The process goes on until a value lower than 20 lx is reached and there is no possible search to the right. At this point, the previous value of dimming is selected. This approach was used for every instance of data in the production dataset, in order to simulate the operation of an autonomous system carrying out a real-time management of each individual luminary, based on real-time data on ambient luminosity and luminary state. Results show a decrease in power consumption of around 28%. The observed energy consumption over the 4 months of data was approximately 95.728 kW. In contrast, the estimated power consumption using the optimization scheme is of approximately 69.585 kW.
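The selection procedure just described can be sketched as follows. This is an illustrative interpretation rather than the authors' code: `predict_lux` and `predict_power` stand in for the trained Random Forest and the quadratic fit, and the termination rule (keep the last dimming that satisfied the 20 lx threshold) is our reading of the description above.

```python
# Hedged sketch of binary-search dimming selection over the 50%-100% interval.
def select_dimming(ambient_lux, predict_lux, predict_power,
                   low=50.0, high=100.0, min_lux=20.0, steps=6):
    best_dimming, best_power = high, predict_power(high)   # safe fallback: full level
    for _ in range(steps):
        mid = (low + high) / 2.0                            # 75.0, then 87.5 or 62.5, ...
        if predict_lux(dimming=mid, ambient_lux=ambient_lux) < min_lux:
            low = mid                                       # too dark: search brighter half
        else:
            best_dimming, best_power = mid, predict_power(mid)
            high = mid                                      # bright enough: try to dim further
    return best_dimming, best_power

# Hypothetical usage with stand-in models:
dim, watts = select_dimming(
    ambient_lux=3.0,
    predict_lux=lambda dimming, ambient_lux: 0.3 * dimming + ambient_lux,
    predict_power=lambda d: 0.004 * d * d + 0.1 * d,
)
print(dim, watts)
```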
5 Conclusions
Most developed countries are currently committed to energy efficiency plans in which renewable energy sources and more efficient devices are used in order to decrease energy consumption and the associated carbon emissions. However, other technological developments such as those made possible by IoT and the Smart Cities umbrella allow further improvements. In this paper we presented a data-oriented approach for the autonomous management of public lighting networks. It is based on the acquisition of data from several sources, including luminaries, weather stations and ambient sensors. The proposed management scheme treats each luminary individually and takes into consideration
their surrounding conditions. When compared to the current management policy, the proposed approach leads to a decrease of 28% in energy consumption while still maintaining the lighting levels defined in the European norm for the corresponding zone. Acknowledgments. This work is co-funded by Fundos Europeus Estruturais e de Investimento (FEEI) through Programa Operacional Regional Norte, in the scope of project NORTE-01-0145-FEDER-023577 and by national funds through FCT – Fundação para a Ciência e Tecnologia through projects UID/CEC/00319/2019 and UIDB/04728/2020.
References
1. Road lighting standards. Standard EN 13201:2015, European Committee for Standardization (CEN) (2015)
2. Albino, V., Berardi, U., Dangelico, R.M.: Smart cities: definitions, dimensions, performance, and initiatives. J. Urban Technol. 22(1), 3–21 (2015)
3. Bakıcı, T., Almirall, E., Wareham, J.: A smart city initiative: the case of Barcelona. J. Knowl. Econ. 4(2), 135–148 (2013)
4. Barrionuevo, J.M., Berrone, P., Ricart, J.E.: Smart cities, sustainable progress. IESE Insight 14(14), 50–57 (2012)
5. Boyce, P., Fotios, S., Richards, M.: Road lighting and energy saving. Light. Res. Technol. 41(3), 245–260 (2009). http://journals.sagepub.com/doi/10.1177/1477153509338887
6. Caragliu, A., Del Bo, C., Nijkamp, P.: Smart cities in Europe. J. Urban Technol. 18(2), 65–82 (2011)
7. Carneiro, D., Sousa, C.: The influence of external factors on the energy efficiency of public lighting. In: 2018 Proceedings of the CAPSI 2018 - 18th Conference of the Portuguese Association for Information Systems, vol. 39. CAPSI (2018)
8. Chen, T.M.: Smart grids, smart cities need better networks [editor's note]. IEEE Network 24(2), 2–3 (2010)
9. Knight, C.: Field surveys of the effect of lamp spectrum on the perception of safety and comfort at night. Light. Res. Technol. 42(3), 313–329 (2010)
10. Mumford, E.: A socio-technical approach to systems design. Requirements Eng. 5(2), 125–133 (2000)
11. Patterson, M.G.: What is energy efficiency?: concepts, indicators and methodological issues. Energy Policy 24(5), 377–390 (1996). http://www.sciencedirect.com/science/article/pii/0301421596000171
12. Pordata: Energy consumption in Portugal by type. https://www.pordata.pt/Portugal/Consumo+de+energia+electrica+total+e+por+tipo+de+consumo1124. Accessed 20 Nov 2019
13. Rabaza, O., Molero-Mesa, E., Aznar-Dols, F., Gómez-Lorente, D.: Experimental study of the levels of street lighting using aerial imagery and energy efficiency calculation. Sustainability 10(12), 4365 (2018). http://www.mdpi.com/2071-1050/10/12/4365
Cardiac Arrhythmia Detection Using Computational Intelligence Techniques Based on ECG Signals
Jean C. C. Lima1, Alanio Ferreira de Lima2, Darielson A. Souza3(&), Márcia T. Tonieto1, Josias G. Batista3,4, and Manoel E. N. de Oliveira2
1 Lourenço Filho College-FLF, Fortaleza, CE, Brazil [email protected], [email protected]
2 Federal University of Ceará-UFC, Campus Sobral, CE, Brazil [email protected], [email protected]
3 Federal University of Ceará-UFC, Campus Pici, Fortaleza, CE, Brazil {darielson,josiasgb}@dee.ufc.br
4 Federal Institute of Education, Science and Technology of Ceará-IFCE, Fortaleza, CE, Brazil
Abstract. The article proposes a study on cardiac arrhythmia detection, since, besides the elderly, this type of anomaly can also occur in adolescents. During an arrhythmia the heart may beat very fast, very slowly, or with an irregular rhythm. The physician may often have difficulty detecting which type of arrhythmia is present, so a system is proposed to assist the medical diagnosis by detecting arrhythmia in the patient. For this purpose, a database of ECG signals from several patients is used, to which an ANN (Artificial Neural Network) with ELM training is applied to perform the classification. Both standardized and raw data were used during the training stage, and several hidden-layer architectures were tested in order to obtain a good accuracy, which is the metric used in this work.
Keywords: ECG · Cardiac arrhythmia · Artificial Neural Networks · Computational intelligence · Pattern classification
1 Introduction

One of the problems that can occur in the medical field is identifying what kind of arrhythmia the patient is experiencing. A very fast heartbeat is called tachycardia [1], whereas a very slow heartbeat is called bradycardia. Most arrhythmias are harmless, but some can be serious or even fatal. During an arrhythmia, the heart may not be able to pump enough blood to the body. Lack of blood flow can damage the brain, heart and other organs. Thus it is important to know what type of arrhythmia the patient has, as this information can be used to define the necessary medications and treatments. There is a lot of research underway in the area of Computational Intelligence applied to health. In the study by [2], the performance of neural networks for the classification of sequential electrocardiogram (ECG) recordings as normal or abnormal cardiac
arrhythmia was evaluated. MLP neural networks with backpropagation training were used as the learning model. The work also makes a comparative analysis of the MLP ANN results with Support Vector Machines (SVM), random forests and logistic regression. In [3], the objective is mainly to classify the cases as normal or abnormal. UCI ECG signal data was used to train and test three different ANN models. The ANN models are trained by the static momentum backpropagation algorithm to diagnose cardiac arrhythmia, and classification performance is assessed by measures such as the mean squared error (MSE). The article by [4] conducts research with clustering methods, aiming to discover groups in the data. The work uses several databases, including the UCI ECG one, and applies a technique called Independent Variable Group Analysis (IVGA), based on unsupervised learning which, when modeling a dataset, also discovers a clustering of the input variables that reflects the statistical independence of the data. The present work aims to develop an intelligent system for medical diagnosis assistance based on the extracted database, using an ELM (Extreme Learning Machine) Neural Network to detect whether the patient has arrhythmia. One of the major motivations of the work is to use the ECG signal and a neural network to assist the physician's diagnosis of arrhythmia during a medical appointment. The system aims to quickly support the doctor's diagnosis for the patient. The work is organized in 5 sections, including this one. The next section, Artificial Neural Networks, describes their features and some important definitions. The following section is the methodology, addressing the technologies and tools used to develop the work. Section 4 shows the results of the work, comparing the various types of network architectures used, and finally the concluding section gives a general discussion of the work, detailing its contributions.
2 Artificial Neural Networks

Nowadays, with the internet and the use of cell phones, it is possible to get updated information in real time, and a specialist can analyze the patient's progress more accurately, thus facilitating the diagnosis. Digital technologies abound for cardiac monitoring, blood pressure, sleep control, etc. Currently there are information systems that enable the accurate diagnosis of various diseases without the need for a specialist. The study of artificial intelligence covers several areas, and it is fair to say that "machines" dominate a good part of human life, in the sense of guiding, controlling and even deciding something based on complex data. The creation of increasingly human-like machines, both in appearance and in decision making, is a reality in sight. The most commonly applied computational intelligence tools are fuzzy inference, artificial neural networks (ANNs) and metaheuristics; Fig. 1 shows a visual representation of this.
Fig. 1. Representation of computational intelligence
According to [5], ANNs, which are inspired by the learning capacity of the human brain, may represent an effective estimation technique. An ANN is defined by one or more layers of fundamental constructs called neurons, with weighted interconnections between them [6, 7]. Figure 2 shows an example of an ANN composed of an input layer, two intermediate layers and an output layer. With the exception of the input layer, which has no neurons, all layers have computational power.
Fig. 2. Artificial neural network [8].
In the present work we used an ELM-type ANN, which is a feedforward neural network proposed in [9] with a simple and efficient learning algorithm. This network has only one hidden layer with randomly generated weights and one output layer whose weights are computed using the least squares method.
3 Methodology

The present work is experimental research, since the object studied is the detection of cardiac arrhythmia in a classification database. The method used is a neural network with ELM training. As shown in Fig. 3, the development of the work followed concomitant and sequential steps.
Fig. 3. Methodology steps
A freely available electrocardiographic data set, Arrhythmia [10], was used for the diagnostic identification of cardiac arrhythmia. The base consists of 280 variables (characteristics) collected from 452 patients. The data set includes 74 real-valued variables, such as patient age and weight, and wave amplitudes and durations of different parts of the signal recorded on each of the 12 electrodes. The 206 nominal variables codify, for example, the patient's gender and the existence of several anomalies in each electrode. One variable describes the classification by a human cardiologist of the patient into one of 16 class types, containing arrhythmia or not. A loading sketch is given below.
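The following is a hedged sketch of loading this data set with pandas; the file name, the "?" missing-value marker and the column layout (279 features followed by the class label) follow the UCI repository documentation for [10] and are assumptions, not code from the paper.

```python
# Illustrative loading of the UCI Arrhythmia data set [10].
import pandas as pd

df = pd.read_csv("arrhythmia.data", header=None, na_values="?")
X = df.iloc[:, :-1].fillna(df.iloc[:, :-1].mean())   # features, simple mean imputation
y = df.iloc[:, -1]                                   # class labels 1..16
print(X.shape, y.nunique())                          # expected: (452, 279) and up to 16 classes
```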
3.1 ELM Training
The work used a model based on an Extreme Learning Machine (ELM) Neural Network. The Octave software was used to run the estimates. ELM was chosen for its simplicity and fast performance. To classify whether the individual has arrhythmia or not, 20 neurons were used in the hidden layer. The inputs are composed of the characteristics of each instance, while the output is composed of the 16 classes. Figure 4 shows the structure of the ANN used.
Fig. 4. ELM structure
The ELM training is very illustrative and can be seen in Algorithm 1.

Algorithm 1. ELM Training
Begin
  Step 1: Randomly generate the weights Wji, where i = 1, ..., N;
  Step 2: Compute the matrix H of the hidden layer;
  Step 3: Calculate the output weights;
End.
Since the ELM feedforward network is trained in a single pass, it was trained only once per run. The data set was shuffled, so that the labelled samples were in random positions, and then split into 80% for training and 20% for testing; this procedure was run 20 times for each network architecture. In addition to classifying the raw data from the sample set, we also classified the samples normalized with the z-score. The z-score indicates how far above or below the mean a given value is, in units of standard deviation, and is calculated with Eq. (1). Another characteristic of the z-score is that the resulting mean is null and the variance is unitary.

$$
Z = \frac{X - \mu}{\sigma}
\qquad (1)
$$
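The evaluation protocol just described can be sketched as follows. This is an illustration only, not the authors' Octave script; the `train_model` argument is a placeholder for an ELM classifier such as the one in Algorithm 1.

```python
# Illustrative sketch: z-score normalization (Eq. (1)), shuffled 80/20 splits
# and 20 repeated runs, reporting mean and best accuracy.
import numpy as np

def zscore(X):
    return (X - X.mean(axis=0)) / X.std(axis=0)      # column-wise z-score

def evaluate(X, y, train_model, n_runs=20, test_ratio=0.2, seed=0):
    rng = np.random.default_rng(seed)
    accuracies = []
    for _ in range(n_runs):
        idx = rng.permutation(len(y))                # shuffle samples and labels together
        n_test = int(test_ratio * len(y))
        test, train = idx[:n_test], idx[n_test:]
        model = train_model(X[train], y[train])      # returns a callable predictor
        accuracies.append(np.mean(model(X[test]) == y[test]))
    return np.mean(accuracies), np.max(accuracies)

# Hypothetical usage, with train_model wrapping an ELM classifier:
# mean_acc, best_acc = evaluate(zscore(X), y, train_model=elm_fit)
```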
4 Results

During the classification, the evaluation metric was accuracy, that is, the percentage of correct answers, computed for both normalized and non-normalized data. Many hidden-layer architectures were tested, and the most favorable results were obtained with 18, 21, 25, 32 and 35 neurons. After submitting each of the candidate ANN topologies to the training process, the accuracy was calculated. Subsequently, the previously trained ANNs were submitted to the test set samples and their accuracy was computed as well (with and without data normalization). Table 1 presents the best results obtained during testing, and Table 2 presents the average and worst results of the 20 executions.

Table 1. Best results during testing

Computational model   Number of hidden layer neurons
                      18     21     25     32     35
Not normalized        78%    81%    88%    91%    90%
Normalized            79%    81%    89%    94%    92%
By analyzing Table 1, the architecture with 32 hidden layer neurons had the best test-phase result over the 20 runs. Table 2 shows the average and worst results of the 20 test runs of each architecture.

Table 2. Results with average and worst case

Computational model        Number of hidden layer neurons
                           18     21     25     32     35
Low (not normalized)       67%    68%    68%    72%    71%
Average (not normalized)   75%    79%    82%    87%    85%
Low (normalized)           68%    68%    70%    73%    72%
Average (normalized)       76%    79%    83%    88%    86%
By analyzing Table 2, in all cases the 32-neuron ELM ANN architecture performed better than the other architectures. Consequently, the architecture with 32 neurons is the best candidate to have its model used as a diagnostic aid. For all topologies a validation procedure was used to add more confidence to the results. During the validation stage, the data set is shuffled and then split into 20% for testing and 80% for training. The validation process runs 20 times for each topology, with non-normalized and normalized data. The results presented in Tables 1 and 2 show the effectiveness of the models employed, and the topology with 32 neurons presented the most efficient results in the classification of the patients' ECG signals.
5 Conclusion

In this work we presented the application of an ANN with the Extreme Learning Machine (ELM) training algorithm to the classification of ECG signals in order to identify cardiac arrhythmia. Regarding the ELM-trained ANNs, several topologies were tested, and the ANN with 32 hidden layer neurons performed best on the presented problem: the best accuracy was 91% with non-normalized data and 94% with normalized data, thus validating the use of an ELM ANN as a classifier for this application. The results showed that it is possible to use this type of ELM ANN as an ECG signal classifier to detect cardiac arrhythmia. The test and training results validate the research, showing that the model used can solve the proposed problem.
References 1. Awtry, E.H., Jeon, C., Ware, M.G.: Blueprints Cardiology (2006) 2. Haque, A.: Cardiac Dysrhythmia Detection with GPU-Accelerated Neural Networks (2015) 3. Jadhav, S.M., Nalbalwar, S.L., Ghatol, A.A.: Artificial neural network models based cardiac arrhythmia disease diagnosis from ECG signal data. Int. J. Comput. Appl. 44, 8–13 (2012) 4. Lagus, K., Alhoniemi, E., Seppä, J., Honkela, A., Wagner, A.: Independent variable group analysis. In: Learning Compact Representations for Data (2005) 5. Rojas, R.: Neural Networks: A Systematic Introduction. Springer, Berlin (1996). https://doi. org/10.1007/978-3-642-61068-4 6. Haykin, S.: Redes Neurais: Princípios e Prática. trad. Paulo Martins Engel. 2.edn. Bookman, Porto Alegre (2001) 7. Kovacs, Z.L.: Artificial Neural Networks: Fundamentals and Applications, 4th edn. (2006) 8. Silva, I.N., Spatti, D.H., Flauzino, R.A.: Redes neurais artificiais para engenharia e ciências aplicadas. Artliber Editora Ltda, São Paulo (2010) 9. Huang, G.B., Zhu, Q.Y., Siew, C.K.: Extreme learning machine: theory and applications. Neurocomputing (2006) 10. Arrhythmia Data Set. Machine Learning Repository’s (1998). https://archive.ics.uci.edu/ml/ datasets/Arrhythmia
Personalising Explainable Recommendations: Literature and Conceptualisation Mohammad Naiseh1(B) , Nan Jiang1 , Jianbing Ma2 , and Raian Ali3 1
Faculty of Science and Technology, Bournemouth University, Poole, UK {mnaiseh,njiang}@bournemouth.ac.uk 2 Chengdu University of Information Technology, Chengdu, China [email protected] 3 Division of Information and Computing Technology, College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar [email protected]
Abstract. Explanations in intelligent systems aim to enhance users’ understanding of the system’s reasoning process and the resulting decisions and recommendations. Explanations typically increase trust, user acceptance and retention. The need for explanations is on the rise due to increasing public concerns about AI and the emergence of new laws, such as the General Data Protection Regulation (GDPR) in Europe. However, users differ in their needs for explanations, and such needs can depend on their dynamic context. Explanations risk being seen as information overload, and this makes personalisation more needed. In this paper, we review the literature around personalising explanations in intelligent systems. We synthesise a conceptualisation that puts together various aspects considered important for the personalisation needs and implementation. Moreover, we identify several challenges which need more research, including the frequency of explanations and their evolution in tandem with the ongoing user experience.

Keywords: Explanations · Personalisation · Human-computer interaction · Intelligent systems
1 Introduction
Information systems that have intelligent or knowledge components have been widely used, including knowledge-based systems, decision support systems, intelligent agents, and recommender systems. With the increase in data volume, velocity and types, the adoption of solutions where intelligent agents and end-users interact and work closely has increased in various application domains. The services provided to end-users range from presenting recommendations to more interactive and engaging forms such as chatbots and social robots.
The end-users’ trust is one of the main requirements for the success and acceptability of such services in real-world scenarios [19]. Generating understandable explanations has been considered a fundamental demand to increase trust in recommendations made by the system [9,27,31,49]. In the literature, there are explanation models proposed and employed in the fields of recommender systems [8,15,43,45], machine learning [47], decision support systems [6,11] and robots [24]. For example, in the field of machine learning, Sokol et al. [47] developed a voice-enabled device to answer users’ questions about automated loan application decisions, helping the end-users to understand the system rationale and identify potential errors and biases. Moreover, explanations help to promote a culture of openness around artificial intelligence applications and encourage the adoption of good practices around accountability [29,32], ethics [4] and compliance with regulations such as the General Data Protection Regulation in Europe (GDPR) [20].

The explanation models provided in the literature were proposed to give enough information and to comply with regulations and social responsibility. The elicitation of users’ needs from these explanations remains limited, and the literature around how users want these explanations to be designed, timed and communicated has only recently started to become a pressing topic. Ribera et al. [44] suggested considering three groups of users, based on their goals, background and roles in the system. Various factors in the personalisation are yet to be researched and considered, including the cognitive styles of users, their prior beliefs, and personality and cultural characteristics such as agreeableness and uncertainty avoidance.

In this paper, we survey the current approaches to personalising explanations in the literature of intelligent systems. We elicit and synthesise the main factors which are considered necessary for the design of personalised explanations and discuss gaps for future research. The findings of the survey suggest that only a limited number of works have been conducted so far and that the number of publications started to increase in the last three years. Areas of interest include the need to evolve explanations together with user experiences and their dynamic trust level and usage context.
2 Literature Review Process
To review the literature around personalised explanations in intelligent systems, we used several search engines: Google Scholar, IEEE Xplore, the Association for Information Systems (AIS) library, ScienceDirect (Elsevier), Springer and the ACM digital library. We started with a start-set of papers using the keywords ‘personalized explanation’ and ‘personalised explanation’ to cover both American and British spellings. As the number of papers retrieved was relatively low, the snowballing method was applied to ensure the best possible coverage of the literature. We followed the guidelines provided by Wohlin [53] for conducting systematic literature reviews using the snowballing approach. Then, we iterated the
backward and forward snowballing until no new papers were found. We did not restrict the starting date, and we found papers implementing personalisation for explanations from 1996 onwards. We stopped the search for papers at the end of November 2019.

To decide whether to include a paper in the study, we needed to define the scope. The paper and its literature review method are mainly focused on explainable intelligent systems, by which we mean systems that inform the user about their reasoning and decision-making process, including aspects about the dataset and training used and their certainty and chance of errors. Personalisation in the context of our study refers to “a toolbox of technologies and application features used in the design of an end-user experience” [17]. The included papers needed to match our definitions above and also discuss personalised explanations in intelligent systems, referring to how explanations can be tailored to a specific user or group of users. The number of papers retrieved by the keyword search and the snowballing was 206. After reading the papers and matching them to our criteria and scope above, we retained a total of 48 papers. We excluded papers which talked of personalised explanations as a general requirement but did not focus on it as the main aspect. We also excluded papers which discussed personalisation of explanations as a general usability requirement without particular consideration of the nature of intelligent systems.

Through a first read of the resulting 48 papers, we observed two main directions for the research in the area. The first focuses on the needs for personalised explanations from the users’ perspective and their requirements of such services. The second focuses on the implementation aspect and the process of designing explanation systems; this category of papers describes partly or fully implemented personalisation techniques in explainable systems. Figure 1 illustrates the number of papers in each category over the years. The figure shows that papers that emphasise the needs for personalised explanations started in 2017 and that, generally, the number of papers in the needs category increased in the last three years. The reason for that could be attributed to public demand and new rights of people, such as the rights related to automated decision making, including profiling, in the European General Data Protection Regulation (GDPR). On the other hand, we can see that the research on the implementation aspect started as early as 1996. Even though the number of studies (48) is relatively low, the increase in the last three years shows that the topic is attracting growing interest in the academic community.

Fig. 1. The number of relevant papers in both categories (needs and implementations) over the years.
3 Personalised Explanations: Needs-Focused Research
The complexity of intelligent systems and their wide adoption in real-world and daily life applications like healthcare made them more visible and familiar but at the same time more questionable. Driven by theories of the ethical responsibility of technology development, informed consent and informed decision making, regulations started to emphasise and even demand the right of citizens to be
explained how systems work [20]. Researchers argued that the current approaches are limited in personalising explanations to end-users and their needs, which motivates researchers to match different end-users with different explanation types. For instance, Tomsett et al. [52] propose a model that defines six different user roles in a machine learning system and argue that the designers of the system should consider providing different explanations that match the needs of each role. Also, Rosenfeld and Richardson [46] presented three categories of users who differ in their demands regarding the nature and level of explanations: regular users, expert users and external entities. These results are supported by Millecamp et al. [36], who investigated how end-user characteristics, including their personal profile and role in the system, affect the design of the explanations and the interaction with the system. We synthesised four categories of needs which can drive the personalisation of explanations, and we explain them in the rest of this section.

User’s Motivations and Goals. Users might pursue different goals throughout their interaction with the system. Goals such as curiosity, verifying the output, learning from the system and improving future interaction require adaptive and personalised explainable systems to meet these motivations [7,21]. For instance, Gregor et al. [21] argued that novice users use the explanations for learning, whereas experts need explanations for verification. Hind et al. [23] highlight that explanations should be provided to end-users based on their motivations and present four groups of users with different motivations: i) end-user decision-makers, ii) users affected by the decisions, iii) regulatory bodies and iv) AI system developers.

Cognitive Load. Personalisation can also consider the different ways people process information and use their thoughts, perception and memory to make a judgement. The cognitive load can also relate to the individual personality,
cognitive resources and learning style. For example, Feng et al. [18] point out that expert users process more of the information provided by the explanation interface to understand the output of the system compared to novice users. The user’s level of knowledge in the domain can also be a parameter that affects the cognitive load required to process explanations.

Cost of the Decision. Personalising the explanation of a recommendation made by an intelligent system would also need to consider how expensive following that recommendation is for the user and whether it is critical (e.g. recommending changing a password vs. purchasing a new security device), as argued by Kleinerman et al. [26] and Bunt et al. [5]. This estimation of costs, the acceptance of the recommendation, and the demand for explanation and its personalisation could differ between users depending on personal and cultural factors, such as openness to new experience and uncertainty avoidance. User models for personalised explanations shall reflect such diversity.

Compliance with Regulations. Krebs et al. [28] and Tomsett et al. [52] discuss the personalisation problem through the lens of compliance with the European GDPR (General Data Protection Regulation), where users have rights related to automated decision making, and this includes being explained how they are being profiled and based on what data and processes.
4 Personalised Explanations: Implementation-Focused Research
In this section, we report on the literature that provided more practical processes and approaches to implementing personalisation techniques. This category considered such implementation either partly or fully, through approaches like user modelling, with the aim of adapting to different individuals and groups. We centred our analysis of the literature in this area on the six-dimension model introduced by Fan and Poole [17]. The adoption of this model helps our analysis to highlight different design facets of personalisation. The dimensions are Recipient, Entity, Channel, Mode, Tactics, and Unit of Analysis. In the following subsections, we elaborate on how each of these dimensions was tackled in the literature and synthesise a conceptualisation of its facets.

4.1 Recipient
This dimension refers to the users that receive and consume the explanation. A better understanding of the user who is receiving the explanation and of their previous interactions with the system is essential in order to time and convey the right explanation. In this regard, various factors were deemed important in the implementation of personalised explanations, and we detail them in the rest of this section.

User Preferences. This refers to accommodating users’ preferences around the subject and filtering the recommendations and their explanations. In terms of information preferences, Chen et al. [9] provided a personalisation approach based on
analysing users’ reviews to understand their preferences in a shopping recommender system. For example, the explanation communicated to user A emphasised a preference for the cuff of the T-shirt, whereas the explanation for user B indicated a preference for the T-shirt collar. Similarly, Chang et al. [8] presented an approach based on analysing users’ reviews to find which movie features the users prefer to be given in the explanation.

User Personality. Kouki et al. [27] show how end-user personality characteristics affect the choice of the explanation design elements, where participants with low neuroticism prefer a different explanation style compared to participants with high neuroticism. This means that emotional triggers, trust propensity and persistence in the explanation are main design facets. Cai et al. [7] state how different example-based explanation styles would be useful for users who under-trust the system more than for users who over-trust the system. While trust is a dynamic phenomenon in its intrinsic nature [16], some users are more likely to be trusting than others due to personality differences. To date, little is known about the effects of user personality on the acceptance and usefulness of explanations.

User Profile. Besides personality, which relates to the user’s mental and emotional states and typical attitude, the literature also considered the user profile, which is more static and factual and relates to the interaction with the system. For example, Kaptein et al. [25] compared the preference of explanation styles between adults and children. Similarly, Roitman et al. [45] generated explanations personalised to the patients’ medical records in a health recommender system. More personalised explanations have been studied in the literature, considering information like gender [26], domain knowledge [3,15] and user roles [3].

User-Controlled Personalisation. This style is defined in [12,22,30], where users are given control over how personalisation should be conducted. The approach considers the design of a variable explanation system where users are responsible for controlling and configuring their own explanations. This can overlap with the other approaches. For example, users can configure what can be used from their profile and previous interactions to tailor a recommendation, and whether they want algorithms like collaborative filtering to be applied and with what similarity metrics.

4.2 Entity
Fan and Poole [17] define the personalised entity as the object, entity or substance to be personalised, being the instantiation of the action of personalisation. We reviewed the literature to conceptualise those entities in the context of explanations, and we discuss the obtained results in the following.

Information. This refers to the information conveyed through the explanation and supposed to be consumed by the recipients. De Carolis et al. [11] demonstrate how doctors in decision support systems require different explanation information compared to nurses and patients. From the doctors’ perspective, the explanation is not only about how the system comes up with the results, but also about the
cost of following the recommendation or overriding it. This means that the information content would be customised to the role, responsibility and liability of the recipients. Our literature review showed that implementation papers are mainly driven by personalising information: 27 out of 27 papers in the category of implementation-focused research tackled the personalisation of the information in the explanations. This entity could be personalised via different strategies, and we discuss this later in the Personalisation Tactics section.

Interface. This refers to the interface and interaction method utilised in order to present and convey the explanation to the recipients [9,12,27]. Different users may require different interface designs in terms of complexity, interactivity, layout and multimedia used. There are also factors relating to the domain and its mission and sensitivity, which may affect the design choice. For example, patients in health recommender systems would prefer a simple and direct interface design with plain text, user-friendly components, and easy to differentiate layouts and colours. However, the domain can also impose some challenges on the simplicity requirements: interfaces for professionals like doctors and pharmacists might unavoidably need to be more complex and include graphs and advanced dialogues and visualisations. Learning is part of the personalisation of interfaces and their evolution. Diaz-Agudo et al. [12] build a Case-Based Reasoning system to personalise the explanation interface to fit the explanation goal and the type of the user based on previous cases and feedback. Three out of the 24 implementation-focused papers related to the interface design.

In this section, we also outline open research challenges that may be considered in future research in terms of the entity dimension. The challenges are mainly focused on determining the frequency, i.e. the number of occurrences of explanations. In other words, it refers to the need over time for the intelligent system to explain itself and how this can change. The main aim of using frequency in the personalisation process is to avoid information redundancy and overload and to fit the dynamics of users’ needs and the context. There are methods and implementations in the literature which can potentially be used for personalising the frequency. One example is the approach described by Sokol et al. [47], where the process of personalising frequency is left for the users to specify. On the other hand, Huang et al. [24] introduced an approach in which deciding the frequency of the explanations is a system task; it is based on explaining the critical states and actions of the robots, rather than explaining all the robot actions.

4.3 Channel
This dimension refers to different methods and communication facilities through which users can access personalised content. Examples include online interfaces, printed documents, email, voice, non-verbal cues and haptic feedback. Our literature review noted a lack of research around this facet, and it is assumed that the choice is mainly relevant to the availability of resources within the task context where recipients and their characteristics and goals are not considered.
4.4 Personalisation Mode
This dimension refers to the type of interaction and dialogue between the system and the user to accomplish the personalisation. The user mode gives individual users a choice to opt in, specify their preferences and choose their explanation elements. Diaz-Agudo et al. [12] provide a configurable explanation interface which allows the user to select between different visualisation charts and to change the colours and sizes of the text. The user mode can also enable users to request information when they need it; Chiyah Garcia et al. [19] provided an approach for on-demand queries for explanations from the intelligent system using Natural Language Generation techniques.

The designer mode refers to the anticipatory or adaptive logic provided by the designers on how explanations made by the system should be derived and issued. Anticipatory personalisation is based on rules about the users’ profile and their characteristics such as preferences, demographics, needs and cognitive styles. Bofeng et al. [3] classified the explanation content into five different groups for users based on their level of knowledge; the research matched the formed user models with the predefined explanations based on questions asked by the system in an expert system for earthquake prediction. Similarly, Quijano-Sanchez et al. [43] provide a personalisation approach in a group recommender system based on a decision tree that steers the generation process of the personalised explanation. This approach combines knowledge extracted from social media, the knowledge generated by the user and group recommendations, and a number of additional factors like tie strength, satisfaction and personality to create word variances in the explanation. Adaptive techniques are more dynamic than anticipatory ones, as the system models the user behaviour based on their previous interactions with the system. Suzuki et al. [50] provide an approach to personalise the explanations using a recurrent neural network model that uses the users’ reviews as a training set for the generated explanations.

We categorised the existing work based on the personalisation mode types. The results are shown in Table 1.

4.5 Personalisation Tactics
Fan and Poole [17] define personalisation tactics as the different technological measures and strategies available for the designer to manipulate and enhance the effect of personalisation. From the papers we reviewed, we synthesised seven tactics that could be considered when designing personalised explanations, and we discuss them in the following. The choice of strategy is not exclusive, as the designer could combine different strategies in the explainable interface (e.g. [3,11,12]).

Complexity-Based Personalisation. This tactic reflects the adaptation of the explanation based on end-users’ ability to utilise the explanation with respect to the level of complexity provided [3,19]. The complexity of the explanation could vary in the number of lines, number of chunks, number of new concepts, rules, reasons, or level of detail. Identifying which complexity factors of the explanation to personalise is essential for balancing the response time, information overload and the users’ information needs.
Content-Based Personalisation. Information should be structured in a way that can be processed to meet variable needs and contexts, and this concerns the indexing, tagging and filtering of the information in tandem with the users’ roles, their tasks and other usage characteristics. Chen et al. [9] and Lim et al. [33] showed that users have different information needs, and users may request multiple information contents from the system [13]. Stumpf et al. [49] showed the need to provide multiple information contents to help the user to make informed decisions.

Order-Based Personalisation. This refers to the order of the information content presented in the explanation and the phasing of the explanation so that it meets the evolving user experience. This strategy appeared twice in our relevant papers [11,39]; e.g. De Carolis et al. [11] discussed how the order and the priorities of the presented information might differ between patients, nurses and doctors in a medical decision support system.

Evolvable Personalisation. The decision-making process underlying explanation derivation and delivery can be designed to learn and evolve over time based on user feedback and actual interaction. Miller [37] argued that the intelligent system should consider the information that has already been explained to end-users in order to evolve the explanation over time. Milliez et al. [38] develop an algorithm based on this idea to update the users’ knowledge model so that the system can adapt the explanation to their level of knowledge. Similarly, Bofeng et al. [2] provide an evolvable approach based on an adaptive interview, which asks users questions to update and re-evaluate their knowledge during the usage time.

Style-Based Personalisation. This indicates the orientation, level, granularity and framing adopted by the intelligent system when explaining its actions, and the underlying goal of the explanation. Our analysis showed that the explanation style is inherently related to the domain and the explainable algorithm. One example can be taken from the field of robots, where Kaptein et al. [25] found that goal-based explanation, which provides information about the desired outcome of the decision, was preferred by adults. Children’s preference, on the other hand, was more towards belief-based explanation, which explains the behaviour of the intelligent agent based on the reasons that let the agent choose one action over the others. Machine learning is another field where the explanation style is studied. For example, Dodge et al. [13] showed how different explanation styles affect users’ perception of the fairness of machine learning decisions, particularly the difference between local explanations and global explanations. Moreover, Kouki et al. [27] used this tactic in recommender systems to personalise the explanation in different formats (text and graphics) and explanation styles (item-based, user-based and social-based).

Presentation-Based Personalisation. This refers to the method used to convey explanations, including whether and how users can interact with the communication medium. Preferences towards presentation methods could differ between users based on their goals, level of knowledge and familiarity with the method.
Table 1. Categorisation of the existing work on personalised explanation using personalisation mode types.

Reference | User | Anticipatory | Adaptive | Approach
[42]      | x    | –            | –        | Configure the explanation
[25]      | –    | –            | –        | Not available
[30]      | x    | –            | –        | Debug the explanation
[2, 3]    | –    | x            | x        | Fuzzy user model
[51]      | –    | –            | x        | Hybrid explanation method with regard to user preferences
[45]      | –    | x            | –        | Prioritization
[19, 47]  | x    | –            | –        | Dialogue
[15]      | –    | –            | x        | An algorithm that exploits users’ reviews and ratings
[43]      | –    | x            | –        | Decision tree
[10, 27]  | –    | –            | x        | Collaborative filtering
[9]       | –    | –            | x        | Collaborative filtering and exploiting users’ reviews
[50]      | –    | –            | x        | RNN that exploits users’ reviews
[8]       | –    | –            | x        | An algorithm based on modelling users’ interaction with the recommender system
[12]      | x    | –            | x        | Case-based reasoning and manual configuration
[14]      | –    | –            | x        | An algorithm based on analysing users’ reviews
[41]      | –    | –            | x        | A framework which takes the user profile and the recommendations to generate personalised explanations
[1]       | –    | –            | x        | An algorithm based on users’ goal
[38]      | –    | x            | –        | An algorithm based on hierarchical task network
[34]      | –    | –            | x        | A novel multi-task learning framework
[40]      | –    | –            | x        | A methodology based on exploiting users’ reviews
[11]      | x    | –            | –        | Questions for building user models
[35]      | –    | –            | x        | Multi-armed bandit algorithm
Feng et al. [18] indicated the importance of personalising explanations to end-users by studying the effect of the presentation method on expert and novice users. Results from their study produced a more accurate and realistic evaluation of machine learning explanation methods. Kouki et al. [27] used the explanation
presentation method as a control variable to find which visualisation method is more persuasive for the end-users compared to the other methods.

Format-Based Personalisation. This refers to changing the language, framing and layout, including colours and font sizes, to reflect the importance of certain parts of the explanation. Diaz-Agudo et al. [12] provide an approach that enables end-users to choose their explanation template in terms of colours, size and charts.

4.6 Unit of Analysis
This dimension refers to the view of the user that the personalisation design takes. The explanation can be designed to deal with categories of users, such as children or adults, experts or novices. In a different setting, it can be designed toward addressing a unique individual, assumed to be different from all others, e.g. the head of the emergency unit in a hospital. The literature around the unit of analysis dimension is summarised in detail in Table 2. The majority of the personalisation techniques reviewed considered individual users as the unit of analysis. This is because most of the literature belonged to the recommender systems research community, where user profile construction is the main personalisation practice. Group profiling and its dynamics have been considered in the fields of machine learning and expert systems, but the amount of research in that collective and group context is still limited.

Table 2. Research brief summary

Data gathering methods
  Implicit      | [8–10, 14, 15, 27, 34, 40, 41, 43, 45, 50, 51]
  Explicit      | [3, 11, 19, 30, 35, 42, 47]
  Mixed method  | [2, 12]
  Not available | [25]

Recipient gathered data
  User preferences | [8, 9, 11, 14, 15, 27, 34, 40, 41, 43, 47, 50, 51]
  User profile     | [10, 25, 45]
  Users knowledge  | [1–3, 11, 12, 38]
  No information   | [19, 30, 42]

Personalised content
  Information | [2, 3, 8–12, 14, 15, 19, 25, 27, 30, 34, 40–43, 45, 47, 50, 51]
  Interface   | [9, 12, 27]

Personalised tactics
  Content      | [1–3, 8–11, 14, 15, 19, 30, 34, 38, 40–43, 45, 47, 50]
  Style        | [11, 25, 27, 35, 41, 51]
  Order        | [11, 39]
  Complexity   | [2, 3, 38]
  Evolvable    | [2, 38]
  Presentation | [12, 27]
  Format       | [12]

Unit of Analysis
  Individual | [1, 8–10, 12, 14, 15, 27, 30, 34, 38, 40–43, 47, 50, 51]
  Group      | [2, 3, 11, 25]
5 Discussion and Future Work
In this paper, we reviewed the literature on personalising explanations in intelligent systems. We synthesised two classifications building on different research domains such as Personalisation, User Experience and Explainable Artificial Intelligence [17,37]. Results from the analysis showed that the current literature and approaches to explainable models in intelligent systems lack user-based research. Such research is important for gathering the lived experience of users, their perception and preferences in the process, and the match to their roles in the system [44,48], user personality [7,27], domain knowledge [1,18,38] and user goals [21,23]. Moreover, we also reviewed the current approaches in the implementation category and outlined different algorithms, frameworks and methods for realising personalisation in explainable models. Table 2 presents a categorisation of the existing work on personalising explanations. We note here that the consideration of users’ input and choices was not a main focus in the implementation strategies, although it was considered important, especially for giving a sense of control to users. The amount of interactivity and user control, and the effect on user experience when balancing between intelligent and user-administered personalisation, is an open research issue.

Our aim in future research is to focus on the end-users’ perspective and their various personas and expectations, and we may need to produce explanations tailored to a particular persona. To derive the goals of users and keep up to date with their dynamic nature, additional data are to be captured from the users, and this will require further explanation, i.e. meta-explanation. We also need to provide approaches which enable the detection of the dynamics of needs and goals over time so that the explanations can be adapted to these changes. User studies such as diary studies, interviews and observations are needed to determine the nature of these dynamics. Results from these studies will provide us with guidelines to design long-term explainable systems and to decide what data should be collected from the user to enhance this personalisation.

Acknowledgments. This work is partially funded by iQ HealthTech and the Bournemouth University PGR development fund.
References 1. Barria-Pineda, J., Akhuseyinoglu, K., Brusilovsky, P.: Explaining need-based educational recommendations using interactive open learner models. In: Adjunct Publication of the 27th Conference on User Modeling, Adaptation and Personalization UMAP’19 Adjunct, pp. 273–277. ACM, New York (2019). https://doi.org/10.1145/ 3314183.3323463 2. Bofeng, Z., Na, W., Gengfeng, W., Sheng, L.: Research on a personalized expert system explanation method based on fuzzy user model. In: Fifth World Congress on Intelligent Control and Automation (IEEE Cat. No. 04EX788), vol. 5, pp. 3996– 4000. IEEE (2004) 3. Bofeng, Z., Yue, L.: Customized explanation in expert system for earthquake prediction. In: 17th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2005), pp. 367–371. IEEE (2005) 4. Bostrom, N., Yudkowsky, E.: The ethics of artificial intelligence. In: The Cambridge handbook of artificial intelligence, vol. 1, pp. 316–334 (2014) 5. Bunt, A., Lount, M., Lauzon, C.: Are explanations always important?: a study of deployed, low-cost intelligent interactive systems. In: Proceedings of the 2012 ACM International Conference on Intelligent User Interfaces, pp. 169–178. ACM (2012) 6. Bussone, A., Stumpf, S., O’Sullivan, D.: The role of explanations on trust and reliance in clinical decision support systems. In: 2015 International Conference on Healthcare Informatics, pp. 160–169. IEEE (2015) 7. Cai, C.J., Jongejan, J., Holbrook, J.: The effects of example-based explanations in a machine learning interface. In: Proceedings of the 24th International Conference on Intelligent User Interfaces, pp. 258–262. ACM (2019) 8. Chang, S., Harper, F.M., Terveen, L.G.: Crowd-based personalized natural language explanations for recommendations. In: Proceedings of the 10th ACM Conference on Recommender Systems, pp. 175–182. ACM (2016) 9. Chen, X., Chen, H., Xu, H., Zhang, Y., Cao, Y., Qin, Z., Zha, H.: Personalized fashion recommendation with visual explanations based on multimodal attention network: Towards visually explainable recommendation. In: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 765–774. ACM (2019) 10. Coba, L., Rook, L., Zanker, M., Symeonidis, P.: Decision making strategies differ in the presence of collaborative explanations: two conjoint studies. In: Proceedings of the 24th International Conference on Intelligent User Interfaces, pp. 291–302. ACM (2019) 11. De Carolis, B., de Rosis, F., Grasso, F., Rossiello, A., Berry, D.C., Gillie, T.: Generating recipient-centered explanations about drug prescription. Artif. Intell. Med. 8(2), 123–145 (1996) 12. D´ıaz-Agudo, B., Recio-Garcia, J.A., Jimenez-D´ıaz, G.: Data explanation with CBR. In: ICCBR, p. 64 (2018) 13. Dodge, J., Liao, Q.V., Zhang, Y., Bellamy, R.K., Dugan, C.: Explaining models: an empirical study of how explanations impact fairness judgment. In: Proceedings of the 24th International Conference on Intelligent User Interfaces, pp. 275–285. ACM (2019) 14. Dragovic, N., Madrazo Azpiazu, I., Pera, M.S.: From recommendation to curation: when the system becomes your personal docent IntRS (2018)
15. Dragovic, N., Pera, M.S.: Exploiting reviews to generate personalized and justified recommendations to guide users’ selections. In: The Thirtieth International Flairs Conference (2017) 16. Falcone, R., Castelfranchi, C.: Trust dynamics: how trust is influenced by direct experiences and by trust itself. In: 2004 Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems AAMAS 2004, pp. 740–747. IEEE (2004) 17. Fan, H., Poole, M.: Perspectives on personalization. In: AMCIS 2003 Proceedings p. 273 (2003) 18. Feng, S., Boyd-Graber, J.: What can AI do for me?: evaluating machine learning interpretations in cooperative play. In: Proceedings of the 24th International Conference on Intelligent User Interfaces, pp. 229–239. ACM (2019) 19. Garcia, F.J.C., Robb, D.A., Liu, X., Laskov, A., Patron, P., Hastie, H.: Explainable autonomy: a study of explanation styles for building clear mental models. In: Proceedings of the 11th International Conference on Natural Language Generation, pp. 99–108 (2018) 20. Goodman, B., Flaxman, S.: EU regulations on algorithmic decision-making and a “right to explanation”. In: ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY (2016) 21. Gregor, S., Benbasat, I.: Explanations from intelligent systems: theoretical foundations and implications for practice. MIS Q. 23, 497–530 (1999) 22. Groce, A., Kulesza, T., Zhang, C., Shamasunder, S., Burnett, M., Wong, W.K., Stumpf, S., Das, S., Shinsel, A., Bice, F., et al.: You are the only possible oracle: effective test selection for end users of interactive machine learning systems. IEEE Trans. Softw. Eng. 40(3), 307–323 (2013) 23. Hind, M., Wei, D., Campbell, M., Codella, N.C., Dhurandhar, A., Mojsilovi´c, A., Natesan Ramamurthy, K., Varshney, K.R.: TED: teaching AI to explain its decisions. In: Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, pp. 123–129. ACM (2019) 24. Huang, S.H., Bhatia, K., Abbeel, P., Dragan, A.D.: Establishing appropriate trust via critical states. In: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 3929–3936. IEEE (2018) 25. Kaptein, F., Broekens, J., Hindriks, K., Neerincx, M.: Personalised self-explanation by robots: the role of goals versus beliefs in robot-action explanation for children and adults. In: 2017 26th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), pp. 676–682. IEEE (2017) 26. Kleinerman, A., Rosenfeld, A., Kraus, S.: Providing explanations for recommendations in reciprocal environments. In: Proceedings of the 12th ACM Conference on Recommender Systems, pp. 22–30. ACM (2018) 27. Kouki, P., Schaffer, J., Pujara, J., O’Donovan, J., Getoor, L.: Personalized explanations for hybrid recommender systems. In: Proceedings of the 24th International Conference on Intelligent User Interfaces, pp. 379–390. ACM (2019) 28. Krebs, L.M., Alvarado Rodriguez, O.L., Dewitte, P., Ausloos, J., Geerts, D., Naudts, L., Verbert, K.: Tell me what you know: GDPR implications on designing transparency and accountability for news recommender systems. In: Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems. ACM (2019). LBW2610 29. Kroll, J.A., Barocas, S., Felten, E.W., Reidenberg, J.R., Robinson, D.G., Yu, H.: Accountable algorithms. U. Pa. L. Rev. 165, 633 (2016)
30. Kulesza, T., Burnett, M., Wong, W.K., Stumpf, S.: Principles of explanatory debugging to personalize interactive machine learning. In: Proceedings of the 20th International Conference on Intelligent User Interfaces, pp. 126–137. ACM (2015) 31. Lamche, B., Adıg¨ uzel, U., W¨ orndl, W.: Interactive explanations in mobile shopping recommender systems. In: Joint Workshop on Interfaces and Human Decision Making in Recommender Systems, p. 14 (2014) 32. Lepri, B., Oliver, N., Letouz´e, E., Pentland, A., Vinck, P.: Fair, transparent, and accountable algorithmic decision-making processes. Philos. Technol. 31(4), 611– 627 (2018) 33. Lim, B.Y., Dey, A.K.: Assessing demand for intelligibility in context-aware applications. In: Proceedings of the 11th International Conference on Ubiquitous Computing, pp. 195–204. ACM (2009) 34. Lu, Y., Dong, R., Smyth, B.: Why i like it: multi-task learning for recommendation and explanation. In: Proceedings of the 12th ACM Conference on Recommender Systems, pp. 4–12. ACM (2018) 35. McInerney, J., Lacker, B., Hansen, S., Higley, K., Bouchard, H., Gruson, A., Mehrotra, R.: Explore, exploit, and explain: personalizing explainable recommendations with bandits. In: Proceedings of the 12th ACM Conference on Recommender Systems, pp. 31–39. ACM (2018) 36. Millecamp, M., Htun, N.N., Conati, C., Verbert, K.: To explain or not to explain: the effects of personal characteristics when explaining music recommendations. In: IUI, pp. 397–407 (2019) 37. Miller, T.: Explanation in artificial intelligence: insights from the social sciences. Artif. Intell. 267, 1–38 (2018) 38. Milliez, G., Lallement, R., Fiore, M., Alami, R.: Using human knowledge awareness to adapt collaborative plan generation, explanation and monitoring. In: The Eleventh ACM/IEEE International Conference on Human Robot Interaction, pp. 43–50. IEEE Press (2016) 39. Muhammad, K., Lawlor, A., Rafter, R., Smyth, B.: Great explanations: opinionated explanations for recommendations. In: International Conference on CaseBased Reasoning, pp. 244–258. Springer (2015) 40. Musto, C., Lops, P., de Gemmis, M., Semeraro, G.: Justifying recommendations through aspect-based sentiment analysis of users reviews. In: Proceedings of the 27th ACM Conference on User Modeling, Adaptation and Personalization, pp. 4–12. ACM (2019) 41. Musto, C., Narducci, F., Lops, P., de Gemmis, M., Semeraro, G.: Linked open databased explanations for transparent recommender systems. Int. J. Hum Comput Stud. 121, 93–107 (2019) 42. Olah, C., Satyanarayan, A., Johnson, I., Carter, S., Schubert, L., Ye, K., Mordvintsev, A.: The building blocks of interpretability. Distill 3(3), e10 (2018) 43. Quijano-Sanchez, L., Sauer, C., Recio-Garcia, J.A., Diaz-Agudo, B.: Make it personal: a social explanation system applied to group recommendations. Expert Syst. Appl. 76, 36–48 (2017) ` Can we do better explanations? A proposal of user44. Ribera, M., Lapedriza, A.: centered explainable AI. In: IUI Workshops (2019) 45. Roitman, H., Messika, Y., Tsimerman, Y., Maman, Y.: Increasing patient safety using explanation-driven personalized content recommendation. In: Proceedings of the 1st ACM International Health Informatics Symposium, pp. 430–434. ACM (2010) 46. Rosenfeld, A., Richardson, A.: Explainability in human-agent systems. Auton. Agents Multi-Agent Syst. 33, 673–705 (2019)
47. Sokol, K., Flach, P.A.: Glass-box: explaining AI decisions with counterfactual statements through conversation with a voice-enabled virtual assistant. In: IJCAI, pp. 5868–5870 (2018) 48. Srinivasan, R., Chander, A., Pezeshkpour, P.: Generating user-friendly explanations for loan denials using GANs. arXiv:1906.10244 (2019) 49. Stumpf, S., Rajaram, V., Li, L., Wong, W.K., Burnett, M., Dietterich, T., Sullivan, E., Herlocker, J.: Interacting meaningfully with machine learning systems: three experiments. Int. J. Hum Comput Stud. 67(8), 639–662 (2009) 50. Suzuki, T., Oyama, S., Kurihara, M.: Toward explainable recommendations: generating review text from multicriteria evaluation data. In: 2018 IEEE International Conference on Big Data (Big Data), pp. 3549–3551. IEEE (2018) 51. Svrcek, M., Kompan, M., Bielikova, M.: Towards understandable personalized recommendations: hybrid explanations. Comput. Sci. Inf. Syst. 16(1), 179–203 (2019) 52. Tomsett, R., Braines, D., Harborne, D., Preece, A., Chakraborty, S.: Interpretable to whom? A role-based model for analyzing interpretable machine learning systems (2018) 53. Wohlin, C.: Guidelines for snowballing in systematic literature studies and a replication in software engineering. In: Proceedings of the 18th International Conference on Evaluation and Assessment in Software Engineering, p. 38. Citeseer (2014)
Colorectal Image Classification with Transfer Learning and Auto-Adaptive Artificial Intelligence Platform Zoltan Czako(B) , Gheorghe Sebestyen, and Anca Hangan Technical University of Cluj-Napoca, Cluj-Napoca, Romania {zoltan.czako,gheorghe.sebestyen,anca.hangan}@cs.utcluj.ro
Abstract. In automatic (computer-based) interpretation of medical images, the use of deep learning techniques is limited because of the lack of large publicly available datasets. With just hundreds of samples (images) in a dataset, the application of deep learning techniques is very hard, and the results are below expectations. Training a multi-layer convolutional neural network requires thousands or even millions of images for an acceptable level of correct classification. In this paper we will present a novel approach that can be used to solve computer vision related problems (e.g. medical image processing) even when only a small dataset of images is available for training. We will show that by combining Transfer Learning and some auto-adaptive artificial intelligence algorithms we can obtain very good classification rates even with the use of a limited dataset. As a demonstration of the effectiveness of our approach we will show the use of this technique to solve the polyp detection problem in endoscopic image sets. We show that using just a subset of the available images (from the original dataset containing 4000 images) the results are comparable with the case when all the images were used.

Keywords: AutomaticAI · Particle Swarm Optimization · Simulated Annealing · Transfer learning · Colorectal polyp detection · Small data
1 Introduction
In the past few years, automatic image processing and computer vision have seen enormous popularity in different application areas (e.g. medicine, industry, autonomous driving, robotics, etc.). This is mostly the effect of the latest impressive results on many computer vision related tasks, such as image classification, image segmentation, object recognition and detection, 3D scene reconstruction, etc. High quality results are almost always achieved with deep learning techniques, such as convolutional neural networks (CNN). Deep learning, a subset of artificial intelligence (AI) methods, has revolutionized the world of image processing. The availability of high performance computing infrastructures (e.g. parallel computers, cloud systems, GPUs, multi-core
processors, etc.) and the existence of publicly accessible huge amounts of data make deep learning a feasible solution for many areas. Nevertheless, there are still domains, such as medicine, where the amount of publicly available data is still limited, mainly because of privacy related regulations or intellectual property barriers.

It is common knowledge that deep learning techniques require huge amounts of data for training. For example, ImageNet [16] uses over 14 million images in order to classify them into approximately 22 thousand classes. For some companies with an impressive number of users and developers, collecting and labeling data may not be a problem, but for smaller research groups this task may be a bottleneck. Collecting input data is a very time consuming task and it requires human experts in that domain who can objectively label the input data in order to have a correct classification. Furthermore, obtaining data for healthcare is usually much harder, because of the legal constraints regarding patients' privacy or the limitations of the medical equipment (not capable of providing raw data, just displayed results).

To solve the problem of "Small Datasets" we developed a novel approach in which we split the image recognition problem into two steps: a feature extraction step and a classification one. For the feature extraction step, based on the Transfer Learning principle, we are using a publicly available pre-trained Convolutional Neural Network (CNN) (the ResNet's [13] feature extraction part), which was trained to recognize thousands of image classes from millions of labeled examples. In the second, classification step, we are using our AutomaticAI algorithm to automatically set up a classification layer for the pre-trained CNN. In this way, far fewer labeled images are needed to train just the last part of the deep learning pipeline.

In order to test our method, we tried to solve the problem of detecting colorectal polyps using endoscopic images. We have chosen this problem because colorectal cancer is one of the leading causes of cancer death. Statistics show that in 2018 there were 1.09 million new cases and more than 500,000 deaths due to this disease. This is a very challenging problem also because there are only a few datasets publicly available, so achieving a high performance score is a real challenge.

The rest of this paper is organized as follows: in Sect. 2 we present some interesting methods used to solve colorectal polyp classification and compare them with our approach, in Sect. 3 we describe the solution we used to create the classification pipeline, in Sect. 4 we present some experimental results and Sect. 5 concludes our research.
2 Related Work
After analyzing multiple colorectal polyp detection methods proposed in the literature, we can organize them into several groups: solutions using traditional image processing methods, solutions using statistical methods or basic artificial intelligence approaches, pure deep learning solutions, and combined approaches. In the first category we can find different types of low-level image processing approaches using different types of filters and image transformation algorithms
to find the boundaries of the polyps. In [1] the authors assume that the polyps can be detected using only the local curvature of the image, but the problem is that this method is very dependent on the elevation of the polyp; polyps that have a flat surface will be missed. Another interesting approach is presented in [2], where the authors use Log Gabor filters to simultaneously analyze the space and frequency characteristics of the images, and the SUSAN edge detector, which uses circular masks. The basic assumption here is that the polyps will have circular forms. This method gives fairly good results, having a sensitivity over 95%, but the specificity of 67% is quite low, because of the assumption that polyps can only have a circular form. Other papers like [3] or [4] use texture-based features or intensity valleys in their work, but these results are far from the scores obtained by state-of-the-art CNNs.

In the group of AI approaches we can include papers like [5] or [6], in which Support Vector Machines (SVM) are used to predict colorectal cancer based on endoscopic images. Both research teams obtained impressive results, over 93% accuracy, but this result can be further increased using deep learning or hybrid approaches.

There are many articles which use only transfer learning [14] to solve this classification problem. For example, in [7] the authors describe the effect of data augmentation techniques. In their solution, each image from the input dataset was randomly rescaled, rotated, flipped, cropped or had its lighting modified in order to increase the number of training images. For the classification part they used a transfer learning approach. The chosen pre-trained model was Inception V3 and the weights were ImageNet specific weights. In order to make this CNN compatible with the polyp classification problem, they added another fully connected layer containing 1024 neurons with the ReLU activation function, and at the end of the network they added 8 neurons for the 8 available classes. The final activation function was softmax. With this solution they obtained an F1-score over 91%. Other articles use down- and up-sampling networks in order to classify the images [8], or train multiple neural networks with different image scales and use a single fully connected layer for the classification, this way combining the results of multiple CNNs [9].

In the last group of articles there are two categories of solutions. One category uses a low-level image processing algorithm as a first step and transfer learning to obtain the result. The second category uses transfer learning for feature extraction and other AI algorithms for the second step of classifying those extracted features. For example, in [10], before feeding the CNN with the images, the region of interest was selected using low-level image processing techniques. This allows fast scanning of the image dataset and a more focused learning process, containing only the relevant part of the input images. This is a time optimization technique: it does not increase the detection accuracy, but it decreases the time needed to process the dataset and to train the model. Similar solutions that use geometric features for generating polyp candidates are presented in [11] and [12]. Another way of solving the problem is by using CNNs for feature extraction and, for example, an SVM to classify those features, like in [18], [19] or [20].
Our approach is a hybrid solution, but it differs from the methods presented above because, instead of manually selecting an artificial intelligence algorithm and tuning its parameters to classify the feature vector provided by the CNN, we used our AutomaticAI algorithm to automatically select the best AI algorithm type based on the context of the problem and also to automatically tune its parameters to maximize the classification score. This is a generic solution which can be used in any image classification problem and it does not require human intervention (the global optimal solution will be approximated automatically).
3 Colorectal Polyp Detection Using AutomaticAI
In our approach we used transfer learning from the previously trained convolutional neural network called ResNet50 [13] in order to solve the feature extraction problem. Then, we fed the generated feature vectors into our AutomaticAI algorithm to automatically select and tune the best AI algorithm used for classification. Because in the classification step we did not use a fully connected layer, which would require lots of data for the training process, this solution gave us high evaluation and testing scores even when using only 300 images per class. In the following subsections we detail this solution.

3.1 Transfer Learning for Feature Extraction
Transfer learning [14] is a machine learning method where a model developed for a task is reused as the starting point for a model on a second task. In this solution we used ResNet50 trained on ImageNet [15,16]. ImageNet is a very large and generic image dataset containing over 14 million images and approximately 22 thousand classes, so this pre-trained model is generic enough to be used for extracting high-level features from images. To read only the feature vectors generated by the ResNet50, we removed its last dense layer.
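The snippet below is a minimal sketch of this feature-extraction step, assuming a Keras/TensorFlow implementation of ResNet50 with the final dense layer removed and global average pooling producing one feature vector per image; the framework choice and function names are our illustrative assumptions, not necessarily the authors' implementation.

```python
import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input

# ImageNet-pretrained ResNet50 with the classification head removed; global
# average pooling turns the last convolutional maps into a 2048-d feature vector.
extractor = ResNet50(weights="imagenet", include_top=False, pooling="avg")

def extract_features(img_array):
    """img_array: a 224 x 224 x 3 RGB image as a NumPy array."""
    x = preprocess_input(np.expand_dims(img_array.astype("float32"), axis=0))
    return extractor.predict(x, verbose=0)[0]          # shape: (2048,)
```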
3.2 AutomaticAI for Automatic Classification Algorithm Selection and Optimization
In the second step of our solution, we used the AutomaticAI algorithm. The purpose of this algorithm is to automatically select a classification algorithm which matches the current problem and to optimize its parameters, this way increasing the evaluation and testing score of the system. AutomaticAI is a generic algorithm that can be used in any kind of classification problem, and it will automatically adjust itself and select a matching algorithm based on the current context and on the input data. By combining this algorithm with a feature extraction step, we can solve any kind of image classification problem. The most important benefit of AutomaticAI is that it does not require lots of training data to reach good results and it does not require
experts to configure the AI algorithms. It handles the problem of algorithm selection and optimization automatically, without human intervention.

AutomaticAI is a combination of Particle Swarm Optimization and Simulated Annealing. Each particle has an acceptance criterion for its next position. This acceptance criterion is based on Simulated Annealing, which helps each particle escape local best positions (local minima or maxima, depending on the nature of the problem) much more easily. The basic idea is that, for each algorithm type (SVM, KNN, Random Forest, Logistic Regression, ExtraTrees, etc.), we create a separate swarm of particles. Each swarm has its own leader, the best performing particle, but there is also a global best solution which affects the behavior of each swarm. Each particle represents an algorithm type and has a position vector and a velocity vector. Each element of the position vector represents a parameter of that algorithm type. For example, in the case of SVM, the parameters are C, the penalty or regularization parameter, the kernel used in the classification process, and gamma, which defines how far the influence of a single training example reaches. So the position vector has three elements and defines a three-dimensional search space. Other algorithms may have fewer or more than three parameters, so the parameter search space is multi-dimensional and the number of dimensions depends on the type of the algorithm. The velocity vector defines the speed and direction of the particle movement in the parameter search space. In every epoch, we evaluate the performance of the particles, remove the k worst performing particles and add k new particles to the swarm where the best performing particle is located. After a number of epochs, all the particles will be located in the parameter search space of a single swarm, i.e. a single algorithm type, thus solving the problem of algorithm selection. With this representation and abstraction we also solve the parameter optimization problem, because each position vector represents a different parameter configuration.
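The sketch below illustrates, under heavy simplification, the mechanism described above: one swarm per algorithm type, particles moving through that algorithm's parameter space, a simulated-annealing acceptance test for each new position, and migration of the k worst particles towards the currently best swarm. The algorithm families, parameter ranges, PSO coefficients and scoring function are illustrative assumptions; the personal-best term and other refinements of the real AutomaticAI platform are omitted for brevity.

```python
import math, random
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

# One entry per algorithm type: a model factory plus the bounds of its parameter space.
SEARCH_SPACES = {
    "svm": (lambda p: SVC(C=10 ** p[0], gamma=10 ** p[1]), [(-2.0, 3.0), (-4.0, 1.0)]),
    "knn": (lambda p: KNeighborsClassifier(n_neighbors=int(round(p[0]))), [(1.0, 30.0)]),
}

def fitness(name, pos, X, y):
    """Cross-validated accuracy of algorithm `name` configured with position `pos`."""
    return cross_val_score(SEARCH_SPACES[name][0](pos), X, y, cv=3).mean()

def automatic_ai(X, y, n_particles=6, epochs=15, k=1, temp=1.0, cooling=0.9):
    # Initialise one swarm of random particles per algorithm type.
    swarms = {}
    for name, (_, bounds) in SEARCH_SPACES.items():
        swarms[name] = []
        for _ in range(n_particles):
            pos = np.array([random.uniform(lo, hi) for lo, hi in bounds])
            swarms[name].append({"pos": pos, "vel": np.zeros(len(bounds)),
                                 "fit": fitness(name, pos, X, y)})
    for _ in range(epochs):
        best_swarm = max(swarms, key=lambda n: max(p["fit"] for p in swarms[n]))
        for name, particles in swarms.items():
            bounds = SEARCH_SPACES[name][1]
            leader = max(particles, key=lambda p: p["fit"])   # swarm leader
            for p in particles:
                # PSO velocity update pulled towards the swarm leader.
                p["vel"] = 0.7 * p["vel"] + 1.5 * random.random() * (leader["pos"] - p["pos"])
                cand = np.clip(p["pos"] + p["vel"],
                               [lo for lo, _ in bounds], [hi for _, hi in bounds])
                cand_fit = fitness(name, cand, X, y)
                # Simulated-annealing acceptance: worse positions are accepted with
                # probability exp(delta / temperature), so particles escape local optima.
                if cand_fit >= p["fit"] or random.random() < math.exp((cand_fit - p["fit"]) / temp):
                    p["pos"], p["fit"] = cand, cand_fit
        temp *= cooling
        # Migration: the k globally worst particles are respawned inside the best swarm.
        ranked = sorted(((p["fit"], name, i) for name, ps in swarms.items()
                         for i, p in enumerate(ps)), key=lambda t: t[0])
        for _, name, i in ranked[:k]:
            if name == best_swarm or sum(q is not None for q in swarms[name]) <= 1:
                continue
            swarms[name][i] = None
            b = SEARCH_SPACES[best_swarm][1]
            pos = np.array([random.uniform(lo, hi) for lo, hi in b])
            swarms[best_swarm].append({"pos": pos, "vel": np.zeros(len(b)),
                                       "fit": fitness(best_swarm, pos, X, y)})
        swarms = {n: [p for p in ps if p is not None] for n, ps in swarms.items()}
    winner = max(swarms, key=lambda n: max(p["fit"] for p in swarms[n]))
    best = max(swarms[winner], key=lambda p: p["fit"])
    return winner, SEARCH_SPACES[winner][0](best["pos"])
```

Running `automatic_ai` on the extracted feature vectors would return the name of the winning algorithm family and an untrained, tuned model that can then be fitted on the training features.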
3.3 The Colorectal Polyp Classification Pipeline
For solving the colorectal polyp image classification problem we created a pipeline which includes multiple algorithms. The first step is preprocessing: we reduce the size of the images using bilinear interpolation, which preserves the form and shapes of the objects in the image while lowering the resolution. The resulting image has 224 × 224 × 3 pixels, which is the input shape required by ResNet50. In the second step we pass these images through the ResNet50 CNN, obtaining a multi-dimensional feature vector for each image. After the feature extraction part, we have a dataset of feature vectors which is used as input for the AutomaticAI algorithm. The AutomaticAI algorithm returns a classification algorithm with fine-tuned parameters for the current image classification problem. We can use this returned algorithm to classify images into colorectal polyp or normal images (Fig. 1).
Fig. 1. The colorectal polyp classification pipeline
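The preprocessing step of this pipeline can be sketched as follows (OpenCV is used here purely for illustration; the paper does not name the image-processing library):

# Sketch of the preprocessing step: downscale with bilinear interpolation to
# the 224 x 224 x 3 input shape required by ResNet50. OpenCV is an assumption.
import cv2

def preprocess(image_bgr):
    resized = cv2.resize(image_bgr, (224, 224), interpolation=cv2.INTER_LINEAR)
    return cv2.cvtColor(resized, cv2.COLOR_BGR2RGB)  # ResNet50 expects RGB input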
4 Experimental Results
4.1 Dataset
For our experiments, we use the Kvasir dataset [17] that contains endoscopic images of the gastrointestinal (GI) tract related to gastrointestinal pathology. Sorting and annotation of the dataset was performed by medical doctors (experienced endoscopists). In this respect, Kvasir is important for research disease computer aided detection. This dataset is composed of 4,000 colored images. It has 8 classes representing several diseases as well as normal anatomical landmarks. We are interested only in colorectal polyp images and normal classes. The dataset has 500 examples for each class, so it is perfectly balanced. From all the images we will only use 300 for polyps and 300 for the normal class, this way demonstrating that our solution can have good results using only a few images. The images have resolutions up to 1920 × 1072 pixels, but with the preprocessing step we will convert them to the ResNet50 required size. In Fig. 2, in the first column we can clearly see that there is a polyp in the picture, so in this case classifying the image would be very easy also for a nonexpert. But if we take a look over the figure from the second column, things are getting harder, there is a flat polyp in the center of the picture, but for nonexperts or sometimes also for experts it could be hard to observe this. The third column shows an example of a healthy patient.
Polyp
Polyp
Healthy
Fig. 2. Input image examples
4.2 Metrics
In order to have a comprehensive evaluation of our solution we will use multiple evaluation metrics (see the sketch after this list):
1. Accuracy: the number of correct classifications compared to the total number of examples;
2. Precision: the ratio of the correctly classified positives to the total number of positive classifications;
3. Recall: the ratio of the correctly classified positives to the total number of positives in the dataset;
4. F1-Score: the harmonic mean of Precision and Recall;
5. Confusion Matrix: a summary of prediction results on a classification problem. The number of correct and incorrect predictions is summarized with count values and broken down by each class. The confusion matrix shows the ways in which a classification model is confused when it makes predictions;
6. Cohen's Kappa: the Kappa statistic is a metric that compares an Observed Accuracy with an Expected Accuracy (random chance). The Kappa statistic is used not only to evaluate a single classifier, but also to compare classifiers amongst themselves.
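These metrics map directly onto standard scikit-learn routines; a sketch, assuming the test labels and predictions are available as 0/1 arrays:

# Sketch: computing the listed metrics with scikit-learn
# (y_true and y_pred are assumed to be arrays of 0/1 class labels).
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix, cohen_kappa_score)

def evaluate_predictions(y_true, y_pred):
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "confusion_matrix": confusion_matrix(y_true, y_pred),
        "cohen_kappa": cohen_kappa_score(y_true, y_pred),
    }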
4.3 Endoscopic Image Classification Results
After passing the training images through the pipeline, the resulting algorithm is the Ridge Classifier, with the following parameter values: alpha = 76.255, solver = cholesky, tol = 76.64. Ridge classifiers are powerful techniques generally used for creating parsimonious models in the presence of a "large" number of features. The reason this Ridge Classifier outperformed the others is that high-dimensional problems are likely to be linearly separable (any d+1 points in general position in a d-dimensional space can be separated with a linear classifier, regardless of how the points are labelled), so linear classifiers are likely to do well. The alpha parameter controls the complexity of the classifier and helps to avoid overfitting by separating
the patterns of each class by large margins. However, to get good performance the ridge/regularisation parameters need to be properly tuned, and this tuning was handled by the AutomaticAI algorithm. In this case the feature vectors returned by ResNet50 are very high-dimensional, and this is the main reason for selecting the Ridge Classifier. This demonstrates that our AutomaticAI algorithm works correctly and returns useful classification models. The solver parameter defines the algorithm used to find the parameter weights that minimize a cost function; here, Cholesky matrix decomposition was used. The parameter named tol is the tolerance for the stopping criterion; it is also used to prevent overfitting of the model. After using the selected classification model on multiple test images, the results are very good:
1. Accuracy - 98.33%
2. Precision - 100.0%
3. Recall - 96.87%
4. F1-Score - 98.41%
5. Cohen's Kappa Score - 96.67%.
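For reference, the selected configuration corresponds to the following scikit-learn estimator (a sketch; the training features are assumed to be the ResNet50 vectors produced in the previous step):

# Sketch: re-fitting the configuration selected by AutomaticAI with scikit-learn.
# X_train and y_train are assumed to be the ResNet50 feature vectors and labels.
from sklearn.linear_model import RidgeClassifier

def fit_selected_model(X_train, y_train):
    clf = RidgeClassifier(alpha=76.255, solver="cholesky", tol=76.64)
    return clf.fit(X_train, y_train)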
The confusion matrix can be seen in Fig. 3. As the matrix shows, the model made only 3 mistakes; all the other images were classified correctly.
Fig. 3. Confusion matrix
The results prove the efficiency of our AutomaticAI algorithm when combining it with ResNet50 using a transfer learning approach. The results are very good even though we used a small number of images for training.
5 Conclusions
We used Transfer Learning combined with our hybrid Particle Swarm Optimization - Simulated Annealing algorithm called AutomaticAI to extract features
from input images and to classify those images. This solution gave us very good results, all the most popular classification metrics were above 96%, which is one of the best solutions taking in consideration the articles mentioned in the related works section. This result is really impressive, because this is a general solution, it can be used for any kind of image classification problem. Furthermore, this solution does not require long training times, because the feature extractor is a pretrained model, so no training is necessary and the AutomaticAI will achieve very good results with only a few training examples, so the training process will be very fast compared to other techniques like fully connected layers. It requires only a few images for the training, which can be a great help, since large datasets usually are not publicly available. Another advantage of this solution is the fact that it does not require AI experts, the classification model will be automatically selected and optimized by the AutomaticAI. As future work, we would like to include some dimensionality reduction/feature selection algorithms (like PCA, LDA, etc.) into the pipeline, this way reducing the size of the feature vectors generated by the ResNet50. The motivation is that the ResNet50 was trained for thousands of classes, but in our case we have only two classes, so theoretically we don’t need all the elements of the feature vector. By this observation we can reduce the size of the input data which will also reduce the training time of the AutomaticAI algorithm.
References 1. Figueiredo, P.N., Figueiredo, I.N., Prasath, S., Tsai, R.: Automatic polyp detection in pillcam colon 2 capsule images and videos: preliminary feasibility report. Diagnostic and Therapeutic Endoscopy, no. 182435, p. 7 (2011) 2. Karargyris, A., Bourbakis, N.: Identification of polyps in wireless capsule endoscopy videos using log gabor filters. In: IEEE Workshop LiSSA, pp. 143–147, April 2009 3. Kodogiannis, V., Boulougoura, M.: An adaptive neurofuzzy approach for the diagnosis in wireless capsule endoscopy imaging. Int. J. Inf. Technol. 13, 46–56 (2007) 4. Iwahori, Y., Shinohara, T., et al.: Automatic polyp detection in endoscope images using a hessian filter. In: Proceedings of MVA, pp. 21-24 (2013) 5. Zhi, J., Sun, J., Wang, Z., Ding, W.: Support vector machine classifier for prediction of the metastasis of colorectal cancer. IJMM 41(3), 1419–1426 (2018) 6. Zhao, D., Liu, H., Zheng, Y., He, Y., Lu, D., Lyu, C.: A reliable method for colorectal cancer prediction based on feature selection and support vector machine. Med. Biol. Eng. Comput. 57, 901–912 (2018) 7. Asperti, A., Mastronardo, C.: The effectiveness of data augmentation for detection of gastrointestinal diseases from endoscopical images. The effectiveness of data augmentation for detection of gastrointestinal diseases from endoscopical images. In: Proceedings of the 11th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 2 BIOIMAGING: KALSIMIS, pp. 199-205 (2018) 8. Chen, H., Qi, X.J., Cheng, J.Z., Heng, P.A.: Deep contextual networks for neuronal structure segmentation. In: Thirtieth AAAI Conference on Artificial Intelligence (2016)
9. Park, S., Lee, M., Kwak, N.: Polyp detection in colonoscopy videos using deeplylearned hierarchical features. Seoul National University (2015) 10. Silva, J., Histace, A., Romain, O., Dray, X., Granado, B.: Toward embedded detection of polyps in wce images for early diagnosis of colorectal cancer. Int. J. Comput. Assist. Radiol. Surg. 9(2), 283–293 (2014) 11. Tajbakhsh, N., Gurudu, S., Liang, J.: Automated polyp detection in colonoscopy videos using shape and context information. IEEE Trans. Med. Imaging 35(2), 630–644 (2015) 12. Tajbakhsh, N., Gurudu, S.R., Liang, J.: Automatic polyp detection using global geometric constraints and local intensity variation patterns. In: Medical Image Computing and Computer-Assisted Intervention-MICCAI 2014, pp. 179–187. Springer (2014) 13. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) 14. Bengio, Y.: Deep learning of representations for unsupervised and transfer learning. In: Guyon, I., Dror, G., Lemaire, V., Taylor, G., Silver, D., (eds.) Proceedings of ICML Workshop on Unsupervised and Transfer Learning, volume 27 of Proceedings of Machine Learning Research, Bellevue, Washington, USA, pp. 17–36. PMLR (2012) 15. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Pereira, F., Burges, C.J.C., Bottou, L., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, vol. 25, pp. 1097– 1105. Curran Associates, Inc. (2012) 16. Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: CVPR09 (2009) 17. Pogorelov, K., Randel, K.R., Griwodz, C., Eskeland, S.L., de Lange, T., Johansen, D., Spampinato, C., Dang-Nguyen, D.-T., Lux, M., Schmidt, P.T., Riegler, M., Halvorsen, P.: KVASIR: a multi-class image dataset for computer aided gastrointestinal disease detection. In: Proceedings of the 8th ACM on Multimedia Systems Conference, MMSys-17, pp. 164–169. ACM, New York (2017) 18. Mesejo, P., Pizarro-Perez, D., Abergel, A., Rouquette, O., Beorchia, S., Poincloux, L., Bartoli, A.: Computer-aided classification of gastrointestinal lesions in regular colonoscopy. IEEE Trans. Med. Imaging 35, 2051–2063 (2016) 19. Billah, M., Waheed, S., Rahman, M.M.: An automatic gastrointestinal polyp detection system in video endoscopy using fusion of color wavelet and convolutional neural network features. Int. J. Biomed. Imaging 2017, 1–9 (2017) 20. Billah, M., Waheed, S.: Gastrointestinal polyp detection in endoscopic images using an improved feature extraction method. Biomed. Eng. Lett. 8(1), 69–75 (2018)
Toolbox for Azure Kinect COTS Device to be Used in Automatic Screening of Idiopathic Scoliosis
Dejan Dimitrijević1(&), Vladimir Todorović1, Nemanja Nedić1, Igor Zečević1, and Sergiu Nedevschi2
1 University of Novi Sad/Faculty of Technical Sciences, Novi Sad, Serbia
[email protected]
2 Universitatea Technică Din Cluj-Napoca, Cluj-Napoca, Romania
Abstract. This paper presents our further efforts pertaining to the development of a noninvasive automated screening and diagnostic solution for scoliosis, as well as some other spine disorders, using commercial off-the-shelf (COTS) devices such as the recently announced Azure Kinect. The aim and main benefit of developing a MATLAB interface to the aforementioned device, compared to a previously developed standalone programmed solution using a previous-generation Kinect device, is reducing future development and trial costs by gaining a solution that can quantify precision and clinical tests more easily. The presented research is also being done to offset the upcoming clinical trial costs by making the solution available to other interested research groups, which in turn could validate whether our computer vision methods can stand up to other already available commercial solutions, whose costs, compared to acquiring our solution, make them prohibitive for a single research group to acquire; this would also allow our solution to be further continuously developed and improved.
Keywords: Kinect · MATLAB · Scoliosis
1 Introduction
Just as was previously presented during the ICIST 2019 conference at Kopaonik (Serbia) [1], our efforts to build the automated scoliosis screening solution are ongoing, and with the recently announced European availability of the next-generation Azure Kinect [2] commercial off-the-shelf (COTS) 3D depth camera they have been further boosted, hopefully resulting in a widely available solution for screening and diagnosing spinal disorders such as scoliosis soon. Automated spine disorder diagnostic solutions can use various diagnostic methods. Some of those are based on manually conducted physical examinations testing for deformities, or on back surface topographic and skeletal visualizations (from sensor inputs such as laser, infrared, ultrasound scanners, magnetic resonance imaging (MRI) and/or radiographic imaging, i.e. ionizing radiation).
1.1 Motivation
The most frequently used are manual deformity tests, such as the Adams Forward Bend Test (AFBT [3]) performed by sufficiently trained medical practitioners, conducted with or without additional aids (such as a scoliometer [4]); but such tests take up valuable resources in terms of assigned personnel and time. Additionally, since the second most widely used test method for spine deformity diagnostics is radiographic imaging, and the recommended age for scoliosis testing is between 10 and 14 (recommended even twice within that period for females [5]) with a positive diagnosis rate of ~5%, avoiding radiographic imaging would be quite beneficial for any suspect non-positives. Furthermore, the recommended number of radiographic images does not include potential post-diagnostic imaging follow-ups for positively diagnosed adolescents, so the total exposure to cumulative ionizing radiation can be even greater. The development of alternative noninvasive methods and techniques thus came from a need to reduce human resource strains and the negative effects of harmful cumulative ionizing radiation on adolescents, at a time when such exposure was still high [6].
1.2 Previous and Similar Existing Research
As previously described, noninvasive methods and techniques for spine disorder diagnostics which didn’t use radiography came about after the time when researchers had proven ill long-term effects of exposing adolescents to ionizing radiation [7]. Since then many methods and techniques were researched, most based upon topographical landmarks identifiable on the subjects back which were scanned using various sensor devices and which either visualized underlying skeletal structures, usually by means of some generalized estimates [8], or relied on manually marking certain landmark points and calculating values such as their respective distance ratios comparing them to previously concluded range values for healthy subjects [9]. To further increase the validity of the findings certain topographical landmark points could be pinpointed in perpendicular independent plane cross sections of the subject’s body, such as the landmark points determined from the posterior/coronal and then axial/transverse planes. Combining the resulting distance ratios from distance ratio indices calculated in one plain independently from ratios in the other would then bring higher precision of the overall diagnostic [10]. In the past we have tried performing just to build a fully automated scoliosis diagnostic solution based upon a deterministic algorithm which relied upon identifying topographical landmark points from manually determined and programmed feature descriptors and fuzzy logic [10]. Unfortunately, identification of landmarks precisely was prone to frequent feature descriptor customizations due to the effect of varied parameters such as body mass index (BMI) on the back-surface appearance and thus precise positioning of landmark points in examined subjects sometimes failed. In such cases for sufficiently precise point recognition the solution was either to manually pick out the landmark points, as we have also done on some harder to determine landmarks initially, or to resort to multi-stage capture of harder to determine protruding back
surface landmarks. Either of the two required extra effort from the otherwise busy medical practitioners, which is the concrete reason we are now attempting to build a solution which employs deep learning, i.e. one which will machine-learn feature descriptors instead of having to manually program and customize them; the environment best suited to us for quick development and testing of such an approach seems to be MATLAB.
2 Methods
This section describes some of the methods which were previously used to acquire the scanned back surface 3D point cloud model and designate all of the landmark points on it needed to determine the values of Posterior Trunk Symmetry Index (POTSI) and Deformity in the Axial Plane (DAPI) topographical indices (Fig. 1a), as well as proposes the method to do the same in the future via Azure Kinect interfaced to MATLAB.
Fig. 1. Landmark points used to determine POTSI and DAPI topographical index values (left) [11] and positioning of the Kinect v2 sensor (right)
2.1 Point Cloud Capture
The posture capture solution initially consisted of pretty much any personal computer (PC) equipped with a USB 3.0 port or a higher-speed compatible USB controller and a specification to match its speed (lowest configuration tested: Lenovo Yoga 300, with N3050 Celeron CPU, 4 GB RAM, and SSD storage), along with a commercial off-the-shelf 3D depth sensor (previously Kinect v2, Microsoft, USA). The depth sensor was mounted sideways on a standard 1 m tall tripod to exploit the fact that its horizontal Field-of-View (FoV) angle is greater than the vertical one, ~70° vs ~60°; thus, the larger horizontal FoV becomes the vertical FoV (Fig. 1b). This was done to enable a full body scan of subjects that are potentially 2 m tall or taller at a 2 m distance from the sensor. Although that depth sensor's measurement range could be extended to 8 m, shorter
distance scans have higher accuracy. The posture could be scanned immediately after making sure the floor plane beneath the subject, standing at the designated 2 m distance, was also in the scan, so that the scanned floor plane could be used to properly align the 3D model. Optionally, an additional Wii Balance Board (Nintendo, Japan), also a COTS device and basically a 4-plate pressure sensor on which the subject stands, could be added to achieve better scans. The board tracks the subject's center-of-gravity (CoG) between its plates, making sure the scan is only initiated when the CoG is stable and the subject is standing relatively motionless. The new Azure Kinect 3D depth camera COTS device has a depth sensor with significantly improved specifications [11], which should enable scanning from a shorter range for subjects 2 m tall or taller, but the other setup procedures and requirements, save for the need of a USB-C port (differing from the Kinect v2's regular USB-A), should remain in place for it. Also, whilst the Kinect v2 has official MATLAB Image Acquisition Toolbox support, the new Azure Kinect is not yet supported, which is why there is currently a need for a 3rd party MATLAB interface to acquire the point cloud from it.
2.2 Point Cloud Processing
Our initial point cloud capture rig was used only to acquire the Kinect scan recordings, in order to support the tested lowest-cost CPU-equipped PCs. To support such a hardware-requirements-reduced solution, the point cloud and subsequent mesh reconstruction from the captured data streams of the Kinect depth sensor was moved to another, more computationally powerful computer, which, we quickly realized, had to have the MATLAB software package installed for development purposes. Such a computer, beside our standalone programmed solution, could also utilize an open-source 3rd party MATLAB Kin2 toolbox [12], which we adapted to reconstruct the point cloud and also combine per-vertex colored mesh information using Microsoft's Kinect Fusion algorithm. That mesh reconstruction algorithm could be tuned via multiple variables for Kinect v2: its volume voxels per meter and XYZ voxel resolution, along with minimum and maximum depth clip distance variables, all in an attempt to achieve the best possible point cloud and mesh reconstruction quality without any frame, and thus also significant data, drops in the mesh reconstruction processing. The captured point cloud and reconstructed mesh could then in turn be further examined and processed in MATLAB, determining predominant planes in the reconstructed point cloud, such as the floor plane, helping align and automatically segment out scanned subjects standing on the floor (or balance board). The segmented-out subjects and their corresponding back-surface mesh can then be used for visualizing inside a device such as the HoloLens AR or Windows Mixed Reality (WMR) VR headsets, to pick out landmarks via natural user interaction (NUI) modes used by those (also COTS) devices (Fig. 2). One more optional reconstructed mesh processing step, prior to visualizing it in the HoloLens device, is to use a Depth Estimation Map (DEM) [13] to reduce the points needed to be visualized. By repeatedly projecting the DEM of points left from the coronal cut-off plane that is moved ever closer we could assess if the elimination of points would affect POTSI landmark points greatly (area reduction greater than an
empirically estimated value) and thus stop the cut-off process. The remaining points can then be used to reconstruct the mesh collider inside the application running on a HoloLens, an AR device computationally challenged due to its portability (not so much an issue on the WMR headset connected to a PC with an NVidia RTX2060 desktop-class GPU), which is used to enable NUI picking of back surface landmark points by multiple medical practitioners. Unfortunately, the same Azure Kinect workflow was not yet supported in MATLAB.
Fig. 2. Picking of landmark points via HoloPick app running on WMR headset (HoloLens recording [14])
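The floor-plane detection and subject segmentation described above can be illustrated with a short sketch; the actual workflow runs in MATLAB, so the Python/Open3D code below only mirrors the idea (the file name is hypothetical):

# Illustration of the floor-plane detection/segmentation idea with Open3D
# (the workflow described in the paper runs in MATLAB; this only mirrors it).
import open3d as o3d

pcd = o3d.io.read_point_cloud("scan.ply")  # hypothetical point cloud file

# RANSAC fit of the dominant plane, assumed to be the floor the subject stands on.
plane_model, inliers = pcd.segment_plane(distance_threshold=0.02,
                                         ransac_n=3, num_iterations=1000)
floor = pcd.select_by_index(inliers)
subject = pcd.select_by_index(inliers, invert=True)  # points above the floor plane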
2.3 Deep Learning on Point Clouds
Point clouds are geometric data structures, but due to their irregular format, most researchers transform such data into regular 3D voxel grids or collections of images used for training convolutional neural networks (CNNs). That, however, can render the collected and/or produced data unnecessarily voluminous and cause issues such as the loss of some potentially significant information during the transformations. So, for those reasons, we are focusing on using a novel type of neural network that directly consumes point clouds and respects the permutation invariance of input points. Deep neural networks such as PointNet++ [15] can provide deep hierarchical feature learning on point sets in metric spaces, but prior to any deep learning, one first needs to collect and provide annotated training and test data in suitable formats. In our case those would be 3D shapes sampled from captured point clouds, which need to be labeled as precisely as possible by someone as domain-knowledgeable as possible, i.e. the medical practitioners in this particular case, and stored securely afterwards as they represent personal data. Our prior privacy-respectful data collecting solution used a custom website backend developed on top of the Microsoft HealthVault [16] platform, which stored captured data in compliance with the Health Insurance Portability and Accountability Act
(HIPAA) [17] ensuring privacy and security of all stored information, but unfortunately, Microsoft HealthVault platform was shut down recently. Discussing this issue, the paper later focuses on some details of first and foremost building a scalable landmark annotation solution that could be tested on some of our previously collected 3D shape data, but more details about that application and the adapted PointNet++ neural network along with the results of its application will be presented in some other follow-up paper.
3 Results
All previously collected 3D shape data originates from an imaging study conducted during a fortnight period at the Institute for Student Health Protection of Novi Sad (IZZZSNS) among consenting subjects, all students and thus none of them minors (also not requiring parental consent), who at the time were undergoing their annual health checkups. The institute's director board was approached at the time and their written approval was also given to conduct the study.
3.1 Intra-operator Repeatability Error
Since we managed to acquire the informed consent of 84 student subjects: 38 males and 46 females, mean age 21.07, height range 159–201 cm, weight range 46–113 kg, max BMI 31.3, they were scanned during their scoliosis screenings as previously described but were also surveyed about any of their previous scoliosis screening findings and were also subjected to an AFBT performed by the medical practitioner who conducted 3D scanning with and without physically marked landmarks in multiple stances. On that data we were able to conduct some intra-operator repeatability tests since the subjects and their back surfaces were actually also captured twice, once in almost regular anatomical position stance at 2 m distance from the sensor, and then second time after repositioning in the same stance but with physical markers placed on the back by the same medically trained practitioner (operator). Also, after the capture was complete and stored in raw data format to avoid any data loss by doing computationally intensive online point cloud and mesh reconstruction, and by reproducing that point cloud and mesh generation on a much more capable PC, we were able to produce the best quality 3D model instead of relying on a low-res one reconstructed using a less powerful PC at the time of the capture. Those reproduced 3D models were then reused to load into mixed reality devices and our custom-built application [18] for picking out the landmark points in virtual space by other practitioners (operators), which gave us some intra-operator error values, such as the ones presented on Fig. 3. The captured 3D shape data was then processed into a point cloud with maximum fusion parameters allowable for the Kinect Fusion algorithm to run on an available GPU (NVidia RTX2060 6 GB). The size of reconstructed per vertices mesh was on average *40 MB and with around 900 k facets and approximately 3 times less vertices. Although this was in total a considerable amount of data to be processed for visualization on a HoloLens device it was decided not to apply the DEM vertices reduction initially and keep the precision of the 3D model as best as possible for this
Fig. 3. Intra-operator error measurement on a Kinect v2 3D scanned model (Manual left blue, Meshlab left yellow & HoloPick right yellow)
study. Loading and visualizing these models on a PC with the said GPU takes under a minute, and this was deemed acceptable time-wise on the hardware available at the time of the test, as illustrated by the time values in Table 1.
Table 1. Timing landmark marking
Type | Appearance UI | Time^a | Time^b
PC (MeshLab) | Non-intuitive, time needed to train med. practitioners | 2–3 min | 2–3 min
PC (HoloPick) | Intuitive, VR limiting, prolonged use tiresome | 1 min | 1–2 min
HoloLens (HoloPick) | Intuitive, AR liberating, usable for prolonged time | 1 min | 3 min
a w/o model loading – value not dependent on hardware specifications
b w/ model loading – value dependent on hardware specifications
3.2 Inter-operator Repeatability Error
The inter-operator repeatability error measured among multiple medical practitioners will be assessed with the next-generation Azure Kinect in another research group's imaging study; this is expected to involve more than half a dozen medical practitioners at another reputable healthcare institution that conducts both treatment and screening of scoliosis, during Q1 2020. For now we can only conclude the following: all in all, our custom-built app was deemed very easy to use, needing less than a minute of training to get the original operator started and 1–2 min to complete the annotation of a single model (up to 3 min with the long load time on the HoloLens), which is less than or equal to the time needed with non-specialized software (MeshLab) for the same task, which requires much more training and time to pick out and annotate the harder-to-determine anatomical points (Fig. 2).
New data acquisition will be done using the next-generation Azure Kinect device. However, for the scanned 3D point cloud data to be processed in the same workflow used with the prior Kinect v2 data described previously, we need to interface the device with MATLAB, which is not yet officially supported, so we started development of our own K4A3D toolbox, which has been made open-source [19].
4 Discussion
This paper described some of the work that went into developing software tools for annotation of our previously collected 3D data, and at the end introduced a future toolbox currently necessary for reacquiring such data using the next-generation Azure Kinect and processing it within MATLAB. As stated, in the past we attempted to build a fully automated scoliosis diagnostic solution based upon a deterministic algorithm relying upon manually programmed features, but identifying landmarks correctly was prone to frequent feature description customizations due to varied parameters such as the BMI of the examined subjects. For recognition to work reliably one had to either manually pick out certain harder-to-determine landmarks or resort to a multi-stage capture of landmark-protruding back surfaces. This is the reason we are rebuilding it as a deep learning solution which will machine-learn these features and thus should be able to work with minimal human effort while producing sufficiently precise results. To achieve that goal of building a deep learning-based solution, the annotated data used to train it must be amassed, but considering the fact that it is primarily intended for use on adolescents, we must be mindful of the potential patient subject's privacy and security. So, to remedy the recent HealthVault shutdown, we are now evaluating using an existing, or building our own, HIPAA-compliant solution based on Health Level-7 (HL7) [20]. HL7 refers to a set of international standards for the transfer of clinical and administrative data between software applications used by various healthcare providers, and in particular we would like to use its Fast Healthcare Interoperability Resources (FHIR) [21]. FHIR defines a set of "resources", each representing granular clinical concepts, among them ImagingStudy, a diagnostic imaging study resource, which seemed best suited for our data. However, within its recent R4 release FHIR notes that this resource is suited for DICOM imaging and associated information (which have not been collected yet), and proposes the use of either Binary resources, which can store arbitrary content, or Media resources to track non-DICOM images, video, or audio, which is what the raw data streams collected by our Kinect depth camera would represent. However, the size of those streams is extremely large, as documented [22], making such data impractical to upload over slow internet access. So, our contemplated solution's cloud storage, used also as a backup of encrypted local storage, should support chunked upload, as that would be of great importance in circumstances where remote research groups conduct image acquisition studies with constrained internet access, considering the time it would take to successfully upload such data.
Acknowledgment. The acknowledgement for any and all of their assistance goes out to the staff and medical practitioners and other personnel of the Institute for Student Health in Novi Sad, as well as all the fine people of Universitatea Technică din Cluj-Napoca, and especially the crew
from prof. Nedevschi’s lab who were working there during the 2018 FTN Novi Sad – UT Cluj exchange stay, back when the rebuild of our previous project started, for any insightful inputs and all their help with coordination of the stay.
References 1. DD Homepage. http://dev.ftn.uns.ac.rs/icist.mp4. Accessed 21 Nov 2019 2. Azure Kinect. https://aka.ms/kinectdkannouncement. Accessed 21 Nov 2019 3. Adams, W.: Lectures on the pathology and treatment of lateral and other forms of curvature of the spine. Glasgow Med. J. 18(6), 462–463 (1882) 4. Bunnell, W.P.: An Objective Criterion for Scoliosis Screening. J. Bone Joint Surg. 66(8), 462–463 (1984) 5. Editorials: Referrals from Scoliosis Screenings - American Family Physician (2001) 6. Nash, C.L., Gregg, E.C., et al.: Risks of exposure to X-rays in patients undergoing long-term treatment for scoliosis. J. Bone Joint Surg. 61(3), 371–374 (1979) 7. Willner, S.: Moiré topography–a method for school screening of scoliosis. Arch. Orthop. Trauma. Surg. 95(3), 181–185 (1979) 8. Frobin, W., Hierholzer, E.: Analysis of human back shape using surface curvatures. J. Biomech. 15(5), 379–390 (1982) 9. Michoński, J., Glinkowski, W., et al.: Automatic recognition of surface landmarks of anatomical structures of back and posture. J. Biomed. Opt. 17(5), 056015 (2012) 10. Dimitrijević, D., Obradović, Đ., et al.: Towards automatic screening of idiopathic scoliosis using low-cost commodity sensors—validation study. In: Advances in Information Systems and Technologies, pp. 117–126. Springer, Heidelberg (2016) 11. Bamji, C.S., Mehta, S., Thompson, B.: IMpixel 65 nm BSI 320 MHz demodulated TOF Image sensor with 3 lm global shutter pixels and analog binning. In: International Solid State Circuits Conference (ISSCC), San Francisco, USA, pp. 94–96 (2018) 12. Terven, J.R., Córdova-Esparza, D.M.: Kin2. A Kinect 2 toolbox for MATLAB. Sci. Comput. Program. 130, 97–106 (2016) 13. Oniga, F., Nedevschi, S., et al.: Road surface and obstacle detection based on elevation maps from dense stereo. In: Intelligent Transportation Systems Conference, Seattle, USA, pp. 859–865 (2007) 14. DD Homepage. http://dev.ftn.uns.ac.rs/hololens4.mp4. Accessed 21 Nov 2019 15. Qi, C.R., Yi, L., et al.: Pointnet++: deep hierarchical feature learning on point sets in a metric space. In: Neural Information Processing Systems, Long Beach, USA, pp. 5099–5108 (2017) 16. Microsoft HealthVault. https://account.healthvault.com. Accessed 20 Nov 2019 17. Health Information Privacy. https://www.hhs.gov/hipaa/index.html. Accessed 21 Nov 2019 18. Microsoft Store. https://www.microsoft.com/en-us/p/holopick/9nvlcf2m5nk2. Accessed 21 Nov 2019 19. Github.com. https://github.com/dimitrijevic/K4A3D. Accessed 21 Nov 2019 20. Health Level 7 International – Homepage. https://www.hl7.org. Accessed 21 Nov 2019 21. HL7 Standards Brief – FHIR® R4. https://www.hl7.org/implement/standards/product_brief. cfm?product_id=491. Accessed 21 Nov 2019 22. Relyea, R., Marien, J.: Recording, Playback, and Gesture Recognition. https://smarturl.it/ kinectstreamdatasize. Accessed 29 Feb 2019
Big Data Analytics and Applications
Detecting Public Transport Passenger Movement Patterns
Natalia Grafeeva(&) and Elena Mikhailova
ITMO University, St. Petersburg, Russia [email protected]
Abstract. In this paper, we analyze public transport passenger movement data to detect typical patterns. The initial data consists of smart card transactions made upon entering public transport, collected over the course of two weeks in Saint Petersburg, a city with a population of 5 million. As a result of the study, we detected 5 classes of typical passenger movement between home and work, at the scale of one day. Each class, in turn, was clusterized in accordance with the temporal habits of passengers. Heat maps were used to demonstrate clusterization results. The results obtained in the paper can be used to optimize the transport network of the city being studied, and the approach itself, based on clusterization algorithms and using heat maps to visualize the results, can be applied to analyze public transport passenger movement in other cities.
Keywords: Urban transit system · Public transport · Multimodal trips · Pattern mining
1 Introduction
According to statistics, in Russia more than 80% of passengers use travel cards (monthly or longer period) instead of single tickets. In Russia, each individual trip is paid separately. Currently, most public transport operators in large cities use automated fare collection systems. These systems are based on the use of contactless smart cards, which passengers use to pay their fares when boarding ground transport or entering the subway. The primary purpose of such systems is to simplify the interaction between passengers and the operator and the fare collection process. However, such systems also offer interesting capabilities for studying typical routes of passengers traveling within the transport network using smart cards. Upon fare payment, the server receives not only the card ID and the payment amount, but also various additional information: time of payment (transaction), card type, route number, and transport type. This data enables the detection of patterns in the movement of public transport passengers over the course of the studied time period. In turn, public transport passenger movement patterns are quite interesting to transport operators, since they can be used to analyze the productivity of the transport network, predict passenger activity on certain dates, and estimate the balance between supply and demand of transport services. Furthermore, they enable the estimation of transport accessibility of individual city districts and more timely changes to the transport network (for example, the addition of new routes to reduce the load on transport network nodes, or the reduction of frequency of rarely used routes).
In this paper, we detect and study public transport usage patterns based on the transport data of Saint Petersburg. The following problems were considered as part of the research:
• Construction of public transport passenger profiles on the scale of one day.
• Detection of temporal clusters based on the constructed profiles.
• Analysis and interpretation of results.
2 Related Work Some of the first papers on potential uses of the data produced as a result of smart card use were [4, 5], which discuss the advantages and disadvantages of smart card use, and also offer a number of approaches to rule-based transport data analysis. The authors also pointed out that transport data does not always contain complete information (lacking, for instance, any information about the purpose of travel), so it is best to use it in conjunction with data from specialized surveys. The paper [1] describes a study of transport habits of passengers of Shenzhen based on four consecutive weeks’ smart card data. In that paper, every transaction was described using 27 variables: card number, date, day of the week (D1 through D7) and 24 Hi variables, where Hi = 1 indicated that the card was used at least once during the i-th hour of the day. Then the authors used the k-means algorithm to clusterize the transactions, thus obtaining generalized (mean) profiles of public transport passengers. A similar study based on similar data is described in another paper by the same authors [2], but in that paper the trips are combined into week-long transactions (Hi,j = 1 if the card was used at least once on during the j-th period of the i-th day), after which the data is passed through clusterization algorithms. This approach allowed for the detection of certain deviations in transport behavior, such as a reduction in the use of student cards during school holidays. The methods presented in [2] were extended and improved in [13]. There, the authors discuss the advantages and disadvantages of profile clusterization methods based on Euclidean distances, and also present their own perspective on clusterization, using topic models and latent Dirichlet allocation, interpreting passenger profiles as collections of words (for example, a passenger who made a trip at 10 am on a Friday was associated with the word “Friday 10 am”). Based on the same data (and partly by the same authors as in [13]), another study was conducted [7]. Unlike [13], the paper rejects hourly aggregation of trips and criticizes it for inaccuracy. For instance, a passenger who starts their trips at 8:55 am and at 9:05 am on different days falls into different groups in case of hourly aggregation, which can make results less accurate that they actually are. The paper considers an approach based on continuous presentation of time and a Gaussian mixed model. The paper [12] proposes and confirms that transport use patterns radically differ for different age groups, including different rush hours, distances, and destinations.
One of the key features of transport flow analysis is detecting passenger correspondences using smart cards. A correspondence is a sequence of trips with brief transfers between different transport routes. This is the problem studied in [6]. In that paper, the authors proposed two very important hypotheses which significantly simplify the construction of correspondences: most passengers begin a new trip near the station where the previous one ended; for most passengers, the destination of the last trip matches the starting point of the day’s first trip. Later, these assumptions were expanded in [16]. The authors showed that, for sequential trips with brief transfers, in the vast majority of cases the following is true: passengers don’t use additional vehicles (bicycles, cars, etc.) between consecutive trips and don’t take long walks between stations. Furthermore, the maximum distance between transfer stations is around 400–800 meters [16]. Constructing user behavior patterns requires identifying correspondences for which main passenger locations (home, work) can be determined. Such correspondences are sometimes called regular correspondences. For identifying regular correspondences, it is common to use so-called origin-destination matrices [15] or a statistical rule-based approach. In [8], the authors suggest to consider long breaks between consecutive trips (over a predetermined number of hours) as work activity. The home location was determined by the destination of the last trip of the day. At the same time, in [10], a similar model is proposed to estimate the frequency with which the user visits certain places. Thus, the most common stop was considered to be the home, and the second most visited, the workplace. In [3] and [11], an attempt is made to join the aforementioned approaches and combine frequency estimation with time limits. The methods in [3] were tested in London, where model accuracy reached 82% (compared to 59% for the model from [10] with the same data).
3 Dataset
As initial data for this paper, we used the correspondences of public transport passengers in Saint Petersburg. The technology for acquiring such correspondences is described in [9]. This technology is based on ideas described in [6, 16]. The total number of correspondences is about 25 million. To construct correspondences, we used trip data over a two-week period that did not contain public holidays. The number of analyzed smart cards is over 2,250,000. Each smart card has a unique identifier, which cannot be used to determine the card owner's personal information, such as their name, sex, address, or phone number. Each correspondence is described with the following parameters:
• Card number.
• The correspondence's index (within the current day).
• The starting time of the correspondence.
• The ending time of the correspondence.
• Smart card type and name.
• Starting station ID.
• Ending station ID.
• Starting station coordinates.
• Ending station coordinates.
4 Methodology
The methodology we use to determine passenger profiles consists of the following steps:
• Primary filtering (getting rid of noise and insufficiently representative data).
• Determining home and work locations (for each passenger).
• Pattern detection (up to a day).
• Determining temporal profiles (for each pattern).
• Profile clusterization.
• Interpreting results.
Below is a more detailed description of each of the stages.
4.1 Primary Filtering
The initial dataset turned out to be rather raw (noisy), so, to simplify the study, we had to filter it and eliminate the noise. In the context of a study of the transport network, we were mostly interested in regular passengers with a sufficiently high number of trips. For this reason, we decided to exclude from consideration passengers who, over the course of the study period (14 days), used public transport on fewer than 7 different days and on fewer than 3 days in each of the considered weeks.
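A sketch of this filter, assuming the correspondences sit in a pandas DataFrame with hypothetical columns card_id, date and week:

# Sketch of the primary filtering step with pandas (column names are hypothetical).
import pandas as pd

def filter_regular_passengers(df: pd.DataFrame) -> pd.DataFrame:
    """Keep cards used on >= 7 distinct days and on >= 3 days in each of the 2 weeks."""
    days_total = df.groupby("card_id")["date"].nunique()
    weeks_used = df.groupby("card_id")["week"].nunique()
    min_days_per_week = (df.groupby(["card_id", "week"])["date"].nunique()
                           .groupby("card_id").min())
    mask = (days_total >= 7) & (weeks_used == 2) & (min_days_per_week >= 3)
    return df[df["card_id"].isin(mask[mask].index)]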
4.2 Determining Home and Work Locations
In this paper, we are interested in patterns of regular correspondences, in other words, of correspondences whose starting and ending locations correspond to home and work. To do that, it was necessary to determine, for each owner of a smart card, an approximate location of their home and workplace. Following the assumptions in [6] and [16], we considered only those days on which the starting point of the day’s first correspondence matches the ending point of the last correspondence or is in walking distance from it, i.e. within 500 m, and the end of the first correspondence is in walking distance from the start of the last correspondence. To clarify, the days when a passenger made only one trip were excluded from consideration. Since our data covers a period of 14 days, we used the threshold value of 7 to determine regular correspondences. That is, if such correspondences repeated at least 7 times for the same passenger, we declared them to be regular. The start of the first correspondence was declared to be the home, and the start of the last correspondence was declared to be the workplace. Non-regular correspondences were excluded from the dataset being analyzed. As a result, the dataset contained only passengers with a defined workplace location, which they visited at least 3 times a week and at least 7 times over the study period.
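This rule can be expressed compactly; the sketch below (hypothetical field names, haversine distance in metres) checks whether a day's first and last correspondences mirror each other within walking distance:

# Sketch of the home/work rule: the day's first correspondence must start where
# the last one ends (and vice versa), within ~500 m walking distance.
# Field names are hypothetical.
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371000 * asin(sqrt(a))

def is_regular_day(first, last, walk_m=500):
    """first/last: dicts with start_lat/start_lon/end_lat/end_lon of the day's trips."""
    return (haversine_m(first["start_lat"], first["start_lon"],
                        last["end_lat"], last["end_lon"]) <= walk_m and
            haversine_m(first["end_lat"], first["end_lon"],
                        last["start_lat"], last["start_lon"]) <= walk_m)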
4.3 Pattern Detection
Regular correspondences were further used to determine each passenger's work schedule (for example, 5 working days and 2 days off; 3 working days and 2 days off; etc.). For that purpose, we encoded passenger behavior as sequences of digits 1 and 0, which indicate working days and days off, respectively. A working day was determined by the presence of a regular correspondence from home to work and back, and a day off, by the absence of one. We call such a character sequence a schedule. In turn, a pattern is defined as the shortest repeating substring that satisfies the following requirements:
• the pattern always begins with 1;
• the pattern always ends with 0, except when it consists of only one letter.
Table 1. Examples of schedules and their respective patterns.
Schedule | Pattern
11110001111000 | 1111000
11111001111100 | 11111001111100
11001100110011 | 1100
10101010101010 | 10
Table 1 shows examples of schedules and their respective patterns. When processing schedules it is important to keep in mind that the data may not be entirely accurate, since on one day a passenger might, due to unforeseen circumstances, skip work or take a taxi. Thus, a weakened assumption was made that a pattern is the shortest repeated substring that matches the schedule with up to one error. Levenshtein distance was used to determine errors. Identifying patterns allowed us to qualitatively group passengers for later determination of temporal profiles and more detailed study. The most numerous groups of passengers are shown in Fig. 1.
[Figure 1 is a pie chart over the patterns 1111100, 1111110, 1111000, 1100, and Others; the listed shares are 70.90%, 12.80%, 8.70%, 5.20%, and 2.40%, with the dominant pattern 1111100 at 70.90%.]
Fig. 1. Distribution of St. Petersburg passengers by pattern.
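The pattern search itself can be sketched as follows: candidate prefixes are tested in increasing length and accepted when their repetition reproduces the schedule with at most one Levenshtein error (a simplified illustration, not the authors' exact implementation).

# Sketch: find the shortest repeating pattern that reproduces a schedule
# with at most one error (Levenshtein distance <= 1).
def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def find_pattern(schedule: str, max_errors: int = 1) -> str:
    for length in range(1, len(schedule) + 1):
        candidate = schedule[:length]
        if candidate[0] != "1" or (length > 1 and candidate[-1] != "0"):
            continue
        # Repeat the candidate to the schedule's length and compare.
        repeated = (candidate * (len(schedule) // length + 1))[:len(schedule)]
        if levenshtein(repeated, schedule) <= max_errors:
            return candidate
    return schedule

assert find_pattern("11110001111000") == "1111000"
assert find_pattern("10101010101010") == "10"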
4.4 Determining Temporal Profiles
After grouping passengers by pattern, we ask quite a reasonable question: what distinguishing features does each group have and how exactly do they use public transport? At this stage of the study, the goal is to detect subgroups within groups based on the temporal habits of the group's passengers. For that, we collected all regular correspondences of each passenger into a single profile, describing the distribution of their trips by each hour (0 to 23) of each day of the group's pattern. In other words, each profile P is represented as a multidimensional vector whose elements describe the total number of correspondences made by the passenger on the first day of the pattern from midnight to 1 am, then from 1 am to 2 am, and so on. Note that if the number of days comprising the pattern is D, then the resulting vector's length is 24 × D. Such a multidimensional vector can be illustrated using a heat map (Fig. 2).
[Figure 2 shows a 7-day × 24-hour grid in which each cell holds the number of correspondences the passenger made during that hour of that pattern day.]
Fig. 2. Example of a temporal profile.
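Building such profile vectors from the regular correspondences is straightforward; a sketch with numpy (field names are hypothetical):

# Sketch: build the 24*D temporal profile vector of one passenger.
# Each trip is a dict with its pattern-day index (0..D-1) and start hour (0..23);
# the field names are hypothetical.
import numpy as np

def build_profile(trips, days_in_pattern):
    profile = np.zeros(24 * days_in_pattern, dtype=int)
    for trip in trips:
        profile[trip["pattern_day"] * 24 + trip["start_hour"]] += 1
    return profile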
4.5 Profile Clusterization
Then we clusterize the profiles. Profiles were interpreted as points in a multidimensional space, and the distance between them was calculated as the Euclidean distance. In [1, 2, 14] it was mentioned that the k-means algorithm is well suited for clusterization of passenger profiles, and that is the algorithm we used in our study.
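A sketch of this step with scikit-learn (the profiles array is assumed to hold one 24 × D vector per passenger of the group; the number of clusters is chosen per group, for example seven for the 1111100 group):

# Sketch: k-means clustering of the profile vectors with scikit-learn.
from sklearn.cluster import KMeans

def cluster_profiles(profiles, n_clusters):
    """profiles: array of shape (n_passengers, 24 * D) built as sketched above."""
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    return kmeans.fit_predict(profiles)  # one temporal-cluster label per passenger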
4.6 Interpreting Results
To demonstrate clusterization results and their interpretation, we used heat maps. Each cluster was used to produce its own heat map. The map was divided into cells corresponding to time intervals and days of the week. The larger the percentage of passengers who used the transport network during a particular cell's time interval and day, the darker the color of the corresponding cell. As an example, we provide the heat map visualization of the most numerous (and the most predictable) group, which corresponds to the pattern 1111100. This group can be split into 7 temporal clusters, as shown in Fig. 3. The largest clusters, 1 and 2 (22.69% and 18.19%), describe passengers working from 8 am to
[Figure 3: seven heat-map panels (pattern day × hour of day 0–23, color scale 0%–100%), one per temporal cluster: Cluster 1: 22.69%, Cluster 2: 18.19%, Cluster 3: 16.53%, Cluster 4: 15.70%, Cluster 5: 9.46%, Cluster 6: 10.33%, Cluster 7: 7.11%.]
Fig. 3. The visualization of the temporal clusters of the group corresponding to pattern 1111100.
5 pm or 6 pm, respectively; clusters 3 and 6 depict the behavior of passengers who go to work from 7 am to 5 pm or 6 pm; cluster 4 contains passengers who go to work later than usual (to 9 am); cluster 7 consists of passengers who start working earlier than usual (6 am); and, finally, cluster 5 has no defined time of either the beginning or the end of work. It is easy to see that, in general, this group’s passengers work standard eight-hour days (most likely, with a lunch break). An interesting fact is the “boot” that is clearly observable in most clusters around the end of the work day on the fifth day out of seven: it perfectly matches the common understanding that on Fridays people prefer to leave work earlier. One of the reasons for this is the shortened working day on Fridays in most public institutions.
5 Conclusion
Smart cards provide curious capabilities for studying the transport network. In this paper, we processed and analyzed a huge array of transport data over a two-week period, allowing us to determine and qualitatively interpret groups of passengers with similar habits of public transport use. We hope that the results we obtained will influence the optimization of the transport network of the city we studied, and the approach itself, which is based on clusterization algorithms and visualizing clusterization results using heat maps, will find its use in analyzing the movement of public transport passengers in other cities.
References 1. Agard, B., Morency, C., Trepanier, M.: Analysing the variability of transit users behaviour with smart card data. In: IEEE Conference on Intelligent Transportation Systems, Proceedings, ITSC, pp. 44–49 (2006) 2. Agard, B., Morency, C., Trepanier, M.: Mining public transport user behavior from smart card data. IFAC Proc. Vol. 39, 399–404 (2006) 3. Aslam, N., Cheng, T., Cheshire, J.: A high-precision heuristic model to detect home and work locations from smart card data. Geo-spatial Inf. Sci. 22(11), 1–11 (2018) 4. Bagchi, M., White, P.R.: The potential of public transport smart card data. Transp. Policy 12, 464–474 (2005) 5. Bagchi, M., White, P.R.: What role for smart-card data from bus systems? Municipal Eng. 157, 39–47 (2004) 6. Barry, J., et al.: Origin and destination estimation in New York city with automated fare system data. Transp. Res. Rec. 1817, 183–187 (2002) 7. Briand, A.-S., et al.: A mixture model clustering approach for temporal passenger pattern characterization in public transport. Int. J. Data Sci. Anal. 1, 37–50 (2015) 8. Devillaine, F., Munizaga, M., Tr´epanier, M.: Detection of activities of public transport users by analyzing smart card data. Transp. Res. Rec. 2276, 48–55 (2012) 9. Graveefa, N., Mikhailova, E., Tretyakov, I.: Traffic Analysis Based on St. Petersburg Public Transport. In: 17th International Multidisciplinary Scientific GeoConference: Informatics, Geoinformatics and Remote Sensing, vol. 17, no. 21, pp. 509–516 (2017) 10. Hasan, S., et al.: Spatiotemporal patterns of urban human mobility. J. Stat. Phys. 151, 1–15 (2012)
11. Huang, J., et al.: Job-worker spatial dynamics in Beijing: insights from smart card data. Cities 86, 83–93 (2019) 12. Huang, X., Tan, J.: Understanding spatio-temporal mobility patterns for seniors, child/student and adult using smart card data. In: ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. XL-1, pp. 167-172 (2014) 13. Mahrsi, M.E., et al.: Understanding passenger patterns in public transit through smart card and socioeconomic data: a case study in Rennes, France. In: The 3rd International Workshop on Urban Computing, New York (2014) 14. Bouman, P., van der Hurk, E., Li, T., Vervest, P., Kroon, L.: Detecting activity patterns from smart card data. In: 25th Benelux Conference on Artificial Intelligence (2013) 15. Zhao, J., Rahbee, A., Wilson, N.H.M.: Estimating a rail passenger trip origin-destination matrix using automatic data collection systems. Comput. Aided Civil Infrastruct. Eng. 22, 376–387 (2007) 16. Zhao, J., et al.: Spatio-temporal analysis of passenger travel patterns in massive smart card data. IEEE Trans. Intell. Transp. Syst. 18(11), 3135–3146 (2017)
A Parallel CPU/GPU Bees Swarm Optimization Algorithm for the Satisfiability Problem

Celia Hireche(B) and Habiba Drias

Department of Computer Science, LRIA, University of Sciences and Technology Houari Boumediene, 16111 Algiers, Algeria
{chireche,hdrias}@usthb.dz
Abstract. Metaheuristics, and especially Swarm Intelligence, represent one of the most widely used branches of Artificial Intelligence. In fact, these algorithms are exploited in several domains, from theoretical problem solving to air traffic management. Such methods are evaluated by the quality of the solution they provide (effectiveness) and by the time spent to reach this solution (efficiency). We explore, in this paper, the technology offered by the Graphic Processing Unit (GPU) to improve the efficiency of the Bees Swarm Optimization (BSO) algorithm by proposing a novel parallel CPU/GPU version of the latter, since the algorithm becomes greedy when the problem size is important, which is almost always the case. The proposed parallel algorithm is integrated into the clustering-solving method for hard problems presented in [1], adding the exploitation of GPU performance to that of data mining to improve the resolution of hard and complex problems such as the Satisfiability problem.

Keywords: Parallelism · GPU · BSO · Efficiency · Solving problem · Complexity · Satisfiability problem

1 Introduction and Motivation
The architecture of computers, and especially that of their processors, has seen a spectacular evolution over the years. Indeed, with the explosion in the amount of data and information to be processed, an improvement of the computing capabilities of these machines is necessary. In parallel to the evolution of CPUs (Central Processing Units), the advent of the GPU (Graphic Processing Unit) offers an architecture quite different from that of the CPU. This architecture can be likened to a multiprocessor with a strong and massive capacity for data parallelization and computation. The goal of our work is to harness this technology to solve hard problems. In fact, some problems, particularly NP-complete problems [2,3], represent the
hardest problems to solve nowadays. A plethora of metaheuristics and swarm intelligence algorithms [4] have been proposed and used for solving these problems. However, the hardness of these problems comes from their size and their complexity. We propose in this work to exploit the computing power offered by the GPU by introducing a parallel CPU/GPU version of the Bees Swarm Optimization (BSO) algorithm [5]. The proposed algorithm is used for solving hard problems and is integrated into the approach presented in [1]. The remainder of this document is organised as follows. First, a section introducing metaheuristics and especially the BSO algorithm is presented. The next section introduces GPU technology, followed by a section dedicated to the proposed parallel CPU/GPU BSO. Section five presents the functioning and integration of the proposed algorithm within the bidimensional modelling of DBSCAN. The conducted experiments and the obtained results are presented in the last section. Conclusions are finally summarized and some perspectives are suggested.
2 Metaheuristic - Swarm Intelligence - BSO
In artificial intelligence, metaheuristics and especially swarm intelligence algorithms represent one of the most important tools, since they tend to reproduce natural and animal functioning. Several types of metaheuristics exist nowadays, among which those reproducing animal behaviour, namely bio-inspired metaheuristics, are the most widely used. These algorithms are classified as population-based algorithms, which manipulate and improve multiple solutions at each step. This class regroups genetic algorithms, particle swarm optimization and swarm intelligence. One of the most widely used bio-inspired algorithms is BSO, which is inspired by the behaviour of bees when looking for a food source. It has been demonstrated [6] that, when foraging, bees indicate the direction and the distance separating them from a food source by performing a figure-eight dance: the closer the source is, the more vigorous the dance. By analogy with bees searching for food, the functioning of the BSO algorithm [5] is as follows. A random solution named Sref is generated by an initial bee, BeeInit. From this solution, a search zone (SearchArea) is determined and each of its solutions is assigned to a bee. Each bee performs a local search and returns its best solution in a table named Dance. The best solution of this table becomes the new Sref and the search process resumes until the problem's solution is found or a stop condition is reached. Metaheuristics were designed to find a compromise between effectiveness and efficiency compared with complete algorithms, which test all possible solutions until finding the problem's solution and whose execution time is exponential in the problem's size. However, with the large size of today's problems, the number of iterations these metaheuristics need to execute to generate a satisfying solution leads to a loss of efficiency.
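For illustration purposes only, the following minimal Python sketch (ours, not the authors' implementation; the bit-flip local search and the MAX-SAT-style fitness are simplifying assumptions) mirrors the Sref / SearchArea / Dance cycle described above:

import random

def fitness(assignment, clauses):
    # Number of satisfied clauses; each clause is a list of signed 1-based literals.
    return sum(any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses)

def local_search(solution, clauses, steps=10):
    # Greedy bit-flip local search performed by one bee.
    best = solution[:]
    for _ in range(steps):
        candidate = best[:]
        i = random.randrange(len(candidate))
        candidate[i] = not candidate[i]              # flip one variable
        if fitness(candidate, clauses) >= fitness(best, clauses):
            best = candidate
    return best

def bso(clauses, n_vars, n_bees=8, max_iter=50):
    sref = [random.random() < 0.5 for _ in range(n_vars)]   # BeeInit's random Sref
    best = sref[:]
    for _ in range(max_iter):
        # SearchArea: each bee starts from Sref with one distinct variable flipped.
        search_area = []
        for b in range(min(n_bees, n_vars)):
            s = sref[:]
            s[b] = not s[b]
            search_area.append(s)
        dance = [local_search(s, clauses) for s in search_area]   # Dance table
        sref = max(dance, key=lambda s: fitness(s, clauses))      # best dance becomes the new Sref
        if fitness(sref, clauses) > fitness(best, clauses):
            best = sref[:]
        if fitness(best, clauses) == len(clauses):                # all clauses satisfied
            return best
    return best

A call such as bso([[1, -2], [2, 3], [-1, -3]], n_vars=3) returns a boolean assignment; the parallel version proposed in the following sections maps the flip and local-search steps onto GPU threads.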
In order to improve the efficiency of these algorithms, we are interested in exploiting the technology offered by programmable NVIDIA graphics cards, which allow parallel computing.
3 GPU Computing
The history of microprocessors dates back to the 70s with the introduction by Intel of the first commercialized 4-bit microprocessor (Intel 4004), with a clock frequency of 740 kHz. From then on, and to satisfy the needs of more and more users, these microprocessors have undergone a great evolution. One of the first parameters to be improved in order to increase the performance of processors was the clock frequency, as it is one of the main parameters involved in calculating the execution time of a program. However, this improvement has been slowed down because of heat dissipation difficulties. The computer industry then moved to multiprocessing, which allowed the computing power of computers to increase without increasing the clock frequency and consequently reduced thermal output. Nevertheless, and despite the multicore architecture of CPUs providing high computing power, this architecture is limited by the high latency of data transfer between the memory and the microprocessor. The use of GPUs [7,8] in processing and calculations has therefore become an unavoidable reality. Figure 1 [7] illustrates the evolution of CPU and GPU (NVIDIA) computing performance in terms of floating point operations per second (FLOPS).
Fig. 1. Evolution of CPU and GPU computing performance [7]
The almost exponential increase of GPU power shown in Fig. 1 is due to its architecture, which offers intensive and massive parallel processing as a consequence of the specialization and management of transistors. Indeed, the GPU architecture favours the number of transistors allocated to computation rather than to the management and storage of the data stream. A modern GPU can be seen as a grid or a set of streaming multiprocessors (SMs), where each SM comprises a number of streaming processors (SPs) as well as shared global DRAM (Dynamic Random Access Memory) with a high bandwidth. Each SM is seen as a Single Instruction Multiple Data (SIMD) machine, which can by definition perform the same instruction on different data simultaneously. In summary, the GPU is considered a multitude of SIMD machines. Programmable GPUs, and especially Compute Unified Device Architecture (CUDA) [8] ones, can be described as a set of grids, where each grid is represented by a set of blocks running the same kernel. A block is a set of threads cooperating and working together. Each block has an identifier relative to the grid with which it is associated, and can access the global memory of the GPU. Threads in the same block have shared memory and can be synchronised.
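As a toy illustration (ours, not from the paper), the following plain Python loop emulates the flat global index that a CUDA kernel computes from its block and thread identifiers; on a real GPU the two loops below are executed in parallel by the hardware, and each block would additionally share a fast on-chip memory that this emulation does not model:

# Emulate the CUDA indexing scheme blockIdx.x * blockDim.x + threadIdx.x,
# which maps every (block, thread) pair of the grid to one data element.
block_dim = 4                       # threads per block (assumed toy value)
grid_dim = 3                        # blocks in the grid (assumed toy value)
data = list(range(grid_dim * block_dim))

for block_idx in range(grid_dim):            # on a GPU these loops run in parallel
    for thread_idx in range(block_dim):
        global_idx = block_idx * block_dim + thread_idx
        # each "thread" applies the same instruction to its own element (SIMD)
        data[global_idx] = data[global_idx] * 2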
4 Parallel CPU/GPU BSO
The proposed parallel CPU/GPU version of BSO is designed as follows. First, the initial Sref solution is generated randomly on the CPU and transferred to the GPU. The search area is then generated simultaneously on the GPU: each thread generates its own solution from Sref by flipping the variable corresponding to its index. After that, each solution is handled by a number of threads corresponding to the number of bees within the swarm. Each of these threads copies its solution into n threads, n corresponding to the solution length, and each of these n threads flips the solution, creating the neighbourhood of the latter and executing a parallel local search. The best solution is selected by one thread and the whole process restarts until either a solution or the stop condition is reached. Figure 2 introduces the general functioning of the parallel CPU/GPU BSO. Algorithm 1 presents the functioning of the parallel local search on the GPU; the same reasoning is applied for generating the search area. Since each step of the algorithm is executed simultaneously, the complexity of the algorithm is essentially reduced to the number of iterations, as the large set of instructions within each iteration is performed in parallel. This allows a considerable reduction of the execution time of the algorithm.
Fig. 2. Parallel CPU/GPU BSO algorithm
5 Modelling DBSCAN for SAT - Parallel BSO Solving
We presented in [1] a novel manner of solving hard problem instances by exploiting data mining techniques [9] to reduce the instance complexity, and we proposed a bidimensional modelling of DBSCAN for SAT. As a reminder, the SAT problem, or boolean satisfiability problem [3], consists in assigning a truth value to a set of propositional variables with the aim of satisfying a formula in conjunctive normal form, i.e. a conjunction of clauses where each clause is a disjunction of literals (a propositional variable or its negation). In this work, we keep the same clustering-resolution reasoning as in [1]: each time a cluster is created, it is solved and its solution is propagated to the not yet solved clusters. In this version, a cluster is solved using either the Davis-Putnam-Logemann-Loveland (DPLL) algorithm [10] or the parallel CPU/GPU BSO algorithm presented in this work. DPLL is used when the number of variables is below a certain threshold and returns the solution if it exists; it therefore does not need a parallel version. A minimal sketch of this dispatch is given below, and Fig. 3 exhibits the functioning of this improvement.
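For illustration only, the following Python sketch (the names dpll, parallel_bso and propagate, as well as the threshold value, are placeholders and not part of the original C#/CUDA implementation) outlines this clustering-resolution dispatch:

VARIABLE_THRESHOLD = 60   # assumed cut-off; the paper only states "a certain threshold"

def solve_cluster(cluster_clauses, cluster_vars, dpll, parallel_bso):
    # Solve one cluster as soon as it is created.
    if len(cluster_vars) < VARIABLE_THRESHOLD:
        # small sub-instance: exact DPLL on the CPU returns a solution if one exists
        return dpll(cluster_clauses)
    # large sub-instance: approximate parallel CPU/GPU BSO
    return parallel_bso(cluster_clauses, len(cluster_vars))

def clustering_resolution(clusters, dpll, parallel_bso, propagate):
    partial = {}
    remaining = list(clusters)                   # clusters produced by DBSCAN
    while remaining:
        clauses, variables = remaining.pop(0)
        assignment = solve_cluster(clauses, variables, dpll, parallel_bso)
        partial.update(assignment or {})
        # propagate the new assignment onto the clusters not yet solved
        remaining = propagate(partial, remaining)
    return partial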
6 Experimentation
To show the efficiency of the proposed algorithm, some experiments were conducted on well-known SAT instances (IBM1-IBM2-IBM7-IBM13) [11,12], (aim200-6) [13], and (fla-500-0) [14]. Experiments were run on an i7 2.40 GHz, 4 GB RAM machine with an NVIDIA CUDA-capable GeForce MX130 graphics card. The implementation was done with Microsoft Visual Studio C# 2013 for DPLL, BSO and DBSCAN, and with CUDA (Microsoft Visual Studio C++ extended with CUDA) [8] for the proposed parallel BSO.
Algorithm 1. Parallel Local Search
Require: Sref: initial solution; n: number of bees; Neighbourhood: list of lists of solutions
BEGIN
  Kernel copySolutionToSolutionsList<<<...>>>(SearchArea, Neighbourhood)
  Kernel RechercheLocale<<<...>>>(Neighbourhood)
  Kernel BestSolution<<<...>>>(Neighbourhood, SearchArea)
END
---------------------------------------------------------
Kernel copySolutionToSolutionsList(Neighbourhood, SearchArea)
Require: SearchArea: list of solutions; Neighbourhood: list of lists of solutions
BEGIN
  Thread_idx = blockIdx.x * blockDim.x + threadIdx.x   (memory index of the thread on the GPU)
  Neighbourhood → Solutions[Thread_idx] → Solution[0] = SearchArea → Solution[Thread_idx]
  CopySolutionToSolutions<<<...>>>(Neighbourhood → Solutions[Thread_idx] → Solution[0], Neighbourhood → Solutions[Thread_idx])
END
---------------------------------------------------------
Kernel RechercheLocale(Neighbourhood)
Require: Neighbourhood[i]: list of solutions
BEGIN
  Thread_idx = blockIdx.x * blockDim.x + threadIdx.x
  Neighbourhood → Solutions[Thread_idx] → Solution[Thread_idx] = (Neighbourhood → Solutions[Thread_idx] → Solution[Thread_idx] + 1) % 2   (flip the variable at position Thread_idx)
  Fitness<<<...>>>(Neighbourhood → Solutions[Thread_idx])
END
---------------------------------------------------------
Kernel BestSolution(ListeSolutions, Solution)
Require: ListeSolutions: list of solutions; Solution: best solution
BEGIN
  Thread_idx = blockIdx.x * blockDim.x + threadIdx.x
  indice = Thread_idx
  while indice < Thread_idx + tailleSolution do
    if ListeSolutions → Solution[indice] → evaluation > Solution → evaluation then
      Solution = ListeSolutions → Solution[indice]
    end if
    indice = indice + 1
  end while
END
The following table (Table 1) gives the descriptions of the used benchmarks. A preprocessing (cleaning) step, presented in [1], is performed on these instances before resolution; we present the resulting instances.
Fig. 3. Parallel CPU/GPU BSO in bidimensional DBSCAN for SAT

Table 1. Benchmark description after preprocessing (cleaning)

Benchmark name   Number of variables   Number of clauses
ibm1             9685                  55613
ibm2             2810                  11266
ibm7             8710                  39374
ibm13            13215                 64938
fla-500-0        500                   2205
aim-200-6        200                   1182
Table 2 and Fig. 4 exhibit the results of the proposed bidimensional DBSCAN clustering with parallel CPU/GPU solving in terms of satisfiability rates and execution times. The numbers of DPLL and BSO applications are also shown for better comprehension. Since the bidimensional modelling of DBSCAN defining a region by the euclidean distance is the most efficient approach proposed within [1], we have integrated the proposed parallel algorithm into this approach. The results presented in Table 2 show a correlation between the increase of the radius defining the cluster and the increase in the number of clusters solved by the parallel BSO algorithm. The comparison of these results with the approach defined in [1] indicates an equivalence in satisfaction rates between the two methods, since the resolution process remains the same except for the random generation of the initial solution. Moreover, a considerable time saving is to be noted, thanks to the exploitation of the massive parallelism offered by the GPU when moving from CPU to CPU/GPU computation, which is the aim of this work. Figure 4 illustrates a comparison between the execution times of the two approaches for the IBM1, IBM2 and IBM7 instances (for more readability). We notice from this figure a real improvement in the efficiency of the proposed approach, showing the impact of exploiting the technology offered by the GPU in addition to that offered by data mining for problem solving.
Table 2. Euclidean bidimensional modelling of DBSCAN for SAT - parallel BSO solving

Name     Clauses   Radius   SAT(%)   Time(s)   DPLL   BSO
Ibm1     54682     R=50     97.44    890.65    538    20
                   R=100    95.65    117.81    392    29
                   R=150    95.68    126.67    381    25
                   R=200    95.58    133.15    362    31
                   R=250    95.75    173.03    361    28
Ibm2     10561     R=50     98.90    6.2       83     1
                   R=100    99.01    4.51      74     1
                   R=150    99.02    4.36      71     1
                   R=200    98.48    5.13      69     1
                   R=250    98.92    21.63     60     1
Ibm7     37388     R=50     99.15    54.84     225    4
                   R=100    98.50    95.09     170    8
                   R=150    98.47    63.29     172    7
                   R=200    98.40    80.2      170    8
                   R=250    97.94    89.48     142    10
Ibm13    63123     R=50     98.42    2743.11   809    7
                   R=100    96.77    266.84    609    29
                   R=150    95.89    1382.65   621    39
                   R=200    95.09    414.53    584    40
                   R=250    95.02    2254.82   540    47
Aim200   1182      R=50     91.37    12.61     13     0
                   R=100    92.05    3.19      9      1
                   R=150    91.12    43.53     7      2
                   R=200    90.61    6.76      4      2
                   R=250    91.03    6.48      5      2
Fla500   2205      R=50     96.96    0.55      59     0
                   R=100    97.32    103.96    42     0
                   R=150    96.96    237.56    35     0
                   R=200    94.47    23.96     32     2
                   R=250    94.06    14.17     20     4
Fig. 4. Bidimensional DBSCAN - comparison of sequential and parallel BSO - execution time
7 Conclusion
Metaheuristics represent the most widely used class of algorithms for solving hard problems such as SAT, and a plethora of algorithms and solvers have been proposed to solve this problem. In [1], a novel approach was proposed for solving the SAT problem by exploiting data mining technology: a bidimensional modelling of DBSCAN for SAT was introduced to reduce the instance's complexity by extracting sub-instances that are solved independently by either DPLL or BSO. In the current work, we propose to exploit the GPU architecture to improve the efficiency of the previously introduced method. In fact, the highly parallel architecture of GPUs, as well as the different GPU programming tools such as CUDA, allows massive parallel computation and a considerable gain in terms of execution time. In this document, a parallel CPU/GPU version of the BSO algorithm is first proposed. Each step and part of the known sequential BSO is transcribed into massively parallel instructions, permitting the potential solution to be reached in a very small execution time and improving the efficiency of the latter algorithm. This parallel algorithm is then integrated into the bidimensional DBSCAN, keeping the same clustering-solving scheme proposed in [1]: each time a cluster is created, it is solved using either DPLL or the parallel BSO. The experimental tests show the impact of using the massive parallelism offered by the GPU architecture for solving hard problems, conserving the solution effectiveness and considerably improving its efficiency. As future work, we will integrate the parallel BSO algorithm into the multidimensional DBSCAN approach proposed in [15] and solve all the generated clusters simultaneously, offering two levels of parallelism and even greater efficiency.
References 1. Hireche, C., Drias, H.: Density based clustering for satisfiability solving. In: World Conference on Information Systems and Technologies, pp. 899-908. Springer, Heidelberg (2018) 2. Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide of the Theory of NP-Completeness. A Series of Books in the Mathematical Sciences, p. x+338. W.H. Freeman and Co., New York (1979). ISBN 0-7167-1045-5. MR 519066 3. Cook, S.: The complexity of theorem-proving procedures. In: Proceeding 3rd Annual ACM Symposium on the Theory of Computing, New York, pp. 151-198 (1971) 4. Glover, F., Kochenberger, G.A.: Handbook of Metaheuristics. Springer, US (2005). ISBN: 978-1-4020-7263-5. https://doi.org/10.1007/b101874. 5. Drias, H., Sadeg, S., Yahi, S.: Cooperatives bees swarm for solving the maximum weighted satisfiability problem. In: Proceeding of IWANN 2005, LNCS, vol. 3512, pp. 318-325. Springer, Barcelona (2005) 6. Seeley, T.D., Camazine, S., Sneyd, J.: Collective decision-making in honey bees: how colonies choose among nectar sources. Behav. Ecol. Sociobiol. 28, 277–290 (1991) 7. NVIDIA CUDA C programming guide version 4.0. Nvidia Corporation (2012) 8. Kirk, D.B., Wen-Mei, W.H.: Programming Massively Parallel Processors: A Handson Approach. Morgan Kaufmann, Burlington (2016) 9. Han, J., et al.: Data mining, concepts and techniques. Third Edition (The Morgan Kaufmann Series in Data Management Systems) (2011) 10. Davis, M., Logemann, G., Loveland, D.: A machine program for theorem proving. Commun. ACM 5(7), 394–397 (1962) 11. Biere, A., Cimatti, A., Clarke, E., Zhu, Y.: Symbolic model checking without BDDs. In: The Proceedings of the Workshop on Tools and Algorithms for the Construction and Analysis of Systems (TACAS99), LNCS. Springer, Heidelberg (1999) 12. BMC. http://www.satcompetition.org/2013/downloads.shtml 13. Artificially Generated Random. https://baldur.iti.kit.edu/sat-competition-2016/ index.php?cat=benchmarks 14. Random SAT. https://baldur.iti.kit.edu/sat-competition-2016/index.php? cat=benchmarks 15. Hireche, C., Drias, H.: Multidimensional appropriate clustering and DBSCAN for SAT solving . Data Technol. Appl. J. (2019). Emerald Publishing Limited
Comparison of Major LiDAR Data-Driven Feature Extraction Methods for Autonomous Vehicles

Duarte Fernandes1, Rafael Névoa1, António Silva1(B), Cláudia Simões2, João Monteiro1, Paulo Novais1, and Pedro Melo1,3

1 Algoritmi Centre, University of Minho, Braga, Portugal
[email protected], [email protected]
2 Bosch, Braga, Portugal
3 Universidade de Trás-os-Montes e Alto Douro, Vila Real, Portugal
D. Fernandes and R. Névoa - Both authors contributed equally to this work.
Abstract. Object detection is one of the areas of computer vision that has matured very rapidly. Nowadays, developments in this research area have been paying special attention to the detection of objects in point clouds due to the emergence of high-resolution LiDAR sensors. However, data from a Light Detection and Ranging (LiDAR) sensor is not characterised by consistency in relative pixel densities and introduces a third dimension, raising a set of drawbacks. The following paper presents a study on the requirements of 3D object detection for autonomous vehicles; presents an overview of the 3D object detection pipeline that generalises the operation principle of models based on point clouds; and categorises the recent works on methods to extract features and summarises their performance.

Keywords: LiDAR · Point clouds · 3D Object Detection and Classification · CNNs

1 Introduction
The Deep Learning research area has been witnessing tremendous growth, leading to developments that allow its implementation in a wide range of applications. It has been mostly used in object detection and classification tasks, with autonomous driving systems as one of its main targets. These deep learning algorithms for object detection can be implemented following a specific neural network architecture and adopt a sensing technology, of which RGB cameras are the most widely used. However, this technology has some disadvantages - it is prone to adverse light and weather conditions and provides no depth information - that have been hampering 3D object detection models from achieving a fully reliable and feasible solution. For these reasons, Light Detection and Ranging (LiDAR)
sensing technology has drawn the attention of both the academic and industrial communities. It offers a 360° Field-of-View and introduces a third dimension that allows precise distance measurements, etc. [7]. To achieve a fully safety-critical system for autonomous vehicles, 3D object detection models must meet the following fundamental requirements: (1) real-time operation, which is sensor-driven; (2) detection of a wide range of classes with high accuracy; and (3) . These requirements are limited by the state of the art of LiDAR sensors. For instance, the LiDAR sensors Velodyne VLS-128 and Velodyne HDL-64E are able to offer a frame rate of up to 20 Hz [7]. Thus, an inference time lower than 50 ms must be imposed on 3D object detection models. Moreover, models must also detect objects smaller than cars, such as cyclists or pedestrians, requiring a high density of points. The Velodyne VLS-128 and Velodyne HDL-64E provide point clouds with 3 and 1.3 million points, respectively, resulting in point clouds with different sparsity, as shown in Fig. 1. However, computing more points will naturally affect the inference time of a network negatively. Meeting these requirements is one of the main challenges of 3D object detection models. Although a 3D object detector is composed of several blocks, the key design element for enabling real-time operation and also accuracy is the feature encoding process [6]. This block is the only block of the model pipeline that directly processes all input points of the point cloud to extract the features that feed the following block. Therefore, feature extractors are expected to be fast enough at processing the data to assure that satisfactory inference times are achieved. However, as LiDAR beams are narrow by nature, sensors are likely to disregard narrow objects (e.g. lampposts or even persons) [10]. In order to overcome this limitation, current sensing technologies tend to either increase the number of readings when scanning the sensor's surroundings or increase the LiDAR sensor's resolution. As the number of points increases, algorithms are expected to improve in accuracy, but this might sacrifice the inference time of the solution. Therefore, feature extractors must apply mechanisms to exclude points with no relevant information and assure that only extracted features that are meaningful in the context of the current vehicle's surrounding scenario are forwarded to the following block. This article pays special attention to the feature extractors addressed in the literature, analysing how techniques suggested in noteworthy research projects have evolved to better explore the nature of point clouds and optimise their performance metrics. This paper is structured as follows: Sect. 2 highlights the main research challenges and describes the generic pipeline of a 3D object detection model; in Sect. 3, we categorise feature extractors according to the architecture adopted, review the models addressed in the literature, and compare their performance on a benchmark dataset; and Sect. 4 provides a brief summary and concludes this document.
2 LiDAR-Based Object Detection Challenges
The LiDAR sensor is becoming a key element in self-driving cars, mainly because of its long-range detection abilities, its high resolution and the good performance
Fig. 1. Point clouds of the same scene obtained by two different sensors. The top image displays a point cloud with 3 million points, while the bottom image is the result of a frame with 1.3 million points. Image from [2]
under different lighting conditions. Several approaches have developed research based on LiDAR data to provide real-time object detection and identification, as will be further shown. Thus, research lines have been guided towards the development of suitable object detection algorithm architectures for self-driving cars using LiDAR sensing technology. However, LiDAR data consists of high-dimensional unstructured point clouds of a sparse nature, which affects the solution both methodologically and computationally. LiDAR sensors produce unstructured data containing typically around 10^6 3D points per 360-degree sweep, which imposes large computational costs. Processing and inferring objects in a LiDAR point cloud will directly impact the inference time of the model, which can make the solution unsuitable for real-time applications. Also, point clouds come with non-uniform density in different areas, which introduces a significant challenge for point set feature learning [9]. To allow a better trade-off between accuracy and inference time, several design choices were adopted. The resulting design solutions led to an appropriate downstream detection pipeline, which includes the following stages: (1) LiDAR data representation: organise the point clouds into a structure that allows further computations; (2) Data Object Detector: according to the data representation, this stage aims to perform at least two tasks: feature map extraction and detection of objects of interest; (3) Multi-task header: this aims at performing object class prediction and bounding box regression. In the next section, we will provide a proper description of a generic 3D object detection pipeline based on LiDAR data.
2.1 Generic 3D Object Detection Pipeline
Figure 2 depicts the generic pipeline architecture used by 3D object detection algorithms. This pipeline architecture evidences the diversity of solutions and design choices in each architecture stage. As mentioned before, this architecture comprises three stages: (1) Data Representation, (2) Data Object Detector, and (3) Multi-task header.
Fig. 2. Generic architecture pipeline proposed for the description of the operation principle of any 3D object detection algorithm.
In the Data Representation stage, the raw point clouds are organised into a structure that allows the next block to process them more suitably according to its design choices. The existing models have adopted the following methodologies: voxels, frustums, pillars or 2D projections. The Data Object Detector receives point clouds in a structure according to the before-mentioned representations and performs at least two tasks: feature map extraction and detection of objects of interest in these point clouds. Several methodologies have been adopted to extract low- and high-dimensional features from point clouds to produce the feature map. Some research works opt for hand-crafted feature encoders, while others use deep network architectures to apply convolutions and extract features. Finally, the multi-task header performs object class prediction, bounding box regression, and determination of object orientation. These tasks are accomplished using the feature maps and the objects of interest generated by the feature encoder.
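As a rough illustration of the first stage (ours, not taken from any of the surveyed models; the voxel size, ranges and point cap are arbitrary assumptions), the following Python/NumPy sketch groups the raw points of a cloud into voxel cells, the structure consumed by the voxel-based detectors discussed in the next section:

import numpy as np

def voxelize(points, voxel_size=(0.2, 0.2, 0.4), max_points_per_voxel=32):
    # Group an (N, 3) point cloud into voxels of fixed size. Returns a dict
    # mapping integer voxel coordinates to the points falling inside that voxel
    # (capped at max_points_per_voxel to bound the computation per cell).
    coords = np.floor(points[:, :3] / np.asarray(voxel_size)).astype(np.int32)
    voxels = {}
    for point, coord in zip(points, coords):
        key = tuple(coord)
        bucket = voxels.setdefault(key, [])
        if len(bucket) < max_points_per_voxel:   # sub-sample overly dense voxels
            bucket.append(point)
    return {k: np.stack(v) for k, v in voxels.items()}

# usage: a random cloud of 10,000 points in a 40 m x 40 m x 4 m volume
cloud = np.random.uniform([-20, -20, -2], [20, 20, 2], size=(10000, 3))
voxel_grid = voxelize(cloud)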
3 Feature Extractors

3.1 CNN-Based
Convolutional networks are one of the preferred techniques for extracting features from spatio-temporal data, as evidenced by the large number of projects adopting them. Standard "dense" implementations of convolutional networks are a well-matured technique, with multiple variations derived from extensive research
studies, achieving high performance when applied to dense data. Applying these convolutional architectures to sparse data, such as LiDAR point clouds, is a very inefficient process. Considering that, when moving from two- to three-dimensional space, the number of points to process increases significantly, and that the higher the dimensional space is, the higher the probability of relevant input data being sparse, it makes sense to take advantage of this spatial sparsity to speed up the feature extraction process. This minimises the number of points to be processed, reducing computational time and resources. In [4], an approach for dealing with sparsity in 2D image classification and online handwriting recognition is presented, wherein a ground state is considered for hidden variables which receive no meaningful input, thus only having to be calculated once per forward pass during training and once for all passes during test time. Consequently, only the values of the hidden variables that differ from their ground state must be calculated, memoizing the convolutional and max-pooling operations. The forward propagation of the network is performed by calculating, for each layer, a feature matrix - composed of one row vector for the ground state and one for each active spatial location in the layer - and a pointer matrix - to store the number of the corresponding row in the feature matrix. Based on the aforementioned work, in [3] the same author adapted this concept to perform sparse convolutions in 3D space. In the approaches presented in [3,4], a site in the input/hidden layer is defined as active if any element in the layer that it takes as an input is not in its ground state. This leads to a rapid increase in the active sites of deeper layers, "dilating" the sparse data in every layer during a forward pass, making it impractical to implement modern convolutional neural networks such as VGG networks, ResNets and DenseNets. To overcome these challenges, the research work in [5] offers a new approach to sparse convolutions, largely based on the mechanisms of [3,4]. The authors of Submanifold Sparse Convolutional Networks propose two slightly different sparse convolution operations, a Sparse Convolution (SC) and a Valid Sparse Convolution (VSC). SC interprets active sites and ground states - replaced with a zero vector in this implementation - the same way as the sparse convolutions mentioned before, and since there is no padding, the output size is reduced. VSC's methodology also ignores ground states, replacing them with a zero vector, while handling the active states distinctly. First, padding is applied, so that the output is of the same size as the input. Then, instead of making a site active if any of the inputs to its receptive field is active, only its central position is considered, dealing with the dilation of the set of active sites and ensuring that the output set of active sites mirrors that of the input set. Without the problem of dilation of active sites, the networks implemented with these convolutional operators can be much deeper. On the other hand, restricting the output of the convolutions using this method can hinder the hidden layers' ability to capture relevant information. Implementation of these convolutions is done using a feature matrix - containing one row for each active site - and a rule generation algorithm using a hash table to store the location and row of all active sites. A comparison between the operation of sparse convolutions and
submanifold sparse convolutions is depicted in Fig. 3. The middle extractor in [12] uses a combination of the techniques mentioned above, taking advantage of submanifold sparse convolutions and sparse convolutions (to perform downsampling), improving the speed of the algorithm by using a custom GPU-based rule book algorithm.
Fig. 3. 3×3 Sparse convolution vs 3×3 Submanifold convolution.
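To make the active-site rules concrete, the following toy one-dimensional Python sketch (our illustration, not the library's implementation) contrasts how SC and the submanifold variant decide which output sites are active:

def sc_active_sites(active, size, kernel=3):
    # Sparse Convolution: an output site is active if ANY input inside its
    # receptive field is active, so the active set dilates layer after layer.
    half = kernel // 2
    return {i for i in range(size)
            if any((i + d) in active for d in range(-half, half + 1))}

def vsc_active_sites(active):
    # Submanifold ("valid") sparse convolution: only the central position counts,
    # so the output active set mirrors the input active set exactly.
    return set(active)

active_in = {2, 7}                                    # two isolated active sites on a line of 12
print(sorted(sc_active_sites(active_in, size=12)))    # [1, 2, 3, 6, 7, 8] -> dilation
print(sorted(vsc_active_sites(active_in)))            # [2, 7]             -> no dilation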
PointNet [9] extracts point-wise features directly from point clouds using a novel CNN-based architecture. This network encodes the spatial features of each point within a subset of points from a euclidean space and extracts local and global structures. Then, it combines both the input points and the extracted features into a global point cloud signature. For this purpose, it implements a non-hierarchical neural network that comprises three main blocks: a max-pooling layer, a local and global information combination structure, and two joint alignment networks.
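The core idea of PointNet, a shared per-point transformation followed by a symmetric max-pooling aggregation, can be sketched in a few lines of NumPy (our simplification with random stand-in weights; the real network uses learned MLPs, input/feature transform nets and batch normalisation):

import numpy as np

rng = np.random.default_rng(0)

def shared_mlp(points, w1, w2):
    # The same two-layer perceptron is applied independently to every point.
    h = np.maximum(points @ w1, 0.0)          # (N, 64)   point-wise features
    return np.maximum(h @ w2, 0.0)            # (N, 1024) higher-dimensional features

def pointnet_global_feature(points, w1, w2):
    local = shared_mlp(points, w1, w2)
    # Max pooling is a symmetric function: the result is invariant to point order,
    # which makes the network suitable for unordered point clouds.
    return local.max(axis=0)                  # (1024,) global signature of the cloud

n_points = 2048
cloud = rng.normal(size=(n_points, 3))
w1 = rng.normal(scale=0.1, size=(3, 64))      # stand-ins for learned weights
w2 = rng.normal(scale=0.1, size=(64, 1024))
signature = pointnet_global_feature(cloud, w1, w2)
assert signature.shape == (1024,)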
3.2 Compound Methods
Figure 4 depicts the architecture of a compound feature extraction method used to learn more meaningful information from point clouds. It merges two different types of feature extractors that complement each other, forming a "single-stage" end-to-end feature extractor. The purpose of this synergy is to explore the advantages of feature extractors based on PointNet to encode local features from regions, and of a CNN-based extractor to take the role of global-feature extractor. The latter extractor leverages the local features previously extracted to add more context to the shape description. This solution outputs a feature description in the form of a tensor, designated the single feature map representation (cf. Fig. 4), to feed a Region Proposal Network (RPN) [6,12,13] or a multi-head [14].
Fig. 4. Architecture of a compound-based feature extractor.
Projects VoxelNet [13], SECOND [12], MEGVII [14] and PointPillars [6] are examples of recent and novel research works that follow the above-described method. The former three research works convert point cloud data into equally spaced 3D voxels that feed a feature extractor based on the architecture depicted in Fig. 2. These projects differ from each other in the choices made for each stage, as shown in Table 1. Regarding the local-feature extractor, both VoxelNet and SECOND implement a solution called the Voxel Feature extractor, which follows a voxel-wise approach, whereas MEGVII relies on an extractor called the 3D feature extractor. The Voxel Feature extractor applies a simplified version of PointNet to take all the points within a voxel as input and takes 20 ms to extract point-wise features from it. The 3D feature extractor addressed in MEGVII adopts the solution previously detailed that combines regular 3D sparse convolutions for feature extraction and a submanifold sparse convolution to downsample the feature map. On the other hand, the PointPillars project divides the point cloud data into pillars and applies a local-feature extractor called the Pillar Feature Net with a runtime of 1.3 ms [6]. The pillar points are subject to data augmentation, as this model introduces a variable regarding each point's distance to the arithmetic mean of all points of the respective pillar. To convert a point cloud to a sparse pseudo-image, the encoder first learns a set of features from pillars, then scatters them back to the original pillar locations to create the 2D pseudo-image. This image is forwarded to a 2D CNN that follows the same architecture as the one analysed in Subsect. 3.1. This solution relies on a simplified version of PointNet to extract local features from pillars. Regarding the global-feature extractor, it is the element responsible for grouping all local features into larger units and processing them to produce higher-level features. The literature has shown that different solutions can be used to perform these tasks. The research project VoxelNet opted to implement a 3D CNN; however, it reduces the processing speed of feature extraction. The 3D CNN runtime, 170 ms, is much higher than the required inference time of the whole network. For this reason, SECOND, MEGVII and PointPillars adopted extractors with faster runtimes. SECOND replaced the 3D CNN by a Submanifold with Sparse Downsampling, reducing the runtime to 20 ms. However, more recent works, such as MEGVII and PointPillars, have already outperformed SECOND in terms of inference time. PointPillars resorts to a 2D CNN with a runtime of 7.7 ms to output the final features as a result of the concatenation of all features originated from different strides. The project MEGVII sets an RPN like
the one adopted in VoxelNet, but operating only as a global-feature encoder, i.e. it does not perform the object detection as in [13]. Therefore, this RPN concatenates all features to construct the high-resolution feature map needed by further blocks to detect and classify objects. According to the authors, this project outperformed previous works in terms of accuracy; nevertheless, this model was tested against a manipulated dataset. Data augmentation techniques have been applied to generate a more balanced data distribution - this technique allows trained models to achieve better-performing results. This project does not provide a runtime analysis.

Table 1. Architecture design choices and runtime of feature extractors, and KITTI benchmark accuracy performance on the moderate level

Projects       Local extractor    Global extractor    Time (ms)   AP(%)
VoxelNet       Voxel-Feature      3D CNN              190         65%
SECOND         Voxel-Feature      Submanifold CNN     22          74%
MEGVII         Submanifold CNN    RPN                 -           -%
PointPillars   Pillar-Feature     2D CNN              9           75%

3.3 Fusion-Based Methods
Several methods, such as Frustum PointNet, MV3D and PointFusion, propose a combination of images and LiDAR data to improve performance and detection accuracy. These models succeeded in difficult scenarios, such as classifying small objects. Frustum PointNet [8] uses a feature extraction based on an object-wise approach. It utilises 2D CNN detectors to propose 2D regions from the image and classify them. These 2D regions are then lifted to the 3D point cloud, becoming frustum proposals, also called frustum clouds. Afterwards, PointNet++ is applied to the regions to further estimate the location, size and orientation of 3D objects. Another relevant fusion-based work is MV3D [1], which fuses data from RGB images and LiDAR sensors to extract high-dimensional features. It produces a multi-view representation of 3D point clouds and extracts feature maps from each view. It combines features from the front view, bird's eye view (BEV) and camera view to alleviate the information loss. Instead of using object-wise features, as in F-PointNet, it uses region-wise features extracted from the BEV. RPNs are applied to generate 3D object proposals using the bird's eye view map as input. Then, a region-based fusion network combines features from the multiple views and provides the object proposals' class predictions and oriented 3D bounding box regression. PointFusion [11] provides 3D object detection by fusing image and 3D point cloud information. Firstly, they supply the RGB image to an RPN that proposes 2D object crops. Then, they combine point-wise features provided by
PointNet on the 3D point cloud with image geometry features extracted using a ResNet architecture, feeding both outputs into a fusion network. This fusion network provides 3D bounding box regression for the object in the crops. Fusion-based methods present some drawbacks that must be considered. First, the need for time synchronisation and calibration with the LiDAR sensor is a limiting factor: this synchronisation and calibration task makes the solution more sensitive to sensor failures. Then, there is an increase in costs associated with the use of an additional sensor. For the three before-mentioned projects, we compare the results achieved in the KITTI benchmark for 3D AP on the KITTI test set at moderate difficulty in three categories, namely pedestrian, car and cyclist. PointFusion states that its model achieved 63 AP in the car class, while MV3D and F-PointNet achieved 62.68 and 70.39, respectively. For the pedestrian class, PointFusion achieved 28.03 AP, and F-PointNet 44.89 AP. For the cyclist class, PointFusion achieved 29.42 AP, and F-PointNet 56.77 AP. Regarding inference time, F-PointNet performs detection at 8.3 Hz, MV3D at 2.7 Hz and PointFusion at 0.77 Hz. Due to the poorer accuracy in the detection of smaller objects, data fusion is stated to be the best approach, with research work [10] suggesting the use of an ultrasonic rangefinder.
4 Conclusions
Object detection research works have either preferred to preserve all geometric information and perform object detection with 3D ConvNets, or to compact all information and perform 2D convolutions. The former approach requires expensive computations, which hinder the inference time (the 3D CNN takes 170 ms in VoxelNet [13] against the 7.7 ms runtime of the 2D CNN implemented in [6]) and may make the deployed model impractical for real-time requirements, with no sign of accuracy improvement according to the reported results. On the other hand, compacting the information into 2D projections and performing 2D convolutions is less computationally heavy, but introduces information losses. For this reason, some research works rely on compound methods that take advantage of the sparse nature of point clouds, or on fusion-based methods, suggesting that the direct application of a 2D CNN-based extractor leads to poorer performance. In addition, the fusion-based performance results presented above show that excluding the third dimension of the point cloud leads to poorer accuracy, while the response time is still higher than that of compound-based solutions even though the number of points is lower. Compound-based solutions are based only on LiDAR and have achieved the best performance so far, showing that splitting the feature extractor into two stages leads to better accuracy, while the achieved response time is satisfactory in some cases thanks to the exclusion of points with no representative value and the application of 2D CNNs to process local features instead of the whole point cloud. Summing up, in the present SoA, feature extractor solutions have achieved low runtimes and techniques to manage the trade-off between metrics have emerged, but the accuracy of such solutions still seems far from the performance needed. For this reason,
projects have explored the fusion of technologies, showing promising results but also raising some new challenges. Furthermore, the MEGVII project has shown that data augmentation strategies lead to models with better accuracy results.
Acknowledgements. This work is supported by European Structural and Investment Funds in the FEDER component, through the Operational Competitiveness and Internationalization Programme (COMPETE 2020) [Project no. 037902; Funding Reference: POCI-01-0247-FEDER-037902].
References 1. Chen, X., Ma, H., Wan, J., Li, B., Xia, T.: Multi-view 3d object detection network for autonomous driving. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1907–1915 (2017) 2. Forbes: Velodyne rolling out 128-laser beam lidar (2017). https://www.forbes. com/sites/alanohnsman/2017/11/29/velodyne-rolling-out-128-laser-beam-lidarto-maintain-driverless-car-vision-lead/. Accessed 22 Nov 2019 3. Graham, B.: Sparse 3d convolutional neural networks (2015). CoRR abs/1505.02890, http://arxiv.org/abs/1505.02890 4. Graham, B.: Spatially-sparse convolutional neural networks (2014). CoRR abs/1409.6070, http://arxiv.org/abs/1409.6070 5. Graham, B., van der Maaten, L.: Submanifold sparse convolutional networks (2017). CoRR abs/1706.01307, http://arxiv.org/abs/1706.01307 6. Lang, A.H., Vora, S., Caesar, H., Zhou, L., Yang, J., Beijbom, O.: Pointpillars: fast encoders for object detection from point clouds (2018) 7. Prime, A.: Velodyne alpha puck (2017). https://velodynelidar.com. Accessed 22 Nov 2019 8. Qi, C.R., Liu, W., Wu, C., Su, H., Guibas, L.J.: Frustum pointnets for 3d object detection from rgb-d data. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 918–927 (2018) 9. Qi, C.R., Su, H., Mo, K., Guibas, L.J.: Pointnet: deep learning on point sets for 3d classification and segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 652–660 (2017) 10. Wiseman, Y.: Ancillary ultrasonic rangefinder for autonomous vehicles. Int. J. Secur. Appl. 12(5), 49–58 (2018) 11. Xu, D., Anguelov, D., Jain, A.: Pointfusion: deep sensor fusion for 3d bounding box estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 244–253 (2018) 12. Yan, Y., Mao, Y., Li, B.: Second: sparsely embedded convolutional detection. Sensors 18(10), 3337 (2018) 13. Zhou, Y., Tuzel, O.: Voxelnet: End-to-end learning for point cloud based 3d object detection (2017) 14. Zhu, B., Jiang, Z., Zhou, X., Li, Z., Yu, G.: Class-balanced grouping and sampling for point cloud 3d object detection (2019)
Where Is the Health Informatics Market Going?

André Caravela Machado1, Márcia Martins1, Bárbara Cordeiro1, and Manuel Au-Yong-Oliveira1,2(&)

1 Department of Economics, Management, Industrial Engineering and Tourism, University of Aveiro, 3810-193 Aveiro, Portugal
{amachado,marcia.martins,barbara.cordeiro,mao}@ua.pt
2 GOVCOPP, Aveiro, Portugal
Abstract. At an estimated annual growth rate of 13.74%, the global health informatics market can reach $123 billion by 2025, figures that exemplify the development trends of an ever-growing industry. The authors decided to collect data from the general public through an anonymous survey on the subject of health informatics. This survey was developed on Google Forms and later sent to multiple recipients by email and shared on social networks. A total of 165 people, aged 16 to 81 years old, participated in this survey. 98.8% of the respondents consider response time in health a determining factor. 97.6% of the survey participants consider that it is possible to make more accurate and viable clinical diagnoses using health informatics. Furthermore, according to our survey, people do not mind mortgaging their personal data (which is known from the outset for its incalculable value) because they know that, in return, they will benefit from better living conditions. Three experts were also interviewed and, according to one of them, one of the biggest challenges in health informatics is "understanding and detecting diseases long before they happen". This interviewee also stressed the importance of artificial intelligence "in helping people to improve their health through indicators that alert and recommend certain habits and influence the improvement of people's quality of life". Finally, the emphasis needs to be on eliminating health costs and facilitating life for people with chronic diseases.

Keywords: Health · Informatics · Patient · Market · Data
1 Introduction

Nowadays, the influence of technology on the process of communication and information exchange between people and in the business world is clear. Specifically, in the health sector, the use of Information and Communication Technologies (ICTs) has the potential to promote several advantages such as lower costs, information sharing and a better quality of life, as can be analysed throughout this study [2]. Informatics can be defined as a tool with the main objective of speeding up information in the most diverse areas. In the healthcare area, the growing amount of processed and stored data has been demanding professionals with multidisciplinary
knowledge, namely in the efficient use of IT tools, which are increasingly incorporated into health services. In this way, the concept of "Health Informatics" emerges, and it can be defined as the development and assessment of methods and systems for the acquisition, processing and interpretation of patient data with the help of knowledge from scientific research [3]. The motivation for the current article was, firstly, to try to understand the level of importance and the impact that Health Informatics represents today, both for companies and for patients. In addition, this article also allows us to understand the position of this area in the market, and which strategies are being used by companies in the field of health information technology in order to stay in business in an increasingly competitive market with high growth trends. Therefore, this study intends to make known, in a practical and real way, the experiences, challenges and ambitions of companies working in the area of Health Informatics through the collection of information via interviews, while also seeking to explore the relationship of patients with this technological area through a survey which was also conducted. By exploring this subject, this article seeks to answer the following research questions:
• How does health informatics contribute to the well-being and improvement of patients' quality of life?
• How receptive are patients when relating to health informatics in their daily lives?
In the next section, a literature review is presented in order to advance different authors' perspectives on the research theme. The main goal was to understand what knowledge already exists in the literature. Furthermore, the literature review provided a solid basis for the analysis and discussion. After the literature review, the methodology used for this study is presented - interviews and a survey. Finally, the article ends with the presentation of the research results, their discussion and the conclusions drawn from this work.
2 Literature Review

Health Informatics has been studied by a number of authors. Thus, we concluded, based on our literature review, that:
1. The development of Health Informatics has influenced the evolution of healthcare through the use of technology and electronic health systems [15].
2. Very important changes in information technology have occurred in recent years. Artificial intelligence, virtual reality, massive databases, and multiple social networking sites are a few examples of how information is used differently as a result of technological developments. An important outcome of these changes is that the ability to effectively and efficiently access and use information has become an important source of competitive advantage in all industries. Information technology
advances have given small firms more flexibility in competing with large firms, if that technology can be used in an efficient way [5].
3. The contribution of data record systems, responsible for discovering new preventive approaches in medicine, has improved the population's quality of life by decreasing the financial impact of diseases [16].
4. There are several mobile applications that intend to answer the data collection needs of patients in a fast and accurate way, in a short time frame and with lower costs. An example is the Global Outbreak Alert and Response Network of the WHO, which is based on digital resources available on the internet for permanent and daily surveillance [16].
5. Big Data in health care exceeded 150 exabytes after 2011, and a study showed that data size in health care is estimated to be around 40 zettabytes in the year 2020 [11].
6. Indeed, the real experiences of users show that their initial expectations change from the moment in which they use tools related to health informatics. The confirmation of the benefits of these tools is instrumental, guaranteeing the permanence and credibility of the services connected to health informatics [17]. Adoption and diffusion increase after establishing trust in the e-services offered, and so trust-building structures and mechanisms need to be a priority [17].
7. Telemedicine techniques constitute one of the most important health informatics applications, allowing for advice from a distance. The database about the patient and the base of medical knowledge with expert systems are crucial in this area and will be oriented to the support of medical work. As regards hospital information systems, progress is expected, particularly in the field of treatment. Concerning data protection, new solutions are also being developed, through the creation of unique standards, classifications, nomenclatures and coding [7].
8. Regarding one of our research foci, some authors announced that there are some entities, namely governmental ones, that had the initiative to regulate, protect and assure the privacy of data collected from patients [15].
9. Patients are the central focus of healthcare; therefore, they must also be the central focus of health informatics. At a certain point, without patients there would be no need for health informatics, since it is the patient who generates data and information; most of all, communication within hospitals and medical centres is patient-centred, medical data management focuses on patient diagnostics and therapeutic procedures, and medical information departments concentrate their attention on the handling of patient records and patient databases [3].
3 Methodology

The methodology used in this exploratory study involved a literature review related to informatics and technology in health, and to the position of this area in the business market. The gathering of some of the literature was done in databases such as ResearchGate, a popular academic social network which facilitates access to academic articles, and Scopus.
In order to complement the literature review, a quantitative and a qualitative method were followed. Firstly, a survey was developed in order to collect people's feelings and experiences concerning technology and health. The main goal of the questions presented was to find out people's perspectives regarding the use of health technologies in their daily lives. This tool was chosen because it was considered a straightforward way to reach a larger number of people and to be able to analyse data more precisely afterwards. Furthermore, respondents were, in one of the questions, allowed to give an open answer - a factor that allowed a deeper analysis of different points of view concerning the same topic. Secondly, three interviews were conducted with the objective of analysing Health Informatics in a more strategic and entrepreneurial way through the knowledge of professionals in the research and business sectors of this area, as can be seen in the Results section.
4 Results
Considering the results obtained, both quantitative and qualitative, the overall impression is very positive. On the one hand, it was possible to realize that respondents, in general, are opening their minds to the application of new technologies, which is reflected in the improvement of people's quality of life and health. On the other hand, the results benefited from the technical, expert and knowledgeable approach of the stakeholders who volunteered to respond to our interviews, whom we thank: Frans van Houten (CEO of Royal Philips, Netherlands), Dr. Luís Bastião Silva (CTO of BMD Software, Portugal) and Dr. Ana Dias (Assistant Professor at the University of Aveiro). The principles that led to the adoption of these two distinct data and information methodologies were based on two questions: i) Is the general population prepared to interpret health informatics as a perpetual benefit? ii) Do companies in the health informatics sector only look at profit, or do they also have as a main objective, while considering strategy and competitiveness, improving people's quality of life?
4.1 Quantitative Analysis
To answer these two questions, it was first decided to collect data from the general public through an anonymous survey on the subject of health informatics. This survey was developed on Google Forms and later sent to multiple recipients by email and shared on social networks (e.g. Facebook, LinkedIn and WhatsApp), so that it could reach a broad range of locations and publics and, therefore, cover all areas of analysis. A total of 165 people, aged 16 to 81 years old, participated in this survey; sixteen of them were 22 years old, the largest number of participants of any single age (9.7%). In terms of gender, 49.1% of respondents were female (a total of 81 participants) and 50.9% were male (a total of 84 participants).
By region, the results show participants from various countries:
• Portugal (139 respondents) - 84.4%
• France (4 respondents) - 2.4%
• Brazil and Luxembourg (3 respondents each) - 1.8% each
• Germany, Angola and Spain (2 respondents each) - 1.2% each
• Andorra, Cape Verde, Canada, Italy, Japan, Morocco, Sweden, East Timor, Ukraine and Uruguay (1 respondent each) - 0.6% each
To finalize the characterization of the public, academic qualifications were also considered: 46.7% of the respondents have an academic qualification equal to or lower than secondary education (77 participants) and 37.6% have a qualification equal to a degree (62 participants). Regarding the remaining numbers, 7.3% of the respondents have a master's degree (12 participants), 6.7% have a Higher Vocational Technical Course and, lastly, 1.8% have a PhD degree (3 participants). The first question on the subject of health informatics in this survey had to do with response time in health, i.e. our audience was asked: 'Do you consider response time in health a determining factor?' A massive 98.8% of the responses were affirmative; in other words, 163 of the respondents consider response time in health a determining factor, while 2 respondents (1.2%) do not. In order to have a more direct approach to the subject, our audience was asked: 'Do you believe that it is possible to make more accurate and viable clinical diagnoses using health informatics?' 97.6% of the respondents answered yes (161 participants), while 2.4% answered no (4 participants). Another relevant issue in the big data era relates to the storage of patient data by technology itself: 'Do you consider the storage and management of patient medical history via technology relevant?' Quite surprisingly, no one is opposed: 100% of the respondents (all 165 participants) clearly say yes. On a scale of 1 to 5 (1 "I disagree" and 5 "I totally agree"), our audience was asked whether or not the use of health data science improves customer service and satisfaction: 'From 1 to 5, do you agree that using health data science improves patient care and satisfaction?' (Fig. 1). The graph shows a large difference between "agree" (level 4: 15.8% of the respondents, 26 participants) and "fully agree" (level 5: 83% of the respondents, 137 participants). Also to be considered is the middle point, "neither agree nor disagree" (level 3: 1.2% of the respondents, 2 participants). It was also intended, on a scale from 1 to 5 as explained above, to understand the level of agreement regarding the financial investment in health informatics, with the following question: 'From 1 to 5, do you consider the investment in the informatics area well applied to health?'
Fig. 1. Improving patient care and satisfaction through health informatics.
74.5% of the respondents (123 participants) chose level 5 (strongly agree), 16.4% (27 participants) level 4 (agree), 7.3% (12 participants) the midpoint (neither agree nor disagree) and, finally, 1.8% (3 participants) level 2 (disagree). The following question listed five areas of application of health informatics: (i) telemedicine, (ii) electronic medical records, (iii) information retrieval, (iv) diagnostic and decision support systems, and (v) online health portals. The aim was to know which areas of application the public considered most relevant: 'Which of the following areas of application of health informatics do you consider most relevant?' (Fig. 2). 36.4% of the respondents (60 participants), in blue, understand it to be telemedicine; 32.7% (54 participants), in green, understand it to be diagnostic and decision support systems; 19.4% (32 participants), in orange, understand it to be electronic medical records; 6.7% (11 participants), in purple, understand it to be online health portals; and 4.8% (8 participants), in yellow, understand it to be information retrieval.
Fig. 2. Health informatics applications.
Finally, in this sample of results, one question was left open to all those who wanted to comment on this theme and its future: 'How do you consider health informatics to be an asset to your quality of life and future well-being?'.
In total, 25 written answers were obtained, which clearly benefited the understanding of this question and were in line with the information gathered through the literature review as well as with the answers given to previous questions of the survey. The answers pointed out several advantages of health informatics, highlighting the improvement of the quality of the diagnoses performed, faster patient care, faster access to the patient's medical history and innovation in this area in general:
• "The technology could make it easier for doctors and nurses to work on diagnostics and therefore have more time available to devote to patients."
• "Faster response to medical needs and possibility of greater interdisciplinarity between specialties."
4.2 Qualitative Analysis
Following another approach, for the qualitative sample, three interviews were conducted. The first interview was with a major global company that is growing in the global health informatics market, Royal Philips, in the person of its CEO, Dr. Frans van Houten. Another interview was conducted with the CTO of BMD Software, a Portuguese company that is taking its first steps in this area, Dr. Luís Bastião Silva. Finally, we interviewed an Assistant Professor at the University of Aveiro, in Portugal, who conducts research on health care issues, Dr. Ana Dias. The main purpose of data collection in these interviews was to allow for an interdisciplinary, plural, independent and non-binding interpretation, so that any tendency in the sampled results would be clear. Following prior contact with the respondents through their offices, questions were sent and answered by email during October 2019.
Interview 1: Ana Dias
Ana Dias completed her PhD in Health Sciences and Technology, having been led by circumstances to work in this sector. Her PhD research project had to do with the importance that the integration of health information has in guaranteeing continuity of care. There are some publications on the interoperability of existing health data collection solutions and those used in care delivery. In the interview, she revealed that "it is crucial that health data can be integrated and used in health care delivery, but solutions need to communicate and therefore there must be interoperability, semantics and technology. If not, we will be collecting data that can hardly be used in the provision. A central challenge: how to deal with the complexity and changeability of clinical information and thus the difficulties in dealing with information interoperability." When asked "Which sectors of Health Informatics have significant development margin?", Ana Dias says that the "focus should be on ensuring the integration of information" and that "the solutions that currently exist, without interoperability, may hinder the establishment of a remote market for patient monitoring solutions." We also wanted to know her opinion regarding the market players, in order to understand whether there is any significant flaw in their performance.
In this regard, Ana Dias has no doubt: "In regulation. New developments in these areas should be further and better assessed."
Interview 2: Luís Bastião Silva
Luís Bastião Silva has been the CTO of BMD Software for three years. The main goal of this company is essentially to build the best technological solutions for some of the most critical problems in the health and life science industries, providing computational solutions for better decision making and knowledge management in healthcare [9]. To be more specific, the main field of this company's expertise is the development and maintenance of medical imaging solutions, biomedical applications and bioinformatics tools [14]. In answer to the question "What is the role of BMD Software when we talk about Health Informatics?", Luís considers that "One of the fundamental pillars of the company is the transfer of research and innovation results to operating environments, especially in the Medical Imaging sector and development for clinical data analysis" and that, for him and BMD Software, "medical information analysis tools are actively growing". In addressing the issue of the era of Big Data and the effects of using patient data in terms of threats vs. opportunities, Luís Bastião Silva noted that "BMD is already in some projects with large amounts of data and therefore considers it an opportunity." The CTO also pointed out that BMD has been able to "transform innovative products and prototypes into solutions of regular use and with incredible advantages in clinical practice and with direct advantages for patients".
Interview 3: Frans van Houten
Frans van Houten has been CEO of Royal Philips since April 2011. As CEO, he is also Chairman of the Board of Management and the Executive Committee. One of the company's main goals is extending its leadership in health technology, making the world healthier and more sustainable, with the goal of improving three billion lives per year by 2030 [10]. According to [13], Royal Philips expected third-quarter 2019 Group sales of approximately 4.7 billion euros, a 6% comparable sales growth. He notes that one of the biggest challenges in health informatics is "understanding and detecting diseases long before they happen". This interviewee stresses the importance of the role that artificial intelligence will play "in helping people to improve their health through indicators that alert and recommend certain habits and influence the improvement of people's quality of life" and what they want through Philips: to "eliminate health costs and facilitate people with chronic diseases". He has broadly widened Philips' business concept by providing software solutions, systems and services and taking responsibility "for health outcomes, improved health productivity and patient experience". He also mentioned "A very strong investment in artificial intelligence in connection with health informatics and a strong hiring of data scientists to support doctors, nurses and patients to make results more discreet and comfortable."
Speaking of company figures, he said that “25% of Philips’ 18 billion euros in turnover comes from healthcare informatics” and that the major markets are in Asia, specifically China, which is experiencing strong growth.
5 Discussion
As early as the Roman Empire, Seneca, a writer, philosopher and master of the art of rhetoric, asserted that "the desire to be healed is part of healing." According to [12], research on health informatics focuses mainly on future conditions, so that millions of people can avoid unsustainable costs, poor outcomes, frequent medical errors, poor patient satisfaction and worsening health disparities. Based on the data analysed, from the first approach to the sample of the population that responded to the survey, it was possible to realize that the reduction of response time in health is a determining factor. This means that, currently, health systems are not yet able to respond quickly, effectively and with certainty to health issues; the clearest illustration of this is the reality we observe in Portugal regarding public hospitals: waiting lists that are months or years long for patients who need care or a simple diagnosis. This is where health informatics can help, speeding up processes, reducing costs and time, and considerably improving the lives of many patients. For that to happen, people do not mind mortgaging their personal data (which is known from the outset for its incalculable value), because they know that, in return, they will benefit from better living conditions. Storing and managing patients' medical history through technology does not seem to be a problem for people, as our survey reveals. Data science decision-making raises the bar so that these two aspects can be brought together: on the one hand, patient data; on the other, decision making based on scientific data for health. The belief that it is possible to improve the quality of health services, influenced by health informatics, is widely shared by the public. Hence, this desire to improve the quality of health services, improving accuracy and response times, is part of a cure whose effects patients, perhaps in less time than they expect, may see as improvements in their quality of life. Thus, this first screening comes down to the conclusion of our public survey: there is hope that health informatics will address various aspects that improve people's quality of life. Yet, deep down, who is to make this hope a reality? The market in this sector, obviously. The analysis of the context of the market players allowed us to observe several aspects, namely:
• Unregulated market performance is one of the factors to consider and observe in this health informatics sector;
• Information needs to be integrated and interoperability solutions created;
• Medical information analysis tools are actively growing;
• Attention has been paid to ideas and prototypes that are turned into solutions for patients;
• There is a concern among market players with reducing healthcare costs;
• In an articulated manner, they intend to integrate artificial intelligence, data science and health informatics as a form of cooperation between doctors, nurses and patients.
As is well known, all companies exist to generate profit and value, and companies in the health informatics sector are no exception. It is estimated that the market could reach $123 billion by 2025 [1], equivalent to twice the gross domestic product of Luxembourg or approximately that of Kuwait. These are companies whose market operations are located in hospitals, specialized clinics, pharmacies, laboratories, among others. At the demographic level, the largest companies in the industry are located in the United States, Canada, the Netherlands, France, Germany, China and Japan. Additionally, is the health informatics market strategy working for people or working for profit? From what it was possible to conclude from the interviews that were conducted and from the survey, which analyses some questions that clarify doubts about people's perceptions, this sector works for both, and each inevitably depends on the other. And how does one come to this conclusion? Firstly, if it is a market that generates profits, it means that it is a market that has solutions to present. Spreading these solutions across its customers, who are clearly buying new health informatics developments, means that there is a need-generating class that requires these solutions to improve their processes, reduce costs and apply them to their goals or obstacles. This is why people benefit from market developments and from healthcare providers' acquisitions, which allow a significant improvement either in their own health or in the mechanisms at their disposal to solve their problems. Even so, not everything is clear: the lack of regulation is a worrying situation, especially regarding the use and processing of patient data, since the red lines of how far the use of patient data can go are not known. We need a market which does not arbitrarily abuse the resources at its disposal and which follows its path by benefiting people's lives.
6 Conclusions and Suggestions for Future Research
After observing all the parameters, it is safe to say that applied health informatics is an important area for the production of knowledge and practical solutions. Along with the hope indicated in the discussion, there are forecasts of business progress that should make people's lives better and better, associating applied health informatics with an area of trust and continuous improvement based on innovation. Thus, the strategy of the health informatics market is to present evidence that its solutions are beneficial when used in a network: among doctors, nurses, patients and other clinical groups and various specialties. The great majority of the survey respondents consider response time in health a determining factor (a negative aspect of Portuguese public health care at present), as
well as that it is possible to make more accurate and viable clinical diagnoses using health informatics. Furthermore, according to our survey, people do not mind mortgaging their personal data because they know that, in return, they will benefit from better living conditions. One of the biggest challenges in health informatics is, according to an interviewee, “understanding and detecting diseases long before they happen”. This interviewee also stressed the importance of artificial intelligence “in helping people to improve their health through indicators that alert and recommend certain habits and influence the improvement of people’s quality of life”. Finally, the emphasis needs to be on eliminating health costs and facilitating life for people with chronic diseases. To conclude, it is considered extremely pertinent for future research to address the Health Informatics market from a regulatory perspective, the existence of laws on these matters and the extent to which legal requirements are matched with technological advances. It is also considered important for future research to compare health informatics markets with other related markets in order to make an assessment and to understand, on a scale, how quickly these markets are actually evolving.
References
1. Global Healthcare Informatics Market worth $123.24 Billion by 2025 - CAGR: 13.74%: Global Market Estimates Research & Consultants. https://markets.businessinsider.com/news/stocks/global-healthcare-informatics-market-worth-123-24-billion-by-2025-cagr-13-74global-market-estimates-research-consultants-1027348258. Accessed 3 Oct 2019
2. Rouleau, G., Gagnon, M., Côté, J.: Impacts of information and communication technologies on nursing care: an overview of systematic reviews (protocol). Syst. Rev. 4(75) (2015)
3. Imhoff, M., Webb, A., Goldschmidt, A.: Health informatics. Intensive Care Med. 27(1), 179–186 (2001)
4. Report: Global Healthcare IT Market to Hit $280B by 2021. https://www.hcinnovationgroup.com/clinical-it/news/13028238/report-global-healthcare-it-market-to-hit280b-by-2021. Accessed 5 Oct 2019
5. Hitt, M., Ireland, R., Hoskisson, R.: Strategic Management: Concepts. Competitiveness and Globalization, 9th edn. South-Western, Mason (2011)
6. Top 10 companies in healthcare IT market. https://meticulousblog.org/top-10-companies-in-healthcare-it-market/. Accessed 23 Oct 2019
7. Masic, I.: The history and new trends of medical informatics. Donald Sch. J. Ultrasound Obstet. Gynecol. 7(3), 72–83 (2013)
8. 2018 Global Health Care Outlook: The evolution of smart health care. https://www2.deloitte.com/content/dam/Deloitte/global/Documents/Life-Sciences-Health-Care/gx-lshc-hcoutlook-2018.pdf. Accessed 27 Oct 2019
9. BMD. https://www.bmd-software.com/company/. Accessed 01 Nov 2019
10. van Houten, F.: https://www.philips.com/a-w/about/company/our-management/executivecommittee/frans-van-houten.html. Accessed 01 Nov 2019
11. Hong, L., Luo, M., Wang, R., Lu, P., Lu, W.: Big data in health care: applications and challenges. Data Inf. Manag. 2(3), 175–197 (2018)
12. Marvasti, F., Stafford, R.: From "sick care" to health care: reengineering prevention into the US system. N. Engl. J. Med. 367(10), 889–891 (2012)
13. Royal Philips Q3 Comps. Up 6% - Quick Facts. https://www.nasdaq.com/articles/royalphilips-q3-comps.-up-6-quick-facts-2019-10-10. Accessed 01 Nov 2019
14. BMD Software. https://www.linkedin.com/company/bmd-software/about/. Accessed 01 Nov 2019
15. Bell, K.: Public policy and health informatics. Semin. Oncol. Nurs. 34(2), 184–187 (2018). https://doi.org/10.1016/j.soncn.2018.03.010
16. Aziz, H.: A review of the role of public health informatics in healthcare. J. Taibah Univ. Med. Sci. 12(1), 78–81 (2017). https://doi.org/10.1016/j.jtumed.2016.08.011
17. Shin, D., Lee, S., Hwang, Y.: How do credibility and utility play in the user experience of health informatics services? Comput. Hum. Behav. 67, 292–302 (2017). https://doi.org/10.1016/j.chb.2016.11.007
Data Integration Strategy for Robust Classification of Biomedical Data
Aneta Polewko-Klim1(B) and Witold R. Rudnicki1,2,3
1 Institute of Informatics, University of Bialystok, Bialystok, Poland
[email protected]
2 Computational Center, University of Bialystok, Bialystok, Poland
3 Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, Warsaw, Poland
Abstract. This paper presents a protocol for the integration of data coming from the two most common types of biological data (clinical and molecular) for more effective classification of patients with cancer. In this protocol, the identification of the most informative features is performed using statistical and information-theory based selection methods for molecular data and the Boruta algorithm for clinical data. Predictive models are built with the help of the Random Forest classification algorithm. The process of data integration consists in combining the most informative clinical features and synthetic features obtained from genetic marker models as input variables for the classifier. We applied this classification protocol to METABRIC breast cancer samples. Clinical data, gene expression data and somatic copy number aberration data were used for clinical endpoint prediction. We tested various methods for combining information from the different data sets. Our research shows that both types of molecular data contain features that are relevant for clinical endpoint prediction. The best model was obtained by using ten clinical and two synthetic features obtained from biomarker models. In the examined cases, the type of filtration of molecular markers had a small impact on the predictive power of the models, even though the lists of top informative biomarkers are divergent.
Keywords: Random Forest · Data integration · Feature selection · Biomedical data
1 Introduction
In recent years, the classification of clinical endpoints (CE) and clinical outcomes based on molecular data has significantly increased the efficiency of diagnostics, prognostics and therapeutics in patients with cancer [1,2]. Unfortunately, cancer pathophysiology is related to both genetic and epigenetic changes that are described by various types of biological data. What is more, each type of cancer is a very complex disease, with high variability of sources, driver mutations and
responses of the host to therapy. This makes prediction of cancer CE very difficult, as it is only weakly reflected in the current state of the molecular signature collected from a mix of cancer cells in different states and normal cells. Nevertheless, we can treat each type of molecular data as an additional source of information, complementary to clinical data (CD). We can then integrate this information with other evidence in predictive models [3–5]. Generally, three major strategies are used for the integration of heterogeneous data measured on the same individuals in different experiments for machine learning models [6]. In the early integration strategy, one combines several data sets and relies on the machine learning algorithm to find meaningful relationships across data set boundaries. In the late integration strategy, separate models are built for each data set, and their results are then used as input for another machine learning algorithm at the second level. These two strategies can generally be executed using standard machine learning algorithms. The third approach, the intermediate integration strategy, relies on algorithms explicitly designed for the integration of multiple data sets, where models are built jointly on multiple separate data sets. Integration of clinical and molecular breast cancer data has been performed using various machine learning algorithms, such as Bayesian networks [7], Support Vector Machines [8], Random Forest [9] and k-NN [10]. In most studies, integration involves only one type of molecular data [8,10]. Feature selection (FS) is often either limited to molecular data [7] or performed on the combined dataset [8]. In the current study we propose a robust methodology for integrating various types of molecular data with clinical data. It is a mixture of the early and late integration strategies, yet it can be performed using standard machine learning algorithms. We first develop independent predictive models for each type of molecular data. Then we use the results of these models as new synthetic molecular descriptors of the clinical state of the patient. Finally, we build machine learning models using clinical data augmented with the new synthetic molecular descriptors. Development of a robust classification model based on molecular data is a cornerstone of this method and involves a robust and computationally intensive procedure. The following general protocol was used:
1. data preprocessing (specific to a particular data set);
2. identification of informative features and feature selection;
3. model building;
4. the entire procedure is performed within a k-fold cross validation scheme and is repeated several times, to obtain a good estimate of the expected variance.
The methodology outlined above was applied for prediction of clinical endpoints in breast cancer patients.
2 Materials and Methods
All the data analyses were conducted using the open source statistical software R version 3.4.3 [11] and R/Bioconductor packages [12].
The data for the study was obtained from the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) project [13]. We used clinical data (CD), Illumina Human HT-12 v3 microarray gene expression profile data (GE) and Affymetrix SNP 6.0 copy-number alteration data (CNA).
2.1 Data Preprocessing and Integration
Clinical Data. Twenty five clinical features were selected for inclusion in the clinical data set. The choice was based on diagnostic tests used in clinical practice: Prosigna, Breast Cancer Index, EndoPredict, MammaPrint, Mammostrat and Oncotype DX DCIS. Samples with missing values of these features were omitted, with the exception of the tumor stage feature, where null values were replaced by 0. All qualitative CD data were converted into numerical data.
Molecular Data. The primary GE set contains 1906 samples described by 24369 continuous variables corresponding to gene expression levels. The CNA set contains 1483 samples described by 22544 discrete variables, corresponding to alterations of the number of copies of genes. The missing values of probes were imputed by mean and median values, for the GE and CNA sets respectively. Additionally, filtering of the variables was performed on the GE set, with the help of a dedicated function from the genefilter R package [14], based on the quality of the signal. Two criteria were used: sufficiently high intensity of the signal and sufficiently high variation of intensities. Low intensity of measured gene expression is generally considered noise. Gene expression should also have sufficiently high variation to be included in the analysis, since small changes of activity, even if statistically significant, are very unlikely to have any biological relevance. The intensity threshold was set at the first quartile of the distribution of the maximum gene expression levels. Only genes for which at least 10% of samples have intensity greater than this threshold were included. The variation criterion was that the robust coefficient of variation [15] be higher than 0.05. This procedure limited the number of GE features to 8673.
Decision Variable. Following [16] we use disease-specific survival as the clinical endpoint rather than overall survival, as the former yields a more accurate prognostic model for survival in patients with breast cancer. The final data set contained records of 1394 patients (781 survivors and 613 deceased) in three subsets containing clinical data (CD), gene expression profiles (GE), and copy number alteration profiles (CNA).
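A minimal sketch of the GE probe filtering step described above is given below. It assumes that `ge` is a samples-by-probes numeric matrix; the 10% sample fraction, the first-quartile intensity threshold and the 0.05 cut-off follow the text, while the use of plain base R (instead of the genefilter helpers used by the authors) and the IQR/median form of the robust coefficient of variation are assumptions made for illustration.

```r
# Illustrative probe filtering (rows = samples, columns = probes).
# Assumption: the robust coefficient of variation is approximated as IQR / |median|.
filter_ge_probes <- function(ge, frac = 0.10, rcv_min = 0.05) {
  # Intensity threshold: first quartile of the per-probe maximum expression levels
  intensity_thr <- quantile(apply(ge, 2, max), probs = 0.25)

  # Criterion 1: at least `frac` of samples exceed the intensity threshold
  enough_signal <- apply(ge, 2, function(x) mean(x > intensity_thr) >= frac)

  # Criterion 2: robust coefficient of variation above `rcv_min`
  rcv <- apply(ge, 2, function(x) IQR(x) / abs(median(x)))

  ge[, enough_signal & rcv > rcv_min, drop = FALSE]
}

# Toy example standing in for the METABRIC GE matrix
set.seed(1)
ge_demo <- matrix(rlnorm(100 * 50), nrow = 100,
                  dimnames = list(NULL, paste0("probe", 1:50)))
dim(filter_ge_probes(ge_demo))
```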
2.2 Identification of Informative Variables
Identification of the informative variables in the molecular datasets was performed using two alternative methods, namely the Welch t-test [17] and MDFS [18], implemented in the MDFS R package [19]. The Welch t-test is a simple statistical test
that assigns a probability to the hypothesis that two samples corresponding to the two decision classes (vital status of patients: deceased/alive) are drawn from populations with the same average value. The MDFS algorithm measures the decrease of the information entropy of the decision variable due to knowledge of k-dimensional tuples of variables and measures the influence of each variable in the tuple. It is worth noting that the bias of filtering FS methods gives them better generalisation properties than other FS methods, e.g. wrapper and embedded methods [20], because it is not correlated with the classification algorithm. The corrections of p-values due to multiple testing were handled using two alternative approaches: the Hochberg FWER correction [21] and the SGoF procedure [22]. Both GE profiles and CNA profiles contain multiple highly correlated variables. The presence of highly correlated variables can have an adverse effect on classification accuracy, therefore a simple dimensionality reduction was applied, using a greedy approach. The final set of variables was determined by removing variables that were highly correlated with higher-ranking ones; the cut-off level of the correlation coefficient was set to 0.7. Clinical data descriptors are non-uniform and, even though all were represented in numerical form, some of them represent categorical variables. Therefore the CD set is not suitable for analysis with tests that require numerical data. Instead, the all-relevant FS algorithm Boruta [23], implemented in the R package Boruta [24], was used to identify and rank the most relevant clinical variables. This selection was performed only once, using the entire data set. Forgoing cross-validation in this case could lead to a positive bias in the estimated quality of models based on CD data only. Consequently, this may lead to a small negative bias in the estimate of the improvement due to adding molecular data to the description. Nevertheless, we use this approach since it is both simpler to interpret and computationally less expensive.
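The univariate filtering and greedy decorrelation described above can be sketched as follows, assuming `x` is a numeric feature matrix with column names and `y` a two-level factor holding the vital status. The Hochberg correction is taken from base R's `p.adjust`; the SGoF correction and the MDFS filter mentioned in the text come from separate packages and are not reproduced here, and ranking the features by adjusted p-value is an assumption.

```r
# Welch t-test per feature, Hochberg-adjusted p-values, then greedy removal of
# features whose Spearman correlation with a higher-ranking feature exceeds 0.7.
select_features <- function(x, y, alpha = 0.05, cor_max = 0.7) {
  g <- factor(y)
  stopifnot(nlevels(g) == 2, !is.null(colnames(x)))

  pvals <- apply(x, 2, function(v) t.test(v ~ g)$p.value)  # Welch test by default
  padj  <- p.adjust(pvals, method = "hochberg")
  ranked <- names(sort(padj[padj < alpha]))                # most significant first
  if (length(ranked) < 2) return(ranked)

  keep <- ranked[1]
  for (f in ranked[-1]) {
    rho <- abs(cor(x[, f], x[, keep, drop = FALSE], method = "spearman"))
    if (all(rho <= cor_max)) keep <- c(keep, f)            # keep only decorrelated features
  }
  keep
}
```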
2.3 Predictive Models
The Random Forest (RF) [25] algorithm, implemented in the randomForest R package [26], was used to build predictive models. Random Forest works well on datasets with a small number of objects, has few tuneable parameters that do not relate directly to the data, very rarely fails and usually gives results that are either the best or very close to the best achievable by any classification algorithm [27]. Random Forest is an ensemble of decision trees; each tree is built on a different bagging sample of the original data set. For each split a subset of variables is selected randomly, and the one that achieves the highest Gini coefficient for the resulting leaves is selected. The principal individual RF models from molecular data were constructed in n loops of the following procedure:
1. randomly split the dataset into k equal partitions;
2. set aside one partition as a validation set and use the remaining four partitions as a training set;
3. obtain the strongest informative features using one of the feature selection methods on the training set;
4. reject all variables whose Spearman's correlation coefficient with already selected ones is greater than 0.7;
5. limit the selected features to the number N (or to the number of identified relevant variables);
6. build a Random Forest classifier on the training set using the selected variables;
7. evaluate the quality of the models on the validation set;
8. repeat steps 2–5 for all k partitions.
The RF models from the set of clinical data were constructed using the top-relevant features selected with the Boruta algorithm. All feature selection and classification processes were performed within k = 5 fold full cross-validation repeated n = 30 times. The quality of the models was evaluated using three metrics: the classification accuracy (ACC), the Matthews Correlation Coefficient (MCC) [28] and the area under the receiver operator curve (AUC). It should be noted that the MCC and AUC metrics are better suited to evaluating the quality of a classifier for unbalanced populations than simple ACC, hence only MCC and AUC are used for diagnostics, whereas ACC is reported for completeness of results.
Integrated Datasets. In the initial approach we simply merged all the relevant clinical features with the top-N most relevant features obtained from the molecular data set under scrutiny. Unfortunately, such an approach did not improve the results of the classifier based on the clinical data alone. Therefore an alternative approach was tested. In this approach we first created an independent predictive RF model of the breast cancer clinical endpoint. Then a new variable was created using the results of this model. The value of this new feature was set to the fraction of votes for survival of the patient returned by the RF model. This feature represents composite synthetic information on the decision variable that is contained in the molecular data. The final predictive model is then built using the selected clinical variables and the synthetic molecular variable, with the help of the Random Forest classifier. The scheme of this procedure is displayed in Fig. 1.
Sensitivity Analysis. We performed an analysis of the sensitivity of the predictive model to the removal of descriptive variables, for clinical data and for the integrated model. The analysis was performed in two variants. In the first one, we built predictive models for the decision using the data set with a single feature removed. In this way we could establish the influence of a single feature on the quality of the model using all other informative features. In the second variant, we built a series of predictive models starting from all features and removing the least important ones, one after another. The second approach is similar to the well-known recursive feature elimination method. It was used for comparison between the reference model, built using only clinical data, and the models that were built using the composite molecular features in addition.
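The integration step described above can be illustrated with the sketch below, using the randomForest package cited in the text. The object names (`mol_train`, `mol_test`, `cd_train`, `cd_test`, `y_train`), the choice of the second factor level as the class whose vote fraction is used, and the cutoff value are illustrative assumptions; the paper's own pipeline additionally wraps this in the repeated 5-fold cross-validation described above.

```r
library(randomForest)

# 1. RF model built on the selected molecular features of the training fold
rf_mol <- randomForest(x = mol_train, y = y_train, ntree = 500)

# 2. Synthetic molecular descriptor: fraction of votes for one class
#    (out-of-bag votes for the training part, predicted votes for the test part)
cls <- levels(y_train)[2]                       # assumed class of interest
synth_train <- rf_mol$votes[, cls]
synth_test  <- predict(rf_mol, mol_test, type = "vote")[, cls]

# 3. Clinical data augmented with the synthetic descriptor
cd_train_aug <- cbind(cd_train, mol_votes = synth_train)
cd_test_aug  <- cbind(cd_test,  mol_votes = synth_test)

# 4. Final RF on clinical + synthetic features; the 0.58 cutoff is taken from the
#    text, its mapping onto the two class levels is an assumption
rf_final <- randomForest(x = cd_train_aug, y = y_train,
                         cutoff = c(0.58, 0.42), ntree = 500)
pred <- predict(rf_final, cd_test_aug)
```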
Fig. 1. Procedure scheme for feature selection and clinical endpoint prediction with clinical and molecular data. See notation in text.
3 Results
3.1 Informative Variables
Application of the Hochberg FWER correction [21] to the CNA data resulted in a very small number of uncorrelated variables. Therefore the SGoF procedure was used for the CNA data set; for consistency, it was also used for the GE data set. Among the three filtration methods, the one-dimensional variant of MDFS was the least sensitive, whereas the two-dimensional variant of MDFS was the most sensitive. The sensitivity of the t-test was similar, albeit slightly higher, to that of the one-dimensional variant of MDFS. The number and ranking of variables deemed relevant varied strongly between the folds of the cross-validation procedure. In particular, for gene expression and the t-test, the total number of variables deemed relevant in 150 repeats of cross-validation was 6227, and the average number for a single fold was 3817. For comparison, 4208 variables were deemed relevant when the t-test with SGoF correction was applied to the entire data set. For two-dimensional MDFS the corresponding numbers are 8610, 4378 and 4722. The number of variables deemed relevant is higher for the CNA data set. For the t-test, the total number of variables deemed relevant was 13 010, the average number was 5 663 and the number identified for the entire data set was 6 842. For two-dimensional MDFS, the corresponding numbers are 22 183, 11 465 and 13 099. However, the CNA data is much more correlated than the GE data. For example, the number of uncorrelated variables returned by the t-test was 1472 for the GE data set and 77 for the CNA data set. For both data sets the rankings of the most relevant features are not stable in cross-validation. The feature sets of the different methods are also quite divergent. Nevertheless, models developed on different feature sets give comparable results. This is due to the high correlation between variables and the application of a greedy algorithm for selecting a representative of a cluster of similar variables. Slightly different rankings of variables are obtained by the various filters on the various training sets. After applying the greedy selection of representatives, they are converted to divergent sets of variables. Nevertheless, these variables still contain very similar information on the decision variable. Therefore the final predictive model is stable
and does not depend on the particular choice of representatives of the clusters of similar variables. This effect is well-known for *omics data [29]. The number of features from the GE and CNA sets that were used for RF model building was set to N = 100. This value was established experimentally by comparing the quality of the models as a function of N. The following 17 clinical descriptors (sorted by importance) were deemed relevant by Boruta: intclust, cohort, age at diagnosis, NPI, ER IHC, breast surgery, three gene, claudin subtype, chemotherapy, radio therapy, grade, tumor size, tumor stage, ER status, HER2 status, PR status, oncotree code. The sensitivity analysis performed for these features revealed that removal of any of the 17 relevant features decreased the quality of the RF models, hence all of them were used to build the predictive models.
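A brief sketch of the Boruta selection on the clinical table is shown below, using the Boruta package cited in the text; `cd` (a data frame of the clinical descriptors), `status` (the vital-status factor), the seed, the `maxRuns` value and the rough fix of tentative attributes are illustrative assumptions.

```r
library(Boruta)

set.seed(42)
boruta_res <- Boruta(x = cd, y = status, maxRuns = 200)

# Resolve attributes left as "tentative" and list the confirmed clinical features
boruta_res <- TentativeRoughFix(boruta_res)
relevant_clinical <- getSelectedAttributes(boruta_res, withTentative = FALSE)

# Importance statistics, sorted by mean importance to obtain a ranking
ranking <- attStats(boruta_res)
ranking[order(ranking$meanImp, decreasing = TRUE), ]
```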
3.2 Predictions of the Clinical Endpoints
As a first step, the GE, CNA and CD data were studied independently. The value of the threshold in the RF model (the cutoff hyper-parameter, i.e. the fraction of votes that leads to the choice of a decision class) was optimised to return the correct proportion of classes in out-of-bag (OOB) predictions. The optimal value is 0.58 and is similar to the proportion of classes in the set. One should note that the value of this parameter impacts only the ACC and MCC metrics. The results for models built on separate data sets are shown in the three upper sections of Table 1. The best classification results for individual data sets were obtained for models using CD data. These results are far from perfect, but they are relatively good. Models built using gene expression patterns have significantly lower predictive power. The quality of the models does not depend on the method used to select predictive genes. Finally, models built using the CNA data set were the weakest, although still statistically significant. With hindsight, such a result could be expected, given our understanding of the biological processes in cancer. The alteration of the number of copies of genes results in modified expression patterns in cells, which in turn can lead to the development of lethal forms of cancer. Nevertheless, each of these steps is mostly non-deterministic and depends on the individual history of the patient. Hence, most information is contained at the clinical level, less at the gene expression level, and even less at the level of genetic alterations. In the next step we examined whether extending clinical data with molecular data can lead to improved predictive power of machine learning models. The direct extension of the CD data set with the most relevant features from the GE and CNA data sets did not lead to better models. This happens because the individual molecular features carry very little information in comparison with any clinical feature. Consequently, they are very seldom used for tree building and have no influence on the final predictions of the RF model. Instead, we extended the CD data set with composite features corresponding to the fraction of votes for the deceased class in an RF classifier built from molecular data, either gene expression profiles or copy number alterations. The results of predictive models built on CD augmented in this way are displayed in the three lower sections of Table 1.
Table 1. The prediction power of Random Forest models for clinical data (CD), gene expression data (GE), copy number aberrations (CNA) and for clinical data combined with the classification results of the molecular data (CD+GE, CD+CNA and CD+GE+CNA), built in 30 × 5-fold cross validation for different filter feature selection (FS) methods.

Data set    | FS method | ACC           | MCC           | AUC
CD          | Boruta    | 0.677 ± 0.002 | 0.361 ± 0.004 | 0.739 ± 0.002
GE          | t-test    | 0.616 ± 0.002 | 0.255 ± 0.004 | 0.680 ± 0.002
GE          | MDFS-1D   | 0.618 ± 0.002 | 0.260 ± 0.004 | 0.681 ± 0.002
GE          | MDFS-2D   | 0.618 ± 0.002 | 0.261 ± 0.004 | 0.681 ± 0.002
CNA         | t-test    | 0.583 ± 0.002 | 0.200 ± 0.004 | 0.634 ± 0.002
CNA         | MDFS-1D   | 0.579 ± 0.002 | 0.196 ± 0.004 | 0.634 ± 0.002
CNA         | MDFS-2D   | 0.583 ± 0.002 | 0.208 ± 0.004 | 0.638 ± 0.002
CD+GE       | t-test    | 0.680 ± 0.002 | 0.369 ± 0.004 | 0.748 ± 0.002
CD+GE       | MDFS-1D   | 0.684 ± 0.002 | 0.375 ± 0.004 | 0.749 ± 0.002
CD+GE       | MDFS-2D   | 0.685 ± 0.002 | 0.375 ± 0.004 | 0.747 ± 0.002
CD+CNA      | t-test    | 0.677 ± 0.002 | 0.361 ± 0.004 | 0.743 ± 0.002
CD+CNA      | MDFS-1D   | 0.675 ± 0.002 | 0.359 ± 0.004 | 0.743 ± 0.002
CD+CNA      | MDFS-2D   | 0.678 ± 0.002 | 0.364 ± 0.004 | 0.747 ± 0.002
CD+GE+CNA   | t-test    | 0.682 ± 0.002 | 0.373 ± 0.004 | 0.751 ± 0.002
CD+GE+CNA   | MDFS-1D   | 0.685 ± 0.002 | 0.379 ± 0.004 | 0.753 ± 0.002
CD+GE+CNA   | MDFS-2D   | 0.686 ± 0.002 | 0.381 ± 0.004 | 0.754 ± 0.002
The addition of the composite variable based on gene expression increases the predictive power of the model slightly. The degree of improvement depends on the metric and the FS method used for the construction of the RF model. In particular, accuracy and MCC are improved only for the MDFS-based models, whereas the improvement in AUC is similar for all models. The results of the extension of the CD with the CNA-based composite variable are more varied. No significant improvement is observed for accuracy and MCC. On the other hand, a small but significant increase of AUC is observed, in particular for the model derived from features obtained with MDFS-2D. The best results in all metrics are obtained for a model that includes both GE- and CNA-based features derived with MDFS-2D. This may indicate some synergistic interactions between CNA biomarkers that contribute some unique information to the model. The final step of our study was a sensitivity analysis conducted on the best combined CD+GE+CNA model. In the first type of sensitivity analysis a single feature was removed from the description, and the decrease of the AUC of the model was used as a measure of importance. Both molecular features, i.e. the Random Forest votes based on GE and CNA data, turned out to be relatively strong, see the third and fourth positions of the features in Fig. 2 (left panel). What is
more, only in seven cases (age at diagnosis, NPI, molecular GE, molecular CNA, cohort, intclust, tumor size) did removal of the feature lead to a decreased AUC. This is in strong contrast with the results of the sensitivity analysis for clinical features only, where removal of each feature led to a decreased AUC.
Fig. 2. Left panel: change of AUC of the predictive model after removal of particular features. Middle panel: AUC of the predictive model built on 150 subsets of the most important features. Blue dots denote the mean values of AUC. Red line indicates the level of AUC for the classifier built on all the N = 19 most relevant features. Dotted lines mark the 95% confidence level. All CD+GE+CNA models built on variables selected using MDFS-2D method in 30 × 5-fold cross validation. Right panel: Survival plots (in a Kaplan-Meier estimation) for high and low risk group of patients according to the prediction models: CD – hazard ratio 3.3 ± 0.5; GE – hazard ratio 2.3 ± 0.4; CNA – hazard ratio 1.9 ± 0.4; CD+GE+CNA – hazard ratio 3.6 ± 0.6. All notations are in text.
The second type of sensitivity analysis was recursive feature elimination, where the least important features were iteratively removed from the description, see Fig. 2 (right panel). In this case the quality of the model was stable until twelve features were left in the description, namely age at diagnosis, NPI, molecular GE, molecular CNA, cohort, intclust, tumor size, breast surgery, chemotherapy, tumor stage, three gene, grade. The predictive power of the RF models developed within this study is moderate. Nevertheless, the models allow patients to be divided into high- and low-risk classes. Survival ratios for these classes differ considerably. The hazard ratio between the two groups of patients reaches 3.6 for the combined CD+GE+CNA model (see right panel of Fig. 2).
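The single-feature-removal variant of the sensitivity analysis can be sketched as below. It assumes `x_aug` is the feature matrix of clinical plus synthetic molecular descriptors and `y` the decision factor; using out-of-bag votes as a cheap stand-in for the full repeated cross-validation, and treating the second factor level as the positive class, are simplifications made for illustration.

```r
library(randomForest)

# AUC via the rank-sum identity: probability that a positive case outranks a negative one
auc <- function(score, truth) {
  pos <- score[truth == levels(truth)[2]]      # assumed positive class
  neg <- score[truth == levels(truth)[1]]
  mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))
}

# Out-of-bag AUC of a Random Forest built on feature matrix x
oob_auc <- function(x, y) {
  rf <- randomForest(x = x, y = y, ntree = 500)
  auc(rf$votes[, 2], y)
}

auc_full <- oob_auc(x_aug, y)
sensitivity <- sapply(colnames(x_aug), function(f) {
  auc_full - oob_auc(x_aug[, setdiff(colnames(x_aug), f), drop = FALSE], y)
})
sort(sensitivity, decreasing = TRUE)   # larger drop = more important feature
```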
4 Conclusions
Machine learning methods are used to build better probabilistic models which can use information that is well hidden from straightforward analysis. They can find complicated, non-linear relationships between the decision variable and descriptor variables. Nevertheless, machine learning cannot help when information is not hidden but missing. The results strongly suggest that molecular
*omics data does not add much new information to the clinical descriptors. Nevertheless, the results of the predictive models built using *omics data can be used as additional useful synthetic clinical descriptors. The relevance analysis of the descriptors shows that these synthetic descriptors are very significant. Omission of the molecular GE descriptor from the data leads to a similar drop in AUC as omission of NPI, a synthetic clinical indicator of prognosis in breast cancer, and omission of molecular CNA is only slightly less important. What is more, the inclusion of these synthetic clinical descriptors leads to models that use fewer variables, and therefore should be less prone to overfitting and easier to generalise.
References
1. Burke, H.: Biomark. Cancer 8, 89–99 (2016)
2. Lu, R., Tang, R., Huang, J.: Clinical application of molecular features in therapeutic selection and drug development. In: Fang, L., Su, C. (eds.) Statistical Methods in Biomarker and Early Clinical Development, pp. 137–166. Springer, Cham (2019)
3. Yang, Z., et al.: Sci. Rep. 9(1), 13504 (2019)
4. Xu, C., Jackson, S.: Genome Biol. 20(1), 76 (2019)
5. de Maturana, E.L., et al.: Genes 10(3), 238 (2019)
6. Zitnik, M., et al.: Inf. Fusion 50, 71–91 (2019)
7. Gevaert, O., et al.: IFAC Proc. Vol. 39(1), 1174 (2006)
8. Daemen, A., et al.: Proceedings of the 29th Annual International Conference of IEEE Engineering in Medicine and Biology Society (EMBC 2007), pp. 5411–5415 (2007)
9. Boulesteix, A., et al.: Bioinformatics 24, 1698–1706 (2008)
10. van Vliet, M., et al.: PLoS ONE 7, e40385 (2012)
11. R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna (2017). https://www.R-project.org/
12. Gentleman, R., et al.: Genome Biol. 5(10), R80 (2004)
13. Pereira, B., et al.: Nat. Commun. 7, 11479 (2016)
14. Gentleman, R., et al.: Genefilter: methods for filtering genes from high-throughput experiments. R package version 1.60.0 (2017)
15. BD Biosciences: Robust Statistics in BD FACSDiva Software. https://www.bdbiosciences.com/documents/Robust Statistics in BDFACSDiva.pdf. Accessed 16 Jan 2019
16. Margolin, A., et al.: Sci. Transl. Med. 5(181), 181re1 (2013)
17. Welch, B.: Biometrika 34(1/2), 28 (1947)
18. Mnich, K., Rudnicki, W.R.: All-relevant feature selection using multidimensional filters with exhaustive search. Inf. Sci. 524, 277–297 (2020)
19. Piliszek, R., et al.: R J. 11(1), 2073 (2019)
20. Jović, A., et al.: 2015 38th International Convention on Information and Communication Technology Electronics and Microelectronics (MIPRO), vol. 112, no. 103375, p. 1200 (2015)
21. Hochberg, Y.: Biometrika 75(4), 800 (1988)
22. Carvajal-Rodriguez, A., et al.: BMC Bioinform. 10, 209 (2009)
23. Kursa, M., et al.: Fund. Inform. 101(4), 271 (2010)
24. Kursa, M., Rudnicki, W.R.: J. Stat. Softw. 36(11), 1 (2010)
25. Breiman, L.: Mach. Learn. 45, 5 (2001)
26. Liaw, A., Wiener, M.: R News 2(3), 18 (2002)
27. Fernández-Delgado, M., et al.: J. Mach. Learn. Res. 15(1), 3133 (2014)
28. Matthews, B.: Biochim. Biophys. Acta 405(2), 442 (1975)
29. Dessi, N., et al.: BioMed Res. Int. 2013(387673), 1 (2013)
Identification of Clinical Variables Relevant for Survival Prediction in Patients with Metastatic Castration-Resistant Prostate Cancer
Wojciech Lesiński1, Aneta Polewko-Klim1(B), and Witold R. Rudnicki1,2,3
1 Institute of Informatics, University of Bialystok, Bialystok, Poland
[email protected], [email protected]
2 Computational Centre, University of Bialystok, Bialystok, Poland
3 Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, Warsaw, Poland
Abstract. Prostate cancer is the most common cancer among men in high-income countries. This study examines which clinical variables are useful for prediction of the clinical end-point for patients with metastatic castration-resistant prostate cancer. First, informative variables were found using several feature selection methods and then used to build Random Forest models. This procedure was performed for several distinct predictive horizons, starting from ninety days and up to five years. The quality of the predictive models increases with the prediction horizon, from AUC = 0.66 and MCC = 0.24 for the 3-month prediction to AUC = 0.82 and MCC = 0.50 for the 60-month prediction. Informative variables differ significantly for different predictive horizons. In particular, the results of laboratory tests describing the physiological state of the patient are most relevant for short-term prediction. On the other hand, variables pertaining to the activity of the immune system as well as socioeconomic indicators are most relevant for long-term prediction. Two clinical variables, namely the level of lactate dehydrogenase and the level of prostate specific antigen, are relevant for all horizons of prediction.
Keywords: Prostate cancer · Feature selection · Machine learning · Random forest
1 Background
Prostate cancer is one of the main causes of cancer death for men in Western societies [1]. Metastatic castration-resistant prostate cancer (mCRPC) is a phase of the cancer's development in which it has spread to parts of the body other than the prostate, and is able to grow and spread even though drugs or other treatments to lower the amount of male sex hormones are being used to manage the cancer.
The focus of this study is on prediction of patients' survival, based on selected descriptors pertaining to clinical, laboratory and vital-sign information. In particular, it aims at identification of descriptors that are relevant for prediction of survival at different time horizons. Prediction of survival of prostate cancer patients has been the subject of multiple studies. For example, Halabi et al. [2,3] built a proportional hazards model using 8 features: lactate dehydrogenase, prostate-specific antigen, alkaline phosphatase, Gleason sum, Eastern Cooperative Oncology Group performance status, hemoglobin, and the presence of visceral disease. Other works on patient survival in mCRPC based on clinical variables are described in [4] and [5]. In paper [6] survival prediction was made using gene expression data. Recently a large international experiment, the Prostate Cancer DREAM Challenge, was carried out for prediction of survival of prostate cancer patients [7,8]. In this experiment, data from four clinical trials were provided to the international community of researchers within the framework of a DREAM challenge. The DREAM Challenges [9] are a non-profit, collaborative community effort consisting of contributors from across the research spectrum, including researchers from universities, technology companies, not-for-profits, and biotechnology and pharmaceutical companies. Fifty groups participated in the Challenge and contributed final predictions. The results of the predictions were compared with the results obtained from the model developed by Halabi et al., which was used as a reference. Significant improvement was observed for the best groups, and in particular for the ensemble of the best predictions. The detailed results of the Prostate Cancer DREAM Challenge are described in [7] and [10]. In the current study the data from the Prostate Cancer DREAM challenge [8] is used to explicitly examine the predictive power of models for six different time horizons, namely 3 months, 6 months, one year, two years, three years and five years. By developing separate models for each time horizon, it was possible to examine which variables describing patients were most relevant for each of them.
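For context, the Halabi reference model mentioned above is a proportional hazards model. A sketch of fitting such a model with the survival package is shown below; `clin` and its column names are hypothetical stand-ins for the covariates listed in the text (plus follow-up time and an event indicator), and this is not the authors' or Halabi et al.'s exact specification.

```r
library(survival)

# Cox proportional hazards model on the covariates listed in the text
# (illustrative column names for an assumed data frame `clin`)
cox_ref <- coxph(Surv(time, death) ~ ldh + psa + alk_phos + gleason_sum +
                   ecog + hemoglobin + visceral_disease,
                 data = clin)

summary(cox_ref)   # hazard ratios and concordance of the reference model
```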
2 Materials and Methods
2.1 Data
The data used in this work was the original training set from the Prostate Cancer DREAM challenge. It was collected by Celgene, Sanofi and the Memorial Sloan Kettering Cancer Center within phase 3 clinical trials of drugs. It consists of medical records of 1600 patients. These records include clinical covariates such as patient demographics, lesion measures, medical history, prior surgery and radiation, prior medications, vital signs and laboratory values. There are five groups of variables and two clinical indicators of prostate cancer progression in the data. The variables are collected in the following longitudinal data tables that contain event data:
– PriorMed includes all medications a patient took or has taken before the first treatment of the trial.
Variables Relevant for MCRPC Survival
609
– LabValue includes all lab tests a patient took from screening up to 84 days after first treatment date. – LesionMeasure includes all lesion test result a patient has from screening up to 98 days or Cycle 4 after reference day. – MedHistory includes all medical diagnoses patients provided at screening, which covers co-existing conditions patients have. – VitalSign includes all vital sign a patient took from screening up to 84 days after reference day. The clinical indicators of disease progression are Gleason Score and ECOG Performance Status. The Gleason Score is a system of grading prostate cancer tissue based on how it looks under a microscope. Gleason scores range from 2 to 10 and indicate how likely it is that a tumor will spread. A low Gleason score means the cancer tissue is similar to normal prostate tissue and the tumor is less likely to spread; a high Gleason score means the cancer tissue is very different from normal and the tumor is more likely to spread. The ECOG score, also called the WHO or Zubrod score, runs from 0 to 5, with 0 denoting asymptomatic for disease and 5 death. In the DREAM challenge the core data table contained 129 descriptive variables per patient along with two decision variables. The decision variables hold the survival status of the patient (the time of last observation and information whether patient was alive at that time). Preprocessing. In the current study, the data was extended and transformed into six data sets. Each set contains a common set of descriptive variables and single binary decision variable. The decision variable denotes whether patient was dead after three, six, twelve, twenty four, thirty six and sixty months, respectively. The five longitudinal data tables were merged into a single data set. Missing values for all patients were replaced with median value of the respective variable. Finally, the data set was extended by summary variables, such as maximum or minimum, computed for selected variables. The final data set consisted of 1600 records containing 228 independent variables. Imbalanced Data. Data sets analysed in this work are imbalanced, especially for short term prediction horizon. In tasks like this, important problem is to find proper impact of minority class. There are many techniques for dealing with imbalanced data problem [11]. In the current study a simple downsampling techniques were used. Downsampling involves randomly removing observations from the majority class to prevent its signal from dominating the learning algorithm. While is not ideal method when the highest possible classification accuracy is the goal of the project, nevertheless it suited the goals of our study. Here the focus is on identification of informative variables, and classification accuracy served as a confirmation that these informative variables are indeed useful for making predictions. By using downsampling to a balanced data set the same protocol could be used for all predictive horizons, without arbitrary cost functions for classification. What is more, by balancing the data sets one gives equal weight to both classes in the feature selection filters.
610
W. Lesi´ nski et al.
Two measures that are appropriate for imbalanced data were used for evaluation of models’ quality, namely Matthews Correlation Coefficient (MCC) [12], and area under receiver operator curve (AUC). The AUC is a global measure of performance that can be applied for any classifier that ranks the binary prediction for objects. It is dependent only on the order of predictions for objects, hence it is insensitive to the selection of threshold value separating the prediction into classes. The MCC belongs to the group of balanced indicators of accuracy, such as F1 score or balanced accuracy, that take into account number of good predictions in both classes. MCC measures correlation of distribution of classes in predictions with real distribution in the sample. It is the only measure that properly takes into account the relative sizes of the classes [13]. What is more, it has well defined and easily interpreted statistical distribution: χ2 (1) M CC = n These two measures should be directly transferable to estimates performed on data with unbalanced classes. Downsampling. Experiment was carried on 6 different time points and 4 feature selection methods. Majority class was randomly downsampled to the minority class dimension. The number of observations used at each predictive horizon was different, due to downsampling. In particular, models were built for 72 patients for 3 months, 198 for six months, 564 for twelve months, 1110 for twenty four months, 1300 for thirty six months and 1326 for sixty months. Then the feature selection algorithms mentioned earlier were applied to the particular For short predictive horizons the feature selection filters returned very few relevant variables, nevertheless, models were built using 20 highly ranked variables. For longer-term predictive horizons (twelve months and longer) only relevant variables were used. 2.2
Feature Selection
Classifiers achieve best accuracy, when all used variables are informative. This is especially important when the number of variables is very large, since both model quality and computational performance are degraded when large number of variables are used. What is more, the information which variables are most informative with respect to the decision variable may both give additional insight for understanding the models and hints for clinicians for improvement of treatment. Feature selection was performed using three different feature selection methods: – Welch t-test for differences in sample means [14]. – multidimensional filter based (MDFS) on information theory developed in our laboratory [15,16] and implemented in the R package MDFS. Two variants of this method were used - one dimensional, and two dimensional analysis.
Variables Relevant for MCRPC Survival
611
The former performs analysis of dependence of a decision variable on a single descriptive variable. The latter takes into account pairs of variables and therefore also interactions between variables. – Random Forest feature importance, based on out-of-bag classification. In this case one obtains only a ranking of variables. Both filter methods (t-test and MDFS) return ranking of variables by p-value, that after applying correction for multiple testing can be converted to the binary decision about relevance of the variable. Three feature selection methods have different properties and may reveal slightly different sets of relevant variables. In particular Welch test is sensitive to the extreme values, MDFS can detect non-linear effects and interactions between variables, and Random Forest ranking is most consistent with the classification algorithm, hence it should lead to best classification results. 2.3
Classification
Random Forest classification algorithm [17] was selected for building predictive models. Random Forest is a classifier that works well out of the box on most data sets [18]. In particular, it works well on datasets with a small number of objects, has few tuneable parameters that don’t relate directly to the data, very rarely fails, and usually gives results that are often either the best or very close to the best results achievable by any classification algorithm [18]. It was proposed by Breiman in 2001 and combines Breiman’s [19] “bagging” idea and the random selection of features in order to construct a collection of decision trees with controlled variation. Random forest is an ensemble of decision trees built in the randomised manner on the bagging subsamples of the data set. Each tree predicts class for the object. The predictions of individual trees are then used as “votes” for the class. For each object the decision is assigned to a class with most votes from all the trees in the forest. In some cases it is desirable to shift the balance of classes, then the number of votes for each class is multiplied by the weight assigned to the class. 2.4
Robust Machine Learning Protocol
The modelling approach is based on the following general protocol: – – – – – –
Split the data into a training and a validation set; Identify the informative variables in the training set using FS method; Select variables for model building; Remove redundant variables; Build a Random Forest model on training set; Estimate the model’s quality both on training and validation set.
612
W. Lesi´ nski et al.
The procedure outlined above is cast within the 10 repeats of the 5-fold crossvalidation scheme. In each repeat the data set was split randomly into 5 parts in a stratified manner. Five models were developed for such split using four parts as training set and one part as validation set. Each part served once as a test set, and four times was included in the training set. This cross-validation was repeated 10 times with different random splits of data. The process allows to obtain an unbiased estimate of model quality using data unseen during feature selection and model building. Repeating the cross-validation allows to obtain an estimate of distribution of results achievable with the procedure and hence to asses what are the reasonable expectation of performance on unseen data. This is very important part of the procedure that is often overlooked in applications of machine learning. In particular single split between training and test set is not sufficient for obtaining reliable estimate of the variance.
3 3.1
Results Variables Relevant for Predicting Survival for Different Time Horizons
Three different feature selection methods were used, one of them in two variants, hence four different rankings of variables are obtained. Evaluation of feature importance was built based on 10 repeats of 5 folds cross-validation procedure. The cumulative ranking of importance for each period was obtained as a number of occurrences of variable in the top ten most relevant variables in all 4×50 = 200 repeats of the cross-validation procedure. This ranking was used as a measure of relevance even if the statistical test of relevance did not indicate that variables are relevant. This happened for the shortest predictive horizon of 3 months, where t-test returned no relevant variables in any repeat and MDFS in both variants returned only one or two relevant variables in few folds. In the case of 6 months horizon the feature selection filters returned on average about ten variables, but many of them were strongly correlated, hence only three of four independent variables were deemed relevant. The number of relevant variables surpassed ten for longer predictive horizons of one, two, three and four years. The results of feature selection identified three distinct, albeit overlapping, sets of predictive variables, see Table 1. Only two variables, namely lactate dehydrogenase (LDH), and value of prostate specific antigen (PSA) are relevant for all predictive horizons. Short-Term Predictive Horizon. In addition to LDH and PSA, the set of variables that were most relevant for the 3- and 6-months predictions comprised of values of laboratory test results – hemoglobin level (HB), neutrophils level (NEU), platelet count (PLT), albumin level (ALB), aspartate aminotransferase (AST), magnesium level MG), as well as ECOG scale of patient’s performance. Between these descriptors ECOG, AST and ALB are consistently most relevant.
Variables Relevant for MCRPC Survival
613
Table 1. Ranking of ten most informative variables for different predictive horizons. Variables used in [3], are highlighted using bold face. Relevant variables, but not ranked within top ten are indicated with *. Prognostic horizon Short Medium Long Marker
3
6
12 24
36 60
Hemoglobin (HB) Neutrophils (NEU) Platelet count (PLT)
2 5 6
4 6 7
-
-
-
-
Aspartate aminotransferase (AST) 4 3 10 10 Magnesium (MG) 3 1 ECOG 1 5 Albumin (ALB)
9 3 6
* 10
-
-
Lactate dehydrogenase (LDH) Prostate specific antigen (PSA)
7 9
* 2
2 3
3 7
4 8
4 7
Target lesions
-
-
2
5
-
-
Other neoplasms Red blood cells (RBC)
-
* -
1 2 10 6
2 *
2 8
World region Corticosteroid Lymphocytes (LYM) Social circumstances (SC) Testosterone Gleason score Total protein
-
* -
* -
1 4 9 8 * * -
3 1 6 5 7 * 9
1 3 6 5 * 9 6
Blood urea nitrogen (BUN) Ever smoked Total bilirubin (BLRB) Lesion in lungs Analgesics Glucose Cardiac disorders
8 -
8 9 -
8 7
-
10 -
10 -
Medium-Term Predictive Horizon. Twelve months prediction horizon marks a transition from short-term to long-term predictions. Three variables relevant for short-prediction, namely HB, NEU and PLT, are no longer relevant. ECOG is still most important, but some cancer properties like target lesions, other neoplasm in patient history, as well as the level of red blood cells become relevant. The transition to the long-term set continues at twenty four months prediction horizon. Two laboratory descriptors, namely AST and MG, are no
614
W. Lesi´ nski et al.
longer relevant. What is more, while ECOG status continues to be relevant it is a borderline case. In turn socio-economic descriptors start to be relevant (world region and social circumstances), along with variables connected to activity of immune system and hormone levels: corticosteroid level as well as lymphocytes count. Long-Term Predictive Horizon. The transition to set of variables relevant for long-term prediction is complete at 36-months horizon. The ECOG status and albumin level, that were strongly relevant for short-term prediction are no longer present. What is more, the target lesions, that were relevant for mediumterm, are also not among most relevant for long-term predictions. Instead two new descriptors, namely testosterone level and total protein level, appear in the set of most relevant variables. The most important predictors are corticosteroid level, presence of other neoplasms and world region, become the most relevant predictors. The same set of predictive variables is most relevant for 60-months horizon. Interestingly, Gleason score, appeared within top-ten variables only at the longest predictive horizon, nevertheless, it was relevant also for closer horizons, albeit just below the top-ten most relevant variables. Several variables were identified as relevant only for a single predictive horizon. Their relevance was relatively weak in all cases (ranks between 7 and 10). They were not included in the discussion above since they don’t fit into any regular pattern. Most likely they contain some information related to the progression of cancer, but their appearance in the ranking was due to random amplification of the strength of the signal for a single predictive horizon.
Fig. 1. ROC curves of models built for different predictive horizons (left panel) and models built using different feature selection algorithms for 60-month predictions (right panel).
3.2
Prediction of Clinical End-Point
The summary of the results is collected in Table 2. As can be expected, the worst performing models were built for the shortest horizon, most likely due to very
Variables Relevant for MCRPC Survival
615
Table 2. Results of Random Forest (MCC and AUC) build in cross validation loop for different time points Prediction horizon T-test
MDFS-1d
MDFS-2d
RF feature importance
MCC AUC MCC AUC MCC AUC MCC AUC 3 months
0.11
0.59
0.19
0.61
0.21
0.64
0.24
0.66
6 months
0.17
0.60
0.28
0.68
0.28
0.70
0.22
0.65
12 months
0.31
0.71
0.35
0.74
0.32
0.72
0.36
0.73
24 months
0.29
0.71
0.39
0.76
0.35
0.74
0.41
0.77
36 months
0.39
0.74
0.46
0.81
0.47
0.81
0.47
0.81
60 months
0.45
0.78
0.50
0.82
0.50
0.82
0.47
0.81
limited data set. In particular, models built using variables highly ranked by t-test were just a little better than random (AUC = 0.59, MCC = 0.11). Models built using variables from other filtering methods were slightly better, with best result (AUC = 0.66, MCC = 0.24) obtained by model using variables ranked by random forest importance. The models built for 6 months prediction, were better, all of them significantly better than random, with best models obtained using variables from MDFS2d ranking. The quality of the models improves significantly with increasing prediction horizon, which is shown in the left panels of Figs. 1 and 2. The best metrics, namely AU C = 0.82 and M CC = 0.50 were obtained for models built using variables from MDFS-2d ranking at 60 months predictive horizon. The quality of the models built with variables returned by four feature selection models is compared in the right panel of Figs. 1 and 2. Interestingly, both feature selection methods that take into account interactions between variables (RF ranking and MDFS-2D) result in models that are more stable then models build on single variable filters (t-test and MDFS-1D).
Fig. 2. Distribution of AUC from 10 repeats of 5-folds cross validation for different predictive horizons (left panel) and different feature selection algorithms for 60-months horizon (right panel).
616
4
W. Lesi´ nski et al.
Summary and Conclusion
It was established that different descriptive variables are most relevant for different horizons of predictions of clinical end-point of prostate cancer. The good quality of predictions confirms that selected variables are indeed relevant, even though in some cases the feature selection algorithm did not confirm their relevance. The set of 19 variables was identified that are relevant for prognosis of clinical endpoint of metastatic castration-resistant prostate cancer. This set contains all variables that were used in the Halabi et al. [3], used as reference in the DREAM challenge. One should stress that entire analysis was performed with the help of robust machine learning protocol. In particular all modelling was performed within cross-validation scheme. Reported metrics of the ML models are best estimates for performance on new data drawn from the same distribution. The results presented here are not directly comparable with results obtained within the DREAM Challenge, [7,8] due to differences in methodology. In particular, the results of the DREAM Challenge were reported for a single split between training and test set, the test set was significantly different than the training set and the results were reported for a subset of the predictive horizons used in the current study. Nevertheless, the results reported in [7,8] are within the bounds of variance established within the current study.
References 1. Jemal, A., et al.: CA: Cancer J. Clin. 61(2), 69 (2011) 2. Halabi, S., et al.: J. Clin. Oncol.: Official J. Am. Soc. Clin. Oncol. 21, 1232 (2003). https://doi.org/10.1200/JCO.2003.06.100 3. Halabi, S., et al.: J. Clin. Oncol.: Official J. Am. Soc. Clin. Oncol. 32 (2014). https://doi.org/10.1200/JCO.2013.52.3696 4. Smaletz, O., et al.: J. Clin. Oncol.: Official J. Am. Soc. Clin. Oncol. 20, 3972 (2002) 5. Armstrong, A.J., et al.: Clin. Cancer Res. 13(21), 6396 (2007) 6. Okser, S., et al.: PLoS Genet. 10(11), 1 (2014) 7. Guinney, J., et al.: Lancet Oncol. 18, 132 (2017). https://doi.org/10.1016/S14702045(16)30560-5 8. Project data sphere: prostate cancer dream challenge. https://www.synapse. org/#!Synapse:syn2813558/wiki/ 9. Costello, J., Stolovitzky, G.: Clin. Pharmacol. Ther. 93 (2013). https://doi.org/ 10.1038/clpt.2013.36 10. Seyednasrollah, F., et al.: A DREAM challenge to build prediction models for short-term discontinuation of docetaxel in metastatic castration-resistant prostate cancer patients. JCO Clin. Cancer Inf. 1 (2017). https://doi.org/10.1200/CCI.17. 00018 11. He, H., Garcia, E.A.: IEEE Trans. Knowl. Data Eng. 21(9), 1263 (2009) 12. Matthews, B.: Biochim. Biophys. Acta 405(2), 442 (1975) 13. Chicco, D.: BioData Min. 10, 35 (2017). https://doi.org/10.1186/s13040-017-0155-3 14. Welch, B.L.: Biometrika 34(1/2), 28 (1947) 15. Mnich, K., Rudnicki, W.R.: arXiv preprint arXiv:1705.05756 (2017)
Variables Relevant for MCRPC Survival 16. 17. 18. 19.
Piliszek, R., et al.: R J. 11(1), 198 (2019) Breiman, L.: Mach. Learn. 45, 5 (2001) Fern´ andez-Delgado, M., et al.: J. Mach. Learn. Res. 15(1), 3133 (2014) Breiman, L.: Machine learning, pp. 123–140 (1994)
617
Human-Computer Interaction
Digital Technologies Acceptance/Adoption Modeling Respecting Age Factor Egils Ginters1,2(&) 1
Riga Technical University, Daugavgrivas Street 2-434, Riga 1048, Latvia Sociotechnical Systems OU, Sakala Street 7-2, 10141 Tallinn, Estonia [email protected]
2
Abstract. New technologies are needed to ensure a targeted development of the economy which measures the well-being of the population and the quality of life. For the technology to be successfully implemented, it must first be accepted, but then adopted. It’s not always the same thing. The aim of the study is to identify age groups that are prospective and responsive to the introduction of new digital technologies. An Acceptance/Adoption/Age (3A) model based on systems dynamics simulation is being examined. The model respects the age group of the potential adopter and allows for predicting possible results of the introduction of new digital technologies. Keywords: Digital technologies model
Acceptance and adoption simulation 3A
1 Introduction The population on the planet is growing steadily every year. At the same time, human life expectancy is increasing, but this is due to better household conditions and medical care. Biological changes of the human organism are not possible in such a short time, and therefore the natural ageing of the body has a significant impact on the workability. Memory, vision and hearing worsening that makes it difficult to see information, to learn new knowledge, and to maintain/develop communication capacity. Natural aging causes movement disorders. The ageing of the workforce leads to a lack of skilled work force in economy. Estimates [1] show that in 2030 the EU more than half of a qualified workforce will have of more than 50–55 years. If we respect the fact that the well-being of civilisation is mainly driven by the introduction and application of new technologies, then keeping a skilled labour force on the market as long as possible is critical for economic development. The planet needs to be able to sustain an increasing population that is impossible without the intensive use of modern technologies. Digital technologies (mobile telecommunications, data processing devices, artificial intelligence, social networks, virtual and augmented reality technologies, computerized bio-implants and biochips, e-health etc.) are one of the options for maintaining and improving the ageing-related workability and quality of life. A prerequisite for successful business is the eligibility of demand for a supply that cannot be accompanied by statutory methods of creating violent artificial demand in a © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020 Á. Rocha et al. (Eds.): WorldCIST 2020, AISC 1160, pp. 621–630, 2020. https://doi.org/10.1007/978-3-030-45691-7_58
622
E. Ginters
democratic society. Therefore, the issue of the study in question relates to the digital technologies acceptance/adoption factors by human at different stages of the life. The modern society is sociotechnical where every moment in our live is inevitably related to the presence of technology and their impact on the environment and society. In turn, society and the environment define and influence trends for technological developments. Technological developments and expectations are well characterized by the Gartner Hype Cycle for Emerging Technologies [2], which mainly consists of two distinct steps: a waiting phase that shows how long it is to wait for the society to accept the technology to become widely applicable, as well as a phase of technology use. The curve shows that there are a series of revolutionary solutions that have been waiting patiently for their turn in the queue for more than 10 years, such as exoskeleton, quantum computing etc., but despite this the society for adoption isn’t ready. In general, the Gartner curve also indirectly describes the sustainability of technology, that is, to use technology, it must first be accepted by the society, and only then can it be debated how long this technology will be competitive and usable. The Gartner curve describes the adoption of technology in general, where waiting time for adoption or rejection are not predicted, but only the fact that technology is accepted is identified. The society is more interested in whether the technology proposed could be accepted at all, and could this acceptance fact ever be found? Nevertheless the author is more interested in adopting or rejecting technology related to belonging to specific age groups. Currently, the theme of Digital Transformation is topical [3]. As development is spiral, hopefully it is upward, and it is progress, not the other way around. However, let us not forget similar trends about 50 years ago. At that time, however, the term Automatization was used, which implied the introduction of various types of automated systems in society and the economy. At that time, several major errors were made, such as the automatization of ongoing processes, rather than the development of new computer-based management schemes to eliminate critical outlines and prevent unnecessary waste of resources. As a result, such coarse automation provided virtually no advantages rather it made the work of businesses and offices more difficult. There is now a similar initiative called Digital Transformation, and hopefully it will not climb on the same rake. Another of the mistakes of that time was the underestimation of the acceptance and sustainability of digital technologies. If technology is introduced by force, it will never work successfully. It also makes no sense to introduce technology with minimal sustainability, as it also has to cover the costs of implementation and adaptation. Thus, a precondition for successful digital transformation is a thorough and detailed assessment of the sustainability of the digital technologies being introduced, which for objective reasons was not possible 50 years ago. There are a number of theories and models that describe acceptance and sustainability of technology, such as Expectation-Confirmation theory, Motivational model, Technology-Organization-Environment Framework etc. [4]. The Technology Acceptance Model (TAM) [5] and Unified Theory of Acceptance and Use of Technology (UTAUT) may be mentioned as ones of the most commonly used [6]. 
The Technology Acceptance Model (TAM) [5] explores the causal relationship between the perceived usefulness of technology and ease of use. Both notions are the
Digital Technologies Acceptance/Adoption Modeling
623
independent variables that determine the values of the dependent variable - behavioural intention to use. The measurements are based on about 10 questions about the benefits to the user of the technology, how easy they think it will be and whether they are going to use it. Unified Theory of Acceptance and Use of Technology (UTAUT) [6] is one of the most popular methodologies for evaluating technology acceptance, based on a largescale set of questions that are asked in interviews. The UTAUT methodology combines the results of eight earlier studies and analytical approaches, including TAM, and defines criteria to justify a possible decision by users on the adoption or rejection of technology. The following four criteria are: performance expectancy, the user’s belief that the use of technology will improve his capabilities; effort expectancy – ease of use, simplicity, clarity; social influence – the views of other potential users on the use of technology; facilitating conditions – is characterised by the availability of the infrastructure needed to operate the technology. The UTAUT model mainly tries to explain how individual user differences affect the use of technology, considered to be influenced by age, gender and experience. One of the UTAUT problems is the combining of these different impact factors into a single model, as well as the inclusion of the impact of the gender factor. However, the main problem of the use of UTAUT is the need for a significant amount of interviews, which makes UTAUT inappropriate to assess smaller projects. Some of the shortcomings of the aforementioned theories can be corrected by Rogers’ theory of innovation diffusion [7, 8], which reduces the amount of information required for decision making and the laboriousness of processing. One of the most recent methodologies for technology acceptance and sustainability forecasting is the Integrated Acceptance and Sustainability Assessment Model (IASAM) [9], which has been developed over the past 10 years as a tool for assessing the sustainability of new and/or existing digital technologies. IASAM is based on the systems dynamics simulation and Skype reference curve use. The methodology serves to self-evaluate the projects by measuring their sustainability index in skypes. This makes the result easier to catch and understand. The approach is based on the interoperability simulation of the four flows (Quality, Management, Acceptance and Domain influence) and the impact on sustainability of the technology assessed. The diffusion theory of innovation is used to assess Acceptance. IASAM allows for a quantitative comparison of different projects, which reduces subjectivity and excludes voluntarily decision making. Unfortunately, this benefit can sometimes hinder the wider adoption of IASAM. The above methodologies only meet the necessary condition, i.e. acceptance, but do not assess the sufficient condition - adoption, which determines the user’s ability to use the technology. The author of the article believes that there could be a different approach to evaluating acceptance/adoption technology, while respecting the age group belonging to the user. Financial capabilities, ability to perceive and solve complex problems, sufficient time, and the need/demand to use technology are integral attributes of every age group. The above attributes are basic elements of digital technologies Acceptance/Adoption/Age (3A) model respecting age factor.
624
E. Ginters
2 Welfare Influence on Technologies Acceptance and Adoption The potential user may like digital technology, accept it, but cannot buy and use it successfully due to financial constraints. Well-being usually depends on the age of the user. Welfare trend curve (see Fig. 1) describes the financial opportunities of a potential digital technology adopter over a lifetime as a percentage.
Fig. 1. Welfare trend curve.
In the wake of an interest in technology, purchasing them in childhood and school years depends on the well-being of parents. The studies started in youth reduce the amount of funding available for interest purposes. However funds can be increased by additional work during the years of study. Nevertheless, this additional work may have a bad impact on the quality of studies. The next phase of career-building provides a steady increase in funding, the best results of which are achieved in middle-aged period, when the user already holds better-paid posts, also some provisions have been made. At this time, however, there is already an inherent concern for children, as well as financial benefits are reduced by credit payments. Unfortunately, middle-aged people are already experiencing the first significant expenses of restoring health to maintain quality of life. Getting older the individual’s performance abilities deteriorate, increase competition in the labour market and the risk of unemployment. Step by step, financial welfare is declining as it becomes dependent on earlier investments in pension capital and targeted stocks.
Digital Technologies Acceptance/Adoption Modeling
625
3 Demand Factor Impact in Technologies Use The basic issue is the technology spotlight that is whether it is needed by the user, and what factors lead to the desire to use that technology. These factors are both subjective and objective (see Fig. 2).
Fig. 2. Demand trend in technology acceptance and adoption.
Childhood and youth are affected by the desire to self-certify, to be fit and even to be a leader in a certain social group. If the social group is dominated by the view that the use of the latest digital technologies is stylish, then the individual’s desire to use these technologies is decisive. This is a subjective factor, because really this technology is not needed by an individual, at least not the latest and most modern. Objectively, technologies demand is rising during studies, when the highest point is reached. Technologies demand remained stable during the career development phase and decreased when business becomes stable. In future, objective requirements for the use of new technologies are reduced and limited to the use of technologies that are necessary every day. It would be strange for a middle-aged individual to promote a proprietary latest model iPhone as an excellence. However, increasing impacts of natural ageing factors, such as deteriorating vision, hearing and memory, disturbance of movement and etc., reveal objective requirements for digital technologies use. Focused use of digital technologies would improve communication skills, perception, prevent disability and maintain quality of life. As the individual ages, the role of these objective factors becomes more important.
626
E. Ginters
4 Complexity Resolving Ability - Important Factor for Technology Acceptance In this respect, the potential user of technology is considered to have no specific knowledge and special training for the use of technology, nor are such skills required. There are no skills at birth, only instincts, but every individual learns and does so every day for the rest of their lives. Particularly rapidly, complexity solving ability is increasing in childhood, which has a sponge effect. The skills and capabilities to deal with complex issues are growing every day. They grow especially fast when someone teaches, that is, during studies (see Fig. 3).
Fig. 3. Complexity resolving ability trend.
For a while, these abilities remain, which is the effect of earlier studies. However, during careers, in most cases, work is no longer so diverse, which diminishes those abilities. Fundamentally new technologies are emerging, whose operational principles are no longer so understandable, since their features can no longer be explained by the knowledge gained during studies. Later, objective problems associated with the body’s natural ageing and deteriorating health begin. The conservatism of beliefs also appears, and the fear of looking incompetent. Increasing age will reduce social communication opportunities as a group of friends and acquaintances belonging to an individual are shrinking. The number of potential advisers to discuss technology use problems is declining. It is possible that fundamentally entirely new technologies are emerging again, which are no longer fully understood.
Digital Technologies Acceptance/Adoption Modeling
627
5 Busyness as a Barrier to Technologies Adoption It is possible that an individual wants to use digital technology and can buy it, and also understands how to use it properly, but does the individual have enough time for it? The time factor is one of the most significant, and its adequacy trove is depicted in Fig. 4.
Fig. 4. Time availability for technology use.
There is a lot of spare time in childhood and adolescence due to a lack of burden of mandatory duties. However, when starting to learn and later also studying, free time decreases rapidly. Even more spare time is missing at the beginning of a career when an individual has to prove himself, and later when to think about retention the job. In the years of maturity, the lack of time is a concern for the young generation. The amount of spare time increases before retirement at an age when job responsibilities are declining, while children have already grown. Loans previously taken have been paid, and work is gradually being phased out. In retirement age, leisure is getting even too much.
6 Digital Technologies Acceptance/Adoption 3A Model Design The four most important factors determining acceptance/adoption of digital technologies are discussed above, and in particular: the existence of sufficient financial resources, the ability to address complex challenges, technology demand and the adequacy of time resources.
628
E. Ginters
Digital technologies Acceptance/Adoption/Age (3A) concept model design is based on systems dynamic simulation for interoperability and influence analysis. These four factors can be represented as flows affecting the Acceptance/Adoption trend resource (see Fig. 5).
Fig. 5. 3A concept model simulation.
The analysis allows designing an Acceptance/Adoption trend curve, according to the age of the individual. Systems dynamic simulation environment STELLA [10] is used for modelling, however any other tools can be used. Feedback Noname 1 (see Fig. 5) is added only for service purposes in order to stabilise the resource and shall not be attributable to the result of the study. There are a variety of theories and opinions at which stages a human life cycle can be breakdown. There are authors [11] who divide human life into 12 different stages, believing that it begins with the Prebirth period, followed by the Birth event. Childhood is divided into 4 stages (Infancy (0–3), Early Childhood (3–6), Middle Childhood (6–8) and Late Childhood (9–11)). Next comes the turbulent Adolescence period (12–20). Adult time is also divided into 4 periods (Early Adulthood (20–35), Midlife (35–50), Mature Adulthood (50–80) and Late Adulthood (80+)). And the last event is Death and Dying. Such a detailed breakdown is not required for this specific task. The author offers his subjective opinion considering that the life for modelling needs could be better matched by the breakdown of 5 steps: till 25 Youth I, 25–35 Youth II, 35–55 Maturity I, 55–70 Maturity II and 70–90+Senior. The resulting digital technologies acceptance/adoption curve depicts a trend depending on an individual’s age. The results show that the most likely segment for the introduction of digital technologies is a 30–45 year old audience, while investments in the age group up to the age of 20 may be eligible to provide a basis for the development of the next user group. The orientation of digital technologies to an age group of 50–65 does not bode well. On the other hand, the next age segment, which is usually neglected, can produce quite surprising results, because acceptance/adoption trend is positive.
Digital Technologies Acceptance/Adoption Modeling
629
7 Conclusion In order to introduce new digital technology, the acceptance of the potential user audience is first required that the technology is good and useful. However, this is just the necessary, but not the sufficient condition. For technology to be adopted, the potential user must be sufficiently financially supplied, be able to overcome the complexity of technology, it must indeed be desirable to use this technology, and the user must have sufficient spare time. The Acceptance/Adoption/Age (3A) conceptual model proposed by the author enables the simulation of digital technologies acceptance/adoption trend line respecting age group. Simulation is used as the basic research method. One might ask why? The operation of a technical system is determined by a limited set of relevant factors, since these systems are the result of human activity and serve a narrow and specific purpose. Technical systems are closed, they work according to certain algorithms, even if they are partially stochastic, in any case the next state of the system can be calculated with a certain probability. Performance measurement scales for technical systems are standardized in unitary measurement systems. This means that technical systems can be described with analytical solutions and produce accurate and quantifiable results. The social system is open. It is characterized by a large number of significant and stochastic influences, many of which are not quantifiable. Therefore, the use of analytical models in the study of social systems and obtaining accurate results is a rare phenomenon. The purpose of modelling a social system is to understand the trend, not to get an accurate and quantified result. That is why simulation is used instead of analytical equations and in this case system dynamics simulation is applied. As in any objective reality representation (model designing) the basic issue is the initial data gathering. Model 3A is based on the author’s 48 random acquaintances’ responses. Data confidence was statistically estimated using nonparametric methods, and more specifically, the Kolmogorov-Smirnov test. Respecting 95% confidence, the results were found to correspond to the theoretical normal statistical distribution. This means that the model is adequate only for these 48 acquaintances. According to Slovin’s formula [12], in order to be considered adequate, model 3A would require responses from about 400 randomly chosen people on the planet. Although the small size of the audience to be interviewed seems wholly strange, no author has yet convincingly demonstrated that the Slovin’s formula would yield the wrong result if the model is used to identify a trend, which is the main task of this study. In the further phases of the study, it would be desirable to perform calibration and validation of 3A model by comparing the results obtained with the findings of other authors. The further development of 3A Model will be related with the designing of web technology, which would allow the expert to vary with the weight of the relevant factors in the overall solution to ensure that the curve is adapted to specific sociocultural and historical contexts in different societies. Acknowledgements. The article publication is initiated by EC FLAG-ERA project FuturICT 2.0 (2017-2020) “Large scale experiments and simulations for the second generation of FuturICT”.
630
E. Ginters
References 1. Ginters, E., Puspurs, M., Griscenko, I., Dumburs, D.: Conceptual model of augmented reality use for improving the perception skills of seniors. In: Bruzzone, A.G., Ginters, E., González Mendívil, E., Gutierrez, J.M., Longo, F. (eds.) Proceedings of the International Conference of the Virtual and Augmented Reality in Education (VARE 2018), Rende (CS), pp. 128–139 (2018). ISBN 978-88-85741-20-1 2. Panetta, K.: 5 Trends appear on the Gartner Hype Cycle for emerging technologies 2019. Smarter with Gartner. https://www.gartner.com/smarterwithgartner/5-trends-appear-on-thegartner-hype-cycle-for-emerging-technologies-2019/. Accessed 12 Oct 2019 3. Reis, J., Amorim, M., Melão, N., Matos, P.: Digital transformation: a literature review and guidelines for future research. In: In: Rocha, Á., Adeli, H., Reis, L.P., Costanzo, S. (eds.) Trends and Advances in Information Systems and Technologies. WorldCIST 2018, pp. 411– 421. Springer, Cham (2018) 4. Taherdoost, H.: A review of technology acceptance and adoption models and theories. Procedia Manuf. 22, 960–967 (2018). https://doi.org/10.1016/j.promfg.2018.03.137 5. Turner, M., Kitchenham, B., Brereton, P., Charters, S., Budgen, D.: Does the technology acceptance model predict actual usage? A systematic literature review. Inf. Softw. Technol. 52, 463–479 (2010). https://doi.org/10.1016/j.infsof.2009.11.005 6. Venkatesh, V., Morris, M., Davis, G.B., Davis, F.D.: User acceptance of information technology: toward a unified view. MIS Quart. 27(3), 425–478 (2003). https://doi.org/10. 2307/30036540 7. Rogers, E.M.: Diffusion of Innovations, 5th edn. Simon and Schuster, New York (2003) 8. Hameed, M.A., Counsell, S., Swift, S.: A Conceptual model for the process of IT innovation adoption in organizations. J. Eng. Tech. Manage. 29(3), 358–390 (2012). https://doi.org/10. 1016/j.jengtecman.2012.03.007 9. Aizstrauta, D., Ginters, E.: Using market data of technologies to build a dynamic integrated acceptance and sustainability assessment model. Procedia Comput. Sci. 104, 501–508 (2017). https://doi.org/10.1016/j.procs.2017.01.165 10. ISEE Systems. Stella Architect: Premium modeling and interactive simulations. https:// www.iseesystems.com/store/products/stella-architect.aspx. Accessed 12 Oct 2019 11. Armstrong, T.: The Human Odyssey: Navigating the Twelve Stages of Life. Ixia Press, Calabasas (2019) 12. Ryan, T.P.: Sample Size Determination and Power. Probability and Statistics. Wiley, Somerset (2013)
Location-Based Games as Interfaces for Collecting User Data Sampsa Rauti(B) and Samuli Laato University of Turku, Turku, Finland [email protected]
Abstract. Location-based games (LBGs) are becoming increasingly popular. These games use player’s physical location as a game mechanic, and many of the games are played in real time. This study investigates the affordances that three popular LBGs, Ingress, Pok´emon GO and The Walking Dead: Our World, provide for users to collect location data from other players. To this end, the game mechanics of the games and the end user privacy policies are analyzed and compared together. The results reveal several privacy concerns which are not currently adequately addressed in the privacy policies of the games. As LBGs are becoming increasingly complex, the risk of unwanted player data collection opportunities rises, combating which is predicted to be a major design challenge for LBG developers in the future.
Keywords: Location-based games data · Data leakage
1
· Privacy · Time-stamped location
Introduction
A location-based game (LBG) is a game in which the gameplay revolves around and is affected by the player’s physical location. Today, LBGs are usually played with mobile devices, and, location-based mobile games such as Pok´emon GO and Ingress have reached high popularity. By making use of augmented reality and encouraging players to move, LGBs can provide several benefits for the players, such as inspiring to learn about their environment [22] and increasing physical exercise [2]. When the LBG also has social features prompting players to work together or to compete with each other, social relationships between players can be created and maintained [8,28]. However, these kinds of pervasive games blending with the physical world and promoting mutual interaction between players, also have their drawbacks. One potential downside pertains to the player’s privacy and consequently their security. The sensitivity of timestamped location-data has been discussed in academia recently [3,12,15]. Time-stamped location data can reveal a lot about users’ movement and actions. LBGs have been claimed to collect this information in exchange for allowing players to play their game [11], but there is also a risk c The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020 ´ Rocha et al. (Eds.): WorldCIST 2020, AISC 1160, pp. 631–642, 2020. A. https://doi.org/10.1007/978-3-030-45691-7_59
632
S. Rauti and S. Laato
that end users utilize the game as an interface to spy on other players. Yet, in a recent study of Pok´emon GO players, it was found that privacy concerns do not negatively correlate with intentions to keep playing the game [10]. There are at least three possible explanations for this: (1) The users are not aware or do not fully realize the extent of personal information they are giving away while playing [1], (2) Those concerned with their privacy do not begin to play in the first place and thus do not respond to such a questionnaire or (3) Users are aware they are giving away their privacy but do not care [1,13]. The last option might be the most logical explanation, as the mobile phone has so many apps collecting sensitive information all the time that keeping track of them all is impossible, and thus it is easier to just resign the attempt to maintain any privacy [13]. It has also been suggested that instead of seeing the disclosure of location information as a privacy threat, it can actually give the player a feeling of controlling one’s surrounding area and a sense of security [6]. The aim of this study is to investigate the affordances that currently popular LBGs provide for other users on collecting location-related sensitive data from players. To identify the affordances for players to spy on their peers, the game interfaces and all shared data is observed in three selected case games. The privacy policies of the games are also read with the purpose of identifying how they take privacy concerns related to the users location into account. The findings of this study are expected to bring transparency to the possibilities of leaking personal time-stamped location information to the public through LBGs and what potential harm this may bring to players. The rest of the paper is organized as follows. Section 2 discusses the research design, defines important terms, introduces the case games of the study and presents the research methods. Section 3 discusses the results, describing the affordances the studied case games have for data collection by other players. An analysis of privacy policies of these games is also provided and the found affordances are compared with the statements made in the privacy policies. Section 4 discusses the possible ways the acquired location-based data can be used, implications for society and limitations of the study. Finally, Sect. 5 concludes the paper and suggest topics for future work.
2
Research Design
Previous studies on privacy in LBGs have focused mainly on how the developer of the games can collect sensitive information from the players (e.g. [11]). Figure 1 illustrates that this is only one side of the privacy concerns in LBGs. Thus, we supplement the previous studies by focusing on how third parties, or players themselves, can use these games as interfaces for sensitive data collection from other players. We sort these risks into two categories as shown in Fig. 1: intended (controlled) and unintended. For the purpose of discovering data collection opportunities in LBGs for 3rd parties, we look at the game mechanics of three popular and unique LBGs Pok´emon GO, Ingress and The Walking Dead: Our World.
Location-Based Games as Data Collection Interfaces
633
Fig. 1. A high-level categorization of the privacy risks in location-based games.
2.1
Definition of Terms
The definition of game mechanics by Sicart [24] being methods invoked by agents for interacting with the game world is adopted. All three games contain virtual points of interest (PoIs) which are located in geographical locations in the real world [17]. Travelling to these PoIs and moving around to find new ones can be regarded as the main gameplay. Also a definition for the term affordance is required. The word affordance at its core describes the physical interaction between objects and people [21]. The inventor of the word, Gibson, originally limited the meaning of the world only to real, existing properties but later Norman expanded the meaning to also cover perceived properties [18]. In this study we limit the analysis to the domain of real empirically verifiable affordances as defined by Gibson [18] as we observe LBG game mechanics to identify real opportunities for collecting location data from players. 2.2
Case Games
Of the three case games, Ingress is the oldest being released for beta in 2012 and fully in 2013. The game focuses on navigating to PoIs called portals, capturing them and linking them together to form triangles. Ingress monetized itself by crowdsourcing the development of a global database of geographical virtual PoIs [16], which proved to be a good strategy as the same PoI database was later used in the smash hit Pok´emon GO [17]. Pok´emon GO was released in summer 2016 and is currently regarded as the most popular LBG of all time in terms of number of installs, total revenue created and the number of active players. In the game the user is tasked to move around to find pok´emon spawns which can be captured and collected via completing a ball throwing minigame. Since its release the game has been updated heavily and many new features have been added. The final game, The Walking Dead: Our World, is different from the two previous games in the way that its PoIs are not based on real world objects [17]. However, gameplay-wise it is similar, as users move around to find PoIs and clicking them initiates a minigame which allows players to obtain rewards. 2.3
Methods
The game mechanics [24] of the case LBGs are analysed and used to define affordances for user-data collection, focusing specifically on how other players
634
S. Rauti and S. Laato
can harness the multiplayer elements of LBGs to collect sensitive location-related information from other players. Identified scenarios of how location-related data (both intended and unintended) from users can be retrieved and used by other players are presented. Previous studies looking at privacy policies have almost explicitly found them to be difficult for users to understand [29]. However, in many cases users merely glance through the policy without reading it carefully [25]. Studies have found privacy policies to be sometimes intentionally vague and have developed frameworks for identifying these spots [23]. We study the privacy policies of the case games to identify and record all paragraphs where sharing location based data with other users is discussed. Loosely based on the approach of Shvartzshnaider et al. [23], these paragraphs are then analyzed to find vague points in the text where the privacy concerns are not addressed in sufficient detail. The affordances of game mechanics to collect user data and the information retrieved from the privacy policies are then compared.
3 3.1
Results Affordances of LBG Game Mechanics for Data Collection by Other Players
Pok´ emon GO. Since its initial release in 2016, Pok´emon GO has been updated over a hundred times with several new game mechanics and content being introduced, making it a complex game. Thus, the opportunities the game provides for following its players’ locations are many-fold. This section presents the game mechanics which were found to, at least in some way, reveal sensitive or private information about players’ whereabouts to others. In Pok´emon GO, gyms are PoIs scattered throughout the map where trainers battle the pok´emon of opposing teams for control over the gyms. Looking at the information in gyms is the most obvious way for the players to get other players’ location information. In every gym, the players currently holding the gym and the time they visited the gym are displayed. This information can be used to track players’ movements as seen in Fig. 2. Lures are another mechanism that discloses the location of the player who deploys them. A lure module is attached to a pok´estop (a type of PoI) and it attracts pok´emon for 30 min. During this time, other players can see the name of the player who has activated the lure module. The player can also usually be quite easily found in the real world location of the pok´estop, which makes it possible to connect the player’s nickname to the real person. Raids, in which trainers fight a powerful pok´emon together, also offer a possibility to find out more about the real people behind nicknames. In the raid lobby, a player can see nicknames of other players preparing for the raid, as well as three statistics of the players: battles won, pok´emon caught and distance walked. Because raids gather people together in the physical world, trainer nicknames can be connected with real persons in a manner similar to lure locations.
Location-Based Games as Data Collection Interfaces
635
Fig. 2. The Gyms in Pok´emon GO display the unique player name and deployment time, thus providing time-stamped location data where the player was at the given time.
Befriending people in-game discloses even more information about a player. Gathering new friends is not only possible in real world but also online simply by exchanging player codes, which facilitates befriending strangers and potentially collecting information about them. Therefore, a malicious actor could distribute his or her player code with the aim of gathering sensitive data. Pok´emon GO includes gifts that can be exchanged between friends. When opening a gift, the location of the PoI from which the sender who picked up the gift is revealed. This way, friends can obtain information on the player’s location and the routes they usually take. This can be done from the other side of the world, and an adversary can find out the exact locations frequently visited by the player by gathering pok´estop data from gifts, and then using a tool such as Ingress Intel Map [19], that displays the PoIs and their locations in a certain area, to find where exactly these pok´estops are located. It is also worth noting that as there is lots of contextual information (such as a PoI title, an image and a description) associated with each pok´estop player has visited, the adversary can also derive additional information from this data. For example, the location of the player’s home and workplace could be revealed. Friends can see which pok´emon the player has caught last, as well as battles won, distance walked and pok´emon caught. All this information is updated in real
The recently captured pokémon and their types might give the adversary clues about the player's location. For example, if a player catches a pokémon that can only be obtained from a raid at a certain time, the adversary can deduce where the raid has taken place in the player's home area. There are also regional pokémon in the game, and seeing these in the recently caught pokémon list can reveal that the player is currently travelling. However, Pokémon GO currently allows players to disable the feature which shows friends their recently caught pokémon.

The Walking Dead: Our World. In The Walking Dead: Our World, players currently have only a single method of revealing their location to everyone, that is, placing houses on the map or placing survivors in them. Even so, these actions carry no time-stamp, making it difficult to collect data on other players' whereabouts. The game does, however, contain social features, which are currently limited to the player's group. Joining a group is voluntary, and sharing information in the group is also voluntary. If the player so chooses, they can reveal their location to others by placing a flare or revealing their location in the chat. However, no other forms of revealing one's location currently exist in the game.

Ingress Prime. Ingress, sometimes also referred to as Ingress Prime, differs from the other two games in that it openly broadcasts almost all player actions to every other player via a communication channel called COMM. These actions are time-stamped and the location of the actions is also provided, meaning players are able to follow each other effectively. As previously mentioned, in Ingress, players are tasked with travelling to PoIs called portals, capturing them and linking them together to create fields. There are currently two teams in the game, and destroying portals, links and fields of the enemy team is a major part of the game as well. Players can also deploy items called frackers on portals to double the amount of items they receive from hacking. Players' actions can be visualised using official tools provided by Niantic, such as the Ingress Intel Map [19], or unofficial community-developed tools such as Ingress Intel Total Conversion [7].

The following player actions are visible to others directly through the communication channel of the game: (1) capturing a portal, (2) creating a link, (3) creating a field, (4) destroying a portal, (5) destroying a link, (6) destroying a field, (7) any messages sent in COMM (unless private) and (8) placing a fracker on a portal. On top of these, players can indirectly follow each other's movements by, for example, observing how their AP (Action Points, which measure a player's progress and are awarded for almost all in-game actions) increases, which reveals charging or hacking of portals, or by looking at XM (Exotic Matter, which disappears from the ground when a player collects it), which reveals the player's movement. Because players' actions are publicly visible to others, players might be more consciously aware of it and avoid playing a certain way, or playing at all, when they do not wish to reveal their location.
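To illustrate how such openly visible, time-stamped events translate into a movement profile, the following minimal Python sketch (not taken from any of the studied games or tools; all field names, coordinates and the schema itself are hypothetical) groups public observations such as gym deployment times or COMM actions by nickname and orders them chronologically into a trail.

```python
from dataclasses import dataclass
from collections import defaultdict
from datetime import datetime

@dataclass
class Observation:
    """A single publicly visible game event (hypothetical schema)."""
    nickname: str        # in-game player name shown to others
    lat: float           # latitude of the PoI where the event happened
    lon: float           # longitude of the PoI
    timestamp: datetime  # when the event was observed
    source: str          # e.g. "gym deployment", "COMM action", "lure"

def build_trails(observations):
    """Group observations per nickname and sort them chronologically,
    yielding a time-stamped movement trail for each observed player."""
    trails = defaultdict(list)
    for obs in observations:
        trails[obs.nickname].append(obs)
    for nickname in trails:
        trails[nickname].sort(key=lambda o: o.timestamp)
    return dict(trails)

# Example: two public observations of the same nickname already reveal
# a direction of movement and a time window.
obs = [
    Observation("TrainerX", 60.4518, 22.2666, datetime(2019, 11, 6, 8, 5), "gym deployment"),
    Observation("TrainerX", 60.4493, 22.2950, datetime(2019, 11, 6, 17, 40), "COMM action"),
]
print(build_trails(obs)["TrainerX"][0].source)  # -> "gym deployment"
```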
3.2 Analysis of Privacy Policies
Niantic's privacy policy [20] contains a paragraph describing the data shared with other players. The privacy policy states that "when you use the Services, and in particular when you play our games, use social features within those games, or take part in live events, we will share certain Personal Data with other players." The concept of personal data is further elaborated as follows: "This Personal Data includes your in-game profile (such as your username, your avatar, and your team), your in-game actions and achievements, the real-world location of gameplay resources you interacted with when playing the games (for example PokéStops within Pokémon GO, Fortresses within Harry Potter: Wizards Unite, or Portals within Ingress), and your public in-game messages."

The privacy policy also gives some additional game-specific information about the data shared with other players: "When you take certain actions in Pokémon GO and capture a Gym, your (or your authorized Child's) username will be shared publicly through the game, including with other players, in connection with that Gym location." This statement is vague, as the meaning of "certain actions" is left completely unexplained. It also leaves out many features currently implemented in the game, such as lures, raids and all the information the player's friends can see. In Ingress, the game discloses "your in-game username, messages sent to other users in COMMS, and in-game portals that you interact with, as well as your device location whenever you take an in-game action."

Niantic's privacy policy is vague at many points. While the text is detailed at times, such as when explaining the data shared with others in a gym, the description is mostly not that comprehensive, failing, for example, to define "in-game actions" and "achievements" specifically. The significance of time (e.g. the fact that it is often possible to see when players visited a specific PoI) and of contextual information (e.g. in the PoI descriptions) is also ignored in the text. Only after playing the games and learning their core mechanics can a player fully appreciate what the actions taken in the game mean in terms of privacy.

When it comes to data shared with other players, Next Games' privacy policy [9] seems to be even more vague: "Social features are a core component of our games. Other players and users may, for example, see your profile data, in-game activities, leaderboard position and read the messages you have posted." The text does not refer to location or time at all, although it does mention that in-game activities are shared.
3.3 Comparison Between Affordances for Data Collection and What the Privacy Policy Says
In Table 1 we have sorted the findings of the previous sections into three categories: (1) time-stamped location data, (2) contextual data and (3) circumstantial evidence. The time-stamped location data is the most accurate and can be used to identify with certainty, assuming the player is playing with their own account, where the player was at a given time.
Table 1. Categorisation of what types of location data the case LBGs yield to other players

Category                   | Examples of found affordances
Time-stamped location data | Player actions in COMM (Ingress), deployment times at gyms (Pokémon GO), flares (The Walking Dead)
Contextual data            | Gift locations and descriptions (Pokémon GO), disappearing XM (Ingress)
Circumstantial evidence    | Capturing regional pokémon (Pokémon GO), deploying buildings (The Walking Dead), increasing AP and acquired medals (Ingress), player profiles (all games)
Contextual data and circumstantial evidence are less accurate, but when combined with other information, they can be used to derive information such as where the player lives or works, or even the accurate location of the player.

The main differences between the observed affordances for data collection and the contents of the privacy policies can be summarized in the following items:

– Features. It appears privacy policies either completely omit the explanation of the privacy implications of specific features of the studied games by using general terms such as "in-game actions", or alternatively only mention a few selected features (such as gyms in the case of Pokémon GO). While it is understandable that privacy policies covering several games aim to be generic, some pivotal affordances for the collection of private data, such as friends in Pokémon GO, should be better covered in privacy policies regarding their privacy implications.
– Time. The studied privacy policies do not mention the temporal dimension at all. Both real-time observations (e.g. alerts in Ingress) and time-stamped data on past events (e.g. the time of placing a pokémon in a gym) fall into this category. The fact that time-stamped location data can be extracted from the game could be explained better.
– Contextual data. Niantic's privacy policy mentions the contextual data sometimes delivered along with the player's location. However, the implications of contextual data, such as the potential disclosure of the player's daily routine and the places they frequently visit, are not discussed in any way. The contextual information associated with PoIs helps the adversary to gather useful information even when they are not familiar with the environment where the player plays the game.
– Combining pieces of information. While some LBGs, such as Ingress, readily broadcast alerts on player actions in a certain area, some games, such as Pokémon GO, require more work from the adversary to combine all pieces of the puzzle and to profile a player's movements and daily routine. Although it is not the purpose of privacy policies to outline threat scenarios, more care should be taken to make players understand that the pieces of data in games can potentially be used against them. In complex systems, it may also be legally unclear who (companies or people) is responsible for potential abuse of the acquired data.
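As a hedged illustration of how individually innocuous pieces of contextual data can be combined, the following sketch (hypothetical data, PoI names and threshold; not an actual attack tool described in this study) counts how often PoI titles harvested from gifts recur, surfacing candidate locations for a player's home, workplace or daily routine.

```python
from collections import Counter

def frequent_places(gift_pois, min_share=0.2):
    """Given PoI titles harvested from received gifts (hypothetical data),
    return the places that account for a large share of the player's
    activity - candidates for home, workplace or daily-commute stops."""
    counts = Counter(gift_pois)
    total = sum(counts.values())
    return [(poi, n) for poi, n in counts.most_common() if n / total >= min_share]

gifts = ["Riverside Mural", "Riverside Mural", "Campus Fountain",
         "Riverside Mural", "Harbour Crane", "Campus Fountain"]
print(frequent_places(gifts))  # -> [('Riverside Mural', 3), ('Campus Fountain', 2)]
```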
4 Discussion
4.1 What Can Be Done with the Shared Location Data?
The easy accessibility of a player's location information to other players can be exploited for malicious or commercial purposes. For example, an adversary might use a horde of bots (automatic programs posing as players) to track the movements of players. This might not be as accurate as the game developer's database of player locations and in-game actions, but it is still an effective way to track active players. Empirical evidence of this was obtained in 2017, when a whistleblower in Ingress revealed the large-scale use of a data-scraping tool called Riot, which was used to spy on fellow players' movements and actions [5]. APIs for accessing game data have been developed for other games besides Ingress1 too (e.g. Pokémon GO2,3). At the end of 2019, an Ingress Player Tracker database was widely advertised in Ingress communities. Purchasing access to the database allows tracking all players' movements and activity. Thus, while Niantic, for example, only shares "anonymous data with third parties for industry and market analysis", malicious parties might acquire unanonymized information without permission by using the game as an interface for data collection.

Depending on the objectives of the adversary, this can be done to gather large amounts of data on all players in general or to spy on specific players. The acquired location data could be used to identify where players move on a daily basis, when they are travelling, which other players they play with, and so on. Simply revealing a player's home address or workplace location can be an issue, as demonstrated by, for example, swatting, that is, using a fake excuse to call the United States' Special Weapons and Tactics squad to a person's home [14]. Besides illegal use of the data, it could be used to identify commercial opportunities and solve problems such as when and where to set up a food truck.

1 https://github.com/IITC-CE/ingress-intel-total-conversion
2 pogo api: https://github.com/Grover-c13/PokeGOAPI-Java
3 https://github.com/rubenvereecken/pokemongo-api
4.2 Implications for Society
Playing LBGs blends with everyday activities, thus increasing the value of personal location data acquired through the game. The ubiquitous presence of data collection in the digital society makes it impossible for individuals to be aware of where their personal data is being held, despite legislation such as the European Union's GDPR [26] offering users the possibility to retrieve their personal data from companies. Still, legislation such as the GDPR can help mitigate the negative impacts of known, intended data collection. The problem with unintended data collection opportunities remains, and, therefore, companies providing open interfaces containing sensitive information such as location data should be careful about the affordances of such an interface for malicious purposes.

More and more services are being moved to the mobile phone. Identification and banking services, communication and games are just a few examples of what an increasingly large number of users are using their phones for. Gaining access to a modern person's mobile phone essentially means getting hold of their entire life. Therefore, constantly donating personal location information for the world to see while playing LBGs can expose players as easy targets to burglars or even muggers [4]. While it is generally a positive thing to increase the quality of virtual PoIs in pervasive technologies such as LBGs [17, 22], this can also increase the opportunities for identifying user movement patterns and propensities. A real-life example is the removal of a pokémon gym from the Pentagon, as defense officials were concerned that playing Pokémon GO inside the building could give away sensitive information such as room locations [27].
4.3 Limitations
This study is limited by its scope of looking only at location-related sensitive data, instead of sensitive data more holistically. Also, only three popular LBGs were examined, and accordingly, our findings are tied to these games despite attempts to generalize them. We observed the current state of the games and privacy policies; however, both are subject to change, and thus the results of this study depict the current state of affairs. Due to the technical complexity of the games, and because the architecture of the games is not publicly available, room for speculation remains as to whether there are technical exploits or other unintended ways to obtain sensitive information from players that were not covered in this study.
5 Conclusions and Future Work
With our findings we confirmed the following:

– Location-based multiplayer games can be used as interfaces to gather detailed information on the location and movements of players.
– Privacy policies provided by game development companies are often vague and lack details and information on the possibilities of other players or third parties to obtain the player's private information.

LBGs have been around for over 10 years, but information collection through them is still a relatively fresh topic. Minor cases of location-data abuse have been reported;
however, there is currently no evidence that data from LBGs has been systematically used for malicious purposes. At present, LBG players are not concerned with leaking sensitive information such as time-stamped location data online while playing [10]. This could change in the forthcoming years unless the negative aspects of users' location-data leakage can be mitigated. The findings of this study urge LBG developers to ensure their games cannot be used as interfaces for player location surveillance.
References

1. Acquisti, A., Brandimarte, L., Loewenstein, G.: Privacy and human behavior in the age of information. Science 347(6221), 509–514 (2015)
2. Althoff, T., White, R.W., Horvitz, E.: Influence of Pokémon GO on physical activity: study and implications. J. Med. Internet Res. 18(12), e315 (2016)
3. Brown, J.W., Ohrimenko, O., Tamassia, R.: Haze: privacy-preserving real-time traffic statistics. In: Proceedings of the 21st ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp. 540–543. ACM (2013)
4. D'Anastasio, C.: Pokémon GO streamer mugged live on Twitch [UPDATE] (2016)
5. D'Anastasio, C.: Ingress players use unofficial tools to stalk one another (2017)
6. De Souza e Silva, A., Frith, J.: Location-based mobile games: interfaces to urban spaces. In: Frissen, V., Lammes, S., de Lange, M., de Mul, J., Raessens, J. (eds.) Playful Identities. Ludifizierung von Kultur (2015)
7. C Developed: Ingress Intel total conversion (2019). https://iitc.me/. Accessed 6 Nov 2019
8. Finco, M.D.: I play, you play and we play together: social interaction through the use of Pokémon GO. In: Augmented Reality Games I, pp. 117–128. Springer (2019)
9. Next Games: Privacy policy (2019). https://www.nextgames.com/privacy-policy/. Accessed 11 Nov 2019
10. Hamari, J., Malik, A., Koski, J., Johri, A.: Uses and gratifications of Pokémon GO: why do people play mobile location-based augmented reality games? Int. J. Hum.-Comput. Interact. 35(9), 804–819 (2019)
11. Hulsey, N., Reeves, J.: The gift that keeps on giving: Google, Ingress, and the gift of surveillance. Surveill. Soc. 12(3), 389–400 (2014)
12. Jedrzejczyk, L., Price, B.A., Bandara, A.K., Nuseibeh, B., Hall, W., Keynes, M.: I know what you did last summer: risks of location data leakage in mobile and social computing, pp. 1744–1986. Department of Computing, Faculty of Mathematics, Computing and Technology, The Open University (2009)
13. Kang, R., Dabbish, L., Fruchter, N., Kiesler, S.: "My data just goes everywhere:" user mental models of the internet and implications for privacy and security. In: Eleventh Symposium on Usable Privacy and Security (SOUPS 2015), pp. 39–52 (2015)
14. Karhulahti, V.M.: Prank, troll, gross and gore: performance issues in esport livestreaming. In: DiGRA/FDG (2016)
15. Kulik, L.: Privacy for real-time location-based services. SIGSPATIAL Spec. 1(2), 9–14 (2009)
16. Laato, S., Hyrynsalmi, S.M., Paloheimo, M.: Online multiplayer games for crowdsourcing the development of digital assets – the case of Ingress. In: Software Business – Proceedings of the 10th International Conference (ICSOB 2019), Jyväskylä, Finland, 18–20 November 2019, pp. 387–401 (2019)
17. Laato, S., Pietarinen, T., Rauti, S., Laine, T.H.: Analysis of the quality of points of interest in the most popular location-based games. In: Proceedings of the 20th International Conference on Computer Systems and Technologies, pp. 153–160. ACM (2019)
18. Lee, J., Bang, J., Suh, H.: Identifying affordance features in virtual reality: how do virtual reality games reinforce user experience? In: International Conference on Augmented Cognition, pp. 383–394. Springer (2018)
19. Niantic: Ingress Intel map (2019). https://intel.ingress.com/intel. Accessed 6 Nov 2019
20. Niantic: Niantic privacy policy (2019). https://nianticlabs.com/privacy/. Accessed 11 Nov 2019
21. Norman, D.A.: Affordance, conventions, and design. Interactions 6(3), 38–43 (1999)
22. Oleksy, T., Wnuk, A.: Catch them all and increase your place attachment! The role of location-based augmented reality games in changing people-place relations. Comput. Hum. Behav. 76, 3–8 (2017)
23. Shvartzshnaider, Y., Apthorpe, N., Feamster, N., Nissenbaum, H.: Going against the (appropriate) flow: a contextual integrity approach to privacy policy analysis. In: Proceedings of the AAAI Conference on Human Computation and Crowdsourcing, vol. 7, pp. 162–170 (2019)
24. Sicart, M.: Defining game mechanics. Game Stud. 8(2) (2008)
25. Steinfeld, N.: "I agree to the terms and conditions": (how) do users read privacy policies online? An eye-tracking experiment. Comput. Hum. Behav. 55, 992–1000 (2016)
26. Tankard, C.: What the GDPR means for businesses. Netw. Secur. 2016(6), 5–8 (2016)
27. Thielman, S.: Pentagon's Pokémon orders: game must go (outside) for security reasons (2016). https://www.theguardian.com/technology/2016/aug/12/pentagonpokemon-go-restrictions-security-concerns. Accessed 11 Nov 2019
28. Vella, K., Johnson, D., Cheng, V.W.S., Davenport, T., Mitchell, J., Klarkowski, M., Phillips, C.: A sense of belonging: Pokémon GO and social connectedness. Games Cult. 14(6), 583–603 (2019)
29. Winkler, S., Zeadally, S.: Privacy policy analysis of popular web platforms. IEEE Technol. Soc. Mag. 35(2), 75–85 (2016)
Application of the ISO 9241-171 Standard and Usability Inspection Methods for the Evaluation of Assistive Technologies for Individuals with Visual Impairments

Ana Carolina Oliveira Lima1, Maria de Fátima Vieira2, Ana Isabel Martins1, Lucilene Ferreira Mouzinho1, and Nelson Pacheco Rocha3

1 Department of Electronics, Telecommunications and Informatics, University of Aveiro, 3810-193 Aveiro, Portugal
{ana.carolina.lima,anaisabelmartins,lucileneferreira}@ua.pt
2 Department of Electric Energy, Federal University of Campina Grande, Campina Grande, Brazil
[email protected]
3 Medical Sciences Department, University of Aveiro, 3810-193 Aveiro, Portugal
[email protected]
Abstract. This paper describes a usability evaluation process, based on the ISO 9241-171 standard and inspection methods, to assess features that improve the accessibility of information services for individuals with visual impairments. A usability evaluation process supported by a consistency inspection protocol that conforms to the checklists provided by the ISO 9241-171 standard, together with heuristic evaluation, was applied to assess the accessibility of screen readers typically used by individuals with visual impairments. The results show that the approach proved to be adequate for supporting the assessment of the accessibility features of the considered screen readers.

Keywords: Assistive technologies · Accessibility · ISO 9241-171 standard · Usability evaluation · Inspection methods
1 Introduction

Some people cannot see clearly or distinguish certain colors, and some cannot operate standard keyboards or mice. Access to information by people with physical disabilities requires different forms of interaction based on the use of, for example, voice. Information systems should be inclusive and universal, so that they favor not only people with disabilities but also all other users, whether they have a limitation or not. Therefore, good usability has a huge impact on different information services, including multicultural applications, multiplatform software or multimedia services [1].
Accessibility refers to the ability to use and benefit from an entity and is used to describe the degree to which different entities can be used by as many people as possible, including people with physical, mental or sensory disabilities. Therefore, accessibility is often focused on people with disabilities and their rights to access products and services, namely through the use of assistive technologies [2].

Accessibility has become a major concern of information systems developers. Therefore, the systematization of good practices to evaluate the quality of products and systems to be used by particular users or communities is required. The present paper reports a study aiming to verify the adequacy of a usability evaluation process supported by a consistency inspection protocol that conforms to the checklists provided by the ISO 9241-171 standard, together with heuristic evaluation, to assess the accessibility of screen readers typically used by individuals with visual impairments. In addition to this introductory section, the paper comprises four more sections: Background, Materials and Methods, Results, and Discussion and Conclusion.
2 Background

The ISO 9241 standard [3] defines usability as the efficiency, effectiveness and satisfaction achieved through a product during the execution of tasks by users. The ISO 9241-171 standard [4] is a complementary document to the ISO 9241 standard and provides ergonomic guidance for the design and development of accessible interactive systems, to be used in work, domestic, educational and public places, by any users, including individuals with physical, sensory or cognitive limitations. In particular, these standards should be used to support the design, development, application and evaluation of interactive information systems, as well as being a reference for their selection processes.

There is a variety of usability evaluation methods for all stages of design and development, from product specification to final design adjustments [5]. Moreover, evaluation methods can be empirical (i.e., based on actual usage data) or analytical (i.e., based on examination of an interactive system and/or potential interactions with it). Empirical methods include test, inquiry or controlled experiment methods [6]. Test methods involve observing users while they perform predefined tasks, inquiry methods involve collecting qualitative data from users, and controlled experiment methods involve the application of the scientific method to test hypotheses with real users through the control of variables, using samples big enough to determine statistical significance [6]. In turn, analytical methods include inspection methods, which involve the participation of experts to assess the various aspects of the user interaction of a given system, namely:

• Cognitive walkthrough - Cognitive walkthrough is a method that evaluates whether the order of tasks in a system reflects the way people cognitively process tasks and anticipate "next steps". Designers and developers go step by step as a group, asking a set of questions in each step. After identifying issues that can be improved,
specialists gather information in a report and then the application is redesigned to address the issues identified [5].
• Usability inspection - Usability inspection is a review of a system based on a set of guidelines. The review is conducted by a group of experts who are deeply familiar with the concepts of usability in design. The experts focus on a list of areas in design that have been shown to be troublesome for users [5].
• Consistency inspection - In consistency inspection, expert designers review products or projects to ensure consistency across multiple products, checking whether a design does things in the same way as their own designs [5].
• Heuristic evaluation - Heuristic evaluation involves having a small set of evaluators examine the interface using recognized usability principles named "heuristics". As mentioned, heuristic evaluation relies on expert reviewers to discover usability problems and then categorize and rate them based on the heuristics (i.e., a set of commonly recognized principles). It is widely used due to its speed and cost-effectiveness in terms of application [7, 8]. The heuristics include:

• Visibility of system status - The system should always keep users informed of what is happening, through appropriate and timely feedback.
• Correspondence between the system and the real world - The system must use concepts familiar to the user, and information must be presented in a logical and natural order.
• User control and freedom - Users must have undo and redo support, particularly when they mistakenly select functions that lead the system to evolve into an unwanted state.
• Consistency and standards - Users should not have to worry about whether different words, situations or actions mean the same thing or not.
• Error prevention - Better than good error messages is the provision of mechanisms that prevent the occurrence of errors, namely eliminating or avoiding error-prone conditions.
• Recognition instead of recall - The memory load of the users should be minimized, ensuring that objects, actions and options are visible or easily retrievable whenever necessary.
• Flexibility and efficiency of use - The system should be tailored to different user profiles (e.g., more experienced users or less experienced users) in a transparent way.
• Aesthetic and minimalist design - Dialogues should not contain irrelevant or unnecessary information.
• Support for users to recognize, diagnose and recover from errors - Error messages should be expressed in plain language, if possible without codes, accurately indicating the problem and suggesting a constructive solution.
• Help and documentation - Any system should provide help information, and it should be easy to search, not too extensive and focused on the tasks that users have to perform.

The study reported in this paper aimed to verify the application of a consistency inspection protocol that conforms to the checklists provided by the ISO 9241-171 standard
and heuristic evaluation to assess the accessibility of assistive technologies for individuals with visual impairments, namely screen readers.
3 Materials and Methods

The study consisted of evaluating the usability of two voice-synthesizer-based accessibility aids for individuals with visual impairments: DOSVOX and Job Access With Speech (JAWS). The DOSVOX software was developed for the Portuguese language and is available for free to individuals with visual impairments, while JAWS is the world's most popular screen reader, developed for individuals whose vision loss prevents them from seeing screen content or navigating with a mouse. These two applications were selected for evaluation due to their popularity amongst users of the Institute for Blind People in Brazil. The text editor, the screen reader and the e-mail client were the accessibility features evaluated in DOSVOX. In turn, in what concerns JAWS, the screen reader and the internet browser were the accessibility features evaluated.

This study was conducted using two usability inspection methods, namely consistency inspection and heuristic evaluation. The consistency inspection was performed using the ISO 9241-171 standard as a basis. The ISO 9241-171 standard [4] is composed of four clauses (8, 9, 10 and 11) that contain detailed recommendations to promote accessibility. These recommendations are applicable to hardware device control, interface management, sound reproduction, among others. Most recommendations are applicable to multiple layers of the systems, including operating systems and databases, which ensures the recommended accessibility characteristics across all interdependent layers.

Still in accordance with ISO 9241-171 [4], the evaluation procedures should consider typical users performing classic and critical tasks in typical environments, which includes, for instance, the dialog styles (e.g., menus or direct manipulation). The product's compliance is assessed by verifying all the applicable requirements with the support of a systematic list that contains all the standard's recommendations [4]. Then, the adoption rate is calculated. This adoption rate is obtained as the ratio between the number of recommendations adopted by the product and the number of recommendations applicable to the product [3].

Moreover, the consistency inspection followed a protocol that consists of a set of interrelated steps, processes and activities to guide the evaluation during different phases: planning, conducting, collecting data, analyzing and reporting the results [9]. The protocol proposed by [9] was used to verify whether a specific feature conforms to an adequate technical standard of accessibility, according to the adequacy requirements presented in [10] and [11]. Since the protocol has a comprehensive and modular structure, the adaptation required to accommodate the assessment of the accessibility of the selected features resulted in small changes in the steps and processes. Moreover, no changes were required in terms of structure.

In turn, the heuristic evaluation used the set of heuristics proposed by Nielsen [12], which comprises ten general features focused on a product's usability, resulting from the analysis of a set of 294 usability problem types identified in empirical studies.
The heuristic evaluation was carried out by five evaluators, following the existing recommendations [12].
4 Results

The applicability analysis of each ISO 9241-171 [4] recommendation of parts 8, 9, 10 and 11 was based on information obtained from the documentation provided with the product. Based on the inspection lists, the adoption rates shown in Tables 1 and 2 were calculated for the evaluated products.

Table 1. The result of applying the consistency inspection of DOSVOX.

ISO 9241-171 parts   | Applicable issues | Approved issues | Adoption rate (%)
8 General guidelines | 42                | 6               | 14
9 Input              | 24                | 19              | 79
10 Output            | 19                | 11              | 58
11 Documentations    | 3                 | 0               | 0
Total                | 88                | 36              | 41
Table 2. The result of applying the consistency inspection of JAWS.

ISO 9241-171 parts   | Applicable issues | Approved issues | Adoption rate (%)
8 General guidelines | 42                | 33              | 79
9 Input              | 24                | 22              | 92
10 Output            | 19                | 18              | 95
11 Documentations    | 3                 | 3               | 100
Total                | 88                | 76              | 86
Considering the consistency inspection, usability problems were identified in both systems. After the analysis of the 88 items of the ISO 9241-171 standard, the ratio between the number of approved items and the total number of applicable items resulted in adoption rates of 41% for the DOSVOX system and 86% for the JAWS system. The results from the evaluation of Heuristic 1 (H1), Heuristic 2 (H2), Heuristic 3 (H3), Heuristic 5 (H5), Heuristic 6 (H6), Heuristic 8 (H8), Heuristic 9 (H9) and Heuristic 10 (H10) for the DOSVOX and JAWS systems are presented in Tables 3, 4, 5, 6, 7, 8, 9 and 10. No results were found for Heuristic 4 (H4) and Heuristic 7 (H7).
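As a minimal sketch of the ratio described above (illustrative only, not part of the authors' tooling), the adoption rates in Tables 1 and 2 can be reproduced as follows.

```python
def adoption_rate(approved, applicable):
    """Adoption rate = approved recommendations / applicable recommendations, in %."""
    return round(100 * approved / applicable)

# Totals taken from Tables 1 and 2
print(adoption_rate(36, 88))  # DOSVOX -> 41
print(adoption_rate(76, 88))  # JAWS   -> 86
```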
The results show that faults were identified in the DOSVOX system for all heuristics except H4 and H7, with a total of 12 faults, while in the JAWS system only two faults were found, related to heuristics H2 and H6. From the evaluators' point of view, an average of three usability problems was found per evaluator, which is equivalent to a quarter of the total problems detected using this evaluation method.
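The roll-up from the individual findings in Tables 3–10 to the totals reported above can be expressed as a simple tally; the data structure below is an illustrative summary of those tables, not the evaluators' actual instrument.

```python
from collections import Counter

# (system, heuristic, number of analysts) tuples summarising the faults
# reported in Tables 3-10.
findings = [
    ("DOSVOX", "H1", 3),
    ("DOSVOX", "H2", 2), ("DOSVOX", "H2", 3), ("JAWS", "H2", 3),
    ("DOSVOX", "H3", 2),
    ("DOSVOX", "H5", 1),
    ("DOSVOX", "H6", 2), ("DOSVOX", "H6", 4), ("DOSVOX", "H6", 2), ("JAWS", "H6", 4),
    ("DOSVOX", "H8", 3), ("DOSVOX", "H8", 2),
    ("DOSVOX", "H9", 2),
    ("DOSVOX", "H10", 3),
]

faults_per_system = Counter(system for system, _, _ in findings)
print(faults_per_system)  # Counter({'DOSVOX': 12, 'JAWS': 2})
```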
Table 3. The result of applying the heuristics to the JAWS and DOSVOX systems for H1.

DOSVOX (3 analysts): The absence of alternative mechanisms to access the relevant audio and video task: the system does not communicate to the user, via voice synthesis, the installation procedures presented visually on the video monitor, and it also does not inform the user of the process progression. The presence of such a mechanism in voice synthesis systems is fundamental, so the execution of this task by individuals with visual impairments is hampered. According to the analysts applying the heuristic, the system does not report on the visibility of the current system context.
JAWS: Without fail.
Table 4. The result of applying the heuristics to the JAWS and DOSVOX systems for H2.

(a) DOSVOX (2 analysts): Names and labels of user interface elements inappropriate in meaning: not all elements (windows) are given a meaningful name. When the text editor application (levox) is activated via the l command in the system start menu, the window generated by the system does not receive the name of the levox application, but the address "C:\winvox\levox.exe", which is not a meaningful element name for the application. JAWS: Without fail.
(b) DOSVOX (3 analysts): User interface element names not suited to platform conventions: disk, file, and subdirectory elements have the same functionality (access to documents or system folders), just as text edit, read, and print commands are functions of the text editing application. JAWS: Without fail.
(c) JAWS (3 analysts): Some names given to the menu elements are not significant, such as "voice" in "options", or "video interceptor manager". DOSVOX: Without fail.
Table 5. The result of applying the heuristics to the JAWS and DOSVOX systems for H3.

DOSVOX (2 analysts): No mechanisms to "undo" or "confirm" user actions: when the levox application is accessed, the system initially prompts the user to "enter the file name", but if the user enters the wrong file name, the system only notifies the user "file not found" and the application is terminated, with no "undo" option; the user must restart the entire application to access levox in order to enter the name correctly.
JAWS: Without fail.
Table 6. The result of applying the heuristics to the JAWS and DOSVOX systems for H5.

DOSVOX (1 analyst): Lack of an element selection mechanism as an alternative to typing: to access a file in levox, file access can only be performed by typing the name and path of a file; the system does not allow the user to choose a file (name or address) from a predefined list.
JAWS: Without fail.
Table 7. The result of applying the heuristics to the JAWS and DOSVOX systems for H6.

(a) DOSVOX (2 analysts): Lack of names and labels for user interface elements: when the command to access the keyboard test application is given, the name of the application is not displayed on the video monitor, nor is it spoken by the synthesizer. According to the analysts who evaluated the system by applying the heuristic, this absence of nomenclature causes discomfort to the user, because it is necessary to remember the information or part of the dialogue. If the user leaves the window that is running and wishes to return to the same screen, he or she must memorize which application was being executed. JAWS: Without fail.
(b) DOSVOX (4 analysts): The absence of explicit and implicit indicators for access to commands: in the standard inspection it was verified that the user is not presented with the commands that should be used to access the menu. Also, the command (F1), or the down arrow command that gives access to the second menu of the system and to the command options, is not presented. This problem is repeated for all menu options. JAWS: Without fail.
(c) DOSVOX (2 analysts): There are no mechanisms to reduce the number of steps required to perform tasks: to access a website, for example, it is necessary to use three commands: initially the letter r, when the system asks the user "what you want", followed by the letter h when the system asks "what is the letter of the network program". At this point, the system opens another window of the application, called Webvox, in which the system asks "what is your option"; the user must then enter the letter t so that, finally, the user can enter the address of the desired site. This procedure requires a lot of user action, which makes the task laborious. JAWS: Without fail.
(d) JAWS (4 analysts): It has been found that even though the user has access to the system menu by means of implicit indicators, when initiating this system only a sound notification is given: "JAWS is active on your computer". Command options for navigating the operating system (accessing applications), for reading the icons or user interface elements, or even for accessing the system menu are not informed. DOSVOX: Without fail.
Table 8. The result of applying the heuristics to the JAWS and DOSVOX systems for H8.

(a) DOSVOX (3 analysts): The aesthetic design is not minimalistic: it has been found that the information initially presented visually in the user interface is repeated for most applications. JAWS: Without fail.
(b) DOSVOX (2 analysts): Menu options not grouped by context: the main menu is presented in two parts. The first part is triggered using the "F1" command and contains options for keyboard testing, editing text, printing, files, and disks. Other options, such as network and Internet access, are only displayed by the use of commands activated by any of the keyboard arrows, in which the same commands from the first part are repeated unnecessarily. JAWS: Without fail.
Table 9. The result of applying the heuristics to the JAWS and DOSVOX systems for H9.

DOSVOX (2 analysts): The aesthetic design is not minimalistic: it has been found that the information, initially presented visually in the user interface, is repeated for most applications.
JAWS: Without fail.
Table 10. The result of applying the heuristics to the JAWS and DOSVOX systems for H10.

DOSVOX (3 analysts): The absence of understandable help documentation: the system documentation was not written in clear language; an inappropriate term is used ("the machine informs"), in addition to the use of subjective terms such as "xx".
JAWS: Without fail.
5 Discussion and Conclusion

In this study, two usability inspection methods were applied: consistency inspection and heuristic evaluation. Identifying usability issues through these analyst-centric techniques anticipated and confirmed potential problems that real users would encounter.
The consistency inspection protocol that conforms to the checklists provided by the ISO 9241-171 standard identified usability problems in both systems. After the analysis of 88 items of the standard, the ratio between the number of approved items and the total number of applicable items resulted in adoption rates of 41% for the DOSVOX system and 86% for the JAWS system. Using the checklists provided in the ISO standard allowed an understanding of the applicability and adoption of the recommendations. From the heuristic evaluation, 12 issues were identified in the DOSVOX system and two in the JAWS system. The five evaluators detected different usability problems, thus broadening the scope of the evaluation results.

Comparing both usability assessment methods used, the consistency inspection using ISO 9241-171 proved to be more sensitive to usability issues. Concerning the protocol used to support the consistency inspection, although it was initially developed for usability evaluations, the results show that with minor modifications it is adequate for supporting the evaluation of accessibility features. Given the protocol's modularity, the modifications necessary to adapt it to evaluate accessibility features consisted of minor adjustments in some steps, processes and activities. Specifically, modifications were introduced in the planning steps of the experiment to adapt the testing environment to the specific needs of the group of participants.

Acknowledgments. This work was financially supported by National Funds through FCT – Fundação para a Ciência e a Tecnologia, I.P., under the project UI IEETA: UID/CEC/00127/2019, and Laboratorio de Interfaces Homem-Maquina (LIHM), Department of Electric Energy, Federal University of Campina Grande.
References

1. Lima, A.C.O., de Fátima Queiroz Vieira, M., da Silva Ferreira, R., Aguiar, Y.P.C., Bastos, M.P., Junior, S.L.M.L.: Evaluating system accessibility using an experimental protocol based on usability. In: 2018 International Conferences on Interfaces and Human Computer, pp. 85–92 (2018)
2. Queirós, A., Silva, A., Alvarelhão, J., Rocha, N.P., Teixeira, A.: Usability, accessibility and ambient-assisted living: a systematic literature review. Univ. Access Inf. Soc. 14(1), 57–66 (2013)
3. ISO 9241: Ergonomics of Human-System Interaction, Genève (2009)
4. ISO 9241-171: Ergonomics of Human-System Interaction – Part 171: Guidance on Software Accessibility (2008)
5. Hanington, B., Martin, B.: Universal Methods of Design: 100 Ways to Research Complex Problems, Develop Innovative Ideas, and Design Effective Solutions. Rockport Publishers, Beverly (2012)
6. Martins, A.I., Queirós, A., Silva, A.G., Rocha, N.P.: Usability evaluation methods: a systematic review. In: Human Factors in Software Development and Design, pp. 250–273. IGI Global, USA (2015)
7. Nielsen Norman Group: How to Conduct a Heuristic Evaluation. http://www.nngroup.com/articles/how-to-conduct-a-heuristic-evaluation/. Accessed 13 Nov 2019
8. Nielsen, J., Mack, R.L.: Usability Inspection Methods, vol. 1. Wiley, New York (1994)
9. Aguiar, Y.P.C., Vieira, M.F.Q.: Proposal of a protocol to support product usability evaluation. In: Fourth IASTED International Conference on Human-Computer Interaction, pp. 282–289 (2009)
10. Lima, A.C., Vieira, M., Aguiar, Y., Fatima, M.: Experimental protocol for accessibility. In: IADIS International Conference on Interfaces and Human Computer Interaction (2016)
11. Lima, A.C.O., Vieira, M.D.F.Q., Martins, A.I., Rocha, N.P., Mendes, J.C., da Silva Ferreira, R.: Impact of evaluating the usability of assisted technology oriented by protocol. In: Handbook of Research on Human-Computer Interfaces and New Modes of Interactivity, pp. 275–296. IGI Global (2019)
12. Nielsen, J.: Usability Engineering. Academic Press, Cambridge (1993)
Augmented Reality Technologies Selection Using the Task-Technology Fit Model – A Study with ICETS

Nageswaran Vaidyanathan

Department of Digitalisation, Copenhagen Business School, Copenhagen, Denmark
[email protected], [email protected]
Abstract. The emergence of Augmented Reality (AR) to enhance online and offline customer experiences is fueling great opportunities and multiple business cases. This paper examines how AR technologies are selected by the Infosys Center for Emerging Technologies Solutions (ICETS) to develop solutions for various global users. A combination of interviews, emails, questions and a survey was used to understand how the AR setup is selected with the users, where these technologies are being employed, their strengths and limitations, and where development continues with partners as the technologies mature. The iterative process enabled the mapping of user expectations and values to the technology characteristics to select the AR technology for a specific task. The Task-Technology Fit (TTF) model was used to explain how user expectations and technology characteristics can be used to determine fit, especially where the technology is still maturing.

Keywords: Augmented Reality · AR technology · Emerging technology · Task-Technology Fit (TTF) · Survey
1 Introduction

Augmented reality (AR) is a direct or indirect view of a physical, real-world environment whose elements are augmented (or supplemented) by computer-generated sensory input such as sound, video, graphics or GPS data. AR technologies overlay a layer of the virtual world on top of the real world with the help of a device. A whole new world of opportunities in the AR arena, with unique functionalities and platforms, is emerging. Scalability of AR adoption was limited due to technology drawbacks [1]. However, this trend has shifted, and AR has seen investments in education, manufacturing, retail, health care, construction and tourism with new business and operating models [2]. The implications of AR use have predominantly been studied in the context of education and games [3], but the trends are providing opportunities to develop differentiation in different domains [4].

In this paper, the Task-Technology Fit model is used to understand how user expectations of the task to be performed and technology characteristics are examined together to design an AR technology setup. An iterative process is used to emphasize
the engagement of the users with the technologies available to build the AR technology to meet the needs and performance expectations for the task at hand. This paper attempts to answer the following question: how can an AR technology be selected that ties user expectations of a task to the technology characteristics required to perform that task?
2 Related Research and Theoretical Framework for Study

Immersive Virtual Environments (IVE) is an umbrella term for technologies and their uses to create immersive experiences where the real and virtual worlds intersect [5]. Based on the mixed-reality continuum, which uses the degree of immersion of the user into the virtual world to distinguish popular terms associated with IVEs [6], AR is defined as a type of IVE where the user remains close to the real world and enhances the real-world situation with a digitally induced experience [7, 8].
2.1 AR Technology Components and Usage
AR enhances a user's interaction with reality through a digital environment [9]. AR refers to overlaying virtual images over real-world objects. In a simple form, AR comprises three foundational components [10]: (a) a rendering method that connects the virtual and real worlds, (b) a projection device that transforms data into visual stimuli, and (c) a data feed that provisions data to process and display (Fig. 1).
Fig. 1. Components of an AR setup: data provision (preloaded, dynamic), projection device (smartphone, smart glasses, display) and rendering method (SLAM, marker, location) feeding the AR technical design.
The rendering method deploys virtual images over real-world objects via one of three approaches: Simultaneous Localization and Mapping (SLAM), marker-based rendering, or location-based rendering. The projection of data into visual stimuli typically uses one of these presentation devices: smartphones and tablets, large displays, or smart glasses (head-mounted displays, glasses, and lenses). The data provision in AR brings data to overlay visuals in a three-dimensional space; its two primary forms are static and dynamic data provisioning.
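A minimal sketch of this design space, assuming Python and using only the component options named in Fig. 1 (the class and field names are illustrative, not ICETS artifacts), could look as follows.

```python
from dataclasses import dataclass
from enum import Enum

class Rendering(Enum):
    SLAM = "slam"          # markerless spatial mapping
    MARKER = "marker"      # image/QR marker anchored
    LOCATION = "location"  # GPS/compass anchored

class Device(Enum):
    SMARTPHONE = "smartphone"
    LARGE_DISPLAY = "large display"
    SMART_GLASSES = "smart glasses"

class DataProvision(Enum):
    PRELOADED = "preloaded"  # static content packaged with the app
    DYNAMIC = "dynamic"      # streamed or context-dependent content

@dataclass
class ARSetup:
    """One point in the AR design space sketched in Fig. 1."""
    rendering: Rendering
    device: Device
    data: DataProvision

# Example: a location-rendered, dynamically fed smartphone experience.
setup = ARSetup(Rendering.LOCATION, Device.SMARTPHONE, DataProvision.DYNAMIC)
print(setup)
```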
2.2 Task-Technology Fit Model (TTF)
The Task-Technology Fit (TTF) model [12] is used to understand how technology leads to performance, to assess usage impacts, and to judge the match between task and technology characteristics [13]. The model states that the underlying technology should be a good fit with the tasks it supports in order to be utilized and to positively affect user engagement [14]. Both task characteristics and technology characteristics can affect the task-technology fit, which in turn determines users' utilization of the technology and their task performance (Fig. 2).
Fig. 2. The Task-Technology Fit (TTF) model [11]
There are five key constructs in the TTF model: task characteristics, technology characteristics, task-technology fit, utilization, and performance impact [11]. The TTF model suggests that users' fit of a technology for their actions is based not only on the technology characteristics, but also on the extent to which that technology meets users' needs and their individual abilities (i.e., task-technology fit) [11]. The TTF model focuses on the fit between task characteristics and technology characteristics [15].

Users have different expectations of AR to fulfill the tasks at hand, which could be finding something new, selecting what they need from a range of products, improving their experience and engagement, making selections and decisions, as well as virtually interacting with products. The opportunities to design AR technologies with different attributes make it possible to influence the perception of the user's expectations for the activity [8]. The activity characteristics describe the different reasons for which users want to engage with AR. Following the generic TTF model [11, 12] and related research on AR [16], there is evidence that the technological design has implications for its use and an impact on usage.

The task-activity-technology fit, as supported by TTF, can be characterized as how individuals think about the probability of improving performance on activities through use of a given technology. The fit refers to the ability of the technology setup to help users fulfill their activity expectations and values using AR, and to whether it directly influences intentions to use a given technology [32]. Following the general structure of TTF [12], it is necessary to understand how the users' perception of the technology affects its adoption in terms of engaging with the technology for the intended purposes for which the activity is being performed. Prior research, drawing on TTF [12], has shown that the effect of a new technology is a core determinant of the adoption intention of new
technologies. The adoption of the AR technology to perform the activity further enhances the user’s expectations of AR and its values. This requires the technology design to continue to mature in terms of how data is rendered, how data is presented in a static or dynamic mode and the experience for the users.
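A purely illustrative way to express the fit construct in code (a naive overlap measure, not a validated operationalisation of TTF and not part of the original model) is sketched below; the characteristic labels are hypothetical.

```python
def task_technology_fit(task_needs, tech_offers):
    """Naive fit score: share of task-required characteristics that the
    candidate technology covers (illustrative only)."""
    needs = set(task_needs)
    return len(needs & set(tech_offers)) / len(needs) if needs else 0.0

task_needs = {"hands-free", "mobile", "real-time overlay"}
smart_glasses = {"hands-free", "mobile", "real-time overlay", "limited field of view"}
smartphone = {"mobile", "real-time overlay", "touch input"}

print(task_technology_fit(task_needs, smart_glasses))  # 1.0
print(task_technology_fit(task_needs, smartphone))     # ~0.67
```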
3 Methodology

The paper is based on a descriptive study [17] to uncover the technology designs that meet the expectations for task performance based on the technology characteristics. It uses the TTF model to validate how users' expectations and values and technology characteristics play an important role in the selection of an AR technology. The research followed a positivist case study approach [18] with the ambition to develop a theory [18] covering the constructs and relationships [19] of relevance for AR technology selection for fulfilling the task at hand.
3.1 Case Description
Infosys was chosen as the company to work with on this study given its global presence and a separate unit focused on emerging technologies – the Infosys Center for Emerging Technology Solutions (ICETS). The study was conducted over four months with ICETS to understand how users' expectations of the task are mapped to technology characteristics to select an AR technology setup using the TTF model. The AR setups were developed based on the task characteristics, the technology characteristics, the relevance for their use and whether they offered opportunities, given their applicability, drawbacks or limitations, to meet the user expectations.
3.2 Data Collection
The primary data collection was via interviews, questions posed to the different team members of ICETS and a survey. Additional data was collected through ICETS-provided videos, publicly available press releases and reports published in the media, used to inspire and complement the primary data collection. Quotations found in publicly available articles were used to supplement the data collected for this study [20]. Data was collected via an introductory email sent to the ICETS head, followed by a series of interviews with different team members and then a list of questions in a survey format sent to the 30 ICETS team members. The sample size for the survey was determined based on the degree of precision and differentiation in the ICETS group structure, the ability to gain access to the named subjects in the team, the stratification of the subjects to get different viewpoints from their perspectives, and the ability to conform to the underlying theoretical model. Data analysis was conducted using a framework guided by the four stages of analysis [21]: comprehending, synthesizing, theorizing and re-contextualizing. The specific strategies for analysis in these stages centered on the work of [22], which has been successfully used in this form of research. The information collected was assembled and integrated into a description that
was shared with the AVP of ICETS for accuracy and additional clarification. Leaders in the company provided contacts, inputs and learnings to substantiate the findings. The empirical sources of data used for this study are shown in Table 1.
Table 1. Empirical sources of data

ICETS source of data                                       | Number of team members | Data collection method
AVP, Infosys emerging technologies head                    | 1                      | Interviews, email, questions, feedback
Head of XR Labs                                            | 1                      | Interview, quotation, research tidbits
Principal product architect                                | 1                      | Interview, survey
Senior designer (XR)                                       | 4                      | Interviews, survey
Technical lead (XR)                                        | 4                      | Interviews, survey
XR developers                                              | 18                     | Survey
Principal architect - emerging technologies and innovation | 1                      | Interview, survey
Press releases                                             | –                      | –
Media reports                                              | –                      | –
YouTube videos provided by ICETS                           | –                      | –

3.3 Connecting the Data Collected to the TTF Model
The data collected in this descriptive study was used to understand the typical use cases for how the AR technologies are used, which user expectations of the task are being met, what the current maturity of the AR technologies is, and the drawbacks and limitations of performing a particular task and its use. The TTF model is used to examine whether the findings explain the task-technology fit for a specific activity that the user wanted to perform, given that AR is still an emerging technology and user expectations of the tasks are still forming, based on the limited research available in this area to date [12, 18].
3.4 Map the User Expectations of the Task to Technology Characteristics to Determine Fit for Use Case
ICETS works closely with the users to develop the personas, designs, prototypes and implementation approach needed to understand the user expectations for the task to be performed. This is based on what the task characteristics are and whether the available technology can meet the needs of the task. It is achieved via an iterative approach to develop the AR design and technology solution. The setup is monitored and improved as part of the continued engagement with the users, and feedback on the performance of the desired activities is used to continually improve the experience. Among the anticipated impacts are improved user experience, technical quality and feasibility of the solution, and commercial viability of the solution.
Questions asked to the users to understand the task and to select the appropriate AR technology include: What are the user expectations for the task to be performed? What need is being fulfilled? What task characteristics are important? Is the technology to power the solution available or within reach? What technology characteristics are important? Do we have the ability to modify or customize the current offering, and can we work with solution partners? Will the solution align with the goals for the use case to perform the intended task? The iterative process using the responses to these questions included the following steps:
Inspiration: an accurate understanding of a problem or product where AR can add value - envisioning the application.
Empathy: looking at the needs of the users, not only gathering user requirements but also what users would like to experience, such as higher image resolution, a larger display or quicker response.
Ideation: the most important milestone in enhancing user experience, as it deals with the challenges and the ability to find innovative answers to the findings from the empathy phase, as well as the opportunities and limitations that emerge during the prototyping phase.
Prototyping: implementation and testing of the AR app in an agile manner. Testing is crucial to producing an application that is reliable and meets user expectations.
3.5 User Expectations and Task Characteristics
ICETS, via the engagement of users, documents the key expectation areas expressed by its users for performing the task. These expectations are based on a set of values – extrinsic, intrinsic, interests and goals. The expectations include experiential, training, educational and decision-making needs, selecting products, virtual settings, new operating and business models, a changing mix of offerings, etc. The task characteristics are learning, utilitarian, enjoyment, engagement, aesthetic or fun.
3.6 Technology Characteristics to Fit the Task Characteristics
ICETS works with the users on the technology characteristics that make sense with respect to utilization and expected performance for the task at hand. The responses formulate the solution approach for technology fit for the use case. This is part of the feasibility constraints, as well as the viability of how the solution maps to the user's budget and expected performance. Key questions answered include:
• The type of positioning – head-worn or hand-held, based on the needs of the solution
• Understanding the role/persona and function for which the AR solution is being designed: Does the user require hands-free operation? Does the solution need to be mobile or stationary? Is the solution for indoor or outdoor use? Does the design need to be interactive? Is it single or multi-user based?
• Based on the requirements for the solution, the following specific details are focused upon: Marker-based or markerless; Brightness – fixed or adjustable; Contrast – fixed or adjustable; Resolution of display – fixed or adjustable; Duration of use and impact on power/battery life; Cost of devices; Field of view – limited, fixed or extensible; Stereoscope enabled or not; Dynamic refocus – is that required?
• Is this a new design or can an existing design be leveraged? Is this an experiment, a proof of concept or for active use? This focuses on heating, networks, battery life, hardware constraints, etc.
• Is the user subject to any geographic, social or demographic barriers that affect the solution approach?
Through a series of prototyping and user tests, the AR design is validated to ensure that the underlying use case characteristics, features and expectations are being met.
3.7 Findings from the Study
The setup with the users, with continuous feedback and an understanding of what the solution provides, its shortcomings and limitations, its usage and the expected experience via the iterative process, aligns with the TTF model for selecting the technology to fit the task at hand and for how it is utilized and performed. The study shows that users and technology providers have to continue to engage to understand what is needed and how the currently available technology can provide the needed performance and utilization for the selected setup. Survey data is summarized in Table 2 below to provide context for the use of the different AR set-ups and what their limitations are.
Table 2. Summary of the survey findings on technology characteristics versus task fit
Columns (example set-ups), in order: HTC Vive Pro | HoloLens | Magic Leap | All Android ARCore devices | All iOS ARKit devices
Positioning: Head-worn | Head-worn | Head-worn | Hand-held | Hand-held
Technology: Retinal | Optical | Optical | Optical | Optical
Mobile (Yes, No): No | Yes | Yes | Yes | Yes
Outdoor use (Yes, No): No | Yes | Yes | Yes | Yes
Interaction (Yes, No): Yes | Yes | Yes | Yes | Yes
Multi-user (Yes, No): Yes | Yes | Yes | Yes | Yes
Brightness (Adjustable, Fixed): Fixed | Adjustable | Adjustable | Adjustable | Adjustable
Contrast (Adjustable, Fixed): Fixed | Fixed | Fixed | Fixed | Fixed
Resolution (Adjustable, Fixed): Fixed | Fixed | Fixed | Fixed | Fixed
Field of view (Limited, Extensible, Fixed): Fixed | Limited, Fixed | Limited | Limited | Limited
Full color (Available, Limited, Not Available): Available | Available | Available | Available | Available
Dynamic refocus (Available, Not Available): Available | Not Available | Not Available | Available | Available
Stereoscopic (Yes, No): Yes | Yes | Yes | Yes | Yes
Occlusion (Yes, No): No | Yes | Yes | Yes | Yes
Power economy (Yes, No): Yes | No | No | Yes | No
Extent of use (Experiment, PoC, Active Use): Active use | Active use | Active use | Active use | Active use
Barriers (Geographic, Social, Demographic, etc.): Availability in specific regions of the world (Geographic), Cost | Cost, Availability in specific regions of the world (Geographic) | None | Demographic | Demographic
Technology drawbacks: VR-enabled PC requirement, limited play area within base stations, wired HMD | Limited FOV, short battery life, slow charging, uncomfortable to wear for a long period of time | FOV, short battery life, uncomfortable to wear, cheap headstrap | Battery drain, FOV | Battery drain, FOV
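To make the use of Table 2 in technology selection concrete, the sketch below encodes a simplified subset of the surveyed characteristics as data and filters the set-ups against an illustrative requirement profile. The dictionaries and the fits() helper are our own illustration, not part of the ICETS tooling; the fields and the example profile are assumptions made only for this sketch.

```python
# Illustrative sketch only (not part of the ICETS process): a simplified subset
# of Table 2 encoded as data, with a naive filter that keeps the set-ups whose
# surveyed characteristics satisfy a given task requirement profile.

SETUPS = {
    "HTC Vive Pro":   {"positioning": "head-worn", "mobile": False, "outdoor": False,
                       "occlusion": False, "power_economy": True},
    "HoloLens":       {"positioning": "head-worn", "mobile": True, "outdoor": True,
                       "occlusion": True, "power_economy": False},
    "Magic Leap":     {"positioning": "head-worn", "mobile": True, "outdoor": True,
                       "occlusion": True, "power_economy": False},
    "Android ARCore": {"positioning": "hand-held", "mobile": True, "outdoor": True,
                       "occlusion": True, "power_economy": True},
    "iOS ARKit":      {"positioning": "hand-held", "mobile": True, "outdoor": True,
                       "occlusion": True, "power_economy": False},
}

def fits(requirements: dict, characteristics: dict) -> bool:
    """A set-up fits if every required characteristic matches its surveyed value."""
    return all(characteristics.get(key) == value for key, value in requirements.items())

# Example task profile: a mobile, outdoor, hands-free (head-worn) use case.
task_requirements = {"positioning": "head-worn", "mobile": True, "outdoor": True}
candidates = [name for name, chars in SETUPS.items() if fits(task_requirements, chars)]
print(candidates)  # ['HoloLens', 'Magic Leap']
```

For this example profile only HoloLens and Magic Leap pass the filter, which mirrors the pattern visible in Table 2.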
The responses from the interviews provided a view into the different user expectations being met and into how the technology selection is influenced by the type of use as well as by the degree of understanding of how the technology fulfills the needs of the task performance. The key questions responded to concerned the intended task characteristics, the understanding of what the AR technology provides to meet that need, the degree to which the technology is utilized to perform the task, and the interest in developing the solution further than the current immediate need. Though the study is primarily descriptive in nature, the ICETS process for AR selection and the data collected were used to validate the TTF theoretical model. The five constructs in the TTF model that were qualitatively validated were task characteristics, technology characteristics, task-technology fit, utilization, and performance impact [11], based primarily on the data collected via the interviews and survey results that the team members provided for this study. The TTF model suggests that the fit of technology for actions is not only based on technology characteristics, but also on the extent to which that technology meets users' needs and their individual abilities [11]. The model focuses on the fit between task characteristics and technology characteristics [15]. By engaging users via an iterative process to document the task characteristics, ICETS was able to ask the users what they expected from the task. This helped determine the technology characteristics needed to fulfill the task at hand, from which the AR design was formulated and the AR technology selection for the task to be performed was made. There are drawbacks that still need to be addressed, affecting the continued maturing of the design, as outlined in the survey (Table 2). The study demonstrates, from the data collected through the interviews and surveys, that fit-for-purpose AR set-ups are emerging for users to understand the scope and possibilities for meeting their underlying use cases.
4 Discussion and Theoretical Contributions
User expectations and values are continuing to grow as a result of exposure to digital platforms and emerging immersive environments, while AR design is still maturing alongside these emerging user expectations and the expected value of the activity being performed. The technology landscape is maturing with the different characteristics needed to support these needs. Key technology constraints continue to exist with hardware, battery life, network bandwidth, data visualization, and user perceptions of privacy and safety. The study contributes to the extant literature by extending the explanation of the technologies and their current uses, strengths and limitations. The TTF model has the potential to validate the dependencies between users' expectations of the task and what the technology offers, especially where the technologies are emerging, and how the fit of the technology for the task provides the utilization and performance of the task. This study has some limitations. First, the sample size was limited. Second, this study used qualitative data to address the research question of how AR technology selection is made by mapping user expectations for the task at hand and the characteristics of available technology to determine fit for the task. The TTF model was not
validated quantitatively, and user perspectives were not provided by ICETS due to privacy considerations. The study does not consider demographic, emotional, cultural or personality impacts on how users interpret task characteristics and the fit of the technology characteristics to meet particular expectations.
5 Conclusion
Augmented Reality technologies continue to evolve in different settings. The maturity of the technology, digital platforms that enable integration between partners in a secure and safe way, and users' understanding of the value proposition and implications of this technology will create the stickiness and adoption of AR over time. This conceptualization provides a basis for investigating the selection of AR relative to different use scenarios and contexts with engaged users, and offers an exciting prospect for further research in this emerging area.
References 1. Fox, J., Arena, D., Bailenson, J.N.: Virtual reality: a survival guide for the social scientist. J. Media Psychol. 21(3), 95–113 (2009) 2. Cummings, J.J., Bailenson, J.N.: How immersive is enough? A meta-analysis of the effect of immersive technology on user presence. Media Psychol. 19(2), 272–309 (2016) 3. Hofma, C.C., Henningsson, S., Vaidyanathan, N.: Immersive virtual environments in information systems research: a review of objects and approaches. In: Academy of Management Proceedings 2018, vol. 2018, no. 1, p. 13932. Academy of Management, Briarcliff Manor (2018) 4. Philips, D.: Retail faces a new reality as AR and VR adoption swells. The Drum, vol. 7 (2017) 5. Cahalane, M., Feller, J., Finnegan, P.: Seeking the entanglement of immersion and emergence: reflections from an analysis of the state of IS research on virtual worlds. In: International Conference on Information Systems, ICIS 2012, vol. 3, pp. 1814–1833 (2012) 6. Milgram, P., Kishino, F.: “A taxonomy of mixed reality visual displays”, IEICE (Institute of Electronics, Information and Communication Engineers) Transactions on Information and Systems, Special issue on Networked Reality, December 1994 7. Pavlik, J.V., Bridges, F.: The emergence of Augmented Reality (AR) as a storytelling medium in journalism. Journal. Commun. Monogr. 15(1), 4–59 (2013) 8. Ohta, Y., Tamura, H.: Mixed Reality: Merging Real and Virtual Worlds. Springer Publishing Company, Incorporated (2014) 9. Furht, B.: Handbook of Augmented Reality. Springer, New York (2014) 10. Azuma, R.T.: A survey of augmented reality. Presence 6(4), 335–385 (1997) 11. Eccles, J.S., Adler, T.F., Futterman, R., Goff, S.B., Kaczala, C.M., Meece, J.L., Midgley, C. (1983) 12. Goodhue, D.L., Thompson, R.L.: Task-technology fit and individual performance. MIS Q. 19(2), 213–236 (1995)
13. Wu, B., Chen, X.: Continuance intention to use MOOCs: integrating the technology acceptance model (TAM) and task technology fit (TTF) model. Comput. Hum. Behav. 67, 221–232 (2017) 14. El Said, G.R.: Understanding knowledge management system antecedents of performance impact: extending the task-technology fit model with intention to share knowledge construct. Future Bus. J. 1, 75–87 (2015) 15. Lu, H.P., Yang, Y.W.: Toward an understanding of the behavioral intention to use a social networking site: an extension of task-technology fit to social- technology fit. Comput. Hum. Behav. 34, 323–332 (2014) 16. Jankaweekool, P., Uiphanit, T.: User acceptance of augmented reality technology: mobile application for coin treasury museum. In: Proceedings of ISER 72nd International Conference 2017, August, pp. 8–13 (2017) 17. Yin, R.K.: Case Study Research Design and Methods: Applied Social Research and Methods Series, 2nd edn. Sage Publications Inc., Thousand Oaks (1994) 18. Dubé, L., Paré, G.: Rigor in information systems positivist case research: current practices, trends, and recommendations. MIS Q. 27(4), 597 (2003) 19. Weber, R.: Evaluating and developing theories in the information systems discipline. J. Assoc. Inf. Syst. 13(1), 1–30 (2012) 20. Yin, R.K.: Case Study Research: Design and Methods, 3rd edn. Sage, Thousand Oaks (2003) 21. Morse, J.M.: Designing funded qualitative research. In: Denzin, N.K., Lincoln, Y.S. (eds.) Handbook of Qualitative Research, pp. 220–235. Sage Publications Inc., Thousand Oaks (1994) 22. Miles, M.B., Huberman, M.A.: Qualitative Analysis: An Expanded Sourcebook, 2nd edn. Sage, Thousand Oaks (1994)
Towards Situated User-Driven Interaction Design of Ambient Smart Objects in Domestic Settings Victor Kaptelinin(&) and Mikael Hansson Umeå University, 901 87 Umeå, Sweden [email protected], [email protected]
Abstract. In recent years, involving users in the design of Internet of Things (IoT)-based solutions for everyday settings has received significant attention in Human-Computer Interaction (HCI) studies. This paper aims to contribute to that line of research by proposing an interaction design approach, which extends the scope of user-driven design beyond concept generation to support interaction design activities within the entire task-artifact cycle. We analyze key challenges for creating IoT-based solutions for domestic environments, and argue that dealing with such challenges requires adopting an approach characterized by (a) a focus on users’ activity spaces rather than individual objects and tasks, (b) support and assessment of the actual, and sufficiently extended, use experience of new solutions, (c) augmenting existing everyday objects, to transform them into smart ones, rather than developing entirely new smart objects, and, last but not least, (d) combining IoT prototyping and digital fabrication to ensure physical and esthetical integration of new designs into concrete everyday settings. Keywords: User-driven design
· Internet of Things · Digital fabrication
1 Introduction
A major trend in the ongoing digital transformation, encompassing virtually all areas of modern life, is the development of an increasingly wide range of interactive artifacts and services enabled by Internet of Things (IoT) technologies. The forthcoming shift to the 5G telecommunication standard is expected to provide further stimulus to the development of IoT technologies, which in turn are expected to have a massive impact on everyday settings, such as homes, offices, and public spaces [1]. Research efforts and practical explorations in the field of Human-Computer Interaction (HCI) have been increasingly dealing with issues related to IoT-enabled solutions for everyday settings, such as those related to the “smart home” [2–8]. Arguably, however, more research is needed to understand how to make use of the full potential of IoT. This paper aims to contribute to achieving this goal by proposing an approach to user-driven design of smart objects for domestic environments. We argue that the proposed approach, which puts an emphasis on the situated nature of user-driven
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020 Á. Rocha et al. (Eds.): WorldCIST 2020, AISC 1160, pp. 664–671, 2020. https://doi.org/10.1007/978-3-030-45691-7_62
innovation, prolonged real-life use experience, and integration of IoT prototyping with digital fabrication, can be considered a useful complement to existing approaches. In the remainder of the paper we first discuss specific interaction design challenges related to designing IoT-based solutions for domestic environments and present an overview of existing attempts to deal with these challenges in HCI research. After that we argue that the scope of analysis and design explorations in HCI should be expanded to include the entire task-artifact cycle of intrinsic practice transformation. We then propose and discuss a design approach supporting such an expansion. Finally, we present several examples of technological designs illustrating the proposed approach.
2 IoT Technology Innovations in Domestic Settings
2.1 Challenges for Designing IoT Technologies for Everyday Environments
Compared to the development and implementation of IoT technologies in organizational contexts, the development and implementation of connected smart objects in domestic settings presents a number of additional challenges. At least three related but distinct challenges can be identified. First, domestic settings are extremely diverse environments, which are shaped by unique configurations of historical, individual, and other factors. Accordingly, technological solutions suitable for such environments are in many cases likely to be highly situated, even idiosyncratic ones. Of course, implementation of IoT solutions in organizational settings is inevitably situated as well, but in such settings certain rules and constraints should also be enforced to make sure that work practices in the organization in question are consistent with broader standards and policies. Second, domestic environments, compared to organizational ones, are more personal. Again, this is a matter of degree, since organizational settings are also personal to a certain extent. Undoubtedly, however, individual preferences and tastes play a significantly more important role in domestic environments than they do in organizational settings. Third and finally, as opposed to technologies for organizational settings, many IoT solutions for the home are not primarily intended for increasing the effectiveness and efficiency of certain tasks. Instead, a key assessment criterion is often whether the technology in question helps to create a “nice” environment. Therefore, the design of IoT solutions for domestic settings should be especially concerned with creating ambient technologies, which generally contribute to a positive experience of the environment in question and do not necessarily require users' attention.
2.2 HCI Studies of User-Driven Design of IoT-Based Smart Home Solutions
New challenges related to designing technologies for the home have long been acknowledged in HCI research. Almost two decades ago Crabtree et al. [6] observed:
“The move from the office, and working environments in general, has highlighted the need for new techniques for understanding the home and conveying findings to technology developers” [6].
In particular, the need to make sure that new technological solutions reflect the situated nature of domestic environments has been addressed by involving users in design. A range of tools, such as editors and design kits, intended to make it possible for end-users to develop their own solutions for their own environments, was explored in HCI and related areas. An early end-user design tool for creating situated configurations of connected devices is the Jigsaw editor [8], allowing the users to link various technological components, e.g., a doorbell and a camera, to create a desired solution, e.g., taking a picture when someone rings the doorbell, by assembling pieces of a puzzle corresponding to the components. Some examples of IoT design kits for end users, created and tested more recently, include Un-Kit [3] and the Loaded Dice [4]. The Un-Kit is a deliberately and explicitly incomplete set of elements, such as sensors and actuators, to support engaging end users, such as older adults, in co-design of IoT solutions for the home. The Loaded Dice is a combination of two cubes, one containing a set of six different sensors, and one containing a set of six different actuators, to help users explore different combinations of sensors and actuators. A key advantage of the editors and kits described above is making it possible for end users to express their ideas about potential design solutions, which are firmly contextualized in the users' real-life environments. However, using the editors and kits is typically limited to the ideation phase, and solutions developed at this phase typically do not become a natural part of users' settings. As argued below, this limitation is an important one, which indicates the need for further development of concepts, methods, and tools for supporting user-driven innovation of IoT-based solutions.
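Conceptually, the component linking offered by such editors can be read as simple trigger-action rules. The following minimal sketch, with hypothetical device names and handlers, is only our illustration of that idea and not a reconstruction of the Jigsaw editor or of the kits cited above.

```python
# Minimal trigger-action sketch of the kind of end-user composition supported by
# editors such as Jigsaw: a named trigger event is linked to one or more actions.
# Device names and handlers below are hypothetical placeholders.

from typing import Callable, Dict, List

rules: Dict[str, List[Callable[[], None]]] = {}

def link(trigger: str, action: Callable[[], None]) -> None:
    """Associate an action with a named trigger event."""
    rules.setdefault(trigger, []).append(action)

def fire(trigger: str) -> None:
    """Run every action linked to the trigger."""
    for action in rules.get(trigger, []):
        action()

def take_picture() -> None:
    print("camera: picture taken")   # placeholder for a real camera component

link("doorbell_pressed", take_picture)
fire("doorbell_pressed")             # -> camera: picture taken
```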
3 Beyond Concept Generation: Embedding IoT Solutions in Complete Task-Artifact Cycles
3.1 Task-Artifact Cycle, Intrinsic Practice Transformation, and Appropriation
The notion of task-artifact cycle (TAC) refers to the process of dynamic co-evolution of human activities and the tools involved in carrying out the activities. Carroll [1] describes the notion as follows: “Artifacts are designed in response, but inevitably do more than merely respond. Through the course of their adoption and appropriation, new designs provide new possibilities for action and interaction. Ultimately, this activity articulates further human needs, preferences, and design visions” [1].
Building on the notion of TAC, Kaptelinin and Bannon [9] argue that the impact of tools on human practices can be described as an interplay between intrinsic and extrinsic practice transformation. Extrinsic transformation results from implementing the solutions envisioned by designers of novel artifacts, while intrinsic transformation is
achieved by users themselves, who use all available resources to address the problems they experience. The most central aspect of TAC and intrinsic practice transformation is the process of appropriation. When appropriating a technology, people make the technology their own; they turn an initially novel, unfamiliar artifact into a tool tightly integrated into their everyday life [10].
3.2 Making the Case for an Expanded Scope of End-User Design of IoT Solutions
The notions of TAC, intrinsic practice transformation, and appropriation all imply that generating a new interaction design concept cannot be considered the end goal of interaction design. Understanding long-term co-evolution of practices and tools [1, 9, 10] is essential for developing interactive artifacts for successful, sustainable everyday use. Arguably, this conclusion is especially valid with regard to IoT-enabled interactive technologies for the home. Many smart objects in domestic settings are intended as ambient objects, which are quietly functioning in the background and most of the time do not require users' attention. For instance, energy-saving management of temperature and lighting in a smart home can be performed automatically depending on changing conditions and anticipated inhabitants' needs, so that people may not even be aware of the technology performing the management. The only way to properly assess a design of an ambient smart object is to make it possible for people to use the object without placing it in the focus of their attention. However, a fundamental problem with short-term efforts directed at development and assessment of design concepts is that the technology in question is always in the focus of users' attention [11]. This problem concerns all types of design and evaluation in which generating a new design concept is the main goal. It includes, in particular, end-user design of IoT-based solutions, supported by such tools as the Jigsaw editor, the Un-Kit, and the Loaded Dice [3, 4, 8]. A potential way to address the problem is to use “research products” instead of conventional prototypes. Research products are described by Odom et al. [12] as research artifacts, which have a high quality of finish, can fit into everyday environments, and can be used independently in everyday settings. These features of research products make it possible for people to engage with the artifacts for a prolonged period of time. Undoubtedly, using research artifacts has a number of advantages. However, this strategy cannot be employed in the context of end user design. While research artifacts are brought into an everyday setting “from the outside” and for “external” purposes (namely, research), end user design of IoT-based solutions is driven by the users themselves and for their own purposes. Therefore, existing approaches, concepts, and tools in interaction design do not provide sufficient support for long-term engagement of end users with the interactive prototypes they create. An approach addressing this issue is outlined below.
4 Supporting Situated User-Driven IoT-Based Innovations for Domestic Settings: An Outline of a Design Approach
The key underlying ideas of the proposed design approach for user-driven IoT-based innovations for domestic environments are as follows.
4.1 Focus on Users' Activity Spaces Rather than Individual Objects and Tasks
Since IoT-based solutions are intended to become a part of users' everyday environments, the design of such technologies cannot be limited to individual objects and tasks. The scope of the design should include a transformation of users' entire activity spaces [9], which can be achieved by (a) starting from problems and opportunities related to an entire environment when establishing the need for a new technology, and (b) taking into account the impact of the new technology on the environment when implementing and assessing the technology.
4.2 Enabling Extended Real-Life Use Experience
As mentioned, IoT-based solutions for the home are often intended as ambient technologies, so their proper assessment (and, if necessary re-design) requires prolonged everyday engagement with such technologies, sufficient for the technologies to become ambient. Therefore, such technologies should be designed to make sure that they share certain qualities with “research products”, discussed in [12]. Namely, the technologies should fit into the everyday environment and have a reasonably high quality of finish.
4.3 Augmenting Existing Everyday Objects Rather than Developing New Ones
The design approach we propose is intended to be accessible to a broad group of technology users. Since the approach is not oriented toward design professionals, it has the explicit aim to require minimum design skills for a person to be able to adopt and practice it. To achieve this goal, the approach emphasizes augmenting existing everyday objects rather than developing entirely new ones. This feature of the approach allows people to make relatively simple additions to existing objects to convert them into smart ones, rather than facing a potentially challenging task of designing a completely new artifact.
4.4 Integrating IoT Prototyping with Digital Fabrication
The need to make sure that IoT-enabled artifacts designed by the users fit into their environments and have a high finish determines the last idea shaping our approach, namely, integrating IoT prototyping with digital fabrication. Our evidence suggests that people who use digital fabrication technologies to create everyday objects for themselves usually consider the objects “professionally looking” (which, as found by [12], may be
different if objects introduced to users' everyday environments are designed and manufactured by someone else).
5 Examples of Ambient Smart Objects, Which Can Be Designed by Users Themselves
In this section we give three illustrative examples of smart technology concepts which we consider realistic for users to have come up with and implemented with varying degrees of support from ‘expert’ researchers. Our aim here is to give a broad impression of the levels of user independence different digital fabrication toolkits may seek to support. The smart display case (Fig. 1) reminds the homeowner to dust off the objects on display after a certain amount of time has passed. The implementation consists of a microcontroller equipped with a simple IR motion detector, a small speaker and a battery. If the motion detector does not detect the door being opened within a certain amount of time, an audio cue is triggered, alerting the user. These components are contained inside a 3D printed case, designed so that it can be attached with double sided tape, hidden from plain sight.
Fig. 1. An infrared motion detector hidden in a display case
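A rough, desktop-runnable rendering of the display-case logic described above is given below; it is our own simplification, with the IR sensor and the speaker replaced by stubs and the timing constants chosen only for illustration.

```python
# Simplified, desktop-runnable rendering of the display-case reminder logic:
# if no door-opening motion is detected within DUST_INTERVAL, play an audio cue.
# The motion events and the cue are stubbed; on the real microcontroller they
# would come from the IR motion detector and the small speaker.

import time

DUST_INTERVAL = 5.0            # seconds here, purely for illustration
last_opened = time.monotonic()

def motion_detected() -> bool:
    return False               # stub: replace with the IR motion detector reading

def play_audio_cue() -> None:
    print("beep: time to dust the display case")

for _ in range(3):             # bounded loop for the sketch; the device loops forever
    time.sleep(2.0)
    if motion_detected():
        last_opened = time.monotonic()
    elif time.monotonic() - last_opened > DUST_INTERVAL:
        play_audio_cue()
        last_opened = time.monotonic()
```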
Fig. 2. A mug augmented with a heat sensing base
Fig. 3. A smart book divider
The smart mug (Fig. 2) reflects the temperature of its content through a coloured glow. The implementation consists of a microcontroller, a temperature sensor, a colour LED ring and a battery. These components are housed in a 3D printed base designed to fit a mug. The plastic used to print the base acts as a diffusor for the LEDs, softening its glow. Finally, the smart bookshelf (Fig. 3) is a bookshelf augmented with a book divider which allows the user to scan ISBN barcodes. When new books are scanned, they are automatically added to a digital counterpart of the bookshelf on a service like
Goodreads1. The implementation consists of a Raspberry Pi that is wirelessly connected to the Internet, a barcode scanner and a USB power bank. The components are then housed inside a box assembled out of laser-cut pieces of wood, designed to look and function like a book divider. The examples illustrate the four underlying ideas of the approach we propose. First, the aim of the technologies is to make the whole environment cleaner, safer, and more manageable. Second, the technologies are intended for prolonged everyday use, during which their advantages are assumed to become increasingly evident. Third, the technologies are used to augment existing objects, instead of designing brand new display cases, mugs, or bookshelves. Fourth and finally, in all these examples digital fabrication is employed to make an IoT-enabled artifact a part of an everyday setting.
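For the smart mug described above, the core behaviour is a mapping from the measured temperature to an LED colour. The sketch below is our own illustration of that mapping; the thresholds, the stubbed sensor reading and the LED call are assumptions, not details taken from the actual prototype.

```python
# Our own illustration of the smart mug behaviour: map the measured temperature
# to an RGB colour for the LED ring. The thresholds, the sensor reading and the
# LED call are stand-ins, not details taken from the actual prototype.

def temperature_to_colour(celsius: float) -> tuple[int, int, int]:
    """Blue when cold, amber when drinkable, red when hot (illustrative thresholds)."""
    if celsius < 40:
        return (0, 0, 255)     # cold: blue
    if celsius < 60:
        return (255, 160, 0)   # drinkable: amber
    return (255, 0, 0)         # hot: red

def read_temperature() -> float:
    return 72.0                # stub: replace with the temperature sensor reading

def set_ring_colour(rgb: tuple[int, int, int]) -> None:
    print(f"LED ring set to {rgb}")  # stub: replace with the LED ring driver call

set_ring_colour(temperature_to_colour(read_temperature()))  # -> LED ring set to (255, 0, 0)
```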
6 Conclusion
The approach proposed in this paper aims to give the user a larger role in designing IoT-based solutions for domestic environments. The continued development of IoT design toolkits, as well as of new toolkits with a focus on digital fabrication, makes it possible even for novice users to, respectively, create IoT solutions and manufacture high-finish objects. The proposed approach takes advantage of a synergy between these developments to enable people to incrementally augment their everyday environments by introducing, assessing, and modifying new smart solutions. Our next step in implementing the approach is identifying forms of collaboration between experts and novices, which can support a sustainable culture of user-driven innovation.
References 1. Carrol, J.: Human computer interaction - brief intro. In: The Encyclopedia of HumanComputer Interaction, 2nd edn. https://www.interaction-design.org/literature/book/theencyclopedia-of-human-computer-interaction-2nd-ed/human-computer-interaction-briefintro. Accessed 16 Nov 2019 2. Ambe, A.H., Brereton, M., Soro, A., Roe, P.: Technology individuation: the foibles of augmented everyday objects. In: Proceedings of CHI 2017, pp. 6632–6644. ACM Press, Denver (2017) 3. Ambe, A.H., Brereton, M., Soro, A., Chai, M.Z., Buys, L., Roe, P.: Older people inventing their personal Internet of Things with the IoT Un-Kit experience. In: Proceedings of CHI 2019, pp. 322:1–322:15. ACM, New York (2019) 4. Berger, A., Odom, W., Storz, M., Bischof, A., Kurze, A., Hornecker, E.: The inflatable cat: idiosyncratic ideation of smart objects for the home. In: Proceedings of CHI 2019, pp. 401:1–401:12. ACM, New York (2019) 5. Brackenbury, W., Deora, A., Ritchey, J., Vallee, J., He, W., Wang, G., Littman, M.L., Ur, B.: How users interpret bugs in trigger-action programming. In: Proceedings of CHI 2019, pp. 552:1–552:12. ACM, New York (2019)
1 https://www.goodreads.com/.
6. Crabtree, A., Hemmings, T., Rodden, T.: Pattern-based support for interactive design in domestic settings. In: Proceedings of DIS 2002, pp. 265–276. ACM, New York (2002) 7. Forlizzi, J.: The product ecology: understanding social product use and supporting design culture. Int. J. Des. 2(1), 11–20 (2008) 8. Rodden, T., Crabtree, A., Hemmings, T., Koleva, B., Humble, J., Åkesson, K.-P., Hansson, P.: Between the dazzle of a new building and its eventual corpse: assembling the ubiquitous home. In: Proceedings of DIS 2004, pp. 71–80. ACM, Cambridge (2004) 9. Kaptelinin, V., Bannon, L.J.: Interaction design beyond the product: creating technologyenhanced activity spaces. Hum.-Comput. Interact. 27, 277–309 (2012) 10. Tchounikine, P.: Designing for appropriation: a theoretical account. Hum.-Comput. Interact. 32, 155–195 (2017). https://doi.org/10.1080/07370024.2016.1203263 11. Odom, W., Zimmerman, J., Davidoff, S., Forlizzi, J., Dey, A.K., Lee, M.K.: A fieldwork of the future with user enactments. In: Proceedings of DIS 2012, pp. 338–347. ACM, New York (2012) 12. Odom, W., Wakkary, R., Lim, Y., Desjardins, A., Hengeveld, B., Banks, R.: From research prototype to research product. In: Proceedings of CHI 2016, pp. 2549–2561. ACM, New York (2016)
The Impact of Homophily and Herd Size on Decision Confidence in the Social Commerce Context Mariam Munawar, Khaled Hassanein(&), and Milena Head DeGroote School of Business, McMaster University, Hamilton, ON, Canada {munawarm,hassank,headm}@mcmaster.ca
Abstract. Online shopping creates uncertainty in consumers, negatively impacting their decision confidence. Social commerce is a new variant of e-commerce, fitted with social media technologies that allow users to observe how others are behaving in the online shopping space. These observations may drive herd behaviour, a tendency of people to imitate others in an effort to reduce uncertainty. Various characteristics of a herd can result in the propagation of herd behaviour. This work-in-progress paper hones in on how homophily and herd size, as characteristics of a herd, can drive herd behaviour and ultimately impact a consumer’s decision confidence in the social commerce context. A research model is proposed and an experimental methodology is outlined. Potential contributions to both theory and practice are discussed.
Keywords: Social commerce · Decision confidence · Herd behaviour · Homophily
1 Introduction and Theoretical Background
Online shopping is experiencing exponential growth, with forecasts suggesting an increase in sales in excess of 275% in the next two years, culminating in a $4.9 trillion-dollar industry by 2021 [1]. Research suggests that its rapid uptake can be attributed to the increasing integration of social media technologies within the e-commerce interface, resulting in a new variant known as social commerce (s-commerce) [2]. Estimates indicate that almost 70% of online purchases can now be attributed to s-commerce [3], with this percentage likely increasing exponentially in the coming years [4]. Despite the growing usage of s-commerce, shopping online is fraught with uncertainty, given the spatial and temporal separation between consumer and vendor [5]. Uncertainty can have a dire impact on a consumer’s decision-making process, shaking his/her confidence in a purchase decision [6]. Although research in s-commerce is gaining momentum [2], particularly in regard to consumers’ buying intentions [7], there is limited research on the factors impacting consumers’ decision confidence (DC) when using this medium. DC is a critical construct in understanding consumer behaviour and is important in understanding an individual’s decision-making process [8]. It has been shown to influence consumers’ purchase intentions [9], whilst reducing their purchase anxiety and uncertainty [10].
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020 Á. Rocha et al. (Eds.): WorldCIST 2020, AISC 1160, pp. 672–678, 2020. https://doi.org/10.1007/978-3-030-45691-7_63
The s-commerce context presents an opportunity for individuals to lessen the uncertainty inherent in shopping online, and thereby increase decision confidence, through its integration with tools known as social information markers. These markers gauge and track the activities and attributes of online consumers, providing convenient statistics on various measures, such as the total number of purchases for a product [11]. Uncertainty identity theory (UIT) [12, 13] suggests that when individuals are facing situations of uncertainty, such as in online shopping, they strive to find mechanisms to reduce it such as identifying with a group of similar others. Social information markers in the s-commerce medium can serve as a means to facilitate group identification [14]. In particular, group identification through these markers can be driven by herd behaviour. Herd behaviour is a fundamental human tendency that arises in situations when individuals are faced with uncertainty and exposed to the actions of others. It can be defined as “the phenomenon of individuals deciding to follow others and imitating group behaviours” [15, p. 282]. The “herd” is the group an individual chooses to follow, and various aspects of a herd can influence an individual’s decision to follow it. This research focuses on two such aspects: herd size and homophily. While herd size refers to the number of individuals in a group, homophily is the degree to which the individuals in a herd are similar to the individual observing their actions. Social information markers can provide information on both these aspects. There has been a steady rise in empirical research on herd size in the s-commerce context [16]. However, the research is sparse, and there remains a gap in understanding the interaction of herd size with homophily, and the mediators they work through to impact an individual’s DC. For example, when individuals identify with groups to reduce uncertainty, chances for group influence become more likely, and these can be mediated through various channels [17]. Research suggests that trust may arise when individuals identify with a group [18]. Furthermore, one’s sense of community may also arise when an individual seeks association with others [18]. As such, the aim of this study is to understand how herd size and homophily influence trust and sense of community, and ultimately consumers’ decision confidence in s-commerce contexts.
2 Research Model and Hypotheses
The proposed research model is shown in Fig. 1. The constructs and hypotheses included in the model, along with their appropriate support, are described below.
2.1 Impact of Herd Size
Trust is defined as the “willingness to rely on an exchange partner in whom one has confidence” [19, p. 315]. Herding can be interpreted as a consequence of individuals deferring their decisions to the collective actions of a herd because they feel it is more knowledgeable about the situation than they are. This suggests confidence in the actions of the herd, which implies the presence of trust in the herd’s actions. Because a higher herd size is directly linked to herd behaviour [20], it may also be deduced that a higher herd size may lead to higher levels of trust in the actions of the herd.
Fig. 1. Proposed research model
Furthermore, according to UIT [17, 18], individuals identify with groups in an effort to reduce uncertainty. Because herding implies an individual’s acquiescence to a group’s actions, it may be assumed that it implies his/her identification with that group. This identification can serve as a mechanism for uncertainty reduction, which would ultimately work to increase decision confidence. This leads to the following hypotheses H1: Herd size is positively related to trust; and H2: Herd size is positively related to perceived decision confidence.
2.2 Homophily
In this research we adopt the view that homophily is the similarity between an individual making a purchasing decision and a herd/group to which he/she may be exposed. Homophily has been shown to influence online behaviour such that users rely more on information emanating from homophilous online sources [21]. Homophily can be examined from various angles. Individuals can perceive homophily according to value or status [22]. Value homophily refers to similarity in an individual’s beliefs, values, attitudes or norms [22], while status homophily refers to those various sociodemographic factors that stratify society (e.g., age, gender, education) [22]. In this research, we adopt the term interests-homophily to encapsulate value homophily, and demographic-homophily to represent status homophily. Similarity between individuals in these various homophily dimensions (interests and demographics) has been demonstrated to influence behaviour [23]; however, whether one type of homophily is more impactful than the other remains a matter of contention [24, 25]. This research adopts the line of reasoning that interests-homophily may be a stronger influence than demographic-homophily [25]. Additionally, the combination of both types of homophily may have a stronger impact than either alone [24]. According to cue summation theory, the impact on an individual’s learning is increased if the number of cues or stimuli is increased [26] and, thus, increasing the types of homophily to which an individual is exposed may serve to reinforce its overall impact. Previous research indicates that homophily between individuals predisposes them to higher levels of interpersonal attraction and trust [27]. Individuals are more likely to trust sources that are homophilous to themselves [28–30]. Research further indicates that interests-homophily may be more potent in establishing feelings of trust than demographic-homophily [31]. Based on the above arguments and extrapolating from
cue summation theory, it is hypothesized that the combination of both types of homophily is more potent in building trust than either alone. This leads to the following hypothesis H3: Homophily is positively related to trust, whereby the combination of both interests- and demographics-based homophily has a stronger impact than interests-based homophily, which has a stronger impact than demographics-based homophily. When a consumer is faced with uncertainty in the online shopping context, if he/she observes a high number of individuals making a specific purchase or recommendation, then his/her trust in those recommendations is likely to increase if more people are making them. Furthermore, it may be hypothesized that this relationship is strengthened when the recommendations are made by people who are more homophilous to the individual, given that individuals tend to be attracted more to similar others [32]. This leads to the following hypothesis H4: Homophily will moderate the relationship between herd size and trust, such that the effect is stronger for the combination of both types of homophily, which itself is stronger than interests-based homophily, which is stronger than demographics-based homophily. Homophily also plays an important role in the development of an individual’s sense of community (SOC). One of the basic characteristics of SOC is one’s perception of similarity amongst individuals in a group [33]. Furthermore, this sense of community may be enhanced when homophily is interests-based [32]. Attraction within groups increases as individuals discover greater attitudinal similarity as opposed to demographic similarity [23]. Because an individual’s SOC is rooted in the attraction one feels towards a group [33], it is hypothesized that interests-homophily may play a more influential role than demographic-homophily in influencing SOC, and that the combination of both may have a stronger impact than either alone. This leads to the following hypothesis H5: Homophily is positively related to sense of community, whereby the combination of both types of homophily has a stronger impact than interests-based homophily, which has a stronger impact than demographics-based homophily.
2.3 Trust, Sense of Community and Decision Confidence
Decision confidence is the belief about the goodness of one’s judgments or choices. It is important to highlight that DC and uncertainty have often been applied in relation to the same construct, such that confidence in one’s choice is viewed as the converse of subjective uncertainty in one’s decisions [34], implying that the greater an individual’s DC, the less the uncertainty. In regards to trust, the literature suggests that trust helps users overcome feelings of uncertainty within the e-commerce context [35]. As trust serves as an uncertainty-reducing mechanism, it builds up one’s DC. Furthermore, one’s SOC is rooted in the extent to which an individual identifies with a group [36]. As noted in UIT, group identification serves as a mechanism to reduce an individual’s uncertainty [17, 18], and thereby may increase his/her decision confidence. This leads to the following hypotheses H6: Trust is positively related to decision confidence; and H7: Sense of community is positively related to decision confidence.
3 Methodology
To empirically validate our research model, a controlled experimental design approach will be utilized. A fictitious website resembling a social commerce platform will be developed for the experiments to eliminate branding bias. Subjects will complete various tasks involving shopping-related decisions, after which a survey will be administered to assess the various constructs utilized in this study. All measures used in this survey will be adapted from existing and validated scales to ensure content validity. A 2 × 4 factorial design will be employed to assess the main and interaction effects of herding and homophily on an individual’s decision confidence within the s-commerce context. Herd size will be operationalized as the number of recommendations made for a product/service. The threshold to determine what is perceived to be a small and large herd will be determined through a pilot study. Homophily will be operationalized using a homophily index that will be shown as a percentage representing the subject’s similarity with those consumers who are making recommendations. The homophily index will be developed by asking participants to fill out a demographics and/or interests-based questionnaire. Participants will be told that this questionnaire will be used to form the homophily index. There will be four incremental levels for homophily: (0) no homophily index; (1) demographic homophily; (2) interests-homophily; or (3) both demographic and interest-based homophily.
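The exact formula behind the homophily index is not specified at this stage of the work; one plausible operationalization, a simple percentage of matching questionnaire answers between the participant and the recommending consumers, is sketched below with hypothetical item names.

```python
# A minimal sketch of one possible homophily index: the percentage of
# questionnaire items on which the participant and the recommending consumers
# give the same answer. Item names are hypothetical; the study's actual
# operationalization may differ.

def homophily_index(participant: dict, herd_profile: dict) -> float:
    shared = [item for item in participant if item in herd_profile]
    if not shared:
        return 0.0
    matches = sum(participant[item] == herd_profile[item] for item in shared)
    return 100.0 * matches / len(shared)

participant = {"age_group": "25-34", "gender": "F", "likes_hiking": True, "likes_gaming": False}
herd_profile = {"age_group": "25-34", "gender": "M", "likes_hiking": True, "likes_gaming": False}
print(f"{homophily_index(participant, herd_profile):.0f}% similar")  # -> 75% similar
```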
3.1 Model Validation, Sample Size and Post Hoc Analyses
Structural equation modelling using the partial least squares method (PLS) will be used to assess the proposed model. PLS is utilized because of its ability to model latent constructs under conditions of non-normality and small-to-medium sample sizes [37]. The measurement model will be assessed for item loading, internal consistency, and convergent and discriminant validities. The quality of the structural model will also be assessed. Open-ended questions will be used to gain a deeper understanding of how subjects arrived at their decisions. Content analysis techniques will be applied to identify emerging trends and patterns in the open-ended question responses. A saturated model analysis will also be performed to identify any significant non-hypothesized relations. The minimum sample size to validate a PLS model is 10 times the largest of: (i) the number of items for the most complex construct scale in the model, and (ii) the greatest number of paths leading to a construct [37]. The highest number of items in this model is four (the SOC scale [38]). Hence, the minimum sample size is 40. However, to account for our design requirement of 40 participants per cell (in a 2 × 4 factorial design), 320 subjects will be recruited.
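The sample-size arithmetic described above can be reproduced directly; the snippet below only restates the numbers given in the text.

```python
# Reproduces the sample-size arithmetic stated in the text (no new assumptions).
pls_minimum = 10 * 4                # ten times the largest scale: the four-item SOC scale [38]
cells = 2 * 4                       # the 2 x 4 factorial design
participants_required = cells * 40  # 40 participants per cell
print(pls_minimum, participants_required)  # -> 40 320
```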
4 Potential Contributions and Limitations
This research attempts to bridge the gap that exists in understanding the roles of homophily and herd size within the s-commerce context, and how social information markers can aid in developing feelings of trust and community which can ultimately
impact one’s decision confidence. Furthermore, this study aids in presenting an understanding of the role of the different types of homophily in impacting behaviour in the online commercial setting. By attempting to map the factors leading to one’s decision confidence within s-commerce, insights can be generated on how to best develop platforms that result in a consumer’s overall satisfaction. Implications for practice involve the recommendation of a novel tool (homophily index) to help capture the similarity of users, so that they are better able to make decisions with higher confidence.
References 1. Orendorff, A.: Global ecommerce statistics and trends to launch your business beyond borders (2019). https://www.shopify.com/enterprise/global-ecommerce 2. Zhang, K.Z., Benyoucef, M.: Consumer behaviour in social commerce: a literature review. Decis. Support Syst. 86, 95–108 (2016) 3. Ramachandran, M.: The evolution of social shopping in the ecommerce landscape (2018). https://www.adweek.com/digital/ 4. Smith, A., Anderson, M.: Social media use in 2018. www.pewresearch.org 5. Pavlou, P.A., Liang, H., Xue, Y.: Understanding and mitigating uncertainty in online exchange relationships: a principal-agent perspective. MIS Q. 31, 105–136 (2007) 6. Lee, A.S.: Editorial. MIS Q. 25(1), iii–vii (2001) 7. Hajli, N.: Social commerce constructs and consumer’s intention to buy. Int. J. Inf. Manag. 35(2), 183–191 (2015) 8. Oney, E., Oksuzoglu-Guven, G.: Confidence: a critical review of the literature and an alternative perspective for general and specific self-confidence. Psychol. Rep. 116(1), 149– 163 (2015) 9. Laroche, M., Kim, C., Zhou, L.: Brand familiarity and confidence as determinants of purchase intention: an empirical test in a multiple brand context. J. Bus. Res. 37(2), 115–120 (1996) 10. Locander, W.B., Hermann, P.W.: The effect of self-confidence and anxiety on information seeking in consumer risk reduction. J. Mark. Res. 16(2), 268–274 (1979) 11. Munawar, M., Hassanein, K., Head, M.: Understanding the role of herd behaviour and homophily in social commerce. In: SIGHCI 2017 Proceedings, vol. 11 (2017) 12. Hogg, M.A.: Subjective uncertainty reduction through self-categorization: a motivational theory of social identity processes. Eur. Rev. Soc. Psychol. 11(1), 223–255 (2000) 13. Hogg, M.A., Abrams, D.: Towards a single-process uncertainty-reduction model (1993) 14. Hajli, N., Lin, X., Featherman, M., Wang, Y.: Social word of mouth: how trust develops in the market. Int. J. Mark. Res. 56(5), 673–689 (2014) 15. Baddeley, M.: Herding, social influence and economic decision-making: socio-psychological and neuroscientific analyses. Philos. Trans. R. Soc. B: Biol. Sci. 365(1538), 281–290 (2010) 16. Cheung, C.M., Xiao, B.S., Liu, I.L.: Do actions speak louder than voices? The signaling role of social information cues in influencing consumer purchase decisions. Decis. Support Syst. 65, 50–58 (2014) 17. Hogg, M.A.: Uncertainty–identity theory. Adv. Exp. Soc. Psychol. 39, 69–126 (2007) 18. Hogg, M.A.: Uncertainty-identity theory. Handb. Theor. Soc. Psychol. 2, 62 (2011) 19. Moorman, C., Zaltman, G., Deshpande, R.: Relationships between providers and users of market research: dynamics of trust. J. Mark. Res. 29(3), 314–328 (1992) 20. Rook, D.W.: The buying impulse. J. Consum. Res. 14(2), 189–199 (1987)
21. Chu, S.C., Kim, Y.: Determinants of consumer engagement in electronic word-of-mouth (eWOM) in social networking sites. Int. J. Advert. 30(1), 47–75 (2011) 22. McPherson, M., Smith-Lovin, L., Cook, J.M.: Birds of a feather: homophily in social networks. Ann. Rev. Sociol. 27(1), 415–444 (2001) 23. Phillips, K.W., Northcraft, G.B., Neale, M.A.: Surface-level diversity and decision-making in groups: when does deep-level similarity help? Group Process. Intergroup Relat. 9(4), 467– 482 (2006) 24. Ensher, E.A., Grant-Vallone, E.J., Marelich, W.D.: Effects of perceived attitudinal and demographic similarity on protégés’ support and satisfaction gained from their mentoring relationships. J. Appl. Soc. Psychol. 32(7), 1407–1430 (2002) 25. Liden, R.C., Wayne, S.J., Stilwell, D.: A longitudinal study on the early development of leader-member exchanges. J. Appl. Psychol. 78(4), 662 (1993) 26. Severin, W.: Another look at cue summation. AV Commun. Rev. 15(3), 233–245 (1967) 27. Ruef, M., Aldrich, H.E., Carter, N.M.: The structure of founding teams: homophily, strong ties, and isolation among US entrepreneurs. Am. Sociol. Rev. 68, 195–222 (2003) 28. Brown, J.J., Reingen, P.H.: Social ties and word-of-mouth referral behaviour. J. Consum. Res. 14(3), 350–362 (1987) 29. Matsuo, Y., Yamamoto, H.: Community gravity: measuring bidirectional effects by trust and rating on online social networks. In: Proceedings of the 18th International Conference on World Wide Web, pp. 751–760. ACM (April 2009) 30. Golbeck, J.: Trust and nuanced profile similarity in online social networks. ACM Trans. Web (TWEB) 3(4), 12 (2009) 31. Taylor, D.A., Altman, I.: Intimacy-scaled stimuli for use in studies of interpersonal relations. Psychol. Rep. 19(3), 729–730 (1966) 32. Byrne, D.: An overview (and underview) of research and theory within the attraction paradigm. J. Soc. Pers. Relatsh. 14(3), 417–431 (1997) 33. Davidson, W.B., Cotter, P.R.: Psychological sense of community and support for public school taxes. Am. J. Community Psychol. 21(1), 59–66 (1993) 34. Sniezek, J.A.: Groups under uncertainty: an examination of confidence in group decision making. Organ. Behav. Hum. Decis. Process. 52(1), 124–155 (1992) 35. McKnight, D.H., Choudhury, V., Kacmar, C.: Developing and validating trust measures for e-commerce: an integrative typology. Inf. Syst. Res. 13(3), 334–359 (2002) 36. McMillan, D.W., Chavis, D.M.: Sense of community: a definition and theory. J. Community Psychol. 14(1), 6–23 (1986) 37. Chin, W.W., Marcolin, B.L., Newsted, P.R.: A partial least squares latent variable modeling approach for measuring interaction effects: results from a Monte Carlo simulation study and an electronic-mail emotion/adoption study. Inf. Syst. Res. 14(2), 189–217 (2003) 38. Peterson, N.A., Speer, P.W., McMillan, D.W.: Validation of a brief sense of community scale: confirmation of the principal theory of sense of community. J. Community Psychol. 36(1), 61–73 (2008)
Multimodal Intelligent Wheelchair Interface Filipe Coelho1,2(B), Luís Paulo Reis1, Brígida Mónica Faria1,3, Alexandra Oliveira1,3, and Victor Carvalho2 1 Laboratório de Inteligência Artificial e Ciência de Computadores (LIACC), Faculdade de Engenharia da Universidade do Porto, Rua Dr. Roberto Frias s/n, 4200-465 Porto, Portugal {lpreis,aao}@fe.up.pt 2 Optimizer, Praça Dr. Francisco Sá Carneiro 219 2Esq, 4200-313 Porto, Portugal {filipe.coelho,victor.carvalho}@optimizer.pt 3 Escola Superior de Saúde do Instituto Politécnico do Porto (ESS-IPP), Rua Dr. António Bernardino de Almeida 400, 4200-072 Porto, Portugal [email protected]
Abstract. Intelligent wheelchairs allow individuals to move more freely and safely, and facilitate users' interaction with the wheelchair. This paper presents results focused on the study and analysis of the state of the art on topics such as interaction, interfaces and intelligent wheelchairs, and on the analysis of the IntellWheels project. The main goal is to create and implement a multimodal adaptive interface to be used as the control and interaction module of an intelligent wheelchair. Moreover, it is important to keep in mind usability, by facilitating the control of a complex system; interactivity, by allowing control using diverse kinds of input devices; and expansibility, by integrating easily with several intelligent external systems. This project features a complex input/output system with linked parameters, simplified by a node system used to create the input/output actions with automatic input recording and intuitive output association, as well as a powerful, device-agnostic design, providing an easy way to extend the inputs, the outputs and even the user interface. Results reveal positive user feedback and a responsive behaviour when using the multimodal interface in a simulated environment.
Keywords: Adaptability · Intelligent wheelchair · Interaction · Multimodal interfaces

1 Introduction
With the increase in life expectancy observed in the last decades, the need for care has also increased. People in need of care might, in particular, have minimal autonomy in terms of mobility throughout their surroundings and in terms of interaction with other systems that might be at their disposal.
Wheelchairs are major technological devices for individuals with a low level of autonomy and/or mobility. Although traditional wheelchairs can provide some degree of mobility, the user needs proper upper body strength and muscle control. Even for individuals with the proper physical condition to handle a traditional wheelchair, the powered wheelchair is a technological shift that can be controlled more comfortably and can provide more independence, mainly over longer distances. Hence, a powered system allows and eases the control of the wheelchair. However, intelligent wheelchairs are a more important revolution than powered wheelchairs because they mark a shift in the paradigm and in the ability to control something, rather than providing a mere technological upgrade [22]. Intelligent wheelchairs allow the users to move more freely and safely, to play games, and to run actions without any user interaction. For example, a single command can tell the intelligent wheelchair to follow a wall or to navigate between rooms inside a house. However, even intelligent wheelchairs might not accommodate all types of users. There might be people who cannot speak an entire sentence correctly or might not have the accuracy to manoeuvre a joystick. It is with these difficulties that a multimodal interface can make a difference and improve the interaction, by allowing different types of controls to perform the same action.

The main objective of this project is to create an adaptive and easily configurable multimodal interface for an intelligent wheelchair. This multimodal interface should be able to handle several types of inputs and use them to trigger any available action in the intelligent wheelchair. It should allow combining individual input actions into sequences of inputs made in series and in parallel. These input sequences can then be associated with any output sequences that, once again, are output actions in series and in parallel. This method provides much more versatility in controlling the systems attached to the interface, by allowing the user to choose the actions and the devices he/she is more comfortable with or able to reproduce correctly. To evaluate the performance of the multimodal interface, a simulator was used in which the user controls a first-person character around a house-interior environment with the help of several very distinct input devices, such as a joystick, voice recognition or head motion. The user moves around in the environment, performing a set of tasks.

This paper is organized into five sections. The first is composed of the introduction and related work. The second presents the developed multimodal interface and the next section gives a brief description of the simulator. Section four shows experiments and results using the simulator. Section five presents the conclusions and points out some directions for future work.

1.1 Related Work
Considering that the final goal of this project, under the IntellWheels umbrella, is to provide a multimodal interface to be integrated into an intelligent wheelchair, this work relates to many research topics, mainly the interaction interface topics themselves, like Human-Machine/Computer Interaction and User Interfaces [2–5,12–14,24], but also the study of Intelligent Wheelchairs [7,11] and the IntellWheels project itself.
"The term 'user interface' refers to the methods and devices that are used to accommodate interaction between machines and the humans who use them" [26]. In a mechanical system, the interface can be seen as a communication provider between the system itself and the human who controls it. Machines provided humans with the tools and the power to accomplish more, but as they became more complex, controlling them became harder. So, naturally, interfaces started being developed and implemented as a means of hiding the complex actions and information of the system behind more uncomplicated and intuitive access points, allowing a more natural and powerful way to control, monitor or diagnose it from more straightforward access points like screens and keyboards [26]. Interfaces between the human and the machine are what make machines usable, interactive, and useful.

Stuart Card refers to Human-Computer Interaction (HCI) as a simple topic. He gives the example of a person trying to accomplish a given task, such as writing an essay or flying an airplane. The person could perform those tasks without any computer, using a quill pen and ink for writing the essay or hydraulic tubes to manipulate the plane's systems. However, a human-computer interface reshapes the process of carrying out the task, and that is what makes HCI distinctive [16]. HCI studies aim to bring human-centered technology discoveries and provide better ways of interaction [1]. Technology is evolving, new methods of interacting with computers have been developed and improved throughout the years, and it is now possible to use voice commands, touch, gestures, eye movement, and neural activity to interact with computers [15]. Due to the large number of interaction methods available today, it is essential to know which ones are the most important and the most reliable. A survey conducted by Churchill, Bowser, and Preece in 2016 with more than 600 people identified the most critical areas in HCI education: interaction design was the most critical subject, experience design the most important topic; desktop, mobile, and tablet the most critical interfaces; and gesture, keyboard, sensors, and touch the most essential input modalities [1].

User Interfaces are at the core of the interaction between a human and a computer system, and they are arguably one of the most important factors in the success of an interactive system [17]. No matter how good the system is, the interface is what makes it usable, and if the interface cannot provide a good, intuitive experience to the user, then the system also becomes unusable [17]. Nowadays, there are several types of interfaces, with graphical user interfaces (GUI) and web user interfaces being the two most common [26].

Adaptive Interfaces: with the appearance of new forms of computing devices like smartphones, tablets and wearables, the information that was usually only present on desktop-like screens needed to be adapted to these new devices. However, these devices offered a much more convenient and personal experience, and thus the interfaces should reflect that. One of the ideas to personalize the experience relies on the use of Adaptive User Interfaces (AUI) [20]. AUI design is conceptually hard because it involves constant monitoring and analysis, as well as a deep understanding of the context, the user and his environment at runtime, and adapting to the new parameters without any user intervention [20].
It requires a strict set of pre-defined rules as well as a base of understanding to make the proper changes to the user interface [20].

Multimodal Interfaces provide a more transparent, flexible, and efficient interaction between the human and the computer [23]. While they provide a more natural interaction, closer to the one between humans, they also provide robustness to the system, by increasing the redundancy of the input methods or by complementing the information [23]. They improve the interaction, resulting in better productivity and usability of the system, by being able to combine several distinct types of inputs, for example, ones that require physical contact (keyboard, mouse, joystick, trackpad, etc.) with ones that do not require physical contact (gestures, speech, eye tracking, etc.) [6,8–10,18].

Intelligent Wheelchairs: people with impairments, disabilities, or special needs rely on wheelchairs to have a higher degree of autonomy. The appearance of powered wheelchairs (PW) brought immediate improvements to the overall life quality of their users, since using a manual wheelchair can be impossible for some users. However, even PWs can be hard to maneuver, depending on the user. Intelligent wheelchairs (IW) [9,21] or Smart Wheelchairs (SW) are one of the most predominant ways to solve this and similar issues that PW users might be facing. The idea is to improve on existing PW technology by adding degrees of intelligent systems to further enhance the users' ability to control the wheelchair [25].
2 Multimodal Interface Description
The multimodal interface was developed using the free, open-source Godot Engine [19]. This framework was chosen due to its nature as a game engine and its licensing. Being a game engine, Godot is interactive, responsive, portable and feature-rich: it provides the possibility of creating high-quality visuals, leverages an already powerful input system, and can be deployed to many operating systems; its C/C++ interoperability, together with compelling networking implementations, means that practically any external system can interact with Godot with at most a thin layer in between. The project used the built-in language GDScript, which has syntax and functionality very close to Python, but is better tailored to real-time processing, fixing issues that other scripting languages have with threads, garbage collection or native types. The rest of this section describes the several systems that make up the multimodal interface, as well as the devices implemented to interact with it.

2.1 Architecture
The multimodal interface is divided into two big groups: systems and devices. Systems are the control channels of the multimodal interface, providing all the functionality and logic by being the connection between all the devices and the user. The devices are exactly that: devices that provide input to the interface and output from it, being completely independent and decoupled from the systems' implementation, providing an easy way to expand the multimodal interface
interaction capabilities and functionality without having to change the way it was designed.

Input System: The input system is the most complex system of the multimodal interface. It is responsible for saving, loading and updating the inputs file on disk (making the input sequences persistent) and for building a node tree of input groups. The input groups represent input actions that must be triggered simultaneously, and their tree branch represents the sequence of actions. For actions that have the same beginning but where one has a longer sequence than the other (for example, pressing A+B or pressing A+B+C), a timer with a configurable delay is started, and if the user does not continue the sequence by the time the timer ends, the first output action is called. This timer does not make the interface less responsive because, as soon as any input combination is reduced to a single possibility, if the combination matches the one in the mappings, the output sequence is triggered immediately without waiting for the timer to end; a minimal sketch of this mechanism is given after the component list below. To accomplish all these tasks, the input system has various components:

– Input Signatures: These are a data class that represents a single input action, containing the device name, the action, the type of action (press, release or motion) and the parameters the input sent.
– Action Triggers: These are a group of functions that input devices can call to interact with the input system. They register the input action in the system and initiate the matching process of the input system. There are three functions available for this: one for a button press (which can also be used as a single action), another for a button release (to complement the button press when the action has the same ID), and a third for motion inputs (inputs that can be triggered multiple times per second, such as moving a joystick). Each of these functions must receive a device identification, an action identification, and an optional dictionary of parameters. These are then converted to an adequate input signature and sent to the input groups through an input event.
– Input Events: An input event is simply an extension of the Godot class InputEventAction, modified to include an input signature and parameters. This event is created and triggered by the action trigger functions and parsed by the input groups.
– Input Groups: Input groups are a vital part of the input system. Each group represents the actions that must be triggered simultaneously; there is a parent input group for each action, and each input group can have a child input group representing the next combination. The input group is responsible for parsing the input event sent by the input system and validating its signatures against the ones it holds. When all the signatures are matched, the child input group is enabled and the current one disabled. If there is no child input group, this means that the sequence for the current action is met and the input system informs the multimodal interface that the output action can be triggered. Since the Godot Engine processes the input from the lower-level nodes of the scene tree to the upper level, this method works very efficiently.
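A minimal GDScript sketch can illustrate the timer-based disambiguation between overlapping sequences. The names used here (combination_matched, sequence_matched, the 0.4 s default delay) are illustrative assumptions and not the project's actual identifiers; the real input groups also carry the signature-matching state omitted here.

```gdscript
extends Node
# Sketch of the prefix disambiguation: when a fully matched combination
# (e.g. A+B) is also the prefix of a longer mapping (A+B+C), a short timer is
# armed; if no further input arrives, the shorter mapping fires. When only one
# possibility remains, it fires immediately and the timer adds no latency.

signal sequence_matched(mapping_id)

const DISAMBIGUATION_DELAY = 0.4  # seconds, configurable in the real system

var _pending_mapping = ""
var _timer

func _ready():
    _timer = Timer.new()
    _timer.one_shot = true
    _timer.wait_time = DISAMBIGUATION_DELAY
    _timer.connect("timeout", self, "_on_disambiguation_timeout")
    add_child(_timer)

# Called once an input combination has been fully matched.
func combination_matched(mapping_id, has_longer_candidates):
    if has_longer_candidates:
        _pending_mapping = mapping_id  # wait a little before firing
        _timer.start()
    else:
        _timer.stop()
        _pending_mapping = ""
        emit_signal("sequence_matched", mapping_id)

# Called when the user continues the sequence before the timer expires.
func sequence_continued():
    _timer.stop()
    _pending_mapping = ""

func _on_disambiguation_timeout():
    if _pending_mapping != "":
        emit_signal("sequence_matched", _pending_mapping)
        _pending_mapping = ""
```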
Output System: Like the input system, the output system is responsible for saving, loading and updating the outputs file on disk (making the output sequences persistent) and for building a node tree of output groups. It is composed of the following components:

– Output Signatures: These are a representation of a single output action. They are a data class that stores the output device name, the action, a reference to a method call and the parameters the output action accepts.
– Output Groups: The output groups, like the input groups, constitute a tree-like structure of the output actions to perform once an input sequence is met. Each output group can have any given number of output signatures and executes them in separate threads, in parallel (a minimal sketch follows this list).
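The parallel execution of output signatures could be sketched as follows in GDScript. This is only an illustrative reconstruction under assumed names (add_signature, execute); the actual output groups also manage the tree structure and persistence described above.

```gdscript
extends Node
# Sketch of an output group: each signature wraps a method reference plus its
# parameters, and triggering the group runs every signature on its own thread.

var signatures = []   # Array of {"call": FuncRef, "params": Dictionary}
var _threads = []

func add_signature(target, method, params = {}):
    signatures.append({"call": funcref(target, method), "params": params})

func execute():
    for sig in signatures:
        var thread = Thread.new()
        thread.start(self, "_run_signature", sig)  # one thread per signature
        _threads.append(thread)

func _run_signature(sig):
    sig["call"].call_func(sig["params"])

func _exit_tree():
    # Join worker threads before the node is freed.
    for thread in _threads:
        thread.wait_to_finish()
```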
2.2 Parameter System
This system is composed only of a parameter signature component and acts between the input and output systems. The parameter signature stores what the value is and where it should be sent. The job of the parameter system is to monitor all the recognized input actions, read their parameters and, according to the output action they are triggering, associate each value with the corresponding output signature. A possible sketch of this routing is given below.
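The sketch below shows one way such a routing rule could be represented; the field names are purely hypothetical, and in the real system the parameter signatures are created through the node system rather than hard-coded.

```gdscript
extends Node
# Sketch of parameter routing: each route copies a value produced by an input
# action into a named parameter of the output action it is triggering.

var routes = [
    {"in_device": "joystick", "in_action": "left_stick", "in_key": "axis_y",
     "out_action": "simulator/move_forward", "out_key": "speed"},
]

# Called for every recognised input action, together with the output
# signatures (keyed by action) that are about to be executed.
func on_input_recognised(in_device, in_action, params, pending_outputs):
    for route in routes:
        if route.in_device == in_device and route.in_action == in_action \
                and params.has(route.in_key) and pending_outputs.has(route.out_action):
            # Copy the input value into the matching output parameter.
            pending_outputs[route.out_action][route.out_key] = params[route.in_key]
```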
2.3 Graphical User Interface
The main graphical user interface (GUI) is composed of a header that contains the exit and option buttons, among other information widgets, and a tab container. The GUI system then provides methods for the devices to hook up to this tab container and to the options menu. This way, any device can add a main interface by adding a tab to the tab container, or a completely new screen by registering a new entry in the options menu. To display new screens, some auxiliary methods were created, providing stack-based navigation. That way, the user can easily navigate forwards and backwards in the stack and have a consistent, logical navigation flow, as sketched below.
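A minimal sketch of such stack-based navigation in GDScript follows; push_screen and pop_screen are assumed names, not the actual API exposed by the GUI system.

```gdscript
extends Control
# Sketch of stack-based screen navigation: the last screen on the stack is the
# visible one; going back pops it and shows the previous screen again.

var _screen_stack = []

func push_screen(screen):
    if not _screen_stack.empty():
        _screen_stack.back().hide()
    add_child(screen)
    _screen_stack.append(screen)
    screen.show()

func pop_screen():
    if _screen_stack.size() <= 1:
        return  # always keep the main screen
    var top = _screen_stack.pop_back()
    remove_child(top)
    top.queue_free()
    _screen_stack.back().show()
```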
2.4 Channels
Channels can be seen as pipes between the input and output systems that allow or block the communication, and they can be activated and deactivated through output actions. This allows the user to have a set of input combinations to control one thing and, by changing the channel, have the same inputs producing completely different results.

Keyboard and Joystick represent two basic input devices that provide inputs with binary and analog values to the multimodal interface. Another big advantage of using Godot is that, by leveraging the already built-in input event system to implement the joystick mapping, any human-interface device recognized by the operating system should work without any modification.

Speech Recognition is a Windows-only input device because it uses the
Windows speech recognition engine under the hood. The sentences to be recognized can be set in the multimodal interface, and any recognized sentence is sent to the input system as an action press.

Head Motion is a custom-made input device capable of detecting the pitch, roll and yaw accelerations of the user's head. To accomplish this, a printed circuit board was designed to accommodate a gyroscope and a WiFi module mounted on a helmet (Fig. 1). The gyroscope communicates the values to the WiFi module via a serial connection, and the WiFi module connects to the multimodal interface using sockets, taking advantage of Godot's built-in network features (a sketch of such a socket-based device is given after Fig. 1).

Dashboard is a user interface device that displays the connected devices and some system messages. This interface is used as the primary interface of the multimodal interface, has resizable panels, and serves as an example of how to implement input devices.

Node System is a complex device used to configure the mapping of the input actions (Fig. 2). It leverages functionality provided by the input and output systems, such as reading the input devices, creating and deleting actions from both the input and output systems, or hooking up to the input processing. This system provides an incredibly intuitive way of mapping new actions because it can record input sequences and list all the available output devices. It was built upon Godot's graph system, a system that provides visual nodes that can be connected. This way, even the most complex action can be mapped by any user without requiring high technical skill.
Fig. 1. Circuit board on the left and helmet with 3D printed support on the right.
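As an illustration of how such a socket-based device could feed the input system, the sketch below assumes details the paper does not specify: a TCP transport, an arbitrary port, one comma-separated "pitch,roll,yaw" reading per message, and a hypothetical trigger_motion entry point on an InputSystem autoload.

```gdscript
extends Node
# Sketch of a socket-based input device (e.g. the head-motion helmet): listen
# for the WiFi module, parse each reading and forward it as a motion action.

const PORT = 4242  # illustrative port, not the project's real configuration

var _server = TCP_Server.new()
var _peer = null

func _ready():
    _server.listen(PORT)

func _process(_delta):
    if _peer == null and _server.is_connection_available():
        _peer = _server.take_connection()
    if _peer == null:
        return
    if _peer.get_available_bytes() > 0:
        var data = _peer.get_utf8_string(_peer.get_available_bytes())
        for line in data.split("\n", false):
            _handle_reading(line)

func _handle_reading(line):
    var parts = line.strip_edges().split(",")
    if parts.size() != 3:
        return
    var params = {"pitch": float(parts[0]), "roll": float(parts[1]), "yaw": float(parts[2])}
    # Hypothetical call into the input system's motion trigger.
    get_node("/root/InputSystem").trigger_motion("head_motion", "head_moved", params)
```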
3 Simulator
The simulator is capable of using all the top-level functions of the interface and of proving the usefulness of a system like the one developed in a real-life context. It allows evaluating the functionality and usability of the interface. The simulator is a device that combines the capabilities of the input, output and graphical interface systems. The simulated environment is a 3-dimensional indoor environment, where the user controls a first-person character and has several interactive objects. This device takes advantage of much of the Godot Engine functionality, since the simulator is, in its essence, a simple game. The simulator registers its main viewport as a tab of the multimodal interface and registers several output devices for controlling the lights and doors per division or for interacting with the object that is in front of the player (a sketch of this registration is given after Fig. 2). Any of these actions can be associated in action mappings through the node system device, leveraging the channel system and all the devices. This way, the user can navigate around the house with the controls he/she is most comfortable with.

Fig. 2. Node system example.
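A sketch of how the simulator might register such output actions is shown below; register_output and the /root/OutputSystem autoload are assumed names used only for illustration.

```gdscript
extends Node
# Sketch of the simulator exposing house functions as output actions that the
# node system can list and the user can map to any input sequence.

onready var _output_system = get_node("/root/OutputSystem")  # assumed autoload

func _ready():
    _output_system.register_output("simulator", "toggle_light", self, "toggle_light")
    _output_system.register_output("simulator", "open_door", self, "open_door")
    _output_system.register_output("simulator", "interact", self, "interact_in_front")

func toggle_light(params):
    print("Toggling light in ", params.get("room", "living_room"))

func open_door(params):
    print("Opening door ", params.get("door", "front"))

func interact_in_front(_params):
    print("Interacting with the object in front of the player")
```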
4 Experiments and Results
To evaluate the system, a simulation was conducted with 30 different individuals, 53.3% of them male, 40% students and 13.3% teachers, 50% in college/university and 26.7% in high school. They were asked how easy they find technology, with the responses being 43.3% very easy, 50% easy and 6.7% difficult. They were then asked to perform the same set of tasks 4 times (consisting of opening doors and turning lights and televisions on), each time with different input device(s) (the first time with the keyboard, the second time with the joystick, the third time with speech recognition aided by the keyboard or joystick, and a final custom task with any combination of devices they chose) and, in the end, to classify their experience. The evaluation feedback was positive, with some of the collected data visible in the 4 charts (Fig. 3). A very large majority of the users found the multimodal interface and the device they were operating often or always responsive, with the exception of the speech recognition. Using multiple devices at the same time showed at the very least good efficiency, and most users who changed the previously-configured input actions found the action mapping process very easy. When asked whether the several input devices they tested covered a lot of options for interacting with the system, they considered their diversity to be very good to excellent. Table 1 shows the times each experiment took. Two experiments with the head motion sensor are also included, under the column named HMS, proving that it is possible to accomplish the same tasks with this special device, although taking longer. It is important to note that the custom task has the lowest average and minimum times, showing how useful the system can be by adapting to the user's preferences.
Table 1. The average, standard deviation, minimum and maximum durations of the experiments (h:mm:ss).

              Keyboard (KB)  Joystick (JS)  Speech Recognition (SR)  Custom Task (CT)  Head Motion Sensor (HMS)
Mean (x̄)      0:02:37        0:02:29        0:02:36                  0:02:17           0:06:17
Std. dev. (s) 0:00:57        0:01:33        0:00:33                  0:00:57           0:01:04
Minimum       0:01:31        0:01:19        0:01:46                  0:01:16           0:05:32
Maximum       0:04:54        0:07:27        0:03:38                  0:04:36           0:07:02
Fig. 3. Experiments: responses from the test subjects about the tasks they were given and the system's perceived performance.
The 30 user experiments provide solid ground regarding how the system performs and what people feel when performing actions with it. Despite involving people with very different backgrounds, being able to perform the same actions with different input devices meant that at least one comfortable, reliable and easy configuration was available to adapt to each user's preferences.
5 Conclusions and Future Work
Wheelchairs have been a crucial element in today's society, used by disabled people to move around. With technological development, powered wheelchairs appeared, allowing people to move with a simpler interface, like a joystick. This change was a welcome improvement and, while it gave more freedom to people in wheelchairs, they were still limited to simple movement actions. Nevertheless, powered wheelchairs inspired the creation of intelligent wheelchairs, a complete paradigm shift that introduces intelligent behaviors in the system, like wall following, obstacle avoidance or autonomous trajectory calculation and movement.
Still, these systems are more complex, and impaired people need simple interfaces so that they can maneuver all the actions and trigger all the functions available in the system. To conclude, this system can help its users to control existing technology, responding to a diverse and distinct number of input actions, and improving their quality of life. The modular design, with a high level of modality and configuration and the possibility of adding input and output actions quickly, also provides a simple integration point for other systems.

Acknowledgements. This work is supported by project IntellWheels2.0, funded by Portugal2020 (POCI-01-0247-FEDER-039898). This research was partially supported by LIACC (FCT/UID/CEC/0027/2020).
References 1. Churchill, E.F., Bowser, A., Preece, J.: The future of HCI education: a flexible, global, living curriculum. Interactions 23(2), 70–73 (2016) 2. Faria, B., Dias, D., Reis, L.: Multimodal interaction robotic devices in a simulated environment. In: 11th Iberian Conference on Information Systems and Technologies (2016) 3. Faria, B., Dias, D., Reis, L., Moreira, A.: Multimodal interaction and serious game for assistive robotic devices in a simulated environment. In: Cunha, B., Lima, J., Silva, M., Leitao, P. (eds.) 2016 IEEE International Conference on Autonomous Robot Systems and Competitions, ICARSC, pp. 253–258 (2016) 4. Faria, B., Ferreira, L., Reis, L., Lau, N., Petry, M.: Intelligent wheelchair manual control methods a usability study by cerebral palsy patients. In: Correia, L., Reis, L., Cascalho, J. (eds.) EPIA 2013. Lecture Notes in AI, vol. 8154, pp. 271–282 (2013) 5. Faria, B., Reis, L., Lau, N.: Cerebral palsy EEG signals classification: facial expressions and thoughts for driving an intelligent wheelchair. In: Vreeken, J., Ling, C., Zaki, M., Siebes, A., Yu, J., Goethals, B., Webb, G., Wu, X. (eds.) 12TH IEEE International Conference on Data Mining Workshops, ICDM Workshops, pp. 33–40 (2012) 6. Faria, B., Reis, L., Lau, N.: Manual, automatic and shared methods for controlling an intelligent wheelchair adaptation to cerebral palsy users. In: Calado, J., Reis, L., Rocha, R. (eds.) 13th International Conference on Autonomous Robot Systems (2013) 7. Faria, B., Reis, L., Lau, N.: A survey on intelligent wheelchair prototypes and simulators. In: Rocha, A., Correia, A., Tan, F., Stroetmann, K. (eds.) New Perspectives in Information Systems and Technologies, volume 1, Advance in Intelligent Systems and Computing, vol. 275, pp. 545–557 (2014) 8. Faria, B., Reis, L., Lau, N.: User modeling and command language adapted for driving an intelligent wheelchair. In: Lau, N., Moreira, A., Ventura, R., Faria, B. (eds.) 2014 IEEE International Conference on Autonomous Robot Systems and Competitions, pp. 158–163 (2014) 9. Faria, B., Reis, L., Lau, N.: A methodology for creating an adapted command language for driving an intelligent wheelchair. J. Intell. Robot. Syst. 80(3–4), 609– 623 (2015) 10. Faria, B., Reis, L., Lau, N.: Adapted control methods for cerebral palsy users of an intelligent wheelchair. J. Intell. Robot. Syst. 77(2), 299–312 (2015)
11. Faria, B., Reis, L., Lau, N., Moreira, A., Petry, M., Ferreira, L.: Intelligent wheelchair driving: bridging the gap between virtual and real intelligent wheelchairs. In: Pereira, F., Machado, P., Costa, E., Cardoso, A. (eds.) Progress in AI. Lecture Notes in Artificial Intelligence, vol. 9273, pp. 445–456 (2015) 12. Faria, B., Reis, L., Lau, N., Soares, J., Vasconcelos, S.: Patient classification and automatic configuration of an intelligent wheelchair. In: Fred, F.J. (ed.) Communication in Computer and Information Science, vol. 358, pp. 268–282 (2013) 13. Faria, B., Vasconcelos, S., Reis, L., Lau, N.: A methodology for creating intelligent wheelchair users’ profiles. In: Filipe, J., Fred, A. (eds.) ICAART: 4th International Conference on Agents and Artificial Intelligence, volume 1. ICAART (1), vol. 1, pp. 171–179 (2012) 14. Faria, B., Vasconcelos, S., Reis, L., Lau, N.: Evaluation of distinct input methods of an intelligent wheelchair in simulated and real environments: a performance and usability study. Assist. Technol. 25(2), 88–98 (2013) 15. Hasan, M.S., Yu, H.: Innovative developments in HCI and future trends. Int. J. Autom. Computing 14(1), 10–20 (2017). https://doi.org/10.1007/s11633-0161039-6 16. Johnson, J., Card, S.K.: Foreword. In: Designing with the Mind in Mind, pp. ix–xi, January 2014. https://doi.org/10.1016/B978-0-12-407914-4.06001-2 17. Kammer, D., McNutt, G., Senese, B., Bray, J.: Chapter 9 - designing an audio application. In: Bluetooth Application Developer’s Guide, pp. 379 – 417. Syngress, Burlington (2002) 18. Karpov, A.A., Yusupov, R.M.: Multimodal interfaces of human-computer interaction. Herald Russ. Acad. Sci. 88(1), 67–74 (2018) 19. Linietsky, J., Manzur, A.: Free and open source 2D and 3D game engine. https:// godotengine.org/ 20. Park, K., Lee, S.W.: Model-based approach for engineering adaptive user interface requirements. In: Liu, L., Aoyama, M. (eds.) Requirements Engineering in the Big Data Era, pp. 18–32. Springer, Heidelberg (2015) 21. Petry, M., Moreira, A., Faria, B., Reis, L.: IntellWheels: intelligent wheelchair with user-centered design. In: 2013 IEEE 15th International Conference on e-Health Networking, Applications and Services, Healthcom 2013, pp. 414–418. Healthcom (2013) 22. Pineau, J., West, R., Atrash, A., Villemure, J., Routhier, F.: On the feasibility of using a standardized test for evaluating a speech-controlled smart wheelchair. Int. J. Intell. Control Syst. 16(2), 124–131 (2011) 23. Reeves, L., Lai, J., Larson, J.A., Oviatt, S., Balaji, T., Buisine, S., Collings, P., Cohen, P., Kraal, B., Martin, J.C., McTear, M., Stanney, K.M., Su, H., Wang, Q.Y.: Guidelines for multimodal user interface design. Commun. ACM 47(1), 57– 59 (2004) 24. Reis, L., Faria, B., Vasconcelos, S., Lau, N.: Multimodal interface for an intelligent wheelchair. In: Ferrier, J., Gusikhin, O., Madani, K., Sasiadek, J. (eds.) Informatics in Control, Automation and Robotics. Lecture Notes in Electrical Engineering, vol. 325, pp. 1–34 (2015) 25. Vanhooydonck, D., Demeester, E., Huntemann, A., Philips, J., Vanacker, G., Brussel, H.V., Nuttin, M.: Adaptable navigational assistance for intelligent wheelchairs by means of an implicit personalized user model. Robot. Auton. Syst. 58(8), 963– 977 (2010). https://doi.org/10.1016/j.robot.2010.04.002 26. Zhang, P.: Chapter 13 - Human -machine interfaces. In: Advanced Industrial Control Technology, pp. 527–555. William Andrew Publishing, January 2010
Innovation and Robots in Retail - How Far Away Is the Future?

Manuel Au-Yong-Oliveira1(&), Jacinta Garcia2, and Cristina Correia2

1 GOVCOPP, Department of Economics, Management, Industrial Engineering and Tourism, University of Aveiro, Aveiro, Portugal [email protected]
2 Prio Energy, Aveiro, Portugal {jacinta.garcia,cristina.correia}@prioenergy.com
Abstract. We live in an age of constant technological evolution where we witness an increasing need for adaptation, in view of market challenges. These transformations create interactions between human beings and machines. This article is a case study, based on qualitative and quantitative research, which approaches the implementation of a robotized system in a retail convenience shop at a petrol station. The main objective was to understand the applicability of this type of system in a shop, as well as to ascertain its future in the short, medium and long term. The field research involved a personal interview with an executive at the Portuguese firm PRIO Energy – the Director of Research, Development and Innovation. The essence of the interview was to enquire about the robot experiment and to understand how innovation occurs and where the ideas come from. Two other firm employees were also contacted for their testimonials on the project – the convenience store manager and the project innovation manager. Light was shed on the phases the innovation project went through. Finally, the authors had access to the results of a consumer survey involving 210 customers who interacted with the robot station during its test phase. Not all feedback was positive: some consumers are concerned that robots will replace humans in the workplace, leading to unemployment. Even so, the vast majority of the 210 survey respondents saw the experience as positive and one which they would repeat in the future.

Keywords: Robot · Innovation · Open innovation · Steps in an innovation project
1 Introduction

We currently live in a world where technology may be seen everywhere. The impact of technology has dictated how firms adapt to the external environment. Digital technology, in particular, has affected the younger generations. Technology areas in exponential development include robotics, automation and artificial intelligence, among others. The adaptation to these technologies will depend on the receptivity to the concept, and on non-resistance. However, firms may realize that if they do not adopt new technology they will not be able to survive in the marketplace.
Robots, especially when acting as a complement to employee work, make it possible for employees to do more intellectual tasks, thus making better use of their time. More routine and monotonous tasks may be done by robots, minimizing the margin for error. Employees may then focus on improved customer service.

This study aims to better understand technological developments in convenience stores in the retail sector of petrol stations. An interview was performed with the Director of Research, Development and Innovation at the Portuguese firm PRIO Energy, which recently tested a robot station in one of its convenience stores, at a PRIO petrol station (in Gaia, Northern Portugal). Two employees linked to the project also gave their testimonials. Additionally, the results of a customer survey involving users of the robot station are discussed. Not all feedback was positive: some consumers are concerned that robots will replace humans in the workplace, leading to unemployment. Even so, the vast majority of the 210 survey respondents saw the experience as positive and one which they would repeat in the future.
2 Methodology This is a case study which investigates a contemporary phenomenon in its real life context. The case study involved the collection of both qualitative and quantitative data. An interview was performed, for thirty minutes, on 20-10-2019, with Cristina Correia, the Director of the Department of Research, Development and Innovation, in the PRIO group. The interview took place in Aveiro, in the PRIO offices there, and followed an interview script. The questions were on innovation and the aim was to understand how innovation happens in the PRIO group, namely related to the robot station placed in a PRIO retail shop (in a pilot test), in the North of Portugal. The interview was audio recorded and fully transcribed. The first phase of the interview was to understand how the Department of Research, Development and Innovation functioned. The objective was to understand and analyze how innovative products and services appear, as well as to learn about the current focus of the department at this moment in time. In the second phase of the interview, the discussion was on how a retail robot was tested in a convenience store at one of the PRIO petrol filling stations. The objective here was to understand what led to the innovative experiment and what steps are involved in the innovation process. Finally, the interview focused on what the future holds for the Department of Research, Development and Innovation in the PRIO group. A literature review was also undertaken to support the study. The focus was on the evolution of technology, on the digital transformation in the retail sector, and on how technological innovation may serve as a differentiator, including with robots which interact with employees and customers. As two of the authors are employees of the firm, access was granted to company documents in order to enrich the study.
Two additional testimonials were gathered, from people involved in the project – the petrol station manager, who was responsible for the robot in the convenience store, and the project manager of the innovation-seeking pioneering robot-in-retail project. The authors also had access to a survey administered to 210 users of the robot station during the robot station testing phase.
3 A Look at the Literature

Industry has been implementing profound change and developments since the First Industrial Revolution (in the late 18th century). Britain was at the forefront of technological change with the introduction of machine manufacturing (moving away from the handicraft economy) and new production techniques, materials and energy sources (the use of iron and steel; the use of coal, the steam engine, the internal combustion engine, the use of electricity and petroleum, etc.). Productivity increases and intensive production soon followed [1], in an era of mass production and distribution, with the factory system and the division of labour. Production costs and prices subsequently came down significantly [1], while developments in transportation and communication (the steamship, the automobile, the airplane, the telegraph and the radio) also took place [2]. Since then, other industrial revolutions have followed, including the Fourth Industrial Revolution, or what is known today as Industry 4.0: the digitalization of processes and the integration of digital ecosystems. Digital firms have become successes, in an age of digital interfaces and innovative services based on data and information [3].

The Amazon case [4] is that of a pioneer in futuristic convenience stores called "Amazon Go", with self-checkout and the absence of a cashier. After downloading the "Amazon Go" app, customers have only to scan codes in the physical store space. Intelligent computational equipment is used, which identifies the consumer. Software advances support cameras and weight sensors which accompany customers in the store. A check-out at the end ensures that the customer receives an invoice for his/her purchases. Amazon believes that the future of commerce will apply technology in this fashion, as described above. The basis is the reduction of the time used in the purchasing experience. Firm employees will focus on the preparing of fresh food and produce, emphasizing improved customer service. In Portugal, in 2019, the Jerónimo Martins [supermarket] group created a store in the Nova School of Business and Economics, in Lisbon, where there are also no cashiers, queues or traditional money used. It is a "Lab Store" directed at the younger university population to test new technologies and analyse new possible avenues for the service.

According to [5], Amazon has 45,000 robots in its warehouses to fulfill orders. As regards robots in retail, "the question that traditional retailers face is whether or not investing in robots to operate inside their stores can actually improve business and help them stay relevant with consumers" [5]. Some retailers – such as home improvement retailer Lowe's – have already done the above and have introduced the LoweBot – to
help customers find items and solve other basic customer needs that would have normally been tended to by a human worker [5].

A case study in the pharmaceutical sector describing an automatic system in Portugal can be found in [6]. "The advantage of the system is that it allows for a more personalized service, as the employee does not need to be absent at any time during the delivery of the service" [6, p. 1]. The system is referred to as a robot which fetches items from the warehouse and delivers them to the front desk automatically, where the employee can give his/her undivided attention and advice to a customer. Automation in pharmacies helps front-office employees to avoid repetitive manual tasks which would otherwise cause wasted time [7]. Relevant innovation is thus occurring in retail pharmacies in Portugal, an intermediate technology development country.

Automation is also occurring in other industries, such as the automotive and transport industry. Namely, [8] call attention to the ethics problems related to self-driving cars. Where are we heading on issues such as whom a vehicle should be programmed to save in case of an accident? [9] and [10] study the issue of automation and unemployment. Indeed, "a big concern in seeking to understand the evolution of the future of jobs and the demand for skills with the increase of automation is shown by the wide range of existing literature regarding the subject" [9, p. 206]. As we shall see herein, there does exist a certain apprehension in society as regards robots in particular, and they are seen as a threat to human beings as they will take away their jobs. What remains to be seen is whether new, more interesting and more challenging jobs will be created in the process, as some optimists state.

Innovation, at the firm level, comes in various forms, including product innovation (of goods and services – the most prominent), process innovation (changing production and delivery methods), organizational innovation (new organizational methods), and marketing innovation (new approaches to marketing involving, for example, pricing methods, product design and packaging, and product promotion) [11]. [12] first spoke of open versus closed innovation in 2003. Open innovation (adopted by our case study firm) is a form of innovation favoured by many companies today. It involves recognizing that a firm cannot employ all of the best people in the field, and thus a lot of talent resides outside the firm that needs to be tapped. [13, p. 75] state that open innovation is "more influential on the performance of employees than closed innovation." Open innovation is more economical and thus makes innovation more accessible to smaller firms with fewer resources, including startups [12]. Innovation practices are also seen to be culture-specific, and thus firms would do well to consider what may work in their particular environment [14].
4 Case Study: The First Retail Robot in a PRIO Convenience Store

PRIO Energy, S.A. (which has a network of 250 petrol filling stations in Portugal) has, since its foundation (the PRIO group was founded in 2006), always had a culture involved in the development of innovation, demonstrating a constant need to apply that
innovation in its business model. Innovation is indeed one of its pillars and one of the values which the firm communicates. PRIO currently focuses on distributing and selling liquid fuels and producing biofuels. The slogan "Top Low Cost" which defined the company was based on superior customer service with a quality product at an accessible price. The company has since changed its positioning several times, and the trusting relationship with its customers has been key to it wanting to move away from being seen as a low-cost firm. Currently, the firm wants to be recognized as a firm which provides a quality service at a fair price. The corporate culture is based on applying a strict management system aimed at optimizing cost levels and resource usage all across the different business units, with the investment in innovation increasing ever more, the objective being to be seen in the market as an innovative company.

To differentiate itself from the competition, PRIO must find ways to improve its strategies. Thus, two and a half years ago, the firm felt the need to create a dedicated team focused on research, development and innovation. Currently, the group has two employees under the responsibility of the Director of Research, Development and Innovation. Despite its short history in this area, in 2018 the group invested approximately one million euros in projects and different events in which it was involved.

The creation of innovative products and services at PRIO is done in various ways. New products and services arise to satisfy needs in the various business areas, as well as through opportunities and novelties that appear linked to business partners and events. At times, solutions are found and proactively presented to the business units. In an initial phase, to develop products and services, an open innovation model is adopted, by working with external partners, namely universities, institutes, startups, suppliers and customers. Some projects are developed by a restricted group of people, and the individuals are bound by a commitment to secrecy. According to the Director of Research, Development and Innovation, the projects may be seen from an incremental perspective or from a more radical perspective. The incremental perspective involves adding a service to the portfolio (e.g. the APP PRIO GO project – payment of petrol via a smartphone app). A radical project example would be the testing of a robot in a convenience store at a PRIO petrol station (a PRIO station in Gaia-Porto) (Figs. 1 and 2).

The idea of having a robot in a PRIO retail shop appeared via an external innovation program for startups, named "Jump Start", in the 2018 edition. An important pillar of the program is the customer experience. "Jump Start" is a program which openly shares with the outside world, via a website, various promotional events, challenges or areas where innovation is intended and which require solutions, new services and new products. The idea is always that "if you have a firm or solution which makes sense to the value chain, fill in an application and together we will build something". Startups are thus involved from the outset and are invited to submit projects with their product/service. As a prize for the best idea or service improvement concept, PRIO offers the possibility to put a pilot project into practice as a partner.
In the 2nd edition of the "Jump Start" program there was a submission by the Spanish firm Macco Robotics, which presented the retail robot project; this was then co-created with PRIO (PRIO elaborated the business case). PRIO at the time had no idea
that it was going to be the first company in the world in the petrol industry to have a retail robot in one of its convenience stores. Robots in retail are still really taking their first steps, whilst in industry they have existed for a number of years. According to information given by the Director of Research, Development and Innovation, the robot station was placed in the Gaia PRIO petrol station on 07-10-2019 and remained there until 15-10-2019 (test phase). The robot station was programmed to have a feminine voice, speaking in Brazilian Portuguese. It was able to move its eyes and communicate. The robot actually only had a body, with no legs, and thus was fixed (it is a table-top robot). It could move its arms in an area of around one metre in breadth, serving coffee and tobacco, among other products.
Fig. 1. News item – PRIO testing its first robot station in Gaia
Fig. 2. The robot station implemented by PRIO
In order to understand the opinions and receptivity of the users, a survey was developed with the following questions: "How would you classify your purchasing experience with the robot station?"; "Do you see value in the service given and in the purchasing experience with the robot station?"; "Would you repeat the experience again?". Thus, the objectives of the analysis were to evaluate the robot station's service in the purchasing process and whether it indeed added value. A total of 210 users were involved in the test, of whom 71.4% gave very positive feedback. Generally speaking, the users were receptive to the experience, having enjoyed it very much. Furthermore, 90% of the respondents recognized value in the service and purchasing experience. Additionally, 93.3% of the respondents stated that they would repeat the experience.

As regards the petrol station employees, they saw the project as interesting, having enjoyed the employee-robot interaction. Furthermore, the employees did not see the robot station as a threat to their jobs (a current theme, involving the replacement of human participation in the workplace).

There is some uncertainty regarding the future. There are, however, two paths which may be followed: having a complementary service in the convenience stores using robot technology; or designing a robot station to promote the brand, also promoting products in partnership with certain brands – e.g. Super Bock beer and Delta coffee, which were major players in this pilot project. These brands, which supported the test, intend to be at the forefront of this technology and to understand the receptivity of consumers. In the future, PRIO's innovation objective does not involve placing robot stations in all of its convenience stores. There is, however, the possibility of robotizing a convenience store completely (100%) in order to reap retail benefits where there is currently no convenience store. However, these technologies may be seen as a disadvantage from a security viewpoint and as concerns protection against vandalism. The concept applied to a convenience store intends to be a perfecting of the "Amazon Go" concept. The robot station may also be a complement to the service provided in larger petrol stations, which have a lot of customers going in and coming out. As such, the objective will be to have restricted customer areas where customers may have a drink or a snack while escaping long queues. Employees in this model will be freed from tasks so as to perform other, less routine tasks, providing a better customer experience. The Director of Research, Development and Innovation affirms that PRIO is not facing a change in its organizational culture, as the project was limited in time and the medium- to long-term effects are still not known. However, the Director is convinced that the technological developments integrated as a strategy will not threaten its employees and that this application will be a complement to the work being done, giving PRIO employees the chance to do more intellectual tasks.
5 Additional Field Work

5.1 Testimonial by Nuno Gonçalves (Petrol Station Manager) – on Robots and Customers
Nuno Gonçalves, the PRIO petrol station manager in Gaia, reinforced that PRIO is the first petrol company in Portugal to have a robot in its convenience stores serving customers, which is a reason to be proud. PRIO is a pioneer in Innovation and Development, and this step thus made a lot of sense. Nuno Gonçalves accompanied the experiment every day, and it was very enriching to see customers interact with the robot. Indeed, it made the station more dynamic, and one has only to see the videos made of the interactions to realize how positively the robot was received. The unique relationships developed for this automatic system were customer-robot and operator-robot. Customers stated that they would repeat the experience in the future, which they enjoyed. In actual fact, the robot did the same tasks as a vending machine (serving coffee and beer), albeit with a greater impact, due to the visual interaction and the robot using its arms, much like a human being. Due to this resemblance to a human being, some customers remarked that the robot was replacing people in the workplace (seen as negative and as causing unemployment). In the future, robots may be used in premium petrol stations, as they are an innovative attraction for customers. On the other hand, retail robots such as the one used are still very expensive to implement. The investment will have to be studied to see if it is worth it, in terms of benefits reaped.
5.2 Testimonial by Ana Sofia Machado (Project Manager)
Technical Aspects of the Robot Project

The retail robot project was born in the 2nd edition of the "Jump Start" program, in 2018. Macco Robotics was one of the startups in the competition and, after a discussion of the strategic alignment and value proposition of the project and of how it contributes to the future vision of PRIO, the startup became one of the winners. Between July 2018 and January 2019, PRIO and Macco designed the project concept together, as well as the development phases. After some months of discussion they reached a consensus which foresaw the following steps:

1. Kick-off – Development – Specification of the functionality and design; software development; UI – user interfaces; integrations; laboratory tests.
2. Functional prototype – Functional test in a petrol station with a convenience store – Functional test in a controlled environment; first interactions with customers; interface adjustments.
3. Autonomous prototype – Automatic testing in a petrol station – Test in a real environment; functioning with no direct human control; autonomous interaction with the client.
4. PRIO robot – end result.
From January 2019 onwards the development work started. In February, the Macco team came to Portugal with one of its robots for a demo/official kick-off meeting with all of the internal stakeholders. In May, Macco started to work on the interactions with products and with the coffee machine made available by Delta. For the first test phase (functional test in a petrol station with a convenience store) a selection of four types of products was made: hot drinks (available in the Delta machine), water, canned fruit juice and orange juice. During the development phase the list of products changed. In July, when the PRIO team went to visit the Macco offices for an intermediate demonstration, the robot was already being trained to serve hot drinks, canned drinks, water in a cup and beer. This visit was valuable insofar as the progress of the project and of the robot was verified and expectations were aligned for the functional testing in a petrol station. The robot was already able to perform the tasks successfully; however, adjustments were necessary to its movements, as well as to the design of the robot station and its software, among others.

The inclusion of beer had not been initially planned, but Macco decided to prepare this for the intermediate demonstration so as to showcase the potential of the robot to serve this kind of product. Serving beer involves more complex movements and requires finer motor skills, and thus is a visually and technically more appealing capability than simply delivering a coffee or a canned drink. That was when it was considered that approaching the Super Bock Group (a major beer manufacturer) made sense, for them to become a partner in the project. The robot was also seen as a means of communicating supplier brand products, for publicity purposes. Just as firms pay for their brands to appear in strategic places in supermarkets and hypermarkets, the robot could also serve as a communication platform associated with technology and innovation. In this perspective, Delta Coffee and Super Bock beer were brands associated with the pilot experiment, supplying products and equipment for the demo and showing interest as stakeholders. Note that both Delta and the Super Bock Group are recognized for their innovativeness, and so they saw in the project a new window of opportunity and a way to break into the robotics world, in retail and in serving the client.

In this process, PRIO was mainly concerned with decoration tasks, with logistics and transport, and with preparing the station to receive the robot. The pilot study was not immaculate and some errors occurred. Namely, the coffee dispensing machine failed, on more than one occasion, to provide a cup automatically and in the exact fixed place required by the robot. The robot could only serve if the exact positions were respected, just as occurred in the training sessions. More fine-tuning by Macco is thus under way so that the final robot may be in place in the first semester of 2020. The project manager has no doubt that in the future we shall see more mixed interactions involving people and robots which will inhabit the same space. It must be emphasized, however, that though the customers were very enthusiastic, they were also wary that robots may take people's jobs in the future, thus causing unemployment. The status quo in society will be affected, and people naturally fear the unknown. This having been said, customers were much more receptive to the concept than was initially imagined.
A customer survey was conducted and the results were surprisingly positive.
The Robot – Project Impact
The robot station took up considerable space and thus caused some discomfort, along with the "intrusion" of the Macco engineering team implementing the robot. During the set-up process, the presence of wires, computers and tools interfered with the convenience store's normal functioning and upset customers (according to the feedback they gave). Work at night was preferred, but some work had to happen during the day. Customers also showed some concern, as mentioned above, about jobs and the future of humankind, despite the positive impact of the robot on them. PRIO store employees reassured customers that robots such as these would share tasks with human beings (collaborating) rather than replace them in the workplace and in customer service. Children were delighted with the project, and a number were brought to the store on purpose to see the robot functioning. According to customers, the robot's most popular task was serving beer. This may also have been because the beer was sold at a low price. Even so, the robot serving beer was an impressive, elegant and complex feat, even during the demo. PRIO business partners were invited to the store to see the robot and their feedback was also very good; the serving of beer was a highlight.
In marketing terms, the project generated some buzz. A Spanish channel came to the PRIO petrol station to do an interview, and the posts on social media, all unsponsored, had a large number of views and much interaction. As this project was a pioneer in Portugal, PRIO was generating and feeding a discussion on robotics in the society of the future. Thus, despite some negative reactions to the programme on social media, the fact that PRIO, a company linked to the energy sector and where retail is a secondary activity, was leading a discussion on innovation and future tendencies was a very positive factor.
6 Conclusion
From this study we may conclude that the Portuguese firm PRIO took some important initial steps in technology applied to convenience stores. The technology involved is considered to be of intermediate development, conceived with a concern for adapting and improving customer service. The retail robot station is still, however, very far from what is already possible in the industrial sector. On the other hand, the literature lacks work in this area, and this study aims to fill that gap, in whatever small way, and to show how important technological innovation is to the survival of a firm in the marketplace. Most of the automation in the Portuguese retail sector is very similar to the Amazon concept, and complete robotization of convenience stores is almost nonexistent. PRIO Energy is known for being innovative in the sector where it operates, and it was the first firm to install a robot station in one of its convenience stores. Nevertheless, in order to improve on the test performed, PRIO should complement its customer service by making more products and services available, improving much along the same lines as Amazon.
After analysing the results of the customer survey, which PRIO very kindly shared with us, we find the results encouraging as concerns the future of this technology. In a future survey, we suggest that the age group of the respondents also be gathered, even though the current tendency is for consumer markets to be treated as being quite uniform – even across different age groups and different generations. While robots are already a certainty in industrial and logistics environments, in the retail sector they are taking their first steps. The future is still unknown, but there are certainly various applications of the system, helping to create value both for firms and for customers.
Usability Evaluation of Personalized Digital Memory Book for Alzheimer's Patient (my-MOBAL)
Anis Hasliza Abu Hashim-de Vries1, Marina Ismail1, Azlinah Mohamed1, and Ponnusamy Subramaniam2
1 Faculty of Computer Science and Mathematics, Universiti Teknologi MARA Malaysia, 40450 Shah Alam, Malaysia [email protected]
2 Faculty of Health Sciences, Universiti Kebangsaan Malaysia, Kuala Lumpur, Malaysia
Abstract. The rising population of older adults has contributed to the increasing number of elderly people suffering from Alzheimer's disease. Many exciting new applications have been developed to assist people with Alzheimer's and to support a better quality of life. The design of software products for older people should be carefully carried out, as it could reduce the gap between computer technology and older people. Efficient and effective user design needs to be considered when developing an application for the elderly with memory and cognitive impairments. This paper presents a report on the usability evaluation of a personalized digital memory book application (my-MOBAL) for people with Alzheimer's disease. It was developed for a 74-year-old woman who is diagnosed with mild AD. The application was evaluated on its usability and functionality by seven (7) experts in gerontology and HCI with more than ten (10) years of experience in their fields. The results show that the application is user friendly, simple and easy to navigate. It was also suggested that the application would be a suitable tool to assist people with AD in their non-pharmacological therapy sessions. Keywords: Alzheimer's Disease (AD) · Assistive Technology (AT) · Dementia · Information and Communications Technology (ICT) · Usability
1 Introduction
Alzheimer's disease (AD) is the most common cause of dementia. It is a type of brain disease that deteriorates gradually. A report from the Alzheimer's Disease Foundation Malaysia (2016) [1] mentioned that there are currently approximately 50,000 people in Malaysia suffering from AD. In 2018, Alzheimer's became the sixth (6th) leading cause of death in the United States, and an estimated 5.8 million Americans of all ages were living with Alzheimer's in 2019 [2, 3]. AD can bring problems with memory, behavior and thinking ability. Early clinical symptoms of Alzheimer's are usually minor memory problems, including difficulty remembering recent conversations, names and events [3]. Some other signs of Alzheimer's
are confusion and disorientation, problems with speech and language, as well as difficulty making decisions and planning [4]. The progress of dementia symptoms differs from person to person, and the effects are obvious when patients are in the severe stage of Alzheimer's [3]. As the symptoms become more severe, social communication with AD patients can become more difficult. Nevertheless, technology could help to improve the lives of individuals affected by AD [5]. Although AD has no cure, there are some treatments available to slow down the progress of the disease. Drugs, psychosocial and lifestyle interventions are some of the options available to reduce cognitive and behavioral symptoms [6]. Non-pharmacological (NP) intervention is another way to assist people with AD in their treatment. It has been shown to be effective without unwanted side effects [7], as well as flexible and cost effective [6]. There are many different types of NP intervention, such as Reminiscence Therapy, Music Therapy, Cognitive Stimulation Therapy and Art Therapy [8, 9]. The mobile application developed for this study integrates Reminiscence Therapy (RT) and Cognitive Stimulation Therapy (CST) into one application. RT refers to recalling past memories of one's life events, whilst CST is an evidence-based intervention that provides various exciting activities that can stimulate thinking, concentration and memory.
2 Assistive Technology (AT) Intervention
The use of Information and Communications Technology (ICT) in health care has benefited many patients in recent years. It has attracted many researchers' attention to developing applications and systems that could assist people with health difficulties. ICT could help health experts and caretakers to give better support to people with AD, who have cognitive, functional and behavioral issues [10]. Interventions could help people with AD to remain involved in activities and to boost self-awareness [11]. Assistive Technology (AT) is an item or piece of equipment that facilitates the inclusion of individuals with disabilities in society [12] and helps them perform activities that they are unable to carry out [13]. Various AT are available to help the elderly meet their daily requirements. AT could also be used to facilitate more effective conversation between people with AD and their family members, and it can be integrated with non-pharmacological therapies such as Reminiscence Therapy (RT). The memory wallet [14], the Digital Memory Album (DMA) [15] and iWander [16] are some of the ATs used to support people with AD in improving their quality of life and wellbeing. Augmented Reality (AR) and digital gaming technologies are also being used to promote communication in people with AD [17, 18]. A multimedia reminiscence system called CIRCA, with a touch screen interface, was used to assist communication between people with dementia and caretakers, and the outcome showed a positive impact on people with dementia [19]. The continuous advancement of ICT, particularly in mobile technology, has enlarged the scope of IT-based assistive technologies to support a better quality of life for people with visual impairments [12]. It is also reported that touch screen technology has its
potential in psychosocial therapy to help people with dementia to improve their cognitive function [20].
2.1 Usability
Mobile devices and their applications give significant benefits to their users in terms of portability, location and accessibility [21]. The interaction between users and products such as websites and applications is referred to as usability [22]. It is an important element that affects the success of software applications [23]. The usability of mobile devices and their applications differs from that of other computer systems, as they have different features [21]. Holzinger [24] defined usability as "the ease of use and acceptability of a system for a particular group of users, performing specific tasks in a specific condition". Although there are many usability evaluation methods available, software developers should quickly be able to choose the method best suited to each situation in a software project [24]. Preece et al. (2002, as cited in [25]) described evaluation as the preparation and collection of data relating to a specific user's perception of a product in a particular situation. Alshehri and Freeman [25] asserted that the objectives of a usability test are to obtain a third-party evaluation of user characteristics, as well as to measure the effectiveness and efficiency of users being able to view content or perform tasks on a specified device. Usability in touchscreen-based mobile devices is necessary and should be considered when initiating a new product [26]. As mentioned by Inostroza et al. [26], a new usability evaluation approach is needed particularly for touchscreen-based mobile devices. Further usability evaluation studies for tailored Alzheimer's mobile apps are needed, as the existing studies are still in the early stages and need more improvement as well as validation [27]. This paper presents the usability evaluation of a personalized digital mobile application called my-MOBAL that was developed for a patient diagnosed with mild AD.
3 Methodology
The study was a single-participant exploratory case study, an approach that has also been used in many other studies [15, 28–30]. Janosky [31] recommended that a single-subject design can be used when investigating the appropriateness of a result for a specific patient. Data collection from the expert review was carried out to evaluate the usability of the application.
3.1 Personalized Digital Memory Book
The personalized digital memory book for Alzheimer's patient application (my-MOBAL) was designed and developed for an AD patient (see Fig. 1). my-MOBAL is a mobile application that combines two types of non-pharmacological therapy into one system: reminiscence therapy (RT) and cognitive stimulation therapy (CST).
The application consisted of three main activities: a daily task reminder, a reminiscence activity and a cognitive training activity (see Fig. 2). The daily task reminder contains the patient's daily activities, including meal times and favourite television programmes; an alarm goes off when one of the events needs to be performed. The caretakers provided all of the activity information during the design of the application. The reminiscence activity contains pictures of family members and events. The cognitive training activity consists of interactive games: a jigsaw puzzle and flash card games.
Fig. 1. The introduction screen of the application that displays the patient’s name
Fig. 2. The splash screen that displays the three options in the application
Fig. 3. my-MOBAL installed on a 10-inch tablet
The application was installed on a 10-inch mobile device with touchscreen accessibility and ran on the Android platform (see Fig. 3). my-MOBAL was used by the patient for eight weeks, twice per week. Since my-MOBAL was installed on a mobile device, it could be accessed at almost any time and in any place. According to the Clinical Practice Guidelines from the Ministry of Health Malaysia [32], patients who receive reminiscence therapy for eight weeks have the opportunity to improve their cognitive function.
3.2 Expert Review
The application was evaluated by a group of experts in two areas: gerontology and psychology (n = 4), and human-computer interaction (n = 3). Three of the experts were male and four were female. They were all based in Malaysia and were senior lecturers with PhDs at their universities, all with more than 10 years of experience in their respective fields. Two sets of questionnaires were created. The gerontology and psychology experts were interviewed on ease of use, interface, concept and approach, and were provided with a questionnaire of 25 questions using a Likert scale from 1 to 4 (strongly disagree [1] to strongly agree [4]). Meanwhile, the HCI experts were given 12 questions on ease of use, interface and navigation, also rated on a Likert scale from 1 to 4 (strongly disagree [1] to strongly agree [4]). During the evaluation, the evaluators went through the interface elements a few times, inspecting the various interactive elements. At the end of the questionnaire, the evaluators were asked to provide comments or recommendations on how to improve the usability of the application.
3.3 The Participant
The application was designed individually for an AD patient, Mrs. S. The patient is a 74-year-old Malay lady who is a widow and is diagnosed with AD with Mild Cognitive Impairment (MCI). Mrs. S was assessed by a trained psychologist using the Mini-Mental State Examination (MMSE) [33] and the Clinical Dementia Rating (CDR) [34]; she scored 19/30 on the MMSE and 1.0 on the CDR. She completed her primary school education and later worked in a factory in her hometown. She lives with her daughter, who is also her caretaker. Mrs. S has four children, who sometimes take turns to look after her.
3.4 The Design
The development of the my-MOBAL application began with the design process. A few diagrams, such as a flowchart, a navigational map and a structure diagram, were sketched to establish the flow of the system and to determine the scope of the application. Figure 4 shows the structure diagram of my-MOBAL. During the design process, the contents for the application were gathered. Pictures of family members and activities were collected and then filtered to keep the photos associated with happy memories. Mizen [35] recommended that pictures in which the patient and the other people in the photo are smiling should be included in order to evoke feelings of happiness. Choosing appropriate content is necessary, as not all users have the same experience of certain events. Contents need to be selected properly so that they do not evoke sad or bad feelings in the patient [30].
Fig. 4. Structure diagram for my-MOBAL application (three branches: Daily Task Reminder: meal time, medicine consumption, TV programme; Reminiscence: children, family members; Cognitive Training: flash card games, jigsaw puzzle games)
Storyboards providing a visual representation of the design were created so that the location of every item in the application could be visualized. Figure 5 shows a set of the storyboards for the my-MOBAL application that were sketched before developing the actual system. The interface of the application was designed to be simple and easy to understand, so that all items are visible to the user [20].
Figure 6 displays screenshots of the my-MOBAL application. The design aimed to improve cognitive function and enhance reminiscence for people with AD. Usability factors such as intuitive design, ease of learning, memorability and user satisfaction were applied to the design [22]. Consistency is important when designing the interface of a software product. Taking that into consideration, icons and buttons were placed in the same location on every screen so that the user would become familiar with the flow of the application [20]. The interface was kept clear and consistent to avoid confusing the user. The use of text and language was simplified in order to reduce complexity, whilst a large and clear font was used so that the user could easily understand the content. The information area was positioned in the centre of the screen so that the user could focus on the messages [20, 30, 36].
Fig. 5. Storyboard design for my-MOBAL
Colours were selected carefully. Dark backgrounds with lighter foreground colours (text and other elements) were used for this application to enhance readability. Buttons with vibrant colours and large touch areas were used so that the user could touch them easily; bigger buttons and touch areas give better performance, especially for the elderly. The use of a touch screen could help people with Alzheimer's to take better control of the application. my-MOBAL was designed to be user friendly, so that the user has full control when operating the application. Messages in the application were kept simple and polite to support the patient with AD.
my-MOBAL features two interactive games, a jigsaw puzzle and a flash card game, which are used to stimulate cognitive function in people with AD. Yamagata et al. [5] stated that using brain, memory and problem-solving games could help to stimulate the brain and reduce AD symptoms. Touch screen technology gives users flexibility, and with a touch screen the games become more interesting and fun. The games in my-MOBAL come with a few levels. Levels were not used to challenge the user, but to give variety in order to make the games more fun and entertaining. The user can proceed to the next level after successfully finishing the current game.
Fig. 6. Design of my-MOBAL application
4 Results and Discussion
Figure 7 displays the results from the gerontology and psychology experts. The average expert score for ease of use was 3.8, for interface 3.4, for concept 3.3 and for approach 3.1. From the feedback, the experts agreed that the my-MOBAL application is easy to use and that its interface is suitable for an AD patient. They also agreed that my-MOBAL could assist AD patients in their non-pharmacological therapy sessions and considered that it would be a suitable tool to facilitate the treatment of people with AD.
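As an illustration of how such per-dimension averages can be obtained from the 1–4 Likert responses, the short Python sketch below aggregates hypothetical expert answers; the scores are invented for demonstration and are not the study's raw data.

# Hypothetical 1-4 Likert scores; each key is an evaluation dimension and each
# list holds one score per expert (that expert's average over the dimension's questions).
responses = {
    "ease of use": [4, 4, 4, 3],
    "interface":   [3, 4, 3, 4],
    "concept":     [3, 3, 4, 3],
    "approach":    [3, 3, 3, 3],
}

def mean(values):
    # Arithmetic mean of a list of numeric scores.
    return sum(values) / len(values)

# Average score per dimension, rounded to one decimal place as reported in the paper.
for dimension, scores in responses.items():
    print(f"{dimension}: {mean(scores):.1f}")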
Fig. 7. Average findings for gerontology and psychology experts
The feedback from the HCI experts is presented in Fig. 8. The HCI experts came to the conclusion that my-MOBAL is an easy-to-use application and that its interface is suitable for the target audience (the elderly). The average score for ease of use was 3.4, for interface 3.4, and for navigation 2.9. The navigation score was quite low because one of the three respondents felt that the application contains a few subfolders that might be confusing for the user to navigate. This respondent commented that a design for the elderly with impairments should be simple and clear: subfolders should be avoided and all pages should be easy to access without the need to go through many pages. The respondents from the HCI group suggested that the daily task reminder could be improved by using an infographic format to make it more attractive and easier to understand.
Fig. 8. Average findings for HCI experts
Respondents from both groups agreed that the interface design of my-MOBAL is suitable for older adults. However, they were concerned about older adults who have
issues with visual impairment, since the size of the tablet is only 10 inches. Therefore, it is necessary to keep the application simple and not too crowded with information; icons or pictures can be used in place of text where applicable. They also agreed that the use of a touch screen has the advantage of easy navigation and is appropriate for the application.
5 Conclusion
Computer technology has the potential to support people with AD in having a better quality of life and social interaction. Designing software products for the elderly should be carried out properly, as it could determine the success or failure of the application. Good design could decrease the gap between older adults and computer technology and, therefore, increase their confidence in using technology from which they could benefit. This paper contributes to the growing understanding of the usability evaluation of personalized mobile applications developed for people with AD. Only a few studies have investigated AT usability for people with AD; therefore, more research should be conducted in this domain [27, 37]. This research presented the design of the my-MOBAL application and the usability evaluation that was carried out by experts in the gerontology & psychology and Human-Computer Interaction (HCI) fields. my-MOBAL is a personalized digital memory book for Alzheimer's patients that integrates reminiscence therapy (RT) and cognitive stimulation therapy (CST) into one application. Feedback from the experts showed that the application is convenient for older adults to use. Although the results were based on a single-subject case study, they could serve as a platform for future design research implementing RT and CST in non-pharmacological treatment for people with AD.
References
1. Alzheimer's Disease Foundation (Malaysia) (2016) About Alzheimer's. http://www.adfm.org.my. Accessed 29 Oct 2019
2. Alzheimer's Disease Facts and Figures. https://www.alz.org/media/documents/facts-andfigures-2018-r.pdf. Accessed 11 Nov 2019
3. Alzheimer's Association: Alzheimer's disease facts and figures. Alzheimers Dement. 15(3), 321–87 (2019)
4. NHS Signs and Symptoms of Alzheimer's disease. https://www.nhs.uk/conditions/alzheimers-disease/. Accessed 29 Oct 2019
5. Yamagata, C., Kowtko, M., Coppola, J.F., Joyce, S.: Mobile app development and usability research to help dementia and Alzheimer patients. In: Systems, Applications and Technology Conference (LISAT), 2013, pp. 1–6. IEEE Long Island (2013)
6. Sarne-Fleischmann, V., Tractinsky, N., Dwolatzky, T.: Computerized personal intervention of reminiscence therapy for Alzheimer's patients. Information Systems: Professional Development Consortium (2010)
7. Imtiaz, D., Khan, A., Seelye, A.M.: A mobile multimedia reminiscence therapy application to reduce behavioral and psychological symptoms in persons with Alzheimer's. J. Healthc. Eng. 2018, 1536316 (2018)
8. Douglas, S., James, I., Ballard, C.: Non-pharmacological interventions in dementia. Adv. Psychiatr. Treat. 10(3), 171–177 (2004)
9. Cammisuli, D.M., Danti, S., Bosinelli, F., Cipriani, G.: Non-pharmacological interventions for people with Alzheimer's disease: a critical review of the scientific literature from the last ten years. Eur. Geriatr. Med. 7(1), 57–64 (2016)
10. Tsolaki, M., Zygouris, S., Lazarou, I., Kompatsiaris, I., Chatzileontiadis, L., Votis, C.: Our experience with informative and communication technologies. Hellenic J. Nucl. Med. 18(3), 131–139 (2015)
11. Bosco, A., Lancioni, G.: Assistive technologies promoting the experience of self for people with Alzheimer's disease. Riv. internazionale di Filosofia e Psicologia 6(2), 406–416 (2015). https://doi.org/10.4453/rifp.2015.0039
12. Hakobyan, L., Lumsden, J., O'Sullivan, D., Bartlett, H.: Mobile assistive technologies for the visually impaired. Surv. Ophthalmol. 58(6), 513–528 (2013)
13. Kenigsberg, P.A., Aquino, J.P., Bérard, A., Brémond, F., Charras, K., Dening, T., et al.: Assistive technologies to address capabilities of people with dementia: from research to practice. Dementia 1, 1471301217714093 (2017)
14. Bourgeois, M.S.: Enhancing conversation skills in patients with Alzheimer's disease using a prosthetic memory aid. J. Appl. Behav. Anal. 23(1), 29–42 (1990)
15. Kwai, C.K., Subramaniam, P., Razali, R., Ghazali, S.E.: The usefulness of digital memory album for a person with mild dementia. Int. J. Adv. Nurs. Educ. Res. 4, 1–12 (2019). https://doi.org/10.21742/ijaner.2019.4.1.01
16. Sposaro, F.A., Danielson, J., Tyson, G.: iWander: an Android application for dementia patients. In: 2010 Annual International Conference of the IEEE Engineering in Medicine and Biology, pp. 3875–3878 (2010)
17. Al-Khafaji, N.J., Al Shaher, M.A., Al-Khafaji, M.J., Ahmed Asmail, M.A.: Use build AR to help the Alzheimer's disease patients. In: The International Conference on E-Technologies and Business on the Web (EBW2013), pp. 280–284 (2013)
18. Cutler, C., Hicks, B., Innes, A.: Does digital gaming enable healthy aging for community dwelling people with dementia? Games and Culture (30 August 2015). https://doi.org/10.1177/1555412015600580
19. Gowans, G., Campbell, J., Astell, A., Ellis, M., Norman, A., Dye, R.: Designing CIRCA (computer interactive reminiscence and conversation aid). A multimedia conversation aid for reminiscence intervention in dementia care environments. University of Dundee - School of Design, Dundee (2009)
20. Pang, G.K., Kwong, E.: Considerations and design on apps for elderly with mild-to-moderate dementia. In: 2015 International Conference on Information Networking (ICOIN), Cambodia, 2015, pp. 348–353 (2015). https://doi.org/10.1109/Icoin.2015.7057910
21. Nayebi, F., Desharnais, J., Abran, A.: The state of the art of mobile application usability evaluation. In: 2012 25th IEEE Canadian Conference on Electrical and Computer Engineering (CCECE), pp. 1–4 (2012)
22. Usability: Improving the User Experience. http://www.usability.gov. Accessed 12 Nov 2019
23. Madan, A., Dubey, S.K.: Usability evaluation methods: a literature review. Int. J. Eng. Sci. Technol. 4(2), 590–599 (2012)
24. Holzinger, A.: Usability engineering for software developers. Commun. ACM 48(1), 71–74 (2005)
25. Alshehri, F., Freeman, M.: Methods for usability evaluations of mobile devices. In: Lamp, J.W. (ed.) 23rd Australian Conference on Information Systems, pp. 1–10. Deakin University, Geelong (2012)
26. Inostroza, R., Rusu, C., Roncagliolo, S., Rusu, V.: Usability heuristics for touchscreen-based mobile devices: update. In: Proceedings of the 1st Chilean Conference of Computer-Human Interaction, pp. 24–29 (2013)
27. Brown, J., Nam Kim, H.: Usability evaluation of Alzheimer's mHealth applications for caregivers. Proc. Hum. Factors Ergon. Soc. Ann. Meet. 62(1), 503–507 (2018). https://doi.org/10.1177/1541931218621115
28. Clare, L., Wilson, B.A., Carter, G., Hodges, J.R., Adams, M.: Long-term maintenance of treatment gains following a cognitive rehabilitation intervention in early dementia of Alzheimer type: a single case study. Neuropsychol. Rehabil. 11(11), 477–494 (2001)
29. Massimi, M., Berry, E., Browne, G., Smyth, G., Watson, P., Baecker, R.M.: An exploratory case study of the impact of ambient biographical displays on identity in a patient with Alzheimer's disease. Neuropsychol. Rehabil. 18(5/6), 742–765 (2008)
30. Lazar, A., Thompson, H.J., Demiris, G.: Design recommendations for recreational systems involving older adults living with dementia. J. Appl. Gerontol. (2016). https://doi.org/10.1177/0733464816643880
31. Janosky, J.E.: Use of the single subject design for practice based primary care research. Postgrad. Med. J. 81(959), 549–551 (2005)
32. Clinical Practice Guidelines on Management of Dementia, 2nd edn. Ministry of Health Malaysia (2009)
33. Folstein, M.F., Folstein, S.E., McHugh, P.R.: "Mini-mental state": a practical method for grading the cognitive state of patients for the clinician. J. Psychiatr. Res. 12(3), 189–198 (1975)
34. Morris, J.C.: The Clinical Dementia Rating (CDR): current version and scoring rules. Neurology 43(11), 2412–2414 (1993)
35. Mizen, M.B.: Scrapbook photo albums are therapeutic for Alzheimer's patients (2003). http://www.biaoregon.org/docetc/pdf/conf05/Alzheimer%20Info.pdf
36. Kurniawan, S.H., Zaphiris, P.: Research-derived web design guidelines for older people. In: ASSETS (2005)
37. Asghar, I., Cang, S., Yu, H.: Usability evaluation of assistive technologies through qualitative research focusing on people with mild dementia. Comput. Hum. Behav. 79, 192–201 (2018)
Web Cookies: Is There a Trade-off Between Website Efficiency and User Privacy?
Pedro Pinto1, Romeu Lages1, and Manuel Au-Yong-Oliveira1,2
1 Department of Economics, Management, Industrial Engineering and Tourism, University of Aveiro, 3810-193 Aveiro, Portugal {pdsp,romeu.lages,mao}@ua.pt
2 GOVCOPP, Aveiro, Portugal
Abstract. Current European Union legislation demands that websites that use web cookies to extract information about the personal preferences of Internet users request their permission to obtain such data. Widespread misinformation about web cookies threatens users' feelings of security, while the need to accept cookies also raises privacy concerns. However, not accepting cookies creates functionality issues on websites, and the value created by companies is not optimized. This article reviews the existing literature and reports on a survey (with 102 valid responses) conducted to understand Internet users' behaviour in terms of the fear of accepting cookies and the benefit that cookies generate for users. We conclude that the trade-off between security and website performance portrayed in the literature is illusory and is mainly due to users' lack of information about cookies. We also conclude that there is a paradox: although most users feel insecure, they eventually accept cookies to simplify their online browsing. Keywords: Web cookies · Internet · Internet user · Information · Website · Security · Data privacy
1 Introduction
Internet access, whether for business or leisure reasons, is increasingly common and widespread among the world's population [1], providing a very efficient way to access, organize and communicate information [2]. Because of this generalization, it is perfectly acceptable to consider that individuals today share more information online than in the past. Of course, not all of this information consists of publishing scientific articles, developing a website about a topic of interest, or even posting to social networks. Much of the information about individuals is obtained indirectly; that is, it is not directly provided by them, and it is not available to the general public. An example of this is information related to Internet users' preferences and online activity, which may be of interest to companies seeking to better understand their consumers and to suggest to them the most suitable products based on their previous searches. In this way, companies operate much more efficiently, particularly in areas such as Digital Marketing, with advantages for both organizations and customers [3].
Since information about consumer preferences is not directly provided by consumers, there must be tools for collecting such information. These tools are cookies. However, such information gathering is limited. Since May 25, 2011, Internet users in the European Union who want to, or do not mind, sharing their information and personal preferences must accept cookies, meaning that websites are not free to collect information without a user's permission. This legal obligation has affected not only the way Internet users surf online, but also the way websites work, because some website features do not work properly if cookies are not accepted. Digital Marketing has also been affected, since, if cookies are not accepted, the effectiveness of marketing can be at risk. In addition, many consumers find the request for permission to use cookies too intrusive and unsafe, making online browsing difficult [4]. That said, motivated by feelings of insecurity, discomfort and a lack of privacy, and by the poor functionality of websites without cookies, Internet users have to deal with a trade-off that must be solved: to accept or to decline cookies?
Given the above, we propose to answer the following research question:
• What are the reasons that motivate the acceptance of web cookies by Internet users?
To answer this research question, the article aims to define the concept of cookies and to identify the main types of cookies that exist and how they work, always taking into account the functionality issues of websites that have cookies, security issues and Internet users' privacy, and the economic value generated by the information collected by cookies, highlighting the importance that cookies have not only for individuals but also for organizations. Subsequently, an analysis is performed based on a set of data collected through a survey made available to Internet users purposely for this article, which identifies the main behavioural trends regarding the relationship of Internet users with cookies. This quantitative analysis is very important to complement the information present in the literature review. The purpose of the survey is to understand the behaviour of Internet users in terms of cookie acceptance, how often they clean cookies, whether cookies contribute positively or negatively to users' browsing, and whether the advertising targeted at them suits their personal preferences, among other issues.
2 Literature Review
2.1 Web Cookies – Definition and Types
“Web-cookies revolutionized the web because they gave it a memory. Cookies gave your actions on the web a ‘past’, which you have no idea about or access to.” [30, p. 277].
When we surf the Internet, we are often faced with the issue of accepting cookies. But what are cookies and why do we have to authorize them? Cookies are small text files created by websites, usually encrypted, which are stored on the Internet user's computer through the browser. These text files contain information about Internet users' personal preferences. The existence of cookies is associated with the proper functioning of websites, and there are strictly necessary cookies which, if not
enabled, prevent some features from being provided. In addition, cookies are also useful for statistical purposes, allowing the performance of websites to be improved, websites to be made more functional and tailored to each individual, and advertising and widgets to be customized according to user preferences and tastes. However, many users are suspicious of cookies, and there are still false rumours about the information that may be stored by them, thus conditioning the way websites function and preventing them from being effective. These suspicions are unjustified, as cookies only contain information that does not compromise the security of Internet users; i.e., cookies do not contain information such as personal passwords, credit card numbers, lists of software installed on personal computers or other types of personal data [5, 6]. When a user accesses a website, information within the reach of that website is sent to the browser, which creates a text file; whenever the user accesses the website again, that file is retrieved by the browser and sent back to the website server. Moreover, when a user accesses a page, cookies are not only created by the website itself, but also by the widgets and other elements that exist on the page, such as advertising, if they come from a third-party website [7]. There are two types of cookies: session cookies and persistent (or tracking) cookies. Session cookies are temporary files that disappear when the browser is closed; i.e., when the browser is reopened, the website that is visited again does not recognize the user or their preferences. If someone has logged on to a website that uses this type of cookie, the login will have to be done again if the browser has been closed. Persistent cookies are files that can be stored in the browser until they are deleted manually or removed automatically by the browser when the duration of the text file has expired [8]. The latter type of cookie, when compared to the first, tends to be viewed as more dangerous by Internet users, who are reticent about providing their personal information.
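To make the distinction above concrete, the following minimal Python sketch (not taken from the article; the cookie names and values are illustrative assumptions) shows the Set-Cookie headers a server could issue for a session cookie and for a persistent cookie, using only the standard library.

# A minimal sketch of how a server issues the two cookie types described above.
from http.cookies import SimpleCookie

cookie = SimpleCookie()

# Session cookie: no expiry attribute, so the browser discards it when closed.
cookie["session_id"] = "abc123"
cookie["session_id"]["path"] = "/"

# Persistent (tracking) cookie: Max-Age keeps it stored until it expires
# or the user deletes it manually.
cookie["language_pref"] = "pt-PT"
cookie["language_pref"]["path"] = "/"
cookie["language_pref"]["max-age"] = 60 * 60 * 24 * 365  # one year, in seconds

# These are the Set-Cookie headers the server would send with its response.
print(cookie.output())
# On every later request to the same site, the browser returns the stored values
# in a single "Cookie: session_id=abc123; language_pref=pt-PT" request header.

The only difference between the two types is the expiry attribute (Max-Age or Expires): without it, the browser discards the cookie at the end of the session, whereas with it the cookie survives until the stated lifetime elapses or the user deletes it.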
2.2 The Effects of Web Cookies on the Functionality and Efficiency of Websites and Organizations
Although cookies serve various purposes, their use is especially common on e-commerce websites. They are used to accumulate information in databases, customize the look of a page and present a unique website to each user [9]. An example of this is Amazon, which uses cookies to propose other products that the company thinks its customers will like, based on their previous purchases and searches. That is not Amazon's sole purpose, however: it also uses cookies to configure its website according to visitors' language preferences, for example, and to deliver relevant content to its customers, including advertisements on Amazon Web Services websites [10]. Therefore, without permission to use cookies, websites such as Amazon and similar e-commerce websites can no longer function properly, which can even harm their business and sales. When it comes to cookies, targeted advertising is a theme that comes to mind immediately. This type of advertising is called online behavioural advertising (OBA), and it is the use of data collected from Internet users, such as the type of websites someone visits and the type of videos someone watches, to better target ads to their intended audience [11]. The growing importance of OBA is shown by the consecutive increases in advertising revenues in the United States, which is
a pioneer among the most technological markets [12]. Cookies are also used to identify users who have visited websites without purchasing anything, redirecting them later to the website through targeted advertising [13]. That said, it can be argued that companies have an economic interest in the data collected by websites, since they can thus better understand their consumers and operate more efficiently through targeted advertising [11]. We can therefore conclude that the use of cookies benefits users, but also benefits businesses, as they can monitor their visitors and obtain information from websites [14]. The information obtained is critical for companies to create value. However, how do companies monetize the value created by information? There are two ways. The first is to license or sell access to the data to clients. The second is to sell data-targeting services to customers after careful handling of the data through data analysis and fusion techniques [15].
2.3 Security and Privacy Issues Related to Web Cookies
In addition to the functionality of websites, there are security issues to consider when talking about cookies. It is quite common to hear of people who do not want to accept cookies for fear of sharing their information. But are there reasons for this? When a device, such as a computer, is used by multiple users, and the websites accessed require logins to function, great care is required from users, as failure to log out may compromise their security and privacy if there are third parties with malicious intent. In addition, there are psychological issues that make Internet users unwilling to disclose personal information in order to preserve their own privacy [16]. There are other security and privacy issues associated with cookies, namely third-party cookies, which are cookies generated by third-party websites that are hosted on the websites that users visit [17]. The first problem is related to the "surprise factor": Internet users come across cookies from websites they are sure they have not visited, and this leads them to fear that they have not been asked to accept cookies or that their device has been violated. Another problem is that third-party cookies facilitate the creation of profiles, generating distrust among Internet users. For example, suppose there is a third-party website that displays ads on other sites. This third-party website has the ability to gather information about the various websites that the Internet user visits, thus creating a profile of the user based on those visited websites. As stated above, cookies are not allowed to accumulate user-identifiable information, and therefore it is not possible to associate the profile with a specific user. Even so, there is a sense of persecution, insecurity and privacy violation felt by netizens, as these profiles can be shared or sold to other companies [18]. In an increasingly digital world, this data can be a valuable asset that can be marketed, creating value for businesses; this concept of selling this type of information is known as "privacy as a commodity" [19]. Apart from these issues, people are also more aware of online security after controversies like the Facebook - Cambridge Analytica data scandal and the alleged leak of personal information. For this reason, people demand the strengthening of legislation to prevent something similar from happening again [20]. However, even without proper legislation, Internet users can protect their privacy by blocking cookies,
activating the Do Not Track (DNT) mechanism in their browser, or installing certain browser plugins that aim to protect user information, such as AdBlock Plus, NoScript, Disconnect, Ghostery and Privacy Badger [21]. However, studies indicate that these tools can compromise website functionality by between 6% and 27%. These studies also show that, under the DNT mechanism, tracking scripts (pieces of JavaScript code whose function is to communicate the data collected by cookies by creating different types of requests to external domains) still work in between 37% and 78% of cases, meaning the mechanism fails to accomplish its goal [22, 23]. "Effectiveness is defined as the balance between correctly blocking tracking JavaScript programs (true positives) and incorrectly blocking functional JavaScript programs (false positives) […] We find that PP-Tools' [privacy preserving tools] true positive rates vary from 37% to 78% and false positives range from 6% to 21%." [22, p. 86]. The insecurity felt by some people makes them take special care with cookies and therefore erase them. Studies show that at least 30% of Internet users delete first-party cookies once a month [24]. Regarding third-party cookies, which are the ones that raise the most security concerns, studies also point out that only 4.2% of Internet users who delete cookies delete third-party cookies but not first-party cookies; the most common behaviour is to delete all cookies (49.4%) [25]. Despite all of the security issues, in order to facilitate their online browsing, many users end up accepting cookies. Here arises the concept of the "privacy paradox", which states that individuals have major privacy concerns but are willing to share their personal information under certain circumstances, whether for ease of navigation, out of ignorance of how cookies work, or even because they find it tedious to have to delete cookies [26, 27].
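As an illustration of the DNT mechanism mentioned above, the short Python sketch below shows how a cooperating server could honour the browser's DNT request header by only setting a persistent tracking cookie when DNT is not enabled. The handler, port and cookie value are hypothetical, and, as the studies cited above show, many trackers in practice simply ignore this header.

# A minimal sketch of a server that honours the Do Not Track request header.
from http.server import BaseHTTPRequestHandler, HTTPServer

class DNTAwareHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # "DNT: 1" is the request header a browser sends when the user enables
        # Do Not Track; honouring it is voluntary for the server.
        dnt = self.headers.get("DNT", "0")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        if dnt != "1":
            # Hypothetical tracking identifier, set only when DNT is not enabled.
            self.send_header("Set-Cookie", "tracking_id=xyz789; Max-Age=31536000; Path=/")
        self.end_headers()
        self.wfile.write(b"Hello\n")

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), DNTAwareHandler).serve_forever()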
3 Methodology
To answer the research question that the article intends to address, a two-step method was adopted. First, an investigation and analysis of the relevant existing literature on the topic of web cookies was carried out. The literature review introduces the topic, highlighting how web cookies function and the distinction between the various types of existing web cookies, as well as the issues related to the efficiency of websites, the security and privacy of web users, and the benefits of web cookies for organizations. The first phase of the method therefore corresponds to obtaining information from secondary sources, i.e., non-direct sources of information such as articles and reports written by other authors, websites and books on the subject. Subsequently, with the knowledge acquired in the literature review, a survey was developed in order to obtain information from a primary data source. The survey included a set of questions aimed at obtaining data about the respondents' level of knowledge about web cookies, their willingness to accept web cookies, whether they feel safe when accepting cookies, whether they feel obliged to accept cookies, whether they feel that cookies benefit their online browsing, whether they have stopped visiting some websites due to the obligation to accept cookies, whether they frequently clean cookies, whether they are bothered by the presence of online advertising on websites, and whether they feel that such advertising fits in with
their preferences and previous searches, among other questions. In addition to these questions, some questions gathered data identifying the respondents, such as gender, age and employment status. The survey consisted of closed-ended, dichotomous or multiple-choice questions with five-level Likert scales ranging from strongly disagree to strongly agree. Closed questions were chosen so that quantitative research could be carried out, with the results expressed in statistical form. The survey was conducted in Portuguese, and Google Forms was the online platform used to collect the respondents' answers. Survey sharing was done in two ways: the first was sharing the link on the social networks of the authors of this article, such as Facebook, Twitter, Instagram and WhatsApp; the second was oral and personal sharing with individuals who are frequent Internet users. In this way, it was possible to obtain a set of 102 responses over the two weeks in which the survey was available. The sample of this study is therefore a convenience sample, not depicting the population structure of any particular region or group of people. However, convenience samples are perfectly acceptable in social science studies [28]. After data collection, data processing was performed. First, a descriptive statistical analysis was carried out by observing the data. Subsequently, using SPSS software, a further analysis was made, in which correlations between some variables were examined and Spearman correlation coefficients were calculated, in order to explain some behaviours taking into account, for example, the knowledge that users have about cookies. Mann-Whitney tests were also performed to determine whether there were significant differences in responses according to gender, employment status and age.
4 Discussion on the Fieldwork
4.1 The Survey – Descriptive Statistics
The results of the survey suggest that most individuals do not have enough knowledge about web cookies. About 51.9% of the respondents are totally or partially unaware of what web cookies are, despite surfing online just like the other respondents. Therefore, the first conclusion we may draw is that the lack of knowledge about cookies and the need to accept them are not an impediment to online browsing. As noted earlier in the literature review, website functionality sometimes requires the acceptance of cookies and, according to the survey, 78.4% of respondents always or almost always accept cookies, although most feel insecure (only 2.9% of the respondents feel completely safe about cookies). That said, we can confirm the behaviour described in the literature review as the "privacy paradox": people accept cookies to facilitate their online browsing, but feel insecure because of them. The truth is that most Internet users feel obliged to accept cookies: 46.1% of respondents always feel obligated to accept web cookies, while only 7.8% never feel obligated. Here we find one of two problems: either there is a huge lack of knowledge on the part of Internet users about the possibility of rejecting web cookies, or the rejection of cookies
implies an additional effort that users are unwilling to make, and so they eventually end up accepting cookies. Regarding the feeling that cookies benefit browsing, 48.1% of the respondents are indifferent to the performance of websites. These answers possibly result from the fact that most users accept cookies and therefore do not notice the difference in the performance of websites without cookies. According to the responses, about half of the respondents have stopped visiting websites due to the obligation to accept cookies. These values are high and demonstrate high levels of discomfort; it can be concluded that cookies are a deterring factor in online navigation. Regarding security issues, and although most users feel relatively insecure about the need to accept web cookies, 40.2% of the respondents never clean the cookies in their browsers. This indicator may mean either that there is no knowledge about this cleaning practice or that the levels of insecurity felt are not that high. Regarding online advertising, although 83.3% of the respondents feel bothered or very bothered by the existence of advertising on websites, almost no one considers no longer accessing certain websites because of it. One reason for this, in addition to the usefulness that certain websites provide to users, may be that online advertising adjusts to users' personal preferences and tastes and may be useful for individuals to learn about products that may bring them greater welfare. Of the respondents, 48.1% answered that the advertising targeted at them fits their preferences, while 30.4% think it is indifferent.
4.2 The Survey – An Analysis of Correlations in the Data
When the correlations between the different survey variables are examined, based on the calculation of the Spearman correlation coefficient at a 99% confidence level, additional trends are found. The strongest correlations between the analysed variables are presented below, all of which are considered moderate correlations [29]. For instance, the correlation coefficient between the variables "knowledge about web cookies" and "security felt in relation to web cookies" is 0.347, which means that the greater the knowledge about cookies, the greater the security felt by the user. The correlation between "knowledge about web cookies" and "feeling of obligation to accept web cookies" is −0.388, which means that the greater the knowledge about cookies, the lower the sense of obligation to accept them. These values are expected, as the literature review confirmed that there is no reason to be concerned about security and that there is no legal obligation to accept cookies. Also, the more the user knows about cookies, the higher the tendency to clear cookies frequently from the browser (correlation coefficient = 0.392). Other interesting correlations relate to the fact that the more a user agrees that he or she feels safe when accepting cookies, the greater the feeling that cookies will benefit their online browsing (correlation coefficient = 0.395) and the lower the probability of stopping visiting a website due to cookies (correlation coefficient = 0.357). That said, it can be inferred that the greater the knowledge of Internet users, the greater the security they feel towards cookies and, consequently, the greater the benefit to them. Therefore, information can resolve the trade-off between user security and website functionality.
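As a sketch of how such coefficients can be computed outside SPSS, the Spearman coefficient and its significance at the 99% confidence level could be obtained as follows; the response values below are invented for illustration and are not the survey data.

# A minimal sketch using hypothetical ordinal (Likert-style) responses.
from scipy.stats import spearmanr

knowledge = [1, 2, 2, 3, 4, 5, 3, 1, 4, 5, 2, 3]   # self-reported knowledge about cookies
security  = [1, 1, 3, 3, 4, 4, 2, 2, 5, 4, 2, 3]   # security felt when accepting cookies

rho, p_value = spearmanr(knowledge, security)
print(f"Spearman rho = {rho:.3f}, p-value = {p_value:.4f}")

# At a 99% confidence level the correlation is considered significant when p < 0.01.
if p_value < 0.01:
    print("Correlation is significant at the 99% confidence level.")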
4.3 The Survey – Mann-Whitney Tests
In addition, Mann-Whitney tests were performed at a 95% confidence level to ascertain whether there are significant differences between groups according to gender, employment status and age, and thus to identify further trends in online browsing. Significant differences were found only by gender: male respondents consider that they have better knowledge of cookies than female respondents; female respondents feel a greater obligation to accept cookies and are more bothered by online advertising than male respondents; and, finally, male respondents are more likely to stop visiting websites due to online advertising.
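A minimal sketch of this kind of group comparison, using SciPy’s Mann-Whitney U test, is given below; the group scores are illustrative values, not the survey data.

```python
from scipy.stats import mannwhitneyu

# Hypothetical ordinal "knowledge of cookies" scores per gender group;
# illustrative values only, not the study's responses.
male_scores   = [4, 3, 5, 4, 2, 4, 3, 5]
female_scores = [2, 3, 2, 4, 1, 3, 2, 3]

# Two-sided test, evaluated at the 95% confidence level
u_stat, p = mannwhitneyu(male_scores, female_scores, alternative="two-sided")
print(f"U = {u_stat:.1f}, p = {p:.4f}",
      "-> significant" if p < 0.05 else "-> not significant")
```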
5 Conclusions and Suggestions for Future Research
The present study aimed to discuss and contribute to the knowledge of Internet users’ behavioural attitudes towards web cookies, especially in a period in which the Internet is an important work and leisure tool. Through the study, it was possible to identify and confirm the existence of a trade-off between the security/privacy of users and the functionality/efficiency of websites. Cookies enable websites to increase their performance and gather information to improve the online user experience. However, the requirement to press a button to accept or reject cookies raises insecurity issues for users because there is a great deal of misinformation. Despite this, the survey shows that information is crucial to help break the trade-off. When users are informed about how web cookies work, they are far less likely to feel insecure, and their consequent acceptance of cookies means that websites do not face functionality issues. Thus, disclosing information about how cookies work can be seen as a fundamental step for companies to take, since obtaining data about the personal preferences of Internet users allows them, for example, to optimize their sales. Moreover, cookies allow companies to target advertising to their audience and thus minimize their advertising costs.
In addition to creating economic value for businesses, cookies can also benefit users by increasing their well-being when they gain access to information that best suits their preferences; for example, through cookies and advertising, users can learn about products that match their tastes. However, according to the survey, there is no general feeling that cookies are beneficial, although most respondents consider that the advertising targeted at them is tailored to their personal tastes. It is also possible to conclude that most Internet users do not give much importance to the issue of cookies, although many consider it invasive and annoying to have to accept or reject them; issues such as browser cookie cleaning, for instance, are largely ignored. This confirms the existence of a privacy paradox.
Finally, the authors consider that further research should be done to better characterize the population in terms of cookie acceptance. There are geographical and age differences that this study was not able to identify, and it would be interesting to examine them. It would also be interesting to measure the economic impact for companies of disclosing information about the safety of cookies, in order to determine whether spending resources on such disclosure is justified by the additional revenue that cookies make possible.
References 1. Internetworldstats.com: Internet Growth Statistics 1995 to 2019 - The Global Village Online. https://www.internetworldstats.com/emarketing.htm. Accessed 21 Oct 2019 2. Peterson, R., Balasubramanian, S., Bronnenberg, B.: Exploring the implications of the Internet for consumer marketing. J. Acad. Mark. Sci. 25(4), 329–346 (1997) 3. Reza-Kiani, G.: Marketing opportunities in the digital world. Internet Res. 8(2), 185–194 (1998) 4. Lee, P.: The impact of cookie ‘consent’ on targeted adverts. J. Database Mark. Cust. Strat. Manag. 18(3), 205–209 (2011) 5. Altice Portugal. https://www.telecom.pt/pt-pt/Paginas/cookies.aspx. Accessed 10 Oct 2019 6. Harding, W., Reed, A., Gray, R.: Cookies and web bugs: what they are and how they work together. Inf. Syst. Manag. 18(3), 17–24 (2001) 7. Allaboutcookies.org: All About Computer Cookies - Session Cookies, Persistent Cookies, How to Enable/Disable/Manage Cookies. https://www.allaboutcookies.org/. Accessed 09 Oct 2019 8. Allaboutcookies.org: Session Cookies / Persistent Cookies Explained. https://www. allaboutcookies.org/cookies/cookies-the-same.html. Accessed 09 Oct 2019 9. Palmer, D.: Pop-ups, cookies, and spam: toward a deeper analysis of the ethical significance of Internet marketing practices. J. Bus. Ethics 58(1–3), 271–280 (2005) 10. Amazon Web Services, Inc.: Cookies. https://aws.amazon.com/pt/legal/cookies. Accessed 15 Oct 2019 11. Boerman, S., Kruikemeier, S., Zuiderveen-Borgesius, F.: Online behavioral advertising: a literature review and research agenda. J. Advert. 46(3), 363–376 (2017) 12. IAB: First Quarter U.S. Internet Ad Revenues Hit Record-Setting High at Nearly $16 Billion, According to IAB. https://www.iab.com/news/first-quarter-u-s-internet-ad-revenueshit-record-setting-high-nearly-16-billion-according-iab. Accessed 20 Oct 2019 13. Patil, D., Bhakkad, D.: Redefining Management Practices and Marketing in Modern Age, 1st edn. Atharva Publications, Dhule (2014) 14. Goecks, J., Mynatt, E.D.: Supporting privacy management via community experience and expertise. In: Van Den Besselaar, P., De Michelis, G., Preece, J., Simone, C. (eds.) Communities and Technologies 2005, pp. 397–417. Springer, Dordrecht (2005) 15. Li, W.C., Nirei, M., Yamana, K.: Value of data: there’s no such thing as a free lunch in the digital economy. U.S. Bureau of Economic Analysis (2018) 16. Jegatheesan, S.: Cookies – invading our privacy for marketing, advertising and security issues. Int. J. Sci. Eng. Res. 4(5), 764–768 (2013) 17. Tirtea, R., Castelluccia, C., Ikonomou, D.: Bittersweet cookies: some security and privacy considerations. In: European Union Agency for Network and Information Security – ENISA (2011)
18. Kristol, D.: HTTP cookies: standards, privacy, and politics. ACM Trans. Internet Technol. 1(2), 151–198 (2001) 19. Dinev, T.: Why would we care about privacy? Eur. J. Inf. Syst. 23(2), 97–102 (2014) 20. Kerry, C.: Why protecting privacy is a losing game today—and how to change the game. Brookings. https://www.brookings.edu/research/why-protecting-privacy-is-a-losing-gametoday-and-how-to-change-the-game/. Accessed 11 Oct 2019 21. Gonzalez, R., Jiang, L., Ahmed, M., Marciel, M., Cuevas, R., Metwalley, H., Niccolini, S.: The cookie recipe: untangling the use of cookies in the wild. In: 2017 Network Traffic Measurement and Analysis Conference, pp. 1–9. IEEE, Dublin (2017) 22. Ikram, M., Asghar, H., Kaafar, M., Mahanti, A., Krishnamurthy, B.: Towards seamless tracking-free web: improved detection of trackers via one-class learning. Proc. Priv. Enhanc. Technol. 2017(1), 79–99 (2017) 23. Kervizic, J.: Cookies, tracking and pixels: where does your Web data comes from? Medium. https://medium.com/analytics-and-data/cookies-tracking-and-pixels-where-does-your-webdata-comes-from-ff5d9b8bc8f7. Accessed 10 Nov 2019 24. Soltani, A., Canty, S., Mayo, Q., Thomas, L., Hoofnagle, C.J.: Flash cookies and privacy. In: 2010 AAAI Spring Symposium Series (2010) 25. Abraham, M., Meierhoefer, C., Lipsman, A.: The impact of cookie deletion on the accuracy of site-server and ad-server metrics: an empirical comscore study, vol. 14, no. 1. Accessed Oct 2007 26. Smith, H., Dinev, T., Xu, H.: Information privacy research: an interdisciplinary review. MIS Q. 35(4), 989–1015 (2011) 27. Pavlou, P.: State of the Information privacy literature: where are we now and where should we go? MIS Q. 35(4), 977–988 (2011) 28. Bryman, A., Bell, E.: Business Research Methods, 3rd edn. Oxford University Press, Oxford (2011) 29. Wooldridge, J.: Introductory Econometrics, 2nd edn. Thomson South-Western, Mason (2003) 30. Carmi, E.: Review: cookies – more than meets the eye. Theory Cult. Soc. 34(7–8), 277–281 (2017)
On the Need for Cultural Sensitivity in Digital Wellbeing Tools and Messages: A UK-China Comparison
John McAlaney1(&), Manal Aldhayan1, Mohamed Basel Almourad2, Sainabou Cham1, and Raian Ali3
1 Faculty of Science and Technology, Bournemouth University, Poole, UK {jmcalaney,maldhayan,scham}@bournemouth.ac.uk
2 College of Technological Innovation, Zayed University, Dubai, UAE [email protected]
3 College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar [email protected]
Abstract. The excessive and obsessive use of the internet and digital technologies, known as Digital Addiction (DA), is becoming a social issue. Given that it inherently involves the use of technological devices this provides the opportunity to deliver interactive, intelligent prevention and intervention strategies in real-time. However, for any large-scale, multi-national prevention campaign to be optimised cultural differences within the target population must be considered. This study aimed to contribute towards this literature by exploring cultural differences in the acceptance of DA prevention messages in the UK vs China. An initial series of exploratory interviews were conducted with a sample within the UK to determine what strategies may be used to address the overuse of digital devices. These interviews were subjected to content analysis, which was then used as the basis for an online survey that was disseminated throughout the UK and China. A total of 373 useable surveys were returned. There were several statistically significant differences in preferences over how an intervention system should operate. UK participants wished for the system to be easily under their control, whilst behaving largely autonomously when needed, and to also be transparent as to why a message had been triggered. Chinese participants, on the other hand, were less likely to state a preference for such a high degree of control over any such system. Overall, the preferred implementation of such systems does appear to vary between the UK and China, suggesting that any future prevention and intervention strategies take cultural dimensions into consideration. Keywords: Digital addiction Culture Prevention
Internet addiction Persuasive messaging
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020 Á. Rocha et al. (Eds.): WorldCIST 2020, AISC 1160, pp. 723–733, 2020. https://doi.org/10.1007/978-3-030-45691-7_68
1 Introduction The excessive and obsessive use of the internet and digital technologies, known as Digital Addiction (DA), is becoming a social issue [1]. It has been suggested that DA is associated with a number of negative consequences, including poor academic performance, reduced social and recreational activities, relationships breakups, low involvement in real-life communities, poor parenting, depression and lack of sleep [2, 3]. Consistent with addiction it has also been argued that those experiencing DA can display symptoms of dependency and withdrawal such as depression, cravings, insomnia, and irritability [4]. It has been suggested that 6%–15% of the general population meet the requirement of DA [5], although these figures are dependent on varying definitions and conceptualisations of DA that exist between different countries. It has been argued that university and college students are at particular risk of DA [5]; with 18.3% of students in the UK identified as meeting the requirements of DA [6]. As such treatment and prevention approaches are needed. In addition, there is a lack of research that considers the role of software developers in the phenomenon of DA, with some exceptions such as [7]. DA has unique characteristics in comparison to what could be considered traditional addictions. Given that it inherently involves the use of technological devices this provides the opportunity to deliver prevention and intervention strategies in real-time and whilst the targeted behaviour is occurring. This is not typically the case in behaviour change – in smoking, for example, people may see anti-smoking education campaign messages through a variety of mediums, but it would be difficult to ensure that these messages are seen while someone is in the act of smoking a cigarette. Messages for DA can also be easily customised to the individual and to the social environment in which they operate. This has the potential to make prevention and intervention campaigns far more salient to the individual which, as based on social psychological research [8], could be expected to increase the efficacy of such campaigns. An important factor in the social environment of any individual will be culture. This is particularly relevant to DA, where the use of digital technologies and online platforms transcend national boundaries. As has been found in behaviour change campaigns conducted in other domains the success of an intervention is dependent on how well it is tailored to local, cultural and political contexts [9], although it has also been noted that this is still an under-researched area [10]. This article aims to begin work on addressing this gap by exploring attitudes towards DA interactive messaging with participants in the UK and China. There is a substantial body of literature documenting and discussing cultural differences between the West and China. In particular, Hofstede’s [11] research on cultural dimensions and Nisbett’s research on cultural cognition [12] implies that messages presented to culturally different sets of users will attract different responses and interpretations. Nisbett’s work indicates that Chinese people, users in the case of this paper, will tend to be much more holistic than UK users in how they perceive and process information, i.e. there will be a preference for understanding relations and how information makes sense as a whole. UK users instead will tend to be more analytical,
breaking down the different information elements, trying to establish logical and causal relations wherever possible. The UK and China differ significantly along two cultural dimensions, power distance and individualism [11]. China scores high in power distance as compared to the UK [13]. This means there will be more distance between figures of authority and those under them, and these differences are more likely to be accepted. Compared to the UK, China also scores low on individualism [13]. This indicates that China is a highly collectivist society and that the interests of the collective will be given priority to individual benefits. Hofstede’s cultural dimensions have been used to account for differences between how users respond to websites for countries with different levels of power distance and individualism, [14]. For a national culture like that of China, high in power distance and low on individualism, the tendency is to prefer content representing symbols of power or emerging from authority; any motivation should be aimed at the benefits of the collective and not only the individual. In contrast for a national culture like that of the UK, user interface content can be more informal and direct, accentuating the benefits for the individual. Several authors have studied the effect of Asian and Western cultures on the use of interactive systems. For instance, Evers and Day [15] showed how Chinese, Indonesian and Australian users differed in perceived usefulness and ease of use towards the acceptance of a system; whereas Choi et al. [16] demonstrated the value of culturespecific models and dimensions in the design of mobile data services by studying Finnish, Korean and Japanese users. Overall, the discussion of cultural effects on internet users’ preferences is relevant to how users perceive DA messages and can provide insights on how to implement its use in culturally acceptable ways. In this paper, we compare the perception of DA interactive and persuasive messages between users in China and UK. We reveal differences in the preferences towards both the content and the control over such interactive data-driven mechanisms towards behaviour change. As main technology companies have started to offer digital wellness and time management techniques, e.g. the Google Digital Wellbeing1 and Apple Screen Time2, which are meant to be used globally, our research highlights the importance of having cultural-sensitive design to maximise acceptance and user retention and avoid harmful experience users can have with these applications such as trivialisation of issues, creating an alternative source of preoccupation, over-trust and reactance [17].
2 Method The methodology consisted of an initial exploratory interview stage followed by a survey. Screening items were used to ensure that the people who participated in the interviews and survey could be considered to demonstrate problematic levels of digital
1 https://wellbeing.google/.
2 https://support.apple.com/en-us/HT208982.
media use. These items were based on the first three items of the CAGE measure [18], which was developed as a tool to quickly screen for alcohol use disorders. The four items of the original measure ask individuals if they have ever felt they should i) cut down on their drinking, ii) been annoyed by people criticizing their drinking, iii) felt guilty about their drinking and iv) needed to drink in the morning to alleviate a hangover (eye-opener). For the purposes of this study ‘drinking’ was replaced by ‘use of social networks or gaming’ for the first three items. The fourth item of the original CAGE measure was not adapted for use as this was not relevant to the topic of digital addiction. The initial exploratory stage consisted of interviews with 11 participants, five male and six female, aged between 19 and 35 years old. Four of them were professionals and seven were students. The interviews were to explore and understand how users see digital addiction messages. To ensure a range of views the sample included seven participants who liked the idea of messaging and four participants who did not feel messaging was an efficient idea. The interviews were recorded, and content analyzed and a set of statements on the messaging design, content, delivery and intelligence were obtained. The second phase was based on an online survey. The purpose of the survey was to enhance and confirm the results obtained through the analysis of the interview. Our study in Ali and Jiang [7] reports the results of the interview analysis and confirmation with 72 participants who were mainly based in the UK and EU. To discover whether there could be cultural differences in the perception and preferences of DA messages, the survey was translated to Chinese and further disseminated in both its English and Chinese versions through mailing lists to students at a university, professional mailing lists and also the social media and mailing lists of the authors. A total of 373 usable surveys were returned, 151 from the UK (146 completed) and 222 from China (184 completed).
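A minimal sketch of how the adapted screening could be scored is given below; the item wording is paraphrased and the inclusion rule is an assumption, since the paper does not state the exact cut-off used.

```python
# Sketch of the adapted screening step: the first three CAGE items with
# 'drinking' replaced by 'use of social networks or gaming'. The wording and
# the threshold below are illustrative assumptions, not the authors' instrument.
ADAPTED_ITEMS = [
    "Have you ever felt you should cut down on your use of social networks or gaming?",
    "Have people annoyed you by criticizing your use of social networks or gaming?",
    "Have you ever felt guilty about your use of social networks or gaming?",
]

def screen(responses):
    """responses: list of booleans (True = 'yes') for the three adapted items."""
    score = sum(bool(r) for r in responses)
    # Assumed rule: at least one 'yes' indicates problematic use worth including
    return score, score >= 1

print(screen([True, False, True]))  # -> (2, True)
```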
3 Results
There were 151 participants from the UK, which consisted of 81 females (54%), 68 males (45%) and 2 participants (2%) who preferred not to declare their gender. There were 222 participants from China, which consisted of 117 females (53%), 103 males (46%) and 2 participants who preferred not to declare their gender (1%). Most participants in the UK (56%) and China (62%) were aged between 18–25. Most participants from both the UK and China (60%) reported that they sometimes felt they should cut down on their digital media use, including use of social networks or games.
As shown in Table 1, the most popular option for participants from both countries was messages based on the time already spent on the software, which participants from the UK were significantly more likely to agree to (χ2 = 13.85, df = 1, p < .001). UK participants were also significantly more likely to agree that they would like
messages that tell them the number of times they have checked/visited the software (χ2 = 18.1, df = 1, p < .001), provided them the information in the form of a usage bill (χ2 = 11.78, df = 1, p = .001) or warned them about negative consequences to offline social life (χ2 = 5.22, df = 1, p = .022). Participants from China, on the other hand, were significantly more likely to report that they would accept messages which suggest offline activities based on their online usage (χ2 = 12.38, df = 1, p < .001).
Table 1. Percentage of participants in agreement that intervention message type would be useful (italics indicate statistically significant difference) Message type
Time spent on apps/software Number of checks/visits
% in agreement UK China 77 59
Message type
% in agreement UK China 14 22
51
29
Heavily used features and apps Potential risks of excessive use Usage “bill” Public profile damage
14
16
Consequences for online contacts Consequences on offline social life Physical/mental health risks
26
32
Risk of information
26
18
43 33
26 26
36 31
27 49
Consequences on online relationships
21
20
Advice on usage regulation Suggestions of real-life activities Factual statements on the benefits of regulation
31
27
35
24
48
47
Of the three most common preferences for the delivery of DA messages (Table 2), participants from the UK were significantly more likely to prefer each of these; namely pop-up notifications (χ2 = 8.96, df = 1, p = .003), time based progress statuses such as a clock display (χ2 = 20.89, df = 1, p < .001) and dynamic coloring of the interface (χ2 = 5.77, df = 1, p = .016). Participants from the UK were also significantly more likely to prefer messages delivered offline by for example email (χ2 = 5.50, df = 1, p = .019), although overall this option was only preferred by a minority of participants. Of the three most common preferences for the theme and source of warning messages overall (Table 2), participants from the UK were significantly more likely to prefer each of these; namely supportive content that is positive and encouraging (χ2 = 11.16, df = 1, p = .001), non-repetitive content (χ2 = 8.43, df = 1, p = .004), and non-overly negative content (χ2 = 22.24, df = 1, p < .001).
Table 2. Percentage of participants in agreement that different delivery types, themes and sources would be useful (italics indicate statistically significant difference)

Message delivery style | % in agreement UK | % in agreement China
Pop-up notifications | 48 | 32
Time based progress status (e.g. clock) | 58 | 34
Dynamic colouring of the interface | 45 | 33

Theme and source of the message | % in agreement UK | % in agreement China
Positive and supportive content | 60 | 42
Non-repetitive content | 48 | 34
Non-overly negative content | 45 | 33
Participants were asked to state how much they would like to be able to control different aspects of the intervention message software (Table 3). Compared to participants from China, participants from the UK were significantly more likely to report that they would prefer to control the type of information contained in the message (χ2 = 20.81, df = 1, p < .001); the frequency with which messages are received (χ2 = 11.68, df = 1, p = .001); the presentation of the message in terms of graphics, sounds, emails etc. (χ2 = 12.58, df = 1, p < .001); what actions trigger the message (χ2 = 13.15, df = 1, p < .001) and the source of the message in terms of software developers, institutions etc. (χ2 = 19.01, df = 1, p < .001). Participants from the UK were also significantly more likely to report that they would like the software to act autonomously once it is set up (χ2 = 4.92, df = 1, p = .027). The only item which was reported as being significantly more desired by participants from China was having control over the strategy through which messages are generated, i.e. proactive or reactive to usage, or based on comparisons to others or an absolute scale (χ2 = 4.28, df = 1, p = .039).
Table 3. Percentage of participants in agreement they would like control over different functions of the intervention software (italics indicate statistically significant difference)

Control over function | % in agreement UK | % in agreement China
Type of information | 41 | 19
Timing of messages | 39 | 21
Frequency of messages | 60 | 42
Message presentation style | 50 | 32

Control over function | % in agreement UK | % in agreement China
Actions that trigger a message | 38 | 21
Source of the message | 29 | 11
Strategy through which messages are triggered | 26 | 36
The degree to which software is autonomous | 26 | 18
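The chi-square comparisons reported in this section can be reproduced approximately from the tabulated percentages and sample sizes, as in the sketch below; the agreement counts are rounded reconstructions, not the raw data.

```python
from scipy.stats import chi2_contingency

# Example: preference for controlling the frequency of messages
# (60% of 151 UK participants vs 42% of 222 Chinese participants).
uk_agree, uk_total = round(0.60 * 151), 151
cn_agree, cn_total = round(0.42 * 222), 222

table = [
    [uk_agree, uk_total - uk_agree],   # UK: agree / not agree
    [cn_agree, cn_total - cn_agree],   # China: agree / not agree
]

chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(f"chi2({dof}) = {chi2:.2f}, p = {p:.4f}")  # close to the reported 11.68
```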
4 Discussion The majority of participants in both the UK and China stated that they sometimes felt they should cut down on their use of social networks or gaming. Given the use of the screening measure to identify those who appeared to use digital technologies excessively, this is not in itself surprising, but it does demonstrate that there is a demand for products which may aide people in better managing their use of digital technologies. This relates to the wider controversy on the status of DA as a mental health disorder. Following an extensive debate, DA was not included within the main body of the recently published Diagnostic and Statistical Manual (DSM-5) of the American Psychiatric Association [19], although it was included in the appendix as needing further research. More recently, in 2018, the World Health Organization has included Gaming Disorder within the International Classification of Diseases. Most participants in both countries agreed that DA interactive messages would be useful, with participants from China significantly more likely to report this. Given that DA is an emergent phenomenon there is a lack of research into how DA messages should be created and delivered that the results of this study can be compared to. However, parallels can be drawn with research into cross-cultural differences in consumer psychology, which shares the same goal of marketing different products or ideas across different cultures [20]. The UK and China vary on a number of cultural dimensions, of which the most commonly researched is the individualist-collectivistframework [21]. Individualistic cultures are characterised by a focus on the success of the individual, whereas collectivist cultures focus more on the role of the individual into contributing to the wellbeing of the group. In support of this, individuals from the UK demonstrate characteristics of self-independence whereas individuals in China are more likely to display characteristics of interdependence and mutual reliance [22]. As further noted by Hofstede [23] individuals in a collectivistic culture such as China are more likely to adhere to a hierarchical structure, as well as being more sensitive to group membership [24]. The greater acceptance of participants from China towards the idea of warning messages is consistent with cross-cultural research in consumer psychology, which suggests that Chinese consumers place more emphasis on reputation than UK consumers [25]. Using anything to excess, including digital technologies, implies that the individual is in some way departing from the social norm of that group. As such, it could be expected that individuals in China would be more responsive to a system that warns them when their usage is becoming somehow excessive. However, it was also found that UK participants were significantly more likely to agree that they would like messages based on potential consequences to their offline social life, which is in contrast to what would be expected from the aforementioned research on the importance of group membership to those in a collectivistic culture. Overall the significant differences between participants from the UK and China could be summarised as the UK participant expressing a wish for greater control and customisation of any potential DA messaging software. This is again consistent with cross-cultural research into consumer psychology, which has demonstrated that individuals from the UK like to be able to customise products [25]. In keeping with dual
entitlement theory [26] people from individualistic cultures may, in fact, feel they are entitled to customisation, as this is what they have experienced in the past [25]. This finding is further reinforced by research on the relationship between locus of control and individualism as a cultural dimension. Locus of control is a social psychology concept that refers to how much individuals believe they are in control of events affecting them [27]. Spector and Cooper [28] demonstrate that individualistic cultures, like that of UK, will tend to reflect an internal locus of control, i.e. individuals expect to have control, while for collectivistic cultures like that of China an external locus of control is the case, i.e. individuals will expect not to have control. This could suggest that the ability to customize DA messaging is not just something which will increase the appeal of the software to UK users; such customization may in fact be necessary if users are to accept the software at all. The effects of individualism-collectivism on website acceptability and usability have already been documented by several authors [14, 29, 30]. In this context, the findings reported here on the influence of individualism-collectivism on user preferences for DA messages are consistent with those of Cyr [29] who demonstrated how the cultural localization of websites for Canadian users (individualistic) and for a Chinese users (collectivistic) in terms of visual direction, information design and navigation design had a positive effect on satisfaction and trust towards content presented to them. Hofstede’s survey and the resulting set of national culture models have been subject to criticism by various authors: McSweeney [31] considers the process to identify cultural dimensions through the analysis of differences in the survey as tautological and biased, e.g. only IBM employees took part. Kamppuri [32] refers to many instances where the expectations of Hofstede’s cultural dimensions for particular sets of users were not met; Abufardeh and Magel [33] note that his survey was mainly developed from models of organisational culture while not exhaustively assessing national cultures. Overall, there is a general discussion around the applicability of national models to user interface design. As pointed by Kamppuri [32], this is seen as a contradiction: on one hand Hofstede encourages the application of his models and, on the other hand, he declares his model represents preferences underpinned by cultural values, which are more stable, and not cultural practices. Whilst there were several statistically significant differences between the preferences of participants from the UK and China what is particularly interesting is the degree of overlap between countries. Even when differences were found it could be argued that most of these were not particularly substantive. Messages based on time spent on a platform or game appealed to most participants in both countries, which is a type of message that it would be relatively simple to generate for a number of software platforms. This wish for some kind of time-based message could relate to psychological research into the phenomena of being in a flow state [34], characterised as being ‘in the zone’, in which an individual may lose track of time whilst engaged with a social networking platform or whilst gaming [35]. Participants in both countries also expressed a desire for DA messages to be positive, supportive and to generally avoid being confrontational or negative. 
This dislike of being manipulated or coerced into behaving a certain way is known in social psychological research as
reactance [36] and is important to consider when designing prevention and intervention campaigns. In this paper, we argued the need for a culture-sensitive design of DA interactive messages which is a brief intervention mechanism towards a regulated and informed digital media usage. We investigated the differences between UK and Chinese users and demonstrated the existence of significant differences in the preferences and attitude towards such messages and their content, delivery method, autonomy, configuration and control. These results have implications for the development and conceptualization of technologies designed to address digital addiction, particularly in terms of how applicable and transferable these technologies are between cultures. However, culture is only one stage which may inform initial decisions about the viability and potential adoption of certain design options. The customisation of DA messages would require also a consideration of personal context as well as usage style and patterns. For example, messages requiring the sharing of usage statistics and sent by peers in a peer group setting may not be a preferred option for introvert users. Our future work will investigate the various variables which may influence the acceptance and right configuration of the DA messaging service to get the right behaviour change and avoid adverse effects. Acknowledgements. We thank Jingjie Jiang for helping in conducting the Chinese version of the survey. This research has been partially funded by Zayed University, UAE under grant number R18053.
References 1. Moreno, M.A., et al.: Problematic Internet use among US youth: a systematic review. Arch. Pediatr. Adolesc. Med. 165(9), 797–805 (2011) 2. Echeburua, E., de Corral, P.: Addiction to new technologies and to online social networking in young people: a new challenge. Adicciones 22(2), 91–95 (2010) 3. Kuss, D.J., Griffiths, M.D.: Online social networking and addiction - a review of the psychological literature. Int. J. Environ. Res. Public Health 8(9), 3528–3552 (2011) 4. Griffiths, M.: The psychology of online addictive behaviour. In: Attrill, A. (ed.) Cyberpsychology. Oxford University Press, Oxford (2015) 5. Young, K.S., Yue, X.D., Ying, L.: Prevalence estimates and etiologic models of Internet addiction. In: Young, K.S., Nabuco de Abreu, C. (eds.) Internet Addiction: A Handbook and Guide to Evaluation and Treatment. Wiley, Canada (2011) 6. Kuss, D.J., Griffiths, M.D., Binder, J.F.: Internet addiction in students: prevalence and risk factors. Comput. Hum. Behav. 29(3), 959–966 (2013) 7. Ali, R., Jiang, N., Phalp, K., Muir, S., McAlaney, J.: The emerging requirement for digital addiction labels. In: International Working Conference on Requirements Engineering: Foundation for Software Quality, pp. 198–213. Springer, Cham (2015) 8. Neighbors, C., Larimer, M.E., Lewis, M.A.: Targeting misperceptions of descriptive drinking norms: efficacy of a computer delivered personalised normative feedback intervention. J. Consult. Clin. Psychol. 72(3), 434–447 (2004) 9. Castro, F.G., Barrera, M., Martinez, C.R.: The cultural adaptation of prevention interventions: resolving tensions between fidelity and fit. Prev. Sci. 5(1), 41–45 (2004)
10. Beckfield, J., Olafsdottir, S., Sosnaud, B.: Healthcare systems in comparative perspective: classification, convergence, institutions, inequalities, and five missed turns. Ann. Rev. Sociol. 39(1), 127–146 (2013) 11. Hofstede, G.H., Hofstede, G.J., Minkov, M.: Cultures and Organizations: Software of the Mind: Intercultural Cooperation and its Importance for Survival, 3rd edn., vol. xiv, p. 561. McGraw-Hill, New York (2010) 12. Nisbett, R.: The geography of thought: how Asians and Westerners think differently… and why. Simon and Schuster (2010) 13. Hofstede Insights: Compare countries (2019). https://www.hofstede-insights.com/product/ compare-countries/. Accessed 28 Apr 2019 14. Marcus, A., Gould, E.: Crosscurrents: cultural dimensions and global web user-interface design. Interactions 7(4), 32–46 (2000) 15. Evers, V., Day, D.: The role of culture in interface acceptance. In: Howard, S., Hammond, J., Lindgaard, G. (eds.) IFIP TC13 International Conference on Human-Computer Interaction INTERACT 1997, 14th–18th July 1997, Sydney, Australia, pp. 260–267. Springer, Boston (1997) 16. Choi, B., Lee, I., Kim, J.: Culturability in mobile data services: a qualitative, study of the relationship between cultural characteristics and user-experience attributes. Int. J. Hum.Comput. Interact. 20(3), 171–206 (2006) 17. Alrobai, A., McAlaney, J., Phalp, K., Ali, R.: Exploring the risk factors of interactive ehealth interventions for digital addiction. Int. J. Sociotechnol. Knowl. Dev. (IJSKD) 8(2), 1– 15 (2016) 18. Ewing, J.A.: Detecting alcoholism. The CAGE questionnaire. JAMA 252(14), 1905–1907 (1984) 19. American Psychiatric Association: Diagnostic and statistical manual of mental disorders, Washington, DC (2013) 20. Madden, T.J., Roth, M.S., Dillon, W.R.: Global product quality and corporate social responsibility perceptions: a cross-national study of halo effects. J. Int. Mark. 20(1), 42–57 (2012) 21. Nijssen, E.J., Douglas, S.P.: Consumer world-mindedness and attitudes toward product positioning in advertising: an examination of global versus foreign versus local positioning. J. Int. Mark. 19(3), 113–133 (2011) 22. Chang, K., Lu, L.: Characteristics of organizational culture, stressors and wellbeing: the case of Taiwanese organizations. J. Manag. Psychol. 22(6), 549–568 (2007) 23. Hofstede, G.: Culture’s Consequences: International Differences in Work-Related Values. Sage Publications, Newbury Park (1980) 24. Hui, C.H., Triandis, H.C., Yee, C.: Cultural-differences in reward allocation - is collectivism the explanation. Br. J. Soc. Psychol. 30, 145–157 (1991) 25. Nguyen, B., Chang, K., Simkin, L.: Customer engagement planning emerging from the “individualist-collectivist”-framework: an empirical examination in China and UK. Mark. Intell. Plan. 32(1), 41–65 (2014) 26. Kahneman, D., Knetsch, J.L., Thaler, R.: Fairness as a constraint on profit seeking entitlements in the market. Am. Econ. Rev. 76(4), 728–741 (1986) 27. Rotter, J.B.: Generalized expectancies for internal versus external control of reinforcement. Psychol. Monogr. 80(1), 1–28 (1966) 28. Spector, P.E., et al.: Locus of control and well-being at work: how generalizable are western findings? Acad. Manag. J. 45(2), 453–466 (2002) 29. Cyr, D.: Modeling web site design across cultures: relationships to trust, satisfaction, and eloyalty. J. Manag. Inf. Syst. 24(4), 47–72 (2008)
30. Smith, A., et al.: A process model for developing usable cross-cultural websites. Interact. Comput. 16(1), 63–91 (2004) 31. McSweeney, B.: Hofstede’s model of national cultural differences and their consequences: a triumph of faith - a failure of analysis. Hum. Relat. 55(1), 89–118 (2002) 32. Kamppuri, M.: Theoretical and Methodological Challenges of Cross-Cultural Interaction Design. University of Eastern Finland, Eastern Finland (2011) 33. Abufardeh, S.. Magel, K.: The impact of global software cultural and linguistic aspects on global software development process (GSD): issues and challenges. In: 2010 4th International Conference on New Trends in Information Science and Service Science (NISS) (2010) 34. Csikszentmihalyi, M.: Flow: the Psychology of Optimal Experience, 1st edn., vol. xii, p. 303. Harper & Row, New York (1990) 35. Kaye, L.K.: Exploring flow experiences in cooperative digital gaming contexts. Comput. Hum. Behav. 55, 286–291 (2016) 36. Brehm, S., Brehm, J.: Psychological Reactance: A Theory of Freedom and Control. Academic Press, New York (1981)
Measurement of Drag Distance of Objects Using Mobile Devices: Case Study Children with Autism
Angeles Quezada1(&), Reyes Juárez-Ramírez2, Margarita Ramirez2, Ricardo Rosales2, and Carlos Hurtado1
1 Instituto Tecnológico de Tijuana, Tijuana, Mexico {angeles.quezada,carlos.hurtado}@tectijuana.edu.mx
2 Universidad Autónoma de Baja California, Campus Tijuana, Tijuana, Mexico {reyesjua,maguiram,ricardorosales}@uabc.edu.mx
Abstract. Today, children with autism show great interest in using modern technology such as tablets and smartphones, and they display a certain skill in operating such devices; however, the set of operators used to interact with these devices was not designed for people with impairments. A great deal of research has focused on helping users with this type of disorder, but creating applications that adapt to their physical, cognitive and motor skills remains a challenge. This article focuses on identifying the drag distance that users with autism can perform with the least difficulty, as well as the time that a user with autism needs to complete the task. The results show that the greater the drag distance, the harder the task becomes for this type of user. Based on these results, we recommend considering a smaller drag distance and an image size greater than 63 pixels when designing and developing mobile applications to support the teaching of autistic users.
Keywords: Autism Spectrum Disorder · Usability · Drag · Autism
1 Introduction
Autism Spectrum Disorder (ASD) is a pervasive neurodevelopmental disorder characterized by impairments in social communication and restricted, repetitive patterns of behavior, interests or activities (American Psychiatric Association [APA], 2013). Recent studies also indicate that people with ASD may present deficiencies in motor skills [1]. The accelerated growth of mobile games in the market is being driven by the rapid development of tablets and mobile phones, and this trend has increased the development of mobile applications in general. The number of applications aimed at people with autism has grown in recent years, as have studies that seek to produce knowledge about how to address the motor difficulties that people with autism can present. It is therefore necessary to develop technology that adapts to the motor disabilities that people with autism can present.
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020 Á. Rocha et al. (Eds.): WorldCIST 2020, AISC 1160, pp. 734–743, 2020. https://doi.org/10.1007/978-3-030-45691-7_69
Over the years, different models have been developed to determine the usability of various types of applications, such as KLM-GOMS [2] and FLM [3], but these models cannot be applied to users with autism. The original GOMS model has been modified as technology has progressed. Despite this, research has been insufficient on how to improve interaction techniques for users with autism, in particular research that evaluates the time needed to complete frequent interactions when using touchscreen devices. The objective of this study is to identify the drag time (D) that users with autism can execute with the least difficulty when using mobile applications. The rest of the paper is organized as follows: Sect. 2 analyses related work, Sect. 3 describes the experimental design, Sect. 4 presents the obtained results, Sect. 5 presents the discussion and, finally, Sect. 6 presents conclusions and future work.
2 Related Work
With the growing popularity of mobile devices, the KLM-GOMS model has recently been revised to evaluate interactions on touch-screen devices [4, 5]. The KLM-GOMS model defines five operators: Drawing (D), Keystroke (K), Mental Act (M), Pointing (P) and Homing (H). Similarly, in [3] the authors proposed a modified version of the KLM-GOMS model called FLM (finger-stroke level model). The purpose of that study was to define the time it takes to perform operators on mobile devices with direct finger movements (Drag (D), Point (P), Move (F) and Touch (T)). In the experiments using the KLM-GOMS and FLM models, the authors analyzed the interactions only of typical adults with experience in the use of technology. In [6], a study is presented that evaluated the drag, zoom and movement operators for mobile devices; the research compared efficiency and user satisfaction during navigation of 2D documents on mobile screens. Although the results obtained were positive, the experiment was only applied to users with typical psychological development. Also, in [7] a study was presented on the accuracy of drag-and-drop interactions for older adults, analyzing the number of additional attempts needed to position a target during tactile puzzle games on two different screen sizes, a smartphone and a tablet, with finger and pen interaction. That study shows that drag and drop is an effective technique for moving targets; even on small touch-screen devices, interaction with the pen can help older users execute more accurate drag-and-drop interactions. On the other hand, [8] examines target size and the distance between targets on smartphones. That study was applied to older adults who had little experience in the use of touch technology, and its results show that the larger the target (image), the easier it is for this type of user to use touch technology. In the same way, the authors of [9] analyzed the interaction of autistic users with mobile devices. That study took into account six operators: M, K, G, I (Initial Act), T (Tapping) and S (Swipe). The results suggested that users with level 1 of autism are more likely to perform operations such as K, G, I, and T than users with level 2 of autism. However,
the diversity of application designs implies the use of different drag distances (D) and image sizes; therefore, it is necessary to analyse drag distance in combination with different image sizes on mobile devices. A drag-and-drop operation on a touch screen has also been proposed for users with motor problems [10]: in that study, the authors proposed a new target selection technique called barrier pointing, with the aim of helping users with motor problems. In this article, the drag time (D) for two different drag distances is evaluated using two applications that comply with what is specified in [11], in order to define the drag distance that users with autism can perform with less difficulty and the time needed to carry out the task.
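As a rough illustration of how KLM/FLM-style models discussed in this section predict task time, the sketch below sums per-operator unit times over an operator sequence; the unit times shown are placeholders for illustration only, not the values reported for FLM in [3].

```python
# Placeholder operator unit times in seconds (illustrative assumptions).
UNIT_TIME_S = {
    "M": 1.35,  # mental act
    "P": 0.50,  # point at a target
    "D": 0.90,  # drag
    "T": 0.20,  # tap
}

def predict_task_time(operator_sequence):
    """Predict total task time as the sum of the unit times of its operators."""
    return sum(UNIT_TIME_S[op] for op in operator_sequence)

# e.g. think, point at a puzzle piece, drag it into place
print(predict_task_time(["M", "P", "D"]))  # -> 2.75 with the placeholder values
```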
3 Experimental Design
3.1 Method
The objective of this study is to identify the drag time (D) that users with autism can execute with the least difficulty when using mobile applications. For this reason, two applications with different drag distances were evaluated to determine the time a user needs to carry out the task, as shown in Fig. 1.
Fig. 1. Methodology of experimentation
3.1.1 Participants
The experiment was conducted in a special education school with 11 users diagnosed with autism. In this sample, 6 users were diagnosed with level 1 of autism and 5 with level 2 of autism. The users were diagnosed by specialist psychologists, and each user was associated with a level of autism according to the DSM-V [12]. The users with autism are between 5 and 11 years old.
3.1.2 Instruments
a) Kids Animals Jigsaw Puzzles
Kids Animals Jigsaw Puzzles is a puzzle application developed for children and offered by App Family Kids - Games for boys and girls. Each relaxing puzzle presents a beautiful, different scene drawn by a professional cartoon artist, and a unique reward when the
puzzle is completed. The scenes include things like cute animals and unicorns, and the rewards may depend on the level of the game. For this experiment, a Samsung Galaxy Tab 4 tablet was used, with a 7-in. screen, a resolution of 1280 × 800 pixels and the Android operating system. Figure 2 shows the interface characteristics of Kids Animals Jigsaw Puzzles, such as image size and drag distance.
Fig. 2. Interface characteristics of app Kids Animals Jigsaw Puzzles
b) Drag and Learn the Colors
The application Drag and Learn the Colors was developed taking the drag distances into account. It presents a screen with three boxes of different colors, and the user has to drag the box corresponding to the color shown at the bottom of the screen; this supports the user in identifying colors. The exercise is repeated several times until the user decides to leave the activity. The objective of this application is to support users in learning colors (Fig. 3).
Fig. 3. Interface characteristics of app Drag and Learn
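A purely illustrative sketch of the matching rule described above is given below; it is not the application’s actual code, and the colour set is an assumption.

```python
import random

COLOURS = ["red", "green", "blue"]  # assumed colour set of the three boxes

def new_round():
    """Pick the colour announced at the bottom of the screen."""
    return random.choice(COLOURS)

def handle_drop(target_colour, dragged_colour):
    """Accept the drag only if the dragged box matches the announced colour."""
    if dragged_colour == target_colour:
        return "correct - play reward and start a new round"
    return "try again - keep the same target colour"

target = new_round()
print(target, "->", handle_drop(target, "red"))
```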
To measure the interaction time of the users, a video camera was used to record the interactions, and we then used ELAN, a professional tool for creating complex annotations on video and audio resources, to measure the times from the video.
3.1.3 Procedure
The experiment was carried out in the place where the users attend their classes. A room where the children attend their therapy was chosen, so that they could interact with the tablet in a quiet environment. Before carrying out the experiment, the parents of the subjects signed a consent letter for the video recording; only the children’s hands were recorded, and only while they were using the tablet. During the experiment, the support staff (psychologists) were briefed on the procedure and on the use of the applications, and they were responsible for supporting the children in performing each of the tasks. Participants used the index finger of their dominant hand to perform each of the set tasks. While the experiment was running, the participants were asked to execute the first drag task as quickly and accurately as possible. When the users started to interact with the applications, these interactions were recorded so that the times could be measured later. All the tasks were repeated at least 5 times, and only the interactions from the second one onwards were used for measurement, since the first was considered training.
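A minimal sketch of this measurement step is shown below, assuming the ELAN annotations are exported as (user, trial, start, end) records; the export layout and the example values are assumptions, not the study’s data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical annotation records exported from ELAN: (user, trial, start_s, end_s).
annotations = [
    ("user01", 1, 10.0, 12.1),   # first trial = training, excluded below
    ("user01", 2, 20.0, 21.8),
    ("user01", 3, 31.5, 33.2),
    ("user02", 1, 12.0, 15.0),
    ("user02", 2, 22.4, 25.1),
]

durations = defaultdict(list)  # user -> list of drag durations in seconds
for user, trial, start_s, end_s in annotations:
    if trial == 1:
        continue  # the first interaction was considered training
    durations[user].append(end_s - start_s)

for user, times in durations.items():
    print(user, [round(t, 2) for t in times], "mean:", round(mean(times), 2))
```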
4 Results
In this section we present the results of the task execution tests. We present data on the times that users needed to perform each task for each of the different drag distances. Figures 4 and 5 show the times for each group of users using the application Kids Animals Jigsaw Puzzles and the application Drag and Learn the Colors.
Group of Autism Level 1. The results show that the maximum drag time for a level 1 user was 2.0 s and the minimum was 1.7 s, with a standard deviation of 0.1 and a median of 1.8, as shown in Fig. 4.
Fig. 4. Drag task for level 1 user for app Kids Animals Jigsaw Puzzles.
Group of Autism Level 2: In the case of users with level 2 of autism, the results show that the maximum drag time was 3.3 s and the minimum was 2.1 s, with a standard deviation of 0.4 and a median of 2.9, as shown in Fig. 5. The results show that users with level 2 of autism needed more time than users with level 1 of autism, which is explained by the nature of their spectrum, but it does not mean that they could not perform the task.
Fig. 5. Drag task for level 2 user for Kids Animals Jigsaw Puzzles.
Table 1. Comparative analysis between the times used by each child

User (Autism level 1) | Drag and learn | Puzzle
1 | 1.4 | 1.9
2 | 1.7 | 1.6
3 | 1.8 | 1.8
4 | 2.0 | 1.7
5 | 1.9 | 1.7
6 | 1.2 | 1.6
Max | 2.0 | 1.9
Min | 1.2 | 1.6
Median | 1.7 | 1.7
S.D. | 0.3 | 0.1

User (Autism level 2) | Drag and learn | Puzzle
1 | 1.7 | 2.1
2 | 1.4 | 3.3
3 | 2.0 | 3.1
4 | 2.5 | 2.8
5 | 2.9 | 3.2
Max | 2.9 | 3.3
Min | 1.4 | 2.1
Median | 2.1 | 2.9
S.D. | 0.5 | 0.4
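The summary rows of Table 1 can be recomputed from the per-user times, as in the sketch below for the level 1 Puzzle column; the paper does not state whether a population or sample standard deviation was used, so both are shown.

```python
from statistics import median, pstdev, stdev

level1_puzzle = [1.9, 1.6, 1.8, 1.7, 1.7, 1.6]  # per-user times from Table 1

print("Max   ", max(level1_puzzle))     # 1.9
print("Min   ", min(level1_puzzle))     # 1.6
print("Median", median(level1_puzzle))  # 1.7
# Both the population and the sample standard deviation round to 0.1 here.
print("S.D.  ", round(pstdev(level1_puzzle), 1), round(stdev(level1_puzzle), 1))
```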
Table 1 shows a comparison of the times taken with each application by each child, depending on the level of autism. As can be seen, the application with which it was easier to carry out the task was Drag and Learn, due to its ease of use and its design, and users with level 1 autism performed the task in less time than users with level 2. In the case of the interaction with the application Drag and Learn the Colors, users with level 1 of autism executed the task in a maximum of 2.0 s and a minimum of 1.2 s, while for users with level 2 of autism the maximum time to execute the same task was 2.9 s and the minimum was 1.4 s; as can be seen, there is a slight variation between both groups, as shown in Fig. 6.
Fig. 6. Drag task for level 1 and level 2 user for drag and learn the colors.
5 Discussion
The results in Fig. 4 and Fig. 5 show that there is a variation between the two groups of users: level 1 users required less time to perform the first task, which consisted of dragging an image as shown in Fig. 7, whereas users with level 2 autism required more time to carry out the same task. This indicates that, due to the nature of their spectrum, this type of user requires more time to execute tasks using mobile applications. It should also be mentioned that most users successfully completed the assigned task, which consisted of dragging each piece of the puzzle image into place. In the case of some users it was necessary to explain the use of the application, but the
majority already had more or less knowledge of the use of puzzles. The same occurs in the execution of the second task with the second application, Drag and Learn the Colors. In this case the task was to choose the box of the indicated color and drag it to the target, with audio support indicating the task to be executed. As can be seen in Fig. 6, there is a variation between the group with level 1 of autism and the group with level 2. For this task, the users with level 1 of autism required less time to carry it out, while those with level 2 required a longer time. This may be due to the size of the icon, which is within the size recommended by Android: touch targets must be at least 48 × 48 dp (density-independent pixels) (Fig. 8). Based on these results, it may be appropriate to use larger touch targets to accommodate a wider spectrum of users, such as children with lower motor skills. We can see that the bigger the image, the better the user interacts with the interface. It was also observed that the applications should be as simple as possible to hold the child’s attention. This indicates that developers should take into account the motor skills of users such as users with autism.
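The Android guideline mentioned above can be checked with a simple density conversion, sketched below; the dpi figure for the 7-in., 1280 × 800 tablet is an approximation.

```python
def dp_to_px(dp, dpi):
    """Convert density-independent pixels to physical pixels: px = dp * (dpi / 160)."""
    return dp * dpi / 160.0

def meets_minimum_target(width_px, height_px, dpi, min_dp=48):
    """Check whether an on-screen element meets the 48 x 48 dp minimum touch target."""
    min_px = dp_to_px(min_dp, dpi)
    return width_px >= min_px and height_px >= min_px

TABLET_DPI = 216  # approximate density of a 7-in. 1280 x 800 screen (assumption)
print(dp_to_px(48, TABLET_DPI))                  # ~64.8 px
print(meets_minimum_target(63, 63, TABLET_DPI))  # a 63 px image falls just short
```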
Fig. 7. Interaction for Kids Animals Jigsaw Puzzles
Fig. 8. Interaction for app drag and learn the colors.
6 Conclusions
This article presents an experimental design to evaluate the time required by two groups of users (level 1 of autism and level 2 of autism) to perform drag operations over different distances using an animal puzzle for children. In this experiment, we evaluated the time required by each group to perform operations proposed by KLM and FLM, which are variants of the original GOMS model. In the case of the application Drag and Learn the Colors, whose images were larger than 60 pixels, the results showed that the interaction was easy for both groups of users and for both applications. This indicates that the larger the image and the shorter the drag, the easier it will be for users with autism to use the application. The results obtained in this study allow us to conclude that users with level 1 of autism performed the tasks in less time than users with level 2 of autism, due to the cognitive and motor deficits associated with each level, despite having the same task; the difference is identified for each of the drags with the distances used in each of the applications. As future work, more experiments will be carried out with typical users and with users who have some motor disability, and two new levels will be developed in the Drag and Learn the Colors application in which the size of the images will be changed, in order to contrast the results obtained in this experiment and compare the results between each group of users, with the purpose of developing applications that adapt to the different motor and cognitive abilities of each user.
Right Arm Exoskeleton for Mobility Impaired
Marius-Nicolae Risteiu and Monica Leba
University of Petrosani, Petrosani str. Universitatii, Petroșani, Romania [email protected]
Abstract. This paper is focused on the research regarding the development of an exoskeleton robotic device for people who have limitations in the basic movements of the upper limbs. Although some movements may seem trivial, they are essential in the rehabilitation process, but also in daily life, because the patient will no longer be dependent on another person for certain operations. First part of this research is focused on modeling and simulating of the human arm movements to determine their limits and particularities. The exoskeleton device is then modeled and simulated, setting the limits and the movements that it will execute following the movements of the arm. The proposed device is equipped with electromyography (EMG) signal acquisition and analysis system that is able to perceive the wearer’s movement intentions, and the drive system will help and support the independent movement of the robotic exoskeleton. Keywords: Robotic exoskeleton
Modelling Electromyography
1 Introduction
In the last years, more and more techniques and technologies have appeared that assist certain operations, which has led to the emergence of specially designed devices [1] for the rehabilitation of people with different degrees of disability. Especially in the case of amputated arms, or when the patient has reduced mobility and control over the limbs, several types of devices have been designed [2]. Currently, the prostheses used as extensions of a member bring both aesthetic and flexibility improvements [3]. In this research, the movements made by a healthy human arm have been studied in order to reproduce them with an exoskeleton device that will help and support the arm movement of a person with mobility impairments. To be able to do that, all the movements that the arm can perform must be known and simulated in order to establish their limits. After that, the exoskeleton movements are simulated to determine whether they can be synchronized with the human arm. The next step is to build an exoskeleton device and implement a control program for it, so that it follows the arm and helps and sustains the movement desired by the wearer without forcing the human arm. This research is motivated by the fact that more and more people have arm movement problems, due to aging, accidents, etc., and fail to make many elementary movements. For the movement or rehabilitation of the limbs, each patient would need a specific exoskeleton model. This has led to the emergence of customized devices, their only disadvantage being the cost of implementation [4].
2 State of the Art In the paper Developments in Hardware Systems of Active Upper-Limb Exoskeleton Robots: A Review, is analyzed the evolution of the active exoskeleton robots used for the upper limbs. This paper presents major evolutions that have been made over time, essential landmarks during evolution and the major research challenges in the current context of exoskeleton systems for upper limbs. Moreover, the paper offers a classification, a comparison and an overview of the mechanisms, drive and power transmission for most of these types of devices, which have been found in the literature. There is also a brief overview of the control methods used on exoskeletons for the upper limbs [5]. This field of exoskeletons is constantly developing in this case, as it is difficult to find a single definition of these types of devices. They are generally defined as: • Exoskeletons are portable devices that work in tandem with the user. The opposite of an exoskeleton device would be an autonomous robot that works instead of the operator. • Exoskeletons are placed on the user’s body and are used to amplify, strengthen or restore human performance. The opposite would be a mechanical prosthesis, such as an arm or a robotic leg that replaces the original part of a person’s body that has problems. • Exoskeletons can be made from rigid materials, such as metal or carbon fibers, or can be made entirely of soft and elastic parts. They can be powered and equipped with sensors and actuators or they can be completely passive. • Exoskeletons can be mobile or fixed/suspended (usually for rehabilitation or teleoperation). • Exoskeletons can cover the entire body, only the upper or lower extremities, or even a specific segment of the body, such as the ankle or hip. • Exoskeleton technology - the use of an external framework that can increase human physical capacity [6]. So far, many robotic devices have been successfully introduced in the process of rehabilitation of the upper limbs. However, robots such as the MIT-MANUS, GENTLE/S, Tire-WREX and NEREBOT are specially designed to assist with flexion/extension exercises of the arm but lack the ability to assist wrist and forearm movements. On the other hand CRAMER, RICE WRIST, HAND MENTOR and HVARD are robotic devices specially designed to perform exercises for the forearm and wrist movements but do not allow the arm to perform other movements. An approach made in such a way that the limitations mentioned above are overcome and the device created can be used in different ways that would allow the rehabilitation of an entire superior member is realized by the developers of the ARMin project. Developers came up with the proposal for a universal haptic device (UHD) based on a haptic unit with two degrees of freedom and a mechanical design that allows the
upper limbs to be rehabilitated in two different ways: ARM or WRIST. In the ARM mode, the universal haptic device can perform flat exercises that focus on the shoulder and elbow. In the WRIST mode, it performs the tasks that move the forearm and wrist [7]. That the user not to be affected by the weight of the exoskeleton, certain models such as SUEFUL 7 have been specially designed to be attached to wheelchairs. This model can assist shoulder, elbow movements or forearm rotation. Considering the design criteria, such as reducing the weight of the exoskeleton to ensure a low moment of inertia, safety in operation, wear resistance and ease of maintenance, the exoskeleton-type robotic system of the upper limbs has been developed to help to rehabilitation and daily activities of people with force problems [8]. In the paper Exoskeleton for Improving Quality of Life for Low Mobility Persons is presented an exoskeleton that must be fixed on the right arm of the user and EMG sensor electrodes must be positioned on the muscle groups so that a better signal can be taken [9].
3 Modeling and Simulation 3.1
Modeling and Simulation of the Human Arm
The simplified approach of the human arm for the development of kinematic and dynamic mathematical models are presented. For kinematic models, Denavit Hartenberg formalism is used to determine the direct and inverse kinematic model. For the kinematic model, all rotational movements from the three arm rotating joints are considered: shoulder (3 rotation), elbow (2 rotation) and wrist (2 rotation). For the direct and inverse kinematic model, was only considered the 5 rotation joints from shoulder and elbow because this research is focused on the position of the wrist. For dynamic modeling, the Jacobian method is used to determine speeds, both direct and inverse [10]. 3.2
Direct Kinematic Model of the Human Arm
In order to model the human arm, a simplified model of it has been built, considering the following joints:
• a joint with 3 rotations, around the x, y and z axes, in the shoulder;
• a joint with 2 rotations, around the y and z axes, in the elbow;
• a joint with 2 rotations, around the x and z axes, in the wrist.
In Fig. 1 the simplified arm diagram is presented. For further calculations, simple joints having a single rotational motion around an axis were considered. Thus, in the shoulder there are 3 rotating joints with null distance between them, and in the elbow and in the wrist there are 2 simple rotating joints, also with null distance between them. In order to determine the mathematical model for arm control, the Denavit-Hartenberg formalism was applied. Figure 1 shows, in addition to the simplified arm model with rotating joints in the shoulder and elbow, the reference systems specific to this formalism. The analysis below considers the kinematic chain ended in the wrist, having 5 degrees of freedom and a supplementary sixth reference frame used to obtain the position of the wrist. The reference system against which the movement of the arm is analyzed has its center in the center of the shoulder joint.
Fig. 1. Simplified model of the human arm
The Denavit-Hartenberg coordinates in Table 1 result from the successive pairwise comparison of the coordinate systems in Fig. 1. In this table only the dimensions d3 and d5, corresponding to the non-null elements up to the wrist, appear; the others, d1 and d2 related to the shoulder and d4 related to the elbow, are replaced with the value 0, as in the simplification considered above.
Table 1. Denavit-Hartenberg coordinates

Elem.  ai   αi    di   θi
1      0    π/2   0    θ1
2      0    π/2   0    π/2 + θ2
3      0    0     0    θ3
4      0    π/2   d3   π/2 + θ4
5      d5   0     0    π/2 + θ5
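As an illustration of the direct kinematic model defined by Table 1, the following sketch chains the standard Denavit-Hartenberg transforms in Python; the paper's own implementation is in Matlab Simulink. The link lengths d3 and d5 are placeholder values, not dimensions taken from the paper.

```python
import numpy as np

def dh_transform(a, alpha, d, theta):
    """Homogeneous transform of one joint, standard Denavit-Hartenberg convention."""
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([[ct, -st * ca,  st * sa, a * ct],
                     [st,  ct * ca, -ct * sa, a * st],
                     [0.0,      sa,       ca,      d],
                     [0.0,     0.0,      0.0,    1.0]])

def arm_forward_kinematics(thetas, d3=0.30, d5=0.25):
    """Wrist position for the 5 revolute joints of Table 1.

    thetas: joint angles theta1..theta5 in radians.
    d3, d5: upper-arm and forearm lengths in metres (placeholder values).
    """
    t1, t2, t3, t4, t5 = thetas
    rows = [                      # (a_i, alpha_i, d_i, theta_i) from Table 1
        (0.0, np.pi / 2, 0.0, t1),
        (0.0, np.pi / 2, 0.0, np.pi / 2 + t2),
        (0.0, 0.0,       0.0, t3),
        (0.0, np.pi / 2, d3,  np.pi / 2 + t4),
        (d5,  0.0,       0.0, np.pi / 2 + t5),
    ]
    T = np.eye(4)
    for a, alpha, d, theta in rows:
        T = T @ dh_transform(a, alpha, d, theta)
    return T[:3, 3]               # wrist position in the shoulder frame

print(arm_forward_kinematics([0.0, 0.0, 0.0, 0.0, 0.0]))
```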
The Denavit-Hartenberg direct kinematic model above has been implemented in Matlab Simulink. Figure 2 shows the direct kinematic model of the arm having 5 joints.
Fig. 2. 3D arm simulation model
4 Design of an Exoskeleton for the Right Arm The mechanical design of an exoskeleton for the right arm that will allow the human arm to be tracked and that can support the arm in case the person using the exoskeleton is tired is presented. For this, both kinematic and dynamic mathematical models were developed. Similar to the arm, the kinematic models use the Denavit-Hartenberg formalism to determine the direct and inverse kinematic model. For dynamic models, the Jacobian method is used to determine the speeds, both direct and inverse. As a
modeling-simulation environment, MatLab was used, together with the SimMechanics toolbox in Simulink, which allows the implementation of kinematic and dynamic mathematical models to simulate exoskeleton motion. 4.1
Mechanical Design
In order to carry out the design of the mechanical part of the exoskeleton, a research was performed first on the existing devices and the patents with this subject. After studying in detail several types of such devices, was developed another type of exoskeleton model used to support the movement of the right upper limb. In Fig. 3 is presented the diagram of the mechanical part of exoskeleton.
Fig. 3. Diagram of the mechanical part of the exoskeleton
As can be seen in Fig. 3, the proposed exoskeleton consists of three joints. The first joint is a translation one, included as correspondent to the shoulder up-down movement instead of a rotation one that would have needed a higher torque drive motor. The other two joint are rotation joints, one for the left-right movement of the shoulder and the other for the same movement of the elbow. 4.2
Denavit-Hartenberg Direct Kinematic Model for the Exoskeleton
The exoskeleton consists of three joints, one translation and two rotating ones.
The kinematic analysis of this exoskeleton is done by applying the Denavit-Hartenberg formalism. In this way, the direct and inverse kinematic models of the exoskeleton were obtained. In Fig. 4 the schematic model of the exoskeleton is presented, together with the Denavit-Hartenberg coordinates in each joint (Table 2).
Fig. 4. Denavit-Hartenberg formalism for the exoskeleton
Table 2. Denavit-Hartenberg coordinates

Elem.  ai   αi   di   θi
1      d2   0    d1   0
2      d3   0    0    θ2
3      d4   0    0    θ3
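A minimal sketch of the exoskeleton's direct kinematics following Table 2, again in Python rather than the Matlab model used by the authors. Here d1 is the prismatic joint variable and the link dimensions d2, d3, d4 are placeholder values.

```python
import numpy as np

def dh_transform(a, alpha, d, theta):
    """Same Denavit-Hartenberg helper as in the previous sketch."""
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([[ct, -st * ca,  st * sa, a * ct],
                     [st,  ct * ca, -ct * sa, a * st],
                     [0.0,      sa,       ca,      d],
                     [0.0,     0.0,      0.0,    1.0]])

def exo_forward_kinematics(d1, theta2, theta3, d2=0.10, d3=0.30, d4=0.25):
    """End-point of the exoskeleton: one prismatic joint (d1) and two revolute
    joints (theta2, theta3), using the rows of Table 2."""
    rows = [
        (d2, 0.0, d1,  0.0),      # translation joint: d1 is the joint variable
        (d3, 0.0, 0.0, theta2),   # shoulder left-right rotation
        (d4, 0.0, 0.0, theta3),   # elbow left-right rotation
    ]
    T = np.eye(4)
    for a, alpha, d, theta in rows:
        T = T @ dh_transform(a, alpha, d, theta)
    return T[:3, 3]

print(exo_forward_kinematics(0.05, np.deg2rad(30), np.deg2rad(45)))
```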
The direct kinematic model determined above was implemented in a MatLab model, shown in Fig. 5.
Fig. 5. Exoskeleton block model
Figure 6 presents the top view and the front view of the exoskeleton 3D model.
Fig. 6. Exoskeleton position: top view, front view
5 Control System Design For the control part of the exoskeleton, signals were acquired from EMG and IMU sensors. EMG electrodes are placed on the subject’s skin to determine the movement intention and the IMU sensor is attached to the moving arm to determine the arm orientation during movement. These data were entered into an artificial neural network using the EMG signal as input and arm position (IMU signal) as output [11]. Subsequent to the training of the neural network only input signals were used, based on which the control algorithm determines the position in which the arm is, in order to bring the exoskeleton into the same position. In Fig. 7 is presented the block diagram for control the proposed exoskeleton device.
Fig. 7. Block diagram of exoskeleton device
For the training of the neural network, the Matlab Neural Network Start tool for non-linear input-output systems, called ntstool, was used [12]. The identification of the system is based on a nonlinear NARX model, using a series-parallel architecture of the neural network [13]. For a NARX model with an architecture of 10 neurons and 2 delays, the following results were obtained (Fig. 8):
Fig. 8. Arm and exoskeleton position at begin and end of movement
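The identification itself was performed with the Matlab ntstool; the sketch below only illustrates the same series-parallel NARX idea (lagged EMG inputs and lagged measured positions predicting the current position) with a 10-neuron network. It uses scikit-learn and synthetic placeholder signals, so it is an indicative stand-in rather than the authors' model.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def narx_features(u, y, delays=2):
    """Series-parallel (open-loop) NARX regressors: past inputs u and past
    measured outputs y are used to predict the current output y[k]."""
    X, target = [], []
    for k in range(delays, len(y)):
        X.append(np.r_[u[k - delays:k].ravel(), y[k - delays:k].ravel()])
        target.append(y[k])
    return np.array(X), np.array(target)

# Synthetic placeholder signals: EMG envelope as input, arm angle (IMU) as output.
rng = np.random.default_rng(0)
emg = rng.random((1000, 1))
angle = np.convolve(emg[:, 0], np.ones(10) / 10, mode="same")  # toy dynamics

X, t = narx_features(emg, angle, delays=2)
model = MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)
model.fit(X, t)
print("training R^2:", model.score(X, t))
```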
Figure 9 presents the linear regression obtained after the training of the neural network.
Fig. 9. Linear regression
6 Conclusions Exoskeleton-type robots have become increasingly common devices in everyday life. These are in continuous development and have become customized for each individual user. Exoskeletons have become more used for rehabilitation exercises and in the recovery of the limbs. The exoskeleton proposed in this paper has been modeled and tested beforehand to see if its movements exactly respect the movements of the human arm. The control part is made using non-invasive EMG sensors and can be used by people who still can control their intention to make a move but do not have enough muscle strength to fully do it, so the movement is sustained by the exoskeleton.
References 1. Riurean, S., Antipova, T., Rocha, Á., Leba, M., Ionica, A.: VLC, OCC, IR and LiFi reliable optical wireless technologies to be embedded in medical facilities and medical devices. J. Med. Syst. 43, 308 (2019) 2. Negru, N., Leba, M., Rosca, S., Marica, L., Ionica, A.: A new approach on 3D scanningprinting technologies with medical applications. In: IOP Conference Series: Materials Science and Engineering, vol. 572 (2019) 3. Kiguchi, K., Rahman, M.H., Sasaki, M., Teramoto, K.: Development of a 3DOF mobile exoskeleton robot for human upper-limb motion assist. Robot. Auton. Syst. 56(8), 678–691 (2008) 4. Panaite, A.F., Rişteiu, M.N., Olar, M.L., Leba, M., Ionica, A.: Hand rehabilitation- a gaming experience. In: IOP Conference Series: Materials Science and Engineering, vol. 572 (2019) 5. Gopura, R.A.R.C., Bandara, D.S.V., Kiguchi, K., Mann, G.K.: Developments in hardware systems of active upper-limb exoskeleton robots: a review. Robot. Auton. Syst. 75, 203–220 (2016) 6. https://exoskeletonreport.com/what-is-an-exoskeleton. Accessed 28 Oct 2019 7. Nef, T., Guidali, M., Klamroth-Marganska, V., Riener, R.: ARMin, exoskeleton robot for stroke rehabilitation. In: World Congress on Medical Physics and Biomedical Engineering, pp. 127–130 (2009) 8. Gopura, R.A.R.C., Kiguchi, K,, Li, Y.: SUEFUL-7: a 7DOF upper-limb exoskeleton robot with muscle-model-oriented EMG-based control. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). St. Louis, MO, pp. 1126–1131 (2009) 9. Risteiu, M., Leba, M., Arad, A.: Exoskeleton for improving quality of life for low mobility persons. Qual. Access to Success. Supplament1 20, 341–346 (2019) 10. Risteiu, M.N., Rosca, S.D., Leba, M.: 3D modelling and simulation of human upper limb. In: IOP Conference Series: Materials Science and Engineering, vol. 572 (2019) 11. Kiguchi, K., Hayashi, Y.: An EMG-based control for an upper-limb power-assist exoskeleton robot. IEEE Trans. Syst. Man Cybern. B Cybern. PP(99), 1–8 (2012) 12. Anastassiou, G.A.: A recurrent neural fuzzy network. J. Comput. Anal. Appl. 20(2) (2016) 13. Rosca, S.D.; Leba, M.: Using brain-computer-interface for robot arm control. In: MATEC Web of Conferences, vol. 121, pp. 08006 (2017)
Semi-automatic Eye Movement-Controlled Wheelchair Using Low-Cost Embedded System
Gustavo Caiza (1), Cristina Reinoso (2), Henry Vallejo (3), Mauro Albarracín (4), and Edison P. Salazar (4)
(1) Universidad Politécnica Salesiana, Quito 170146, Ecuador, [email protected]
(2) Universidad Técnica de Ambato, Ambato 180103, Ecuador, [email protected]
(3) Escuela Superior Politécnica de Chimborazo, Riobamba 060155, Ecuador, [email protected]
(4) Universidad Técnica de Cotopaxi, Latacunga 050102, Ecuador, {mauro.albarracin,edison.salazar}@utc.edu.ec
Abstract. An intelligent wheelchair prototype is presented, reusing old damaged equipment and incorporating low-cost elements to repower it. The system can be operated in manual mode (total user control) or in semi-automatic mode, depending on the tasks to be performed and on whether other secondary tasks are carried out (answering questions or picking up objects), as simulated in this experiment. In manual mode, a coherence algorithm allows the wheelchair to be guided with the eye movement; in semi-automatic mode, the system takes control when the user does not execute a control action and an obstacle is about to collide with the chair. For a greater interaction between system and user, a basic, friendly and easy-to-use interface has been developed that supports the performance of the activities throughout this experiment. The selection of participants, as well as the experimental tests performed, are described in this document. The qualitative and quantitative results obtained validate the efficiency of this system, as well as the satisfaction of the users, through the respective tests.
Keywords: Embedded systems · Eye movement · Human computer interaction · Wheelchair
1 Introduction
On a global scale, life expectancy has increased thanks to the continuous discoveries in the medical field, which go hand in hand with technological development and the current globalization processes [1–3]. The majority of the population has at least one health problem, which often does not distinguish age, gender, or social status [4]. According to the International Day of People with Disabilities (https://idpwd.org/), by the year 2025, 20% of the population will have at least one type of
disability or will require any type of pre-ambulatory care [5]. In Ecuador according to the National Council for the Equality of Disabilities, 1.205% of the population has a type of motor disability, where 52,84% of the cases the subjects have an age between 30 to 65 years old. People who suffer from any degenerative disorder in their lower extremities usually use crutches and wheelchairs for their movement [6]. These medical support tools have undergone great changes over time, thus adjusting to the needs of the patient [7]. Currently there are several types of wheelchairs available, and they work with electromechanically systems, which have been automated with the purpose of increasing the autonomy of the patient [8]. This is how the term “intelligent wheelchair” emerges, which in addition to the displacement function of its predecessors, includes other functions that seek to efficiently meet the requirements of a society that is increasingly demanding [9–11]. Conventionally this medical tool can be moved when it is pushed by the patient or by an external assistant, but researchers have developed other ways to perform this action. With help of the most recent technological advances, it is possible to obtain brain wave, eye movement, and sound signal, gestures, among others. Depending on the needs of the disabled person. Eye movement is used to provide more autonomy to users who present catastrophic illness due to physical traumas [12]. Here we have medical conditions such as paraplegia, amputations caused by diabetes and total or partial paralysis due to damage to one of the brain hemispheres. Also, these tools are prioritized for patients that suffer from degenerative diseases of both lower and higher limbs such as arthritis Amyotrophic Lateral Sclerosis (ALS) and muscular dystrophy among others. This hereditary issue that causes muscle weakness and loss of muscle mass attacks to human beings regardless of their age. It is a critical condition, since it worsens as the muscles get weaker, until the ability to walk is lost. Associated muscle weakness can hinder an individual’s ability to operate a wheelchair. Due to increased weakness or fatigue in the arms and shoulders preventing the individual from taking long routes, coupled with weakness in the neck muscles that cause discomfort. Electric wheelchairs become the principal conveyance for many people with severe motor disabilities. For those with lower limb disability, there are tools that may be impossible or very difficult to use, as well as not very efficient. In that context, in [13] a user-friendly interface is proposed that allows the movement of a wheelchair through head movements. These movements provide continuous direction and speed commands to steer the wheelchair and incorporate other conditions depending on the user. Preliminary experiments carried out demonstrate the effectiveness of this system. Similarly, [14] briefly describes the design of a wheelchair that can be completely controlled with eye movements and flickering. Significant improvement in performance over traditional image processing algorithms is also demonstrated. This will let these patients to be more independent in their day to day and strongly improve the quality of life at an affordable cost. In this context, intelligence is provided to a conventional wheelchair as the principal objective of this research. 
This is achieved by adapting a low-cost embedded system that includes a BeagleBone Black board as a controller, and the development of an intuitive interface in LabVIEW software. This allows the patient to learn how to use this system and achieve autonomy. To validate this prototype, the corresponding tests are carried out, with the intention of improving this proposal. This document is divided
into five sections, including the introduction in Sect. 1 and the hardware used in Sect. 2. Section 3 presents the development of the interface and implemented algorithm and Sect. 4 describes the test and results obtained. Finally, the conclusions are presented in Sect. 5.
2 Hardware
A brief description of the proposed system can be seen in Fig. 1. For this prototype, the authors recycled a damaged electric wheelchair in order to reuse its structure efficiently. From the mechanical point of view, 12 VDC, 75 W motors, which are regularly used in this type of equipment, have been adapted. In addition, a 12 V/20 Ah battery and a 1.5 A charger are incorporated. The wheels are 8″ (20 cm) polyurethane wheels, designed to support a weight of 135 kg. Motor control is carried out with a driver that receives the signal from the BeagleBone Black low-cost board, which has all the functions of a basic computer.
Fig. 1. Diagram of the control system implemented.
This embedded device has a 720 MHz Sitara ARM Cortex-A8 processor, 256 MB RAM, Ethernet port, microSD slot, a USB port and a multipurpose device port that includes a low-level serial control. To carry out the signal conditioning which includes filters, amplification and impedance coupling, an electronic board has been developed. In addition, for one of the available modes, 4 proximity sensors (ultrasonic) are incorporated to define when an object is approaching to the wheelchair since the eyeball is focused on the path to be followed instead of objects to avoid. For eye-ball image acquisition, a small USB 2.0 camera that attaches to a pair of lenses was used. The processing of images obtained and the interface are performed on a laptop with Windows 10 O.S. 64-bits, i7-3630QM processor - 2.40 GHz and 6 GB of RAM.
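As a small illustration of how the four ultrasonic readings can be reduced to a single obstacle-proximity flag, the following helper is a sketch; the 5 cm threshold is the one used by the interface described later, while the function and parameter names are assumptions.

```python
def closest_obstacle_cm(front, back, left, right):
    """Distance, in cm, to the nearest obstacle seen by the four ultrasonic sensors."""
    return min(front, back, left, right)

def obstacle_alert(distances_cm, threshold_cm=5.0):
    """True when any sensor reports an object within the alert threshold."""
    return closest_obstacle_cm(*distances_cm) <= threshold_cm

print(obstacle_alert((120.0, 80.0, 4.5, 60.0)))   # -> True
```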
The acquired image first goes through a Gaussian filter (1); the smoothed picture is then processed to determine a control action from what the user's eyeball does [15].

g(x, y) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{x^{2}+y^{2}}{2\sigma^{2}}}     (1)
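The authors implement this processing in LabVIEW's Vision Development module; as an indicative equivalent, the following OpenCV sketch applies the Gaussian smoothing of (1), binarizes the eye image so that the dark iris/pupil region stands out, and returns the pupil-centre offset that is later used for control. The threshold and sigma values are placeholders, not values from the paper.

```python
import cv2
import numpy as np

def pupil_offset(frame_bgr, threshold=60, sigma=2.0):
    """Smooth the eye image, binarize it and return the pupil-centre
    offset (dx, dy) from the image centre."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (0, 0), sigma)
    _, binary = cv2.threshold(blurred, threshold, 255, cv2.THRESH_BINARY_INV)
    m = cv2.moments(binary)
    if m["m00"] == 0:                      # no dark blob found
        return 0.0, 0.0
    cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]
    h, w = gray.shape
    return cx - w / 2.0, cy - h / 2.0      # displacement used by the controller

cap = cv2.VideoCapture(0)                  # USB eye camera
ok, frame = cap.read()
if ok:
    print(pupil_offset(frame))
cap.release()
```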
3 Software
3.1 Interface Design
The control interface is designed in the LabVIEW software, using the Vision Development module to perform the video acquisition and the image processing from the installed camera. The user can choose the operation mode with which the trajectory will be executed: manual or semi-automatic. In manual mode, the vertical position of the eyeball defines the speed of the wheelchair: the eyeball moves slightly up to accelerate and goes down as the wheelchair gets closer to the final point of its trajectory, so the higher the eyeball goes, the faster the wheelchair moves. This mode is operated by the user, who remains responsible for being aware of and avoiding obstacles across the trajectory. Figure 2 shows the interface presented to the user in manual mode. When an obstacle approaches within a distance of 5 cm, alerts are issued on the screen in order to keep the user informed. In semi-automatic mode, the speed becomes constant and the user is only concerned with the wheelchair's direction. The algorithm ignores the user command when an obstacle is close to the wheelchair; after evading the object, it returns control to the user to continue the path until the final point is reached.
Fig. 2. Design of the interface in NI LabVIEW.
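A compact sketch of the mode logic described above: in manual mode the vertical eye offset sets the speed and the system only raises alerts, while in semi-automatic mode the speed is constant and the system overrides the user near an obstacle. Gains, signs and thresholds are illustrative assumptions, not values taken from the paper.

```python
def wheelchair_command(eye_dx, eye_dy, obstacle_cm, mode="manual",
                       max_speed=1.0, cruise_speed=0.5, stop_cm=5.0):
    """Return (speed, steering) from the pupil offset and the closest obstacle.

    Manual mode: eye height sets the speed, horizontal offset the steering;
    the user stays responsible for avoiding obstacles (alerts only).
    Semi-automatic mode: constant speed, but the system overrides the user
    and evades when an obstacle is closer than the threshold.
    """
    steering = max(-1.0, min(1.0, eye_dx / 100.0))
    if mode == "manual":
        speed = max(0.0, min(max_speed, -eye_dy / 100.0))   # eye up -> faster
        if obstacle_cm <= stop_cm:
            print("ALERT: obstacle at %.1f cm" % obstacle_cm)
        return speed, steering
    # semi-automatic
    if obstacle_cm <= stop_cm:
        return 0.0, 1.0 if steering <= 0 else -1.0   # system takes over and evades
    return cruise_speed, steering

print(wheelchair_command(-20.0, -60.0, 12.0, mode="manual"))
print(wheelchair_command(-20.0, -60.0, 4.0, mode="semi-automatic"))
```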
Manual Mode. Patients receive instructions to control the wheelchair by using the eyeball movement. In this way they can execute tasks of displacement by themselves with total autonomy in decision making. Even in manual mode and for safety reasons,
alert signals are automatically displayed when there are objects nearby that can obstruct the proposed route. Semiautomatic Mode. It is common that devices and systems that provide help require the handling or addressing of users. While human beings, based on their experiences, contribute with knowledge and fast problem solutions, an intelligent device can reduce the user effort when performing tasks and or stress situations. The framework developed in this proposal allows the control and planning of movements to be executed through a computer between the person and this system called intelligent. To reach the desired goal, as in manual mode, messages about nearby objects are presented indicating that they will be evaded. If the corresponding evasion by the user is not carried out, even when system shows the presence of objects, wheelchair takes the control and performs it. Then, returns the control to the user. This is only in collision situations, i.e. the semi-autonomous mode has this unique condition. That is why the amount of system’s autonomy is previously defined by the selected mode to operate considering certain cognitive factors of the person who is going to use the system. 3.2
Development of the Control Algorithm
In manual mode, the movement of the electric wheelchair follows the movement of the patient's eyes. To achieve this, a coherence algorithm detects the eyeball movement in the frames extracted in real time from the video of one eye. The image is binarized so that the iris appears as a black area and the sclera as a white area. The pupil is located on the graphic indicator and, depending on its displacement, numerical values are assigned that are then used in the control stage.
In semi-automatic mode, a tuned PID controller gives autonomy to the wheelchair when an object is nearby and the user does not execute a corrective action. It ensures the completion of the task without collisions, as long as the wheelchair is guided to the final point of the trajectory. The controller was tuned following (2)-(4), proposed by Åström and Hägglund as an improved version of the Ziegler-Nichols method, in which the tuning is based on the stationary response of the system and a few measured parameters. The resulting constants are shown in Table 1. In (2)-(4), \kappa = 1/(K k_c) denotes the gain ratio.

K_p = k_c \left(0.3 - 0.1\,\kappa^{4}\right)     (2)

T_i = \frac{0.6\, t_c}{1 + 2\kappa}     (3)

T_D = \frac{0.15\,(1 - \kappa)\, t_c}{1 - 0.95\,\kappa}     (4)
where:
K: quotient between changes, from an initial stationary state to the one reached after the system is excited;
kc: critical gain, obtained when the system produces a sustained periodic oscillation;
tc: period of the sustained oscillation.

Table 1. PID controller's constants

Constant  Value
Kp        0.182
Ti        0.087
Td        0.001
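A hedged Python sketch of the resulting controller follows: the tuning rules (2)-(4) as reconstructed above, plus a discrete PID loop initialized with the constants of Table 1. The sampling time dt is an assumption, not a value from the paper.

```python
def astrom_hagglund(K, kc, tc):
    """Tuning rules (2)-(4); kappa is the gain ratio 1/(K*kc)."""
    kappa = 1.0 / (K * kc)
    kp = kc * (0.3 - 0.1 * kappa ** 4)
    ti = 0.6 * tc / (1.0 + 2.0 * kappa)
    td = 0.15 * tc * (1.0 - kappa) / (1.0 - 0.95 * kappa)
    return kp, ti, td

class PID:
    """Discrete PID controller in standard (Kp, Ti, Td) form."""

    def __init__(self, kp=0.182, ti=0.087, td=0.001, dt=0.02):
        self.kp, self.ti, self.td, self.dt = kp, ti, td, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * (error + self.integral / self.ti + self.td * derivative)

pid = PID()                                       # constants of Table 1
print(pid.update(setpoint=1.0, measurement=0.2))  # first correction step
```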
On the BeagleBone Black board, the Debian OS has been installed; it is a free software distribution for computer systems with different hardware architectures. Complementary libraries are installed for the use of the analog outputs, and the connection is established with the add-ons responsible for activating the motors depending on the signal received.
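The paper does not detail the motor-driving code. Assuming the commonly used Adafruit_BBIO library available on BeagleBone Debian images, a possible sketch is the following; the pin names and the differential-steering mapping are placeholders, not values from the prototype.

```python
import Adafruit_BBIO.PWM as PWM  # common PWM library on BeagleBone Black Debian images

LEFT_MOTOR, RIGHT_MOTOR = "P9_14", "P9_16"   # placeholder PWM-capable pins

def start_motors(frequency_hz=2000):
    PWM.start(LEFT_MOTOR, 0, frequency_hz)    # duty cycle starts at 0 %
    PWM.start(RIGHT_MOTOR, 0, frequency_hz)

def drive(speed, steering):
    """Map a (speed, steering) command, speed in [0, 1] and steering in [-1, 1],
    to duty cycles for the two drive motors (differential steering)."""
    left = speed * (1.0 + min(0.0, steering))    # a left turn slows the left wheel
    right = speed * (1.0 - max(0.0, steering))   # a right turn slows the right wheel
    PWM.set_duty_cycle(LEFT_MOTOR, 100.0 * max(0.0, min(1.0, left)))
    PWM.set_duty_cycle(RIGHT_MOTOR, 100.0 * max(0.0, min(1.0, right)))

def stop_motors():
    PWM.stop(LEFT_MOTOR)
    PWM.stop(RIGHT_MOTOR)
    PWM.cleanup()
```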
4 Tests and Experimental Results 4.1
Participants
As an exclusion criterion, the authors established that age must be over 18, without considering sex, education level, ethnicity or socio-economic situation. People who do not have an affinity with technology or informatics were also considered, since the graphical interface is regarded as easy to use. Each participant was subjected to a different physical exercise and to a questionnaire that assesses the cognitive capacity of the user to perform all the tasks required for this experiment. As a result, from a universe of 14 people, 6 were selected after the preliminary interview.
4.2 Experimental Test
An induction process for handling the interface and the wheelchair was carried out, lasting between 18 and 20 min per session. Additionally, the ability to control a wheelchair, as well as past interactions with technological systems, were evaluated in general. Two rooms were selected to perform this experiment. In the first one, basic driving exercises were carried out in order to get the user accustomed to the prototype, and as a first test a simple path in a straight line and without obstacles was performed. To begin the second experiment, the order of the participants was randomly selected to avoid any relation to the executed sequence, so that the process was equitable. Some patients started testing the manual mode and others the semi-automatic mode, since they had the opportunity to choose the one they preferred. While the user performed the test, a simple questionnaire was administered, which was answered verbally. The objective of this secondary activity is to simulate a situation of distraction that a person could have in daily life. Similarly, a phone was placed at the center of the path and the participant was asked to pick it up, to test the flexibility of this proposal. In Fig. 3 a patient can be seen performing the proposed trajectory.
Fig. 3. Patient performing experimental tests in manual mode.
4.3
Result Analysis
As long as participants executed the task, both qualitative and quantitative data were collected. From the technical point of view, an observer collects all information about gestures, postures, expressions and feelings that patient shows while performing the trajectory. Table 2 describes the percentage of completion that every participant achieves per mode. It is important to mention that secondary activities were a determining point to reach the target so, some participants could not complete the path because of the situation in which they were subjected. Time. This time includes the follow-up of the trajectory, execution of the secondary tasks and the evasion of obstacles. Experimentally, it was determined that the semiautomatic mode requires a greater investment of time to complete the established trajectory. Despite waiting for similar times in both modes, in manual mode, patient controls the movement of the wheelchair in a better way and follows a path avoiding obstacles. For its part in the semi-automatic mode, the chair has the control and when approaching obstacles, users lose control and reduce speed excessively in order to avoid obstacles automatically which symbolizes a greater investment of total time.
Table 2. Completion of tasks per mode

Users      Gender  M (%)  SA (%)  FMS             Q (%)  PPU
Patient 1  Male    90     85      Semi-automatic  70     Yes
Patient 2  Female  95     90      Manual          100    Yes
Patient 3  Male    70     100     Manual          90     No
Patient 4  Male    75     100     Semi-automatic  80     No
Patient 5  Male    98     90      Semi-automatic  100    Yes
Patient 6  Female  100    90      Semi-automatic  90     Yes
Abbreviations: M: Manual mode; SA: Semi-automatic mode; FMS: First Mode Selected; Q: Questionnaire; PPU: Phone Picked Up
Comfort. Another way to evaluate the performance of this proposal is to determine the number of commands executed throughout the test. In manual mode there is more human-machine interaction than in the semi-automatic mode; in the latter, the computer controls the movement of the equipment when evading obstacles, which reduces the effort made by the user. It should also be considered that two secondary activities are carried out, which increases the complexity of this experiment and can put the patient in a stressful situation. From this point of view, the manual mode is the less efficient one.
Collisions. The trajectories followed in manual mode by all participants are similar; all of them included 5 objects to avoid and a cell phone to pick up. Another important factor is the number of collisions that occurred when using the manual mode and those caused by the system in the semi-automatic mode. This can be seen in Table 3.

Table 3. Collisions in each mode of the wheelchair

Users      Semi-automatic mode  Manual mode
Patient 1  3                    1
Patient 2  1                    2
Patient 3  0                    2
Patient 4  1                    4
Patient 5  1                    1
Patient 6  1                    0
Finally, a satisfaction test is carried out to establish the participants' impressions of the prototype and of the interface they used. This allows the level of efficiency obtained in this experiment to be assessed, where 5 is the highest grade and 1 the minimum. The main questions can be seen in Table 4.
Table 4. Satisfaction test questions and their results

Question                                                   Value
Are you satisfied with the system?                         4.67
Is the system safe for you?                                4.17
Do you feel you control the wheelchair?                    4.5
What is your level of frustration while driving?           1.33
Do you think the system is flexible and friendly?          4
How difficult was it to avoid obstacles?                   1.5
How difficult was it to carry out secondary activities?    1.83
Which mode do you think requires major improvements?       Semi-automatic
5 Conclusions In this low-cost proposal, a system with a high degree of efficiency and robustness has been implemented. Depending on the previous experiences of patients with technological devices, it was evidenced that some had more difficulty getting used to the proposed system. After that, a greater interaction was evidenced as shown by results obtained from experimental tests. To complete the entire journey, an average time of 62,8 s has been used and the number of crashes has been significantly low, so it means that the system contributes to patient safety and comfort. Although patients could vary the speed at which the chair moved, some of them did not feel satisfied because they felt the need for greater speed. Another limitation was that due to the proposed exclusion criteria, there was a small number of patients who could participate in this experiment so it limits the amount of data obtained and the reactions perceived. In this study, technical data such as time used or the number of crashes has been considered to determine the system’s efficiency. Tests have also been carried out to assess the conformity and satisfaction that individuals have after testing this technological tool. These reactions allow to obtain a better feedback and to be able to constantly improve this prototype until it provides a greater benefit to users of low economic resources. Based on the criteria obtained in this preliminary study, the authors propose, as a future work, to implement an improved prototype that already includes the automatic mode and make a comparison again. It is also proposed to include a greater number of individuals and thus be able to have greater participation and interaction.
References 1. Mertz, L.: Tissue engineering and regenerative medicine: the promise, the challenges, the future. IEEE Pulse (2017). https://doi.org/10.1109/MPUL.2017.2678101 2. Anand, S., Sharma, A.: Internet of medical things: services, applications and technologies. J. Comput. Theor. Nanosci. (2019). https://doi.org/10.1166/jctn.2019.8283
3. Buele, J., Espinoza, J., Bonilla, R., Edison, S.-C., Vinicio, P.-L., Franklin, S.-L.: Cooperative control of robotic spheres for tracking trajectories with visual feedback. RISTI - Rev. Iber. Sist. e Tecnol. Inf. (E19), 134–145 (2019) 4. Galarza, E.E., Pilatasig, M., Galarza, E.D., López, V.M., Zambrano, P.A., Buele, J., Espinoza, J.: Virtual reality system for children lower limb strengthening with the use of electromyographic sensors (2018). https://doi.org/10.1007/978-3-030-03801-4_20 5. Steint, M.A.: Disability human rights. In: Nussbaum and Law (2017). https://doi.org/10. 4324/9781315090412 6. Simon, J.L.: Of bodybuilders and wheelchairs. In: Disability and Disaster: Explorations and Exchanges (2015). https://doi.org/10.1057/9781137486004_21 7. Salazar, F.W., Núñez, F., Buele, J., Jordán, E.P., Barberán, J.: Design of an ergonomic prototype for physical rehabilitation of people with paraplegia. Presented at the 28 October 2019 (2020). https://doi.org/10.1007/978-3-030-33614-1_23 8. Stenberg, G., Henje, C., Levi, R., Lindström, M.: Living with an electric wheelchair - the user perspective (2016). https://doi.org/10.3109/17483107.2014.968811 9. Rabhi, Y., Mrabet, M., Fnaiech, F.: Intelligent control wheelchair using a new visual joystick. J. Healthc. Eng. (2018). https://doi.org/10.1155/2018/6083565 10. Schwesinger, D., Shariati, A., Montella, C., Spletzer, J.: A smart wheelchair ecosystem for autonomous navigation in urban environments. Auton. Robots (2017). https://doi.org/10. 1007/s10514-016-9549-1 11. Tang, J., Liu, Y., Hu, D., Zhou, Z.T.: Towards BCI-actuated smart wheelchair system. Biomed. Eng. Online (2018). https://doi.org/10.1186/s12938-018-0545-x 12. Nguyen, Q.X., Jo, S.: Electric wheelchair control using head pose free eye-gaze tracker. Electron. Lett. (2012). https://doi.org/10.1049/el.2012.1530 13. Gomes, D., Fernandes, F., Castro, E., Pires, G.: Head-movement interface for wheelchair driving based on inertial sensors. In: Proceedings of the 6th IEEE Portuguese Meeting on Bioengineering, ENBENG 2019 (2019). https://doi.org/10.1109/ENBENG.2019.8692475 14. Rajesh, A., Mantur, M.: Eyeball gesture controlled automatic wheelchair using deep learning. In: 5th IEEE Region 10 Humanitarian Technology Conference 2017, R10-HTC 2017 (2018). https://doi.org/10.1109/R10-HTC.2017.8288981 15. Fernández-S, Á., Salazar-L, F., Jurado, M., Castellanos, E.X., Moreno-P, R., Buele, J.: Electronic system for the detection of chicken eggs suitable for incubation through image processing. In: Advances in Intelligent Systems and Computing (2019). https://doi.org/10. 1007/978-3-030-16184-2_21
Design and Control of a Biologically Inspired Shoulder Joint
Marius Leonard Olar, Monica Leba, and Sebastian Rosca
University of Petrosani, Universitatii 20, Petroșani, Romania [email protected], [email protected] Abstract. In the field of human-centered robotics, there are needed flexible, robust, agile robots that have the natural human like mechanisms in place. In order to achieve these robots, human motor systems must be analyzed and abstracted, from the mechanical level, to the behavioral and cognitive level. In this paper will be presented the abstracted human shoulder joint, seen as a modified Stuart platform, with a platform supported on a pivot and driven by four actuating motors. The elements of resistance imitate the shoulder blade and the humerus, and those of drive and control of the movements represent the four main muscles of the shoulder. They have the shape close to the natural shape of the human bones, and the elements of drive and control, which take over the functions of the muscles, have the location and the points of connection with the elements of resistance similar to the natural connections between bones and muscles. The connections between the natural muscles and the bones are made using the tendons. These flexible links, coupled with the open shape of the shoulder joint, drive the humerus in a 360° rotational motion. To allow a left-upright-down-left circular motion, we introduced the tendon that has a cardan coupling, on the side of the muscles and a sphere on the side of Humerus. Keywords: Robot control
Bio inspired Shoulder movement
1 Introduction
In most cases, with the help of humanoid robots, we try to imitate the natural movements of humans, but the mechanisms used are different, anatomically speaking, from those of humans. However, efforts are being made to control the trajectory of the robot members so that it is as smooth as possible [1]. Obviously, the robots used in different industrial environments have the specific movements of their assigned work, movements reduced to the essential, while the humanoid robots, which have implemented the same mechanisms, have the same movement patterns as humans. But if a robot, or just one arm of it, interacts with the human or is part of an exoskeleton attached to a part of the human, then it should copy not only the shape but also the internal structure of the human body, so that the human-machine interaction can be as natural as possible. Two problems arise in the construction of biologically inspired humanoid robots [2]: the first refers to building the skeleton and the muscles that create a unitary system with the skeleton and with which parts of the skeleton are controlled, and the second refers to the abstraction of the component elements, to make them easier to control [3].
Humanoid robots are those robots that take not only the human form, but also the structure of the skeleton and biological functions, which allow the perception of the environment, at the sensory level, and the appropriate response of the system to the action of the environment on it. The natural movements, specific to the human being, can be achieved by combining a realistic skeleton driven by artificial muscles composed of actuating motors and elastic links or with several joints. For the control of artificial muscles, the actuating motors take on the functions of the motoneurons, which naturally control the muscle fibers, thus creating a motor unit [2]. The motor unit of an artificial muscle has two phases, one of contraction and one of relaxation. In order to steer the arm in the direction of contraction, the motor units must operate in tandem, so that when one contracts, the other relaxes. By combining the movements of two neighboring actuating motors (left-up, left-down, right-up and right-down), the contraction replaces the movement of a “muscle” that would make a diagonal movement.
2 State of the Art The first steps to control this complex type of robot were taken by Ivo Boblan and the team, in 2004, when they created the ZAR3 robotic arm [4]. The control of the movement of the humerus in the shoulder is done by cables operated with three pairs of actuator motors for each axis. Jäntsch, Wittmeier and Knoll [1] present in 2010 the anthropomimetic robotic arm ECCEROBOT, which tries to imitate the structure and functionality of a human arm, using 11 actuating motors and elastic strings for artificial muscles. Later, Marques, Jäntsch, Wittmeier, Holland, Alessandro, Diamond, Lungarella and Knight [5], created the following anthropomimetic robot ECCE1, this time with chest and two arms, with 43 actuators for the motor units, with sensors for the interaction with the external environment, and the corresponding control systems mounted in the chest. Dehez and Sapin present the ShouldeRO exoskeleton arm, composed of 3 Cardan-type connectors, for displacement on the three planes, sagittal, horizontal and frontal [6].
3 Imitation and Abstraction The imitation and abstraction of the forms and functionalities of the natural structures consists in taking over the model of the natural body shape, which is to be copied, and the functions of the component parts, and then adapting them to an electromechanical model. Imitation of the form as a whole, taking over the component elements, of their form, number and order in the natural organism. Imitating the functionality of the elements that produce the movement, of the muscles, for example, which have effect only at the time of their contraction, bind to bones at certain points, at certain distances from the ends of the bones, and together with the joints form levers, to use minimum energy for maximum efficiency. From the total number of muscles that work the shoulder, we have chosen only four muscles, paired two by two, connected by the shoulder blade and humerus at the points where the natural muscles are connected by the respective bones (see Fig. 1).
Fig. 1. The main shoulder muscles a) abstract model, b) natural model - a1 (Deltoid muscle), b3 (Infraspinatus muscle), c5 (Teres Major muscle), d7 (Subscapularis muscle): Proximal attachment - the origin of the muscles on the shoulder blade (positioning of actuator motors); 2 (Deltoid tuberosity), 4 (Greater tubercle), 6 (Crest of lesser tubercle), 8 (Lesser tubercle): Distal attachment - muscle insertions (contact of tendons with different areas on the head of the humerus and the humerus).
The robot arm can be controlled by an operator using a controller [7], a joystick for example, with the help of a neural headset [8], with EMG signals [9] or with an augmented reality device, which has implemented eye-tracking technology [10], in case where the arm has to move where the operator looks [11]. Deltoid muscle, positioned at the top, paires with the lower muscle, Teres Major. On the right is placed the Infraspinatus muscle, paired with the one on the left, the Subscapularis muscle. Because Deltoid muscle has the main role of supporting the weight of the arm, the actuator motor that controls its movements, is chosen accordingly, in size and power, compared to the other actuator motors of the other muscles. On the movable part, by the equivalent of the Humerus, are attached the sockets of the arm. To these sockets are connected the spherical ends of the flexible links, the equivalents of the tendons. These sockets are attached by Humerus from an angle of 30°. Between the sockets of the arm and the shoulder blade are the two contact elements, the head of the humerus, abstracted in the form of a ball pivot and the glenoid cavity, in the form of a hemisphere (see Fig. 2). The sphere of the pivot is considered in contact with the hemisphere as long as the center of the sphere of the pivot coincides with the center of the hemisphere. The extremities of the range in which the neck of the pivot can move, until it reaches the hemisphere edge, form an angle of 130°.
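A small sketch of the 130° joint-limit constraint described above: the pivot is considered inside the hemispherical socket while the angle between the humerus direction and the socket axis stays below 65°, half of the total cone. The socket axis used here is an arbitrary placeholder orientation.

```python
import numpy as np

def within_joint_cone(humerus_dir, socket_axis=(0.0, 0.0, 1.0), cone_deg=130.0):
    """True while the pivot neck stays inside the hemispherical socket.

    humerus_dir: vector along the humerus (pivot neck) in the scapula frame.
    socket_axis: axis of the hemispherical socket (placeholder orientation).
    The neck may sweep a cone of 130 degrees in total, i.e. 65 degrees on
    either side of the socket axis, before it reaches the edge of the socket.
    """
    u = np.asarray(humerus_dir, float)
    v = np.asarray(socket_axis, float)
    cosang = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    angle = np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))
    return angle <= cone_deg / 2.0

print(within_joint_cone((0.5, 0.0, 1.0)))    # about 27 degrees -> True
print(within_joint_cone((1.0, 0.0, 0.2)))    # about 79 degrees -> False
```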
Fig. 2. Shoulder joint - the sphere pivot represents the head of the Humerus, and the hemispherical socket represents the Glenoid cavity on the shoulder blade.
4 Modeling and Simulation The proposed model of the shoulder joint mimics the natural model of the human shoulder, in that it has the supporting element, the shoulder blade, the actuators, the main muscles of the shoulder, connected at the same points, on the shoulder blade and on the humerus, as the natural muscles and the humerus, as a movable element. These muscles work in tandem, two by two, on their contraction, so when a muscle contracts, its pair relaxes. In the presented model the rods actuated by the actuator motors represent the shoulder muscles, and the Cardan type links, for 360° mobility, together with the contact sphere represent the muscle tendon (see Fig. 3 a and b).
Fig. 3. Muscles and tendons of the shoulder, a. The back and b. The front
When the muscles will contract, i.e. the rods bolted by the actuator motors will pull the arm sockets towards the back, the sphere of the pivot will slide inside the hemisphere, leading the Humerus in the direction of the contracted driver muscle. The actuator motors are continuously tensioned, so that the muscles will continuously pull on the sockets of the humerus, maintaining the integrity of the wrist, tensioning the joint. When it is desired to move the arm in a certain direction, one or two muscles will be used as conducting muscles, because they will lead the arm in that direction. For example, if one wants to move upper right the arm, the conductive muscles will be Infraspinatus and Deltoid, and Subscapularis and Teres Major will be the assistant muscles. The movement of the conductive muscles is a synergistic movement, because the two muscles contract simultaneously, with the same tensions, leading the arm in the middle area between the two muscles. When only one muscle is conductive, and its pair
is assistant, the movement of the two is antagonistic, for one contracts and the other relaxes. The arm will always move to the muscle part of the conductor (see Table 1).
Table 1. Control and actuation elements (4 actuator motors + 4 flexible links)

Up on the shoulder blade (Deltoid):
∙ as size and power, twice as large as the other actuators
∙ lifts the arm
∙ tensions the joint
∙ in synergy with the left or right actuators
∙ antagonist of the lower one

Down on the shoulder blade (Teres Major):
∙ tensions the joint
∙ controls the lowering of the arm
∙ in synergy with the left or right actuators
∙ antagonist of the upper one

Left (back) on the shoulder blade (Infraspinatus):
∙ leads the arm to the left
∙ conductor, together with the upper one, on the left-up diagonal
∙ conductor, together with the lower one, on the left-down diagonal
∙ tensions the joint
∙ in synergy with the upper or lower actuators
∙ antagonist of the one on the right

Right (front) on the shoulder blade (Subscapularis):
∙ leads the arm to the right
∙ conductor, together with the upper one, on the right-up diagonal
∙ conductor, together with the lower one, on the right-down diagonal
∙ tensions the joint
∙ in synergy with the upper or lower actuators
∙ antagonist of the one on the left
Design and Control of a Biologically Inspired Shoulder Joint
771
Fig. 4. a. Simulation diagram implemented in Matlab; b. The result of the simulation, with the arm stretched horizontally.
772
M. L. Olar et al.
Fig. 5. The actuator motor for the control of the Biceps muscle, mounted on the shoulder blade, controls the movements of the forearm. Table 2. Control and actuation elements Shoulder blade ∙ it has the abstracted form of the human shoulder blade ∙ supports 6 actuator motors (1 up, 1 down, 1 + 1 left, 1 + 1 right; 1 actuator on the left represents the Biceps that connects to Radius, 1 actuator on the right represents the Biceps that connects to Ulna) ∙ supports the hemisphere, the shoulder joint (Glenoid cavity (Glenoid fossa)) Humerus ∙ tube type, with triangular section, with one face facing up ∙ supports the pivot that has a sphere in the part from the shoulder blade (the head of the Humerus) ∙ the sphere of the pivot slides inside the hemisphere of the shoulder blade ∙ supports the arm sockets, in the form of ✚, the place where the flexible connections are fixed (the abstracted form of the elevations of the upper part of the Humerus)
Muscle Biceps in the natural form is a muscle on two joints, indirectly helping to raise the arm, its main role is to raise the forearm. In the part from the shoulder the muscle is divided in two, the short end, which is connected by the tip of the Coracoid Process, and the long end, connected on the supraglenoid tubercle of the shoulder blade. Because the two tendons are positioned front-to-back against the shoulder wrist, we chose, for the control of the Biceps muscle, to position the actuating motors, one on the sides of the shoulder blade. This is part of the further development of the project when the elbow will be built. Naturally, the Biceps muscle is connected to Radius, and to Ulna the Brachialis muscle. In this project, the Biceps muscle will consist of two flexible cross-links, meaning that the right motor link will be linked to the left forearm (Ulna side), and the
Design and Control of a Biologically Inspired Shoulder Joint
773
left motor link will be linked to the right forearm. (Radius part). This will lead to the possibility of rotating the forearm with 30° to the left and right, which will be made in one piece, not two, as in the natural implementation.
5 Conclusions
Reducing the number of actuator motors needed to control the artificial muscles that move the elements leads to reduced electricity consumption, lower weight and a simpler driving algorithm. The disadvantage is that, with simpler movements, much of the fluidity of the movement is lost and the arm's degree of maneuverability is drastically reduced. Further, the project will have three directions of development. In the first, the cardan-type links will be replaced with flexible strings and the connection point of the tendon (distal attachment) with the Humerus (Deltoid tuberosity) will be moved, to increase the amplitude of the humerus movement on the X axis (in the support attached to the head of the Humerus, an oval hole will be created, equivalent to the natural-anatomical area, the Intertubercular groove). In the second, the same type of joint (with hemispherical socket and sphere-ended pivot) will be implemented, with the restrictions characteristic of the natural joint, for the elbow, wrist and finger rotation movements on the X, Y and Z axes (see Table 3).
Table 3. The movement of the elements in the joints
Joint   | elbow | wrist | fingers
X       | 120°  | 120°  | 90°
Y       | 90°   | 0°    | 0°
Z       | 0°    | 60°   | up to 30°
In the third direction of development, this robotized arm will be transformed into an exoskeleton, which will be worn on a human arm. In this case, the elements of the exoskeleton will be like an armour over the arm, where the socket with the hemisphere and the pivot with the sphere will be the natural joints of the human arm.
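The ranges in Table 3 translate directly into software joint limits for the future elbow, wrist and finger joints. The snippet below is a minimal illustration (not part of the authors' implementation) that stores Table 3 as a lookup structure and clamps commanded angles, assuming each range starts at 0°.

```python
# Joint ranges from Table 3, in degrees, per rotation axis (X, Y, Z).
JOINT_LIMITS = {
    "elbow":   {"X": 120.0, "Y": 90.0, "Z": 0.0},
    "wrist":   {"X": 120.0, "Y": 0.0,  "Z": 60.0},
    "fingers": {"X": 90.0,  "Y": 0.0,  "Z": 30.0},  # Table 3 gives "up to 30" for Z
}

def clamp_command(joint, axis, angle_deg):
    """Clamp a commanded angle to the allowed span of the given joint axis."""
    limit = JOINT_LIMITS[joint][axis]
    return min(max(angle_deg, 0.0), limit)

print(clamp_command("wrist", "Z", 75.0))  # -> 60.0, the wrist Z limit
```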
References 1. Jäntsch, M., Wittmeier, S., Knoll, A.: Distributed control for an anthropomimetic robot. In: 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems, Taipei, pp. 5466–5471 (2010). https://doi.org/10.1109/iros.2010.5651169 2. Holland, O., Knight, R.: The anthropomimetic principle. In: Proceedings of the AISB 2006 Symposium on Biologically Inspired Robotics. pp. 1–8 (2006) 3. Prilutsky, B.: Coordination of two-and one-joint muscles: functional consequences and implications for motor control. Mot. Control 4(1), 1–44 (2000)
4. Boblan, I., Bannasch, R., Schwenk, H., Prietzel, F., Miertsch, L., Schulz, A.: A human-like robot hand and arm with fluidic muscles: biologically inspired construction and functionality. In: Embodied Artificial Intelligence, pp. 160–179. Springer, Heidelberg (2004) 5. Marques, H.G., Jäntsch, M., Wittmeier, S., Holland, O., Alessandro, C., Diamond, A., Lungarella, M., Knight, R.: ECCE1: the first of a series of anthropomimetic musculoskeletal upper torsos. In: 2010 10th IEEE-RAS International Conference on Humanoid Robots, pp. 391–396. IEEE, December 2010 6. Dehez, B., Sapin, J.: ShouldeRO, an alignment-free two-DOF rehabilitation robot for the shoulder complex. In: 2011 IEEE International Conference on Rehabilitation Robotics, pp. 1–8. IEEE, June 2011 7. Panaite, A.F., Rişteiu, M.N., Olar, M.L., Leba, M., Ionica, A.: Hand rehabilitation- a gaming experience. In: IOP Conference Series: Materials Science and Engineering, vol. 572 (2019) 8. Rosca, S.D., Leba, M.: Using brain-computer-interface for robot arm control. In: MATEC Web of Conferences, vol. 121, p. 08006. EDP Sciences (2017) 9. Risteiu, M., Leba, M., Arad, A.: Exoskeleton for improving quality of life for low mobility persons. Qual.- Access Success 20, 341–346 (2019) 10. Olar, M.L., Risteiu, M.N., Leba, M.: Interfaces used for smartglass devices into the augmented reality projects. In: MATEC Web of Conferences, vol. 290, p. 01010. EDP Sciences (2019) 11. Negru, N., Leba, M., Rosca, S., Marica, L., Ionica, A.: A new approach on 3D scanningprinting technologies with medical applications. In: IOP Conference Series: Materials Science and Engineering, vol. 572 (2019)
Modelling and Simulation of 3D Human Arm Prosthesis Sebastian Daniel Rosca , Monica Leba(&) and Arun Fabian Panaite
University of Petrosani, 332006 Petrosani, Romania [email protected], [email protected], [email protected]
Abstract. Owing to military conflicts, birth defects, the emergence of diseases at any age, and especially the increasing incidence of trauma caused by work and traffic accidents, a growing number of people are left without parts of their upper limbs. Finding suitable solutions that improve the quality of life of people who have suffered an amputation must therefore become a priority for researchers in many domains, including medicine, engineering and psychology. In this paper we propose a 3D-modelled solution, developed with the support of computer-aided design software, that mimics the human arm anatomy. It is therefore easy to produce and suitable for combining existing and future technology from the market, in order to replace solutions that are either expensive, uncomfortable or of reduced functionality. To validate the proposed model, we develop the mathematical model of the human arm and simulate the entire functionality of the model.
Keywords: Kinematic model · Degrees of freedom · Prosthesis
1 Introduction
Over the last 10 years, 3D design environments have experienced rapid growth among student communities, 3D design enthusiasts, artists and engineers. This was made possible thanks to the free access of the general public to computer-aided design (CAD) tools that offer, besides basic design and modelling capabilities, support for assembly and for applying constraints between 3D objects, and, most importantly, direct export to a 3D-printable format. Combining these advantages, many studies have focused on prototyping 3D devices that overcome the shortcomings of classical prostheses and that can easily fit the human arm, especially because the amputation of a limb can produce both physical and psychological repercussions with devastating effects on a patient's quality of life [1]. Replacing a member of the human body can thus be quite a complicated task, and it is necessary to reproduce the human part as closely as possible, taking into account the worldwide increase in the number of upper-limb amputations.
A pilot study conducted over 2.5 decades in the Kingdom of Saudi Arabia showed that the most frequent amputations were performed on road accident victims, with an incidence of 86.9% on the upper limbs [2]. Other worldwide causes of upper-extremity amputation are dysvascular disease and cancer, and the prediction for 2020 is 2.2 million upper-limb amputations in the United States alone [3]. Over time, scientists have been trying to replace lost limbs with human-made devices. The first prostheses were passive devices, which offered few possibilities for control and movement and had a more aesthetic than functional role. The prosthetic devices on the market or still in the research phase fall into several categories, generally grouped by the mode in which the device is controlled: passive prostheses, mechanical/body-controlled prostheses, myoelectric controlled prostheses, and direct brain interfaces.
1.1 Passive Prosthetics
Passive prosthetics are devices that aim to restore the original form of the amputated part; they have been classified as purely cosmetic or functional, such as those related to sport and work [4]. This type of prosthetics is suitable only for a small group of patients, especially those who have suffered an amputation at hand level, because the force required to adjust the grasping mechanism must be externally applied [5].
1.2 Mechanical Prosthetics
Mechanical prostheses are controlled with the help of the body. Generally, they are simple devices, such as a mechanical hook linked to the movement of the elbow or shoulder. Even though they are relatively simple devices, this is the most popular type of prosthesis today [6].
1.3 Myoelectric Prosthetics
Compared to mechanical prostheses, the myoelectric types present constructive advantages such as the use of motors and batteries as actuation means, a better resemblance to the anatomical hand, and the elimination of the heavy belts or harnesses present on mechanical prostheses, which, besides being visible, can damage the user's clothes [6]. These prostheses are specially designed to fit and be attached to the remaining limb. They offer the advantage that, once attached, the prosthesis uses electromyography (EMG) sensors to capture muscle activity, resulting in a movement as close as possible to the natural one, the user being able to control the strength and speed of the movements [7].
1.4 Prosthetics Controlled Directly by the Brain
At present, brain control is the most advanced control method in this area. A brain-computer interface (BCI), also known as a brain-machine interface (BMI), is a system that permits a person to control a computer using only their thoughts [8]. Thought processes generate electrical activity in the nerve cells of the brain that can be detected in the form of impulses based on the frequency of brainwaves. In order to read brain signals, a chip can be implanted directly at the brain level, or a non-invasive solution can be used with a series of electrodes placed at the scalp level [9]. This technology offers the benefit of targeting a wide range of users, including people who have suffered an amputation but, more importantly, people with severe impairment for whom a BCI system is the only solution to control the movement of their muscles, limbs and prostheses [10].
2 Related Work
Customised 3D-modelled assistive devices are expensive from an engineering point of view, yet they are required to perform many tasks. For the end user, considering that the deficiencies are unique, the prostheses should be treated modularly and customised according to the missing anatomical structure. The performance of a prosthesis is evaluated in the literature by the number of movements (out of 33 possible grasps) that the robotic hand can perform, related to the daily tasks achieved by the human hand [11].
2.1 3D Model Design of Body-Controlled Prostheses
This type of prosthesis is used mainly by people with congenital or acquired amputation, offering them a transitional device that allows them to perform bi-manual and unilateral activities with functional grasping. The body-powered and manually adjustable 3D upper extremity device is presented in Fig. 1.
Fig. 1. 3D model design of upper extremity prosthetic device [12]
This 3D design model, as shown in Fig. 1 sections (a) and (b), based on protraction and retraction movements from the residual shoulder, allows flexion and extension movements with an angle between 0° and 110° for the shoulder joint and between 0° and 90° for the elbow joint. The internal and external rotation of the shoulder (0° to 45°) can be adjusted and locked through a spring-loaded lever placed in proximity of the elbow and shoulder joints, presented in Fig. 1(a). The shoulder and elbow adjustments can be made every 22° for flexion and extension and every 15° for internal and external shoulder rotations. The wrist joint allows full pronation and supination movements between 0° and 90°, based on a rotation mechanism, presented in Fig. 1(b) and (c), composed of an inner circular disc/shaft together with a circle of embedded magnets with matching polarity placed around the disc [12].
3 Anatomical Approach on the Human Arm
Regarding the anatomical structure, the human arm offers 7 degrees of freedom through the shoulder, elbow and wrist joints, which form a kinematic chain.
3.1 Shoulder Joint
The human shoulder is composed of three bones, the clavicle, the scapula and the humerus, with four independent joints, presented in Fig. 2. The first, the sternoclavicular joint, links the clavicle to the thorax; the second, the acromioclavicular joint, links the scapula to the clavicle; the third, the scapulothoracic joint, defines the motion of the scapula over the thorax; and the last, the glenohumeral joint, links the humerus to the scapula [13].
Fig. 2. Structure of the shoulder joint [12]
The glenohumeral joint provides a very wide range of movements, especially due to the unstable bone structure: abduction motion with an angle between 150–180°, flexion movements with a maximum angle of 180°, extension with an angle between 45–60° and an external rotation with an angle of 90° [14].
3.2 Elbow Joint
From an anatomical point of view, two more bones are connected to the humerus: the ulna, together with the radius, as presented in Fig. 3.
Fig. 3. Structure of the elbow joint [15]
The elbow allows two degrees of freedom, based on flexion-extension and supination-pronation movements [16]. The normal range of movement of the elbow is around 140° of flexion and 10° of extension. In terms of the functional range of movement for daily activities, the suitable range of motion of the elbow is between 30–130°. The normal range of motion for supination and pronation is around 90°. It also offers a limited lateral and medial angle of movement between the abduction and adduction movements [17].
3.3 Wrist Joint
The wrist allows two degrees of freedom based on five types of movements: flexion, extension, abduction, adduction and circumduction. The most important movements are those of flexion and extension, both with an angle of movement between 0–60° [14]. From an anatomical point of view, the wrist and hand contain many bones and joints that connect the forearm to the hand. The wrist is connected to the forearm through the proximal bones, the radius and the ulna, as presented in Fig. 4. It consists of 8 carpal bones arranged in two rows: four are more proximal (scaphoid, lunate, triquetral, pisiform) and four are more distal (trapezium, trapezoid, capitate and hamate) [15].
Fig. 4. Structure of the wrist and hand [14]
4 Human Arm Kinematic Model
Based on the anatomical form of the human arm, the model can be reduced to a simplified form represented by a kinematic chain that contains three elements connected through three kinematic joints: a spherical one for the shoulder, a cylindrical one for the elbow and another cylindrical one for the wrist, as presented in Fig. 5.
Fig. 5. Human upper-limb kinematic chain
The degrees of freedom of the human arm are determined by the maximum number of movements that can be achieved by each kinematic joint [14]. So, the human upper limb is designed to allow seven degrees of freedom for the arm and sixteen degrees of freedom for the hand. Based on the Denavit-Hartenberg (DH) formalism presented in the relationship below, we determined the direct kinematic model, a first step in the modelling of the 3D human arm prosthetic.
T_{i-1,i} =
\begin{bmatrix}
\cos\theta_i & -\sin\theta_i\cos\alpha_i & \sin\theta_i\sin\alpha_i & a_i\cos\theta_i \\
\sin\theta_i & \cos\theta_i\cos\alpha_i & -\cos\theta_i\sin\alpha_i & a_i\sin\theta_i \\
0 & \sin\alpha_i & \cos\alpha_i & d_i \\
0 & 0 & 0 & 1
\end{bmatrix}
(1)
The DH parameter table presented below results from the chosen reference system presented in Fig. 5, based on which we determined the movement matrix for each finger (Table 1).
Table 1. Denavit-Hartenberg parameters
Elem. Coord. | a_i | α_i | d_i | θ_i
1.  | 0   | π/2 | 0   | θ_1
2.  | 0   | π/2 | 0   | π/2 + θ_2
3.  | 0   | π/2 | d_1 | π/2 + θ_3
4.  | 0   | π/2 | 0   | θ_4
5.  | 0   | π/2 | d_2 | θ_5
6.  | 0   | π/2 | 0   | π/2 + θ_6
7.  | d_3 | 0   | 0   | π/2 + θ_7
8.  | 0   | π/2 | 0   | π/2 + θ_8 θ_dig
9.  | d_4 | 0   | 0   | π/2 + θ_9 θ_dig
10. | d_5 | 0   | 0   | θ_10 θ_dig
11. | d_6 | 0   | 0   | θ_11 θ_dig
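As an illustration of how Eq. (1) and Table 1 combine, the following Python sketch chains the per-joint DH transforms to obtain the base-to-end-effector pose. It is not the authors' code: the link lengths d1-d3 and the joint angles are placeholder values chosen for demonstration, and only the first seven (arm) rows of Table 1 are used.

```python
import numpy as np

def dh_transform(theta, d, a, alpha):
    """Homogeneous transform between consecutive frames, as in Eq. (1)."""
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([
        [ct, -st * ca,  st * sa, a * ct],
        [st,  ct * ca, -ct * sa, a * st],
        [0.0,      sa,       ca,      d],
        [0.0,     0.0,      0.0,    1.0],
    ])

def forward_kinematics(dh_rows):
    """Chain the per-joint transforms in order; returns the base-to-tip pose."""
    T = np.eye(4)
    for theta, d, a, alpha in dh_rows:
        T = T @ dh_transform(theta, d, a, alpha)
    return T

# Placeholder link lengths (metres) and joint angles (degrees) -- assumed values.
d1, d2, d3 = 0.30, 0.25, 0.20
q = np.radians([30, -30, -90, 30, -15, 15, 0])

# Rows 1-7 of Table 1 (the arm chain up to the wrist).
rows = [
    (q[0],             0.0, 0.0, np.pi / 2),
    (np.pi / 2 + q[1], 0.0, 0.0, np.pi / 2),
    (np.pi / 2 + q[2], d1,  0.0, np.pi / 2),
    (q[3],             0.0, 0.0, np.pi / 2),
    (q[4],             d2,  0.0, np.pi / 2),
    (np.pi / 2 + q[5], 0.0, 0.0, np.pi / 2),
    (np.pi / 2 + q[6], 0.0, d3,  0.0),
]
print(forward_kinematics(rows)[:3, 3])  # Cartesian position of the wrist frame
```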
5 The 3D Prosthetic Arm Design
In accordance with the anatomical rules of the human arm, we designed our prosthetic model, aimed at replacing the amputated right arm of the patient, using the SolidWorks computer-aided design software, an innovative and powerful tool that allows developers to design, assemble and analyse their models using 2D sketches as support. The 3D model of the prosthetic, as presented in Fig. 6, consists of three joints that ensure six degrees of freedom, one less than the human arm, but we designed it in a manner that mimics the real movements of the human arm in the workspace without any limitation. The only difference between our model and the human arm from an anatomical point of view is that the internal/external rotation of the elbow is suppressed and assigned to the wrist, to simplify the actuation mechanism of the prosthetic.
Fig. 6. 3D Prosthetics design
The 3D model of the prosthetic was imported into MatLab-Simulink to validate the mathematical model. The MatLab-Simulink model is presented in Fig. 7.
Fig. 7. Simulation model
Figure 8 presents the prosthetic position after moving the joints by the angles 30°, −30°, −90°, 30°, −15°, 15°. This validated the model for each joint, based on the position of the end effector.
Fig. 8. Simulation result
6 Conclusions
In this paper we propose a 3D-modelled prosthesis that offers all the movements specific to the human arm and is capable of replacing the classical passive and mechanical prostheses, which have either a purely aesthetic role or a reduced functional role. Compared to mechanical prostheses, the prototype we propose offers the advantages of a low production cost and of electric actuation, which eliminates the need to use harnesses. This type of prosthesis is suitable for a wide range of people, depending on the level of amputation of the upper limb. It can be controlled using a brain-controlled solution based on a BCI interface, suitable for paralysed persons or those with complete amputation for whom a solution based on electromyography is not suitable due to functional considerations. A hybrid EEG-EMG solution is also suitable for this type of prosthesis because it eliminates the shortcomings of each of the two. Having a modular design, it can be adapted according to the degree of disability of the patient and can be printed from a multitude of materials (PLA, ABS or metal) on any kind of 3D printer, including those in the hobby category, with minimal costs compared to the other types of prostheses presented.
7 Results
Compared directly with the model of the body-controlled prosthesis studied in Sect. 2, the 3D-modelled prosthesis we propose can reproduce a wide range of movements of the shoulder joint, including the abduction movement at an angle between 0° and 90°. Regarding the wrist joint, it can also reproduce movements on two axes of rotation in a cylindrical workspace, offering in addition the possibility of performing the flexion/extension movement at the same angle as the anatomical wrist joint (0° to 60°). The prosthetic hand also offers 16 degrees of freedom, allowing individual movement of each joint, with angles for the little finger, ring finger, middle finger and index finger between 0° and 90°. The 6 degrees of freedom of the prosthesis are obtained from the elementary rotation angles that determine the functional anatomical movements of the human arm, namely: the movements allowed by the shoulder joint through the angles θ1, which corresponds to the adduction and abduction movements, θ2, which corresponds to the flexion and extension movements, and θ3, corresponding to the external and internal rotation movements; the movement allowed by the elbow joint, θ4, corresponding to the flexion movement; and the movements allowed by the wrist, θ5, which corresponds to the pronation and supination movements that do not exist in the functional anatomical model but are used in the design of prosthetic limbs to simplify the actuation, and θ6, which corresponds to the flexion and extension movements of the wrist.
References
1. Larson, J.V., Kung, T.A., Cederna, P.S., Sears, E.D., Urbanchek, M.G., Langhals, N.B.: Clinical factors associated with replantation after traumatic major upper extremity amputation. Plast. Reconstr. Surg. 132(4), 911–919 (2013)
2. Mansuri, F.A., Al-Zalabani, A.H., Zalat, M.M., Qabshawi, R.I.: Road safety and road traffic accidents in Saudi Arabia: a systematic review of existing evidence. Saudi Med. J. 36(4), 418–424 (2015)
3. Wheaton, L.A.: Neurorehabilitation in upper limb amputation: understanding how neurophysiological changes can affect functional rehabilitation. J. Neuroeng. Rehabil. 14(1), 41 (2017)
4. Cordella, F., Ciancio, A.L., Sacchetti, R., Davalli, A., Cutti, A.G., Gugliemelli, E., Zollo, L.: Literature review on needs of upper limb prosthetics users. Front. Neurosci. 10, 209 (2016)
5. Matt, B., Smit, G., Plettenburg, D., Breedveld, P.: Passive prosthetic hand and tools: a literature review. Prosthet. Orthot. Int. 42(3), 66–74 (2018)
6. Carey, S.L., Lura, D.J., Highsmith, M.J.: Differences in myoelectric and body-powered upper-limb prostheses: systematic literature review. J. Rehabil. Res. Dev. 52(3), 247–263 (2015)
7. Van der Riet, D., Stopforth, R., Bright, G., Diegel, O.: The low-cost design of a 3D printed multi-fingered myoelectric prosthetic hand. In: Mechatronics: Principles, Technologies and Applications, pp. 85–117. Nova Publishers (2015)
8. Rosca, S.D., Leba, M.: Design of a brain-controlled video game based on a BCI system. In: MATEC Web of Conferences, 9th International Conference on Manufacturing Science and Education – MSE 2019 "Trends in New Industrial Revolution", vol. 290, p. 01019. EDP Sciences (2019)
9. Hussein, M.E.: 3D printed myoelectric prosthetic arm, B. Engineering (Mechatronics). Thesis (2014)
10. Negru, N., Leba, M., Rosca, S., Marica, L., Ionica, A.: A new approach on 3D scanning-printing technologies with medical applications. In: IOP Conference Series: Materials Science and Engineering, vol. 572, no. 1, p. 012049. IOP Publishing (2019)
11. Jurišica, L., Duchoň, F., Dekan, M., Babinec, A., Sojka, A.: Concepts of teleoperated robots. Int. J. Robot. Autom. Technol. 5, 7–13 (2018)
12. Zuniga, J.M., Carson, A.M., Peck, J.M., Kalina, T., Srivastava, R.M., Peck, K.: The development of a low-cost three-dimensional printed shoulder, arm, and hand prostheses for children. Prosthet. Orthot. Int. 41(2), 205–209 (2017)
13. Niyetkaliyev, A.S., Hussain, S., Ghayesh, M.H., Alici, G.: Review on design and control aspects of robotic shoulder rehabilitation orthoses. IEEE Trans. Hum.-Mach. Syst. 46(6), 1134–1145 (2017)
14. Risteiu, M.N., Rosca, S.D., Leba, M.: 3D modelling and simulation of human upper limb. In: IOP Conference Series: Materials Science and Engineering, vol. 572, no. 1, p. 012094. IOP Publishing (2019)
15. Milner, C.E.: Functional Anatomy for Sport and Exercise: A Quick A-to-Z Reference. Routledge, Abingdon (2019)
16. Rooker, J.C., Smith, J.R., Amirfeyz, R.: Anatomy, surgical approaches and biomechanics of the elbow. Orthop. Trauma 30(4), 283–290 (2016)
17. Malagelada, F., Dalmau-Pasttor, M., Verga, J., Golanó, P.: Elbow anatomy. In: Doral, M., Karlsson, J. (eds.) Sports Injuries. Springer, Heidelberg (2015)
Ethics, Computers and Security
On the Assessment of Compliance with the Requirements of Regulatory Documents to Ensure Information Security Natalia Miloslavskaya1(&) and Svetlana Tolstaya2 1
The National Research Nuclear University MEPhI (Moscow Engineering Physics Institute), 31 Kashirskoye shosse, Moscow, Russia [email protected] 2 The Bank of Russia, 12 Neglinnaya Street, Moscow, Russia [email protected]
Abstract. Examples of different types of assessments are all around us, providing our assurance that the goods we use won’t harm us, that the system components will work correctly, that services are being delivered consistently, that manufacturers are effectively managing the impact of their activities on health, safety, and the environment, etc. One of the essential forms of assessment is a compliance assessment designed to check how the requirements of regulatory documents to ensure information security (IS) are fulfilled or not on the assessment object, for example, a product, process, system, or service. This short paper discusses work-in-progress results as a part of research aimed at determining the ways of possible improvement, unification and greater formalization of an objective assessment of compliance with the mandatory requirements of regulatory documents on ensuring IS for the selected assessment objects based on the development of recommendations for applying a risk-based approach. Keywords: Requirements of regulatory documents Assessment Compliance Conformance Conformity Compliance assessment State control (supervision) Information security Risks Non-compliance risk
1 Introduction
In the modern world of fierce competition for business success and to increase their confidence in this business, suppliers of goods and services, companies of various fields of activity, subordination and size have to constantly comply with the numerous applicable requirements of supervisory authorities, accreditation and certification bodies, national and international laws, etc., conform with the recommendations of national and industrial standards and specifications, as well as the expectations of customers and other stakeholders. In the constantly changing regulatory environment, companies should permanently conduct assessments to identify different types of organizational risks. For example, they should conduct risk assessments to identify the strategic, financial, operational, etc. risks, to which the company is exposed. These risk assessments have typically
included, among other things, the identification of significant legal or regulatory noncompliance risks that generally require a more focused approach. To obtain an objective picture of such compliance and conformance and to inform the public about its results, a regular assessment with specific requirements should be carried out. The complexity of the risk landscape and the penalties for non-compliance make it essential for companies to conduct thorough assessments. The systems, for which assessments are most often carried out are, for example, quality management systems with the requirements formulated in the ISO 9000 standard family [1]. Currently, conformity and compliance assessment carried out by various companies accredited for these purposes is most often know-how and, to some extent, an art that directly depends on the experience and knowledge of the experts conducting it. Special requirements for accredited experts are defined in such a broad field as ensuring information security (IS). An example of the requirements for IS auditors’ experience and qualifications is given in the ISO/IEC 27007 international standard [2]. In particular, IS management system (ISMS) auditors must have knowledge and skills in the following areas of ensuring IS: • The main organizational, legal and contractual context for the operation of the ISMS being evaluated (for example, laws, information technology, changing processes and business relationships, typical information assets to be protected); • IS management methods, including terminology, management principles and IS risk management methods; • IS ensuring methods for the assessment object, based on the real situation (for example, methods of physical and logical access control, protection against malware, vulnerability management, updates and configurations of network protection tools, etc.); • Existing IS threats, vulnerabilities, controls for the ISMS being evaluated, and much more. Therefore, it is necessary to conduct research whose goal would be to determine the ways of possible improvement, unification and greater formalization of an objective assessment of compliance with the mandatory requirements of regulatory documents on ensuring IS for the selected assessment objects based on the development of recommendations for applying a risk-based approach. It should be noted that discussing and introducing tools for assessing compliance and related issues is out of the paper’s scope. A research object is an assessment of compliance with the mandatory requirements of regulatory documents on ensuring IS for the selected assessment objects. A research subject is a risk-based approach in assessing compliance with the mandatory requirements of regulatory documents on ensuring IS information security for the selected assessment objects. The goal of this work-in-progress paper is to present the results of the analysis of the research background. Section 2 defines the difference between compliance and conformance terms. Section 3 lists existing IS conformance and compliance requirements, which can be applied to the research area. In Sect. 4, the application of a risk-based approach while conducting a compliance assessment is shortly discussed. In conclusion, based on the results obtained, the tasks to be solved in achieving the stated goal are outlined.
2 The Terminology of Compliance and Conformance Assessment In English-language literature, there are two main terms related to the topic under consideration: Conformance and Compliance. These terms underline the difference between legal and regulatory requirements established by governments or other authorities and mandatory to be complied with and voluntary elements outlined in the standards recommended to be conformed with. In the second case, you can realize that there are various methods, which can be used to force such requirements to be met. Conformance is a voluntary act and the choice to do something in a recognized way following a standard, rules, specification, requirement, design, process, or practice of one’s own free will. Compliance means compulsory enforcement of laws, rules, regulations, processes, and practices. Therefore, non-compliance can result in fines, while non-conformance means that certification may not be provided, or it may be suspended or revoked if it has been granted. So, in ISO/IEC 17000 on conformity assessment [3], the term “compliance” is used in English to distinguish the action of doing what is required (e.g., an organization complies by making something compliant or by fulfilling a regulatory requirement). In [3], the conformity assessment refers to a demonstration that the specified requirements relating to a product, process, system, person, or body are fulfilled. Further, these entities are called a conformity assessment object. A specified requirement refers to a need or expectation that is stated. The specified requirements may be stated in normative documents such as regulations, standards, and technical specifications. Besides, the Compliance assessment process is usually undertaken with the explicit goal of detecting non-compliance, deficiencies, process gaps, shortages, etc. In the case of Conformance, the process begins with the assumption that the assessment object is conformed to the standard, and the goal of the conformance assessment body is to evaluate the evidence and establish whether its belief in the conformance is correct. Conformity means the result of conformance, while the result of Compliance is also referred to as Compliance. The proof of these arguments is given, for example, by B. Metivier in [4]: “A compliance assessment is really a gap assessment. You are looking to identify gaps between your existing control environment and what is required. It is not a risk assessment, and identified gaps may or may not correlate to risk exposure. Of course, if you’re not meeting the legal requirements, then you will have some non-compliance risk exposure. However, a risk rating is not typically included in a compliance assessment. It’s a different purpose and a different process”. In ISO/IEC 17000 [3], it is stated that the conformity assessment includes activities such as testing, inspection, and certification, as well as the accreditation of conformity assessment bodies. In a more general case, including compliance, the following forms of assessment are recognized: conformance/compliance approval, accreditation, certification, testing, inspection control (inspection), state control (supervision), acceptance, and commissioning of an object, and some others. Summarizing numerous sources on the topic, we define the key terms as follows:
• Conformance/compliance approval—a special case of assessment, the result of which is documentary evidence (statement) that the assessment object meets the established requirements; • Accreditation – a formal demonstration and an official recognition (evidence) of the competence of the certification body, control body, and testing laboratory to carry out specific assessment tasks based on confirmation of their compliance with established requirements; • Certification – a procedure, by which a third party certifies in writing that the assessment object (product, process, system, service, etc.) meets specified requirements; wherein a third party is a person or body recognized as independent from the parties involved in the matter under consideration, and the parties involved are manufacturers, sellers, performers, consumers or other entities representing their interests; • Testing – one of the processes (sometimes mandatory) in the assessment, which is carried out based on an agreement between the testing laboratory and the assessment object; • Inspection control (inspection) – systematic monitoring of the assessment object (e.g., a product design, product, process, installation, persons, facilities, technology, methodology) as the basis for maintaining the validity of the statement of conformance/compliance (in the form of a declaration or certificate of conformance/compliance); • State control (supervision) – a compliance assessment procedure carried out by government bodies to monitor compliance of the assessment object with mandatory requirements, for example, of technical regulations. In such a short paper, we will not list the advantages of fulfilling all applicable mandatory and voluntary requirements, but only note another important factor, which is the need to manage all types of conformance and compliance within companies on an ongoing basis.
3 A List of the Applicable Information Security Conformance and Compliance Requirements In the framework of our research, a list of a few of the applicable current national and international state and industry regulatory documents and cybersecurity conformance and compliance requirements was made (in the alphabetic order): • Bank of Russia Standards and Recommendations for Standardization (STO BR IBBS and RS BR IBBS), which are an interrelated set of documents on Bank of Russia standardization “Maintenance of Information Security of the Russian Banking System Organizations” [https://www.cbr.ru/eng/analytics/security/]; • Command Cyber Readiness Inspection (CCRI), which is aimed at improving security of the US Department of Defense (DoD) Information Network and is conducted by Defense Information Systems Agency [https://public.cyber.mil/]; • DoD Information Assurance Certification and Accreditation Process (DIACAP), which is a systemic process that provides the certification and accreditation of information systems used within the US DoD [https://public.cyber.mil/];
• Federal Information Security Management Act (FISMA), which requires US federal agencies to implement information security plans to protect sensitive data [https:// www.dhs.gov/cisa/federal-information-security-modernization-act]; • Federal Risk and Authorization Management Program (FedRAMP), which is a US Government-wide program that provides a standardized approach to security assessment, authorization, and continuous monitoring for cloud-based services [https://fedramp.gov]; • General Data Protection Regulation (GDPR), which is intended to harmonize data privacy laws across Europe [https://eugdpr.org/]; • Gramm-Leach-Bliley Act or the Financial Modernization Act of 1999 (GLB Act or GLBA), which is a US federal law that requires financial institutions to explain how they share and protect their customers’ private information [https://epic.org/privacy/ glba/]; • Health Insurance Portability and Accountability Act of 1996 (HIPAA), which is a US federal law that required the creation of national standards to protect sensitive patient health information from being disclosed without the patient’s consent or knowledge [https://www.hhs.gov/hipaa/for-professionals/privacy/laws-regulations/ index.html]; • Industrial Information Security Standards of the Open Joint Stock Company GAZPROM (STO GAZPROM), which determine single requirements for the OJSC information security maintenance systems [https://www.gazprom.com/]; • Information Technology Infrastructure Library (ITIL), containing volumes describing a framework of best practices for delivering IT services [https://www. itlibrary.org/]; • International ISO/IEC Standards 27000/1/2 from 27000 series on Information Security Management [http://www.iso.org/]; • National Information Assurance Certification and Accreditation Process (NIACAP), which establishes a standard US national process, set of activities, general tasks, and a management structure to certify and accredit systems that will maintain the information assurance and security posture of a system or site [https://www.hsdl. org/?abstract&did=18454]; • Payment Card Industry Data Security Standard (PCI DSS), which is a set of security standards for any business that processes credit or debit card transactions, formed in 2004 by Visa, MasterCard, Discover Financial Services, JCB International and American Express [https://www.pcisecuritystandards.org/pci_security/]; • Standard for Exchange of Nonclinical Data of the Clinical Data Interchange Standards Consortium (CDISC-SEND), which is an implementation Standard Data Tabulation Model for nonclinical studies, specifying a way to present nonclinical data in a consistent format [https://www.cdisc.org/standards/foundational/send]; • Title 21 (21 CFR Part 11), which is the part of the Code of Federal Regulations (CFR) that establishes the US Food and Drug Administration regulations on electronic records and electronic signatures [https://www.fda.gov/regulatoryinformation/search-fda-guidance-documents/part-11-electronic-records-electronicsignatures-scope-and-application]; and others.
Mandatory compliance or voluntary conformance with the requirements of the given and similar documents is determined by the type of activity and departmental subordination of the company to which they are applied, as well as the purpose of the assessment. Achieving IS requirements compliance for the assessment object enables consistent, comparable, and repeatable evaluations of its IS controls applied, promotes a better understanding of object-wide mission risks, and creates complete, reliable, and trustworthy information, facilitating a more informed IS risk management decision.
4 Applying a Risk-Based Approach While Conducting a Compliance Assessment Based on the analysis of numerous related works (we intentionally do not provide references here, since this analysis requires a separate publication of its methodology and results that will be done in a due course), the main advantage of using risk-based approach while conducting a compliance assessment is to identify, prioritize and assign for treating existing or potential risks related to legal and regulatory non-compliance that could lead to fines and penalties, reputation damage or the inability of an assessment object to operate in a specific business area. The appropriate risk assessment will help assessment bodies understand the full range of the assessment object’s risk exposure, including the likelihood that a risk event may occur, the reasons it may occur, and the potential severity of its impact. Risk is usually expressed in terms of risk sources, potential events, their consequences and their likelihood [5]. Because the array and types of potential risks while conducting a compliance assessment will be very complex, a robust risk assessment should employ both a framework and methodology. The framework organizes all potential risks expected into risk domains for applying further an effective risk mitigation strategy for each domain. The methodology should objectively assess these risks evaluating both quantitatively and qualitatively their likelihood and potential impact to develop an informed decision on the best applicable and efficient risk mitigation strategy. For example, companies must identify, limit, and monitor all non-compliance risks associated with their business activities. Their non-compliance risks can be driven by the company‘s specific activities, international positioning, regulatory status, size, etc. First of all, this requires a comprehensive risk-based compliance assessment, including an assessment of the existing compliance structure and processes, as well as potential risks that is already being implemented during this research.
5 Conclusion In the future, the research will be narrowed down to the area of the state control (supervision) over compliance with IS requirements at selected assessment objects, namely, organizations of the banking system of the Russian Federation. Therefore, in the framework of this research, the following definition of compliance assessment by an assessment object with the mandatory requirements of normative documents for
ensuring IS is introduced: a process and related activities for direct or indirect determination that mandatory requirements for ensuring IS applicable to the assessment object is met or not and to what extent. Despite the simplicity of these definitions, there are actually many facets, which make this process very complicated. The requirements of departmental documents STO BR IBBS and RS BR IBBS of the Bank of Russia, as well as ISO 27000 family standards, are initially selected as mandatory requirements for ensuring IS for the assessment object. Further, laws and regulations, with which the assessment object is required to comply in all jurisdictions where it conducts business, as well as its IS policies and their compliance with the legal and regulatory requirements, will be defined. Based on the briefly presented analysis results and using the decomposition method, it was determined that in order to achieve the goal set at the beginning of the research, it is necessary to solve the following tasks: 1) To substantiate the importance of improving and unifying the assessment of compliance with the mandatory requirements of the national and international regulatory documents on ensuring IS for selected assessment objects; 2) To analyze existing approaches to assess compliance with various requirements; 3) To analyze existing worldwide implementations of the risk-based approach, including when conducting a compliance assessment; 4) To develop recommendations on the application of a risk-based approach when conducting a compliance assessment for its improvement and unification. Further work in this area lies in the consequent solving of the tasks defined. While conducting the research, the listed tasks will be refined and detailed. Acknowledgment. This work was supported by the MEPhI Academic Excellence Project (agreement with the Ministry of Education and Science of the Russian Federation of August 27, 2013, project no. 02.a03.21.0005).
References 1. ISO 9000:2015 quality management systems—fundamentals and vocabulary (2015) 2. ISO/IEC 27007:2017 information technology—security techniques—guidelines for information security management systems auditing (2017) 3. ISO/IEC 17000:2004 conformity assessment—vocabulary and general principles (2004) 4. Metivier, B.: Cybersecurity compliance assessments: it’s all about interpretation (2017). https://www.sagedatasecurity.com/blog/cybersecurity-compliance-assessments-its-all-aboutinterpretation. Accessed 13 Oct 2019 5. ISO 31000:2018 risk management—guidelines (2018)
Iconified Representations of Privacy Policies: A GDPR Perspective Sander de Jong(B) and Dayana Spagnuelo Vrije Universiteit Amsterdam, Amsterdam, The Netherlands [email protected], [email protected]
Abstract. Privacy policies inform on personal data collection and processing practices, allowing people to make informed decisions about a given service. However, they are difficult to understand due to their length and use of legal terminology. To address this issue, regulatory bodies propose the use of graphical representations for privacy policies. This paper reviews the development of current graphical and iconified representations for privacy policies. We conduct a literature study on existing iconified libraries, we categorise them and compare these libraries with regard to the specifications from the European General Data Protection Regulation (GDPR). The results of this paper show that currently no iconified library fully satisfies the criteria specified in the GDPR. Our major contribution lies in the actionable insights offered to researchers, policymakers, and regulatory bodies in an effort to develop standardised graphic and iconified representations of privacy policies.
Keywords: GDPR · Standardised icons · Iconified representation · Icons · Privacy policy
1 Introduction
Digital products and services have become ubiquitous. According to the International Telecommunication Union (ITU), a United Nations agency for information and communication technologies, by the end of 2018 approximately 51% of the global population was already using the Internet1 . As the number of users continues to grow, so does the volume of data collected and processed. However, users are often unfamiliar with these data collection and processing practices and are thus unaware as to what happens to their data [12]. Privacy policies are in place to notify users of such practices. However, those policies are often laden with legal terminology and descriptions of data processing practices. They have become large and complex bodies of text that most users refrain from reading [7,13]. To address these awareness problems, in the 1
International Telecommunication Union: ICT Statistics 2018.
European Union, the General Data Protection Regulation (GDPR)2 emphasises that users (or data subjects, as described in the regulation) have to be informed about the intended processing operations and their purposes when they start interacting with a service in a concise, transparent, and intelligible manner. There have been developments on alternative representations of privacy policies, for instance, using graphical representations (see [5] for a survey of those). While these proposals yield interesting results, they failed to gain traction at their time. Nonetheless, recent developments by the GDPR offer a reason for their re-examination. More specifically, it suggests that the information on data collection and processing may be provided to data subjects in combination with standardised icons. Furthermore, it requires that if icons are to be used electronically, they should also be machine-readable. In this work, we examine current graphic and iconified libraries for privacy policies which potentially offer a level of compliance with the GDPR. This paper aims to contribute by clarifying: 1. the existing variations of iconified representations for policies; and 2. how they represent data protection concepts. By doing so, we collect evidence that allows us to evaluate to what extent the demands of the GDPR are achieved. In the following (Sect. 2), we present related works. Section 3 elaborates on the research method used for the selection of candidate libraries and determining the GDPR criteria. Section 4 categorises the obtained candidate libraries, and Sect. 5 systematically compares them and summarises the findings. Final considerations and future research are presented in Sect. 6.
2 Related Work
A few works reviewing on iconified representations of privacy policies were carried out in the past. We highlight here the study by Hansen [5]. The study selects independently developed approaches and compares them. Among the representations, different categories are distinguished, and each approach is briefly highlighted. The paper’s main contribution is the recommendations for increasing the widespread use of privacy icons, by applying developments at that time within the European data protection regulation. This study, however, was conducted before the reforms in the European data protection scenario, which led to the formulation of the GDPR. Following a similar direction in 2014, Edwards and Abel summarised the developments on iconified representations of privacy policies [2]. Their work highlights that the use of icons and labelling is considered sufficient for effectively communicating complex and lengthy privacy policies to consumers. The authors include in their study most of the approaches mentioned in the previous related work by Hansen with the addition of new developments that appeared between 2
Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation).
the release of both papers. Their paper presents a newer rendition of the previous study on graphical representations of privacy policies, but only briefly discusses them. To the best of our knowledge, no recent works conduct a systematical comparison of such libraries.
3 Research Methodology
Our research methodology has two stages: the selection of candidate graphic and iconified libraries; and the definition of criteria for the systematic comparison. In what follows, we describe the methods we adopt for each stage. Candidate Libraries. We performed an initial search on privacy policy (policies) graphical icons on the indexes of Google Scholar and the Vrije Universiteit’s LibSearch3 index as they incorporate results from various scientific databases. Although this work focuses on the provisions defined in the GDPR, at the time of writing, few works have been developed concerning iconified libraries since GDPR’s enforcement, in 2018. For the sake of completeness, we decided to include works developed in a time-frame pre-GDPR as well. All works describing candidate libraries were checked for content relevance before being used for a snowballing phase. We did that by searching for the keyword “icons” within the body of the article and looking for visual representations of the icons (e.g., embedded images, or link to a repository of icons). References of the candidate libraries were traced in both directions. This helped to generate an overview of past approaches and also to discover other commercial and independent initiatives that did not occur in scientific databases. We decided to include those commercial and independently developed iconified libraries for their relevancy. Examples of such are the icon library by Mehldau [11], and the privacy icons designed during a Mozilla privacy workshop [14]. Although these libraries were not part of an academic project, they can provide insight as to whether proposals from industry experts can fit the GDPR criteria. We selected 10 libraries in total (see Table 1). They vary in terms of approach to the representation of concepts, as well as in the stage of development, ranging from prototypes to privacy software tools available to users as web browser plugins. Five libraries are developed in the context of the European data protection regulation: three developed before the GDPR drafting [4,11,15] one during its draft [3], and one after its approval [17]. The remaining five libraries have been developed outside of the European data protection regulation context [6,8,10, 14,18]. Comparison Criteria. The criteria used for systematically comparing the libraries are based on GDPR Art. 12(7): “The information to be provided to data subjects pursuant to Articles 13 and 14 may be provided in combination with standardised icons in order to give in an easily visible, intelligible and clearly legible manner a meaningful overview of the intended processing. Where the icons 3
https://vu.on.worldcat.org/discovery.
Table 1. List of reviewed libraries. IDs were given by us, based on the titles.
ID | Year | Author(s) | Title
DPDI | 2006 | Rundle [15] | International Data Protection and Digital Identity Management Tools (Presentation)
DPD | 2007 | Mehldau [11] | Iconset for Data-Privacy Declarations v0.1
PCI | 2009 | Helton [6] | Privacy Commons Icon Set
NLP | 2009 | Kelley et al. [8] | A "Nutrition Label" for Privacy
PL | 2010 | Fischer-Hübner, Zwingelberg [4] | UI Prototypes: Policy Administration and Presentation – Version 2 PrimeLife icon set
PI | 2010 | Raskin [14] | Privacy Icons: Alpha Release
COMP | 2012 | European Parliament [3] | Compromise amendments on Articles 1-29. COMP Article 1. 07.10.2013.
CT | 2013 | Lannerö [10] | Fighting The Biggest Lie On The Internet: CommonTerms Beta Proposal
PC | 2016 | Zaeem et al. [18] | PrivacyCheck: Automatic Summarization of Privacy Policies Using Data Mining
PG | 2018 | Tesfay et al. [17] | PrivacyGuide: Towards an Implementation of the EU GDPR on Internet Privacy Policy Evaluation
are presented electronically they shall be machine-readable.”4 . We emphasised key terms that have led to our criteria. The GDPR specifically mentions icons. We decided to adopt a broader interpretation of the word: a sign or representation that stands for what it signifies5 . For this reason, we include in our study alternative iconified representations, and check whether these could also satisfy the criteria from the GDPR. Art. 12 from the GDPR indicates that these icons must give an easily comprehensible overview of the intended processing mentioned in Art. 13 and 14. Finally, it also mentions that icons should be machine-readable. A key development in this area is the Platform for Privacy Preferences (P3P) developed in 2002 by the World Wide Web Consortium (W3C) [1]. This platform specifies a machine-readable format for online privacy policies, which allows web browsers to compare them with user-specified privacy settings [1]. However, due to the complex implementation strategy of the platform, it failed to gain traction and its development ceased in 2006 [9,16]. Nevertheless, each of the selected candidate libraries was also analysed for machine-readability. 4 5
GDPR, Article 12(7). Adapted from the definition in Lexico, by Oxford Dictionary.
Table 2 shows the nine criteria we formulated. Apart from GDPR provisions, we also base our criteria on the summarised data protection aspects from the GDPR identified in the literature (i.e., [17]).
Table 2. Data processing criteria adapted from GDPR and Tesfay et al. [17]
Criteria | Description | Relevant art.
Personal Data (PD) | Personal data are processed | Art. 14(1) lit. d
Sensitive Data (SD) | Special category of data are processed | Art. 9, Art. 14(2) lit. d
Third Party Sharing/Collection (TPS) | How personal data might be shared with or collected by third parties | Art. 13(1) lit. e
Data Security (DS) | Indicate whether and how data are protected or pseudonymised | Art. 32
Data Retention (DR) | The period for which data will be stored | Art. 13(2) lit. a
Processing Purpose: Legal obligations (PP: LO) | Indicate whether personal data have to be collected, processed, and stored due to legal obligations | Art. 13(1) lit. c
Processing Purpose: User tracking (PP: UT) | Indicate whether the data are used for user tracking | Art. 13(1) lit. c
Processing Purpose: Profiling (PP: PF) | Indicate whether personal data are used for profiling and automated decision-making | Art. 13(2) lit. f
Machine-readable (MR) | Icons are machine-readable | Art. 12(7) lit. 2
4 Iconified Libraries
After retrieving and analysing the candidate libraries, different categories of iconified representations have been identified, and each candidate library has been assigned to one of the following three categories. Standalone Icons. The majority of the candidate libraries belong to the category of standalone icons [3,4,6,10,11,14,15]. These are icons developed to accompany existing privacy policies, either included at each element of the policy or functioning, as a separate entity, as a summary of the content of a policy. An example of such a library is the icon set proposed by Mehldau [11], with specific icons designed to illustrate each data category and collection purpose (see Fig. 1a).
Icons as Part of Tool. Another form of representation discovered during this study is the usage of privacy icons as part of a tool (a plug-in or standalone software). Two candidates fit this category [17,18]. Both tools allow the user to interact with and interpret a given privacy policy, and offer icons to ease its understanding (see Fig. 1b). PrivacyGuide [17] provides a high-level summary of a website's privacy policy requested through a URL. The privacy aspects and categories are represented by icons which are coloured according to three risk levels interpreted by the software. PrivacyCheck [18] is available to users as an online web browser plug-in and uses data mining methods to extract information from online privacy policies. The authors claim that PrivacyCheck overcomes limitations of other iconified representations that need manual annotation. Nutrition Labels. This design has been proposed by Kelley et al. [8] and is the only nutrition-label-based library found in this study. The authors take inspiration from nutrition, warning, and energy labelling in the United States. The motivation for choosing a nutrition label is supported by literature indicating
Fig. 1. Types of iconified libraries: (a) Standalone Icons (image extracted from [11]); (b) Icons as part of Tool (image extracted from [17]); (c) Nutrition Label (image extracted from [8]).
that labels enable users to receive simplified information in a standardised format. Furthermore, the labels allow users to make easier comparisons between different items. The Privacy Nutrition Label is depicted in Fig. 1c. The tool consists of a table in which rows indicate the information that is collected from the user and columns show processing purposes. It also shows which data categories are shared with third parties, and it contains icons that distinguish whether certain information is collected and whether a user can opt in or out of this collection.
5 Discussion
Table 3 synthesises the comparison between candidate libraries (with samples of their corresponding icons) and the criteria as formulated from the GDPR. The rows correspond to the candidate libraries sorted by year of publication and identified with their ID (as per Table 1). The second column indicates the category of iconified representations the candidate library belongs to (as described in Sect. 4). The remaining columns contain criteria based on GDPR Art. 12(7), which have been abbreviated to improve the readability of the table. One remark must be made: we refrain from displaying examples of labels for NLP [8] as they require more context for their interpretation (see Fig. 1c for an example).

Table 3. Comparison of candidate libraries (with samples of icons) against GDPR criteria. Images are extracted from the source cited in the column Library. The criteria columns (PD, SD, TPS, DS, DR, PP: LO, PP: UT, PP: PF, MR) hold the corresponding icon samples in the original and cannot be reproduced in text.

Library | Category
DPDI [15] | Stand. Icons
DPD [11] | Stand. Icons
PCI [6] | Stand. Icons
NLP [8] | Nutr. Label
PL [4] | Stand. Icons
PI [14] | Stand. Icons
COMP [3] | Stand. Icons
CT [10] | Stand. Icons
PC [18] | Part of Tool
PG [17] | Part of Tool
5.1 Data Collection
Personal Data (PD). This criterion requires that users are informed of the categories of personal data collected when interacting with a service. Eight out of the
ten candidate libraries have an iconified representation indicating this, though their implementation varies. For example, NLP [8] splits this criterion into separate labels for contact information, demographic information, and financial information. DPD [11] contains icons which indicate the collection of personal data with more granularity (i.e., name, address, and email address), as opposed to PI [14], which contains a single icon indicating that personal data are collected for the intended purpose but does not go into further detail. Sensitive Data (SD). GDPR requires that users are informed when special categories of data are processed. This processing is prohibited when a user does not give explicit consent (Art. 9). Three candidate libraries have a representation for this criterion (i.e., NLP, PL, and PG). PL [4] provides a specific icon to inform users about the processing of data on racial, political or religious categories. PG [17] makes use of colours to indicate the sensitivity of data. Third Party Sharing/Collection (TPS). According to GDPR Art. 13, a user should be informed when their data are shared with third parties. All candidate libraries have an iconified representation for this criterion, although there are nuances in how this criterion is accomplished. For instance, DPDI [15] expresses this with an icon indicating that a company agrees “not to trade or sell this data”, and PC [18] uses a colour code in its icons to represent that a given piece of data is shared with third parties. Data Security (DS). In its early drafts, GDPR suggested informing users on whether personal data are retained in encrypted form. Even though the current version does not require an explicit mention (Art. 32 only requires that sufficient measures are taken to ensure a level of security concerning user data), it seems to be a subject of interest for users. Half of the studied libraries contain an icon to represent this. Data Retention (DR). This criterion is based on Art. 13 and requires that users are informed of the period for which personal data will be stored. Five candidate libraries have an iconified representation for this criterion. The proposal by the European Parliament, COMP [3], has an icon indicating that personal data will not be retained longer than necessary for the processing. Another proposal, PI [14], has an icon indicating the retention period (i.e., one, three, six, or 18 months, or indefinitely).
5.2 Processing Purposes
Legal Obligations (PP: LO). Art. 13 states that a user must be made aware of the legal basis for the processing of personal data, for instance when, under certain circumstances, companies are required to collect and process data due to legal obligations. Four candidate libraries currently have an iconified representation for this specific processing purpose (PL, CT, PI and PC [4,10,14,18]).
User Tracking (PP: UT). Another processing purpose is user tracking, which, for instance, could involve gathering user data for generating statistics on the usage of a service. The GDPR requires that the user is informed of this purpose. PL [4] contains an icon specifically indicating the purpose “user tracking”, while DPD [11] has an icon indicating the purpose of “statistics”. Profiling (PP: PF). Art. 13 also establishes that a user should be informed if data are used for automated decision-making, including profiling. It is interesting to note that only two candidate libraries have an explicit iconified representation for this process. NLP [8] has a separate column indicating which categories of user data are used for profiling, while PL [4] contains a separate icon indicating that user data are used to create profiles.
5.3 Machine Readability
Machine-readable (MR). Art. 12(7) lit. 2 requires that if privacy icons are used electronically, they should also be machine-readable. This could be achieved by using the P3P format. Four candidate libraries mention machine readability, but rarely report on how it would be achieved. For example, one proposal mentions that its privacy icons will have machine-readable markup in a future version of the proposal, but no further development could be found in the literature [10].
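To make the idea more tangible, the sketch below shows one hypothetical way an icon set could be published in machine-readable form. The schema, the criterion codes (borrowed from Table 2), the file names, and the URL are our own illustrative assumptions; neither the GDPR nor any of the reviewed libraries prescribes such a format.

```python
import json

# Hypothetical example only: it illustrates that a set of icons can be paired
# with structured metadata that a browser extension or crawler can parse.
icon_manifest = {
    "policy_url": "https://example.org/privacy",  # assumed URL
    "icons": [
        {"criterion": "TPS", "statement": "data shared with third parties", "image": "tps.svg"},
        {"criterion": "DR", "statement": "data retained for 18 months", "image": "dr-18m.svg"},
        {"criterion": "PP:PF", "statement": "data used for profiling", "image": "profiling.svg"},
    ],
}

# Serialise the manifest so a user agent could fetch it and compare it against
# user-defined preferences, much as P3P intended for full policies.
print(json.dumps(icon_manifest, indent=2))
```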
6 Final Considerations and Conclusion
This paper reviews iconified representations of privacy policies. Candidate libraries were collected, categorised, and compared using criteria formulated from the GDPR. Our study results in the collection of 10 candidate icon libraries developed between 2006 and 2018. Three categories of iconified representations have been identified: Standalone Icons, Icons as part of Tool, and the Nutrition Label approach. No candidate library we studied fits all GDPR criteria, but current developments are close to the standard described by GDPR. The PrimeLife icon library [4,7], for instance, fits seven out of the nine criteria. There are still improvements to be made to the state of the art. Machine readability, for instance, remains an unsettled concept. The granularity of icons is another issue that requires attention. Proposals by Zaeem et al. and Tesfay et al. are examples of candidate libraries achieving greater granularity by using coloured icons to indicate a risk level of data collection or processing practices [17,18], yet most of the candidates do not offer this level of specificity. Our findings enrich the domain of research on iconified representations of privacy policies. Our categorisation and comparison use GDPR criteria (unlike the previous literature [2,5]), and could serve as a contribution to the taxonomy on this topic. Our findings also serve as a practical guide to the research community, policymakers, and regulatory bodies, offering insights for anyone interested in developing new iconified representations. For regulatory bodies, this study highlights which aspects remain unsettled, e.g.,
machine readability: our comparison shows that few candidate libraries have mentioned machine-readable icons. Therefore, regulatory bodies may use this study to reassess this requirement and offer clarity to researchers and industry experts. Nonetheless, our work has some limitations. Although the retrieval of candidate libraries was conducted methodically, we cannot claim to have discovered every possible proposal on this topic. One limitation in this regard is known to us: during the retrieval process, we discovered that some publications had been withdrawn from their project websites. For some (e.g., [6,14]), we circumvented the problem using internet archival services. Another limitation is implicit to our work: as our criteria are based on European regulation, our findings might not apply to other regions. While we did include international candidates, we have only examined them against the GDPR criteria. Future research could consider other data protection legislation and explore the differences with the GDPR in the context of iconified representations of policies. Our work opens other directions for further research. A natural next step is to examine the user experience of candidate libraries. Several candidate libraries did not perform user studies to verify the comprehensibility of the proposed icons. Such a study is described by one of our candidates [8] and could serve as inspiration. Another possibility is to combine elements from existing candidate libraries. This would allow for the development of an iconified library that fits all the criteria in this research, enriching the state of the art, as only two libraries have been developed since the GDPR entered into force in 2016 [17,18].
References
1. Cranor, L.F.: P3P: making privacy policies more useful. IEEE Secur. Priv. 1(6), 50–55 (2003)
2. Edwards, L., Abel, W.: The use of privacy icons and standard contract terms for generating consumer trust and confidence in digital services. Technical report, CREATe working paper series (2014). https://doi.org/10.5281/zenodo.12506
3. European Parliament: Compromise amendments on Articles 1-29. Technical report, COMP Article 1. 07.10.2013 (2013)
4. Fischer-Hübner, S., Zwingelberg, H., Bussard, L., Verdicchio, M.: UI prototypes: policy administration and presentation - version 2. Technical report (2010)
5. Hansen, M.: Putting privacy pictograms into practice - a European perspective. GI Jahrestagung 154, 1–703 (2009)
6. Helton, A.: Privacy commons icon set (2009). http://aaronhelton.wordpress.com/2009/02/20/privacy-commons-icon-set/. Accessed November 2019 through web archive
7. Holtz, L.E., Zwingelberg, H., Hansen, M.: Privacy policy icons. In: Privacy and Identity Management for Life, pp. 279–285. Springer (2011)
8. Kelley, P.G., Bresee, J., Cranor, L.F., Reeder, R.W.: A nutrition label for privacy. In: Proceedings of the 5th Symposium on Usable Privacy and Security, p. 4. ACM (2009)
9. Lämmel, R., Pek, E.: Understanding privacy policies. Empirical Softw. Eng. 18(2), 310–374 (2013)
10. Lannerö, P.: Fighting the biggest lie on the internet: common terms beta proposal. Metamatrix AB (2013). http://commonterms.org/commonterms_beta_proposal.pdf. Accessed November 2019
11. Mehldau, M.: Iconset für Datenschutzerklärungen (2007). https://netzpolitik.org/2007/iconset-fuer-datenschutzerklaerungen/. Accessed November 2019
12. Murmann, P., Fischer-Hübner, S.: Tools for achieving usable ex post transparency: a survey. IEEE Access 5, 22965–22991 (2017)
13. Proctor, R.W., Ali, M.A., Vu, K.P.L.: Examining usability of web privacy policies. Int. J. Hum.-Comput. Interact. 24(3), 307–328 (2008)
14. Raskin, A.: Privacy icons. http://www.azarask.in/blog/post/privacy-icons/. Accessed November 2019 through web archive
15. Rundle, M.: International data protection and digital identity management tools. In: Presentation at IGF 2006, Privacy Workshop I, Athens (2006). http://www.lse.ac.uk/management/research/identityproject/. Accessed November 2019
16. Schwartz, A.: Looking back at P3P: lessons for the future. Center for Democracy & Technology (2009). https://www.cdt.org/files/pdfs/P3P_Retro_Final_0.pdf. Accessed November 2019
17. Tesfay, W.B., Hofmann, P., Nakamura, T., Kiyomoto, S., Serna, J.: PrivacyGuide: towards an implementation of the EU GDPR on internet privacy policy evaluation. In: Proceedings of the 4th ACM International Workshop on Security and Privacy Analytics, IWSPA 2018, pp. 15–21. ACM (2018). https://doi.org/10.1145/3180445.3180447
18. Zaeem, R.N., German, R.L., Barber, K.S.: PrivacyCheck: automatic summarization of privacy policies using data mining. ACM Trans. Internet Technol. (TOIT) 18(4), 53 (2018)
Data Protection in Public Sector: Normative Analysis of Portuguese and Brazilian Legal Orders
Marciele Berger Bernardes1, Francisco Pacheco de Andrade1, and Paulo Novais2
1 Escola de Direito, Universidade do Minho, Braga, Portugal, [email protected]
2 Escola de Engenharia, Universidade do Minho, Braga, Portugal
Abstract. Considering that information technology penetrates all areas and domains of the public sector, it is necessary to consider the extent of the regulation required to ensure that this phenomenon becomes an advantage and not a threat. In this sense, this study aims to discuss certain aspects associated with the fair use of emerging and disruptive technologies (such as Artificial Intelligence, the Internet of Things, and Big Data) in the public sector. The emphasis falls upon the treatment of this subject by traditional regulatory instances, such as the General Data Protection Regulation (GDPR), in the sense of enhancing the capacity of governments to ensure privacy, data protection, and the protection of citizens. Keywords: Data protection in the public sector · General Data Protection Regulation · General Data Protection Law (Brazil)
1 Introduction

The introduction of collaborative innovations in urban environments requires an interdisciplinary look. Governmental and technological structures and human beings (living complex organisms under constant evolution) are now integrated by instruments collecting data. These are connected through different sources, under the principle that (non-)governmental and private actors interact and exchange information for a better, more sustainable, less costly and more participatory life. The fact is that, in the process, cities became massive centers of data collection. So the challenge is clear: how to consider the enormous amount of data produced while respecting the main principles of the fundamental rights to privacy and intimacy? More importantly, how can we ensure that administrations use ICTs while placing citizens at the center of these processes, respecting legal certainty and personal digital sovereignty? To answer these questions, this study is divided into three parts. First, we present a review of the literature on smart cities, data protection, and privacy. Second, we present the normative profile of data protection in both the Portuguese and Brazilian contexts. Finally, by combining theoretical and practical implications, we answer the research questions of this study and suggest the way
followed by traditional regulatory instances, such as the General Data Protection Regulation (GDPR), in the sense of enhancing the capacity of governments to ensure privacy, data protection, and the protection of citizens.
2 Smart Cities, Data and Privacy: Challenges and Prospects

Unquestionably, street sensors make it possible to enhance public security levels, access to high-speed Internet for the whole population, and the integration between government, corporations, and civil society, allowing the creation of a city that is more accessible for everyone. At the same time, people are aware that many discourses on the subject are marked by an optimistic view, often not very critical towards new technologies. Difficulties thus arise, as some emerging concerns that urban centers must face when turning to digital environments have to be recognized. Among the main risks identified by the doctrine are the issues related to privacy and trust: the potential to create new forms of social regulation with the erosion of privacy, and the potential to create systemic vulnerabilities in the whole infrastructure and in the security of data, instead of producing a stable and trustworthy structure for citizens. As may be noticed, one of the great subjects of debate around smart cities is the dilemma between the fundamental rights to privacy and publicity and the increasingly generalized context in which personal data are used as an instrument of municipal public management. Reflecting on this scenario, Edwards [1] points out that cities congregate three central challenges to personal privacy, namely the Internet of Things (IoT), Big Data, and cloud computing. The potential impact on the implementation of applications and smart city platforms makes it convenient to review the main characteristics of these technologies. The Internet of Things refers to the connection of objects that may be read by machines and are uniquely identified through the Internet. In cities, examples include street lights, temperature sensors, noise sensors, rain and air quality sensors, traffic lights, security cameras, public transportation, and citizens' cellphones. According to the authors, the main element characterizing the Internet of Things is that data are collected from these objects and sent to city platforms or applications to be stored and processed. Big Data is a product of the Society of Sharing (or Informational Society). It is related to accelerated technological development and to the economic model arising out of it. In a simplified way, Big Data may be described as the set of techniques and tools for manipulating and storing a significant volume of data. Among its main characteristics are: volume, the great quantity of data generated; variety, data from different sources and with different structures; and velocity, as many services depend on fast or even real-time processing. New Big Data tools in smart cities were made possible thanks to the widespread use of devices and sensors, based on technological structures, which allow cities to become important data collection centers. So, it is worth noting that, in the context of smart cities, Big Data includes all actions and communications in digital platforms. From the simpler ones (such as the
use of cellphones and laptops, the recognition of traffic patterns using historical data, the forecast of electricity demand on different days and schedules using real-time data flows, and the forecast of public transportation use) to the detection of public security problems arising from monitoring through security cameras. Cloud computing, in turn, is a “new modality of service provision, through the use of internal and external servers, allowing omnipresent access to a wide range of services and resources”; that is to say, it provides the infrastructure for storing and processing data. Among its essential non-functional requirements for smart cities, Kon and Santana [2] point out the cloud of things (storing and processing data from sensors in a cloud computing environment) and sensing as a service (the infrastructure in charge of providing sensor data to applications as services in the cloud). Based on this, and mainly considering the almost instantaneous possibility of storing, processing, and distributing information, new theories arise on the better grounding of government decisions in data analysis. Decisions may thus have a better grounding and lead to a “radical increase in the efficiency of processes and allocation of resources, including the detection of failures and fraud”. Meanwhile, as noticed by Pierre Lévy [3], “to digitalize information is just to translate it into numbers”, and this leads to the legal reflection that is the object of this research: we must consider that the (wrong) use of technologies may create imbalances or even violations of rights. It is worth noting that prior notice and the consent of the citizen, the holder of rights, are considered the cornerstones of data protection and privacy; however, they may become weakened in the context of smart cities. The imminent risk was already pointed out by Lawrence Lessig [4], who clarified that, depending on their use, such devices make possible a permanent and tendentially complete control of persons: “The struggle in this world will not be on the government. It will be to warrant that essential freedom is preserved in this environment of perfect control”. In the face of this dilemma, everyday rights will be (re)negotiated with the State and with the new emerging economic model. Different questions then arise, among which: which rules apply to public powers? How should the consent of the holder of rights be given? What is the responsibility of public powers in the management of personal databases? To what sanctions must public powers be subject? In search of answers to these questions, the next section considers the normative analysis of the Portuguese and Brazilian legal orders.
3 Portugal and Brazil: Normative Profile of Data Protection

Professor Ernesto Valdés [5], in a work entitled “Privacy and Publicity”, reflects on different situations in which it is alleged that there is a violation of the private sphere, according to social norms. Still in the North American scenario, it is worth remembering the news of 2013, according to which the National Security Agency (NSA) intercepted domestic telephone
calls and collected their data through the Internet. The NSA also intercepted calls of non-American persons and even of other countries' governments. These revelations on surveillance arose from documents of the North American government, revealed by Edward Snowden (a former NSA agent), concerned with the collection of data by the American government. This kind of issue requires clear policies and continuous deliberative and updating processes. In the age of Big Data, as cities and administrations use more and more data to generate operational and political advances, the security of data and the rules on warranties of anonymity must be considered. It is therefore necessary to analyse the legal solutions identified in the European Union (Portugal) and in Brazil in order to face possible threats to privacy in smart cities. From this perspective, we must refer to the work of Schönberger [6], who points out four different generations of data protection laws in Europe. The first generation reflected the technological framework of the time and aimed at controlling technology, regulating the authorizations for the creation of databases. One of the key milestones of this period was the 1970 data protection law of Hesse (a German state). The normative motivation occurred as a reply to the alleged technological threat, characterized by generality and abstraction, which led to a quick mismatch in the face of the rapid multiplication of data processing centers. The second wave of European legislation has as its exponent the French law on data protection (1978), its main characteristic being a change of paradigm: the focus moved from providers to citizens. Citizens were supposed to identify the undue use of their personal information and to seek its protection. The main obstacle was that citizens were forced to choose between social exclusion and the provision of their data. In the 1980s, the third generation of laws arose in Europe, focused on the citizen, but with a more sophisticated guardianship based on the right to informational self-determination. As an example, we may refer to the Norwegian Law on Data Protection. Finally, the fourth generation, referred to by Schönberger [6], tried to overcome the disadvantages of the individual focus, stating that data guardianship may not be restricted to the individual choices of citizens: laws are required to enhance the collective pattern of data protection. In the context of the European Union, we must refer to the European Directive 95/46/CE, created with a double aim: to support the creation of a normative framework for data protection and the free circulation of data among the member states. It should also be noted that, as a result of the transposition of this Directive, the Data Protection Law (Law n.º 67/98) was issued in Portugal. As the years went by, the referred norms became clearly insufficient to ensure the needs of data protection, as the reality was quickly changing, also in the different Member States of the EU, and modernization and a single legal act were required in order to “reinforce the fundamental rights of the persons in the digital age (…) thus ending with fragmentation and the costly administrative charges”. This scenario of
normative mismatch and technological evolution led to the approval of Regulation (EU) n.º 2016/679, the General Data Protection Regulation (GDPR), concerning the protection of individuals with regard to the processing of personal data and the free circulation of such data. According to Ronaldo Lemos [7], this framework evidences the role that Europe is performing to become a regulatory superpower (not just in data protection but also in subjects such as Intellectual Property and Competition Law). Furthermore, as the author notes, two primary effects of the Regulation are visible. The first one is a macro aspect: the fact that Europe implemented such a Regulation encouraged the adoption of data protection laws in other countries, such as Brazil. The second one is a micro aspect: “Every corporation working with data will have to take into account GDPR, even if they do not have headquarters in Europe.” Considering that the focus of this paper falls upon a comparative analysis of the Portuguese and Brazilian scenarios, it is convenient to briefly review the GDPR (now the main rule for data protection in Portugal) and the Brazilian Data Protection Law (whose contents were inspired by the European model).
3.1 General Data Protection Regulation (GDPR)
Following Viktor Schönberger's logic [6] on the different generations of data protection laws in Europe, referred to above, it is believed that the GDPR may be classified as a fifth generation of European data protection legislation. It must be clarified that the said study dates from 1997 and thus could not have foreseen the appearance of the European Regulation. Yet, in a recent study, Schönberger and Kenneth [8] stated that data are for the Informational Society what fuel was for the Industrial Society. The authors have warned that there is a risk of the emergence of what they called the “barons of Big Data” in the 21st century, as happened with the “robber barons” of the 19th century, who dominated railroads, metallurgy and telegraph networks in the United States. So, for these authors, in the age of Big Data (in which it is not possible to foresee the extent of technological evolution), the challenge is to develop measures that allow transactions of data. Thinking analogically, Schönberger and Kenneth suggest a strategy of identifying general principles for the regulation of the subject, in order to ensure the safeguard of minimal rights. Given this context, the GDPR emerges, having as legal support the following instruments: the Treaty on the Functioning of the European Union (TFEU) (article 16); the Charter of Fundamental Rights of the European Union (articles 7 and 8); Convention 108 of 1981, “the first international instrument legally binding adopted in the domain of data protection”; the European Convention on Human Rights (ECHR) (article 8); the Treaty of Lisbon, “providing a more solid base for the
1 The GDPR came into force on 25 May 2016 and became fully applicable on 25 May 2018. It expressly revoked Directive 95/46/CE and, being a European Regulation, it is mandatory and directly applicable in all EU Member States, thus replacing the Portuguese Data Protection Law in all that is not compatible with the Regulation (articles 94 and 99 of the GDPR).
development of a more efficient and clear system of data protection”; and Directive 95/46/CE, concerning data protection. Out of these arose the GDPR, which went through important temporal steps until its full applicability: 2012 (in January, the European Commission presented the initial proposal for a data protection Regulation); 2014 (in March, the European Parliament approved its version of the Regulation); 2015 (in June, the Council of the European Union approved its version and, in December, the Parliament and the Council reached an agreement); 2016 (in May, the Regulation was approved); 2018 (after two years, the GDPR became fully applicable all over the European Union on 25 May 2018). It may be said that the GDPR had several aims: harmonization (“being the Regulation directly applicable in all Member States, there is no need for national legislation in each Member State”); the expansion of reach (“the Regulation is applied to all the organisations acting within the European Union […] and also the ones with seat outside of the EU but that monitor and/or offer services and goods to individuals in the EU”); and the one-stop-shop scheme (the new one-stop-shop concept “means that the organizations will have to deal with one single supervising authority, […] making it simpler and cheaper for corporations making business in the EU”). The GDPR also presents new concepts, and some already existing ones were considerably revised. Among others, the following should be noted: personal data (article 4, nr. 1, GDPR); special categories of data (article 4, nrs. 13, 14, 15 and article 9, GDPR); violation of personal data (article 4, nr. 12, GDPR); pseudonymisation (article 4, nr. 5, GDPR); the right to the erasure of data or “right to be forgotten” (article 17, GDPR); privacy by design (article 25, nr. 1, GDPR); privacy by default (article 25, nr. 2, GDPR); the right to the portability of data (article 20, GDPR); and the agents of the treatment (“Responsible for the treatment” and “Subcontractor”, article 4, nrs. 7 and 8, GDPR). Subject to the GDPR is every “natural or collective person, public authority, agency or other organism receiving communication of personal data, regardless of being or not a third party” (article 4, nr. 9, GDPR). Concerning the range of its application, the GDPR defines its material scope as covering all those who treat personal data by totally or partially automated means, including all public and private entities (article 2, GDPR). The territorial scope includes residents in Europe and corporations processing data of persons located in European territory (article 3, GDPR). Besides that, the possibilities of exclusion from the application of the GDPR are mentioned in article 2, nr. 2, GDPR. It must be said that the GDPR was built on certain principles, such as “lawfulness, loyalty and transparency”, “limitation of purpose”, “data minimization”, “accuracy”, “limitation of conservation” and “integrity and confidentiality” (article 5, GDPR). All this keeps a deep connection with the rights granted to data holders for the exercise of the subjective right of data protection, present in chapter 3 of the GDPR: 1) the right to be informed – access to information on the processing; 2) the right of access –
2 These were built, in a general way, on the argument that it is up to the European Union to ensure that “the fundamental right to data protection, established in the Charter of Fundamental Rights of the European Union, is applied in a coherent way (…), especially in a world society characterised by quick technological changes”.
access to personal data stored by the controller; 3) the right to rectification – correction of any nonconformity concerning the processed personal data (article 16, GDPR); 4) the right to erasure or “right to be forgotten” – exclusion of data stored or processed (article 17, GDPR); 5) the right to portability – transfer of data to another controller (article 20, GDPR); 6) the right to object to treatment – temporary restriction of the processing of personal data (article 4, nr. 24, GDPR). The GDPR also provides a modernized framework of compliance based on responsibility concerning data protection, which includes the new figure of the Data Protection Officer (DPO). DPOs assume a central role in the normative framework, as participants in the system of data governance. It must be noted that the GDPR does not define what an authority or public organism is, leaving this task to the national legislator in each member state of the EU. For this purpose, the Portuguese Proposal of Law nr. 120/XIII, in article 12, nr. 2, assumes as public entities: a) the State; b) the autonomous regions; c) the local authorities; d) the independent administrative entities and the Bank of Portugal; e) the public institutes; f) the institutions of public higher education of a foundational nature; g) the public enterprises in legal public form; h) the public associations. Besides that, the DPO must be selected in accordance with his or her legal and specialized knowledge in terms of data protection (article 37, nr. 5, GDPR), without forgetting the capacity to perform the functions referred to in article 39 (computer skills), evidencing the multidisciplinary character of this figure. The DPO may perform the several tasks referred to in article 39, nr. 1, GDPR, consisting of the supervision and monitoring of internal application, ensuring respect for data protection norms. In exercising this function, the DPO must be independent and cannot receive instructions from the controller or processor as far as the performance of these tasks is concerned. The DPO cannot be dismissed or penalized for performing these tasks, and may perform other functions within the corporation (provided that there is no conflict of interest). Still, the DPO may be an internal collaborator or an external agent hired as a service provider (articles 37, nr. 6, and 38, nr. 6, GDPR). The efficacy of the GDPR is also bound to the creation/designation of an entity, namely the independent control authority (article 51, GDPR), with several attributions (article 57, GDPR), including the power to investigate and to impose sanctions (article 58, GDPR). In the pursuit of these functions, such authorities must act with total independence. Furthermore, the GDPR brought along a new paradigm: it allows the relationship between control authorities through cooperation in cross-border treatment (article 56, GDPR). In Portugal, the role of the “Authority of Control” is in
3 According to the Article 29 Working Group [9], the concept of DPO is not new, since “Directive 95/46/CE did not oblige any organization to designate a DPO, but still the practice of designating a DPO was being developed in several member states over the years”. Furthermore, the referred Working Group mentions that the main aims of the DPO are to “ease the conformity through the implementation of responsibilization instruments (for instance, making viable data protection impact assessments and audits)” and also to serve as an “intermediary between the interested parties, for instance, authorities of control, the data holders and the entrepreneurial units within an organization”.
charge of the CNPD – the National Commission for Data Protection – following article 3 of the Proposal of Law 120/XIII. Once a treatment occurs in violation of personal data rules – for instance, not complying with the basic principles of treatment (lacking the consent of the data subject, or violating the holder's rights) – sanctions will be applied (article 82, GDPR). The Regulation includes a list of staggering financial sanctions. The most significant alteration was the establishment of higher sanctions for the responsible for the treatment and the subcontractor who do not comply with the established rules: some violations are subject to sanctions of up to 20 million euros or, in the case of a corporation, up to 4% of its annual turnover (article 83, nr. 5, GDPR). At last, it is convenient to recall that the GDPR does not arise out of nothing to safeguard “as if by magic” all the individual rights of data protection. Instead, it evidences the long road taken by the EU towards the guardianship of these rights, threatened each day by technological transformation and thus requiring permanent normative updating. Regardless of the Regulation's impact, all the EU member states already had legislation directed to the treatment of data; the Regulation was an opportunity to revise and uniformize the treatment of data according to the principles of data processing and especially the “limitation of purposes.”
3.2 Brazilian General Data Protection Law (GDPL)
The road towards the approval of the Brazilian General Data Protection Law [10] was supported neither by a wide normative framework nor by an updating of previous legislation on data protection. Given the total absence of specific legislation, data protection in the country was promoted on the basis of constitutional interpretation (article 5, § 2, CF/88) and of related legislation (the Law on Access to Information – Law nr. 12.527/11 – and the “Marco Civil da Internet” (Civil Mark for the Internet) – Law nr. 12.965/14). Against this background, the Brazilian GDPL was presented, and its way until approval went through the following steps: 1) 2010 (draft law elaborated by the Ministry of Justice and put through public debate); 2) 2012 (presentation of the Project of Law nr. 4.060/12 by the Chamber of Deputies); 3) 2013 (presentation of the Project of Law nr. 330/13 by the Federal Senate); 4) 2015 (a new draft elaborated by the Ministry of Justice went to public debate); 5) 2016 (the Project of Law nr. 5.276/16 was sent to the National Congress); 6) 2018 (the General Data Protection Law was approved on 14 August 2018 and, after 18 months, it will enter into force and be applied in the whole Brazilian territory). It may be affirmed that the Brazilian GDPL was strongly influenced by the European model, starting with basic concepts such as personal data, sensitive personal data, anonymised data, and treatment agent (article 5, GDPL). Other subjects were also inspired by the GDPR, such as the right to be forgotten or “elimination of personal data” (article 18, GDPL), and data protection since conception (“it determines the adoption of security, technical and administrative measures, adequate to protect personal data, from the phase of the conception of the product or the service to its execution”, article 46, § 2, GDPL).
From this perspective, the GDPL applies to any operation of data treatment performed by natural or legal persons (of public or private law) that meets any of the following requirements: 1) the data are collected and treated in Brazil; 2) the data have as holders individuals located in Brazil; 3) the data have as purpose the offering of products or services in Brazil (article 3, GDPL). The possibilities of exclusion from the application of the GDPL are mentioned exhaustively in article 4: treatment by a natural person for personal private purposes; purposes exclusively related to journalistic, artistic or academic ends; public security; and data “in transit.” Just like its inspiring model, the GDPL is based on a series of principles directed towards the treatment of data, such as: “finality, adequation, need, free access, quality of the data, transparency, security, prevention, non-discrimination, responsibilization and accounting” (article 6, GDPL). The GDPL assumes consent as a “free, informed and unequivocal manifestation, by which the holder agrees with the treatment of his or her personal data for a determined purpose” (article 5, XII, GDPL). Besides that, two possibilities are established for the consent of the data holder: (i) written consent or (ii) any other means demonstrating the will of the data holder, such as a checkbox in a privacy policy (article 8, GDPL). In line with that, the GDPL establishes the following individual rights: the right to be informed; the right to rectification; the right to the portability of data; the right of access; the right to the exclusion of data; and the right to the revocation of consent (article 18, GDPL). The GDPL also includes the figure of the Data Protection Officer (DPO), who must be indicated by every data controller; the DPO's “identity and contact data must be disclosed publicly, clearly and objectively, preferably in the electronic site of the controller” (article 41, § 1, GDPL). As with the European model, corporations willing to adopt a broader pattern of data protection must strongly consider hiring a Data Protection Officer. Besides that, some critical practical aspects were not considered in the GDPL, such as the qualification required for the DPO (technical and/or legal), the need for certification, and the possibility of accumulating functions. As with the GDPR, the efficacy of the GDPL in Brazil is bound to the creation of an entity responsible for auditing and ensuring compliance with the Law. For that, the figures of the “National Authority on Data Protection (ANPD)” and of the “National Council of Privacy and Data Protection” were established in articles 55-A to 58-A of the GDPL. The ANPD was projected as a federal authority bound to the Ministry of Justice; its regulation and organizational structure would be regulated by Presidential Decree. Among its attributions are the elaboration of orientations for the National Policy of Data Protection, the supervision of compliance with the law and the application of sanctions, the fulfillment of requests by holders of rights against controllers, advocacy, the publication of regulations and procedures for the protection of personal data, and the elaboration of reports on the impact of personal data protection.
4 Wording given by Law nr. 13,853 of 2019.
Besides that, there are four main pillars on which the ANPD would act, and these are still considered in the GDPL: security of data, treatment of incidents, reparation of damages, and sanctions. Thus, in case of non-compliance with the GDPL, the ANPD would be the entity responsible for the application of administrative sanctions. Among these sanctions, there is the possibility of warnings, fines, or even the total or partial prohibition of activities related to data treatment. Fines may go up to two percent of the billing of the private-law legal person, or of the entity's group in Brazil, in the last financial year, excluding taxes, and are limited, in total, to fifty million reais per infraction (article 52, GDPL). There is also the possibility of daily fines to compel the entity to end such violations. From the above, it may be said that the GDPL keeps many similarities with its inspiring model, the GDPR. Many challenges must still be overcome for adequate data protection in Brazil. However, it must not be forgotten that the referred Law inaugurates a new era in the guardianship of the rights of the citizen and the inclusion of Brazil in an international eco-system of regulation of information and data.

Final Considerations

As seen above, smart cities launch new challenges to human rights, such as the scale at which privacy and intimacy may be harmed by the distortion of the use of technology for an “intelligent” policing. As analysed, the EU is taking the initiative of adapting to this new scenario, building on its historic efforts and, recently, on the approval and entry into force of the General Data Protection Regulation, a concrete opportunity for citizens to regain some data sovereignty, mainly through the concern with the guardianship of fundamental rights; this serves as a norm for the Portuguese State and as an inspiring model for the creation of the Brazilian GDPL. However, one thing is the theoretical debate on smart cities (the use of data for improving and enhancing governance and participation) and the normative prevision (Law on Data Protection, Law on Access to Information, …), and another issue is its practical application.

Acknowledgments. This work has been supported by FCT Fundação para a Ciência e Tecnologia within the Project Scope: UID/CEC/00319/2019.
References
1. Edwards, L.: Privacy, security and data protection in smart cities: a critical EU law perspective. Eur. Data Prot. Law Rev. 1(2), 28–58 (2016)
5 The above listed is just an example, but it may be thought of as an alert to the fact that, in smart city programmes, the debate goes beyond the “mere” use of data. As Teresa Moreira and Francisco Andrade say [11]: these technologies bring along the risk of an intensive use of personal data. We are confronted with a real threat of constant treatment of personal data, which leads us to the overwhelming perspective of a progressive transformation of persons into electronic persons, as objects of constant monitoring (or surveillance) by a growing number of informatic applications.
2. Kon, F., Santana, E.: Cidades Inteligentes: Conceitos, plataformas e desafios. In: 35º Jornada de Atualização em Informática, Porto Alegre, Brasil (2016)
3. Lévy, P.: Cibercultura. Tradução de Carlos Irineu da Costa. Editora 34, São Paulo (1999), p. 92
4. Lessig, L.: Code Version 2.0, p. 4. Basic Books, New York (2006)
5. Valdés, E.: Privacidad y publicidad, vol. 1, pp. 223–244. Doxa, New York (2006). ISSN 0214-8876
6. Schönberger, V.: Desenvolvimento Geracional da Proteção de Dados na Europa. In: Agre, P., Rotenberg, M. (eds.) Tecnologia e Privacidade: The New Landscape, pp. 219–242. MIT Press, Cambridge (1997)
7. Lemos, R.: A GDPR terá um efeito viral. In: Meio e Mensagem (2018)
8. Schönberger, V., Kenneth, C.: Big Data: como extrair volume, variedade, velocidade e valor da avalanche de informação cotidiana. São Paulo (2013), p. 130
9. European Union: Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation). https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32016R0679&from=pt
10. Brazil: Lei Geral de Proteção de Dados Pessoais (GDPL). http://www.planalto.gov.br/ccivil_03/_ato2015-2018/2018/lei/L13709.htm
11. Moreira, T.C., de Andrade, F.P.: Personal data and surveillance: the danger of the “homo conectus”. In: Intelligent Environments (Workshops), pp. 115–124 (2016)
Common Passwords and Common Words in Passwords
Jikai Li1(B), Ethan Zeigler1, Thomas Holland1, Dimitris Papamichail1, David Greco1, Joshua Grabentein1, and Daan Liang2
1 The College of New Jersey, Ewing, NJ 08628, USA, {jli,zeiglee1,hollandt4,papamicd,grecod6,grabenj2}@tcnj.edu
2 Department of Civil, Construction and Environmental Engineering, University of Alabama, Tuscaloosa, USA, [email protected]
Abstract. Passwords often include dictionary words or meaningful strings. Figuring out these words or strings may significantly reduce the number of password guesses. The wordlists used by password cracking software, such as Hashcat, typically include words from various dictionaries and leaked plaintext passwords. Is it really necessary to put all dictionary words and leaked passwords into the wordlist? In this work, we use the Mac system dictionary and the rockyou.com leak as two sample wordlists to check the substrings of over 600 million leaked passwords from different websites. We find that only a small portion of the words from these two wordlists are used by the leaked passwords. More specifically, about 90,000 out of 235,886 Mac dictionary words and about six million out of 13 million unique rockyou.com passwords are used by the leaked passwords. In addition, we find that a small portion of unique passwords is shared by a large portion of accounts.
Keywords: Password · Hashcat · Dictionary · Substring

1 Introduction
As Internet services are pervasive in our society, passwords are an important part of our daily life. Whether we check emails, browse social media, do online banking, or shop online, it is likely that we use passwords either as the primary authentication or as part of multi-factor authentication. The security of passwords relies on several components: the client side, the server side, and the communication between client and server. However, in reality, there are several factors that make passwords vulnerable. First, if the client is a human being, the client has a tendency to generate meaningful passwords so that they are memorable. These meaningful passwords may express feelings, such as
This work is supported by the ELSA high performance computing cluster at The College of New Jersey. ELSA is funded by National Science Foundation grant OAC-1828163.
“iloveyou” and “hahaha”; represent certain patterns in life, such as “123456” and “qwerty” from the English keyboard; use dictionary words or modified dictionary words, such as “password” and “passw0rd”; use words related to the client, such as “sophie” and “nathan”; or combine words of different types, such as “abc123” or “test1234”. It is relatively easy to guess such meaningful passwords, and if the same meaningful passwords are adopted by a large group of users, these accounts will be particularly vulnerable. In addition to the fact that humans have a tendency to create meaningful passwords that are likely to be weak, the passwords stored on servers, whether plaintext or hashed, are hackers' targets. Hundreds of millions of passwords were leaked in the past. For example, Gmail had 5 million passwords leaked in 2014 [1], Yahoo had about 3 billion passwords leaked [2], LinkedIn had about 160 million passwords leaked [3], and ClixSense had about 6 million plaintext passwords leaked [4]. Even if the leaked passwords are hashed, it is still possible to use tools such as Hashcat [5] to crack them; if the hashed passwords follow popular patterns, cracking them can be done within a reasonable time. This work analyzes to what extent different accounts use identical passwords and to what extent different passwords use the same words.
2 Background
To crack hashed passwords, people can use software such as John the Ripper [6], Cain and Abel [7], and Hashcat [5]. All these tools take a guessed password as input, hash it, and then compare the hashed result with the target hash; if the hashed value matches the target hash, the password is cracked. To speed up cracking, the most likely passwords should be generated first and the less likely passwords later. A brute force attack [8] ignores the differing likelihoods of passwords and typically treats all possible passwords equally, searching the key space systematically. If the password is long, the cost of a brute force attack can be prohibitive. Tatli [9] analyzed the passwords from the rockyou.com leak [10] and identified several patterns in it; the identified patterns were used in [9] to improve their dictionary attack. The work in [11] generates password structures in highest-probability order. The work in [12] provides a framework for semantic generalization of passwords; its authors built a specialized word list including names, cities, surnames, months, and countries to support semantic classification. Xu, Chen, Wang et al. analyzed the patterns of digits in passwords of Chinese websites [13]. Melicher, Ur, Segreti et al. proposed using neural networks to evaluate password strength [14]. Bonneau [15] developed partial guessing metrics to evaluate the strength of a large set of passwords. The work in [16] examines whether the distribution of passwords matches Zipf's law. Han et al. discussed how Pinyin is used in the leaked passwords of Chinese websites [17]. These works either try to, implicitly or explicitly, identify the patterns of leaked passwords or use the identified patterns to crack hashed passwords. An accurate understanding of word usage in passwords can help build proper dictionaries/wordlists and rules for cracking software such as Hashcat. In this work, we
analyzed over 600 million leaked plaintext passwords and ranked the popular passwords of six different websites. In addition, we used two different wordlists to match the substrings in each password. We found that most words in our sample wordlists are not used by any password we checked. It is well known that a proper dictionary/wordlist is critical to password cracking software such as Hashcat: an over-sized dictionary can significantly increase the run time of the software, while an under-sized dictionary/wordlist may miss the substring(s) in the hashed password and cause the cracking to fail. The results of this work can help build wordlists in the future. In this paper, we let p represent one password and |p| be the length of password p. Furthermore, let P be a group of passwords and |P| be the size of P; note that P may contain duplicated passwords. If we remove all duplicated copies of each password and keep one instance of each, we obtain the set P̄, whose size is |P̄|. Unequivocally, let Px be the group of all passwords of organization x and P̄x be the set of unique passwords of Px. For example, we will use Pgmail.com and P̄gmail.com to denote the leaked passwords of gmail.com with and without duplications, respectively. Morris and Thompson [18] introduced brute force and dictionary attacks in 1979. Generally speaking, brute force and dictionary attacks are suitable for relatively short and simple passwords. To crack longer and stronger passwords, it is necessary to narrow down the search space. The most common way to narrow down the search space is to use dictionary words or character strings in the guessed password. The intuition of using words and strings to crack passwords lies in the fact that human beings tend to create passwords that are meaningful because they are easy to remember. Modern cracking software takes advantage of the fact that people use words and meaningful strings in passwords. For example, Hashcat is the world's fastest and most advanced password recovery tool. Hashcat supports seven attack modes: brute-force attack, dictionary attack, combinator attack, hybrid attack, mask attack, rule-based attack, and toggle-case attack. Of these seven modes, the brute-force attack enumerates a unified character set for each character in a password, while the mask attack may enumerate different character sets for each character; fundamentally, the mask attack is a special brute force attack in Hashcat. Excluding the brute-force and mask attacks, all other five attacks in Hashcat use words from some dictionary or wordlist. In this paper, we use dictionary and wordlist interchangeably. To a certain extent, the performance of cracking software like Hashcat depends on the quality of the dictionary or wordlist. If the wordlist is too small, it is likely that no word matches the substrings of the targeted password and the cracking may fail. However, if the wordlist is excessively large and most words are not used by any password, the number of password guesses can be excessively large as well. Ideally, the wordlist should be comprehensive, but not overly large. In this work, we use the Mac system dictionary and the rockyou.com leak as two sample wordlists to analyze over 600 million leaked passwords from different websites.
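The core measurement in this work, checking which wordlist entries occur as substrings of leaked passwords, can be sketched as follows. The file names, the lowercasing, and the minimum word length are illustrative assumptions rather than the authors' exact pipeline, and the naive nested loop would need a more efficient matching strategy (for example, an Aho-Corasick automaton) at the scale of hundreds of millions of passwords.

```python
def load_lines(path):
    """Read one entry per line, lowercased, skipping blank lines."""
    with open(path, encoding="utf-8", errors="ignore") as f:
        return [line.strip().lower() for line in f if line.strip()]

def words_used(wordlist, passwords, min_len=3):
    """Return the wordlist entries appearing as a substring of at least one password."""
    used = set()
    for word in wordlist:
        if len(word) < min_len:        # illustrative filter, not taken from the paper
            continue
        for pw in passwords:
            if word in pw:
                used.add(word)
                break
    return used

# Illustrative paths; the Mac dictionary path is standard, the password file is not.
dictionary = load_lines("/usr/share/dict/words")
leaked = load_lines("qq_passwords.txt")
hits = words_used(dictionary, leaked)
print(f"{len(hits)} of {len(dictionary)} dictionary words appear in at least one password")
```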
Here, we briefly discuss why words or strings are so important in password cracking. For a password p, the total number of possible passwords is |C|^|p|, where C is the set of possible characters in a password and |C| is the size of set C. On an English-language website, it is common that passwords can be composed of the following characters:
– Upper case letters [A-Z]
– Lower case letters [a-z]
– Digits [0-9]
– Punctuation and special characters: !"$%&'()*+-,./:;?@[]\^ `{|}~
In total, there are 95 choices. If |p| is 10, the number of possible passwords is 95^10, which is about 6 × 10^19. Suppose there is one word (string) w in p and this particular word exists in a dictionary D. If w is known and the location of w is known, the cracking software just needs to enumerate the combinations of the remaining |p| − |w| characters, so the number of enumerations is 95^(|p|−|w|), which is significantly smaller than 95^|p|. If w is known but the location of w is unknown, the number of enumerations is (|p| − |w|) × 95^(|p|−|w|), which is still much smaller than 95^|p|. If w is unknown and w belongs to a dictionary D of size |D|, we take out each word w from dictionary D in turn and brute force the remaining characters of the password. As discussed above, for a given p the number of enumerations per word is (|p| − |w|) × 95^(|p|−|w|). In the worst case, the total number of enumerations is

    Σ_{w ∈ D} (|p| − |w|) × 95^(|p|−|w|)
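A quick numerical check of these counts (a sketch; the three-word dictionary at the end is made up purely for illustration):

```python
# 26 upper case + 26 lower case + 10 digits + 33 punctuation/special characters
N = 95
p_len = 10

print(N ** p_len)                    # full key space: 95^10 is roughly 6.0e19

w = "password"                       # a known word of length 8 inside the password
print(N ** (p_len - len(w)))         # known word, known location: 95^2 = 9025

# Known word, unknown location: one enumeration batch per candidate offset.
print((p_len - len(w)) * N ** (p_len - len(w)))

# Unknown word drawn from a dictionary D: the worst case sums over every word.
D = ["password", "dragon", "monkey"]
print(sum((p_len - len(word)) * N ** (p_len - len(word)) for word in D))
```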
According to the above expression, it is important for the dictionary to be comprehensive enough to include the words that may appear in passwords, but at the same time the dictionary should not be over-sized and full of unused words. In this work, we use over 600 million leaked passwords to check which words are used and how often they are used. Please note that the above discussion does not consider letter alteration, such as changing one or more letters to upper case or substituting certain characters with others. If we considered all these factors, the above expression would be slightly more complicated.
3 Data Analysis
We wrote Python scripts on top of a cluster node with a Skylake Gold CPU and 768 GB of RAM. The data set used in this paper is a collection of 1.4 billion leaked passwords [19, 20]. Each account has one email address and one password. We grouped the passwords based on the domain name; for example, we grouped the passwords of the gmail.com domain together. All usernames and email addresses were ignored in the following data analysis. In this research, we analyze the passwords of six domains: 126.com, 163.com, qq.com, gmail.com, hotmail.com, and yahoo.com. The website language of the first three domains is Chinese, while the language of the latter three is English.
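A minimal sketch of this grouping step, assuming the dump is a text file of "email:password" records (the file name and the ":" separator are assumptions; the actual collection [19, 20] is distributed in several formats):

```python
from collections import defaultdict

TARGET_DOMAINS = {"126.com", "163.com", "qq.com", "gmail.com", "hotmail.com", "yahoo.com"}

def group_by_domain(dump_path):
    """Collect the passwords of each email domain; usernames are discarded."""
    groups = defaultdict(list)
    with open(dump_path, encoding="utf-8", errors="ignore") as f:
        for line in f:
            email, sep, password = line.rstrip("\n").partition(":")
            if not sep or "@" not in email:
                continue                                  # skip malformed records
            domain = email.rsplit("@", 1)[1].lower()
            if domain in TARGET_DOMAINS:
                groups[domain].append(password)
    return groups

# Hypothetical usage: report |P| and |P'| for each domain.
# groups = group_by_domain("leaked_accounts.txt")
# print({d: (len(p), len(set(p))) for d, p in groups.items()})
```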
Table 1 lists the number of passwords for each domain, including the total number of passwords |P| and the number of unique passwords |P'|. The total number of passwords across all six domains is 647 million.

Table 1. Password counts of different websites

Website | |P| | |P'|
126.com | 4,885,328 | 3,151,879
163.com | 12,610,913 | 7,638,858
qq.com | 15,059,042 | 10,311,659
gmail.com | 121,188,711 | 67,923,649
hotmail.com | 210,241,616 | 107,260,168
yahoo.com | 284,805,015 | 103,442,953
Table 2 lists the top 25 passwords of each domain. All these domains have "123456" as the most popular password, and "111111" and "123456789" as either the second or third most popular password. The password "iloveyou" is a top password of gmail.com, hotmail.com, and yahoo.com, but not a top password of 126.com, 163.com, and qq.com. However, "5201314", meaning "I love you forever" in Chinese, is a top password of 126.com, 163.com, and qq.com. The difference in users' language and cultural background may explain why users choose different passwords, including the top passwords in Table 2. Table 3 lists the top passwords' percentage of all passwords P. For example, the top password of 126.com accounts for 3.082% of all 126.com passwords. It is interesting to notice that although hotmail.com and yahoo.com have more than 200 million passwords each, the top 3 million popular passwords account for more than 40% of P_hotmail.com and P_yahoo.com. In other words, about half of the accounts use passwords from a small pool. How many passwords are shared across different domains? Table 4 lists the number of shared passwords among different domains. As we can see in Table 2, domains with different cultural backgrounds have quite different passwords. In Table 4, we therefore check the common passwords among domains with a similar cultural background: 126.com, 163.com, and qq.com are grouped together, and gmail.com, hotmail.com, and yahoo.com are grouped together. The latter group shares about 10 million common passwords. Because top passwords account for a large percentage of all passwords, people have a strong tendency to use them, including as substrings in other passwords. We take the common entries among the top 10 passwords of gmail.com, hotmail.com, and yahoo.com, and Table 5 lists the number of substring matches in P and P'. As we can see from this table, these top passwords are used frequently as substrings in other passwords. For example, "123456" appears over 1,400,000 times in P_gmail.com and over 130,000 times in P'_gmail.com. To grasp how often dictionary words are used in passwords, we match wordlist words against the passwords of gmail.com, hotmail.com, and yahoo.com. In this part, we use two different wordlists. The first wordlist is the word list located at /usr/share/dict/words on a Mac machine.
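Coverage figures of the kind shown in Table 3 follow directly from the per-domain password counts; a sketch with made-up data of how the top-k share can be computed (applied to the real per-domain data, k = 1 should reproduce the first row of Table 3):

```python
from collections import Counter

def top_k_coverage(passwords, k_values=(1, 10, 100, 1000)):
    """Percentage of all accounts whose password is among the k most popular ones."""
    ranked = [n for _, n in Counter(passwords).most_common()]
    total = len(passwords)
    return {k: 100.0 * sum(ranked[:k]) / total for k in k_values}

# Toy data: three accounts share "123456", so the top-1 coverage is 60%.
print(top_k_coverage(["123456", "123456", "111111", "abc123", "123456"], k_values=(1, 2)))
```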
Table 2. Top 25 passwords of different websites

Rank | 126.com | 163.com | qq.com | gmail.com | hotmail.com | yahoo.com
1 | 123456 | 123456 | 123456 | 123456 | 123456 | 123456
2 | 111111 | 111111 | 123456789 | homelesspa | 123456789 | 123456789
3 | 123456789 | 123456789 | 111111 | 123456789 | password1 | abc123
4 | 000000 | 000000 | 123123 | password | password | password
5 | 123123 | 123123 | 000000 | qwerty | 12345678 | 12345
6 | aaaaaa | 5201314 | 5201314 | 12345678 | abc123 | password1
7 | 5201314 | 123321 | a123456 | 12345 | 12345 | qwerty
8 | 123321 | 12345678 | 123321 | abc123 | 1234567 | 12345678
9 | 12345678 | 0 | 1314520 | 111111 | qwerty | 1234567
10 | 666666 | 666666 | 0 | password1 | 111111 | 1234567890
11 | 0 | 1234567 | 1234567890 | 1234567 | 1234567890 | iloveyou
12 | 888888 | 7758521 | 1234567 | 1234567890 | 123123 | 123456a
13 | 1234567 | 654321 | 12345678 | Sojdlg123aljg | 000000 | 123
14 | 654321 | 888888 | 7758521 | 123123 | 123456a | 111111
15 | 9958123 | 123 | 123 | 3rJs1la7qE | 1234 | myspace1
16 | hm9958123 | a123456 | 123456a | qwerty123 | myspace1 | a123456
17 | 7758521 | 1314520 | 666666 | 1234 | 654321 | 1234
18 | 1314520 | qwqwqw | 520520 | Status | iloveyou | 123123
19 | 123 | 112233 | zxcvbnm | 1q2w3e4r | a123456 | 123abc
20 | 1234567890 | 11111111 | 112233 | linkedin | 666666 | ashley
21 | 112233 | 1234567890 | woaini | 1qaz2wsx | 123abc | 654321
22 | a123456 | 111222tianya | 123123123 | 1q2w3e4r5t | qwerty1 | fuckyou1
23 | 123654 | 123654 | 888888 | 000000 | 987654321 | 000000
24 | 88888888 | woaini | 123654 | YAgjecc826 | 123321 | love123
25 | 11111111 | 88888888 | qq123456 | iloveyou | blink182 | iloveyou1
Table 3. Top passwords' percentage of all passwords

Top passwords | 126.com | 163.com | qq.com | gmail.com | hotmail.com | yahoo.com
1 | 3.082% | 2.895% | 2.528% | 0.4675% | 0.635% | 0.6023%
10 | 5.933% | 5.386% | 4.795% | 1.482% | 1.369% | 1.393%
100 | 9.198% | 8.259% | 7.137% | 2.624% | 2.494% | 2.857%
1,000 | 12.85% | 11.65% | 10.01% | 5.159% | 5.535% | 6.775%
10,000 | 20.73% | 19.72% | 14.97% | 10.42% | 11.82% | 14.82%
100,000 | 32.93% | 30.90% | 23.05% | 19.21% | 21.70% | 26.99%
1,000,000 | 55.95% | 47.36% | 37.37% | 31.94% | 35.27% | 42.83%
2,000,000 | 76.42% | 55.29% | 44.81% | 36.44% | 39.57% | 47.84%
3,000,000 | 96.89% | 63.22% | 51.45% | 39.43% | 42.31% | 51.02%
Table 4. Number of common top passwords

Top passwords | 126.com, 163.com, qq.com | gmail.com, hotmail.com, yahoo.com
1 | 1 | 1
10 | 7 | 8
100 | 73 | 50
1,000 | 574 | 501
10,000 | 5,164 | 5,537
100,000 | 46,398 | 51,983
1,000,000 | 126,056 | 523,574
2,000,000 | 161,674 | 899,940
3,000,000 | 200,283 | 1,208,710
10,000,000 | - | 2,823,469
20,000,000 | - | 3,729,833
30,000,000 | - | 4,437,694
40,000,000 | - | 5,443,138
50,000,000 | - | 6,315,522
60,000,000 | - | 7,169,000
All passwords | 278,987 | 10,237,574
Table 5. Top passwords are used as substrings in other passwords

Top password | gmail.com (all) | gmail.com (unique) | hotmail.com (all) | hotmail.com (unique) | yahoo.com (all) | yahoo.com (unique)
123456 | 1,460,763 | 134,514 | 3,419,364 | 191,071 | 4,548,171 | 217,982
123456789 | 335,590 | 36,349 | 953,345 | 56,817 | 1,094,707 | 52,125
password | 304,334 | 21,076 | 521,158 | 15,043 | 777,286 | 20,121
qwerty | 315,221 | 21,796 | 315,942 | 16,003 | 605,087 | 57,896
12345678 | 450,582 | 46,151 | 1,156,983 | 68,334 | 1,344,508 | 65,235
abc123 | 84,509 | 3,633 | 161,785 | 4,661 | 370,752 | 5,711
password1 | 79,538 | 3,045 | 204,582 | 2,315 | 291,316 | 3,053
The number of words in this dictionary is 235,886. The second wordlist is the rockyou.com leaked password list [10]; the number of unique passwords in this dictionary is 13,830,640. In this work, words of length two or less are removed from the dictionaries, and our search program is case-insensitive. Given a password p ∈ P, we use the Aho-Corasick algorithm [21] to find all words that match substrings in p. It is possible that multiple words match substrings in p and that these substrings overlap with each other. For example, the password "password" matches pass, ass, word, and password.
Table 6. Top "words" of different websites

Rank | gmail.com Mac Dict | gmail.com rockyou | hotmail.com Mac Dict | hotmail.com rockyou | yahoo.com Mac Dict | yahoo.com rockyou
1 | bob | bob | bob | bob | love | bob
2 | love | 123 | love | 123456 | bob | 123456
3 | les | 123456 | man | 123 | man | 123
4 | spa | homeless | you | 123456789 | you | 123456789
5 | home | 201 | ito | pole | baby | pole
6 | man | lin | ita | 666 | boy | lunch
7 | the | 123456789 | the | lunch | girl | frame
8 | wert | password | password | 201 | ers | glass
9 | password | 007 | mar | frame | big | table
10 | you | the | hot | the | the | 201
11 | mar | 2010 | son | glass | sexy | july
12 | son | pole | boy | table | hot | abc123
13 | sha | 666 | ers | july | password | march
14 | ers | qwerty | eli | shark | son | phone
15 | lin | kedin | chi | 12345 | dog | shark
16 | boy | 111 | baby | 100 | red | 12345
17 | dog | 1234 | girl | phone | july | the
18 | mon | 777 | ana | march | sha | @yahoo.com
19 | red | 100 | lin | password | march | lil
20 | san | lunch | mon | password1 | wert | password
21 | din | 12345 | dog | 1234 | star | 101
22 | pass | frame | wert | 007 | phone | love
23 | sam | man | ton | 2010 | mar | 777
24 | shi | 011 | che | 12345678 | angel | big
25 | star | 000 | july | mar | money | 666
These matched substrings may overlap with each other. In this research, we only keep the longest match. Finding which substring(s) provide the longest match is a weighted interval scheduling problem, which can be solved by dynamic programming. In this particular example, our searching algorithm returns "password" as the matching result.
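A sketch of this selection step, using a brute-force substring scan as a stand-in for the Aho-Corasick pass (adequate for short passwords) and the standard weighted interval scheduling recurrence with the matched length as the weight:

```python
import bisect

def find_matches(password, words):
    """All (start, end, word) intervals where a wordlist entry occurs; case-insensitive."""
    p = password.lower()
    hits = []
    for i in range(len(p)):
        for j in range(i + 3, len(p) + 1):      # words of length two or less are ignored
            if p[i:j] in words:
                hits.append((i, j, p[i:j]))
    return hits

def longest_cover(password, words):
    """Non-overlapping matches maximizing the total matched length."""
    hits = sorted(find_matches(password, words), key=lambda h: h[1])
    ends = [h[1] for h in hits]
    best = [(0, [])]                            # best[k]: optimum over the first k intervals
    for k, (s, e, w) in enumerate(hits, 1):
        skip = best[k - 1]
        prev = bisect.bisect_right(ends, s, 0, k - 1)   # intervals ending at or before s
        take = (best[prev][0] + (e - s), best[prev][1] + [w])
        best.append(max(skip, take, key=lambda t: t[0]))
    return best[-1][1]

words = {"pass", "ass", "word", "password"}     # toy wordlist
print(longest_cover("password", words))         # ['password']
```

The Aho-Corasick scan used in the paper produces the same (start, end) intervals in a single pass over the password; only find_matches would change.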
Table 6 lists the top words used in gmail.com, hotmail.com, and yahoo.com for the two wordlists: the wordlist from the Mac machine and the rockyou.com leak. It is surprising that "bob" is the most often used word on all these websites. It is also interesting to note that "123456" is the second or third most used word when the rockyou.com leak is used as the wordlist, yet "123456" is not in the Mac dictionary at all! Similarly, other words, such as "201" and "qwerty", are not in traditional dictionaries at all. The dictionary from the Mac machine has 235,886 words, and the number of unique passwords we checked against this dictionary is about 280 million. However, only a small portion of the dictionary words were used by the passwords. More specifically, gmail.com uses 86,104 words from the Mac dictionary, hotmail.com uses 85,450 words, and yahoo.com uses 89,118 words. In other words, the passwords of each domain use less than 40% of the Mac dictionary words! Although passwords do not use a large set of dictionary words, they do use a large set of strings. If we use the rockyou.com leak as the dictionary, we find that millions of passwords from the rockyou.com leak are used as substrings in the leaked passwords. More specifically, the passwords of gmail.com use 4,867,107 passwords of the rockyou.com leak as substrings; the passwords of hotmail.com use 6,019,604; and the passwords of yahoo.com use 6,384,362. Based on the observation of the Mac dictionary and the rockyou.com leak, it is not surprising that a larger dictionary improves the hit ratio of substring matching, but the magnitude of the improvement is quite surprising. For example, among the 284 million passwords of yahoo.com, 24.6% do not contain any word of the Mac dictionary; however, only 4.5% do not contain any word of the rockyou.com leak!
4 Conclusion
A good understanding of which words or strings are used in large collections of passwords is critical to the success of password cracking. In this work, we used the Mac system dictionary and the rockyou.com passwords as wordlists to match the substrings of over 600 million leaked passwords. We find that fewer than 90,000 Mac dictionary words are actually used by the leaked passwords. These 90,000 words represent about 40% of the Mac system dictionary; in other words, about 60% of the Mac dictionary words are not used by any leaked password. Interestingly, more than six million rockyou.com passwords were used as substrings in the leaked passwords, while about seven million rockyou.com passwords were not used. About 24.6% of yahoo.com passwords do not contain any substring from the Mac dictionary, but only 4.5% of yahoo.com passwords do not contain any substring from the rockyou.com leak.
References 1. Google Says Not To Worry About 5 Million ‘Gmail Passwords’ Leaked. https:// www.forbes.com/sites/kashmirhill/2014/09/11/google-says-not-to-worry-about5-million-gmail-passwords-leaked/#307f08f07a8d 2. Selena, L.: Every single Yahoo account was hacked - 3 billion in all (2017). http://money.cnn.com/2017/10/03/technology/business/yahoo-breach-3billion-accounts/index.html
Passwords Words
827
3. Jeremi, M.G.: How LinkedIn’s password sloppiness hurts us all (2016). https:// arstechnica.com/information-technology/2016/06/how-linkedins-passwordsloppiness-hurts-us-all/ 4. Dan, G.: 6.6 million plaintext passwords exposed as site gets hacked to the bone (2016). https://arstechnica.com/information-technology/2016/09/plaintextpasswords-and-wealth-of-other-data-for-6-6-million-people-go-public/ 5. Hashcat. https://hashcat.net/wiki/ 6. John the Ripper password cracker. https://www.openwall.com/john/ 7. Cain and Abel (software). https://en.wikipedia.org/wiki/Cain and Abel 8. Brute Force Attack. https://www.owasp.org/index.php/Brute force attack 9. Tatli, E.I.: Cracking more password hashes with patterns. IEEE Trans. Inf. Forensics Secur. 10(8), 1656–1665 (2015) 10. https://wiki.skullsecurity.org/Passwords#Password dictionaries 11. Weir, M., Aggarwal, S., De Medeiros, B., Glodek, B.: Password cracking using probabilistic context-free grammars. In: 30th IEEE Symposium on Security and Privacy, pp. 391–405 (2009). https://doi.org/10.1109/SP.2009.8 12. Veras, R., Collins, C., Thorpe, J.: On the semantic patterns of passwords and their security impact. In: Proceedings of the Network Distribution System Security Symposium (2014) 13. Xu, R., Chen, X., Wang, X., Shi, J.: An in-depth study of digits in passwords for Chinese websites. In: IEEE Third International Conference on Data Science in Cyberspace (DSC), Guangzhou, pp. 588-595 (2018). https://doi.org/10.1109/ DSC.2018.00094 14. Melicher, W., Ur, B., Segreti, S.M., Komanduri, S., Bauer, L., Christin, N., Cranor, L.F.: Fast, lean, and accurate: modeling password guessability using neural networks. In: USENIX Security Symposium (2016) 15. Bonneau, J.: The science of guessing: analyzing an anonymized corpus of 70 million passwords. In: IEEE Symposium on Security and Privacy, pp. 538–552 (2012) 16. Malone, D., Maher, K.: Investigating the distribution of password choices. In: Proceedings of the 21st International Conference on World Wide Web (WWW 2012), pp. 301-310 (2012) 17. Han, G., Yu, Y., Li, X., Chen, K., Li, H.: Characterizing the semantics of passwords: the role of pinyin for Chinese Netizens. Comput. Stan. Interfaces 54(Part 1), 20– 28 (2017) 18. Morris, R., Thompson, K.: Password Security: A Case History 22(11), 594–597 (1979) 19. Kumar, M.: Collection of 1.4 Billion Plain-Text Leaked Passwords Found Circulating Online. https://thehackernews.com/2017/12/data-breach-password-list.html 20. CrackStation’s Password Cracking Dictionary. https://crackstation.net/ crackstation-wordlist-password-cracking-dictionary.htm 21. Aho, A., Corasick, M.: Efficient string matching: an aid to bibliographic search. Commun. ACM 18(6), 333–340 (1975). https://doi.org/10.1145/360825.360855
Using Fuzzy Cognitive Map Approach for Assessing Cybersecurity for Telehealth Scenario
Thiago Poleto1, Rodrigo Cleiton Paiva de Oliveira1, Ayara Letícia Bentes da Silva1, and Victor Diogho Heuer de Carvalho2
1 Universidade Federal Do Pará, Belém, Brazil
[email protected], [email protected], [email protected]
2 Universidade Federal Alagoas, Delmiro Gouveia, Brazil
[email protected]
Abstract. Health organizations are investing in the development of telehealth systems to expand advances in health care to the homes of the Brazilian population. The adoption of telehealth aims to broaden basic monitoring and promote access to health services. Telehealth systems present a confidential data set containing patient health history, medication prescriptions, and medical diagnostics. However, in Brazil, there are no cybersecurity studies that address the factors that impact patient data manipulation and transfer. Understanding cybersecurity impacts is critical for telehealth development strategies. The research reported here used several factors related to cyberattacks and cybersecurity vulnerabilities, combined with the Fuzzy Cognitive Map (FCM) approach, to identify the links between these elements. The evaluation using FCMs proved able to describe the complexity of the system by providing staff with an appropriate visual tool for planning. The experimental results of the study contribute to supporting cybersecurity improvements in telehealth.
Keywords: Telehealth · Cybersecurity · Cyberattacks · Fuzzy Cognitive Maps
1 Introduction

The telehealth system in Brazil was developed in 2007 by the Ministry of Health as a primary family health strategy, along with the use of information and communication technologies (ICT) for health-related distance activities. Telehealth in Brazil has the potential to increase access to health services in areas far from hospitals. According to IBGE data (Brazilian Institute of Geography and Statistics), only 24% of the population lives in the major cities [1]. The Ministry of Health established the Guidelines for Telehealth in Brazil, under the National Health System, through decree nº 9795, from May 17, 2019, to improve
user satisfaction and the quality of services provided to citizens [2]. Telehealth systems present a confidential data set containing patient health history, drug prescriptions, and medical diagnoses. This is sensitive data with a high level of relevance, and it may potentially be the target of cyberattacks. In 2015, there was a 22% increase in the number of cyberattacks, with 112 million compromised records related to medical information [3]. Various factors encourage the occurrence of telehealth cyberattacks. Researchers emphasize that cybersecurity should not be treated as a compliance practice applied only after a particular failure that results in additional costs has occurred [3]. On the contrary, it must be designed in a planned and contingent manner that considers all systems from the beginning of the telehealth process [3–6]. The IT infrastructure of these services contributes to increased cyberattacks on healthcare organizations [7]; consequently, IT infrastructure is a relevant factor when analyzing cybersecurity risks [3, 8]. Although cybersecurity risks are critical to telehealth, little research has been conducted on the level of physical safety control needed to prevent the misuse or compromise of patients' clinical information. The cybersecurity aspects of telehealth can be explored to determine causal relationships that support problem-solving and ensure safe environments for the associated practices. This paper proposes an approach based on fuzzy cognitive maps (FCMs) to model the mental representation of experts regarding the causal relationships between the cyberattack factors that impact telehealth systems, in order to support strategic planning and decision-making. This method can represent all relationships, including the causal relationships of the factors that trigger cyberattacks in the telehealth field. This article is structured as follows: Sect. 2 is dedicated to the background presentation on the topics studied, Sect. 3 is devoted to applying FCM to telehealth cyberattacks, and Sect. 4 contains the conclusion and future research.
2 Literature Review

2.1 Information Security in Telehealth
Telehealth services require preventive actions and security tools to protect privacy because they handle sensitive data such as digital signatures, credentials, financial data, and patient diagnostic images [9]. The relevance of the data reinforces the importance of securing the network, since a loss of confidentiality can cause moral damage. Furthermore, failure to comply with legal regulations may result in financial and criminal penalties [10, 11]. Table 1 presents factors identified in the specialized literature that influence cyberattacks on telehealth services.
Table 1. Description of telehealth factors in cyberattacks

ID | Cyberattack factors | Description | Authors
FT1 | Selecting new IT software | Acquiring new information systems may create incompatibility with system architecture | [12]
FT2 | Information security investment decision | Telehealth management involves decisions regarding information security investments | [13]
FT3 | Software Implementation | Operational failures occur in telehealth due to users not being prepared to adopt information security protocols | [14]
FT4 | Infrastructure technology | Obsolete IT technology cannot meet new telehealth requirements | [15]
FT5 | IT experience and skills | Lack of IT skills applied in telehealth | [16]
FT6 | Malware | Malware Intrusion into Operational Telehealth Servers | [17]
FT7 | Logical attacks | Unauthorized attack trying to break into telehealth | [18]
FT8 | IT control loss | Deploying decentralized infrastructure | [19]
FT9 | IT governance loss | Outsourcing of IT systems and the use of external cloud services | [20]
Telehealth vulnerabilities may result in data loss or leakage, such as passwords, impacting patients' diagnosis results [21–23]. These are elements identified as openings in the telehealth system that can lead to cyberattacks. Table 2 presents the cyberattacks that can affect telehealth.
Table 2. Types of cyberattacks that can affect telehealth

ID | Vulnerabilities | Description of consequences | Authors
VN1 | IP configuration error | PHF script exploration | [15, 24]
VN2 | Remote code execution | Malicious SQL Command Execution | [22]
VN3 | Corrupted memory | Data loss and compromised system performance | [25]
VN4 | Software code integrity | System discovery, deactivation, and replacement attacks, Code integrity software engines | [26]
VN5 | Battery drain attack | Decreased battery life and service accessory reliability | [27]
VN6 | Hardware Failure Scanning | Backdoor on systems controlled by external agents | [28]
VN7 | MITM, eavesdropping | Change of system data destination between clients with systems; illegal client tracking | [23]
VN8 | Firmware Configuration Change | Improper implementation of encryption and hash functions threaten underlying system security | [17]
VN9 | DoS (Denial of Service) Attack | Server crash and arbitrary code execution | [15]
VN10 | Cross-site Scripting | Interface simulation and other elements to steal customer data from the system | [25]
VN11 | Brute force | A leak of access passwords for system users and mediators | [4]
As a result, studies on cybersecurity are growing; however, as pointed out above, there is a gap in research into the more specific causes that trigger a cyberattack in telehealth.

2.2 Fuzzy Cognitive Maps
FCMs, consisting of weighted nodes and arcs, graphically construct a causal relationship between the nodes that influence their degree of involvement [29]. This method can represent features of the modeled complex system such as events, objectives, inputs and outputs, states, variables, and trends, thus aiding human reasoning with ambiguous terms and supporting decision-making [30]. Existing knowledge about the system behavior is stored in the node structure and in the map interconnections; the calculation of the concept values at each step is given by Eq. 1 [31].

    A_i^t = f( Σ_{j=1, j≠i}^{n} A_j^{t−1} w_{ji} + A_i^{t−1} )    (1)
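A minimal sketch of this update rule in Python; the logistic function is used as the threshold f, which is an assumption, since the paper does not state which threshold function was adopted:

```python
import math

def sigmoid(x, lam=1.0):
    return 1.0 / (1.0 + math.exp(-lam * x))

def fcm_step(A, W, f=sigmoid):
    """One synchronous application of Eq. 1: A_i = f(sum_{j != i} A_j * w_ji + A_i)."""
    n = len(A)
    return [f(sum(A[j] * W[j][i] for j in range(n) if j != i) + A[i]) for i in range(n)]

# Toy 3-concept map: concept 0 excites concept 1, which excites concept 2.
W = [[0.0, 0.7, 0.0],
     [0.0, 0.0, 0.5],
     [0.0, 0.0, 0.0]]
A = [1.0, 0.0, 0.0]
for _ in range(5):
    A = fcm_step(A, W)
print([round(a, 3) for a in A])
```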
This method allows working with fuzzy logic, with a degree of relevance ranging from 0 to 1, which leads to an infinite set of possible results [31]. Experts evaluate the behavior of the complex system based on the cybersecurity factors; they then express the causal relationships on the arcs, but with different weights, whose combination is defined by Eq. 2.

    W = f( Σ_{k=1}^{N} w_k )    (2)
The main objective of the method is to analyze a cyberattack problem by predicting its impact from the interactions of the vulnerability factors [32]. Table 3 presents the main FCM applications found in the literature.
Table 3. FCM application

Application area | Description | Authors
Family income problems | The study was conducted by applying a questionnaire to families, according to their respective incomes | [33]
Hospital supply chain sustainability | The study aims to identify the concepts that influence the sustainable supply chain | [34]
The decision to localize resilient supply chain outsourcing | The analysis revealed causal relationships between criteria for assessing supply chain locations | [35]
Identify knee injuries | A medical decision support system to explore a modern system for diagnosing disease | [36]
Hybrid power system analysis | Modeling revealed multiple sources of renewable energy | [37]
This study adopts FCMs to visualize, in a simplified way, the factors that may contribute to causal relationships in telehealth and to assist IT managers in scenario planning.
3 Application, Results, and Discussion

The methodology applied in this work is divided into the five parts described in Fig. 1.
Fig. 1. Overview of the methodology.
In step 1, the project scope was defined. Telehealth services need preventive actions and security tools to protect privacy because they deal with sensitive data (e.g., digital signatures, credentials, personal images). In step 2, the main factors that impact cybersecurity were identified based on the literature presented above, as well as the main forces and the causal relationships between them. These forces were evaluated by experts who are also stakeholders involved in cybersecurity projects, interested in learning about the proposal and how to implement FCM to support their technical and managerial activities. In step 3, the matrices with the links between the factors that influence cybersecurity for telehealth were defined. In step 4, the scenarios developed were validated by the experts. In step 5, to finalize the implementation, the information obtained was presented and delivered to the participants so that they could develop strategic planning dedicated
to telehealth cybersecurity. Beyond helping with decision-making, attention to the theoretical contributions will also promote organizational learning.

3.1 Results and Discussion
FCM-based scenarios were developed to identify difficulties that could compromise telehealth. The FCM was built on the knowledge of stakeholders from different backgrounds, and the interview portion of this study was conducted individually with cybersecurity experts. Figure 2 shows the FCM obtained from the workshop.
Fig. 2. FCM representation of the possible cyberattacks in telehealth environments.
The FCM in Fig. 2 illustrates operational risks in the context of telehealth cybersecurity. It demonstrates the causal relationships between crucial cybersecurity factors and their consequences, with each effect represented as a positive (+) relationship in terms of vulnerabilities (see Tables 1 and 2 for a description of these elements). Information security investment decisions (FT2) deserve particular attention because, when made without due care, they can positively influence the possibility of occurrence of all the attacks analyzed. The main vulnerabilities in telehealth systems, according to the FCM, are IP configuration (VN1) [24, 25] and cross-site scripting (VN10) [19]. Although their influences on other possible attacks are weak, the appearance of other factors considerably increases these possibilities. On the other hand, regarding the selection of new software (FT1) and software implementation (FT3), the main risks presented by the map are attacks on software integrity (VN6), software modification (VN14) [25], and possible changes in firmware settings (VN10) [17]. Failures involving the system's technological infrastructure (FT4) cause memory corruption (VN3), potential data loss (VN4) [25], hardware failures (VN8) [28], and eavesdropping (VN9) [23]. Each of these has a moderate influence individually, but the influence may increase when factors are combined. Hence, the loss of IT control (FT8) and of IT governance (FT9) tends to increase the number of external elements beyond the direct supervision of telehealth service providers; battery drainage, cross-site scripting, and remote code execution are significant risks for this type of vulnerability. Factors linked to various vulnerabilities due to a lack of experience (FT5) in using telehealth applications can directly impact changes to the data destination (VN9) of patients in a system [23], so providing adequate training or looking for people with experience during selection processes is recommended. DoS attacks (VN11) [15] are linked to the factors presented in the scenario, mainly malware (FT6) and logical attacks (FT7). Table 4 presents the links between cybersecurity-related factors and vulnerabilities with their respective degrees of relevance.

Table 4. Relevance between cyberattack and vulnerability factors

ID FT1 FT2 FT3 FT4 FT5 FT6 FT7 FT8 FT9
VN1 0 0 0.19 0.22 0 0 0 0 0.18
VN2 0 0.12 0 0 0 0.34 0 0 0.29
VN3 0 0 0 0.3 0 0 0 0 0
VN4 0.33 0.18 0 0 0 0 0 0 0
VN5 0 0.16 0 0 0 0 0 0.34 0
VN6 0 0.05 0 0.47 0 0 0 0.26 0
VN7 0 0 0 0.24 0.39 0 0.38 0 0
VN8 0 0 0.36 0.4 0 0 0 0 0
VN9 0 0.12 0.47 0 0 0.43 0.45 0 0.26
VN10 0 0.33 0 0 0 0.16 0 0.36 0.31
VN11 0 0 0 0.32 0 0.33 0 0 0
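To illustrate how such a matrix can be exercised, the sketch below treats each Table 4 entry as a directed FCM edge from the factor (FT) to the vulnerability (VN) and applies the Eq. 1 update with a sigmoid threshold; the choice of threshold function, the clamping of the factor concepts, and the activated factors (FT2 and FT4) are assumptions made purely for illustration.

```python
import math

FT = ["FT%d" % i for i in range(1, 10)]
VN = ["VN%d" % i for i in range(1, 12)]

# Rows: VN1..VN11; columns: FT1..FT9 (degrees of relevance from Table 4).
W = [
    [0, 0, 0.19, 0.22, 0, 0, 0, 0, 0.18],
    [0, 0.12, 0, 0, 0, 0.34, 0, 0, 0.29],
    [0, 0, 0, 0.3, 0, 0, 0, 0, 0],
    [0.33, 0.18, 0, 0, 0, 0, 0, 0, 0],
    [0, 0.16, 0, 0, 0, 0, 0, 0.34, 0],
    [0, 0.05, 0, 0.47, 0, 0, 0, 0.26, 0],
    [0, 0, 0, 0.24, 0.39, 0, 0.38, 0, 0],
    [0, 0, 0.36, 0.4, 0, 0, 0, 0, 0],
    [0, 0.12, 0.47, 0, 0, 0.43, 0.45, 0, 0.26],
    [0, 0.33, 0, 0, 0, 0.16, 0, 0.36, 0.31],
    [0, 0, 0, 0.32, 0, 0.33, 0, 0, 0],
]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def step(ft_state, vn_state):
    """One Eq. 1 update of the vulnerability concepts driven by the factor concepts."""
    return [sigmoid(sum(ft_state[j] * W[i][j] for j in range(len(FT))) + vn_state[i])
            for i in range(len(VN))]

ft_state = [1.0 if name in ("FT2", "FT4") else 0.0 for name in FT]   # activated factors
vn_state = [0.0] * len(VN)
for _ in range(3):
    vn_state = step(ft_state, vn_state)

# Report the three vulnerability concepts with the highest activation.
for name, value in sorted(zip(VN, vn_state), key=lambda t: -t[1])[:3]:
    print(name, round(value, 3))
```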
The connection matrix specified by the information security experts organizes the sum totals and absolute weight values of the telehealth cybersecurity concepts in rows and columns, respectively. This scenario reinforces the importance of these factors by demonstrating how such elements may pose high risks, including incorrect IP configuration (VN1), software integrity (VN4), and firmware configuration changes (VN8). These results confirm how inadequate infrastructure changes and configurations can increase telehealth risks [10]. It is recommended to adopt software that is appropriate to the type of technology and whose efficiency has been proven by experts in the field. The IT experts who participated in the evaluation were asked about their impressions of the process, and the results were presented to obtain an assessment of validity and usefulness for the target area. They established that the tool has great value in supporting the management of information security assets in the area, mainly because telehealth circulates sensitive information of interest to the participating health professionals and may, in some way, expose patients to moral damages if there is any unauthorized access or leakage.
4 Conclusion

Telehealth can benefit from the scenario-planning approach because it plays an essential role in future development related to decisions on planning policies against cyberattacks. In the future, telehealth is expected to be able to provide health services to thousands of people who are limited by geographical constraints. Telehealth, specifically through its technological, economic, and environmental characteristics, makes a substantial contribution to society. This paper presented an application of FCM that analyzes information security factors related to telehealth. The FCM model allowed causal inference through direct chaining, as well as updates based on numerical data and the opinions of cybersecurity experts. Preliminary results are encouraging concerning the possibilities offered by the FCM approach to decision makers and IT managers, enabling good insight into the impact of cyberattacks on telehealth and ensuring a more focused view of the necessary protective actions. These results show the possibility of obtaining scenario planning in cybersecurity, highlighting the most critical factors in telehealth. The analytical process should be carried out annually or semiannually to analyze the impact of improvements in information security, with possible improvements addressing the identified critical points. Future work should aggregate other methods to assist IT managers in deciding upon actions that minimize cybersecurity risks.

Acknowledgments. This research was partially supported by a foundation affiliated with the Ministry of Education in Brazil, and the Brazilian National Research Council (CNPq). The authors would like to acknowledge PROPESP/UFPA.
References 1. Bernardes, A.C.F., Coimbra, L.C., Serra, H.O.: Use of Maranhão Telehealth Program as a tool to support continuing health education. Rev. Panam. Salud Pública 42, 1–9 (2018) 2. Brazil: Telehealth Core Costs - Instruction Manual, National Telehealth Brazil Network Program. Ministry of Health (2015) 3. Kruse, C.S., Krowski, N., Rodriguez, B., Tran, L., Vela, J., Brooks, M.: Telehealth and patient satisfaction: a systematic review and narrative analysis. BMJ Open 7(8), 1–12 (2017) 4. Ahmed, Y., Naqvi, S., Josephs, M.: Cybersecurity metrics for enhanced protection of healthcare IT systems. In: 2019 13th International Symposium on Medical Information and Communication Technology (ISMICT), Oslo, Norway, pp 1–9 (2019) 5. KPMG: Health Care and Cyber Security: Increasing Threats Require Increased Capabilities, kpmg.com (2018) 6. Zhou, L., Thieret, R., Watzlaf, V., Dealmeida, D., Parmanto, B.: A telehealth privacy and security self-assessment questionnaire for telehealth providers: development and validation. Int. J. Telerehabilitation 11(1), 3–14 (2019) 7. Arbelaez, A.: Securing Telehealth Remote Patient Monitoring Ecosystem Cybersecurity for the Healthcare Sector. National Institute of Standards and Technology (NIST), November 2018
8. de Gusmão, A.P.H., Silva, M.M., Poleto, T., e Silva, L.C., Costa, A.P.C.S.: Cybersecurity risk analysis model using fault tree analysis and fuzzy decision theory. Int. J. Inf. Manag. 43, 248–260 (2018) 9. Montero-Canela, R., Zambrano-Serrano, E., Tamariz-Flores, E.I., Muñoz-Pacheco, J.M., Torrealba-Meléndez, R.: Fractional chaos based-cryptosystem for generating encryption keys in Ad Hoc networks. Ad Hoc Netw. 97, 102005 (2019) 10. Andriole, K.P.: Security of electronic medical information and patient privacy: what you need to know. J. Am. Coll. Radiol. 11(12), 1212–1216 (2014) 11. Nagasubramanian, G., Sakthivel, R.K., Patan, R., et al.: Securing e-health records using keyless signature infrastructure blockchain technology in the cloud. Neural Comput. Appl. 32, 639–647 (2020). https://doi.org/10.1007/s00521-018-3915-1 12. Davidson, E., Simpson, C.R., Demiris, G., Sheikh, A., McKinstry, B.: Integrating telehealth care-generated data with the family practice electronic medical record: qualitative exploration of the views of primary care staff. Interact. J. Med. Res. 26, e29 (2013) 13. Li, Y., Fuller, B., Stafford, T., Ellis, S.: Information securing in organizations: a dialectic perspective. In: SIGMIS-CPR 2019 – Proceedings of the 2019 Computer and People Research Conference, pp 125–130 (2019) 14. Jennett, P.A., Andruchuk, K.: Telehealth: ‘real life’ implementation issues. Comput. Methods Programs Biomed. 64(3), 169–174 (2001) 15. Tweneboah-Koduah, S., Skouby, K.E., Tadayoni, R.: Cyber security threats to IoT applications and service domains. Wirel. Pers. Commun. 95(1), 169–185 (2017) 16. Nesbitt, T.S., Cole, S.L., Pellegrino, L., Keast, P.: Rural outreach in home telehealth: assessing challenges and reviewing successes. Telemed. J. E Health 12, 2 (2006) 17. Makhdoom, I., Abolhasan, M., Lipman, J., Liu, R.P., Ni, W.: Anatomy of threats to the internet of things. IEEE Commun. Surv. Tutor. 21(2), 1636–1675 (2019). Secondquarter 18. Dondossola, G., Garrone, F., Szanto, J.: Cyber risk assessment of power control systems - a metrics weighed by attack experiments. In: IEEE Power and Energy Society General Meeting, pp 1–9 (2011) 19. Zheng,Y., Zhang, X.: Path sensitive static analysis of web applications for remote code execution vulnerability detection, In: Proceedings of the International Conference on Software Engineering, pp 652–661 (2013) 20. Rebollo, O., Mellado, D., Fernández-Medina, E.: A systematic review of information security governance frameworks in the cloud computing environment. J. Univers. Comput. Sci. 18(6), 798–815 (2012) 21. Raoof, A. Matrawy, A.: The effect of buffer management strategies on 6LoWPAN’s response to buffer reservation attacks. In: IEEE International Conference on Communication, pp 1–7 (2017) 22. Liu, M., Li, K., Chen, T.: Security testing of web applications: a search-based approach for detecting SQL injection vulnerabilities. In: Companion - Proceedings of the 2019 Genetic and Evolutionary Computation Conference Companion (GECCO 2019), pp 417–418 (2019) 23. Krishnan, S., Anjana, M.S., Rao, S.N.: Security considerations for IoT in smart buildings. In: IEEE International Conference on Computational Intelligence and Computing Research (ICCIC 2017), Coimbatore, pp. 1–4 (2017) 24. Iglesias, F., Zseby, T.: Analysis of network traffic features for anomaly detection. Mach. Learn. 101(1–3), 59–84 (2015) 25. Qin, F., Lu, S., Zhou, Y.: SafeMem: exploiting ECC-memory for detecting memory leaks and memory corruption during production runs. 
In: Proceedings of the International Symposium on High-Performance Computer Architecture, pp 291–302 (2005)
26. Falcarin, P., Scandariato, R., Baldi, M.: Remote trust with aspect-oriented programming. In: Proceedings of the International Conference on Advanced Information Networking and Applications (AINA 2006), vol. 1, pp 451–456 (2006) 27. Arjunkar, M.R., Sambare, A.S., Jain, S.R.: A survey on: detection & prevention of energy draining attacks (vampire attacks). Int. J. Comput. Sci. Appl. 8(1), 13–16 (2015) 28. Fournaris, A.P., Fraile, L.P., Koufopavlou, O.: Exploiting hardware vulnerabilities to attack embedded system devices: a survey of potent microarchitectural attacks. Electronics 6, 3 (2017) 29. Kosko, B.: Fuzzy cognitive maps. Int. J. Man Mach. Stud. 24(1), 65–75 (1986) 30. Stylios, C.D., Groumpos, P.P.: Fuzzy cognitive maps multi-model for complex. Manufact. Syst. 34, 8 (2001) 31. Papageorgiou, E.I., Hatwágner, M.F., Buruzs, A., Kóczy, L.T.: A concept reduction approach for fuzzy cognitive map models in decision making and management. Neurocomputing 232, 16–33 (2017) 32. Papageorgiou, E.I., Subramanian, J., Karmegam, A., Papandrianos, N.: A risk management model for familial breast cancer: a new application using Fuzzy Cognitive Map method. Comput. Methods Programs Biomed. 122(2), 123–135 (2015) 33. Chandrasekaran, A.D., Ramkumar, C., Siva E.P., Balaji, N.: Average maps model of super fuzzy cognitive to analyze middle class family problem. In: AIP Conference Proceedings, vol. 2112(1) (2019) 34. Mirghafoori, S.H., Sharifabadi, A.M., Takalo, S.K.: Development of causal model of sustainable hospital supply chain management using the intuitionistic fuzzy cognitive map (IFCM) method. J. Ind. Eng. Manag. 11(3), 588–605 (2018) 35. López, C., Ishizaka, A.: A hybrid FCM-AHP approach to predict impacts of offshore outsourcing location decisions on supply chain resilience. J. Bus. Res. 103(2017), 495–507 (2017) 36. Anninou, A.P., Groumpos, P.P., Polychronopoulos, P.: Modeling health diseases using competitive fuzzy cognitive maps. In: IFIP Advances in Information and Communication Technology (2013) 37. Karagiannis, I.E. Groumpos, P.P.: Modeling and analysis of a hybrid-energy system using fuzzy cognitive maps. In: 2013 21st Mediterranean Conference on Control and Automation (MED 2013) - Conference Proceedings (2013)
Author Index
A Abu Hashim-de Vries, Anis Hasliza, 702 Ahumada, Danay, 221 Aláiz-Moretón, Héctor, 329 Alatalo, Janne, 464 Albarracín, Mauro, 755 Alcarria, Ramón, 125 Aldhayan, Manal, 723 Ali, Raian, 518, 723 Almeida, Ana Filipa, 188 Almeida, José João, 170 Almourad, Mohamed Basel, 723 Alvarez-Montenegro, Dalia, 77 Andrade, Viviana, 147 Arango-López, Jeferson, 221 Arcolezi, Héber H., 424 Au-Yong-Oliveira, Manuel, 147, 209, 584, 690, 713 Ayala, Manuel, 238 B Badicu, Andreea, 457 Barberán, Jeneffer, 238 Barros, Celestino, 318 Barros-Gavilanes, Gabriel, 56 Bartolomeu, Paulo, 197 Batista, Josias G., 493, 511 Baziyad, Mohammed, 251 Beaudon, Giles, 482 Benavides-Cuéllar, Carmen, 329 Benítez-Andrades, José Alberto, 329 Bernardes, Marciele Berger, 807 Boas, Raul Vilas, 181 Bordel, Borja, 125 Brito, Geovanni D., 77
Buele, Jorge Luis, 238 Byanjankar, Ajay, 471 C Caiza, Gustavo, 755 Caldeira, Cristina, 445 Caravau, Hilma, 188 Cardoso, Henrique Lopes, 445 Carneiro, Davide, 501 Carvalho, Célio, 102, 113 Carvalho, Victor, 679 Castañón-Puga, Manuel, 262 Cerna, Selene, 424 Cham, Sainabou, 723 Coelho, Filipe, 679 Cordeiro, Bárbara, 584 Correia, Cristina, 690 Costa, João, 147 Costa, Pedro, 102, 113 Costa, Vítor Santos, 401, 414 Couturier, Raphaël, 424 Czako, Zoltan, 534 D da Silva, Amélia Ferreira, 14 da Silva, Ayara Letícia Bentes, 828 da Silva, Francisco H. V., 493 Dahal, Keshav, 25 de Andrade, Francisco Pacheco, 807 de Carvalho, Victor Diogho Heuer, 828 de Fátima Vieira, Maria, 643 de Freitas, Deivid M., 493 de Jong, Sander, 796 de Lima, Alanio Ferreira, 493, 511 de Oliveira, Manoel E. N., 511
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020 Á. Rocha et al. (Eds.): WorldCIST 2020, AISC 1160, pp. 839–842, 2020. https://doi.org/10.1007/978-3-030-45691-7
840 de Oliveira, Rodrigo Cleiton Paiva, 828 Delgado, Isidro Navarro, 231 Dias, Gonçalo Paiva, 339 Díaz, Jaime, 221 Dimitrijević, Dejan, 544 Domínguez, Rodrigo, 238 Drias, Habiba, 564 Durães, Dalila, 170 E Escobar-Naranjo, Juan, 77
Author Index I Ionica, Andreea, 293 Ismail, Marina, 702 J Jeong, Jongpil, 359 Jiang, Nan, 518 Juárez-Ramírez, Reyes, 734 Júnior, José N. N., 493
F Faria, Brígida Mónica, 445, 679 Fernandes, Duarte, 574 Ferreira, Adelino, 66 Ferreira, Joaquim, 197 Ferreira, Rui, 209
K Kacem, Imed, 435 Kamel, Ibrahim, 251 Kaptelinin, Victor, 664 Karmali, Karim, 102, 113 Karmali, Salim, 102, 113 Kato, Toshihiko, 306 Kokkonen, Tero, 464
G Garcia, Carlos A., 77 Garcia, Jacinta, 690 Garcia, Marcelo V., 77 García-Ordás, María Teresa, 329 García-Rodríguez, Isaías, 329 García-Valdez, Mario, 262 Gayet, Anne, 482 Gebauer, Joaquin, 221 Ginters, Egils, 621 Gomes, Bruno, 102, 113 Gomes, Hélder, 339 Gonçalves, Maria José Angélico, 14 Grabentein, Joshua, 818 Grafeeva, Natalia, 555 Greco, David, 818 Guaiña-Yungan, Jonny, 275 Guyeux, Christophe, 424
L Laato, Antti, 349 Laato, Samuli, 349, 631 Lages, Romeu, 713 Leba, Monica, 293, 744, 765, 775 Lesiński, Wojciech, 607 Li, Jikai, 818 Li, Wenbin, 35 Liang, Daan, 818 Licea, Guillermo, 262 Liewig, Matthieu, 35 Lima, Ana Carolina Oliveira, 197, 643 Lima, Jean C. C., 511 Lima, Lázaro, 181 Lopes, Fabio Silva, 88 Lozada-Yánez, Pablo, 275 Lozada-Yánez, Raúl, 275 Lucarelli, Giorgio, 435
H Habib, Sami J., 368 Hangan, Anca, 534 Hansson, Mikael, 664 Haruyama, Shiho, 306 Hassanein, Khaled, 672 Head, Milena, 672 Heilimo, Eppu, 464 Henriques, Pedro Rangel, 181 Heredia, Andrés, 56 Hireche, Celia, 564 Holland, Thomas, 818 Hurtado, Carlos, 262, 734 Hussain, Ijaz, 457
M Ma, Jianbing, 518 Machado, André Caravela, 584 Machado, Gabriel F., 493 Maciel, Perla, 137 Magalhães, Gustavo, 445 Mäkelä, Antti, 464 Mäkelä, Jari-Matti, 45 Mäkeläinen, Ari, 45 Marcondes, Francisco S., 170 Marimuthu, Paulvanna N., 368 Marques, Fábio, 339 Martins, Ana Isabel, 188, 197, 643 Martins, Fernando, 381
Author Index Martins, Márcia, 584 Matalonga, Santiago, 25 McAlaney, John, 723 Mejía, Jezreel, 137 Melo, Nilsa, 102, 113 Melo, Pedro, 574 Mezei, József, 471 Mikhailova, Elena, 555 Miloslavskaya, Natalia, 789 Mohamed, Azlinah, 702 Molina-Granja, Fernando, 275 Monteiro, João, 574 Moreira, Fernando, 221 Moreira, Rui S., 102, 113 Moreta, Darwin, 238 Mouzinho, Lucilene Ferreira, 197, 643 Munawar, Mariam, 672 Muñoz, Mirna, 137 Mutka, Petri, 464 N N’Da, Aboua Ange Kevin, 25 Naiseh, Mohammad, 518 Nazé, Théo, 435 Necula, Lucian, 457 Nedevschi, Sergiu, 544 Nedić, Nemanja, 544 Névoa, Rafael, 574 Novais, Paulo, 170, 329, 501, 574, 807 Nunes, Diogo, 501 O Obregón, Ginna, 238 Obregón, Javier, 238 Ohzahata, Satoshi, 306 Olar, Marius Leonard, 765 Oliveira, Alexandra, 679 Oliveira, Ana, 445 Orlikowski, Mariusz, 3 P Pacheco, Osvaldo Rocha, 209 Panaite, Arun Fabian, 775 Papamichail, Dimitris, 818 Paredes, Hugo, 318 Park, Byung Jun, 359 Pereira, Rui Humberto, 14 Pinto, Pedro, 713 Pinto, Renato Sousa, 209 Poleto, Thiago, 828 Polewko-Klim, Aneta, 596, 607
841 Polvinen, Tuisku, 45 Pracidelli, Lilian Pires, 88 Puuska, Samir, 464 Q Quezada, Angeles, 262, 734 Quiñonez, Yadira, 137 R Rabie, Tamer, 251 Rafea, Ahmed, 391 Ramirez, Margarita, 734 Rauti, Sampsa, 45, 631 Reinoso, Cristina, 755 Reis, José Luís, 158 Reis, Luís Paulo, 401, 414, 445, 679 Remache, Walter, 56 Ribeiro, Tiago, 158 Risteiu, Marius-Nicolae, 744 Riurean, Simona, 293 Robles, Tomás, 125 Rocha, Álvaro, 293 Rocha, Filipe Marinho, 401, 414 Rocha, Nelson Pacheco, 188, 197, 643 Rocha, Rui Miranda, 209 Rocio, Vítor, 318 Rodrigues, Rafael, 102, 113 Rosa, Ana Filipa, 188 Rosales, Ricardo, 734 Rosca, Sebastian, 765 Rosca, Sebastian Daniel, 775 Royer, Guillaume, 424 Rudnicki, Witold R., 596, 607 S Salazar, Edison P., 755 Sandoval, Georgina, 231 Sandoval, Miguel Ángel Pérez, 231 Sebestyen, Gheorghe, 534 Sepúlveda, Samuel, 221 Shalaby, May, 391 Silva, António, 574 Silva, Carlos, 381 Silva, Catarina, 339 Silva, F. O., 14 Silva, Manuel, 14 Simões, Cláudia, 574 Soares, Christophe, 102, 113 Sobral, Pedro, 102, 113 Soulier, Eddie, 482 Sousa, André, 318
842 Sousa, Cristóvão, 501 Souza, Darielson A., 493, 511 Spagnuelo, Dayana, 796 Stoicuta, Olimpiu, 293 Subramaniam, Ponnusamy, 702 Suciu, George, 457 T Tamagusko, Tiago, 66 Tammi, Jani, 45 Teixeira, Daniel, 501 Todorović, Vladimir, 544 Tolstaya, Svetlana, 789 Tonieto, Márcia T., 511 Torres, José M., 102, 113 U Ușurelu, Teodora, 457
Author Index V Vaidyanathan, Nageswaran, 654 Vallejo, Henry, 755 Vidal-González, Sergio, 329 W Wang, Xiaolu, 471 Y Yamamoto, Ryo, 306 Ylikännö, Timo, 45 Z Zanini, Greice, 181 Zečević, Igor, 544 Zêdo, Yúmina, 147 Zeigler, Ethan, 818 Zúquete, André, 339